Kernel bug surfaces again.

Kernel bug surfaces again.

Alec Leamas
Hi out there!

Some bad news. Our old friend "The kernel bug" has surfaced again.
Basically, it makes lircd inoperable. Preliminary findings are that the
bug is present in kernels 4.2.8 and 4.1.13 (the latter being the Debian
jessie kernel).

I have filed a bug at [1] and am now trying to bisect the kernel to find
out when this happened.

Instructions for checking whether your kernel is affected are below.

Cheers!

--alec

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1260862

---------------------------------------------------------

To test if your kernel is affected:

- Connect a capture device supported by the kernel working in LIRC_MODE2
(i.e., a device which provides timing data).

- Check that there is a /sys/class/rc device and that it works in mode
'lirc':

     # echo 'lirc' > /sys/class/rc/rc0/protocols

- Start mode2, sending output to a file with something like:

     # mode2 --driver default --device /dev/lirc0 > foo.log

- Push buttons on the remote with long delays between each press (more
than a second).

- After some keypresses kill mode2 and fire up your editor with the
foo.log file. Look for things like this:

space 16777215
space 554825
pulse 2750


The culprit is the two consecutive spaces. They only happen after a long
space like above, so they are easy to spot.

If this space-space sequence shows up in the log, the kernel has the bug.
Otherwise, the kernel works.
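
For a long log, the check above can be automated. The following is a minimal Python sketch, not part of LIRC: the function name find_double_spaces is made up for illustration, and it assumes the log holds one "pulse N" or "space N" entry per line as in the excerpt above.

```python
def find_double_spaces(lines):
    """Return 1-based line numbers of every 'space' that directly follows a 'space'."""
    hits = []
    prev_kind = None
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()
        # Skip blank lines and anything that is not a "<kind> <duration>" pair.
        if len(fields) != 2 or fields[0] not in ("pulse", "space"):
            continue
        kind = fields[0]
        if kind == "space" and prev_kind == "space":
            hits.append(lineno)
        prev_kind = kind
    return hits


if __name__ == "__main__":
    import sys

    # Usage (hypothetical script name): python3 check_mode2.py foo.log
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as log:
            found = find_double_spaces(log)
        print("affected, see lines:" if found else "no space-space sequence", *found)
```

An empty result only means that no space-space sequence was captured, so make sure the log contains several keypresses separated by long pauses.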

DS


Re: Kernel bug surfaces again.

Bengt Martensson-2
On 01/14/16 16:57, Alec Leamas wrote:

> Hi out there!
>
> Some bad news. Our old friend "The kernel bug" has surfaced again.
> Basically, it makes lircd unoperable. Preliminary findings is that the
> bug is present in kernel 4.2.8 and 4.1.13 (the latter the Debian jessie
> kernel).
>...
>
> space 16777215
> space 554825
> pulse 2750
>
>
> The culprit is the two consecutive spaces.

Annoying. But even if the bug is found and fixed today (and the culprit
tarred and feathered ;-)) the broken kernels will stay around for,
probably, years. So it might be an idea to try to fix (some of) the
programs reading "mode2" to cope with the problem, regardless of whether
the kernel bug gets fixed. I do not know exactly how much needs to be
fixed, but I imagine that it cannot be that much or that hard. Formally,
the semantics of "mode2" would change so that consecutive spaces (pulses)
are treated as one space (pulse) whose duration is the sum of the
individual spaces (pulses).
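
As an illustration of that proposed semantics, here is a minimal Python sketch. It is not the IrScrutinizer change and not LIRC code; the function name merge_runs and the (kind, duration) tuple representation are assumptions made for the example.

```python
def merge_runs(events):
    """Collapse consecutive events of the same kind into one event whose
    duration is the sum of the individual durations."""
    merged = []
    for kind, duration in events:
        if merged and merged[-1][0] == kind:
            # Same kind as the previous event: extend it instead of appending.
            merged[-1] = (kind, merged[-1][1] + duration)
        else:
            merged.append((kind, duration))
    return merged


sample = [("space", 16777215), ("space", 554825), ("pulse", 2750)]
print(merge_runs(sample))
```

With the sample from the original report, the two spaces become a single space of 17332040 followed by the pulse.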

I just fixed the Mode2Importer in IrScrutinizer: 7 lines added, 4 deleted...

Just a thought...

Bengt


Re: Kernel bug surfaces again.

Alec Leamas
On 16/01/16 11:14, Bengt Martensson wrote:

> On 01/14/16 16:57, Alec Leamas wrote:
>> Hi out there!
>>
>> Some bad news. Our old friend "The kernel bug" has surfaced again.
>> Basically, it makes lircd unoperable. Preliminary findings is that the
>> bug is present in kernel 4.2.8 and 4.1.13 (the latter the Debian jessie
>> kernel).
>> ...
>>
>> space 16777215
>> space 554825
>> pulse 2750
>>
>>
>> The culprit is the two consecutive spaces.
>
> Annoying. But even if the bug is found and fixed today (and the culprit
> tarred and feathered ;.)) the broken kernels will stay around for,
> probably, years. So it might be an idea to try to fix (some of) the
> programs reading "mode2" to cope with the problem -- regardless of the
> fixing of the kernel bug.

Indeed. Besides, I'm short of time...


Cheers!

--alec



Re: Kernel bug surfaces again.

drothlis
In reply to this post by Bengt Martensson-2
> Formally,
> the semantics of the "mode2" is changed so that consecutive
> spaces(pulses) are considered as one space(pulse) with the duration of
> the sum of the individual spaces(pulses).

I experimented with piping the output of `mode2` into
`ts %Y-%m-%dT%H:%M:%.S`, and I get output like this:

    2016-01-26T17:30:56.385552 pulse 227
    2016-01-26T17:30:56.533884 space 16777215
    2016-01-26T17:30:56.686400 space 303744
    2016-01-26T17:30:56.686472 pulse 1089

Note that the time elapsed between the first pulse and the second space
is 300848 microseconds, i.e. basically the value reported by the second
space. So rather than adding the 2 spaces together, it would seem more
appropriate to ignore the value 16777215 (which is 2^24 - 1).

These measurements are consistent: whenever I get 2 consecutive spaces,
it's always exactly the value 16777215 followed by a space with a number
that matches empirical measurements. Note that 16777215 microseconds is
almost 17 *seconds*, which is clearly wrong.
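
The arithmetic can be double-checked with a few lines of Python. This is only a sketch: the helper name elapsed_us is made up, and it assumes that mode2 durations are in microseconds and that the timestamps look like the ones shown above.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%dT%H:%M:%S.%f"


def elapsed_us(start, end):
    """Whole microseconds between two timestamps in the format used above."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return delta // timedelta(microseconds=1)


# Time from the first pulse to the second space in the excerpt.
gap = elapsed_us("2016-01-26T17:30:56.385552", "2016-01-26T17:30:56.686400")
print(gap)               # measured gap in microseconds
print(2**24 - 1)         # the suspicious constant
print((2**24 - 1) / 1e6) # the same constant expressed in seconds
```

The measured gap is within about 1% of the 303744 reported by the second space, while the first space claims a duration more than fifty times longer.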

This is with kernel 4.2.0-25-generic on Ubuntu 15.10.

Dave.

Re: Kernel bug surfaces again.

Alec Leamas
On 26/01/16 19:46, drothlis wrote:

>> Formally,
>> the semantics of the "mode2" is changed so that consecutive
>> spaces(pulses) are considered as one space(pulse) with the duration of
>> the sum of the individual spaces(pulses).
>
> I experimented piping the output of `mode2` into `ts %Y-%m-%dT%H:%M%.S`,
> and I get output like this:
>
>      2016-01-26T17:30:56.385552 pulse 227
>      2016-01-26T17:30:56.533884 space 16777215
>      2016-01-26T17:30:56.686400 space 303744
>      2016-01-26T17:30:56.686472 pulse 1089
>
> Note that the time elapsed between the first pulse and the second space
> is 300848, i.e. basically the value reported by the second space. So
> rather than adding the 2 spaces together, it would seem more appropriate
> to ignore the value 16777215 (which is 2^24 - 1).

These results are consistent with the faulty patch [1], which indeed
inserts an extra (-1 & LIRC_VALUE_MASK), i.e., 16777215.

You are basically right that it would be better to discard the 16777215
space. However, this involves a one-token look-ahead, which I don't see
any easy way to implement cleanly. The attempted fix is at [2], which at
least makes irrecord and the regression tests work again. It just drops
the last space, which actually is the proper one, but the parser seems
to accept the (too) long sync pulse as valid.
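
To make the look-ahead idea concrete, here is a minimal Python sketch of a filter that holds back a suspicious space until the next token is known. It is not the fix at [2] and not lircd code; the names BOGUS_SPACE and drop_bogus_spaces and the (kind, duration) tuples are assumptions made for the example. The mask value 0x00FFFFFF is the one implied by (-1 & LIRC_VALUE_MASK) == 16777215.

```python
LIRC_VALUE_MASK = 0x00FFFFFF
BOGUS_SPACE = -1 & LIRC_VALUE_MASK  # 16777215, the value the faulty patch inserts


def drop_bogus_spaces(events):
    """Yield (kind, duration) events, skipping a 16777215 space that is
    directly followed by another space (one-token look-ahead)."""
    held = None  # a suspicious space waiting for the next token
    for kind, duration in events:
        if held is not None and kind != "space":
            yield held  # followed by a pulse, so the long space stays
        held = None
        if kind == "space" and duration == BOGUS_SPACE:
            held = (kind, duration)  # decide once the next token arrives
        else:
            yield (kind, duration)
    if held is not None:
        yield held  # end of input: nothing follows, keep it


sample = [("pulse", 227), ("space", 16777215), ("space", 303744), ("pulse", 1089)]
print(list(drop_bogus_spaces(sample)))
```

The price of the look-ahead is that a genuine 16777215 space is delayed until the next event arrives, which may matter for a daemon that decodes in real time.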

I'm not completely happy with this patch, but it's what we have up to
now.  Alternative solutions welcome!


Cheers!

--alec

[1] https://bugzilla.redhat.com/attachment.cgi?id=1115061&action=diff
[2] https://sourceforge.net/p/lirc/git/ci/f41013ced3d4

