[Bug 579932] New: audio and video playback hangs after kernel update
http://bugzilla.novell.com/show_bug.cgi?id=579932
http://bugzilla.novell.com/show_bug.cgi?id=579932#c0

Summary: audio and video playback hangs after kernel update
Classification: openSUSE
Product: openSUSE 11.2
Version: Final
Platform: x86-64
OS/Version: openSUSE 11.2
Status: NEW
Severity: Normal
Priority: P5 - None
Component: Kernel
AssignedTo: kernel-maintainers@forge.provo.novell.com
ReportedBy: estellnb@gmail.com
QAContact: qa@suse.de
Found By: ---
Blocker: ---

User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; de; rv:1.9.1.6) Gecko/20091201 SUSE/3.5.6-1.1.1 Firefox/3.5.6

After updating to kernel 2.6.33-rc7-2.99.14.0943949-desktop, audio and video playback often hangs for quite a while although the CPU usage is very low.

Reproducible: Always

--
Configure bugmail: http://bugzilla.novell.com/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
http://bugzilla.novell.com/show_bug.cgi?id=579932
http://bugzilla.novell.com/show_bug.cgi?id=579932#c1
Jeff Mahoney
http://bugzilla.novell.com/show_bug.cgi?id=579932
http://bugzilla.novell.com/show_bug.cgi?id=579932#c2
--- Comment #2 from Elmar Stellnberger
echo w > /proc/sysrq-trigger; dmesg >dmesg.X
http://bugzilla.novell.com/show_bug.cgi?id=579932
http://bugzilla.novell.com/show_bug.cgi?id=579932#c
Jeff Mahoney
http://bugzilla.novell.com/show_bug.cgi?id=579932
http://bugzilla.novell.com/show_bug.cgi?id=579932#c3
--- Comment #3 from Elmar Stellnberger
http://bugzilla.novell.com/show_bug.cgi?id=579932
http://bugzilla.novell.com/show_bug.cgi?id=579932#c4
--- Comment #4 from Elmar Stellnberger
http://bugzilla.novell.com/show_bug.cgi?id=579932
http://bugzilla.novell.com/show_bug.cgi?id=579932#c5
Elmar Stellnberger
http://bugzilla.novell.com/show_bug.cgi?id=579932
http://bugzilla.novell.com/show_bug.cgi?id=579932#c6
Elmar Stellnberger
http://bugzilla.novell.com/show_bug.cgi?id=579932
http://bugzilla.novell.com/show_bug.cgi?id=579932#c7
--- Comment #7 from Elmar Stellnberger
http://bugzilla.novell.com/show_bug.cgi?id=579932
http://bugzilla.novell.com/show_bug.cgi?id=579932#c8
--- Comment #8 from Elmar Stellnberger
http://bugzilla.novell.com/show_bug.cgi?id=579932
http://bugzilla.novell.com/show_bug.cgi?id=579932#c9
--- Comment #9 from Jiri Slaby
> sh.. wrong place for Comment 7. Why does bugzilla always switch the current bug after a post?! This is a real annoyance!!
Turn it off in your bugzilla preferences...
http://bugzilla.novell.com/show_bug.cgi?id=579932
http://bugzilla.novell.com/show_bug.cgi?id=579932#c10
Jiri Slaby
http://bugzilla.novell.com/show_bug.cgi?id=579932
http://bugzilla.novell.com/show_bug.cgi?id=579932#c11
--- Comment #11 from Jiri Slaby
> After updating to kernel 2.6.33-rc7-2.99.14.0943949-desktop the audio and video playback hangs often for quite a while although the CPU usage is very low.
BTW updating from what version?
http://bugzilla.novell.com/show_bug.cgi?id=579932
http://bugzilla.novell.com/show_bug.cgi?id=579932#c
Jiri Slaby
http://bugzilla.novell.com/show_bug.cgi?id=579932
http://bugzilla.novell.com/show_bug.cgi?id=579932#c12
Elmar Stellnberger
http://bugzilla.novell.com/show_bug.cgi?id=579932
http://bugzilla.novell.com/show_bug.cgi?id=579932#c13
Elmar Stellnberger
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c14
Andreas Nordal
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c15
--- Comment #15 from Andreas Nordal
From kernel.org, I downloaded and compiled kernels 2.6.32.20 (which lacks Nouveau) and 2.6.35.3 without Nouveau. In both cases, the desktop activities noted above worked as they should (without needing to move the mouse like a maniac or execute ´yes´ in the background). However, the hang is still
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c16
--- Comment #16 from Andreas Nordal
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c
Andreas Nordal
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c
Andreas Nordal
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c17
--- Comment #17 from Elmar Stellnberger
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c18
Jiri Slaby
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c19
Andreas Nordal
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c20
--- Comment #20 from Jiri Slaby
> Good news! Quick testing of 2.6.35.3:
> - clocksource=jiffies _completely_ did the trick!!!
> - clocksource=nolapic_timer did not, although the hangs were less frequent.
nolapic_timer is a standalone parameter. Try it without "clocksource=". What is your clocksource anyway (dmesg|grep clock should tell)? If it is tsc, you may also try clocksource=hpet to have high precision timers.
> Specifically, with clocksource=jiffies: ..
> - On the downside, rt-benchmark was no longer schedulable if intervals were shorter than about 0.001 seconds.
Which is expected when HZ is set to 1000 (jiffies are updated at this frequency).
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c21
--- Comment #21 from Andreas Nordal
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c22
--- Comment #22 from Jiri Slaby
> The unaffected 2.6.32.20 is of course impossible to hang, yet Powertop works flawless.
Actually what clock source is used there?
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c23
--- Comment #23 from Andreas Nordal
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c24
--- Comment #24 from Andreas Nordal
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c25
--- Comment #25 from Elmar Stellnberger
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c26
--- Comment #26 from Jiri Slaby
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c
Jiri Slaby
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c27
--- Comment #27 from Jiri Slaby
> Well, there is an upstream thread which may explain this: http://lkml.org/lkml/2010/9/9/426
Actually not quite. The patch is not in 2.6.34, it's only in 2.6.35.
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c28
--- Comment #28 from Andreas Nordal
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c29
--- Comment #29 from Elmar Stellnberger
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c30
--- Comment #30 from Andreas Nordal
> Well, there is an upstream thread which may explain this: http://lkml.org/lkml/2010/9/9/426
The symptoms are different. That report is about increasingly frequent hangs, in which user input is delayed. Here, we are talking about Poisson-distributed hangs which resume immediately on user input. Where is the patch? I want to try it anyway; if they have fixed hpet, it might accidentally have solved this problem too...
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c31
--- Comment #31 from Andreas Nordal
> The patch is not in 2.6.34, it's only in 2.6.35.
2.6.35.7, which was released yesterday, is still broken.
@Jiri : Are kernel developers informed, or will they be? Sorry for my stupid question. As the changelog of 2.6.34.7 reveals, you must be a kernel developer.
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c32
Jiri Slaby
> > The patch is not in 2.6.34, it's only in 2.6.35.
> 2.6.35.7, which was released yesterday, is still broken.
Could you try the kernel-vanilla 2.6.36-rc kernel from http://download.opensuse.org/repositories/Kernel:/HEAD/openSUSE_11.3/ to see whether the recent hpet fixes help you?
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c33
--- Comment #33 from Andreas Nordal
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c34
--- Comment #34 from Andreas Nordal
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c35
Andreas Nordal
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c36
--- Comment #36 from Andreas Nordal
> I will try 2.6.36-rc6 from kernel.org later.
No luck. Same shit.
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c37
--- Comment #37 from Jiri Slaby
> > I will try 2.6.36-rc6 from kernel.org later.
> No luck. Same shit.
Ok. I raised the problem upstream and they need:
Can we get dmesg, the output of /proc/timer_list and the output of /proc/acpi/processor/CPU0/power for a working and a non-working kernel, please?

I think a working kernel was 2.6.32, wasn't it?
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c38
--- Comment #38 from Andreas Nordal
> I think a working kernel was 2.6.32, wasn't it?
I have some news. Turns out that one wasn't bulletproof either, just less fragile.
There is another kernel that seems to be immune against rt-benchmark, which is the one on the Gentoo live-CD (2.6.34-gentoo-r6). I need to check that it isn't using any of the cheatcodes we have found. Then I will provide.

The news: Between 2.6.32 and 2.6.33-rc7, a series of events have made the broken clocksource more evident. The following is a log of my bisection (starting with good=v2.6.32 and bad=v2.6.33-rc7). Notice that, as we go back in time, the vulnerabilities in different areas vanish (it gets increasingly harder to hang the kernel):

steps| resol- | reason
left | ution  |
-----+--------+------------
 13  |  bad   | very noticeable audio hangs
 12  |  bad   | very noticeable audio hangs
 11  |  bad   | hang during boot. No audio playback hang reproducible.
 10  |  bad   | hang using wget, hang during shutdown.
  9  |  bad   | hang using wget, hang during shutdown.
  8  |  bad   | hang during boot.
  7  |  bad   | hang during boot.
  6  |  bad   | hang during boot (before HALd).
  5  |  bad   | hang during boot (before HALd). Hang during shutdown.
  4  |  bad   | idle hangs (59s, 50s, 91s). No hang using wget reproducible.
     |        | Hang during shutdown.
  3  |  bad   | hang 229s under heavy rt-benchmarking. No hang for 30min
     |        | idling. No boot/shutdown hang reproducible.
  2  |untested| See where this is going? Went testing 2.6.32.20 instead.

Testing 2.6.32.20 again (more patience this time), I got it to hang! Here is how:
1) As root do `rt-benchmark d0.05`
2) As user do `rt-benchmark t0.0001`, preferably the new version.
3) Wait 5 minutes (while the insane interval of 0.0001s hangs 2.6.35 in an instant, this kernel is hard to hang)
4) The first instance of rt-benchmark has probably hung for a few hundred seconds by now. Interrupt the computer.
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c39
--- Comment #39 from Jiri Slaby
>   3  |  bad   | hang 229s under heavy rt-benchmarking. No hang for 30min
>      |        | idling. No boot/shutdown hang reproducible.
I think rt-benchmark could be a different issue. I would mark this as good.
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c40
--- Comment #40 from Andreas Nordal
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c
Jiri Slaby
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c41
Nik Swiridow
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c
Jiri Slaby
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c42
Jiri Slaby
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c43
Andreas Nordal
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c44
--- Comment #44 from Andreas Nordal
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c45
--- Comment #45 from Nik Swiridow
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c46
--- Comment #46 from Andreas Nordal
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c
Jiri Slaby
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c47
--- Comment #47 from Andreas Nordal
> It seems that with "nolapic_timer" your processor always runs in C0 state.
It's not clear to me whether powertop agrees or not. With "nolapic_timer":

PowerTOP version 1.13 (C) 2007 Intel Corporation

Cn                Avg residency       P-states (frequencies)
C0 (cpu running)        ( 0,4%)
polling           7,7ms (99,6%)
C1 mwait          0,0ms ( 0,0%)
C2 mwait          0,0ms ( 0,0%)
C6 mwait          0,0ms ( 0,0%)

Wakeups-from-idle per second : 129,2    interval: 10,0s
no ACPI power usage estimate available
-----

According to kio sysinfo, my 2.1 GHz processor clocks itself down to the idling frequency of 800 MHz, just as without "nolapic_timer".

I asked Multicom by email today how to update the BIOS. Sorry for not remembering correctly, but my laptop is a Multicom Compal JFL92+.
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c48
--- Comment #48 from Andreas Nordal
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c49
--- Comment #49 from Nik Swiridow
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c50
--- Comment #50 from Andreas Nordal
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c51
--- Comment #51 from Jiri Slaby
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c52
--- Comment #52 from Nik Swiridow
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c53
--- Comment #53 from Jiri Slaby
> So, my advice for you:
> -inform the Kernel Team http://bugzilla.kernel.org/
See c#46.
> -use any distro with a kernel older than 2.6.20 until the bug will be fixed.
I don't think you need to go back even that far. You can just use the kernel from some older distro.
> I'm almost sure that "OpenSUSE 10.2" will work fine on your laptop...
That you won't find anywhere.
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c54
--- Comment #54 from Elmar Stellnberger
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c55
--- Comment #55 from Nik Swiridow
Which is the last "good" kernel then? I remember babysitting Opensuse 10.2 during boot/shutdown. That was not a
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c56
--- Comment #56 from Andreas Nordal
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c57
--- Comment #57 from Andreas Nordal
> Which is the last "good" kernel then?
I found a live-CD with Kubuntu 9.10 and kernel "2.6.31-14-generic #48-Ubuntu SMP Fri Oct 16 14:05:01 UTC 2009 x86_64". Not good.

During rt-benchmarking and webbrowsing, this old kernel hangs as easily as with Opensuse 11.3 (yes, that top priority task was not scheduled for 196 seconds before my patience was up). However, it refuses to hang while playing audio (which is quite the opposite of os11.3).

If anything is clear, it must be that the audio susceptibility was introduced later. Specifically between 2.6.32 and 2.6.33-rc7 (as originally found by Elmar and verified by me). If that is what we are after, I should be able to bisect it.

I am willing to believe that all kernels using the necessary hpet, nohz, and whatnot, are "bad" on computers like mine. It's just that some kernels hang willingly (possibly eternally) like 2.6.31-14 and 2.6.35.7, and some will only hang under heavy rt-benchmarking and wake up by themselves (typically within a second), like 2.6.32.20 and the kernels of Fedora 14 and Kubuntu 10.10 (which is what I have now).

If in newer kernels, the hangs are short, infrequent, maybe even tolerable for most users, maybe old kernels were like that also... We need to test some ancient live-CDs.
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c58
--- Comment #58 from Elmar Stellnberger
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c59
Andreas Nordal
> There are many other options that you can try: "noapic", "hpet=disable", "nohz=off", "nolapic".
I have tried these using the 2.6.35-25-generic kernel in Kubuntu 10.10.

It really felt like "noapic" worked:
* No audio susceptibility
* No susceptibility to heavy rt-benchmarking
But it didn't:
* rt-benchmark running as root was not scheduled for 977 ms when the machine was idling. This is not supposed to happen, right?
* Could these messages (from the bottom of dmesg) be indicative of this bug, or is it just me?:
[  133.022562] CE: hpet increased min_delta_ns to 7500 nsec
[  345.032576] CE: hpet increased min_delta_ns to 11250 nsec
[ 3988.069868] CE: hpet increased min_delta_ns to 16875 nsec
[ 4218.910129] CE: hpet increased min_delta_ns to 25312 nsec
[ 7493.620160] CE: hpet increased min_delta_ns to 37968 nsec
[14360.431022] CE: hpet increased min_delta_ns to 56952 nsec

"hpet=disable" did not work:
* frequent hangs shorter than 0.5s (none above 1s, but did not test very long, did not test idling)
* susceptible to rt-benchmark and audio playback

nohz=off: same as hpet=disable

"nolapic" works, meaning that reproducing an idle hang is at least beyond my patience (sorry for not being more scientific). Nothing going on in dmesg either.
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c60
--- Comment #60 from Nik Swiridow
> * Could these messages (from the bottom of dmesg) be indicative of this bug, or is it just me?:
> [ 133.022562] CE: hpet increased min_delta_ns to 7500 nsec
It means that the hpet timer is unstable.

With "nolapic" you will probably have only one CPU.

Another possible kernel option:
"nmi_watchdog=1"
"nmi_watchdog=2"

And I still wonder if there are any good working old kernels? "OS 10.2" Live-DVD is available: http://ftp.hosteurope.de/mirror/ftp.opensuse.org/discontinued/10.2/iso/dvd/o...
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c
Jiri Slaby
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c61
--- Comment #61 from Andreas Nordal
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c62
--- Comment #62 from Nik Swiridow
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c63
--- Comment #63 from Nik Swiridow
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c64
--- Comment #64 from Andreas Nordal 2011-03-19 18:56:57 UTC ---
Terminal session with 2.6.18.2:

cat /proc/interrupts
           CPU0       CPU1
  0:     111042     110534    IO-APIC-edge  timer
  1:        202        216    IO-APIC-edge  i8042
  8:         21         17    IO-APIC-edge  rtc
  9:        224        220   IO-APIC-level  acpi
 12:         60         65    IO-APIC-edge  i8042
 14:       3282       3273    IO-APIC-edge  ide0
 74:       1769       1744   IO-APIC-level  ehci_hcd:usb2, uhci_hcd:usb5, sdhci:slot0
 82:          9         13   IO-APIC-level  uhci_hcd:usb4
 90:          1          2   IO-APIC-level  ohci1394
 98:        108        104   IO-APIC-level  HDA Intel
106:       3573          0         PCI-MSI  eth0
169:          0          0   IO-APIC-level  uhci_hcd:usb3
185:       7247       7290   IO-APIC-level  ehci_hcd:usb1, uhci_hcd:usb7
193:         89         81   IO-APIC-level  uhci_hcd:usb6, libata
NMI:          0          0
LOC:     220727     221012
ERR:          0
MIS:          0

grep . -r /sys/devices/system/cpu/cpu0/cpuidle/
grep: /sys/devices/system/cpu/cpu0/cpuidle: No such file or directory
#Note: The only file in cpu0 directory is cpufreq

Problem with sound? Not the usual; there is no sound. But video playback is smooth. (Video playback is a weaker indicator)

> With "nolapic" you will probably have only one CPU.
Confirmed.

> Another possible kernel option:
> "nmi_watchdog=1"
> "nmi_watchdog=2"
I get very strange behavior with OpenSUSE 11.4 (yes, I'm back to suse now):
1. After installation, it hung long and often when playing audio (something like every 10 seconds), and I could let it hang for as long as I wanted, though I didn't try more than maybe 10 seconds. Pretty much like 11.3. Didn't test rt-benchmark.
2. "nmi_watchdog=1" had no effect on audio hangs. Didn't test rt-benchmark.
3. "nmi_watchdog=2" dramatically reduced the frequency of audio hangs (like once in 15 minutes), and the computer resumes from hangs by itself (after a varying delay, typically < 1s). Rt-benchmark failed to set scheduling priority.
4. Back to no workaround: Behaves exactly like "nmi_watchdog=2"! Is this strange? I double-checked the kernel command line with dmesg.

The behavior of "nmi_watchdog=2" (with respect to audio hangs) resembled that of Fedora 14 and Kubuntu 10.10.

Rt-benchmark does not work properly in OpenSUSE 11.4. I have not seen this problem before. By failing to set scheduling priority, it has lost its credibility. What fails (in C):

    struct sched_param param = {0};
    param.sched_priority = sched_get_priority_max(SCHED_FIFO);
    sched_setscheduler(0, SCHED_FIFO, &param);
    //The last line sets errno=EPERM Operation not permitted
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c65
--- Comment #65 from Nik Swiridow
> grep: /sys/devices/system/cpu/cpu0/cpuidle: No such file or directory
> #Note: The only file in cpu0 directory is cpufreq
Please use "cat /proc/acpi/processor/CPU0/power" instead.
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c66
--- Comment #66 from Andreas Nordal
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c67
--- Comment #67 from Andreas Nordal
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c68
--- Comment #68 from Andreas Nordal
Some other old Live CD:
The 2.6.17 kernel of Ubuntu 6.10 x86_64 is good. Schedulability of rt-benchmark was similar to OpenSUSE 10.2, approaching suitability for hard real-time at intervals of 0.008s (did not hang while I was eating dinner). I was unable to test audio playback, because X did not work (even with "safe graphics" startup option) and it had no commandline player (that I was aware of). The package manager was outdated and failed to install build-essential. No package manager & no compiler => hopeless.
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c69
--- Comment #69 from Andreas Nordal
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c70
--- Comment #70 from Elmar Stellnberger
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c71
--- Comment #71 from Elmar Stellnberger
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c72
Elmar Stellnberger
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c73
--- Comment #73 from Andreas Nordal
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c74
--- Comment #74 from Elmar Stellnberger
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c75
--- Comment #75 from Andreas Nordal
> We need to be very careful in never intermingling different issues when reporting bugs.
That worries me too. Sorry for extending the scope of your bugreport, but I suspect your Scaleo has a more general problem, and I need your verification.
> Nonetheless I do continue to regard audio/video playback as the ideal test case for this hpet/apic related issue.
An ideal testcase (or rather general) is what I tried to make with rt-benchmark. As written in my first post, it was not only about audio/video for me.
> We must not leave things like this
That's the spirit! Sorry for letting that one go. Guess I should contact you if I see anything suspicious;)
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c76
--- Comment #76 from Nik Swiridow
> C2: type[C2] usage[00012248] duration[00000000000077420436]
> *C3: type[C3] usage[00347715] duration[00000000003032816785]
Looks like everything is OK with Kernel 2.6.18. As you can see, the processor enters the C2/C3 states, which is important for a laptop.

Well, Kernels 2.6.17 - 18 are "good". What about Ubuntu 7.04-7.10?
+ Ubuntu 6.10 http://old-releases.ubuntu.com/releases/edgy/
? Ubuntu 7.04 http://old-releases.ubuntu.com/releases/feisty/
? Ubuntu 7.10 http://old-releases.ubuntu.com/releases/gutsy/
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c77
--- Comment #77 from Andreas Nordal
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c78
--- Comment #78 from Nik Swiridow
> Ubuntu 7.04 and 7.10 are also good.
Could you test the 32-bit version of Ubuntu 7.10 also? And "not working" kernels such as 2.6.27 with the option "nohz=off"...
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c79
Andreas Nordal
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c80
--- Comment #80 from Nik Swiridow
> Strange 1: Ubuntu 7.10 i686 is bad. (did not test 7.04 i686)
It is interesting to compare configurations for 32/64-bit Ubuntu 7.10 Kernels. Please, show the output of
cat /boot/config-`uname -r` | egrep "TIMER|HZ"
or even attach the whole config files.
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c81
--- Comment #81 from Andreas Nordal
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c82
--- Comment #82 from Nik Swiridow
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c83
--- Comment #83 from Andreas Nordal
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c84
--- Comment #84 from Nik Swiridow
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c85
--- Comment #85 from Nik Swiridow
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c86
--- Comment #86 from Andreas Nordal
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c87
--- Comment #87 from Andreas Nordal
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c88
--- Comment #88 from Andreas Nordal
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c89
--- Comment #89 from Nik Swiridow
All works fine now
Okay. I had intended to give you a link with simpler instructions :) http://www.howtoforge.com/kernel_compilation_suse_p2 (Page 1 is a bit obsolete, but overall it is a very good guide)
but I have a feeling that the battery is draining faster than normal.
Have you reinstalled the proprietary Nvidia driver for the new kernel? Please check all your power management settings (display, video card, processor). NOHZ=off can indeed slightly reduce power saving (especially when HZ=1000). On the other hand, the processor uses C2/C6 states now. So more testing is needed (CPU temperature when idle, etc.).
Maybe it is difficult to pass these tests...
Maybe. Though SMI latency = 549us is quite high. Anyway, only the vendor can fix it...
Have you reinstalled the proprietary Nvidia driver for the new kernel?
No, that was it. Clearly. The laptop was hot underneath too. After installing
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c90
Andreas Nordal
SMI latency = 549us is quite high. Anyway, only the vendor can fix it...
Good to know. But a latency of 549µs does not explain why the kernel hangs indefinitely, unless it makes some assumption? Maybe the assumption does not hold for Phoenix BIOSes of the 2008 era...
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c91
--- Comment #91 from Elmar Stellnberger
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c92
--- Comment #92 from Andreas Nordal
Music playback hangs almost instantaneously. When running the rt-benchmark program it outputs a new timestamp as soon as I move the mouse upon a thereby finished hang
Yes, that is indeed how things behave on my computer too.
The duration of the last hang is at least 827.771259993s - 766.438843543s = 61.33241645s, and at most 1s more (the period). The text "Not scheduled since" should really be "Should have run at". I should correct this awkwardness.
From my experience, there are 2 severities of this bug. It seems the kernel you tested is of the kind that hangs readily and does not easily resume by itself. It may actually hang forever, depending on the tasks running; playing a video may make it auto-resume hundreds of times per second.
Other kernels hang seldom and auto-resume within a second (typically much less than a second). To detect short hangs, set the deadline of rt-benchmark accordingly (think of it as a filter). Setting it too low is not dangerous, but gives false positives. Experiment to find a realistic deadline for your system (not necessarily a sharp limit).
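The arithmetic and the "deadline as a filter" idea from this comment can be written down as a small Python sketch. It does not reproduce rt-benchmark itself, whose source is not shown in this thread; the function names and the list of lateness samples are hypothetical, and only the two timestamps and the 1s period come from the comment.

```python
from typing import List, Tuple


def hang_bounds(should_have_run_at: float, resumed_at: float, period: float) -> Tuple[float, float]:
    """Bound the length of a hang seen by a task that wakes once per period.

    The hang lasted at least from the missed wake-up to the resume, and
    it may have started up to one period before that missed wake-up.
    """
    lower = resumed_at - should_have_run_at
    return lower, lower + period


def late_wakeups(lateness_s: List[float], deadline_s: float) -> List[float]:
    """Keep only the wake-ups that were later than the deadline (the filter)."""
    return [late for late in lateness_s if late > deadline_s]


if __name__ == "__main__":
    # Timestamps quoted in the comment, with the 1 s period it mentions.
    lower, upper = hang_bounds(766.438843543, 827.771259993, 1.0)
    print(f"hang lasted between {lower:.6f} s and {upper:.6f} s")

    # Hypothetical lateness samples in seconds.
    samples = [0.0002, 0.004, 0.35, 61.3]
    print(late_wakeups(samples, 0.001))  # low deadline: also reports short delays
    print(late_wakeups(samples, 0.1))    # higher deadline: only long stalls remain
```

A lower deadline reports more events, some of which may be ordinary scheduling noise, which is the false-positive effect described above.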
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c
Andreas Nordal
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c93
--- Comment #93 from Andreas Nordal
From http://en.wikipedia.org/wiki/HPET:
Since HPET compares the actual timer value and the programmed target value on equality rather than "greater or equal", interrupts can be missed if the target time has already passed when the comparator value is written into the chip's register. In the presence of non-maskable interrupts (such as System Management Interrupts) that do not have a hard upper bound on their execution time, this race condition requires time-consuming re-checks of the timer after setup and is hard to avoid completely. The difficulties are exacerbated if the comparator value is not synchronized with the timer immediately, but delayed by one or two ticks, as some chipsets do.
Could it be that the kernel goes to sleep with the alarm clock set to some time in the past? I guess SMI = "System Management Interrupt", so my 549µs SMI latency is relevant. But even if my BIOS is junk, wouldn't this problem still require a hole in some "time-consuming re-checks of the timer after setup", according to Wikipedia?
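The race described in the quoted passage can be illustrated with a toy model in Python. This is not HPET driver code and makes simplifying assumptions: a 32-bit free-running counter, a comparator that fires only on exact equality, and a stall (for example an SMI) that happens between reading the counter and writing the comparator. All names are hypothetical.

```python
COUNTER_BITS = 32            # assumed counter width for this toy model
WRAP = 1 << COUNTER_BITS


def ticks_until_match(counter: int, comparator: int) -> int:
    """Ticks until a free-running counter next equals the comparator.

    With an equality-only comparator, a target that already lies in the
    past matches again only after the counter has wrapped around.
    """
    return (comparator - counter) % WRAP


def arm(counter_at_read: int, delta: int, stall: int) -> int:
    """Program "fire in delta ticks" while losing `stall` ticks before the write.

    Returns how many ticks pass, counted from the write, until the
    interrupt actually fires in this model.
    """
    comparator = (counter_at_read + delta) % WRAP
    counter_at_write = (counter_at_read + stall) % WRAP
    return ticks_until_match(counter_at_write, comparator)


def target_already_passed(counter_at_read: int, delta: int, stall: int) -> bool:
    """Re-check after setup: is the comparator behind the current counter?

    Distances of more than half the counter range are treated as "in the past".
    """
    comparator = (counter_at_read + delta) % WRAP
    counter_now = (counter_at_read + stall) % WRAP
    return ticks_until_match(counter_now, comparator) > WRAP // 2


if __name__ == "__main__":
    print(arm(1000, 100, 50))                    # short stall: fires on time
    print(arm(1000, 100, 8000))                  # long stall: waits for a wrap
    print(target_already_passed(1000, 100, 8000))
```

In this model a stall longer than the requested delay turns a short sleep into a wait for the counter to wrap, unless the re-check notices that the target has passed and the timer is programmed again. The sketch does not model the delayed comparator synchronization that the quoted passage mentions for some chipsets.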
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c94
--- Comment #94 from Nik Swiridow
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c95
--- Comment #95 from Andreas Nordal
Andreas, are you not satisfied with your custom kernel?
In theory, I am satisfied. I find myself using other kernels too, but the workarounds are okay.
I am just upset about the potential frustration that this is going to cause other users. Only experts will find out what's wrong. Not to mention that this problem is dangerous! Imagine sitting in a rocket controlled by Linux; we'd better hope the kernel cannot enter an eternal hang then ;)
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c96
--- Comment #96 from Andreas Nordal
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c97
--- Comment #97 from Andreas Nordal
And what does intel_idle say about your processor? dmesg | grep "intel_idle"
[ 0.545904] intel_idle: MWAIT substates: 0x3122220
[ 0.545906] intel_idle: does not run on family 6 model 23
This was with my Compal laptop again, with kernel 2.6.39.2-36-desktop (geeko@buildhost) from the tumbleweed repos.
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c98
--- Comment #98 from Nik Swiridow
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c99
--- Comment #99 from Andreas Nordal
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c
Jeff Mahoney
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c100
--- Comment #100 from Rafael Wysocki
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c
Rafael Wysocki
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c101
--- Comment #101 from Andreas Nordal
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c102
--- Comment #102 from Andreas Nordal
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c
Rafael Wysocki
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c103
Rafael Wysocki
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c104
--- Comment #104 from Nik Swiridow
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c105
--- Comment #105 from Andreas Nordal
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c106
--- Comment #106 from Andreas Nordal
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c107
Andreas Nordal
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c108
Rafael Wysocki
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c109
--- Comment #109 from Nik Swiridow
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c110
--- Comment #110 from Rafael Wysocki
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c111
--- Comment #111 from Andreas Nordal
I'm not sure we can do much about that in the kernel, unless there is at least one NOHZ kernel that worked for you on that machine. Do you have such a kernel by chance?
No, it looks like the last working kernel predates NOHZ (comment #82).
The problem at hand is that NOHZ=on doesn't work, though, isn't it?
Yes, but Nik found that NOHZ=off had the side effect of disabling C2/C3 states (comment #85). The nohz= kernel option is therefore a choice between two evils.
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c112
--- Comment #112 from Elmar Stellnberger
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c113
Rafael Wysocki
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c114
--- Comment #114 from Elmar Stellnberger
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c115
--- Comment #115 from Elmar Stellnberger
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c116
Elmar Stellnberger
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c117
--- Comment #117 from Elmar Stellnberger
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c118
Rafael Wysocki
C2 is the 2nd idle state. The external I/O Controller Hub blocks interrupts to the processor. And so on with C3, C4, etc. I'll discuss this further down in this paper. By the way, there is nothing preventing the OS from busy waiting in its idle state, and thus keeping the processor in C0, as did older operating systems.
Yes, that's what the processor.max_cstate=1 kernel command-line option does. So please use it if you have this issue.
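To check which of the workarounds discussed in this bug a machine was booted with, the kernel command line can be parsed. The following Python sketch assumes the usual space-separated key=value layout; the sample command line and the helper names are hypothetical, and the workaround list is simply collected from this thread.

```python
from typing import Dict, List, Optional

# Workarounds mentioned in this thread, as written on the kernel command line.
THREAD_WORKAROUNDS = {
    "nohz": "off",
    "processor.max_cstate": "1",
    "hpet": "disable",
    "idle": "poll",
}


def parse_cmdline(cmdline: str) -> Dict[str, Optional[str]]:
    """Split a kernel command line into {parameter: value-or-None}."""
    params: Dict[str, Optional[str]] = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None  # bare flags such as "quiet" map to None
    return params


def active_workarounds(cmdline: str) -> List[str]:
    """Return the thread's workarounds that are present with the expected value."""
    params = parse_cmdline(cmdline)
    return [f"{key}={value}" for key, value in THREAD_WORKAROUNDS.items()
            if params.get(key) == value]


if __name__ == "__main__":
    # Hypothetical command line for illustration.
    sample = "root=/dev/sda2 quiet nohz=off processor.max_cstate=1"
    print(active_workarounds(sample))
```

On a running Linux system the string to pass in would normally be the contents of /proc/cmdline.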
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c119
--- Comment #119 from Jeff Mahoney
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c120
--- Comment #120 from Elmar Stellnberger
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c121
--- Comment #121 from Elmar Stellnberger
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c122
--- Comment #122 from Rafael Wysocki
Rafael, what's the upstream fix? Isn't processor.max_cstate=1 just a workaround?
The issue is an upstream problem and I'm not aware of any fixes other than limiting the set of C-states the CPUidle driver can use. The problem here is that the CPU cannot be woken up from deeper idle states by the clock event device we're using (HPET in this particular case if I'm not mistaken), but I'm not aware of any way to figure out which clock event device should be used instead (and, moreover, I'm not aware of any way to learn in advance that HPET will not work). Anyway, this problem should be reported upstream, so that we can involve more developers with expertise in this particular area in the discussion.
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c123
--- Comment #123 from Rafael Wysocki
If this is true a processor.max_wait_cstate=1 should apply for all Core 2 Duo systems.
This definitely is not the case. It usually isn't necessary to use processor.max_cstate=1 on those systems.
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c124
--- Comment #124 from Andreas Nordal
Anyway, this problem should be reported upstream, so that we can involve more developers with expertise in this particular area in the discussion.
Let's do that. But first, try to find a duplicate. Searching for "nohz" on bugzilla.kernel.org, I just found this strikingly similar report:
https://bugzilla.kernel.org/show_bug.cgi?id=12118
It is similar in terms of symptoms, workarounds and hardware. This too is a Core 2 Duo T8100 like mine (don't know about the BIOS). Should we continue from there?
Lastly, to help determine whether this is a BIOS bug (for other people reading this), let's sum up the hardware statistics:
I have:
Multicom Compal JFL92+
Intel Core 2 Duo T8100
Phoenix BIOS version 1.16 (link broken, but 1.18 was released 2008-aug-4)
Aaron Burgemeister, whose report is marked as duplicate of this, has:
HP Pavilion dv6700 Notebook PC
AMD Turion(tm) 64 X2 Mobile Technology TL-60
Hewlett-Packard BIOS version F.25 (released 2007-11-29)
Elmar, I can't find your system info in here.
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c125
--- Comment #125 from Elmar Stellnberger
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c126
--- Comment #126 from Nik Swiridow
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c127
--- Comment #127 from Elmar Stellnberger
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c128
--- Comment #128 from Elmar Stellnberger
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c129
--- Comment #129 from Nik Swiridow
Please test with caution!
I've posted Bandini's (another happy owner of Amilo Xi 2550) report here just
for your information. I didn't test it myself.
Please, read this topic carefully
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/620455
--- Comment #130 from Nik Swiridow
Please test with caution!
I've posted Bandini's (another happy owner of Amilo Xi 2550) report here just for your information. I didn't test it myself. Please, read this topic carefully https://bugs.launchpad.net/ubuntu/+source/linux/+bug/620455
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c131
--- Comment #131 from Andreas Nordal
If this is resolved upstream; could someone please add the respective URL at bugzilla.kernel.org.
Let's continue at https://bugzilla.kernel.org/show_bug.cgi?id=12118 which you have already done ;)
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c132
--- Comment #132 from Nik Swiridow
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c133
--- Comment #133 from Elmar Stellnberger
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c134
--- Comment #134 from Elmar Stellnberger
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c135
--- Comment #135 from Elmar Stellnberger
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c136
--- Comment #136 from Elmar Stellnberger
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c
Elmar Stellnberger
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c
Elmar Stellnberger
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c137
Elmar Stellnberger
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c
Jeff Mahoney
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c139
--- Comment #139 from Borislav Petkov
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c140
--- Comment #140 from Elmar Stellnberger
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c141
--- Comment #141 from Elmar Stellnberger
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c142
--- Comment #142 from Elmar Stellnberger
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c143
--- Comment #143 from Elmar Stellnberger
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c144
--- Comment #144 from Elmar Stellnberger
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c145
--- Comment #145 from Elmar Stellnberger
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c146
Borislav Petkov
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c147
Mike Galbraith
This looks like something for the sched people. Adding Mike.
So, a couple of observations I was able to make; I want to highly emphasize here that I'm no sched guy.
* AFAICT, your synthetic benchmark sets sleep intervals of about 1 ms which is very close to the scheduling latency I was able to observe through watching vmstat while building the kernel. It showed 800-900 context switches per second which is something a little more than 1ms. I'm not sure the stock kernel can even handle such small sleep intervals.
Well, that's kind of a trick question. A PREEMPT kernel can handle that depending on hardware/firmware and how tight your jitter tolerance is. My old Q6600 box running a PREEMPT/250 Hz kernel can easily meet a 1KHz periodic demand even under hefty load iff I don't expect too much worst case wise. On an isolated core as below, I can expect a whole lot more..
marge:~ # cgexec -g cpuset:rtcpus jitter -c 3 -p 99 -t 30 -f 1000 -d 10 -t 25
CPU3 priority: 99 timer freq: 1000 Hz (1000000 ns) tolerance: 25 usecs, stats interval: 10 secs
jitter: 3.82 min: 2.81 max: 6.63 mean: 3.00 stddev: 0.13
jitter: 4.47 min: 2.82 max: 7.29 mean: 3.00 stddev: 0.13
jitter: 2.96 min: 2.81 max: 5.77 mean: 3.00 stddev: 0.12
jitter: 5.06 min: 2.81 max: 7.87 mean: 3.00 stddev: 0.14
jitter: 3.13 min: 2.82 max: 5.95 mean: 3.00 stddev: 0.13
jitter: 28.05 min: 2.84 max: 30.88 mean: 13.91 stddev: 5.28
27 > 25 us hits min: 26.12 max: 30.88 mean: 27.49 stddev: 1.22
jitter: 32.72 min: 3.78 max: 36.51 mean: 16.40 stddev: 0.96
25 > 25 us hits min: 26.56 max: 36.51 mean: 27.55 stddev: 1.97
jitter: 32.61 min: 3.84 max: 36.45 mean: 16.41 stddev: 1.02
31 > 25 us hits min: 26.12 max: 36.45 mean: 27.54 stddev: 2.34
jitter: 22.54 min: 5.12 max: 27.66 mean: 16.41 stddev: 0.71
18 > 25 us hits min: 26.42 max: 27.66 mean: 26.89 stddev: 0.31
..but above, you can plainly see where I killed the cpuhog that was keeping CPU3 away from cstates. The firmware on this box is ok (no SMI), but the core2 CPU itself isn't particularly wonderful.
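The point at which the figures in this comment change can also be located mechanically. The Python sketch below parses the per-interval lines quoted above and reports the first interval whose mean is more than twice the mean of the first interval. The factor of two and the helper names are assumptions made for this illustration, and the observation that the "jitter" column matches max minus min is read off the numbers, not taken from documentation of the jitter tool.

```python
import re
from typing import Dict, List, Optional

STATS_LINE = re.compile(
    r"^jitter:\s*(?P<jitter>[\d.]+)\s+min:\s*(?P<min>[\d.]+)\s+"
    r"max:\s*(?P<max>[\d.]+)\s+mean:\s*(?P<mean>[\d.]+)\s+stddev:\s*(?P<stddev>[\d.]+)"
)

# Per-interval lines copied from the comment (values in microseconds).
SAMPLE_OUTPUT = """\
jitter: 3.82 min: 2.81 max: 6.63 mean: 3.00 stddev: 0.13
jitter: 4.47 min: 2.82 max: 7.29 mean: 3.00 stddev: 0.13
jitter: 2.96 min: 2.81 max: 5.77 mean: 3.00 stddev: 0.12
jitter: 5.06 min: 2.81 max: 7.87 mean: 3.00 stddev: 0.14
jitter: 3.13 min: 2.82 max: 5.95 mean: 3.00 stddev: 0.13
jitter: 28.05 min: 2.84 max: 30.88 mean: 13.91 stddev: 5.28
27 > 25 us hits min: 26.12 max: 30.88 mean: 27.49 stddev: 1.22
jitter: 32.72 min: 3.78 max: 36.51 mean: 16.40 stddev: 0.96
25 > 25 us hits min: 26.56 max: 36.51 mean: 27.55 stddev: 1.97
jitter: 32.61 min: 3.84 max: 36.45 mean: 16.41 stddev: 1.02
31 > 25 us hits min: 26.12 max: 36.45 mean: 27.54 stddev: 2.34
jitter: 22.54 min: 5.12 max: 27.66 mean: 16.41 stddev: 0.71
18 > 25 us hits min: 26.42 max: 27.66 mean: 26.89 stddev: 0.31
"""


def parse_intervals(text: str) -> List[Dict[str, float]]:
    """Parse the per-interval "jitter:" lines; the "hits" lines are skipped."""
    rows = []
    for line in text.splitlines():
        match = STATS_LINE.match(line.strip())
        if match:
            rows.append({key: float(value) for key, value in match.groupdict().items()})
    return rows


def first_degraded(rows: List[Dict[str, float]], factor: float = 2.0) -> Optional[int]:
    """Index of the first interval whose mean exceeds factor times the first mean."""
    if not rows:
        return None
    baseline = rows[0]["mean"]
    for index, row in enumerate(rows):
        if row["mean"] > factor * baseline:
            return index
    return None


if __name__ == "__main__":
    intervals = parse_intervals(SAMPLE_OUTPUT)
    print(len(intervals), "intervals, degradation starts at index", first_degraded(intervals))
```

On the quoted data the mean stays at 3.00 us for the first five intervals and jumps to 13.91 us in the sixth, which is the interval where the "hits" lines begin to appear.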
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c148
--- Comment #148 from Borislav Petkov
Well, that's kind of a trick question. A PREEMPT kernel can handle that depending on hardware/firmware and how tight your jitter tolerance is. My old Q6600 box running a PREEMPT/250 Hz kernel can easily meet a 1KHz periodic demand even under hefty load iff I don't expect too much worst case wise.
Hehe, I think this is exactly the problem: reportedly, current kernel leads to "degraded music playback and gaming quality". And yep, it must very well be hardware/firmware-dependent because audio runs just fine on my box here, even under heavy load (I don't play games :-)).
On an isolated core as below, I can expect a whole lot more..
Hmm, I was wondering whether NO_HZ_FULL would help even on an SMI-plagued hw? That would be an interesting thing to try.
marge:~ # cgexec -g cpuset:rtcpus jitter -c 3 -p 99 -t 30 -f 1000 -d 10 -t 25
CPU3 priority: 99 timer freq: 1000 Hz (1000000 ns) tolerance: 25 usecs, stats interval: 10 secs
jitter: 3.82 min: 2.81 max: 6.63 mean: 3.00 stddev: 0.13
jitter: 4.47 min: 2.82 max: 7.29 mean: 3.00 stddev: 0.13
jitter: 2.96 min: 2.81 max: 5.77 mean: 3.00 stddev: 0.12
jitter: 5.06 min: 2.81 max: 7.87 mean: 3.00 stddev: 0.14
jitter: 3.13 min: 2.82 max: 5.95 mean: 3.00 stddev: 0.13
jitter: 28.05 min: 2.84 max: 30.88 mean: 13.91 stddev: 5.28
27 > 25 us hits min: 26.12 max: 30.88 mean: 27.49 stddev: 1.22
jitter: 32.72 min: 3.78 max: 36.51 mean: 16.40 stddev: 0.96
25 > 25 us hits min: 26.56 max: 36.51 mean: 27.55 stddev: 1.97
jitter: 32.61 min: 3.84 max: 36.45 mean: 16.41 stddev: 1.02
31 > 25 us hits min: 26.12 max: 36.45 mean: 27.54 stddev: 2.34
jitter: 22.54 min: 5.12 max: 27.66 mean: 16.41 stddev: 0.71
18 > 25 us hits min: 26.42 max: 27.66 mean: 26.89 stddev: 0.31
..but above, you can plainly see where I killed the cpuhog that was keeping CPU3 away from cstates. The firmware on this box is ok (no SMI),
How am I to read this? As in "jitter grows when the core is allowed to go in deeper C-states?"
but the core2 CPU itself isn't particularly wonderful.
Why? I don't think the CPU itself has any fault in this - it is the glue around it which makes it misbehave, like C-states and such. If you keep it powered on throughout with idle=poll, for example, C-states influence should be gone. I even heard RT-folk do that in certain cases. Thanks.
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c149
--- Comment #149 from Mike Galbraith
(In reply to comment #147)
Well, that's kind of a trick question. A PREEMPT kernel can handle that depending on hardware/firmware and how tight your jitter tolerance is. My old Q6600 box running a PREEMPT/250 Hz kernel can easily meet a 1KHz periodic demand even under hefty load iff I don't expect too much worst case wise.
Hehe, I think this is exactly the problem: reportedly, current kernel leads to "degraded music playback and gaming quality".
And yep, it must very well be hardware/firmware-dependent because audio runs just fine on my box here, even under heavy load (I don't play games :-)).
Here too. If it's _not_ one of those crippled up hpet inflicted boxen (can't help at all there), I'd fire up tracer with wakeup_rt tracer and see what's going on if an rt task doesn't hit the CPU in short order. A full ms wakeup latency is definitely not considered to be in short order for an rt task ;-) You can forget that 100us constraint though, it's not really achievable under load with a non-rt kernel, and certainly not when you add tracing overhead on top. If THP is on, turn that off (I traced >600us held in kernel by that in NOPREEMPT kernel iirc), and run some normal load where sound doesn't play well.
On an isolated core as below, I can expect a whole lot more..
Hmm, I was wondering whether NO_HZ_FULL would help even on an SMI-plagued hw? That would be an interesting thing to try.
There's nothing at all you can do for SMIs afaik, unless your BIOS will let you turn the darn things off. My core2 box won't even enter tickless mode. Per Frederic, the unstable tsc precludes that. Booting tsc=reliable didn't help though...
jitter: 22.54 min: 5.12 max: 27.66 mean: 16.41 stddev: 0.71
18 > 25 us hits min: 26.42 max: 27.66 mean: 26.89 stddev: 0.31
..but above, you can plainly see where I killed the cpuhog that was keeping CPU3 away from cstates. The firmware on this box is ok (no SMI),
How am I to read this? As in "jitter grows when the core is allowed to go in deeper C-states?"
Yeah.
but the core2 CPU itself isn't particularly wonderful.
Why? I don't think the CPU itself has any fault in this - it is the glue around it which makes it misbehave, like C-states and such. If you keep it powered on throughout with idle=poll, for example, C-states influence should be gone. I even heard RT-folk do that in certain cases.
Ok, the combo of old core2 cpu and its associated old glue isn't wonderful for rt. I have no idea what specific silicon gnomes in specific cubicles do from 9 to 5 ;-) An Intel guy told me core2 had latency issues, maybe he was talking about glue, dunno. Funny thing with my core2 box is just keeping it away from cstates isn't enough, it has to be kept churning and burning to eliminate the idle jitter. Hohum, kind of off topic, but mentioned wrt that 100us thing, it all contributes, and as you can see from my box, a substantial chunk of that unofficial, but frequently mentioned and pulled out of thin air, target latency number is pre-consumed. If I don't set cpufreq to performance, there goes a bunch more. Add nohz, etc etc...
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c150
--- Comment #150 from Borislav Petkov
If it's _not_ one of those crippled up hpet inflicted boxen (can't help at all there),
I don't think anyone can, especially if the HPET is updated/managed in SMM. And by the looks of it, the box is using HPET:
[ 3.761653] Monitor-Mwait will be used to enter C-1 state
[ 3.761675] Monitor-Mwait will be used to enter C-3 state
[ 3.761681] Marking TSC unstable due to TSC halts in idle
[ 3.761706] Switching to clocksource hpet
it switches away from tsc since it gets turned off in some C-state != C0.
Elmar, can you upload a recent dmesg from booting on this box and can you also try booting with "hpet=disable" on the kernel command line and see whether you can repro then? Thanks.
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c151
--- Comment #151 from Borislav Petkov
If it's _not_ one of those crippled up hpet inflicted boxen (can't help at all there),
I don't think anyone can, especially if the HPET is updated/managed in SMM. And by the looks of it, the box is using HPET:
[ 3.761653] Monitor-Mwait will be used to enter C-1 state
[ 3.761675] Monitor-Mwait will be used to enter C-3 state
[ 3.761681] Marking TSC unstable due to TSC halts in idle
[ 3.761706] Switching to clocksource hpet
it switches away from tsc since it gets turned off in some C-state != C0.
Elmar, can you upload a recent dmesg from booting on this box (both SUSE and 3.9.2 kernel)? Also, can you try booting with "hpet=disable" on the kernel command line and see whether you can repro then? Thanks.
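When comparing boot logs from several kernels, the two facts read off the log above (TSC marked unstable, clocksource switched to hpet) can be extracted with a short Python sketch. It matches only the message wording quoted in this comment; other kernel versions may word these messages differently, so the patterns are an assumption, and the function name is hypothetical.

```python
import re
from typing import Dict, Optional, Union

SWITCH = re.compile(r"Switching to clocksource (\w+)")
TSC_UNSTABLE = re.compile(r"Marking TSC unstable")

# Log lines copied from the comment.
SAMPLE_DMESG = """\
[ 3.761653] Monitor-Mwait will be used to enter C-1 state
[ 3.761675] Monitor-Mwait will be used to enter C-3 state
[ 3.761681] Marking TSC unstable due to TSC halts in idle
[ 3.761706] Switching to clocksource hpet
"""


def clocksource_summary(dmesg_text: str) -> Dict[str, Union[Optional[str], bool]]:
    """Report the last clocksource switched to and whether TSC was marked unstable."""
    switches = SWITCH.findall(dmesg_text)
    return {
        "final_clocksource": switches[-1] if switches else None,
        "tsc_marked_unstable": bool(TSC_UNSTABLE.search(dmesg_text)),
    }


if __name__ == "__main__":
    print(clocksource_summary(SAMPLE_DMESG))
```

On a running system the clocksource currently in use can usually also be read from /sys/devices/system/clocksource/clocksource0/current_clocksource, which avoids depending on the log wording.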
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c152
--- Comment #152 from Borislav Petkov
If it's _not_ one of those crippled up hpet inflicted boxen (can't help at all there), I'd fire up tracer with wakeup_rt tracer and see what's going on if an rt task doesn't hit the CPU in short order. A full ms wakeup latency is definitely not considered to be in short order for an rt task ;-)
Ok.
You can forget that 100us constraint though, it's not really achievable under load with a non-rt kernel, and certainly not when you add tracing overhead on top. If THP is on, turn that off (I traced >600us held in kernel by that in NOPREEMPT kernel iirc), and run some normal load where sound doesn't play well.
Well, I'd still like to stay within the reasonable here since it is the stock kernel and we're talking about audio (I'm not saying RT folk is unreasonable :-)). If it can stomach higher latencies without becoming noticeable, then we're fine.
There's nothing at all you can do for SMIs afaik, unless your BIOS will let you turn the darn things off. My core2 box won't even enter tickless mode. Per Frederic, the unstable tsc precludes that. Booting tsc=reliable didn't help though...
That's impossible since the TSC halts in a C-state and if HPET is using SMM (or SMM is happening anyway because some idiotic vendors decided to do power management in it; oh, and with UEFI we'll be entering the BIOS even more, <pukes>) then you're right, no way we can avert SMM.
Ok, the combo of old core2 cpu and its associated old glue isn't wonderful for rt. I have no idea what specific silicon gnomes in specific cubicles do from 9 to 5 ;-)
I think we're better off if we never know. (I'm trying hard to forget some stuff I saw :-)).
An Intel guy told me core2 had latency issues, maybe he was talking about glue, dunno.
Yeah, probably.
Funny thing with my core2 box is just keeping it away from cstates isn't enough, it has to be kept churning and burning to eliminate the idle jitter. Hohum, kind of off topic, but mentioned wrt that 100us thing, it all contributes, and as you can see from my box, a substantial chunk of that unofficial, but frequently mentioned and pulled out of thin air, target latency number is pre-consumed. If I don't set cpufreq to performance, there goes a bunch more. Add nohz, etc etc...
Hmm, ok, I think I can see what the latency issues could be: C-state entry and exit need to do a bunch of stuff behind the scenes, architecture-wise, and there you can have your possible delays. That's why you want to keep your cores in C0 - no idle entry at all. Thanks.
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c153
--- Comment #153 from Mike Galbraith
(In reply to comment #149)
If it's _not_ one of those crippled up hpet inflicted boxen (can't help at all there), I'd fire up ftrace with the wakeup_rt tracer and see what's going on if an rt task doesn't hit the CPU in short order. A full ms wakeup latency is definitely not considered to be in short order for an rt task ;-)
Ok.
You can forget that 100us constraint though, it's not really achievable under load with a non-rt kernel, and certainly not when you add tracing overhead on top. If THP is on, turn that off (I traced 600us held in kernel by that in NOPREEMPT kernel iirc), and run some normal load where sound doesn't play well.
Well, I'd still like to stay within the reasonable here since it is the stock kernel and we're talking about audio (I'm not saying RT folk is unreasonable :-)). If it can stomach higher latencies without becoming noticeable, then we're fine.
Audio playback is generally so heavily buffered that it's pretty boring, can and must tolerate a LOT of scheduling latency because it's done by plain old SCHED_OTHER tasks that have no idea or control over when they'll be preempted, or for how long.
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c154
--- Comment #154 from Borislav Petkov
Well, I'd still like to stay within the reasonable here since it is the stock kernel and we're talking about audio (I'm not saying RT folk is unreasonable :-)). If it can stomach higher latencies without becoming noticeable, then we're fine.
Audio playback is generally so heavily buffered that it's pretty boring, can and must tolerate a LOT of scheduling latency because it's done by plain old SCHED_OTHER tasks that have no idea or control over when they'll be preempted, or for how long.
Yep, and reportedly it still causes degradation in audio performance. It'd be nice to have a good reproducer for this observation though ...
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c155
--- Comment #155 from Mike Galbraith
(In reply to comment #153)
Well, I'd still like to stay within the reasonable here since it is the stock kernel and we're talking about audio (I'm not saying RT folk is unreasonable :-)). If it can stomach higher latencies without becoming noticeable, then we're fine.
Audio playback is generally so heavily buffered that it's pretty boring, can and must tolerate a LOT of scheduling latency because it's done by plain old SCHED_OTHER tasks that have no idea or control over when they'll be preempted, or for how long.
Yep, and reportedly it still causes degradation in audio performance. It'd be nice to have a good reproducer for this observation though ...
My little Toshiba Satellite lappy is blessed with a tsc that stops, so it uses the hpet clocksource (unless I boot processor.max_cstate=1). Audio playback is peachy, as is playing a DVD, surfing etc, both with and without a competing load, i.e. with and without entering C-[23]. read_hpet() is pretty horrible kernel overhead wise, but everything works just fine with openSUSE-12.3 kernel or mainline. Hohum.. my hpet is not the right flavor of trash.
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c156
--- Comment #156 from Mike Galbraith
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c157
--- Comment #157 from Borislav Petkov
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c158
--- Comment #158 from Elmar Stellnberger
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c159
--- Comment #159 from Elmar Stellnberger
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c160
--- Comment #160 from Borislav Petkov
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c161
--- Comment #161 from Andreas Nordal
Can anyone describe exactly what the issue is and how I can reproduce it
This is about system-wide hangs (no task making progress) that:

* resume immediately on user input (from keyboard, mouse, laptop lid, plugging in AC power)
* easily last for hours on some kernels if your patience permits, down to speculatively small (centiseconds) on others. On one occasion, X had the effect of limiting my hangs to just under 1 second.
* occur under light load: an effective remedy is to open a terminal and issue `yes`. A 100% effective workaround is "processor.max_cstate=1".
* occur more frequently when playing audio, running rt-benchmark, and running wget.

How to reproduce:

1. Watch glxgears (or something that updates the screen) while playing audio.
2. Observe that glxgears hangs whenever your audio player hangs, and that sound is repeated for the duration of this hang.
3. If you're lucky, you may test your patience by letting it hang, and when you feel like it, move your mouse 1mm, and observe that all tasks resume. If you're less lucky and only see/hear small hangs, you may still observe that keeping your mouse in motion prevents hanging.

The role of rt-benchmark is to measure the duration of hangs. My affected laptop has a broken screen, so I'm no longer able to test.
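If rt-benchmark is not at hand, the duration of hangs can be roughly estimated from user space by sleeping for a short interval and recording how late each wakeup is. This is not rt-benchmark, only a minimal sketch of the same idea; the function names and default values are assumptions chosen for illustration.

```python
import time


def measure_stalls(interval_s=0.001, iterations=1000):
    """Sleep repeatedly and return how late each wakeup was, in seconds."""
    overshoots = []
    for _ in range(iterations):
        start = time.monotonic()
        time.sleep(interval_s)
        # Anything beyond the requested interval is wakeup delay.
        overshoots.append(time.monotonic() - start - interval_s)
    return overshoots


def worst_stall_ms(overshoots):
    """Return the largest observed wakeup delay in milliseconds."""
    return max(overshoots) * 1000.0
```

On a healthy box the worst value from `worst_stall_ms(measure_stalls())` should stay small; on an affected one, a hang that lasts until the mouse is moved should show up as one very large overshoot.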
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c162
--- Comment #162 from Mike Galbraith
How to reproduce: 1. Watch glxgears (or something that updates the screen) while playing audio. 2. Observe that glxgears hangs whenever your audio player hangs, and that sound is repeated for the duration of this hang.
I'm doing that as I write. glxgears is spinning away smoothly, music doesn't sound wonderful coming out of laptop speakers, but it's skip free no matter what I do, unless I run absurd loads of course. My little Toshiba Satellite has the right BIOS and the right flavor of tsc to reproduce, but it is lacking the key ingredient, broken hpet hardware.
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c163
--- Comment #163 from Borislav Petkov
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c164
--- Comment #164 from Mike Galbraith
Btw, Mike, you could try to boot with hpet=disable so that you can fallback to the acpi_pm timer and see whether you can see delays
Sure. Works fine for me.
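After booting with hpet=disable, it is worth confirming which clocksource the kernel actually picked. A minimal sketch that reads the standard sysfs files follows; the function name and the `base` parameter are made up for illustration.

```python
import os

CLOCKSOURCE_DIR = "/sys/devices/system/clocksource/clocksource0"


def read_clocksources(base=CLOCKSOURCE_DIR):
    """Return (current clocksource, list of available clocksources)."""
    with open(os.path.join(base, "current_clocksource")) as f:
        current = f.read().strip()
    with open(os.path.join(base, "available_clocksource")) as f:
        available = f.read().split()
    return current, available
```

If the fallback worked as described, `read_clocksources()` would be expected to report acpi_pm as the current clocksource, with hpet missing from the available list.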
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c165
Borislav Petkov
https://bugzilla.novell.com/show_bug.cgi?id=579932
https://bugzilla.novell.com/show_bug.cgi?id=579932#c166
--- Comment #166 from Nik Swiridow