[Bug 685777] New: kernel wedging under load ...
https://bugzilla.novell.com/show_bug.cgi?id=685777
https://bugzilla.novell.com/show_bug.cgi?id=685777#c0

           Summary: kernel wedging under load ...
    Classification: openSUSE
           Product: openSUSE 11.4
           Version: Final
          Platform: Other
        OS/Version: Other
            Status: NEW
          Severity: Critical
          Priority: P5 - None
         Component: Kernel
        AssignedTo: kernel-maintainers@forge.provo.novell.com
        ReportedBy: mmeeks@novell.com
         QAContact: qa@suse.de
          Found By: ---
           Blocker: ---

I can no longer compile LibreOffice on 11.4. Under load (the same load that openSUSE 11.3 survived interactively) it wedges the machine tight. The disk appears to stop responding, and while that resets I get no interactivity (of course). I don't use swap (since I have an SSD and 3GB of RAM).

My /var/log/messages is plagued with:

Apr 7 10:07:52 lenovo-w500 kernel: [ 104.537048] hub 1-0:1.0: connect-debounce failed, port 3 disabled

That is probably unrelated, but I also get a flurry of oom-killer activity in the logs:

Apr 7 10:01:45 lenovo-w500 kernel: [51723.303571] Pid: 4462, comm: cc1plus Not tainted 2.6.37.1-1.2-desktop #1
Apr 7 10:01:45 lenovo-w500 kernel: [51723.303572] Call Trace:
Apr 7 10:01:45 lenovo-w500 kernel: [51723.303594] [<c02062a3>] try_stack_unwind+0x173/0x190
Apr 7 10:01:45 lenovo-w500 kernel: [51723.303599] [<c0204ebf>] dump_trace+0x3f/0xe0
Apr 7 10:01:45 lenovo-w500 kernel: [51723.303603] [<c020630b>] show_trace_log_lvl+0x4b/0x60
Apr 7 10:01:45 lenovo-w500 kernel: [51723.303606] [<c0206338>] show_trace+0x18/0x20
Apr 7 10:01:45 lenovo-w500 kernel: [51723.303611] [<c068d44a>] dump_stack+0x6d/0x72
Apr 7 10:01:45 lenovo-w500 kernel: [51723.303616] [<c02da1e4>] dump_header+0x84/0x1e0
Apr 7 10:01:45 lenovo-w500 kernel: [51723.303620] [<c02da7b0>] oom_kill_process+0x90/0x190
Apr 7 10:01:45 lenovo-w500 kernel: [51723.303624] [<c02dab77>] out_of_memory+0xd7/0x200
Apr 7 10:01:45 lenovo-w500 kernel: [51723.303628] [<c02defc8>] __alloc_pages_nodemask+0x678/0x690
Apr 7 10:01:45 lenovo-w500 kernel: [51723.303642] [<c030e417>] alloc_pages_current+0x77/0xd0
Apr 7 10:01:45 lenovo-w500 kernel: [51723.303646] [<c02e14e1>] __do_page_cache_readahead+0xf1/0x220
Apr 7 10:01:45 lenovo-w500 kernel: [51723.303650] [<c02e18ee>] ra_submit+0x1e/0x30
Apr 7 10:01:45 lenovo-w500 kernel: [51723.303653] [<c02d9747>] filemap_fault+0x347/0x410
Apr 7 10:01:45 lenovo-w500 kernel: [51723.303660] [<c02f5212>] __do_fault+0x52/0x510
Apr 7 10:01:45 lenovo-w500 kernel: [51723.303664] [<c02f93f9>] handle_mm_fault+0x169/0x410
Apr 7 10:01:45 lenovo-w500 kernel: [51723.303668] [<c0692ff0>] do_page_fault+0x170/0x4b0
Apr 7 10:01:45 lenovo-w500 kernel: [51723.303672] [<c06909c6>] error_code+0x5a/0x60
Apr 7 10:01:45 lenovo-w500 kernel: [51723.303690] [<081aad20>] 0x81aad20

Other interesting things I've not seen recently:

Apr 7 10:02:05 lenovo-w500 rtkit-daemon[1735]: The canary thread is apparently starving. Taking action.
Apr 7 10:02:06 lenovo-w500 rtkit-daemon[1735]: Demoting known real-time threads.
Apr 7 10:02:07 lenovo-w500 rtkit-daemon[1735]: Demoted 0 threads.

My kernel is:

Linux lenovo-w500 2.6.37.1-1.2-desktop #1 SMP PREEMPT 2011-02-21 10:34:10 +0100 i686 i686 i386 GNU/Linux

rpm -q --changelog kernel-desktop | head
* Mon Feb 21 2011 tiwai@suse.de
- ALSA: caiaq - Fix possible string-buffer overflow (bnc#672499,
  CVE-2011-0712).
- commit f6a72cc

hwinfo attached; the machine is a modern Lenovo W500 laptop.

-- 
Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=685777
https://bugzilla.novell.com/show_bug.cgi?id=685777#c1
--- Comment #1 from Michael Meeks
https://bugzilla.novell.com/show_bug.cgi?id=685777
https://bugzilla.novell.com/show_bug.cgi?id=685777#c2
--- Comment #2 from Michael Meeks
https://bugzilla.novell.com/show_bug.cgi?id=685777
https://bugzilla.novell.com/show_bug.cgi?id=685777#c3
Michael Meeks
https://bugzilla.novell.com/show_bug.cgi?id=685777
https://bugzilla.novell.com/show_bug.cgi?id=685777#c
Jeff Mahoney
https://bugzilla.novell.com/show_bug.cgi?id=685777
https://bugzilla.novell.com/show_bug.cgi?id=685777#c4
--- Comment #4 from Mel Gorman
https://bugzilla.novell.com/show_bug.cgi?id=685777
https://bugzilla.novell.com/show_bug.cgi?id=685777#c5
--- Comment #5 from Michael Meeks
https://bugzilla.novell.com/show_bug.cgi?id=685777
https://bugzilla.novell.com/show_bug.cgi?id=685777#c6
--- Comment #6 from Mel Gorman
https://bugzilla.novell.com/show_bug.cgi?id=685777
https://bugzilla.novell.com/show_bug.cgi?id=685777#c7
--- Comment #7 from Michael Meeks
https://bugzilla.novell.com/show_bug.cgi?id=685777
https://bugzilla.novell.com/show_bug.cgi?id=685777#c8
--- Comment #8 from Mel Gorman
> Mel! Thanks so much for looking into that. Meanwhile I had a similar problem just from overloading the machine by other means, I think.

> I wonder: it is entirely possible that my problems are related to the Intel SSD I have. Is it possible that it exacerbates the paging issues?

What is the reproduction scenario? The speed of the SSD could be masking the fact that the machine is almost out of memory, but not short enough to trigger the OOM killer. If you have no swap configured and anonymous memory is occupying a high percentage of memory (e.g. 85%), then a significant percentage of time will be spent paging to and from the SSD. On a slower disk, the machine would become extremely unresponsive.

This thrashing would be visible as a high page in/out rate in "vmstat -n 1". The percentage of memory that is anonymous can be determined from the nr_active_anon and nr_inactive_anon fields in /proc/vmstat. Can you tell me if this is the case? If so, it's not a bug in the kernel unless a memory leak is causing the lack of memory. Just attaching the contents of /proc/vmstat when the machine is running very slowly would be helpful in determining what is going on here.

> Is it possible you have a new kernel I can try to see if the issue is fixed?

I didn't build a new kernel with the backport yet. However, the patches will only help if there is sufficient free memory and kswapd is nevertheless using a lot of CPU.
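
The anonymous-memory check described in this comment can be sketched in a few lines of Python. This is only an illustration, not something taken from the bug: the helper names (parse_vmstat, anon_percent, current_anon_percent) are made up here, and it assumes that nr_active_anon and nr_inactive_anon are counted in pages and that total memory is taken from the MemTotal line of /proc/meminfo (reported in kB).

```python
import os


def parse_vmstat(text):
    """Parse the 'name value' lines of /proc/vmstat into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def read_mem_total_kb(meminfo_text):
    """Return the MemTotal value (in kB) from the text of /proc/meminfo."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1])
    raise ValueError("MemTotal not found")


def anon_percent(counters, mem_total_kb, page_size=4096):
    """Percentage of total memory held by anonymous pages.

    Assumes the nr_*_anon counters are in pages of page_size bytes.
    """
    anon_pages = counters["nr_active_anon"] + counters["nr_inactive_anon"]
    anon_kb = anon_pages * page_size / 1024
    return 100.0 * anon_kb / mem_total_kb


def current_anon_percent():
    """Hypothetical convenience wrapper: read the live /proc files (Linux only)."""
    with open("/proc/vmstat") as f:
        counters = parse_vmstat(f.read())
    with open("/proc/meminfo") as f:
        mem_total_kb = read_mem_total_kb(f.read())
    return anon_percent(counters, mem_total_kb, os.sysconf("SC_PAGE_SIZE"))
```

Calling current_anon_percent() while the machine is crawling gives a single number to compare against the "e.g. 85%" figure above; attaching the raw /proc/vmstat contents, as requested, remains more useful for diagnosis.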
https://bugzilla.novell.com/show_bug.cgi?id=685777
https://bugzilla.novell.com/show_bug.cgi?id=685777#c9
Mel Gorman
https://bugzilla.novell.com/show_bug.cgi?id=685777
https://bugzilla.novell.com/show_bug.cgi?id=685777#c10
--- Comment #10 from Michael Meeks
https://bugzilla.novell.com/show_bug.cgi?id=685777
https://bugzilla.novell.com/show_bug.cgi?id=685777#c11
--- Comment #11 from Michael Meeks
https://bugzilla.novell.com/show_bug.cgi?id=685777
https://bugzilla.novell.com/show_bug.cgi?id=685777#c12
--- Comment #12 from Mel Gorman
https://bugzilla.novell.com/show_bug.cgi?id=685777
https://bugzilla.novell.com/show_bug.cgi?id=685777#c13
--- Comment #13 from Michael Meeks