[Bug 791094] New: khugepaged crash time to time
https://bugzilla.novell.com/show_bug.cgi?id=791094
https://bugzilla.novell.com/show_bug.cgi?id=791094#c0

Summary: khugepaged crash time to time
Classification: openSUSE
Product: openSUSE 12.1
Version: Final
Platform: i686
OS/Version: openSUSE 12.1
Status: NEW
Severity: Critical
Priority: P5 - None
Component: Basesystem
AssignedTo: bnc-team-screening@forge.provo.novell.com
ReportedBy: admin@dmbsoft.de
QAContact: qa-bugs@suse.de
Found By: ---
Blocker: ---

Created an attachment (id=514367) --> (http://bugzilla.novell.com/attachment.cgi?id=514367)
Screendump

User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:17.0) Gecko/17.0 Firefox/17.0

The system (Acer Aspire M3201) crashes from time to time (now 3-4 times per week). There are no log entries; only this screenshot is available.

Best regards
Dirk

Reproducible: Couldn't Reproduce

--
Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=791094
https://bugzilla.novell.com/show_bug.cgi?id=791094#c
Jiaying ren
https://bugzilla.novell.com/show_bug.cgi?id=791094
https://bugzilla.novell.com/show_bug.cgi?id=791094#c1
Michal Hocko
https://bugzilla.novell.com/show_bug.cgi?id=791094
https://bugzilla.novell.com/show_bug.cgi?id=791094#c2
--- Comment #2 from Dirk-Michael Brosig
https://bugzilla.novell.com/show_bug.cgi?id=791094
https://bugzilla.novell.com/show_bug.cgi?id=791094#c3
--- Comment #3 from Michal Hocko
> The module was loaded but not in use. For this reason the latest driver is not installed.
>
> root@dmbsrv-12:45-~# lsmod | grep nvidia
> nvidia 10988100 0
> root@dmbsrv-12:46-~# rmmod nvidia
>
> But I have added "blacklist nvidia" to /etc/modprobe.d/99-local.conf.

OK, let's see if the problem triggers with nvidia out of the game. It is quite possible that it will, since you say it wasn't in use, but better to be sure about that.

> Would it be useful to run
> echo never > /sys/kernel/mm/transparent_hugepage/defrag
> to avoid this problem?

This would effectively put the whole of THP out of the game, including the crashing kernel thread. So it would be preferable to find out what is going on.

Btw. could you install the kdump package and enable it to get a crash dump when this happens? It would be much easier to debug the issue then. My suspicion is that we are just missing a bug fix. I will try to search for anything interesting in the meantime.
https://bugzilla.novell.com/show_bug.cgi?id=791094
https://bugzilla.novell.com/show_bug.cgi?id=791094#c4
--- Comment #4 from Michal Hocko
> > Would it be useful to run
> > echo never > /sys/kernel/mm/transparent_hugepage/defrag
> > to avoid this problem?
>
> This would effectively put the whole of THP out of the game, including the crashing kernel thread. So it would be preferable to find out what is going on.

Bahh, /me should learn to read - I thought it was /sys/kernel/mm/transparent_hugepage/enabled. Anyway, disabling the kernel thread would obviously work around the problem.
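For reference, the two knobs mixed up above sit side by side in sysfs. The following is only a minimal sketch for checking which value is currently selected, assuming the usual format in which the kernel puts square brackets around the active choice (for example "always madvise [never]"). The helper name thp_active_value is made up for illustration.

```shell
#!/bin/sh
# Print the currently selected value of a THP sysfs knob.
# Assumption: the kernel marks the active choice with square brackets,
# e.g. "always madvise [never]".
thp_active_value() {
    # $1: path to a file such as /sys/kernel/mm/transparent_hugepage/enabled
    sed -n 's/.*\[\([^]]*\)\].*/\1/p' "$1"
}

THP_DIR=/sys/kernel/mm/transparent_hugepage
for knob in enabled defrag; do
    # Skip quietly on kernels built without THP support.
    if [ -r "$THP_DIR/$knob" ]; then
        echo "$knob: $(thp_active_value "$THP_DIR/$knob")"
    fi
done
```

Roughly speaking, "enabled" decides whether THP is used at all, while "defrag" only decides whether the kernel tries to defragment memory when it allocates a huge page, which is why the two settings have quite different effects on this bug.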
https://bugzilla.novell.com/show_bug.cgi?id=791094
https://bugzilla.novell.com/show_bug.cgi?id=791094#c5
--- Comment #5 from Michal Hocko
https://bugzilla.novell.com/show_bug.cgi?id=791094
https://bugzilla.novell.com/show_bug.cgi?id=791094#c6
--- Comment #6 from Michal Hocko
https://bugzilla.novell.com/show_bug.cgi?id=791094
https://bugzilla.novell.com/show_bug.cgi?id=791094#c7
--- Comment #7 from Michal Hocko
> Hmm, my compiler disagrees with your Code: .... part but it is just a different version. Could you please send me the disassembly around the faulting place?
>
> Just in case you do not know how to do this, you have 2 options. The easiest way is to decompress /boot/vmlinux-VERSION.gz and run gdb vmlinux-VERSION (then disassemble isolate_migratepages and scroll to offset +522).

And it seems that the other half of the comment got lost somewhere. So just for the record: the other option is to use objdump -d vmlinux-VERSION | less and search for the function.

Anyway, I do not think this is necessary for now. The patch will follow in the comment. Let me know if I should build a testing kernel for you.
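The gdb route described above can be scripted roughly as follows. This is a sketch, not a tested recipe for this particular kernel: the function name disasm_kernel_function and the /tmp path are hypothetical, and VERSION stays the placeholder used in the comment.

```shell
#!/bin/sh
# Disassemble one function from an uncompressed kernel image with gdb.
# $1: path to the uncompressed vmlinux
# $2: function name (defaults to isolate_migratepages, the one named above)
disasm_kernel_function() {
    image=$1
    func=${2:-isolate_migratepages}
    if [ ! -r "$image" ]; then
        echo "cannot read kernel image: $image" >&2
        return 1
    fi
    # -batch makes gdb run the -ex command and exit instead of going interactive.
    gdb -batch -ex "disassemble $func" "$image"
}

# Usage (VERSION is a placeholder, /tmp is an arbitrary scratch location):
#   gunzip -c /boot/vmlinux-VERSION.gz > /tmp/vmlinux-VERSION
#   disasm_kernel_function /tmp/vmlinux-VERSION | less
# The objdump alternative:
#   objdump -d /tmp/vmlinux-VERSION | less
```

gdb prints each instruction with its offset from the start of the function, so the faulting place can be found by searching the output for <+522>.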
https://bugzilla.novell.com/show_bug.cgi?id=791094
https://bugzilla.novell.com/show_bug.cgi?id=791094#c8
--- Comment #8 from Michal Hocko
https://bugzilla.novell.com/show_bug.cgi?id=791094
https://bugzilla.novell.com/show_bug.cgi?id=791094#c9
Michal Hocko
https://bugzilla.novell.com/show_bug.cgi?id=791094
https://bugzilla.novell.com/show_bug.cgi?id=791094#c10
--- Comment #10 from Dirk-Michael Brosig
https://bugzilla.novell.com/show_bug.cgi?id=791094
https://bugzilla.novell.com/show_bug.cgi?id=791094#c11
--- Comment #11 from Michal Hocko
https://bugzilla.novell.com/show_bug.cgi?id=791094
https://bugzilla.novell.com/show_bug.cgi?id=791094#c12
--- Comment #12 from Mel Gorman
> Mel, I guess the above patch should fix this but it would be great if you could have a look if you have some spare time.

I think the patch is a very strong candidate for fixing this particular bug. The reported bug and what the patch originally fixed feel very similar.
https://bugzilla.novell.com/show_bug.cgi?id=791094
https://bugzilla.novell.com/show_bug.cgi?id=791094#c13
--- Comment #13 from Dirk-Michael Brosig
https://bugzilla.novell.com/show_bug.cgi?id=791094
https://bugzilla.novell.com/show_bug.cgi?id=791094#c14
--- Comment #14 from Michal Hocko
https://bugzilla.novell.com/show_bug.cgi?id=791094
https://bugzilla.novell.com/show_bug.cgi?id=791094#c15
--- Comment #15 from Dirk-Michael Brosig
https://bugzilla.novell.com/show_bug.cgi?id=791094
https://bugzilla.novell.com/show_bug.cgi?id=791094#c16
Michal Hocko
https://bugzilla.novell.com/show_bug.cgi?id=791094
https://bugzilla.novell.com/show_bug.cgi?id=791094#c17
--- Comment #17 from Dirk-Michael Brosig
https://bugzilla.novell.com/show_bug.cgi?id=791094
https://bugzilla.novell.com/show_bug.cgi?id=791094#c18
--- Comment #18 from Dirk-Michael Brosig
https://bugzilla.novell.com/show_bug.cgi?id=791094
https://bugzilla.novell.com/show_bug.cgi?id=791094#c19
--- Comment #19 from Dirk-Michael Brosig
https://bugzilla.novell.com/show_bug.cgi?id=791094
https://bugzilla.novell.com/show_bug.cgi?id=791094#c20
--- Comment #20 from Michal Hocko
> All running fine. Is there anything to configure for kdump?

There is a GUI in YaST to enable it. I guess this should be sufficient, but you can double-check that crashkernel=FOO was added to your kernel boot command line in the grub config.
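The double check can also be done on the running system rather than in the grub config. A minimal sketch, assuming a Linux system with /proc mounted; the helper name has_crashkernel is made up for illustration.

```shell
#!/bin/sh
# Return success if a kernel command line string contains a crashkernel= parameter.
has_crashkernel() {
    # Pad with spaces so the parameter also matches at the start of the string.
    case " $1 " in
        *" crashkernel="*) return 0 ;;
        *) return 1 ;;
    esac
}

cmdline=$(cat /proc/cmdline 2>/dev/null)
if has_crashkernel "$cmdline"; then
    echo "crashkernel= is present on the running kernel's command line"
else
    echo "crashkernel= is missing, so no memory is reserved for the crash kernel"
fi

# A value of 1 means a crash kernel is loaded; the file may be absent on some kernels.
if [ -r /sys/kernel/kexec_crash_loaded ]; then
    echo "kexec_crash_loaded: $(cat /sys/kernel/kexec_crash_loaded)"
fi
```

Checking /proc/cmdline instead of the grub config confirms that the parameter actually reached the booted kernel, which only happens after a reboot.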
https://bugzilla.novell.com/show_bug.cgi?id=791094
https://bugzilla.novell.com/show_bug.cgi?id=791094#c21
--- Comment #21 from Dirk-Michael Brosig
https://bugzilla.novell.com/show_bug.cgi?id=791094
https://bugzilla.novell.com/show_bug.cgi?id=791094#c22
--- Comment #22 from Michal Hocko
https://bugzilla.novell.com/show_bug.cgi?id=791094
https://bugzilla.novell.com/show_bug.cgi?id=791094#c23
--- Comment #23 from Dirk-Michael Brosig
https://bugzilla.novell.com/show_bug.cgi?id=791094
https://bugzilla.novell.com/show_bug.cgi?id=791094#c24
--- Comment #24 from Michal Hocko
https://bugzilla.novell.com/show_bug.cgi?id=791094
https://bugzilla.novell.com/show_bug.cgi?id=791094#c25
--- Comment #25 from Dirk-Michael Brosig
https://bugzilla.novell.com/show_bug.cgi?id=791094
https://bugzilla.novell.com/show_bug.cgi?id=791094#c26
--- Comment #26 from Dirk-Michael Brosig
https://bugzilla.novell.com/show_bug.cgi?id=791094
https://bugzilla.novell.com/show_bug.cgi?id=791094#c27
--- Comment #27 from Michal Hocko
> The output is
> root=/dev/disk/by-label/SUSE11.4-System repair=1 nosplash showopts nmi_watchdog=0 crashkernel=256M-:128M vga=0x318

This looks good.

> The system shows this not as a crash but as a normal reboot:
> root pts/0 x.x.x.x Mon Dec 3 14:10 - 16:10 (02:00)
> reboot system boot 3.1.10-1-pae Mon Dec 3 15:08 - 16:13 (01:04)
> Maybe the network/kernel generates 100% load on all 3 CPUs so that the system does not respond (network and console)? Is there a way to tell the kernel to use only one or two cores and leave the other untouched?

I am not sure I understand what you meant by that.

(In reply to comment #26)
> If you have no idea, I can set /sys/kernel/mm/transparent_hugepage/defrag to never. If no crash occurs within 2 weeks, then khugepaged causes the problem; if the system still crashes, then it is for other reasons and this thread is obsolete...

I am quite positive that the patch fixes the issue. Maybe there is something else going on here. But we need to get a crashdump or at least a kernel trace.
https://bugzilla.novell.com/show_bug.cgi?id=791094
https://bugzilla.novell.com/show_bug.cgi?id=791094#c28
--- Comment #28 from Dirk-Michael Brosig
https://bugzilla.novell.com/show_bug.cgi?id=791094
https://bugzilla.novell.com/show_bug.cgi?id=791094#c29
--- Comment #29 from Michal Hocko
https://bugzilla.novell.com/show_bug.cgi?id=791094
https://bugzilla.novell.com/show_bug.cgi?id=791094#c30
--- Comment #30 from Dirk-Michael Brosig
https://bugzilla.novell.com/show_bug.cgi?id=791094
https://bugzilla.novell.com/show_bug.cgi?id=791094#c31
--- Comment #31 from Michal Hocko
> Yes, of course. But what must I configure so that a trace or dump is written over the serial console?

The dump is not written over the serial console, but you can enable logging via the serial console by adding something like console=ttyS0,115200 to your kernel boot command line parameters. If you do not have a serial port you can also try netconsole (Documentation/networking/netconsole.txt in the kernel source will tell you more about the setup and usage).

Thanks!
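For the netconsole route, the parameter string is the fiddly part. The sketch below assembles it in the format given in Documentation/networking/netconsole.txt. The helper name netconsole_param is made up, and the ports, interface and addresses are hypothetical placeholders (the 192.0.2.0/24 range is reserved for documentation), so substitute real values.

```shell
#!/bin/sh
# Build a netconsole= parameter string in the documented format:
#   netconsole=[src-port]@[src-ip]/[dev],[tgt-port]@<tgt-ip>/[tgt-macaddr]
# Optional fields may be passed as empty strings.
netconsole_param() {
    # $1 src-port  $2 src-ip  $3 dev  $4 tgt-port  $5 tgt-ip  $6 tgt-macaddr
    printf 'netconsole=%s@%s/%s,%s@%s/%s\n' "$1" "$2" "$3" "$4" "$5" "${6:-}"
}

# Hypothetical example values: crashing box 192.0.2.10 on eth0, log receiver 192.0.2.20.
netconsole_param 6665 192.0.2.10 eth0 6666 192.0.2.20 ""

# Serial alternative from the comment above, appended to the kernel boot command line:
#   console=ttyS0,115200
```

The resulting string can go on the kernel boot command line or be passed as the netconsole module parameter when loading the module. The receiving machine then needs a UDP listener on the target port.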
https://bugzilla.novell.com/show_bug.cgi?id=791094
https://bugzilla.novell.com/show_bug.cgi?id=791094#c32
Michal Hocko