[Bug 465039] New: ksoftirqd takes 100% cpu, unable to reboot properly

https://bugzilla.novell.com/show_bug.cgi?id=465039
User vmicho@gmail.com added comment https://bugzilla.novell.com/show_bug.cgi?id=465039#c1

Summary: ksoftirqd takes 100% cpu, unable to reboot properly
Product: openSUSE 11.1
Version: Final
Platform: i586
OS/Version: openSUSE 11.1
Status: NEW
Severity: Critical
Priority: P5 - None
Component: Kernel
AssignedTo: bnc-team-screening@forge.provo.novell.com
ReportedBy: vmicho@gmail.com
QAContact: qa@suse.de
Found By: ---

After I have been using my notebook for a while, ksoftirqd begins to use all of the remaining cpu time. The only way to return to a normal state is to reboot, but the reboot also fails: it hangs somewhere, and I cannot see the console. (I'm using nvidia's video driver and get a black console as soon as X starts. I could set the nv or vesa driver in xorg.conf and then see it.) The worst part is that it hangs during shutdown before the HDDs are unmounted. Neither alt+ctrl+del nor sysrq works, so I have to do a hard power-off. (That is why I set this bug to critical; otherwise it could be major.)

I have no idea what causes it. Maybe the network? Around that time /var/log/messages shows only dhcpd doing its stuff, and knetworkmanager appears fairly high in the top list. It hangs on both wlan and lan.

I searched for a while and found someone with a similar problem on a Clevo notebook with very similar specs. I had no such problem with the previous SUSE release (11.0).

Any ideas?
My ideas for now are to try these:
- use only the vesa/nv driver for a while
- install vanilla kernel
- download & install
- disable dhcp daemon (easiest I think)

Here is the top of the top process list:

top - 22:35:21 up 1 day, 4:05, 7 users, load average: 1.66, 1.76, 1.52
Tasks: 142 total, 3 running, 139 sleeping, 0 stopped, 0 zombie
Cpu0 : 2.0%us, 6.0%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 92.0%si, 0.0%st
Cpu1 : 4.0%us, 0.0%sy, 0.3%ni, 95.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 4088564k total, 3940616k used, 147948k free, 66132k buffers
Swap: 4409800k total, 28k used, 4409772k free, 3204660k cached

  PID USER   PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+ COMMAND
    4 root   15  -5     0    0    0 R   99  0.0 24:24.67 ksoftirqd/0
 3049 root   20   0  360m  69m 9772 S    4  1.7 31:08.22 X
 9720 micho  39  19  128m  47m  14m S    4  1.2 14:14.83 operapluginwrap
 9628 micho  20   0  511m 382m  17m S    2  9.6 12:52.75 opera
10710 micho  20   0  156m  56m  29m S    2  1.4 11:58.14 amarokapp
 3853 micho  20   0 35884  13m 8840 S    1  0.3  2:55.25 knetworkmanager
 8838 root   15  -5     0    0    0 S    1  0.0  0:00.96 events/1
 3984 micho  20   0 37716  16m  10m R    0  0.4  1:51.73 konsole
11588 root   20   0 99112  57m  24m S    0  1.4  0:51.00 y2base
    1 root   20   0  1008  356  308 S    0  0.0  0:01.20 init
    2 root   15  -5     0    0    0 S    0  0.0  0:00.00 kthreadd
    3 root   RT  -5     0    0    0 S    0  0.0  0:00.14 migration/0
    7 root   15  -5     0    0    0 S    0  0.0  0:05.14 events/0

/var/log/messages around the fatal time (I think it occurred somewhere after the "MARK"):

Jan 9 21:46:30 linux-6vsc dhclient: DHCPREQUEST on eth0 to 192.168.1.1 port 67
Jan 9 21:46:31 linux-6vsc dhclient: DHCPACK from 192.168.1.1
Jan 9 21:46:31 linux-6vsc dhclient: bound to 192.168.1.3 -- renewal in 1625 seconds.
Jan 9 22:06:31 linux-6vsc -- MARK --
Jan 9 22:13:36 linux-6vsc dhclient: DHCPREQUEST on eth0 to 192.168.1.1 port 67
Jan 9 22:13:36 linux-6vsc dhclient: DHCPACK from 192.168.1.1
Jan 9 22:13:36 linux-6vsc dhclient: bound to 192.168.1.3 -- renewal in 1726 seconds.
My system:
Linux linux-6vsc 2.6.27.7-9-pae #1 SMP 2008-12-04 18:10:04 +0100 i686 i686 i386 GNU/Linux
I used kde4. Now I have kde3.

sys_vendor = "CLEVO CO."
sys_product = "M570TU"
Intel(R) Core(TM)2 Duo CPU P8600 @ 2.40GHz
4GB ddr3 ram
nvidia 9800gt
Audio device: Intel Corporation 82801I (ICH9 Family) HD Audio Controller (rev 03)
Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 02)
Network controller: Intel Corporation PRO/Wireless 3945ABG [Golan] Network Connection (rev 02)

I hope somebody can help. I have no idea what ksoftirqd is for. Thanks in advance.

--
Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
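
Note on the top output above: the 92.0%si figure on Cpu0 is the share of time that CPU spent servicing softirqs, which is the work ksoftirqd/0 does. The same figure can be tracked without top by sampling /proc/stat twice and comparing the per-CPU counters. The following is a minimal Python sketch, not something the reporter ran. The function names are made up for illustration, and it assumes the usual field order of the cpuN lines (user, nice, system, idle, iowait, irq, softirq, steal, ...).

```python
SOFTIRQ = 6  # index of "softirq" in: user nice system idle iowait irq softirq steal


def parse_cpu_times(stat_text):
    """Return {"cpu0": [counters...], ...} from the text of /proc/stat."""
    times = {}
    for line in stat_text.splitlines():
        fields = line.split()
        # Keep the per-CPU lines ("cpu0", "cpu1", ...); skip the "cpu" total.
        if fields and fields[0].startswith("cpu") and fields[0] != "cpu":
            times[fields[0]] = [int(value) for value in fields[1:]]
    return times


def softirq_share(before, after):
    """Percent of elapsed CPU time spent in softirq between two samples."""
    shares = {}
    for cpu, new in after.items():
        old = before.get(cpu)
        if old is None or len(old) <= SOFTIRQ or len(new) <= SOFTIRQ:
            continue
        # Only the first eight counters are summed, so that guest time
        # (already included in user time) is not counted twice.
        deltas = [n - o for n, o in zip(new[:8], old[:8])]
        total = sum(deltas)
        shares[cpu] = 100.0 * deltas[SOFTIRQ] / total if total else 0.0
    return shares
```

To use it, read /proc/stat, wait a few seconds, read it again, and pass both parsed samples to softirq_share(). A value that stays near 100 for one CPU would match the symptom described in this report.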

https://bugzilla.novell.com/show_bug.cgi?id=465039
User meissner@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=465039#c1

Marcus Meissner <meissner@novell.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |NEEDINFO
      Info Provider|                            |vmicho@gmail.com

--- Comment #1 from Marcus Meissner <meissner@novell.com> 2009-01-10 02:07:01 MST ---
Can you get the output of:

cat /proc/interrupts

to see if an interrupt is triggered very often?
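
One way to read the output requested above is to add up each row across the CPU columns and sort by the total, so the most frequently triggered sources come first. The following is a minimal Python sketch of that idea. The function names are hypothetical, and it assumes the usual layout of /proc/interrupts: a header row naming the CPUs, then one row per interrupt with a label, one count per CPU, and a free-text description.

```python
def parse_interrupts(text):
    """Map each row label to (total count across CPUs, description)."""
    lines = text.strip().splitlines()
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    table = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        fields = rest.split()
        counts = []
        # Rows such as "ERR: 0" carry fewer counts than there are CPUs.
        for value in fields[:ncpus]:
            if not value.isdigit():
                break
            counts.append(int(value))
        description = " ".join(fields[len(counts):])
        table[label.strip()] = (sum(counts), description)
    return table


def busiest(table, n=5):
    """Return the n rows with the highest totals as (label, count, description)."""
    rows = [(label, count, desc) for label, (count, desc) in table.items()]
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:n]
```

A single snapshot only shows totals since boot, so a source that became busy recently can still rank low. Comparing two snapshots taken some time apart gives a better picture of what is firing right now.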

https://bugzilla.novell.com/show_bug.cgi?id=465039
User vmicho@gmail.com added comment https://bugzilla.novell.com/show_bug.cgi?id=465039#c2

--- Comment #2 from Michal Veselenyi <vmicho@gmail.com> 2009-01-11 02:28:15 MST ---
Hi, here it is (I have been running for nearly the entire day and ksoftirqd is normal for now). I'll provide another one (maybe also a graph) when the ksoftirqd problem occurs (I suppose that is what is really needed).
cat /proc/interrupts
           CPU0       CPU1
  0:   26863620          0   IO-APIC-edge      timer
  1:        254          0   IO-APIC-edge      i8042
  6:          0          0   IO-APIC-edge      lirc_ite8709
  8:          1          0   IO-APIC-edge      rtc0
  9:       8972          0   IO-APIC-fasteoi   acpi
 12:        564          0   IO-APIC-edge      i8042
 16:     224750          0   IO-APIC-fasteoi   uhci_hcd:usb2, nvidia
 18:    1191186          0   IO-APIC-fasteoi   uhci_hcd:usb8, jmb38x_ms:slot0, ohci1394, mmc0
 19:    1121921          0   IO-APIC-fasteoi   ata_piix, ata_piix, ehci_hcd:usb1, uhci_hcd:usb4, uhci_hcd:usb7
 21:          0          0   IO-APIC-fasteoi   uhci_hcd:usb3
 22:    3042821          0   IO-APIC-fasteoi   HDA Intel
 23:          2          0   IO-APIC-fasteoi   ehci_hcd:usb5, uhci_hcd:usb6
216:    2369317          0   PCI-MSI-edge      iwl3945
217:    6094614          0   PCI-MSI-edge      eth0
NMI:          0          0   Non-maskable interrupts
LOC:    4722382   16643134   Local timer interrupts
RES:    1373132    3216824   Rescheduling interrupts
CAL:    1511391    2413558   function call interrupts
TLB:      32209      34157   TLB shootdowns
TRM:          0          0   Thermal event interrupts
SPU:          0          0   Spurious interrupts
ERR:          0
MIS:          0

https://bugzilla.novell.com/show_bug.cgi?id=465039
User vmicho@gmail.com added comment https://bugzilla.novell.com/show_bug.cgi?id=465039#c3

--- Comment #3 from Michal Veselenyi <vmicho@gmail.com> 2009-01-11 11:46:22 MST ---
Hi again. Here are the latest interrupts from before I rebooted.

  0:   37799577          0   IO-APIC-edge      timer
  1:        309          0   IO-APIC-edge      i8042
  6:      47784          0   IO-APIC-edge      lirc_ite8709
  8:          1          0   IO-APIC-edge      rtc0
  9:      11984          0   IO-APIC-fasteoi   acpi
 12:       1074          0   IO-APIC-edge      i8042
 16:     276919          0   IO-APIC-fasteoi   uhci_hcd:usb2, nvidia
 18:    1315728          0   IO-APIC-fasteoi   uhci_hcd:usb8, jmb38x_ms:slot0, ohci1394, mmc0
 19:    1579638          0   IO-APIC-fasteoi   ata_piix, ata_piix, ehci_hcd:usb1, uhci_hcd:usb4, uhci_hcd:usb7
 21:          0          0   IO-APIC-fasteoi   uhci_hcd:usb3
 22:    3305783          0   IO-APIC-fasteoi   HDA Intel
 23:          2          0   IO-APIC-fasteoi   ehci_hcd:usb5, uhci_hcd:usb6
216:    3033526          0   PCI-MSI-edge      iwl3945
217:    9101163          0   PCI-MSI-edge      eth0
NMI:          0          0   Non-maskable interrupts
LOC:    6434802   22944923   Local timer interrupts
RES:    1790955    4134157   Rescheduling interrupts
CAL:    1564485    2469460   function call interrupts
TLB:      43600      45062   TLB shootdowns
TRM:          0          0   Thermal event interrupts
SPU:          0          0   Spurious interrupts
ERR:          0
MIS:          0

I'm also attaching a chart, created with OOo, of /proc/interrupts collected each minute for ~870 minutes (with a simple script). Basically I see nothing very unusual in it.

Just to note: the 870th minute is around 18:30. The resume from s2ram is at the 375th minute (9:15 in the morning). I closed all browsers and went out skiing at around 12:30 (I left only Azureus open), so the ksoftirqd problem appeared while I was out.

There is little change in slope at minute 450 (11:30), except for the "eth0" curve (3rd from top). In /var/log/messages there is nothing unusual, only the classic dhcprequest stuff (same as in the previous post).

I also realized now that the minutes aren't very accurate (the script just does a sleep 60), which can add some error.

Regards.
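
The per-minute collection described above can be sketched as follows. The reporter's actual script was not posted, so this Python version is only an assumption about what such a script might look like, and all function names are hypothetical. It stores a timestamp with every sample, so the drift of a plain sleep 60 loop no longer distorts the time axis, and it converts consecutive samples into interrupts per second.

```python
import time


def read_counts(path="/proc/interrupts"):
    """Return {row label: total count across CPUs} for one snapshot."""
    with open(path) as handle:
        lines = handle.read().strip().splitlines()
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    counts = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        total = 0
        for value in rest.split()[:ncpus]:
            if not value.isdigit():
                break  # reached the description columns
            total += int(value)
        counts[label.strip()] = total
    return counts


def rates(samples):
    """Turn [(timestamp, counts), ...] into per-interval interrupts/second."""
    result = []
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        elapsed = t1 - t0
        if elapsed <= 0:
            continue
        result.append(
            {irq: (c1[irq] - c0[irq]) / elapsed for irq in c1 if irq in c0}
        )
    return result


def collect(n, interval=60.0):
    """Take n timestamped snapshots, sleeping `interval` seconds in between."""
    samples = []
    for i in range(n):
        samples.append((time.time(), read_counts()))
        if i < n - 1:
            time.sleep(interval)
    return samples
```

For comparison, between the snapshots in comments #2 and #3 the eth0 row (217) grew from 6094614 to 9101163, a difference of 3006549, assuming both were taken during the same boot. Neither snapshot carries a timestamp, so no rate can be derived from them.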

https://bugzilla.novell.com/show_bug.cgi?id=465039
User vmicho@gmail.com added comment https://bugzilla.novell.com/show_bug.cgi?id=465039#c4

--- Comment #4 from Michal Veselenyi <vmicho@gmail.com> 2009-01-11 11:48:11 MST ---
Created an attachment (id=264360)
 --> (https://bugzilla.novell.com/attachment.cgi?id=264360)
chart of /proc/interrupts

Temporal chart of /proc/interrupts

https://bugzilla.novell.com/show_bug.cgi?id=465039

Cyril Hrubis <chrubis@novell.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|bnc-team-screening@forge.pr |kernel-maintainers@forge.pr
                   |ovo.novell.com              |ovo.novell.com

https://bugzilla.novell.com/show_bug.cgi?id=465039
User vmicho@gmail.com added comment https://bugzilla.novell.com/show_bug.cgi?id=465039#c5

Michal Veselenyi <vmicho@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEEDINFO                    |NEW
      Info Provider|vmicho@gmail.com            |

--- Comment #5 from Michal Veselenyi <vmicho@gmail.com> 2009-02-14 06:14:21 MST ---
Hello. I have some good news. I downloaded, compiled and installed the newest kernel from kernel.org (2.6.28.2) and have been running it for about a week now without any problem. In fact, there were lots of changes concerning softirqd in the 2.6.28 release. Maybe an upgrade of the current openSUSE kernel (2.6.27.7-9.1) in the repositories would fix this for everybody else.

Regards.

https://bugzilla.novell.com/show_bug.cgi?id=465039
User estellnb@gmail.com added comment https://bugzilla.novell.com/show_bug.cgi?id=465039#c6

--- Comment #6 from Elmar Stellnberger ATK <estellnb@gmail.com> 2009-03-13 05:33:54 MST ---
Created an attachment (id=279389)
 --> (https://bugzilla.novell.com/attachment.cgi?id=279389)
proc.interrupts

https://bugzilla.novell.com/show_bug.cgi?id=465039
User estellnb@gmail.com added comment https://bugzilla.novell.com/show_bug.cgi?id=465039#c7

--- Comment #7 from Elmar Stellnberger ATK <estellnb@gmail.com> 2009-03-13 05:35:56 MST ---
Created an attachment (id=279390)
 --> (https://bugzilla.novell.com/attachment.cgi?id=279390)
var.log.messages

https://bugzilla.novell.com/show_bug.cgi?id=465039
User szotsaki@gmail.com added comment https://bugzilla.novell.com/show_bug.cgi?id=465039#c8

--- Comment #8 from Ákos Szőts <szotsaki@gmail.com> 2009-03-13 10:08:12 MST ---
Created an attachment (id=279486)
 --> (https://bugzilla.novell.com/attachment.cgi?id=279486)
/var/log/messages file

I also suffer from this bug.

uname -ir: 2.6.27.19-3.2-default x86_64

/proc/interrupts:
           CPU0       CPU1
  0:      71832      72828   IO-APIC-edge      timer
  1:          5          7   IO-APIC-edge      i8042
  8:          1          0   IO-APIC-edge      rtc0
  9:          0          1   IO-APIC-fasteoi   acpi
 12:         72         64   IO-APIC-edge      i8042
 14:       1690       1616   IO-APIC-edge      ata_piix
 15:          0          0   IO-APIC-edge      ata_piix
 16:        581        168   IO-APIC-fasteoi   nvidia
 17:      26663      10586   IO-APIC-fasteoi   ata_piix, eth0, b43
 18:          0          0   IO-APIC-fasteoi   mmc0
 19:          1          1   IO-APIC-fasteoi   ohci1394
 20:       3236       2221   IO-APIC-fasteoi   uhci_hcd:usb1, uhci_hcd:usb4, ehci_hcd:usb7
 21:       8821       2932   IO-APIC-fasteoi   uhci_hcd:usb2, uhci_hcd:usb5, HDA Intel
 22:          0          0   IO-APIC-fasteoi   ehci_hcd:usb3, uhci_hcd:usb6
NMI:          0          0   Non-maskable interrupts
LOC:      53608      57620   Local timer interrupts
RES:      13893      16887   Rescheduling interrupts
CAL:       1241        296   function call interrupts
TLB:        216        229   TLB shootdowns
TRM:          0          0   Thermal event interrupts
THR:          0          0   Threshold APIC interrupts
SPU:          0          0   Spurious interrupts
ERR:          0

lspci:
00:00.0 Host bridge: Intel Corporation Mobile PM965/GM965/GL960 Memory Controller Hub (rev 0c)
00:01.0 PCI bridge: Intel Corporation Mobile PM965/GM965/GL960 PCI Express Root Port (rev 0c)
00:1a.0 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #4 (rev 02)
00:1a.1 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #5 (rev 02)
00:1a.7 USB Controller: Intel Corporation 82801H (ICH8 Family) USB2 EHCI Controller #2 (rev 02)
00:1b.0 Audio device: Intel Corporation 82801H (ICH8 Family) HD Audio Controller (rev 02)
00:1c.0 PCI bridge: Intel Corporation 82801H (ICH8 Family) PCI Express Port 1 (rev 02)
00:1c.1 PCI bridge: Intel Corporation 82801H (ICH8 Family) PCI Express Port 2 (rev 02)
00:1c.3 PCI bridge: Intel Corporation 82801H (ICH8 Family) PCI Express Port 4 (rev 02)
00:1d.0 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #1 (rev 02)
00:1d.1 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #2 (rev 02)
00:1d.2 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #3 (rev 02)
00:1d.7 USB Controller: Intel Corporation 82801H (ICH8 Family) USB2 EHCI Controller #1 (rev 02)
00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev f2)
00:1f.0 ISA bridge: Intel Corporation 82801HEM (ICH8M) LPC Interface Controller (rev 02)
00:1f.1 IDE interface: Intel Corporation 82801HBM/HEM (ICH8M/ICH8M-E) IDE Controller (rev 02)
00:1f.2 IDE interface: Intel Corporation 82801HBM/HEM (ICH8M/ICH8M-E) SATA IDE Controller (rev 02)
00:1f.3 SMBus: Intel Corporation 82801H (ICH8 Family) SMBus Controller (rev 02)
01:00.0 VGA compatible controller: nVidia Corporation GeForce 8400M GS (rev a1)
03:00.0 Ethernet controller: Broadcom Corporation BCM4401-B0 100Base-TX (rev 02)
03:01.0 FireWire (IEEE 1394): Ricoh Co Ltd R5C832 IEEE 1394 Controller (rev 05)
03:01.1 SD Host controller: Ricoh Co Ltd R5C822 SD/SDIO/MMC/MS/MSPro Host Adapter (rev 22)
03:01.2 System peripheral: Ricoh Co Ltd R5C592 Memory Stick Bus Host Adapter (rev 12)
03:01.3 System peripheral: Ricoh Co Ltd xD-Picture Card Controller (rev 12)
0c:00.0 Network controller: Broadcom Corporation BCM4311 802.11b/g WLAN (rev 01)

I attached the /var/log/messages file. I have a dual core Intel Core2 CPU. One of the cores was totally used by ksoftirqd, and the network died after some time (`ping` also failed with an error about some sort of sendmessage buffer).

https://bugzilla.novell.com/show_bug.cgi?id=465039
User szotsaki@gmail.com added comment https://bugzilla.novell.com/show_bug.cgi?id=465039#c9

Ákos Szőts <szotsaki@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |szotsaki@gmail.com

--- Comment #9 from Ákos Szőts <szotsaki@gmail.com> 2009-03-13 10:28:21 MST ---
I can reproduce this bug with Eclipse and Aptana (and probably with Last.fm). I have Eclipse installed with Aptana, and the latter wants to download some MBs of updates. The download stops at 21%, and Last.fm stops as well. After closing Eclipse and Last.fm, the CPU usage caused by ksoftirqd/1 decreases to 0-1%.

https://bugzilla.novell.com/show_bug.cgi?id=465039
User vmicho@gmail.com added comment https://bugzilla.novell.com/show_bug.cgi?id=465039#c10

--- Comment #10 from Michal Veselenyi <vmicho@gmail.com> 2009-03-14 08:09:23 MST ---
Hi. Did you try updating to the newest kernel (2.6.28.x)? It solved this for me (at least I haven't had any problems so far).

https://bugzilla.novell.com/show_bug.cgi?id=465039
User estellnb@gmail.com added comment https://bugzilla.novell.com/show_bug.cgi?id=465039#c11

Elmar Stellnberger ATK <estellnb@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |estellnb@gmail.com

--- Comment #11 from Elmar Stellnberger ATK <estellnb@gmail.com> 2009-03-15 10:55:44 MST ---
Indeed, kernel 2.6.28-next-20090107-20090107.18-default from http://download.opensuse.org/repositories/Kernel:/linux-next/openSUSE_11.1/ seems to resolve the issue. With the new kernel, starting the self-compiled partgui (/usr/sbin/piguicqt) simply returns an error, whereas with the old kernel it always caused a light variant of the 100%-cpu-ksoftirqd bug, fortunately without triggering any disk access (which makes things much worse). However, linux-next is not an option for me, since it does not wake up from s2ram on my machine, as pm-suspend.log revealed.

https://bugzilla.novell.com/show_bug.cgi?id=465039
User estellnb@gmail.com added comment https://bugzilla.novell.com/show_bug.cgi?id=465039#c12

--- Comment #12 from Elmar Stellnberger ATK <estellnb@gmail.com> 2009-03-15 11:03:46 MST ---
Created an attachment (id=279654)
 --> (https://bugzilla.novell.com/attachment.cgi?id=279654)
erroneous partgui that can trigger the ksoftirqd bug

Here I have uploaded an erroneously self-compiled version of partgui that can trigger a light version of the ksoftirqd bug, featuring 100% cpu load but no disk access. Note that the cause of the ksoftirqd overload during normal operation will be different from this kind of artificially triggered one (and more severe because of hdd-access overload). To test with it, type:

make install (on a 64bit machine)
/usr/sbin/piguicqt (as root)

https://bugzilla.novell.com/show_bug.cgi?id=465039
User estellnb@gmail.com added comment https://bugzilla.novell.com/show_bug.cgi?id=465039#c13

--- Comment #13 from Elmar Stellnberger ATK <estellnb@gmail.com> 2009-03-15 11:14:37 MST ---
Created an attachment (id=279655)
 --> (https://bugzilla.novell.com/attachment.cgi?id=279655)
proc.interrupts for partgui triggered overhang