[Bug 384661] New: harddisk driver hangs after several suspend to RAM cycles
https://bugzilla.novell.com/show_bug.cgi?id=384661

           Summary: harddisk driver hangs after several suspend to RAM cycles
           Product: openSUSE 11.0
           Version: Beta 1
          Platform: Other
        OS/Version: Other
            Status: NEW
          Severity: Normal
          Priority: P5 - None
         Component: Kernel
        AssignedTo: teheo@novell.com
        ReportedBy: seife@novell.com
         QAContact: kernel-maintainers@forge.provo.novell.com
                CC: pavel@novell.com, fseidel@novell.com
          Found By: Development

While doing a s2ram cycle test on an old Compaq Armada E500 (which was very reliable wrt. s2ram in the past), I found that after ~10-20 cycles, the machine will hang "soft" (mouse in X11 still moving, but no processes can be spawned).

I investigated and found this on console 10:

klogd: ata1: exception Emask 0x10 SAct 0x0 SErr 0x0 action 0xa frozen
klogd: ata1: ACPI event

And that's it, machine dead.

With sysrq-"show blocked tasks", I found that most of the tasks were somewhere in "journal_*", "log_wait_commit", ..., all in [jbd].

Will attach lspci after reboot.

-- 
Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
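The report does not show the loop used for the s2ram cycle test. A minimal sketch of such a loop follows, assuming the `s2ram` binary from uswsusp is on the PATH, the script runs as root, and each resume is triggered by hand or by some wake source. The function name `cycle_test` and the `SUSPEND_CMD` variable are made up for this illustration.

```shell
#!/bin/sh
# Hypothetical helper: run a suspend command N times, pausing between cycles.
# SUSPEND_CMD defaults to s2ram; set it to "true" for a dry run that never
# actually suspends the machine.
cycle_test() {
    cycles=${1:-20}   # number of suspend/resume cycles
    pause=${2:-10}    # seconds to wait after each resume
    cmd=${SUSPEND_CMD:-s2ram}
    i=1
    while [ "$i" -le "$cycles" ]; do
        echo "cycle $i of $cycles"
        # Stop at the first failed suspend so the failing cycle is visible.
        $cmd || { echo "suspend failed in cycle $i" >&2; return 1; }
        sleep "$pause"
        i=$((i + 1))
    done
    echo "completed $cycles cycles"
}
```

Calling `cycle_test 40 15` would attempt 40 cycles with a 15 second pause after each resume. If the machine soft-hangs as described, the last "cycle N of M" line printed gives the cycle count at the time of the hang.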
https://bugzilla.novell.com/show_bug.cgi?id=384661

User seife@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=384661#c1

--- Comment #1 from Stefan Seyfried <seife@novell.com>  2008-04-29 03:09:27 MST ---
linux:~ # lspci
00:00.0 Host bridge: Intel Corporation 440BX/ZX/DX - 82443BX/ZX/DX Host bridge (rev 03)
00:01.0 PCI bridge: Intel Corporation 440BX/ZX/DX - 82443BX/ZX/DX AGP bridge (rev 03)
00:04.0 CardBus bridge: Texas Instruments PCI1225 (rev 01)
00:04.1 CardBus bridge: Texas Instruments PCI1225 (rev 01)
00:07.0 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ISA (rev 02)
00:07.1 IDE interface: Intel Corporation 82371AB/EB/MB PIIX4 IDE (rev 01)
00:07.2 USB Controller: Intel Corporation 82371AB/EB/MB PIIX4 USB (rev 01)
00:07.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 03)
00:08.0 Multimedia audio controller: ESS Technology ES1978 Maestro 2E (rev 10)
00:09.0 Ethernet controller: Intel Corporation 82557/8/9/0/1 Ethernet Pro 100 (rev 09)
00:09.1 Serial controller: Agere Systems LT WinModem
01:00.0 VGA compatible controller: ATI Technologies Inc Rage Mobility P/M AGP 2x (rev 64)

linux:~ # hdparm -I /dev/sda

/dev/sda:

ATA device, with non-removable media
        Model Number:       IC25N020ATCS04-0
        Serial Number:      CSH206D9GGLJ1B
        Firmware Revision:  CA2OA71A
Standards:
        Used: ATA/ATAPI-5 T13 1321D revision 3
        Supported: 5 4 3 & some of 6
Configuration:
        Logical         max     current
        cylinders       16383   17475
        heads           16      15
        sectors/track   63      63
        --
        CHS current addressable sectors:   16513875
        LBA    user addressable sectors:   39070080
        device size with M = 1024*1024:       19077 MBytes
        device size with M = 1000*1000:       20003 MBytes (20 GB)
Capabilities:
        LBA, IORDY(can be disabled)
        bytes avail on r/w long: 4
        Standby timer values: spec'd by Vendor, no device specific minimum
        R/W multiple sector transfer: Max = 16  Current = 16
        Advanced power management level: 128
        DMA: mdma0 mdma1 mdma2 udma0 udma1 *udma2 udma3 udma4 udma5
             Cycle time: min=120ns recommended=120ns
        PIO: pio0 pio1 pio2 pio3 pio4
             Cycle time: no flow control=240ns  IORDY flow control=120ns
Commands/features:
        Enabled Supported:
           *    SMART feature set
                Security Mode feature set
           *    Power Management feature set
           *    Write cache
           *    Look-ahead
           *    Host Protected Area feature set
           *    WRITE_BUFFER command
           *    READ_BUFFER command
           *    NOP cmd
           *    Advanced Power Management feature set
                Power-Up In Standby feature set
                Address Offset Reserved Area Boot
                SET_MAX security extension
           *    Device Configuration Overlay feature set
           *    SMART error logging
           *    SMART self-test
Security:
        Master password revision code = 65534
                supported
        not     enabled
        not     locked
                frozen
        not     expired: security count
        not     supported: enhanced erase
        22min for SECURITY ERASE UNIT.
HW reset results:
        CBLID- above Vih
        Device num = 0 determined by the jumper
Checksum: correct
https://bugzilla.novell.com/show_bug.cgi?id=384661

User teheo@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=384661#c2

Tejun Heo <teheo@novell.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |NEEDINFO
      Info Provider|                            |seife@novell.com

--- Comment #2 from Tejun Heo <teheo@novell.com>  2008-04-29 03:11:52 MST ---
Does libata.noacpi=1 work around the problem?
https://bugzilla.novell.com/show_bug.cgi?id=384661

User teheo@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=384661#c3

--- Comment #3 from Tejun Heo <teheo@novell.com>  2008-04-29 03:17:53 MST ---
Also, when it soft-locks like that, does waiting for several minutes make any difference?
https://bugzilla.novell.com/show_bug.cgi?id=384661

User seife@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=384661#c4

--- Comment #4 from Stefan Seyfried <seife@novell.com>  2008-04-29 04:08:55 MST ---
No, it was hanging for > 1 hour when I found it ;-)
It works well with piix - now 39 cycles. I'll reboot and retry with libata.noacpi=1.
https://bugzilla.novell.com/show_bug.cgi?id=384661

User seife@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=384661#c5

--- Comment #5 from Stefan Seyfried <seife@novell.com>  2008-04-29 05:55:12 MST ---
It still hung with libata.noacpi=1, without those ata1: messages, but it might be a different issue, namely a variation of bug 382516, so please hold on, I'm investigating.
https://bugzilla.novell.com/show_bug.cgi?id=384661

User seife@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=384661#c6

Stefan Seyfried <seife@novell.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rjwysocki@sisk.pl
         AssignedTo|teheo@novell.com            |pavel@novell.com
             Status|NEEDINFO                    |NEW
      Info Provider|seife@novell.com            |

--- Comment #6 from Stefan Seyfried <seife@novell.com>  2008-04-29 06:28:52 MST ---
It is apparently not the disk driver hanging, sorry for the false assumption :-)
A sysrq-S makes the disk write something, clearly audible.

Pavel, it is something else, on a single-processor machine that always worked well (old enough that it even worked back when APM was still the method of choice ;)

Unfortunately I have no idea what the problem is.
https://bugzilla.novell.com/show_bug.cgi?id=384661

User seife@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=384661#c7

Stefan Seyfried <seife@novell.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|harddisk driver hangs after |non-smp machine hangs after
                   |several suspend to RAM      |several suspend to RAM
                   |cycles                      |cycles

--- Comment #7 from Stefan Seyfried <seife@novell.com>  2008-04-29 06:38:43 MST ---
This time, I can still spawn processes... But the time is also standing still on that machine.
I already tried nohz=off and this also failed (but maybe a different failure?).
It looks timer-related, since a "sleep 1" only returns after I press a key.
Timer interrupts are still increasing.
Keyboard no longer repeats.
On the console, the cursor does not blink.
No process in state D.
No idea how to continue ;-)
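The comment does not say how the "timer interrupts are still increasing" observation was made. One way to check it is to sample the timer line of /proc/interrupts twice. The sketch below assumes the usual layout of that file (an IRQ label, one count per CPU, then the controller and device names, with "timer" as the last field); the function names are hypothetical.

```shell
# Print the summed per-CPU count of the "timer" line in an interrupts table.
# $1 is the file to read; it defaults to /proc/interrupts.
timer_irq_count() {
    awk '$NF == "timer" {
             total = 0
             # Field 1 is the IRQ label ("0:"); the per-CPU counts follow it
             # and end at the first non-numeric field.
             for (f = 2; f <= NF && $f ~ /^[0-9]+$/; f++) total += $f
             print total
             exit
         }' "${1:-/proc/interrupts}"
}

# Succeed if the timer interrupt count grows over an interval.
# $1 is the file to read, $2 the interval in seconds (default 2).
timer_irq_advancing() {
    before=$(timer_irq_count "$1")
    sleep "${2:-2}"
    after=$(timer_irq_count "$1")
    [ "$after" -gt "$before" ]
}
```

Note that on a machine in the state described here the `sleep` itself may not return until a key is pressed, so reading the counter by hand twice (`timer_irq_count`, wait, `timer_irq_count`) may be the more practical use.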
https://bugzilla.novell.com/show_bug.cgi?id=384661

User rjwysocki@sisk.pl added comment
https://bugzilla.novell.com/show_bug.cgi?id=384661#c8

--- Comment #8 from Rafael Wysocki <rjwysocki@sisk.pl>  2008-05-01 16:25:07 MST ---
Well, I'd try without NOHZ, HIGH_RES_TIMERS, etc.
https://bugzilla.novell.com/show_bug.cgi?id=384661

User pavel@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=384661#c9

Pavel Machek <pavel@novell.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |NEEDINFO
      Info Provider|                            |seife@novell.com

--- Comment #9 from Pavel Machek <pavel@novell.com>  2008-05-06 02:45:19 MST ---
So it still hangs, piix or libata? With nohz/high_res_timers off?
https://bugzilla.novell.com/show_bug.cgi?id=384661

User seife@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=384661#c10

--- Comment #10 from Stefan Seyfried <seife@novell.com>  2008-05-06 04:35:22 MST ---
I'll try next weekend. The machine is at home - but I'm in the office again this week :-)
I did try both piix and libata, so I think the disk driver is not to blame (it also still seems to work fine once the machine soft-locks).
https://bugzilla.novell.com/show_bug.cgi?id=384661

User pavel@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=384661#c11

--- Comment #11 from Pavel Machek <pavel@novell.com>  2008-05-06 06:32:24 MST ---
I tried to run the refrigerator test (echo testproc > disk, then while true; do echo disk > /sys/power/state ; sleep 1; done) overnight, and did not see bad effects. This is 2.6.26-rc1.
https://bugzilla.novell.com/show_bug.cgi?id=384661

Greg Kroah-Hartman <gregkh@novell.com> changed:

           What    |Removed                                  |Added
----------------------------------------------------------------------------
          QAContact|kernel-maintainers@forge.provo.novell.com|qa@suse.de
https://bugzilla.novell.com/show_bug.cgi?id=384661

User seife@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=384661#c12

--- Comment #12 from Stefan Seyfried <seife@novell.com>  2008-05-20 17:19:09 MST ---
I did not forget this one, I even brought the machine into the office, I just have not managed to start testing it. Sorry :-)
https://bugzilla.novell.com/show_bug.cgi?id=384661

User seife@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=384661#c13

--- Comment #13 from Stefan Seyfried <seife@novell.com>  2008-06-05 12:11:25 MDT ---
test and testproc do not seem to work with the 11.0 kernel; the machine just suspends and then says "please power down manually". This is with the "-pae" kernel. Will just run the s2ram test.
https://bugzilla.novell.com/show_bug.cgi?id=384661

User seife@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=384661#c14

Stefan Seyfried <seife@novell.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEEDINFO                    |RESOLVED
      Info Provider|seife@novell.com            |
         Resolution|                            |WORKSFORME

--- Comment #14 from Stefan Seyfried <seife@novell.com>  2008-06-06 03:02:23 MDT ---
So right now it suspended and resumed 376 times until I aborted. This is with the latest 11.0 kernel. Either it was fixed in the meantime or it was some kind of heisenbug.
I will restart the loop with the default kernel (non-PAE), but I guess this can be closed as WORKSFORME for now...
https://bugzilla.novell.com/show_bug.cgi?id=384661

User seife@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=384661#c15

Stefan Seyfried <seife@novell.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|WORKSFORME                  |

--- Comment #15 from Stefan Seyfried <seife@novell.com>  2008-06-06 11:00:59 MDT ---
No, with kernel-default (non-PAE), it was reproduced with only 2 cycles :-(
Will need to investigate further.
https://bugzilla.novell.com/show_bug.cgi?id=384661

Pavel Machek <pavel@novell.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|REOPENED                    |ASSIGNED
https://bugzilla.novell.com/show_bug.cgi?id=384661

User pavel@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=384661#c16

--- Comment #16 from Pavel Machek <pavel@novell.com>  2008-06-17 03:48:36 MDT ---
It is strange that PAE/non-PAE has an effect like this. Any news?
https://bugzilla.novell.com/show_bug.cgi?id=384661

Pavel Machek <pavel@novell.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |NEEDINFO
      Info Provider|                            |seife@novell.com
https://bugzilla.novell.com/show_bug.cgi?id=384661

Stefan Seyfried <seife@novell.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P5 - None                   |P4 - Low
https://bugzilla.novell.com/show_bug.cgi?id=384661

User seife@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=384661#c17

--- Comment #17 from Stefan Seyfried <seife@novell.com>  2008-09-19 09:00:01 MDT ---
I will retry on 11.1 kernels "soon"...
https://bugzilla.novell.com/show_bug.cgi?id=384661

User seife@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=384661#c18

Stefan Seyfried <seife@novell.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEEDINFO                    |RESOLVED
      Info Provider|seife@novell.com            |
         Resolution|                            |WONTFIX

--- Comment #18 from Stefan Seyfried <seife@novell.com>  2008-10-15 09:39:01 MDT ---
Actually, I will not get to this soon. I will do it sooner or later, but there is no use in having this bug open all the time. Once I have retried it with a newer kernel and can still reproduce it, I will either reopen this bug or file a new one.