[Bug 461183] New: ATA "soft resetting link" filling up logs
https://bugzilla.novell.com/show_bug.cgi?id=461183

User kairo@kairo.at added comment
https://bugzilla.novell.com/show_bug.cgi?id=461183#c1

           Summary: ATA "soft resetting link" filling up logs
           Product: openSUSE 11.2
           Version: Alpha 0
          Platform: x86
        OS/Version: Other
            Status: NEW
          Severity: Normal
          Priority: P5 - None
         Component: Kernel
        AssignedTo: bnc-team-screening@forge.provo.novell.com
        ReportedBy: kairo@kairo.at
         QAContact: qa@suse.de
          Found By: ---

I updated Factory yesterday. Today, after booting this new version and
running it for some time, I noticed that the hard disk made a noise as if
it was writing something all the time, so I looked into /var/log/messages
and found this:

Dec 20 15:54:02 robert kernel: ata9.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Dec 20 15:54:02 robert kernel: ata9.00: cmd a0/00:00:00:00:00/00:00:00:00:00/a0 tag 0
Dec 20 15:54:02 robert kernel: cdb 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Dec 20 15:54:02 robert kernel: res 51/20:03:00:00:00/00:00:00:00:00/a0 Emask 0x3 (HSM violation)
Dec 20 15:54:02 robert kernel: ata9.00: status: { DRDY ERR }
Dec 20 15:54:02 robert kernel: ata9: soft resetting link
Dec 20 15:54:02 robert kernel: ata9.00: configured for MWDMA2
Dec 20 15:54:02 robert kernel: ata9.01: configured for UDMA/100
Dec 20 15:54:02 robert kernel: ata9: EH complete
Dec 20 15:54:02 robert kernel: sd 8:0:1:0: [sdb] Write Protect is off
Dec 20 15:54:02 robert kernel: sd 8:0:1:0: [sdb] Mode Sense: 00 3a 00 00
Dec 20 15:54:02 robert kernel: ata9.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Dec 20 15:54:02 robert kernel: ata9.00: cmd a0/00:00:00:00:00/00:00:00:00:00/a0 tag 0
Dec 20 15:54:02 robert kernel: cdb 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Dec 20 15:54:02 robert kernel: res 51/20:03:00:00:00/00:00:00:00:00/a0 Emask 0x3 (HSM violation)
Dec 20 15:54:02 robert kernel: ata9.00: status: { DRDY ERR }
Dec 20 15:54:02 robert kernel: ata9: soft resetting link
Dec 20 15:54:02 robert kernel: ata9.00: configured for MWDMA2
Dec 20 15:54:02 robert kernel: ata9.01: configured for UDMA/100
Dec 20 15:54:02 robert kernel: ata9: EH complete

Those ata9 messages are repeated over and over, multiple times per second;
sometimes a few sd 8:0:1:0 [sdb] lines are mixed in between.

This all sounds quite similar to
http://markmail.org/message/deozivoi23ioovyj but this is a 2.6.27 kernel
if I'm not mistaken. uname -a says:

Linux robert 2.6.27.8-1-pae #1 SMP 2008-12-08 03:55:28 +0100 i686 i686 i386 GNU/Linux

The rpm packages, from the openSUSE Factory repo, are:

kernel-source-2.6.27.8-1.1
kernel-pae-2.6.27.8-1.1
kernel-pae-extra-2.6.27.8-1.1
kernel-pae-base-2.6.27.8-1.1

The interesting parts from lspci:

00:1f.2 SATA controller: Intel Corporation 82801HR/HO/HH (ICH8R/DO/DH) 6 port SATA AHCI Controller (rev 02)
03:00.0 SATA controller: JMicron Technologies, Inc. JMicron 20360/20363 AHCI Controller (rev 02)
03:00.1 IDE interface: JMicron Technologies, Inc. JMicron 20360/20363 AHCI Controller (rev 02)

The interesting parts from lsmod:

Module                  Size  Used by
fuse                   52488  1
ide_pci_generic         3428  0
ide_core               99412  1 ide_pci_generic
ata_generic             4484  0
pata_jmicron            2876  3
cdrom                  32288  1 sr_mod
sg                     29376  0
sd_mod                 31424  12
crc_t10dif              1704  1 sd_mod
ext3                  124712  1
jbd                    56764  1 ext3
mbcache                 8132  1 ext3
reiserfs              221756  6
ata_piix               16460  0
libata                161112  4 ata_generic,pata_jmicron,ata_piix,ahci
dock                   11988  1 libata

sda is a SATA drive on the ICH8R, sdb is an IDE drive on the JMicron
adapter, and I have a CD/DVD drive on each of those adapters as well.

Is there any way I can find out which of the drives is on the ata9 it's
complaining about?

--
Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
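
A minimal sketch for triaging a log like the excerpt above: it counts "soft resetting link" events per ATA port and exceptions per device, which at least shows whether device .00 or .01 on the port is the one raising errors. The helper name `summarize_ata_errors` and both regular expressions are assumptions modelled only on the lines quoted in this report; other kernel versions may word these messages differently.

```python
import re
from collections import Counter

# Patterns modelled on the log excerpt in this report (an assumption, not a
# stable kernel interface).
RESET_RE = re.compile(r"kernel: (ata\d+): soft resetting link")
EXCEPTION_RE = re.compile(r"kernel: (ata\d+\.\d+): exception ")


def summarize_ata_errors(lines):
    """Hypothetical helper: count soft resets per port and exceptions per device."""
    resets = Counter()
    exceptions = Counter()
    for line in lines:
        match = RESET_RE.search(line)
        if match:
            resets[match.group(1)] += 1
            continue
        match = EXCEPTION_RE.search(line)
        if match:
            exceptions[match.group(1)] += 1
    return resets, exceptions


# Sample lines copied from the report.
SAMPLE = """\
Dec 20 15:54:02 robert kernel: ata9.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Dec 20 15:54:02 robert kernel: ata9.00: status: { DRDY ERR }
Dec 20 15:54:02 robert kernel: ata9: soft resetting link
Dec 20 15:54:02 robert kernel: ata9.00: configured for MWDMA2
Dec 20 15:54:02 robert kernel: ata9.01: configured for UDMA/100
Dec 20 15:54:02 robert kernel: ata9: EH complete
""".splitlines()

if __name__ == "__main__":
    resets, exceptions = summarize_ata_errors(SAMPLE)
    print("soft resets per port:", dict(resets))
    print("exceptions per device:", dict(exceptions))
```

To run it against a real log, pass an open file instead of `SAMPLE`, for example `summarize_ata_errors(open("/var/log/messages", errors="replace"))`.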

https://bugzilla.novell.com/show_bug.cgi?id=461183

User kairo@kairo.at added comment
https://bugzilla.novell.com/show_bug.cgi?id=461183#c1

--- Comment #1 from Robert Kaiser <kairo@kairo.at> 2008-12-20 08:15:12 MST ---
Oops, I missed these two lines from lsmod; I guess ahci is important to
know as well:

ahci                   28488  10
scsi_mod              149804  4 sr_mod,sg,sd_mod,libata

https://bugzilla.novell.com/show_bug.cgi?id=461183

User kairo@kairo.at added comment
https://bugzilla.novell.com/show_bug.cgi?id=461183#c2

--- Comment #2 from Robert Kaiser <kairo@kairo.at> 2008-12-20 09:31:09 MST ---
I now switched to kernel-pae-2.6.27.4-10.12 from
Kernel:/HEAD/openSUSE_Factory/ and it works fine with the same hardware,
no excessive logging.

https://bugzilla.novell.com/show_bug.cgi?id=461183

Andreas Jaeger <aj@novell.com> changed:

           What    |Removed                                   |Added
----------------------------------------------------------------------------
         AssignedTo|bnc-team-screening@forge.provo.novell.com |teheo@novell.com

https://bugzilla.novell.com/show_bug.cgi?id=461183

User teheo@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=461183#c3

Tejun Heo <teheo@novell.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |NEEDINFO
      Info Provider|                            |kairo@kairo.at

--- Comment #3 from Tejun Heo <teheo@novell.com> 2009-01-01 20:50:37 MST ---
Can you please attach /var/log/boot.msg? Also, how reproducible is the
problem? I don't think there's any difference which can trigger or prevent
the problem between the kernels you used. Can you please try to determine
how deterministic the problem is?

https://bugzilla.novell.com/show_bug.cgi?id=461183

User kairo@kairo.at added comment
https://bugzilla.novell.com/show_bug.cgi?id=461183#c4

Robert Kaiser <kairo@kairo.at> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEEDINFO                    |NEW
      Info Provider|kairo@kairo.at              |

--- Comment #4 from Robert Kaiser <kairo@kairo.at> 2009-01-26 05:34:39 MST ---
Created an attachment (id=267585)
 --> (https://bugzilla.novell.com/attachment.cgi?id=267585)
boot.msg

I just upgraded Factory again, this time to 2.6.27.13-1-pae, and the bug is
still there, see this attached boot.msg file.

The ata9 device is indeed a PATA one, i.e. one on the jmicron chipset (this
is an ASUS P5B Deluxe mainboard)

https://bugzilla.novell.com/show_bug.cgi?id=461183

User kairo@kairo.at added comment
https://bugzilla.novell.com/show_bug.cgi?id=461183#c5

--- Comment #5 from Robert Kaiser <kairo@kairo.at> 2009-01-26 06:50:09 MST ---
The 2.6.27.13-1-pae was from yesterday's Kernel:HEAD. I also tried
2.6.27.13-2 from today's Kernel:HEAD, 2.6.27.10-2.5 from the current
Factory repo, and even 2.6.27.7-9.1 from the 11.1 repo, all with the same
result of seeing this bug.
(As expected, it also doesn't matter if the proprietary NVidia module is
present or not, but one never knows with non-OSS modules, so I tried that
as well.)

The hardware should be OK, or else going back to .27.4 in comment #2
wouldn't have fixed the problem.

As this appears even with the 11.1 kernel, and the ASUS P5B series (all
with the same SATA/PATA chips AFAIK) isn't that uncommon, I wonder why this
isn't reported more often. People seeing it might not notice it unless they
actually need access to their PATA drives, and they wouldn't connect a
constant churn on the main hard disk to this failure unless they know to
look into /var/log/messages.

Given the mailing list message I linked in comment #0, which has the same
failure on an early 2.6.28, I suspect someone might have brought this
infection to the openSUSE .27 kernel by backporting some new stuff to make
the 11.1 release. Unfortunately, I can't verify that, as the Kernel:Vanilla
OBS repo doesn't have any near-to-recent kernel, only a .27-rc3 version.

The constant writing of this error to /var/log/messages a few times a
second makes the whole system slow to a crawl. I could only stop this by
turning off syslog for now, as unfortunately I no longer have the .27.4
kernel around, which worked without problems until I upgraded Factory again
yesterday.

https://bugzilla.novell.com/show_bug.cgi?id=461183

User kairo@kairo.at added comment
https://bugzilla.novell.com/show_bug.cgi?id=461183#c6

--- Comment #6 from Robert Kaiser <kairo@kairo.at> 2009-02-02 10:47:40 MST ---
When I disconnected the DVD/CDRW combo drive from the PATA cable, the
problem stopped.

I'm talking about this drive (lines from the attached boot.msg):

ata9.00: ATAPI: TOSHIBA DVD-ROM SD-R1002, 1030, max MWDMA2
ata9.00: configured for MWDMA2
scsi 8:0:0:0: CD-ROM TOSHIBA DVD-ROM SD-R1002 1030 PQ: 0 ANSI: 5

Were there any changes in the PATA CD/DVD driver code between 2.6.27.4 and
2.6.27.7? If the bug is there, we might have a problem with such drives in
general - and remember, this also happens with the openSUSE 11.1 kernel.
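
The boot.msg lines quoted in comment #6 are what ties a port name such as ata9.00 to a physical drive. A rough sketch of pulling that mapping out of a boot log follows. The function name `map_ata_devices` is hypothetical, and the regular expression is an assumption based on the single identification line quoted in this thread (the `ATA-<n>` alternative for hard disks is likewise assumed, not taken from this report).

```python
import re

# Assumed shape of a libata identification line, based on the one example
# in this thread:
#   ata9.00: ATAPI: TOSHIBA DVD-ROM SD-R1002, 1030, max MWDMA2
IDENT_RE = re.compile(
    r"(ata\d+\.\d+): (ATAPI|ATA-\d+): (.+?), (\S+), max (\S+)"
)


def map_ata_devices(lines):
    """Hypothetical helper: map 'ataN.DD' to (class, model, firmware, max mode)."""
    devices = {}
    for line in lines:
        match = IDENT_RE.search(line)
        if match:
            name, dev_class, model, firmware, max_mode = match.groups()
            devices[name] = (dev_class, model.strip(), firmware, max_mode)
    return devices


# Lines copied from comment #6.
BOOT_LINES = [
    "ata9.00: ATAPI: TOSHIBA DVD-ROM SD-R1002, 1030, max MWDMA2",
    "ata9.00: configured for MWDMA2",
    "scsi 8:0:0:0: CD-ROM TOSHIBA DVD-ROM SD-R1002 1030 PQ: 0 ANSI: 5",
]

if __name__ == "__main__":
    for name, info in map_ata_devices(BOOT_LINES).items():
        print(name, info)
```

Only the first of the three sample lines matches; the "configured for" and "scsi" lines are ignored by this sketch.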

https://bugzilla.novell.com/show_bug.cgi?id=461183

User teheo@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=461183#c7

Tejun Heo <teheo@novell.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED

--- Comment #7 from Tejun Heo <teheo@novell.com> 2009-02-03 20:48:04 MST ---
Thanks for testing. That part of code (SFF HSM) has been quite stable
lately and I don't have any related bug report on openSUSE or upstream
either, so I'm a bit lost. I'll look at the diff between 2.6.27.4 and 7 and
see whether I can find something.

Thanks.

https://bugzilla.novell.com/show_bug.cgi?id=461183

User teheo@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=461183#c8

--- Comment #8 from Tejun Heo <teheo@novell.com> 2009-02-03 21:07:40 MST ---
Created an attachment (id=269847)
 --> (https://bugzilla.novell.com/attachment.cgi?id=269847)
libata-diff-2.6.27.4-2.6.27.7

Okay, here's the diff between vanilla 2.6.27.4 and 2.6.27.7. There isn't
any difference which can affect what you're experiencing. Also, I don't see
anything which could have affected it in the openSUSE patches, and I didn't
make many changes between those two releases. Is it possible for you to try
vanilla 2.6.27.4 and 2.6.27.7 and see whether the problem is reproducible
there?

https://bugzilla.novell.com/show_bug.cgi?id=461183

User teheo@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=461183#c9

Tejun Heo <teheo@novell.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |NEEDINFO
      Info Provider|                            |kairo@kairo.at

--- Comment #9 from Tejun Heo <teheo@novell.com> 2009-02-03 21:09:12 MST ---
Also, can you try some patches on top of the problematic kernel? If you're
not familiar with kernel building, I can build rpms for you but it usually
goes faster for both sides if you can test patches.

Thanks.

https://bugzilla.novell.com/show_bug.cgi?id=461183

User teheo@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=461183#c10

--- Comment #10 from Tejun Heo <teheo@novell.com> 2009-02-03 21:16:34 MST ---
Also, can you please post boot.msg from the working configuration?

https://bugzilla.novell.com/show_bug.cgi?id=461183

User kairo@kairo.at added comment
https://bugzilla.novell.com/show_bug.cgi?id=461183#c11

--- Comment #11 from Robert Kaiser <kairo@kairo.at> 2009-02-04 06:50:45 MST ---
I don't think I have the time to go into detailed kernel config and
recompile stuff; I'm quite busy leading an OSS project (SeaMonkey) myself.
I'd rather have rpms I can test, as this is my production machine (yes, I'm
running Factory in a production environment) and I don't want it to be
"down" (i.e. rebooting etc.) for too long.

If there are rpms of vanilla 2.6.27.x I can install, it might be a good
idea to try them, as I still suspect it's something openSUSE backported
from the .28 cycle. The thread I referred to in comment #0 (which you
apparently have looked at as well) looks quite similar in its error
messages, and it also involves a CD-RW/DVD-ROM combo drive, just like in my
case. The only difference is that the drive is on SATA there and on PATA
here, but with both handled by libata nowadays, I guess that could still
lead to the same code.

https://bugzilla.novell.com/show_bug.cgi?id=461183

User teheo@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=461183#c12

--- Comment #12 from Tejun Heo <teheo@novell.com> 2009-02-11 23:51:37 MST ---
Sorry about the delay. I don't think the report linked from comment#0 is
the same problem. Can you please try the following kernel and report the
kernel log?

http://htj.dyndns.org/export/testing/sl111-x86_64-bug461183_dbg0/

Thanks.

https://bugzilla.novell.com/show_bug.cgi?id=461183

User kairo@kairo.at added comment
https://bugzilla.novell.com/show_bug.cgi?id=461183#c13

--- Comment #13 from Robert Kaiser <kairo@kairo.at> 2009-02-16 11:31:00 MST ---
Hrm, now that I wanted to download the rpms I realized they are x86_64 and
I'm still running my system with 32bit stuff...
Can I use the 64bit kernel on an otherwise 32bit system (as in software,
the hardware is 64bit-capable)?

https://bugzilla.novell.com/show_bug.cgi?id=461183

User teheo@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=461183#c14

--- Comment #14 from Tejun Heo <teheo@novell.com> 2009-02-18 23:22:46 MST ---
Eh.. I'll build an i386 kernel for you. Please wait a bit.

https://bugzilla.novell.com/show_bug.cgi?id=461183

User teheo@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=461183#c15

--- Comment #15 from Tejun Heo <teheo@novell.com> 2009-02-19 20:22:12 MST ---
Sorry about the delay. My remote building script was having a lot of
problems with internal kernel source repo changes. Here is the i586 kernel.

http://htj.dyndns.org/export/testing/sl111-i586-bug461183_dbg0/

https://bugzilla.novell.com/show_bug.cgi?id=461183

User kairo@kairo.at added comment
https://bugzilla.novell.com/show_bug.cgi?id=461183#c16

Robert Kaiser <kairo@kairo.at> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEEDINFO                    |ASSIGNED
      Info Provider|kairo@kairo.at              |

--- Comment #16 from Robert Kaiser <kairo@kairo.at> 2009-03-01 12:27:38 MST ---
Created an attachment (id=276234)
 --> (https://bugzilla.novell.com/attachment.cgi?id=276234)
boot.msg from sl111-i586-bug461183_dbg0

Here is the boot.msg from the sl111-i586-bug461183_dbg0 kernel, I hope it
helps.

https://bugzilla.novell.com/show_bug.cgi?id=461183

User teheo@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=461183#c17

Tejun Heo <teheo@novell.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |NEEDINFO
      Info Provider|                            |kairo@kairo.at

--- Comment #17 from Tejun Heo <teheo@novell.com> 2009-03-02 01:21:45 MST ---
Hmm... looks like your device has a stuck ERR bit. I'm not sure why the
problem didn't trigger before. The difference could be caused by preferring
hardreset over softreset, but behaviors like this can also depend on
strange things like whether media is present in the drive or what type of
media it is. Can you please post the output of "hdparm -I /dev/sr0" so that
the drive can be quirked?
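
The "stuck ERR bit" can be read off the `res 51/20:...` line in the original report, assuming the first two fields of that line are the ATA status and error registers. The sketch below decodes the status byte into bit names. The helper names are hypothetical, and the names used for bits 4, 2 and 1 follow older ATA revisions and are an assumption of this sketch; the kernel's own summary in the log lists only DRDY and ERR.

```python
import re

# ATA status register bits. BSY, DRDY, DF, DRQ and ERR are the commonly
# used names; DSC, CORR and IDX are legacy names assumed for this sketch.
ATA_STATUS_BITS = [
    (0x80, "BSY"),
    (0x40, "DRDY"),
    (0x20, "DF"),
    (0x10, "DSC"),
    (0x08, "DRQ"),
    (0x04, "CORR"),
    (0x02, "IDX"),
    (0x01, "ERR"),
]

# Assumes the "res" line starts with status/error as two hex bytes.
RES_RE = re.compile(r"\bres ([0-9a-f]{2})/([0-9a-f]{2}):")


def parse_res(line):
    """Hypothetical helper: return (status, error) from a libata 'res' line."""
    match = RES_RE.search(line)
    if match is None:
        raise ValueError("no 'res' field found")
    return int(match.group(1), 16), int(match.group(2), 16)


def decode_status(status):
    """Return the names of the bits set in an ATA status byte."""
    return [name for mask, name in ATA_STATUS_BITS if status & mask]


# Line copied from the original report.
RES_LINE = (
    "Dec 20 15:54:02 robert kernel: res 51/20:03:00:00:00/00:00:00:00:00/a0 "
    "Emask 0x3 (HSM violation)"
)

if __name__ == "__main__":
    status, error = parse_res(RES_LINE)
    print(hex(status), decode_status(status), hex(error))
```

With the line from the report, the status byte 0x51 has the ERR bit (0x01) set on every occurrence, which is consistent with the diagnosis in comment #17.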