[Bug 462368] New: ata lockups on asus m2v mobo (via vt8237a)
https://bugzilla.novell.com/show_bug.cgi?id=462368

Summary: ata lockups on asus m2v mobo (via vt8237a)
Product: openSUSE 11.1
Version: Final
Platform: i686
OS/Version: openSUSE 11.1
Status: NEW
Severity: Major
Priority: P5 - None
Component: Kernel
AssignedTo: bnc-team-screening@forge.provo.novell.com
ReportedBy: desdiv@gmail.com
QAContact: qa@suse.de
CC: desdiv@gmail.com
Found By: ---

I have an Asus M2V motherboard with a Via VT8237A southbridge chipset and the following disk devices connected to it: 1 pata hdd (/dev/sda), 1 pata dvd-rw (/dev/sr0), 1 sata hdd (/dev/sdb).

The problem is that whenever I'm reading or writing data on the sata hdd, sooner or later cpu usage goes up to 100%, the sata disk becomes unavailable and the entries below are logged in /var/log/messages:

Dec 24 11:56:26 macigep kernel: ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Dec 24 11:56:26 macigep kernel: ata3.00: cmd 25/00:08:7d:09:be/00:00:38:00:00/e0 tag 0 dma 4096 in
Dec 24 11:56:26 macigep kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Dec 24 11:56:26 macigep kernel: ata3.00: status: { DRDY }
Dec 24 11:56:26 macigep kernel: ata3: soft resetting link
Dec 24 11:56:31 macigep kernel: ata3.00: qc timeout (cmd 0x27)
Dec 24 11:56:31 macigep kernel: ata3.00: failed to read native max address (err_mask=0x4)
Dec 24 11:56:31 macigep kernel: ata3.00: revalidation failed (errno=-5)
Dec 24 11:56:31 macigep kernel: ata3: soft resetting link
Dec 24 11:56:42 macigep kernel: ata3.00: qc timeout (cmd 0x27)
Dec 24 11:56:42 macigep kernel: ata3.00: failed to read native max address (err_mask=0x4)
Dec 24 11:56:42 macigep kernel: ata3.00: revalidation failed (errno=-5)
Dec 24 11:56:42 macigep kernel: ata3: soft resetting link
Dec 24 11:56:52 macigep kernel: ata3.00: qc timeout (cmd 0x27)
Dec 24 11:56:52 macigep kernel: ata3.00: failed to read native max address (err_mask=0x4)
Dec 24 11:56:52 macigep kernel: ata3.00: revalidation failed (errno=-5)
Dec 24 11:56:52 macigep kernel: ata3.00: disabled
Dec 24 11:56:52 macigep kernel: ata3: soft resetting link
Dec 24 11:56:52 macigep kernel: ata3: EH complete
Dec 24 11:56:52 macigep kernel: sd 2:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Dec 24 11:56:52 macigep kernel: end_request: I/O error, dev sdb, sector 951978365
Dec 24 11:56:52 macigep kernel: sd 2:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Dec 24 11:56:52 macigep kernel: end_request: I/O error, dev sdb, sector 951978365
Dec 24 11:56:52 macigep kernel: sd 2:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Dec 24 11:56:52 macigep kernel: end_request: I/O error, dev sdb, sector 951978357
Dec 24 11:56:52 macigep kernel: Buffer I/O error on device sdb6, logical block 102357963
Dec 24 11:56:52 macigep kernel: lost page write due to I/O error on sdb6
Dec 24 11:56:52 macigep kernel: sd 2:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Dec 24 11:56:52 macigep kernel: end_request: I/O error, dev sdb, sector 949141797

This error is reproducible: it always happens when I'm accessing the sata disk, only the time it takes differs, i.e. sometimes it happens after 15 minutes of using the disk, other times after 1 minute. I've already lost some data due to this bug. Sometimes it also happens with the dvd drive.

BIOS is the latest version, hardware profile is http://www.smolts.org/show?uuid=pub_1b7a86ad-460c-424a-92b6-69b4d0abfeb7

The only working workaround is booting with the "noapic" kernel parameter; then the system is stable and I can use the sata disk.

On different forums I read that other users also have this issue with the M2V motherboard.

Ps: when I tried opensuse 11.1 beta2, it addressed my disks in the "old way", i.e. pata disks were /dev/hda and /dev/hdb, only the sata disk was /dev/sda. I did not experience this bug in that version, although I didn't play much with beta2.
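
For a longer /var/log/messages, the pattern in the excerpt above (repeated "qc timeout" on one ATA device followed by "end_request: I/O error" on the block device) can be tallied rather than read by eye. The following is only a rough sketch: the function name ata_log_summary and the output wording are made up for illustration, and it assumes one syslog entry per line in the format shown in the report.

```shell
#!/bin/sh
# Sketch: count libata "qc timeout" messages per ATA device and
# "end_request: I/O error" messages per block device in a syslog file.
# ata_log_summary is a hypothetical helper, not part of any package.
ata_log_summary() {
    awk '
        /qc timeout/ {
            # Find the "ataN.NN:" field that names the device.
            for (i = 1; i <= NF; i++)
                if ($i ~ /^ata[0-9]+(\.[0-9]+)?:$/) {
                    dev = $i
                    sub(/:$/, "", dev)
                    timeouts[dev]++
                    break
                }
        }
        /end_request: I\/O error, dev/ {
            # The block device name follows the literal word "dev".
            for (i = 1; i < NF; i++)
                if ($i == "dev") {
                    blk = $(i + 1)
                    sub(/,$/, "", blk)
                    ioerrs[blk]++
                    break
                }
        }
        END {
            for (dev in timeouts) print dev, "qc-timeouts", timeouts[dev]
            for (blk in ioerrs)   print blk, "io-errors", ioerrs[blk]
        }
    ' "$1"
}
```

Usage would be something like `ata_log_summary /var/log/messages` (reading that file usually needs root). When several devices are affected, the order of the output lines is whatever awk's array iteration produces.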
-- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=462368
User aorlovskyy@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=462368#c1
Alexander Orlovskyy
https://bugzilla.novell.com/show_bug.cgi?id=462368
User desdiv@gmail.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=462368#c2
Simi Co
https://bugzilla.novell.com/show_bug.cgi?id=462368
Robert Vojcik
https://bugzilla.novell.com/show_bug.cgi?id=462368
Greg Kroah-Hartman
https://bugzilla.novell.com/show_bug.cgi?id=462368
User teheo@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=462368#c3
Tejun Heo
https://bugzilla.novell.com/show_bug.cgi?id=462368
User teheo@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=462368#c4
--- Comment #4 from Tejun Heo
https://bugzilla.novell.com/show_bug.cgi?id=462368
User trenn@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=462368#c5
Thomas Renninger
Any ideas on how to proceed on this one?

Let's collect some data first, maybe we see something.

Simi: Please attach (best in plain/text, not zipped) the output of:
  cat /proc/interrupts
  acpidump
  lspci -vv -nn

It would also be great to have a full dmesg output after the disk goes wild. Is the machine still responsive then? Maybe you can try to access the disk to trigger this, then copy dmesg to memory:
  dmesg > /dev/shm/dmesg
and copy it via scp over the network to another machine, or directly onto a USB stick.

Hmm, doing the same after the disk went wild with:
  cat /proc/interrupts >/dev/shm/interrupts_after_disk_crash
could also show something interesting.

Hope that works out.
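
The data collection requested above could be wrapped in a small script so that the same set of files is saved once before and once after the disk drops out. This is a sketch under assumptions: the function names collect_diag and irq_changes and the file naming are invented here, acpidump and lspci may be missing or need root, and /dev/shm is used only because the comment above suggests it.

```shell
#!/bin/sh
# Sketch: save the requested diagnostics under a label ("before", "after").
# collect_diag and irq_changes are hypothetical helper names.
collect_diag() {
    outdir=$1
    label=$2
    mkdir -p "$outdir" || return 1
    # A command may be missing or need root; failures are ignored so the
    # remaining files are still written (error text lands in the file).
    cat /proc/interrupts > "$outdir/interrupts_$label" 2>/dev/null || true
    dmesg                > "$outdir/dmesg_$label"      2>&1 || true
    lspci -vv -nn        > "$outdir/lspci_$label"      2>&1 || true
    acpidump             > "$outdir/acpidump_$label"   2>&1 || true
}

# Print "IRQ: delta" for every interrupt line whose total count (summed
# over all CPUs) differs between two /proc/interrupts snapshots.
irq_changes() {
    awk '
        # Sum the per-CPU counters that follow the "IRQ:" label.
        function total(   i, sum) {
            sum = 0
            for (i = 2; i <= NF && $i ~ /^[0-9]+$/; i++) sum += $i
            return sum
        }
        $1 !~ /:$/ { next }                      # skip the CPU header line
        NR == FNR  { before[$1] = total(); next }
        {
            delta = total() - before[$1]
            if (delta != 0) print $1, delta
        }
    ' "$1" "$2"
}
```

A possible sequence: `collect_diag /dev/shm/diag before`, trigger the disk access, then `collect_diag /dev/shm/diag after` and `irq_changes /dev/shm/diag/interrupts_before /dev/shm/diag/interrupts_after`. An interrupt line that stops counting, or one that counts far faster than the rest, would be the thing to look at.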
https://bugzilla.novell.com/show_bug.cgi?id=462368
User desdiv@gmail.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=462368#c6
--- Comment #6 from Simi Co
https://bugzilla.novell.com/show_bug.cgi?id=462368
User desdiv@gmail.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=462368#c7
--- Comment #7 from Simi Co
https://bugzilla.novell.com/show_bug.cgi?id=462368
User desdiv@gmail.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=462368#c8
--- Comment #8 from Simi Co
https://bugzilla.novell.com/show_bug.cgi?id=462368
User desdiv@gmail.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=462368#c9
--- Comment #9 from Simi Co
https://bugzilla.novell.com/show_bug.cgi?id=462368
User desdiv@gmail.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=462368#c10
Simi Co
https://bugzilla.novell.com/show_bug.cgi?id=462368
User desdiv@gmail.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=462368#c11
--- Comment #11 from Simi Co
https://bugzilla.novell.com/show_bug.cgi?id=462368
User trenn@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=462368#c12
--- Comment #12 from Thomas Renninger
https://bugzilla.novell.com/show_bug.cgi?id=462368
User teheo@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=462368#c13
--- Comment #13 from Tejun Heo
> This looks suspicious:
> sata_via 0000:00:0f.0: routed to hard irq line 10
> Do you know that Tejun?
From ATA side, that's just the driver enabling the PCI device. Have no idea why it's hard...
Thanks.
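
To see what else sits on the interrupt line named in the sata_via message, the matching row of /proc/interrupts can be pulled out. A minimal sketch, assuming the usual "IRQ: per-CPU counts, controller, drivers" row layout; irq_users is a made-up helper name.

```shell
#!/bin/sh
# Sketch: print the controller type and the drivers registered on one IRQ.
# irq_users is a hypothetical helper; the second argument defaults to the
# live /proc/interrupts but can be a saved snapshot.
irq_users() {
    awk -v irq="$1:" '
        $1 == irq {
            # Skip the per-CPU counters that follow the IRQ label.
            for (i = 2; i <= NF && $i ~ /^[0-9]+$/; i++) ;
            out = ""
            for (; i <= NF; i++) out = out (out == "" ? "" : " ") $i
            print out
        }
    ' "${2:-/proc/interrupts}"
}
```

For the line in question this would be `irq_users 10`. If the SATA controller shares the line with other drivers, those are candidates for the isolation tests discussed in this report.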
https://bugzilla.novell.com/show_bug.cgi?id=462368
User desdiv@gmail.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=462368#c14
--- Comment #14 from Simi Co
> Three hpet timers? (nohpet boot param would disable them -> could be worth a try):

Tried nohpet -> didn't help, the error still occurred.

> Some more "digging in the dark" ideas
> blacklist uhci and ehci:
> /etc/modprobe.conf.local:
> blacklist ehci_hcd
> blacklist uhci_hcd
> then invoke mkinitrd and try a reboot.

Erm, I don't think disabling usb support is a good idea. How am I supposed to use the system without a working (usb) keyboard and mouse?

> Have you already googled for your VIA chipset, maybe you find some additional info somewhere?

All I could find were similar bug reports, no solution.
https://bugzilla.novell.com/show_bug.cgi?id=462368
User trenn@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=462368#c15
--- Comment #15 from Thomas Renninger
> How am I supposed to use the system without working (usb) keyboard and mouse?

It's just to find out whether it's the usb device which causes the trouble. As there are no obvious warnings/errors pointing in a specific direction, all we can do for now is try to pin it down to a specific device causing the trouble. As soon as we know more we can look at details.
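
For reference, the temporary blacklist test suggested earlier in this report could be scripted roughly as follows. This is only a sketch: blacklist_modules is a made-up helper, and the file path and module names are the ones quoted in the earlier comment. The entries are meant to be removed again once the test is done.

```shell
#!/bin/sh
# Sketch: append "blacklist <module>" lines to a modprobe config file,
# skipping modules that are already listed there.
# blacklist_modules is a hypothetical helper name.
blacklist_modules() {
    conf=$1
    shift
    for mod in "$@"; do
        grep -qxF "blacklist $mod" "$conf" 2>/dev/null ||
            printf 'blacklist %s\n' "$mod" >> "$conf"
    done
}

# Intended use on the affected machine, as root, followed by a rebuild of
# the initrd and a reboot:
#   blacklist_modules /etc/modprobe.conf.local ehci_hcd uhci_hcd
#   mkinitrd
```

With USB input unavailable during the test, the machine would have to be driven over ssh or with a PS/2 keyboard.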