[Bug 335505] New: System freeze and crash because of pata_pdc202xx_old problems
https://bugzilla.novell.com/show_bug.cgi?id=335505

Summary: System freeze and crash because of pata_pdc202xx_old problems
Product: openSUSE 10.3
Version: Final
Platform: Other
OS/Version: openSUSE 10.3
Status: NEW
Severity: Critical
Priority: P5 - None
Component: Kernel
AssignedTo: kernel-maintainers@forge.provo.novell.com
ReportedBy: th.sch@gmx.net
QAContact: qa@suse.de
Found By: ---

This driver is used on my ASUS KT133A mainboard for the mass storage controller: Promise Technology, Inc. PDC20265 (FastTrak100 Lite/Ultra100) (rev 02).

Several hard disks are connected. Sometimes the disks become very slow or can no longer be accessed. In some cases write operations on the drives are not possible even though mount shows that the partitions are mounted rw. I tried unmounting and ran rmmod pata_pdc202xx_old && modprobe pata_pdc202xx_old, after which some of the connected hard drives worked again. The computer crashes within a few days or hours. This happens only when one of the hard disks connected to the Promise mass storage controller is in use, e.g. for recording with mythtv or for copy operations to that drive.
This error message is shown in the log file:

Oct 15 19:57:54 DePP2 kernel: ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2
Oct 15 19:57:54 DePP2 kernel: ata3.00: BMDMA stat 0x4
Oct 15 19:57:54 DePP2 kernel: ata3.00: cmd c8/00:08:ef:8b:00/00:00:00:00:00/e0 tag 0 cdb 0x0 data 4096 in
Oct 15 19:57:54 DePP2 kernel: res 51/84:00:f6:8b:00/00:00:00:00:00/e0 Emask 0x10 (ATA bus error)
Oct 15 19:57:54 DePP2 kernel: ata3: soft resetting link
Oct 15 19:57:54 DePP2 kernel: ata3.00: configured for UDMA/33
Oct 15 19:57:54 DePP2 kernel: ata3: EH complete
Oct 15 19:57:54 DePP2 kernel: sd 2:0:0:0: [sdc] 241254720 512-byte hardware sectors (123522 MB)
Oct 15 19:57:54 DePP2 kernel: sd 2:0:0:0: [sdc] Write Protect is off
Oct 15 19:57:54 DePP2 kernel: sd 2:0:0:0: [sdc] Mode Sense: 00 3a 00 00
Oct 15 19:57:54 DePP2 kernel: sd 2:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
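For readers unfamiliar with the libata error format, the first two bytes of the "res" line in the log above can be decoded roughly as follows. This is a minimal sketch, assuming those bytes are the ATA Status and Error registers and using the legacy ATA bit names; the helper names decode_bits and decode_res are made up for illustration and are not part of any kernel tool.

```python
import re

# Legacy ATA register bit names. Bit 7 of the Error register is read as
# ICRC (interface CRC error) for UDMA transfers; older specs call it BBK.
STATUS_BITS = {0x80: "BSY", 0x40: "DRDY", 0x20: "DF", 0x10: "DSC",
               0x08: "DRQ", 0x04: "CORR", 0x02: "IDX", 0x01: "ERR"}
ERROR_BITS = {0x80: "ICRC", 0x40: "UNC", 0x20: "MC", 0x10: "IDNF",
              0x08: "MCR", 0x04: "ABRT", 0x02: "NM", 0x01: "AMNF"}

# Assumption: the "res" line starts with <status>/<error> as two hex bytes.
RES_RE = re.compile(r"res ([0-9a-f]{2})/([0-9a-f]{2}):")


def decode_bits(value, names):
    """Return the names of all bits set in value, highest bit first."""
    return [name for mask, name in sorted(names.items(), reverse=True)
            if value & mask]


def decode_res(line):
    """Decode status and error bytes of a libata 'res' line, or None."""
    match = RES_RE.search(line)
    if match is None:
        return None
    status = int(match.group(1), 16)
    error = int(match.group(2), 16)
    return {"status": decode_bits(status, STATUS_BITS),
            "error": decode_bits(error, ERROR_BITS)}


# The "res" line taken from the report above.
line = "res 51/84:00:f6:8b:00/00:00:00:00:00/e0 Emask 0x10 (ATA bus error)"
print(decode_res(line))
# {'status': ['DRDY', 'DSC', 'ERR'], 'error': ['ICRC', 'ABRT']}
```

Under these assumptions, ICRC together with ABRT on a UDMA read points at a transfer error on the interface rather than an unreadable sector, which matches the "ATA bus error" text the kernel prints. It does not by itself say whether the cable, the drive, or the controller is at fault.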
https://bugzilla.novell.com/show_bug.cgi?id=335505
Jeff Mahoney
https://bugzilla.novell.com/show_bug.cgi?id=335505#c1
Tejun Heo
https://bugzilla.novell.com/show_bug.cgi?id=335505#c2 Thomas Scholz
https://bugzilla.novell.com/show_bug.cgi?id=335505#c3 --- Comment #3 from Thomas Scholz
https://bugzilla.novell.com/show_bug.cgi?id=335505#c4
Tejun Heo
https://bugzilla.novell.com/show_bug.cgi?id=335505#c5
--- Comment #5 from Tejun Heo
https://bugzilla.novell.com/show_bug.cgi?id=335505#c6
Robert Davies
https://bugzilla.novell.com/show_bug.cgi?id=335505#c7
--- Comment #7 from Robert Davies
https://bugzilla.novell.com/show_bug.cgi?id=335505#c8 --- Comment #8 from Thomas Scholz
Also, are the errors localized to one drive?
Ok. I could reproduce the freeze (within a few seconds) by running yes > test.out on my backup hard disk. So I removed it and have had no further system crash so far. I did some log file analysis over the past days and discovered that the last 3-4 system crashes were caused by the same (now removed) hard drive. Maybe there are defective blocks; I can't check the device with the IBM/Hitachi Drive Fitness Test because it is not able to start the tests on it. For other devices it works without problems and no errors were found, including one of the disks that was connected to the Promise controller before I switched it to the onboard VIA controller. I used that disk for recording TV shows with mythtv. After a few hours of recording, no write operations can be done, but the file system is still mounted rw (and not full ;) ).

I'm not sure whether this is a hardware defect or a kernel issue. I assume that this kind of error will end in a crash if the device is connected to the Promise controller. Because it is not the job of a driver or the kernel to work around potentially broken hardware, I'm not sure how we should proceed. Maybe we can close this bug; on the other hand, I have had these problems since I installed suse 10.2, and with 10.3 it is much better. ATM the system is working fine without problems. At the weekend I will do some tests again with the Promise controller.
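The kind of log file analysis described here, checking whether the errors are localized to one drive, could be sketched as below. It is only an illustration: it assumes syslog lines in the format shown in the original report, and the function name count_exceptions is hypothetical. The sample text is copied from that report.

```python
import re
from collections import Counter

# Matches the libata "exception" line and captures the device, e.g. ata3.00.
EXC_RE = re.compile(r"kernel: (ata\d+\.\d+): exception Emask")


def count_exceptions(log_text):
    """Count libata exception reports per ATA device in a syslog excerpt."""
    return Counter(EXC_RE.findall(log_text))


# Lines copied from the original report in this bug.
sample = """\
Oct 15 19:57:54 DePP2 kernel: ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2
Oct 15 19:57:54 DePP2 kernel: ata3.00: BMDMA stat 0x4
Oct 15 19:57:54 DePP2 kernel: ata3: soft resetting link
"""

for device, hits in count_exceptions(sample).most_common():
    print(device, hits)
# ata3.00 1
```

Run over a full log file, a count that is concentrated on a single ataN.MM entry would support the observation that one drive is responsible. The location of the log (for example /var/log/messages) depends on the system and is not stated in this report.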
https://bugzilla.novell.com/show_bug.cgi?id=335505#c9
--- Comment #9 from Tejun Heo
https://bugzilla.novell.com/show_bug.cgi?id=335505#c10
--- Comment #10 from Robert Davies
Promise controllers are known to have this type of problem under certain configurations, and the cause is not yet known. Can you please reproduce the problem on the kernel.org 2.6.23 kernel? Just build the drivers and filesystems you need for the root fs into the kernel and don't bother with initrd.
After reproducing the problem on the vanilla 2.6.23 kernel, we can bring it upstream to linux-ide@vger.kernel.org for further debugging.
Also, please post /var/log/boot.msg and full dmesg here (with the current kernel).
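When building a kernel without initrd as suggested, everything needed to mount the root fs has to be set to =y rather than =m. A small check of a .config along those lines might look like the sketch below. The option list is an assumption and not taken from this thread: CONFIG_PATA_PDC_OLD is believed to be the symbol for pata_pdc202xx_old, and CONFIG_EXT3_FS presumes an ext3 root, so both should be checked against the actual kernel tree and setup. The function names and the example fragment are made up for illustration.

```python
import re


def option_states(config_text):
    """Map CONFIG_* names to their value; 'n' for '# ... is not set'."""
    states = {}
    for raw in config_text.splitlines():
        line = raw.strip()
        match = re.match(r"(CONFIG_\w+)=(.*)$", line)
        if match:
            states[match.group(1)] = match.group(2)
            continue
        match = re.match(r"# (CONFIG_\w+) is not set$", line)
        if match:
            states[match.group(1)] = "n"
    return states


def not_built_in(config_text, required):
    """Return the required options that are not set to 'y'."""
    states = option_states(config_text)
    return [opt for opt in required if states.get(opt, "n") != "y"]


# Hypothetical requirement list; adjust to the real root device and fs.
REQUIRED = ["CONFIG_ATA", "CONFIG_PATA_PDC_OLD",
            "CONFIG_BLK_DEV_SD", "CONFIG_EXT3_FS"]

# Made-up .config fragment, for illustration only.
example = """\
CONFIG_ATA=y
CONFIG_PATA_PDC_OLD=m
# CONFIG_EXT3_FS is not set
CONFIG_BLK_DEV_SD=y
"""

print(not_built_in(example, REQUIRED))
# ['CONFIG_PATA_PDC_OLD', 'CONFIG_EXT3_FS']
```

Any option reported by not_built_in would have to be changed to =y before the kernel can boot from that disk without an initrd.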
Tejun, I've kept a dual Athlon MP box with a Promise controller on standby since 1st Nov, to aid with any testing effort should it help. I very much want these IDE ATA controllers to be supported well by the 2.6 kernel. As the machine has been idle since the 1st, apart from testing Live CDs, I can do whatever it takes on that box, and will make time to do it. So does this help? Should I update to the OS 10.3 updates, run bonnie, dbench etc., and then report back? I haven't opened a new bug report, because it looked like my problem was a duplicate of Thomas's.
https://bugzilla.novell.com/show_bug.cgi?id=335505#c11
--- Comment #11 from Robert Davies
(In reply to comment #5 from Tejun Heo)
defect blocks, I can't check the device using the ibm/hitachi drive fitness test because it is not able to start the tests on it. For other devices it works without problems and no errors was found, also for one of the disks that was connected to the Promise controller before I switched it to the onboard VIA
I downloaded 2 DFTs, Hitachi & Seagate, to CD-ROM, also 1 to floppy disk, and hooked up a working floppy drive to the box temporarily with the case open. It was the floppy version, which booted FreeDOS or Caldera DR-DOS, that I remember running the tests from on this box. The reason I did not initially use badblocks was that I didn't have a "trusted" Linux kernel, and I wanted to see whether it was a HW issue without SW faults complicating the drive test.

The DFT ran fine on the Promise controller; it picked up only 1 bad block, which I verified on a different box. To verify, I booted the Seagate DFT CD-ROM on a different box with the cases open and the drives hooked up to the other box's IDE controllers. Now I have 2x60GB Seagates, both passed by the Seagate DFT in the MSI K7 with the AMD 768 southbridge.

The only reason I've not fired up the box and conducted tests under OS 10.3 to try to reproduce the problem is that I'm not sure whether it works for Tejun to have me joining this bug report, or whether I should be in my own (I thought I had a likely reproduction of your issue, so I presumed it was a duplicate).
https://bugzilla.novell.com/show_bug.cgi?id=335505#c12
--- Comment #12 from Tejun Heo
https://bugzilla.novell.com/show_bug.cgi?id=335505#c13
--- Comment #13 from Robert Davies
https://bugzilla.novell.com/show_bug.cgi?id=335505#c14
--- Comment #14 from Robert Davies
https://bugzilla.novell.com/show_bug.cgi?id=335505#c15
--- Comment #15 from Robert Davies
https://bugzilla.novell.com/show_bug.cgi?id=335505#c16
--- Comment #16 from Robert Davies
https://bugzilla.novell.com/show_bug.cgi?id=335505#c17
--- Comment #17 from Robert Davies
https://bugzilla.novell.com/show_bug.cgi?id=335505#c18 Thomas Scholz
So, other disks connected to the Promise controller don't cause such problems, right? Please do the following.
1. Post the result of "smartctl -a /dev/sdX" where sdX is the offending device.
2. Swap the power and SATA connector of the offending drive and another drive on the promise controller and see whether the problem follows the drive or stays with the port.
Thanks.
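The reasoning behind the swap test in step 2 can be written down as a small decision function. This is only a sketch of the logic: the function name, the tuple layout, and the port and drive labels are hypothetical, and real results would still need to be read together with the SMART output from step 1.

```python
def classify_swap(before, after):
    """Interpret a swap test.

    before/after are (port, drive) tuples naming where errors were seen
    before and after the swap, or None if no errors showed up in that run.
    """
    if before is None or after is None:
        # Without errors in both runs there is nothing to compare.
        return "inconclusive"
    before_port, before_drive = before
    after_port, after_drive = after
    same_drive = before_drive == after_drive
    same_port = before_port == after_port
    if same_drive and not same_port:
        return "drive"   # the problem followed the drive to the new port
    if same_port and not same_drive:
        return "port"    # the problem stayed with the port, cable or power lead
    return "inconclusive"


# Hypothetical labels: errors on ata3 with the backup disk before the swap,
# and on ata4 with the same disk afterwards.
print(classify_swap(("ata3", "backup disk"), ("ata4", "backup disk")))
# drive
```

Because the connectors are swapped rather than the cables moved along with the drive, a "port" result covers the controller port, the data cable, and the power lead together.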
https://bugzilla.novell.com/show_bug.cgi?id=335505#c19
--- Comment #19 from Robert Davies
https://bugzilla.novell.com/show_bug.cgi?id=335505#c20
--- Comment #20 from Robert Davies
https://bugzilla.novell.com/show_bug.cgi?id=335505#c21
Tejun Heo
https://bugzilla.novell.com/show_bug.cgi?id=335505
User teheo@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=335505#c22
Tejun Heo