[Bug 757426] New: ATA errors corrupt other I/O
https://bugzilla.novell.com/show_bug.cgi?id=757426
https://bugzilla.novell.com/show_bug.cgi?id=757426#c0

Summary: ATA errors corrupt other I/O
Classification: openSUSE
Product: openSUSE 12.2
Version: Factory
Platform: x86
OS/Version: Other
Status: NEW
Severity: Major
Priority: P5 - None
Component: Basesystem
AssignedTo: bnc-team-screening@forge.provo.novell.com
ReportedBy: mrmazda@earthlink.net
QAContact: qa-bugs@suse.de
CC: jeffm@suse.com, jlee@suse.com
Found By: ---
Blocker: ---

Created an attachment (id=486344) --> (http://bugzilla.novell.com/attachment.cgi?id=486344) y2logs

I have an external SATA 1.5 HD (not eSATA hot-pluggable on any of my systems) that needs to be debugged, but the ATA errors it creates are resulting in filesystem(s) (including at least /) apparently becoming readonly, and bash unable to execute cached or locate uncached commands to run, including dmesg, reboot, shutdown, umount and others. CAD results in 'INIT: cannot execute "/sbin/shutdown"'. Trying to login on another tty becomes delayed or impossible. Recovery requires reset button or power switch, resulting in recent bash history being lost, and subsequent fscks on restart. Also if the restart is made without powering down the external HD, fetching the logs, lsscsi, lspci & other info may not be possible prior to the I/O corruption beginning, which usually is delayed by not mounting any partition on the external HD. This device poses the same problem when used with several different computers, and 11.4, 12.1 & 12.2 kernels. Attachments here are made using 12.2M3 on P4/ICH6. System includes a SiL 3512 eSATA PCI card not used for collecting info for this bug but which can be used as an alternative to ICH6 for more data collection if necessary. http://www.newegg.com/Product/Product.aspx?Item=N82E16817173042 is the problem device, but the one I'm using now is actually the third replacement of the original purchased last September. They all presented this type of trouble.
Up to now I assumed replacements would have worked better than the previous examples. I still believe the product is the root problem, as I have used without problems 5 of its predecessors that outwardly differ in nothing except for the V2 appended to the original model number.

Always during the first boot with the v2 device attached to the SiL 3512 port, the device ID line printed by the SiL adapter BIOS is gibberish. On warm reboots, it shows the actual HD ID info, just like all my older units. This problem may possibly be avoided by POSTing a second time immediately after the first before proceeding past the Grub menu.

Since the external device's partition had been mounted, it was left dirty and needed fsck. Unthinking, I started fsck.ext2, but it did not produce any ata errors in /var/log/messages even after fsck completed.

In immediate preparation to file this bug using host gx280, I/O corruption the first time was so severe and immediate that no ata errors made it into /var/log/messages before / went to readonly. Since then failure to be written into messages has been happening repeatedly, so I may need help figuring out how to capture it while it actually contains ata errors using current 3.3 kernel and device attached to ICH6 port. I've been renaming messages as ....01, ...02, ...03 so as to limit its size to containing more of what actually pertains here.

--
Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=757426
https://bugzilla.novell.com/show_bug.cgi?id=757426#c1
--- Comment #1 from Felix Miata
https://bugzilla.novell.com/show_bug.cgi?id=757426
https://bugzilla.novell.com/show_bug.cgi?id=757426#c2
--- Comment #2 from Felix Miata
https://bugzilla.novell.com/show_bug.cgi?id=757426
https://bugzilla.novell.com/show_bug.cgi?id=757426#c3
--- Comment #3 from Felix Miata
https://bugzilla.novell.com/show_bug.cgi?id=757426
https://bugzilla.novell.com/show_bug.cgi?id=757426#c4
kk zhang
https://bugzilla.novell.com/show_bug.cgi?id=757426
https://bugzilla.novell.com/show_bug.cgi?id=757426#c
Joey Lee
https://bugzilla.novell.com/show_bug.cgi?id=757426
https://bugzilla.novell.com/show_bug.cgi?id=757426#c5
--- Comment #5 from Joey Lee
https://bugzilla.novell.com/show_bug.cgi?id=757426
https://bugzilla.novell.com/show_bug.cgi?id=757426#c6
Joey Lee
https://bugzilla.novell.com/show_bug.cgi?id=757426
https://bugzilla.novell.com/show_bug.cgi?id=757426#c7
--- Comment #7 from Joey Lee
https://bugzilla.novell.com/show_bug.cgi?id=757426
https://bugzilla.novell.com/show_bug.cgi?id=757426#c8
--- Comment #8 from Joey Lee
https://bugzilla.novell.com/show_bug.cgi?id=757426
https://bugzilla.novell.com/show_bug.cgi?id=757426#c9
Felix Miata
https://bugzilla.novell.com/show_bug.cgi?id=757426
https://bugzilla.novell.com/show_bug.cgi?id=757426#c10
Felix Miata
For capture log, you can try to boot with USB flash if you have it.
What is this about? This system also has Cooker, 11.4 and 12.1 installed.
https://bugzilla.novell.com/show_bug.cgi?id=757426
https://bugzilla.novell.com/show_bug.cgi?id=757426#c
Felix Miata
https://bugzilla.novell.com/show_bug.cgi?id=757426
https://bugzilla.novell.com/show_bug.cgi?id=757426#c11
--- Comment #11 from Joey Lee
Created an attachment (id=486403) --> (http://bugzilla.novell.com/attachment.cgi?id=486403) [details] fresh /var/log/boot.msg with problem device on ICH6 SATA port 2/2
Are these the kernels referred to in comment 6 (built more than 48 hours ago)? http://download.opensuse.org/repositories/Kernel:/HEAD/standard/
You can download kernel-default rpms here: https://build.opensuse.org/package/binaries?package=kernel-default&project=Kernel%3AHEAD&repository=standard
The comment 6 URL has too much info for me to find an actual download link.
Should I enable the repo, or just fetch a kernel and install it?
Yes. For example, for 64-bit just download:

kernel-default-3.4.rc2-2.1.x86_64.rpm
kernel-default-base-3.4.rc2-2.1.x86_64.rpm

Then run 'rpm -ivh' to install them.
https://bugzilla.novell.com/show_bug.cgi?id=757426
https://bugzilla.novell.com/show_bug.cgi?id=757426#c12
Joey Lee
(In reply to comment #8)
For capture log, you can try to boot with USB flash if you have it.
What is this about? This system also has Cooker, 11.4 and 12.1 installed.
I know, but those systems are all installed on the SATA device. Per your bug description, it may be hard to capture the log when the issue happens, because the hard drive is frozen. So another way to capture a _complete_ log is to boot Linux from a Live USB. You don't need to install the OS to the hard drive; just boot from USB and then capture the messages log. Of course, you still need to reproduce the SATA issue.
https://bugzilla.novell.com/show_bug.cgi?id=757426
https://bugzilla.novell.com/show_bug.cgi?id=757426#c13
--- Comment #13 from Joey Lee
https://bugzilla.novell.com/show_bug.cgi?id=757426
https://bugzilla.novell.com/show_bug.cgi?id=757426#c14
--- Comment #14 from Felix Miata
Per your bug description, it may be hard to capture the log when the issue happens, because the hard drive is frozen.
OSes are installed to sda. Only data is on sdb, which has an empty MBR code block. But the ATA bus gets frozen, not the actual HD with /. So USB boot helps only because maybe the USB bus is not frozen too? I have many CD-RW & DVD-RW available and would have to go to the store and buy a USB stick for a live image. Maybe the PATA bus for the OM drive does not get frozen? I will try a live M3 KDE CD-RW and see.
https://bugzilla.novell.com/show_bug.cgi?id=757426
https://bugzilla.novell.com/show_bug.cgi?id=757426#c15
--- Comment #15 from Felix Miata
https://bugzilla.novell.com/show_bug.cgi?id=757426
https://bugzilla.novell.com/show_bug.cgi?id=757426#c16
--- Comment #16 from Felix Miata
https://bugzilla.novell.com/show_bug.cgi?id=757426
https://bugzilla.novell.com/show_bug.cgi?id=757426#c17
Felix Miata
libata.force=5.0:pio4 or libata.force=5.0:noncq
Looking at kernel-parameters.txt and lsscsi -v output:

[4:0:0:0] disk ATA WDC WD800JD-75MS 10.0 /dev/sda
  dir: /sys/bus/scsi/devices/4:0:0:0 [/sys/devices/pci0000:00/0000:00:1f.2/ata5/host4/target4:0:0/4:0:0:0]
[4:0:1:0] disk ATA ST32000542AS CC34 /dev/sdb
  dir: /sys/bus/scsi/devices/4:0:1:0 [/sys/devices/pci0000:00/0000:00:1f.2/ata5/host4/target4:0:1/4:0:1:0]

your comment 5 is confusing me. I don't understand why not 4:noncq, 4:pio4, 4.1:noncq or 4.1:pio4 instead of 5.0 for libata.force=.

Putting any libata.force= seems to be avoiding the bus problems, unless what's at play is a corollary of comment 0 paragraph 3. I've been leaving the (e)SATA device power on through many reboots in part because of the sparsity of errors since initializing this bug. Maybe the root problem is a need for the device to warm up more than briefly before it's completely ready to use.
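The sysfs path that lsscsi -v prints next to each device already contains the libata port name (the "ata5" component), which is different from the SCSI host number in [4:0:0:0]. A minimal sketch for pulling that component out of such a path follows; the function name ata_port_from_sysfs is hypothetical, and the .../ataN/hostM/... layout is assumed from the output quoted above.

```shell
#!/bin/sh
# Print the libata port name (e.g. "ata5") found in a sysfs device path.
# Assumes the .../ataN/hostM/... layout seen in the lsscsi -v output above.
# (ata_port_from_sysfs is a hypothetical helper name.)
ata_port_from_sysfs() {
    printf '%s\n' "$1" |
        sed -n 's|.*/\(ata[0-9][0-9]*\)/host[0-9][0-9]*/.*|\1|p'
}

# Path copied from the lsscsi -v output for /dev/sda.
ata_port_from_sysfs '/sys/devices/pci0000:00/0000:00:1f.2/ata5/host4/target4:0:0/4:0:0:0'
# prints: ata5
```

Both disks in the listing resolve to ata5 even though their SCSI addresses start with 4.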
https://bugzilla.novell.com/show_bug.cgi?id=757426
https://bugzilla.novell.com/show_bug.cgi?id=757426#c18
Joey Lee
Created an attachment (id=486452) --> (http://bugzilla.novell.com/attachment.cgi?id=486452) [details] /var/log/messages and /var/log/boot.kiwi from USB boot with 2 (e)sata devices
I connected an older and friendlier Rosewill (e)SATA device to the SiL 3512 port this time, leaving the problem Rosewill on ICH6 SATA 2/2. Errors did not occur from lspci -v, lsscsi -v, or hwinfo --scsi, but afterward on mounting the problem device's partition as noted on tty10. I was able to maintain working ttys and shutdown normally this time.
Thanks for your log. Please also attach the complete /var/log/messages or /var/log/boot.msg from a boot where the issue was reproduced (with WRITE/READ DMA errors). I want to compare the ATA initialization log from system boot. Thanks
https://bugzilla.novell.com/show_bug.cgi?id=757426
https://bugzilla.novell.com/show_bug.cgi?id=757426#c19
--- Comment #19 from Joey Lee
(In reply to comment #13)
libata.force=5.0:pio4 or libata.force=5.0:noncq
Looking at kernel-parameters.txt and lsscsi -v output: [4:0:0:0] disk ATA WDC WD800JD-75MS 10.0 /dev/sda dir: /sys/bus/scsi/devices/4:0:0:0 [/sys/devices/pci0000:00/0000:00:1f.2/ata5/host4/target4:0:0/4:0:0:0] [4:0:1:0] disk ATA ST32000542AS CC34 /dev/sdb dir: /sys/bus/scsi/devices/4:0:1:0 [/sys/devices/pci0000:00/0000:00:1f.2/ata5/host4/target4:0:1/4:0:1:0]
your comment 5 is confusing me. I don't understand why not 4:noncq, 4:pio4, 4.1:noncq or 4.1:pio4 instead of 5.0 for libata.force=.
Yes, it is easy to confuse the ATA port number with the SCSI H:C:T:L. Per the kernel parameter document:

libata.force= [LIBATA] Force configurations. The format is comma separated list of "[ID:]VAL" where ID is PORT[.DEVICE]. PORT and DEVICE are decimal numbers matching port, link or device. Basically, it matches the ATA ID string printed on console by libata. If the whole ID part is omitted, the last PORT and DEVICE values are used. If ID hasn't been specified yet, the configuration applies to all ports, links and devices.

That means the PORT.DEVICE number should be taken from the libata log, for example:

<6>[ 12.484261] ata5.00: ATA-7: WDC WD800JD-75MSA3, 10.01E04, max UDMA/133
                ^^^^^^^^

The '5' is the port number, the first '0' is the pmp number, and the second '0' is the device number. So we should use 'libata.force=5.0:pio4', not the SCSI H:C:T:L number.

On my machine, this works on v3.1 and v3.4-rc kernels to set ata1.00 to PIO4 mode. I added 'libata.force=1.0:pio4' to /boot/grub/menu.lst, and then this message showed up:

[ 2.802848] ata1.00: configured for PIO4

Please attach your /boot/grub/menu.lst so I can check that the parameter is set the right way.
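As a rough illustration of the rule above, the ID for libata.force can be derived mechanically from a libata log line. This is only a sketch: the helper name libata_force_id is hypothetical, and it simply drops the leading zero of the device part so that "ata5.00" becomes "5.0", matching the form used in this bug.

```shell
#!/bin/sh
# Turn a libata log line into a libata.force ID (PORT.DEVICE).
# The leading zero of the device part is dropped: "ata5.00" -> "5.0".
# (libata_force_id is a hypothetical helper name.)
libata_force_id() {
    printf '%s\n' "$1" |
        sed -n 's/.*ata\([0-9][0-9]*\)\.0*\([0-9][0-9]*\):.*/\1.\2/p'
}

# Log line copied from the comment above.
line='<6>[ 12.484261] ata5.00: ATA-7: WDC WD800JD-75MSA3, 10.01E04, max UDMA/133'
echo "libata.force=$(libata_force_id "$line"):pio4"
# prints: libata.force=5.0:pio4
```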
Putting any libata.force= seems to be avoiding the bus problems, unless what's at play is a corollary of comment 0 paragraph 3. I've been leaving the (e)SATA
The frozen was happen on the ATA port 5:

ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen <=== here

If the hard drive of OS system shard the same ATA port with external disk (I think ata5.01?), then will have problem to access the parameter on system disk when port frozen. I am checking the source code to find out why got the port frozen problem.
device power on through many reboots in part because of the sparsity of errors since initializing this bug. Maybe the root problem is a need for the device to warm up more than briefly before it's completely ready to use.
Do you mean power up the eSATA box before power on computer? I have a Philips external eSATA box, and I plugged it to a eSATA port on HP notebook, tried warm-boot/cold-boot a couple of times, but didn't reproduce issue.
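To see quickly which libata IDs are affected once a log has been captured, the exception and failed-command lines can be counted per device. The following is only a sketch: ata_error_summary is a hypothetical helper, and it matches just two line shapes, the "exception ... frozen" line quoted above and the "failed command:" line that libata prints for the READ DMA errors reported in this bug.

```shell
#!/bin/sh
# Read kernel log text on stdin and print, per libata ID, how many lines
# report a frozen port or a failed command.
# (ata_error_summary is a hypothetical helper name.)
ata_error_summary() {
    grep -E 'ata[0-9]+\.[0-9]+: (exception .* frozen|failed command:)' |
        sed -E 's/.*(ata[0-9]+\.[0-9]+):.*/\1/' |
        sort | uniq -c | awk '{ print $2, $1 }'
}

# Sample input; the third line is not an error and is ignored.
ata_error_summary <<'EOF'
ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
ata5.01: failed command: READ DMA
ata5.00: configured for PIO4
EOF
```

On a live system the same function could be fed from `dmesg` or from a saved copy of /var/log/messages.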
https://bugzilla.novell.com/show_bug.cgi?id=757426
https://bugzilla.novell.com/show_bug.cgi?id=757426#c20
--- Comment #20 from Joey Lee
ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen <=== here
If the hard drive of OS system shard the same ATA port with external disk (I
                               ^^^^^ shared
think ata5.01?), then will have problem to access the parameter on system disk
                                                      ^^^^^^^^^ partitions
Sorry for my typo! :p
https://bugzilla.novell.com/show_bug.cgi?id=757426
https://bugzilla.novell.com/show_bug.cgi?id=757426#c21
--- Comment #21 from Joey Lee
https://bugzilla.novell.com/show_bug.cgi?id=757426
https://bugzilla.novell.com/show_bug.cgi?id=757426#c22
--- Comment #22 from Felix Miata
I just aware the AHCI mode didn't enable on your machine, did you try to enable AHCI mode before?
Host gx280 is a Dell Optiplex GX280, service tag 20HRT71. I've never found the string "AHCI" in any Dell BIOS setup utility. This one has a choice for SATA operation between "Normal" and "Combination". Whether "Normal" is supposed to provide AHCI I have no idea. (In reply to comment #19)
device power on through many reboots in part because of the sparsity of errors since initializing this bug. Maybe the root problem is a need for the device to warm up more than briefly before it's completely ready to use.
Do you mean power up the eSATA box before power on computer?
Of course. I have no systems that fully support eSATA (nothing hot-pluggable), so to make sure I cause no hardware damage, I make a point to only turn on external device power while the system it is connected to is powered off.
I have a Philips external eSATA box, and I plugged it to a eSATA port on HP notebook, tried warm-boot/cold-boot a couple of times, but didn't reproduce issue.
I have multiple eSATA devices. None except the newest gives any trouble. I suspect the v2 model to have cost-cutting (defective or missing) features and/or firmware over v1 models that cause this. Last boot used libata.force=5.0:noncq. Before trying to get it to error, I issued 'shutdown -h 00:59' to give me time for make it error but allow it to possibly shut down properly after errors occurred. I got it to error briefly on tty10 before ro on / occurred, so there are a few errors in /var/log/messages on 12.2M3 / filesystem now. It appeared to shutdown properly, but did not power off. I'm going to boot from live CD now and will comment more when I'm ready to attach useful logs.
https://bugzilla.novell.com/show_bug.cgi?id=757426
https://bugzilla.novell.com/show_bug.cgi?id=757426#c23
Joey Lee
(In reply to comment #21)
I just aware the AHCI mode didn't enable on your machine, did you try to enable AHCI mode before?
Host gx280 is a Dell Optiplex GX280, service tag 20HRT71. I've never found the string "AHCI" in any Dell BIOS setup utility. This one has a choice for SATA operation between "Normal" and "Combination". Whether "Normal" is supposed to provide AHCI I have no idea.
Hmm. I just googled this model, the Dell Optiplex GX280: 'Combination' means SATA/PATA combination mode, and 'Normal' is SATA-only mode. So the BIOS of this machine doesn't support AHCI, or at least has no BIOS option to enable it.
(In reply to comment #19)
device power on through many reboots in part because of the sparsity of errors since initializing this bug. Maybe the root problem is a need for the device to warm up more than briefly before it's completely ready to use.
Do you mean power up the eSATA box before power on computer?
Of course. I have no systems that fully support eSATA (nothing hot-pluggable), so to make sure I cause no hardware damage, I make a point to only turn on external device power while the system it is connected to is powered off.
I have a Philips external eSATA box, and I plugged it to a eSATA port on HP notebook, tried warm-boot/cold-boot a couple of times, but didn't reproduce issue.
I have multiple eSATA devices. None except the newest gives any trouble. I suspect the v2 model to have cost-cutting (defective or missing) features and/or firmware over v1 models that cause this.
Last boot used libata.force=5.0:noncq. Before trying to get it to error, I issued 'shutdown -h 00:59' to give me time for make it error but allow it to possibly shut down properly after errors occurred. I got it to error briefly on tty10 before ro on / occurred, so there are a few errors in /var/log/messages on 12.2M3 / filesystem now. It appeared to shutdown properly, but did not power off. I'm going to boot from live CD now and will comment more when I'm ready to attach useful logs.
The NCQ feature only works in AHCI mode, but the parameter still works for me when I switch to IDE mode on my HP notebook; I didn't see a 'libata: Unknown parameter' message. The 'libata.force=5.0:pio4' kernel parameter is still worth a try: switch to PIO mode to avoid the DMA problem.
https://bugzilla.novell.com/show_bug.cgi?id=757426
https://bugzilla.novell.com/show_bug.cgi?id=757426#c24
Felix Miata
https://bugzilla.novell.com/show_bug.cgi?id=757426
https://bugzilla.novell.com/show_bug.cgi?id=757426#c25
Felix Miata
https://bugzilla.novell.com/show_bug.cgi?id=757426
https://bugzilla.novell.com/show_bug.cgi?id=757426#c26
--- Comment #26 from Felix Miata
https://bugzilla.novell.com/show_bug.cgi?id=757426
https://bugzilla.novell.com/show_bug.cgi?id=757426#c27
--- Comment #27 from Felix Miata
https://bugzilla.novell.com/show_bug.cgi?id=757426
https://bugzilla.novell.com/show_bug.cgi?id=757426#c28
--- Comment #28 from Felix Miata
You can also try the following kernel parameter, add it to /boot/grub/menu.lst for test maybe it can workaround issue:
libata.force=5.0:pio4
On USB boot this prevents kernel from finding any ATA devices. Using it on normal boot with 3.4rc2 kernel seems to affect only the internal 4:0:0:0 SATA HD on sda as shown by these hdparm -t results:

/dev/sda:
 Timing buffered disk reads: 20 MB in 3.02 seconds = 6.63 MB/sec
/dev/sdb:
 Timing buffered disk reads: 168 MB in 3.00 seconds = 55.95 MB/sec

Instead using libata.force=5.1:pio4 slows down only the problem sdb device, and yet when hdparm -t /dev/sdb is run enough times, ata I/O errors and ro / eventually occur. Using libata.force=5:pio4 seems to be sufficient to avoid the errors, but at an enormous I/O speed penalty.
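When comparing several libata.force settings, it can help to reduce each hdparm -t run to one "device rate" pair. The sketch below assumes hdparm's usual layout, with the device name on its own line followed by the "Timing buffered disk reads" line; hdparm_rates is a hypothetical helper name, and the sample figures are the ones reported above.

```shell
#!/bin/sh
# Print "device MB/sec" pairs from `hdparm -t` output read on stdin.
# Assumes the device name appears on its own line before the timing line.
# (hdparm_rates is a hypothetical helper name.)
hdparm_rates() {
    awk '
        /^\/dev\//                   { dev = $1; sub(/:$/, "", dev) }
        /Timing buffered disk reads/ { print dev, $(NF - 1) }
    '
}

# Figures from the libata.force=5.0:pio4 boot reported above.
hdparm_rates <<'EOF'
/dev/sda:
 Timing buffered disk reads: 20 MB in 3.02 seconds = 6.63 MB/sec
/dev/sdb:
 Timing buffered disk reads: 168 MB in 3.00 seconds = 55.95 MB/sec
EOF
```

The function reads saved output only, so the hdparm -t runs themselves (which need root) stay a separate step.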
https://bugzilla.novell.com/show_bug.cgi?id=757426
https://bugzilla.novell.com/show_bug.cgi?id=757426#c29
--- Comment #29 from Joey Lee
(In reply to comment #13)
You can also try the following kernel parameter, add it to /boot/grub/menu.lst for test maybe it can workaround issue:
libata.force=5.0:pio4
On USB boot this prevents kernel from finding any ATA devices. Using it on normal boot with 3.4rc2 kernel seems to affect only the internal 4:0:0:0 SATA HD on sda as shown by these hdparm -t results:
Per your log in comment 16, the issue was reproduced when booting from the live USB:

Apr 17 09:45:14 linux kernel: [ 2946.016066] ata5.01: failed command: READ DMA

It is just that the DMA error did not cause the system to hang.
/dev/sda: Timing buffered disk reads: 20 MB in 3.02 seconds = 6.63 MB/sec /dev/sdb: Timing buffered disk reads: 168 MB in 3.00 seconds = 55.95 MB/sec
Instead using libata.force=5.1:pio4 slows down only the problem sdb device, and yet when hdparm -t /dev/sdb is run enough times, ata I/O errors and ro / eventually occur. Using libata.force=5:pio4 seems to be sufficient to avoid the errors, but at an enormous I/O speed penalty.
Per your testing result, it looks like we should limit the whole port 5 to pio4 rather than just one device. You can also try:

libata.force=5:mwdma4

MWDMA is an old DMA mode with a transfer rate of up to 16.6 MB/sec. Or set another UDMA mode to slow down the transfer rate:

libata.force=5:udma2 #udma2(UDMA33)

or

libata.force=5:udma4 #udma4(UDMA66)

I am tracing the ata_piix and sata_sil, but no good finding yet. If this v2 eSATA box can't work on either piix or sil, I doubt I can find a good solution for it. I will continue to trace this issue.

Since you can reproduce this issue on the 3.4-rc2 kernel, please feel free to file this bug with kernel upstream through bugzilla or mail, and please add me to Cc.
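For orientation while trying these modes, the nominal interface ceilings can be kept in a small lookup. This is a sketch only: ata_mode_rate is a hypothetical helper, the figures are the nominal maximum rates from the ATA specifications rather than anything measured in this bug, and mwdma2 is listed as the highest multiword DMA mode of ordinary ATA disks.

```shell
#!/bin/sh
# Print the nominal maximum transfer rate in MB/s for a libata.force
# transfer-mode value. Spec ceilings, not measured throughput.
# (ata_mode_rate is a hypothetical helper name.)
ata_mode_rate() {
    case "$1" in
        pio4)   echo "16.7" ;;
        mwdma2) echo "16.7" ;;
        udma2)  echo "33.3" ;;   # UDMA33
        udma4)  echo "66.7" ;;   # UDMA66
        udma5)  echo "100" ;;
        udma6)  echo "133" ;;
        *)      echo "unknown"; return 1 ;;
    esac
}

for mode in pio4 udma2 udma4; do
    echo "libata.force=5:$mode caps port 5 at $(ata_mode_rate "$mode") MB/s nominal"
done
```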
https://bugzilla.novell.com/show_bug.cgi?id=757426
https://bugzilla.novell.com/show_bug.cgi?id=757426#c30
--- Comment #30 from Joey Lee
https://bugzilla.novell.com/show_bug.cgi?id=757426
https://bugzilla.novell.com/show_bug.cgi?id=757426#c31
--- Comment #31 from Felix Miata
I am tracing the ata_piix and sata_sil
Because of above comment, I moved to using 12.2M3 on host big31 with only ICH7 disk controllers for next tests. I disconnected its internal RAID HDs and used only external HDs connected to motherboard ICH7 SATA ports. Booting without libata.force got ata errors and ro /. big31's lsscsi -v:

[2:0:0:0] disk ATA HDS722580VLSA80 V32O /dev/sda
  dir: /sys/bus/scsi/devices/2:0:0:0 [/sys/devices/pci0000:00/0000:00:1f.2/ata3/host2/target2:0:0/2:0:0:0]
[2:0:1:0] cd/dvd TSSTcorp CDDVDW TS-H653N 0208 /dev/sr0
  dir: /sys/bus/scsi/devices/2:0:1:0 [/sys/devices/pci0000:00/0000:00:1f.2/ata3/host2/target2:0:1/2:0:1:0]
[3:0:1:0] disk ATA ST340014AS 8.05 /dev/sdb
  dir: /sys/bus/scsi/devices/3:0:1:0 [/sys/devices/pci0000:00/0000:00:1f.2/ata4/host3/target3:0:1/3:0:1:0]
https://bugzilla.novell.com/show_bug.cgi?id=757426
https://bugzilla.novell.com/show_bug.cgi?id=757426#c32
--- Comment #32 from Felix Miata
Please kindly help to attach on 'hdparm -IW' result: # hdparm -IW /dev/sd* > hdparm-IW.log
Instead I did /dev/sda and /dev/sdb as separate commands. Doing it your way replicated the same output for all 40 partitions across 2 HDs, 80k of repeats instead of 4k. Next boot I used libata.force=4:udma4 and got bus errors on tty10 on ata4.01 trying hdparm -t. Next I tried libata.force=4.1:udma2,3.0:udma2 which produced no errors, and then I tried libata.force=4.1:udma4,3.0:udma4, and generated this response.
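The comma-separated values used above follow the "[ID:]VAL" rule from kernel-parameters.txt: an entry without an ID reuses the previous one, and before any ID is given the setting applies to everything. A minimal sketch that expands such a value into one "ID VAL" pair per line, to double-check what a given command line would target; expand_libata_force is a hypothetical helper, and "all" is just this sketch's label for "no ID specified yet".

```shell
#!/bin/sh
# Expand a libata.force value into one "ID VAL" pair per line.
# An entry without "ID:" reuses the previous ID; before any ID has been
# seen it applies to all ports and devices, shown here as "all".
# (expand_libata_force is a hypothetical helper name.)
expand_libata_force() {
    last_id="all"
    old_ifs=$IFS
    IFS=','
    for entry in $1; do
        case "$entry" in
            *:*) last_id=${entry%%:*}; val=${entry#*:} ;;
            *)   val=$entry ;;
        esac
        printf '%s %s\n' "$last_id" "$val"
    done
    IFS=$old_ifs
}

# Value from the ICH7 test above.
expand_libata_force '4.1:udma2,3.0:udma2'
# prints:
# 4.1 udma2
# 3.0 udma2
```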
https://bugzilla.novell.com/show_bug.cgi?id=757426
https://bugzilla.novell.com/show_bug.cgi?id=757426#c33
--- Comment #33 from Joey Lee
Created an attachment (id=486885) --> (http://bugzilla.novell.com/attachment.cgi?id=486885) [details] hdparm -IW /dev/sda ; hdparm -IW /dev/sdb output
(In reply to comment #30)
Please kindly help to attach on 'hdparm -IW' result: # hdparm -IW /dev/sd* > hdparm-IW.log
Instead I did /dev/sda and /dev/sdb as separate commands. Doing it your way replicated the same output for all 40 partitions across 2 HDs, 80k of repeats instead of 4k.
Next boot I used libata.force=4:udma4 and got bus errors on tty10 on ata4.01 trying hdparm -t. Next I tried libata.force=4.1:udma2,3.0:udma2 which produced no errors, and then I tried libata.force=4.1:udma4,3.0:udma4, and generated this response.
Did you run it on ICH7 machine? If you want, please also try to turn off the cache on issue disks (depend your setup, maybe /dev/sdb or /dev/sdc): # hdparm -W0 /dev/sdb Then try to reproduce issue. The cache will auto turn on when system reboot. Please also attach on whole /var/log/messages for this ICH7 machine.
https://bugzilla.novell.com/show_bug.cgi?id=757426
https://bugzilla.novell.com/show_bug.cgi?id=757426#c34
--- Comment #34 from Felix Miata
(In reply to comment #32)
(In reply to comment #30)
Please kindly help to attach on 'hdparm -IW' result: # hdparm -IW /dev/sd* > hdparm-IW.log
Next boot I used libata.force=4:udma4 and got bus errors on tty10 on ata4.01 trying hdparm -t. Next I tried libata.force=4.1:udma2,3.0:udma2 which produced no errors, and then I tried libata.force=4.1:udma4,3.0:udma4, and generated this response.
Did you run it on ICH7 machine?
You asked for hdparm -IW after I moved host big31 into the workspace it shares with gx280. You can see from comment 30 and 31 timestamps I was composing what was supposed to be comment 30 when you slipped in your own comment 30. So, hdparm -IW has only been run on ICH7.
Please also attach on whole /var/log/messages for this ICH7 machine.
Note it's much bigger than the others, mostly because I started zypper dup before I went to bed late.
https://bugzilla.novell.com/show_bug.cgi?id=757426
https://bugzilla.novell.com/show_bug.cgi?id=757426#c35
Felix Miata
If you want, please also try to turn off the cache on issue disks (depend your setup, maybe /dev/sdb or /dev/sdc):
# hdparm -W0 /dev/sdb
Then try to reproduce issue. The cache will auto turn on when system reboot.
Booted ICH7 without libata.force, hdparm -W0 /dev/sdb apparently makes repeated hdparm -t unable to produce ata errors. I think now is time to file upstream. Do you agree?
https://bugzilla.novell.com/show_bug.cgi?id=757426
https://bugzilla.novell.com/show_bug.cgi?id=757426#c36
Felix Miata
https://bugzilla.novell.com/show_bug.cgi?id=757426
https://bugzilla.novell.com/show_bug.cgi?id=757426#c37
--- Comment #37 from Joey Lee
(In reply to comment #33)
If you want, please also try to turn off the cache on issue disks (depend your setup, maybe /dev/sdb or /dev/sdc):
# hdparm -W0 /dev/sdb
Then try to reproduce issue. The cache will auto turn on when system reboot.
Booted ICH7 without libata.force, hdparm -W0 /dev/sdb apparently makes repeated hdparm -t unable to produce ata errors.
I think now is time to file upstream. Do you agree?
Yes, please! I will continue to trace it.
https://bugzilla.novell.com/show_bug.cgi?id=757426
https://bugzilla.novell.com/show_bug.cgi?id=757426#c38
--- Comment #38 from Joey Lee
Booted ICH7 without libata.force, hdparm -W0 /dev/sdb apparently makes repeated hdparm -t unable to produce ata errors.
Per your testing, using 'hdparm -W0 /dev/sdb' to disable the write caching feature in the device avoids the issue. I traced the ide/ata source code in the kernel; libata-core issues the SET FEATURES ATA command to disable it:

include/linux/ata.h:
    SETFEATURES_WC_ON = 0x02, /* Enable write cache */
    SETFEATURES_WC_OFF = 0x82

In the ATA/ATAPI-6 spec (T13/1410D revision 3b):

8.46.10 Enable/disable write cache
Subcommand codes 02h and 82h allow the host to enable or disable write cache in devices that implement write cache. When the subcommand disable write cache is issued, the device shall initiate the sequence to flush cache to non-volatile memory before command completion (see 8.12). This subcommand does not apply to commands that have a Flush to Disk bit.

On the RX-358 v2 box, the write cache feature in the device causes the READ/WRITE DMA failures.
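To keep the two subcommand codes straight when reading traces, the mapping from the hdparm -W argument to the SET FEATURES subcommand can be written down as a tiny lookup. This is a sketch under the assumption stated above, that hdparm -W0/-W1 ends up as the SETFEATURES_WC_OFF/SETFEATURES_WC_ON subcommand; wc_subcommand is a hypothetical helper and does not touch any device.

```shell
#!/bin/sh
# Map an hdparm -W argument (0 or 1) to the SET FEATURES subcommand code
# quoted from include/linux/ata.h above. Pure lookup; no device access.
# (wc_subcommand is a hypothetical helper name.)
wc_subcommand() {
    case "$1" in
        1) echo "0x02" ;;   # SETFEATURES_WC_ON: enable write cache
        0) echo "0x82" ;;   # SETFEATURES_WC_OFF: disable write cache
        *) echo "usage: wc_subcommand 0|1" >&2; return 1 ;;
    esac
}

echo "hdparm -W0 /dev/sdb -> SET FEATURES subcommand $(wc_subcommand 0)"
# prints: hdparm -W0 /dev/sdb -> SET FEATURES subcommand 0x82
```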
https://bugzilla.novell.com/show_bug.cgi?id=757426
https://bugzilla.novell.com/show_bug.cgi?id=757426#c
Joey Lee
https://bugzilla.novell.com/show_bug.cgi?id=757426
https://bugzilla.novell.com/show_bug.cgi?id=757426#c39
--- Comment #39 from Joey Lee
https://bugzilla.novell.com/show_bug.cgi?id=757426
https://bugzilla.novell.com/show_bug.cgi?id=757426#c40
Joey Lee
https://bugzilla.novell.com/show_bug.cgi?id=757426
https://bugzilla.novell.com/show_bug.cgi?id=757426#c41
--- Comment #41 from Joey Lee