[Bug 570607] New: I keep getting SATA errors which lead to the RAID array getting degraded or XFS dying on top.
http://bugzilla.novell.com/show_bug.cgi?id=570607

http://bugzilla.novell.com/show_bug.cgi?id=570607#c0

           Summary: I keep getting SATA errors which lead to the RAID array getting degraded or XFS dying on top.
    Classification: openSUSE
           Product: openSUSE 11.1
           Version: Final
          Platform: x86-64
        OS/Version: openSUSE 11.1
            Status: NEW
          Severity: Critical
          Priority: P5 - None
         Component: Kernel
        AssignedTo: bnc-team-screening@forge.provo.novell.com
        ReportedBy: aaronw@doofus.org
         QAContact: qa@suse.de
          Found By: ---
           Blocker: ---

Created an attachment (id=336548)
 --> (http://bugzilla.novell.com/attachment.cgi?id=336548)
First /var/log/messages including a crash and a second one after the crash.

User-Agent: Mozilla/5.0 (compatible; Konqueror/4.3; Linux; en_US) KHTML/4.3.1 (like Gecko) SUSE

I am running on an Intel Core i7 machine with four hard drives. Two of these drives, /dev/sdc and /dev/sdd, are used in a mirrored (RAID1) configuration to hold my home directory and two other small partitions. The drives are Western Digital Caviar Black drives. As far as I can tell, there is no problem with the drives themselves; both pass SMART with flying colors. The motherboard is an ASUS K6T Deluxe V2, and the drives are connected to the built-in Intel SATA controller.

This is causing me no end of frustration. I also saw the same problem with openSUSE 11.0 running on a different (nVidia-based) motherboard. Interestingly, the drive that "fails" is arbitrary: it can be either sdc or sdd. Also, the failing sector is often the RAID superblock sector. At one point I could not recover the RAID array and had to dd the RAID partition to another drive to mount it and recover my data. All patches are up to date.
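For readers checking the "RAID superblock sector" observation: md keeps its metadata at a fixed position computed from the size of each member partition. The sketch below reproduces that arithmetic for the two end-of-device metadata formats, 0.90 and 1.0. This is an assumption, because the report does not say which metadata version the array uses, and the helper names are hypothetical. Offsets are in 512-byte sectors from the start of the member partition, not the whole disk.

```python
# Sketch: where md places its superblock on a member device, in 512-byte
# sectors counted from the start of that device (partition).
# Assumption: the array uses 0.90 or 1.0 metadata; the report does not say.

MD_RESERVED_SECTORS = 64 * 1024 // 512  # 0.90 reserves the last 64 KiB (128 sectors)


def md090_superblock_offset(dev_sectors: int) -> int:
    """0.90 metadata: start of the last 64 KiB-aligned 64 KiB block."""
    aligned = dev_sectors & ~(MD_RESERVED_SECTORS - 1)
    return aligned - MD_RESERVED_SECTORS


def md10_superblock_offset(dev_sectors: int) -> int:
    """1.0 metadata: 8 KiB before the end, rounded down to a 4 KiB boundary."""
    return (dev_sectors - 8 * 2) & ~(4 * 2 - 1)


if __name__ == "__main__":
    # Hypothetical input: a member spanning the whole 1 TB disk
    # (1,000,204,886,016 bytes / 512). The real partition sizes are not given.
    size = 1_000_204_886_016 // 512
    print(md090_superblock_offset(size), md10_superblock_offset(size))
```

With that hypothetical whole-disk size (1953525168 sectors), the script prints `1953524992 1953525152`. For a real member such as sdd3, add the partition's start sector to get an absolute disk sector.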
Linux flash 2.6.31.8-0.1-default #1 SMP 2009-12-15 23:55:40 +0100 x86_64 x86_64 x86_64 GNU/Linux

I do not think the problem is with the hard drives, because the failing drive is arbitrary. I also don't think it's the motherboard, because I saw the same thing with my previous motherboard, which used a completely different chipset. I have SMART enabled on the drives, so if there were a problem it would be recorded.

Reproducible: Always

Steps to Reproduce:
1. Boot up the computer.
2. Let it run for a while.
3. Watch it fail after some time.

SMART data from the hard drives:

flash:~ # smartctl -a /dev/sdc
smartctl 5.39 2009-08-08 r2872~ [x86_64-unknown-linux-gnu] (openSUSE RPM)
Copyright (C) 2002-9 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Caviar Black family
Device Model:     WDC WD1001FALS-00J7B0
Serial Number:    WD-WMATV0705568
Firmware Version: 05.00K05
User Capacity:    1,000,204,886,016 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Thu Jan 14 00:22:12 2010 PST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x84) Offline data collection activity was suspended by an interrupting command from host.
        Auto Offline Data Collection: Enabled.
Self-test execution status:      ( 0) The previous self-test routine completed without error or no self-test has ever been run.
Total time to complete Offline data collection: (19200) seconds.
Offline data collection capabilities:  (0x7b) SMART execute Offline immediate.
        Auto Offline data collection on/off support.
        Suspend Offline collection upon new command.
        Offline surface scan supported.
        Self-test supported.
        Conveyance Self-test supported.
        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering power-saving mode.
        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
        General Purpose Logging supported.
Short self-test routine recommended polling time:      ( 2) minutes.
Extended self-test routine recommended polling time:   ( 221) minutes.
Conveyance self-test routine recommended polling time: ( 5) minutes.
SCT capabilities:              (0x303f) SCT Status supported.
        SCT Feature Control supported.
        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   253   222   021    Pre-fail  Always       -       5708
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       67
  5 Reallocated_Sector_Ct   0x0033   198   198   140    Pre-fail  Always       -       9
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   090   090   000    Old_age   Always       -       7572
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       62
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       43
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       67
194 Temperature_Celsius     0x0022   118   101   000    Old_age   Always       -       32
196 Reallocated_Event_Count 0x0032   198   198   000    Old_age   Always       -       2
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                    Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Interrupted (host reset)        70%         7451         -
# 2  Extended offline    Completed without error         00%         7302         -
# 3  Extended offline    Completed without error         00%         7220         -
# 4  Extended offline    Interrupted (host reset)        30%         5287         -
# 5  Extended offline    Completed without error         00%         4965         -
# 6  Extended offline    Completed without error         00%         3947         -
# 7  Extended offline    Completed without error         00%           21         -
# 8  Short offline       Completed without error         00%            5         -
# 9  Extended offline    Interrupted (host reset)        50%            2         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

flash:~ # smartctl -a /dev/sdd
smartctl 5.39 2009-08-08 r2872~ [x86_64-unknown-linux-gnu] (openSUSE RPM)
Copyright (C) 2002-9 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Caviar Black family
Device Model:     WDC WD1001FALS-00J7B0
Serial Number:    WD-WMATV0707931
Firmware Version: 05.00K05
User Capacity:    1,000,204,886,016 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Thu Jan 14 00:22:15 2010 PST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x84) Offline data collection activity was suspended by an interrupting command from host.
        Auto Offline Data Collection: Enabled.
Self-test execution status:      ( 0) The previous self-test routine completed without error or no self-test has ever been run.
Total time to complete Offline data collection: (19200) seconds.
Offline data collection capabilities:  (0x7b) SMART execute Offline immediate.
        Auto Offline data collection on/off support.
        Suspend Offline collection upon new command.
        Offline surface scan supported.
        Self-test supported.
        Conveyance Self-test supported.
        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering power-saving mode.
        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
        General Purpose Logging supported.
Short self-test routine recommended polling time:      ( 2) minutes.
Extended self-test routine recommended polling time:   ( 221) minutes.
Conveyance self-test routine recommended polling time: ( 5) minutes.
SCT capabilities:              (0x303f) SCT Status supported.
        SCT Feature Control supported.
        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   253   224   021    Pre-fail  Always       -       5375
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       76
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   090   090   000    Old_age   Always       -       7649
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       66
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       47
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       75
194 Temperature_Celsius     0x0022   114   103   000    Old_age   Always       -       36
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       7
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                    Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error         00%         7553         -
# 2  Extended offline    Interrupted (host reset)        70%         7529         -
# 3  Extended offline    Completed without error         00%         7380         -
# 4  Extended offline    Completed without error         00%         5557         -
# 5  Extended offline    Interrupted (host reset)        30%         5358         -
# 6  Extended offline    Completed without error         00%         5033         -
# 7  Extended offline    Completed without error         00%         4000         -
# 8  Extended offline    Completed without error         00%           22         -
# 9  Short offline       Completed without error         00%           12         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
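To pull out the counters that matter most in this discussion (reallocations, pending sectors, interface CRC errors) from `smartctl -A` or `smartctl -a` output, a small filter like the following could help. It is only a sketch. The set of watched attributes is an assumption chosen for this thread, not something smartctl defines, and the parser expects one attribute row per line, as smartctl prints them.

```python
import re

# Attributes treated here as media/interface health counters.
# This selection is an assumption, not something smartctl itself defines.
WATCHED = {
    "Reallocated_Sector_Ct",
    "Reallocated_Event_Count",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
    "UDMA_CRC_Error_Count",
}

# One attribute row: ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+(0x[0-9a-fA-F]+)\s+(\d+)\s+(\d+)\s+(\d+)"
    r"\s+\S+\s+\S+\s+\S+\s+(\d+)"
)


def nonzero_watched(smart_output: str) -> dict:
    """Return {attribute name: raw value} for watched attributes with raw value > 0."""
    found = {}
    for line in smart_output.splitlines():
        m = ROW.match(line)
        if m and m.group(2) in WATCHED and int(m.group(7)) > 0:
            found[m.group(2)] = int(m.group(7))
    return found
```

Given the attribute rows above, it would return `{"Reallocated_Sector_Ct": 9, "Reallocated_Event_Count": 2}` for sdc and `{"Current_Pending_Sector": 7}` for sdd. UDMA_CRC_Error_Count stays at zero on both drives.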
PCI information:

# lspci -n -v
00:00.0 0600: 8086:3405 (rev 12)
	Subsystem: 1043:836b
	Flags: fast devsel
	Capabilities: [60] MSI: Enable- Count=1/2 Maskable+ 64bit-
	Capabilities: [90] Express Root Port (Slot-), MSI 00
	Capabilities: [e0] Power Management version 3
	Capabilities: [100] Advanced Error Reporting
	Capabilities: [150] Access Control Services
	Capabilities: [160] Vendor Specific Information <?>

00:01.0 0604: 8086:3408 (rev 12) (prog-if 00 [Normal decode])
	Flags: bus master, fast devsel, latency 0
	Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
	I/O behind bridge: 00009000-00009fff
	Memory behind bridge: f3f00000-f3ffffff
	Capabilities: [40] Subsystem: 1043:836b
	Capabilities: [60] MSI: Enable+ Count=1/2 Maskable+ 64bit-
	Capabilities: [90] Express Root Port (Slot+), MSI 00
	Capabilities: [e0] Power Management version 3
	Capabilities: [100] Advanced Error Reporting
	Capabilities: [150] Access Control Services
	Capabilities: [160] Vendor Specific Information <?>
	Kernel driver in use: pcieport-driver

00:03.0 0604: 8086:340a (rev 12) (prog-if 00 [Normal decode])
	Flags: bus master, fast devsel, latency 0
	Bus: primary=00, secondary=02, subordinate=02, sec-latency=0
	I/O behind bridge: 0000a000-0000afff
	Memory behind bridge: f4000000-f8cfffff
	Prefetchable memory behind bridge: 00000000d0000000-00000000dfffffff
	Capabilities: [40] Subsystem: 1043:836b
	Capabilities: [60] MSI: Enable+ Count=1/2 Maskable+ 64bit-
	Capabilities: [90] Express Root Port (Slot+), MSI 00
	Capabilities: [e0] Power Management version 3
	Capabilities: [100] Advanced Error Reporting
	Capabilities: [150] Access Control Services
	Capabilities: [160] Vendor Specific Information <?>
	Kernel driver in use: pcieport-driver

00:07.0 0604: 8086:340e (rev 12) (prog-if 00 [Normal decode])
	Flags: bus master, fast devsel, latency 0
	Bus: primary=00, secondary=03, subordinate=03, sec-latency=0
	Capabilities: [40] Subsystem: 1043:836b
	Capabilities: [60] MSI: Enable+ Count=1/2 Maskable+ 64bit-
	Capabilities: [90] Express Root Port (Slot+), MSI 00
	Capabilities: [e0] Power Management version 3
	Capabilities: [100] Advanced Error Reporting
	Capabilities: [150] Access Control Services
	Capabilities: [160] Vendor Specific Information <?>
	Kernel driver in use: pcieport-driver

00:14.0 0800: 8086:342e (rev 12) (prog-if 00 [8259])
	Flags: fast devsel
	Capabilities: [40] Express Root Complex Integrated Endpoint, MSI 00

00:14.1 0800: 8086:3422 (rev 12) (prog-if 00 [8259])
	Flags: fast devsel
	Capabilities: [40] Express Root Complex Integrated Endpoint, MSI 00

00:14.2 0800: 8086:3423 (rev 12) (prog-if 00 [8259])
	Flags: fast devsel
	Capabilities: [40] Express Root Complex Integrated Endpoint, MSI 00

00:14.3 0800: 8086:3438 (rev 12) (prog-if 00 [8259])
	Flags: fast devsel

00:1a.0 0c03: 8086:3a37 (prog-if 00 [UHCI])
	Subsystem: 1043:82d4
	Flags: bus master, medium devsel, latency 0, IRQ 16
	I/O ports at 8800 [size=32]
	Capabilities: [50] PCI Advanced Features
	Kernel driver in use: uhci_hcd

00:1a.1 0c03: 8086:3a38 (prog-if 00 [UHCI])
	Subsystem: 1043:82d4
	Flags: bus master, medium devsel, latency 0, IRQ 21
	I/O ports at 8880 [size=32]
	Capabilities: [50] PCI Advanced Features
	Kernel driver in use: uhci_hcd

00:1a.2 0c03: 8086:3a39 (prog-if 00 [UHCI])
	Subsystem: 1043:82d4
	Flags: bus master, medium devsel, latency 0, IRQ 19
	I/O ports at 8c00 [size=32]
	Capabilities: [50] PCI Advanced Features
	Kernel driver in use: uhci_hcd

00:1a.7 0c03: 8086:3a3c (prog-if 20 [EHCI])
	Subsystem: 1043:82d4
	Flags: bus master, medium devsel, latency 0, IRQ 18
	Memory at f3eff000 (32-bit, non-prefetchable) [size=1K]
	Capabilities: [50] Power Management version 2
	Capabilities: [58] Debug port: BAR=1 offset=00a0
	Capabilities: [98] PCI Advanced Features
	Kernel driver in use: ehci_hcd

00:1b.0 0403: 8086:3a3e
	Subsystem: 1043:82ea
	Flags: bus master, fast devsel, latency 0, IRQ 22
	Memory at f3ef8000 (64-bit, non-prefetchable) [size=16K]
	Capabilities: [50] Power Management version 2
	Capabilities: [60] MSI: Enable- Count=1/1 Maskable- 64bit+
	Capabilities: [70] Express Root Complex Integrated Endpoint, MSI 00
	Capabilities: [100] Virtual Channel <?>
	Capabilities: [130] Root Complex Link <?>
	Kernel driver in use: HDA Intel

00:1c.0 0604: 8086:3a40 (prog-if 00 [Normal decode])
	Flags: bus master, fast devsel, latency 0
	Bus: primary=00, secondary=07, subordinate=07, sec-latency=0
	Prefetchable memory behind bridge: 00000000f2f00000-00000000f2ffffff
	Capabilities: [40] Express Root Port (Slot+), MSI 00
	Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit-
	Capabilities: [90] Subsystem: 1043:82ea
	Capabilities: [a0] Power Management version 2
	Capabilities: [100] Virtual Channel <?>
	Capabilities: [180] Root Complex Link <?>
	Kernel driver in use: pcieport-driver

00:1c.2 0604: 8086:3a44 (prog-if 00 [Normal decode])
	Flags: bus master, fast devsel, latency 0
	Bus: primary=00, secondary=06, subordinate=06, sec-latency=0
	I/O behind bridge: 0000d000-0000dfff
	Memory behind bridge: f8f00000-f8ffffff
	Capabilities: [40] Express Root Port (Slot+), MSI 00
	Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit-
	Capabilities: [90] Subsystem: 1043:82ea
	Capabilities: [a0] Power Management version 2
	Capabilities: [100] Virtual Channel <?>
	Capabilities: [180] Root Complex Link <?>
	Kernel driver in use: pcieport-driver

00:1c.4 0604: 8086:3a48 (prog-if 00 [Normal decode])
	Flags: bus master, fast devsel, latency 0
	Bus: primary=00, secondary=05, subordinate=05, sec-latency=0
	I/O behind bridge: 0000c000-0000cfff
	Memory behind bridge: f8e00000-f8efffff
	Capabilities: [40] Express Root Port (Slot+), MSI 00
	Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit-
	Capabilities: [90] Subsystem: 1043:82ea
	Capabilities: [a0] Power Management version 2
	Capabilities: [100] Virtual Channel <?>
	Capabilities: [180] Root Complex Link <?>
	Kernel driver in use: pcieport-driver

00:1c.5 0604: 8086:3a4a (prog-if 00 [Normal decode])
	Flags: bus master, fast devsel, latency 0
	Bus: primary=00, secondary=04, subordinate=04, sec-latency=0
	I/O behind bridge: 0000b000-0000bfff
	Memory behind bridge: f8d00000-f8dfffff
	Capabilities: [40] Express Root Port (Slot+), MSI 00
	Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit-
	Capabilities: [90] Subsystem: 1043:82ea
	Capabilities: [a0] Power Management version 2
	Capabilities: [100] Virtual Channel <?>
	Capabilities: [180] Root Complex Link <?>
	Kernel driver in use: pcieport-driver

00:1d.0 0c03: 8086:3a34 (prog-if 00 [UHCI])
	Subsystem: 1043:82d4
	Flags: bus master, medium devsel, latency 0, IRQ 23
	I/O ports at 8080 [size=32]
	Capabilities: [50] PCI Advanced Features
	Kernel driver in use: uhci_hcd

00:1d.1 0c03: 8086:3a35 (prog-if 00 [UHCI])
	Subsystem: 1043:82d4
	Flags: bus master, medium devsel, latency 0, IRQ 19
	I/O ports at 8400 [size=32]
	Capabilities: [50] PCI Advanced Features
	Kernel driver in use: uhci_hcd

00:1d.2 0c03: 8086:3a36 (prog-if 00 [UHCI])
	Subsystem: 1043:82d4
	Flags: bus master, medium devsel, latency 0, IRQ 18
	I/O ports at 8480 [size=32]
	Capabilities: [50] PCI Advanced Features
	Kernel driver in use: uhci_hcd

00:1d.7 0c03: 8086:3a3a (prog-if 20 [EHCI])
	Subsystem: 1043:82d4
	Flags: bus master, medium devsel, latency 0, IRQ 23
	Memory at f3efe000 (32-bit, non-prefetchable) [size=1K]
	Capabilities: [50] Power Management version 2
	Capabilities: [58] Debug port: BAR=1 offset=00a0
	Capabilities: [98] PCI Advanced Features
	Kernel driver in use: ehci_hcd

00:1e.0 0604: 8086:244e (rev 90) (prog-if 01 [Subtractive decode])
	Flags: bus master, fast devsel, latency 0
	Bus: primary=00, secondary=08, subordinate=08, sec-latency=32
	I/O behind bridge: 0000e000-0000efff
	Memory behind bridge: f9000000-fbefffff
	Capabilities: [50] Subsystem: 1043:82d4

00:1f.0 0601: 8086:3a16
	Subsystem: 1043:82d4
	Flags: bus master, medium devsel, latency 0
	Capabilities: [e0] Vendor Specific Information <?>

00:1f.2 0106: 8086:3a22 (prog-if 01 [AHCI 1.0])
	Subsystem: 1043:82d4
	Flags: bus master, 66MHz, medium devsel, latency 0, IRQ 57
	I/O ports at 7c00 [size=8]
	I/O ports at 7880 [size=4]
	I/O ports at 7800 [size=8]
	I/O ports at 7480 [size=4]
	I/O ports at 7400 [size=32]
	Memory at f3efc000 (32-bit, non-prefetchable) [size=2K]
	Capabilities: [80] MSI: Enable+ Count=1/16 Maskable- 64bit-
	Capabilities: [70] Power Management version 3
	Capabilities: [a8] SATA HBA <?>
	Capabilities: [b0] PCI Advanced Features
	Kernel driver in use: ahci

00:1f.3 0c05: 8086:3a30
	Subsystem: 1043:82d4
	Flags: medium devsel, IRQ 18
	Memory at f3efd000 (64-bit, non-prefetchable) [size=256]
	I/O ports at 0400 [size=32]

01:00.0 0106: 197b:2363 (rev 03) (prog-if 01 [AHCI 1.0])
	Subsystem: 197b:2363
	Flags: bus master, fast devsel, latency 0, IRQ 28
	Memory at f3ffe000 (32-bit, non-prefetchable) [size=8K]
	Expansion ROM at f3fe0000 [disabled] [size=64K]
	Capabilities: [68] Power Management version 2
	Capabilities: [50] Express Legacy Endpoint, MSI 01
	Kernel driver in use: ahci

01:00.1 0101: 197b:2363 (rev 03) (prog-if 85 [Master SecO PriO])
	Subsystem: 197b:2363
	Flags: bus master, fast devsel, latency 0, IRQ 40
	I/O ports at 9c00 [size=8]
	I/O ports at 9880 [size=4]
	I/O ports at 9800 [size=8]
	I/O ports at 9480 [size=4]
	I/O ports at 9400 [size=16]
	Capabilities: [68] Power Management version 2
	Kernel driver in use: pata_jmicron

02:00.0 0300: 10de:05e2 (rev a1) (prog-if 00 [VGA controller])
	Subsystem: 3842:1265
	Flags: bus master, fast devsel, latency 0, IRQ 24
	Memory at f7000000 (32-bit, non-prefetchable) [size=16M]
	Memory at d0000000 (64-bit, prefetchable) [size=256M]
	Memory at f4000000 (64-bit, non-prefetchable) [size=32M]
	I/O ports at ac00 [size=128]
	[virtual] Expansion ROM at f8c80000 [disabled] [size=512K]
	Capabilities: [60] Power Management version 3
	Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
	Capabilities: [78] Express Endpoint, MSI 00
	Capabilities: [100] Virtual Channel <?>
	Capabilities: [128] Power Budgeting <?>
	Capabilities: [600] Vendor Specific Information <?>
	Kernel driver in use: nvidia

04:00.0 0200: 11ab:4364 (rev 12)
	Subsystem: 1043:81f8
	Flags: bus master, fast devsel, latency 0, IRQ 59
	Memory at f8dfc000 (64-bit, non-prefetchable) [size=16K]
	I/O ports at b800 [size=256]
	Expansion ROM at f8dc0000 [disabled] [size=128K]
	Capabilities: [48] Power Management version 3
	Capabilities: [50] Vital Product Data
	Capabilities: [5c] MSI: Enable+ Count=1/1 Maskable- 64bit+
	Capabilities: [e0] Express Legacy Endpoint, MSI 00
	Capabilities: [100] Advanced Error Reporting
	Kernel driver in use: sky2

05:00.0 0101: 11ab:6121 (rev b2) (prog-if 8f [Master SecP SecO PriP PriO])
	Subsystem: 1043:8212
	Flags: bus master, fast devsel, latency 0, IRQ 16
	I/O ports at cc00 [size=8]
	I/O ports at c880 [size=4]
	I/O ports at c800 [size=8]
	I/O ports at c480 [size=4]
	I/O ports at c400 [size=16]
	Memory at f8effc00 (32-bit, non-prefetchable) [size=1K]
	Capabilities: [48] Power Management version 2
	Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit-
	Capabilities: [e0] Express Legacy Endpoint, MSI 00
	Capabilities: [100] Advanced Error Reporting
	Kernel driver in use: pata_marvell

06:00.0 0200: 11ab:4364 (rev 12)
	Subsystem: 1043:81f8
	Flags: bus master, fast devsel, latency 0, IRQ 58
	Memory at f8ffc000 (64-bit, non-prefetchable) [size=16K]
	I/O ports at d800 [size=256]
	Expansion ROM at f8fc0000 [disabled] [size=128K]
	Capabilities: [48] Power Management version 3
	Capabilities: [50] Vital Product Data
	Capabilities: [5c] MSI: Enable+ Count=1/1 Maskable- 64bit+
	Capabilities: [e0] Express Legacy Endpoint, MSI 00
	Capabilities: [100] Advanced Error Reporting
	Kernel driver in use: sky2

08:01.0 0400: 14f1:8800 (rev 05)
	Subsystem: 7063:3000
	Flags: bus master, medium devsel, latency 64, IRQ 17
	Memory at f9000000 (32-bit, non-prefetchable) [size=16M]
	Capabilities: [44] Vital Product Data
	Capabilities: [4c] Power Management version 2
	Kernel driver in use: cx8800

08:01.2 0480: 14f1:8802 (rev 05)
	Subsystem: 7063:3000
	Flags: bus master, medium devsel, latency 64, IRQ 17
	Memory at fa000000 (32-bit, non-prefetchable) [size=16M]
	Capabilities: [4c] Power Management version 2
	Kernel driver in use: cx88-mpeg driver manager

08:02.0 0c00: 1106:3044 (rev c0) (prog-if 10 [OHCI])
	Subsystem: 1043:81fe
	Flags: bus master, medium devsel, latency 64, IRQ 18
	Memory at fbeff000 (32-bit, non-prefetchable) [size=2K]
	I/O ports at ec00 [size=128]
	Capabilities: [50] Power Management version 2
	Kernel driver in use: ohci1394

I have attached the log files.

-- 
Configure bugmail: http://bugzilla.novell.com/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
http://bugzilla.novell.com/show_bug.cgi?id=570607

http://bugzilla.novell.com/show_bug.cgi?id=570607#c

yang xiaoyu <xyyang@novell.com> changed:

           What    |Removed                                   |Added
-------------------------------------------------------------------------------------
                 CC|                                          |xyyang@novell.com
         AssignedTo|bnc-team-screening@forge.provo.novell.com |sbrabec@novell.com
http://bugzilla.novell.com/show_bug.cgi?id=570607

http://bugzilla.novell.com/show_bug.cgi?id=570607#c1

--- Comment #1 from Aaron Williams <aaronw@doofus.org> 2010-01-15 08:44:59 UTC ---
It failed again, this time on /dev/sdd3:

[78968.299083] ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[78968.299095] ata4.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
[78968.299097]          res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[78968.299101] ata4.00: status: { DRDY }
[78968.299107] ata4: hard resetting link
[78978.302311] ata4: softreset failed (device not ready)
[78978.302316] ata4: hard resetting link
[78988.301636] ata4: softreset failed (device not ready)
[78988.301642] ata4: hard resetting link
[78998.801262] ata4: link is slow to respond, please be patient (ready=0)
[79023.323116] ata4: softreset failed (device not ready)
[79023.323123] ata4: limiting SATA link speed to 1.5 Gbps
[79023.323126] ata4: hard resetting link
[79028.514620] ata4: softreset failed (device not ready)
[79028.514625] ata4: reset failed, giving up
[79028.514628] ata4.00: disabled
[79028.514632] ata4.00: device reported invalid CHS sector 0
[79028.514645] ata4: EH complete
[79028.514656] sd 3:0:0:0: [sdd] Unhandled error code
[79028.514659] sd 3:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[79028.514710] end_request: I/O error, dev sdd, sector 1953520044
[79028.514719] end_request: I/O error, dev sdd, sector 1953520044
[79028.514723] md: super_written gets error=-5, uptodate=0
[79028.514728] raid1: Disk failure on sdd3, disabling device.
[79028.514729] raid1: Operation continuing on 1 devices.
[79028.531008] RAID1 conf printout:
[79028.531012]  --- wd:1 rd:2
[79028.531016]  disk 0, wo:0, o:1, dev:sdc3
[79028.531033]  disk 1, wo:1, o:0, dev:sdd3
[79028.571057] RAID1 conf printout:
[79028.571061]  --- wd:1 rd:2
[79028.571064]  disk 0, wo:0, o:1, dev:sdc3
[88550.455569] md: unbind<sdd3>
[88550.502327] md: export_rdev(sdd3)
[88554.627519] sd 3:0:0:0: [sdd] Unhandled error code
[88554.627523] sd 3:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[88554.627527] end_request: I/O error, dev sdd, sector 1953520044
[88554.637277] sd 3:0:0:0: [sdd] Unhandled error code
[88554.637281] sd 3:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[88554.637287] end_request: I/O error, dev sdd, sector 1953520044
[88554.637618] sd 3:0:0:0: [sdd] Unhandled error code
[88554.637621] sd 3:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[88554.637625] end_request: I/O error, dev sdd, sector 1953520044
[88554.639706] sd 3:0:0:0: [sdd] Unhandled error code
[88554.639709] sd 3:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[88554.639714] end_request: I/O error, dev sdd, sector 134239140
[88554.639719] Buffer I/O error on device sdd3, logical block 0
[88554.639722] Buffer I/O error on device sdd3, logical block 1
[88554.639726] Buffer I/O error on device sdd3, logical block 2
[88554.639729] Buffer I/O error on device sdd3, logical block 3
[88554.639732] Buffer I/O error on device sdd3, logical block 4
[88554.639735] Buffer I/O error on device sdd3, logical block 5
[88554.639738] Buffer I/O error on device sdd3, logical block 6
[88554.639741] Buffer I/O error on device sdd3, logical block 7
[88554.639745] Buffer I/O error on device sdd3, logical block 8
[88554.639748] Buffer I/O error on device sdd3, logical block 9
[88554.639824] sd 3:0:0:0: [sdd] Unhandled error code
[88554.639826] sd 3:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[88554.639830] end_request: I/O error, dev sdd, sector 134239308
[88554.640004] sd 3:0:0:0: [sdd] Unhandled error code
[88554.640007] sd 3:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[88554.640011] end_request: I/O error, dev sdd, sector 134239140
[88558.743932] sd 3:0:0:0: [sdd] Unhandled error code
[88558.743936] sd 3:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[88558.743940] end_request: I/O error, dev sdd, sector 1953520044
[88558.767794] sd 3:0:0:0: [sdd] Unhandled error code
[88558.767798] sd 3:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[88558.767803] end_request: I/O error, dev sdd, sector 1953520044
[88558.767838] sd 3:0:0:0: [sdd] Unhandled error code
[88558.767841] sd 3:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[88558.767845] end_request: I/O error, dev sdd, sector 1953520044
[88558.769708] sd 3:0:0:0: [sdd] Unhandled error code
[88558.769711] sd 3:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[88558.769716] end_request: I/O error, dev sdd, sector 134239140
[88558.769799] sd 3:0:0:0: [sdd] Unhandled error code
[88558.769801] sd 3:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[88558.769805] end_request: I/O error, dev sdd, sector 134239308
[88558.769865] sd 3:0:0:0: [sdd] Unhandled error code
[88558.769868] sd 3:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[88558.769871] end_request: I/O error, dev sdd, sector 134239140
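Both link resets in this thread begin with the same timed-out command line, `cmd ea/00:00:00:00:00/00:00:00:00:00/a0`, followed by a `res ...` dump of the registers read back afterwards. The sketch below splits those dumps into named ATA registers. It assumes the field order `command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device`, which should be checked against the libata-eh.c of the kernel in use. The opcode table is a small illustrative subset, and all helper names are hypothetical.

```python
# Sketch: decode the "cmd"/"res" taskfile dumps that libata prints on errors.
# Assumed field order (verify against your kernel's libata-eh.c):
#   command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device
# In a "res" dump the first byte is the status register, not a command.

ATA_COMMANDS = {  # small, illustrative subset of ATA opcodes
    0x25: "READ DMA EXT",
    0x35: "WRITE DMA EXT",
    0xB0: "SMART",
    0xE7: "FLUSH CACHE",
    0xEA: "FLUSH CACHE EXT",
}

STATUS_BITS = [(0x80, "BSY"), (0x40, "DRDY"), (0x20, "DF"), (0x08, "DRQ"), (0x01, "ERR")]

REG_NAMES = ("feature", "nsect", "lbal", "lbam", "lbah")


def decode_taskfile(field: str) -> dict:
    """Split e.g. 'ea/00:00:00:00:00/00:00:00:00:00/a0' into named registers."""
    command, low, hob, device = field.split("/")
    # "command" holds the opcode for cmd dumps and the status byte for res dumps.
    regs = {"command": int(command, 16), "device": int(device, 16)}
    regs.update({name: int(v, 16) for name, v in zip(REG_NAMES, low.split(":"))})
    regs.update({"hob_" + name: int(v, 16) for name, v in zip(REG_NAMES, hob.split(":"))})
    return regs


def status_flags(status: int) -> list:
    """Names of the status-register bits that are set."""
    return [name for bit, name in STATUS_BITS if status & bit]


if __name__ == "__main__":
    cmd = decode_taskfile("ea/00:00:00:00:00/00:00:00:00:00/a0")
    res = decode_taskfile("40/00:00:00:4f:c2/00:00:00:00:00/00")
    print(ATA_COMMANDS.get(cmd["command"], hex(cmd["command"])))  # FLUSH CACHE EXT
    print(status_flags(res["command"]))                            # ['DRDY']
    print(hex(res["lbam"]), hex(res["lbah"]))                      # 0x4f 0xc2
```

Under that assumed layout, opcode 0xEA is FLUSH CACHE EXT, and status 0x40 decodes to DRDY alone, which matches the `status: { DRDY }` line in the log. The 4Fh/C2h pair in the `res` dump happens to equal the LBA Mid/High key that ATA SMART commands use. The log does not establish whether that readback is left over from an earlier SMART command.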
http://bugzilla.novell.com/show_bug.cgi?id=570607

http://bugzilla.novell.com/show_bug.cgi?id=570607#c2

Stanislav Brabec <sbrabec@novell.com> changed:

           What    |Removed                                   |Added
-------------------------------------------------------------------------------------
                 CC|                                          |sbrabec@novell.com
         AssignedTo|sbrabec@novell.com                        |kernel-maintainers@forge.provo.novell.com

--- Comment #2 from Stanislav Brabec <sbrabec@novell.com> 2010-01-15 13:46:28 CET ---
Regarding S.M.A.R.T.: the logs look clean. You started an extended test about a week ago, and it found nothing. It should be able to find and log any errors on the disc surface. Any surface read error logged by Linux should also be visible in the S.M.A.R.T. log, but the log is clean.

Note: Write errors (and sometimes even read errors) may be hidden from both the kernel and the log. S.M.A.R.T. may reallocate the block on the fly, and then the only trace of the failed sector is the incremented count of reallocated sectors. Your values are 11 and 7 such events, respectively, over the whole life of the discs. That is about average for a year of operation.

The sectors with reported errors are far apart, so a data transfer issue is more likely (especially if different sector numbers are reported next time).
It may be a cable problem (but the UDMA CRC Error Count is still zero for both discs).
It may be a driver problem (but you say that two different mainboards are affected).
It may be a disc manufacturing problem (firmware or hardware).

I cannot tell you more than this: yes, your S.M.A.R.T. logs indicate no problems with the discs.

Reassigning to kernel-maintainers. Maybe they are aware of more such reports.
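One way to follow up on the point about distant sectors is to tally the `end_request` errors per device and sector across failures, and to check how close each one lies to the end of the disk. The sketch below does that. The regular expression matches the `end_request: I/O error, dev X, sector N` lines in this thread's logs. The disk size is taken from smartctl's "User Capacity" above. The helper names are hypothetical.

```python
import re
from collections import Counter

# Disk size in 512-byte sectors, from "User Capacity: 1,000,204,886,016 bytes"
DISK_SECTORS = 1_000_204_886_016 // 512

IO_ERROR = re.compile(r"end_request: I/O error, dev (\w+), sector (\d+)")


def tally_failed_sectors(log: str) -> Counter:
    """Count end_request I/O errors per (device, sector) in a kernel log."""
    return Counter((dev, int(sector)) for dev, sector in IO_ERROR.findall(log))


def sectors_to_end(sector: int, disk_sectors: int = DISK_SECTORS) -> int:
    """Number of sectors from this one up to the end of the disk."""
    return disk_sectors - sector


if __name__ == "__main__":
    log = """[79028.514710] end_request: I/O error, dev sdd, sector 1953520044
[79028.514719] end_request: I/O error, dev sdd, sector 1953520044
[88554.639714] end_request: I/O error, dev sdd, sector 134239140"""
    for (dev, sector), count in sorted(tally_failed_sectors(log).items()):
        print(dev, sector, count, sectors_to_end(sector))
```

For comment #1, sector 1953520044 lies 5124 sectors (about 2.5 MiB) before the end of the 1,953,525,168-sector disk. That position would fit md metadata kept at the end of the last partition, but only if sdd3 is that partition, and the partition layout is not in the report. The `md: super_written gets error=-5` line does confirm that a superblock write failed. Because `findall` scans the whole text, the sketch also works on the flattened, single-line form of the logs in this thread.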
http://bugzilla.novell.com/show_bug.cgi?id=570607

http://bugzilla.novell.com/show_bug.cgi?id=570607#c

Jeff Mahoney <jeffm@novell.com> changed:

           What    |Removed                                   |Added
-------------------------------------------------------------------------------------
           Priority|P5 - None                                 |P3 - Medium
                 CC|                                          |jeffm@novell.com
         AssignedTo|kernel-maintainers@forge.provo.novell.com |teheo@novell.com
http://bugzilla.novell.com/show_bug.cgi?id=570607 http://bugzilla.novell.com/show_bug.cgi?id=570607#c3 --- Comment #3 from Aaron Williams <aaronw@doofus.org> 2010-01-15 23:12:59 UTC --- I've been searching the Internet about this problem, and it seems that other people are seeing it too. Some of the comments I found say that some drives do not like frequent inquiry or SMART commands while there is a lot of I/O. I can try changing the cables to see if that makes any difference, but I doubt it. It just failed again: I'm running a SMART long self-test on /dev/sdd, and /dev/sdc failed:
[40832.748916] sd 14:0:0:0: Attached scsi generic sg7 type 0
[40832.751469] sd 14:0:0:0: [sdg] Attached SCSI removable disk
[40832.751691] usb-storage: device scan complete
[51295.565885] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[51295.565894] ata3.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
[51295.565896] res 40/00:04:9c:74:8f/00:00:23:00:00/40 Emask 0x4 (timeout)
[51295.565900] ata3.00: status: { DRDY }
[51295.565905] ata3: hard resetting link
[51305.561035] ata3: softreset failed (device not ready)
[51305.561040] ata3: hard resetting link
[51315.556321] ata3: softreset failed (device not ready)
[51315.556329] ata3: hard resetting link
[51326.111122] ata3: link is slow to respond, please be patient (ready=0)
[51350.565811] ata3: softreset failed (device not ready)
[51350.565815] ata3: limiting SATA link speed to 1.5 Gbps
[51350.565817] ata3: hard resetting link
[51355.765264] ata3: softreset failed (device not ready)
[51355.765268] ata3: reset failed, giving up
[51355.765272] ata3.00: disabled
[51355.765288] ata3: EH complete
[51355.765302] sd 2:0:0:0: [sdc] Unhandled error code
[51355.765304] sd 2:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[51355.765308] end_request: I/O error, dev sdc, sector 1044022400
[51355.765316] end_request: I/O error, dev sdc, sector 1044022400
[51355.765321] raid1: Disk failure on sdc3, disabling device.
[51355.765322] raid1: Operation continuing on 1 devices.
[51355.765386] sd 2:0:0:0: [sdc] Unhandled error code
[51355.765388] sd 2:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[51355.765392] end_request: I/O error, dev sdc, sector 1057776772
[51355.765404] sd 2:0:0:0: [sdc] Unhandled error code
[51355.765406] sd 2:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[51355.765409] end_request: I/O error, dev sdc, sector 1057813636
[51355.765417] sd 2:0:0:0: [sdc] Unhandled error code
[51355.765419] sd 2:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[51355.765422] end_request: I/O error, dev sdc, sector 1057814348
[51355.765429] sd 2:0:0:0: [sdc] Unhandled error code
[51355.765431] sd 2:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[51355.765435] end_request: I/O error, dev sdc, sector 1057814812
[51355.765442] sd 2:0:0:0: [sdc] Unhandled error code
[51355.765444] sd 2:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[51355.765447] end_request: I/O error, dev sdc, sector 1250753300
[51355.765455] sd 2:0:0:0: [sdc] Unhandled error code
[51355.765457] sd 2:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[51355.765460] end_request: I/O error, dev sdc, sector 1250855268
[51355.765516] sd 2:0:0:0: [sdc] Unhandled error code
[51355.765519] sd 2:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[51355.765522] end_request: I/O error, dev sdc, sector 1498699645
[51355.765530] sd 2:0:0:0: [sdc] Unhandled error code
[51355.765532] sd 2:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[51355.765536] end_request: I/O error, dev sdc, sector 1498699652
[51355.765544] sd 2:0:0:0: [sdc] Unhandled error code
[51355.765546] sd 2:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[51355.765549] end_request: I/O error, dev sdc, sector 1645583196
[51355.765556] sd 2:0:0:0: [sdc] Unhandled error code
[51355.765559] sd 2:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[51355.765562] end_request: I/O error, dev sdc, sector 1712611644
[51355.765569] sd 2:0:0:0: [sdc] Unhandled error code
[51355.765571] sd 2:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[51355.765575] end_request: I/O error, dev sdc, sector 589062180
[51355.765581] sd 2:0:0:0: [sdc] Unhandled error code
[51355.765584] sd 2:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[51355.765587] end_request: I/O error, dev sdc, sector 589923340
[51355.765594] sd 2:0:0:0: [sdc] Unhandled error code
[51355.765596] sd 2:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[51355.765599] end_request: I/O error, dev sdc, sector 590480116
[51355.765608] sd 2:0:0:0: [sdc] Unhandled error code
[51355.765610] sd 2:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[51355.765613] end_request: I/O error, dev sdc, sector 590483188
[51355.765620] sd 2:0:0:0: [sdc] Unhandled error code
[51355.765622] sd 2:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[51355.765626] end_request: I/O error, dev sdc, sector 590488260
[51355.765633] sd 2:0:0:0: [sdc] Unhandled error code
[51355.765635] sd 2:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[51355.765638] end_request: I/O error, dev sdc, sector 590489276
[51355.765645] sd 2:0:0:0: [sdc] Unhandled error code
[51355.765647] sd 2:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[51355.765651] end_request: I/O error, dev sdc, sector 590489292
[51355.765658] sd 2:0:0:0: [sdc] Unhandled error code
[51355.765660] sd 2:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[51355.765663] end_request: I/O error, dev sdc, sector 590489308
[51355.765670] sd 2:0:0:0: [sdc] Unhandled error code
[51355.765672] sd 2:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[51355.765676] end_request: I/O error, dev sdc, sector 590489324
[51355.765683] sd 2:0:0:0: [sdc] Unhandled error code
[51355.765685] sd 2:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[51355.765688] end_request: I/O error, dev sdc, sector 590489356
[51355.765695] sd 2:0:0:0: [sdc] Unhandled error code
[51355.765697] sd 2:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[51355.765701] end_request: I/O error, dev sdc, sector 590489372
[51355.765708] sd 2:0:0:0: [sdc] Unhandled error code
[51355.765710] sd 2:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[51355.765713] end_request: I/O error, dev sdc, sector 590489388
[51355.765721] sd 2:0:0:0: [sdc] Unhandled error code
[51355.765723] sd 2:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[51355.765726] end_request: I/O error, dev sdc, sector 590489452
[51355.765733] sd 2:0:0:0: [sdc] Unhandled error code
[51355.765736] sd 2:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[51355.765739] end_request: I/O error, dev sdc, sector 590489492
[51355.765746] sd 2:0:0:0: [sdc] Unhandled error code
[51355.765748] sd 2:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[51355.765751] end_request: I/O error, dev sdc, sector 590489540
[51355.765759] sd 2:0:0:0: [sdc] Unhandled error code
[51355.765761] sd 2:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[51355.765764] end_request: I/O error, dev sdc, sector 590489588
[51355.765771] sd 2:0:0:0: [sdc] Unhandled error code
[51355.765773] sd 2:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[51355.765777] end_request: I/O error, dev sdc, sector 590489660
[51355.765784] sd 2:0:0:0: [sdc] Unhandled error code
[51355.765786] sd 2:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[51355.765789] end_request: I/O error, dev sdc, sector 590489788
[51355.765797] sd 2:0:0:0: [sdc] Unhandled error code
[51355.765799] sd 2:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[51355.765802] end_request: I/O error, dev sdc, sector 590489836
[51355.765810] sd 2:0:0:0: [sdc] Unhandled error code
[51355.765812] sd 2:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[51355.765816] end_request: I/O error, dev sdc, sector 590489916
[51355.765823] sd 2:0:0:0: [sdc] Unhandled error code
[51355.765825] sd 2:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[51355.765829] end_request: I/O error, dev sdc, sector 590489972
[51355.765836] sd 2:0:0:0: [sdc] Unhandled error code
[51355.765838] sd 2:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[51355.765842] end_request: I/O error, dev sdc, sector 590490004
[51355.765849] sd 2:0:0:0: [sdc] Unhandled error code
[51355.765851] sd 2:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[51355.765855] end_request: I/O error, dev sdc, sector 590490044
[51355.765863] sd 2:0:0:0: [sdc] Unhandled error code
[51355.765865] sd 2:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[51355.765868] end_request: I/O error, dev sdc, sector 590490124
[51355.765875] sd 2:0:0:0: [sdc] Unhandled error code
[51355.765877] sd 2:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[51355.765881] end_request: I/O error, dev sdc, sector 590490868
[51355.765888] sd 2:0:0:0: [sdc] Unhandled error code
[51355.765891] sd 2:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[51355.765894] end_request: I/O error, dev sdc, sector 595150556
[51355.765901] sd 2:0:0:0: [sdc] Unhandled error code
[51355.765904] sd 2:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[51355.765907] end_request: I/O error, dev sdc, sector 596110012
[51355.765915] sd 2:0:0:0: [sdc] Unhandled error code
[51355.765917] sd 2:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[51355.765920] end_request: I/O error, dev sdc, sector 598020980
[51355.765928] sd 2:0:0:0: [sdc] Unhandled error code
[51355.765930] sd 2:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[51355.765934] end_request: I/O error, dev sdc, sector 598021004
[51355.765941] sd 2:0:0:0: [sdc] Unhandled error code
[51355.765943] sd 2:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[51355.765946] end_request: I/O error, dev sdc, sector 598021068
[51355.765954] sd 2:0:0:0: [sdc] Unhandled error code
[51355.765956] sd 2:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[51355.765959] end_request: I/O error, dev sdc, sector 598021084
[51355.766019] sd 2:0:0:0: [sdc] Unhandled error code
[51355.766021] sd 2:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[51355.766024] end_request: I/O error, dev sdc, sector 598021204
[51355.766031] sd 2:0:0:0: [sdc] Unhandled error code
[51355.766033] sd 2:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[51355.766036] end_request: I/O error, dev sdc, sector 598021284
[51355.766042] sd 2:0:0:0: [sdc] Unhandled error code
[51355.766044] sd 2:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[51355.766048] end_request: I/O error, dev sdc, sector 598021316
[51355.766054] sd 2:0:0:0: [sdc] Unhandled error code
[51355.766056] sd 2:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[51355.766059] end_request: I/O error, dev sdc, sector 598021364
[51355.766065] sd 2:0:0:0: [sdc] Unhandled error code
[51355.766067] sd 2:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[51355.766070] end_request: I/O error, dev sdc, sector 598021508
[51355.766076] sd 2:0:0:0: [sdc] Unhandled error code
[51355.766078] sd 2:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[51355.766082] end_request: I/O error, dev sdc, sector 598021524
[51355.766087] sd 2:0:0:0: [sdc] Unhandled error code
[51355.766090] sd 2:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[51355.766093] end_request: I/O error, dev sdc, sector 598021588
[51355.766099] sd 2:0:0:0: [sdc] Unhandled error code
[51355.766101] sd 2:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[51355.766104] end_request: I/O error, dev sdc, sector 598021732
[51355.766110] sd 2:0:0:0: [sdc] Unhandled error code
[51355.766113] sd 2:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[51355.766116] end_request: I/O error, dev sdc, sector 598021780
[51355.766122] sd 2:0:0:0: [sdc] Unhandled error code
[51355.766124] sd 2:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[51355.766127] end_request: I/O error, dev sdc, sector 598021804
[51355.766133] sd 2:0:0:0: [sdc] Unhandled error code
[51355.766135] sd 2:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[51355.766138] end_request: I/O error, dev sdc, sector 605227020
[51355.766145] sd 2:0:0:0: [sdc] Unhandled error code
[51355.766147] sd 2:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[51355.766150] end_request: I/O error, dev sdc, sector 605230172
[51355.766156] sd 2:0:0:0: [sdc] Unhandled error code
[51355.766159] sd 2:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[51355.766162] end_request: I/O error, dev sdc, sector 605249420
[51355.766174] sd 2:0:0:0: [sdc] Unhandled error code
[51355.766178] sd 2:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[51355.766182] end_request: I/O error, dev sdc, sector 605251524
[51355.766189] sd 2:0:0:0: [sdc] Unhandled error code
[51355.766191] sd 2:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[51355.766195] end_request: I/O error, dev sdc, sector 605261044
[51355.766200] sd 2:0:0:0: [sdc] Unhandled error code
[51355.766203] sd 2:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[51355.766206] end_request: I/O error, dev sdc, sector 605261060
[51355.766212] sd 2:0:0:0: [sdc] Unhandled error code
[51355.766214] sd 2:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[51355.766218] end_request: I/O error, dev sdc, sector 605261300
[51355.766223] sd 2:0:0:0: [sdc] Unhandled error code
[51355.766226] sd 2:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[51355.766229] end_request: I/O error, dev sdc, sector 605261332
[51355.766235] sd 2:0:0:0: [sdc] Unhandled error code
[51355.766237] sd 2:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[51355.766240] end_request: I/O error, dev sdc, sector 605261380
[51355.766246] sd 2:0:0:0: [sdc] Unhandled error code
[51355.766248] sd 2:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[51355.766251] end_request: I/O error, dev sdc, sector 605261412
[51355.766257] sd 2:0:0:0: [sdc] Unhandled error code
[51355.766259] sd 2:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[51355.766263] end_request: I/O error, dev sdc, sector 605261668
[51355.766269] sd 2:0:0:0: [sdc] Unhandled error code
[51355.766271] sd 2:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[51355.766274] end_request: I/O error, dev sdc, sector 605265732
[51355.766280] sd 2:0:0:0: [sdc] Unhandled error code
[51355.766282] sd 2:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[51355.766286] end_request: I/O error, dev sdc, sector 605275260
[51355.766292] sd 2:0:0:0: [sdc] Unhandled error code
[51355.766294] sd 2:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[51355.766297] end_request: I/O error, dev sdc, sector 605275468
[51355.766303] sd 2:0:0:0: [sdc] Unhandled error code
[51355.766305] sd 2:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[51355.766308] end_request: I/O error, dev sdc, sector 605292492
[51355.766314] sd 2:0:0:0: [sdc] Unhandled error code
[51355.766316] sd 2:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[51355.766319] end_request: I/O error, dev sdc, sector 605292700
[51355.766325] sd 2:0:0:0: [sdc] Unhandled error code
[51355.766327] sd 2:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[51355.766331] end_request: I/O error, dev sdc, sector 605293292
[51355.766336] sd 2:0:0:0: [sdc] Unhandled error code
[51355.766338] sd 2:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[51355.766342] end_request: I/O error, dev sdc, sector 605301124
[51355.766347] sd 2:0:0:0: [sdc] Unhandled error code
[51355.766349] sd 2:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[51355.766353] end_request: I/O error, dev sdc, sector 605310620
[51355.766359] sd 2:0:0:0: [sdc] Unhandled error code
[51355.766361] sd 2:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[51355.766364] end_request: I/O error, dev sdc, sector 656174116
[51355.766370] sd 2:0:0:0: [sdc] Unhandled error code
[51355.766372] sd 2:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[51355.766376] end_request: I/O error, dev sdc, sector 670721980
[51355.766397] sd 2:0:0:0: [sdc] Unhandled error code
[51355.766399] sd 2:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[51355.766402] end_request: I/O error, dev sdc, sector 1044022425
[51355.766410] end_request: I/O error, dev sdc, sector 1044022425
[51355.766420] sd 2:0:0:0: [sdc] Unhandled error code
[51355.766422] sd 2:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[51355.766425] end_request: I/O error, dev sdc, sector 135923412
[51355.766436] sd 2:0:0:0: [sdc] Unhandled error code
[51355.766438] sd 2:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[51355.766441] end_request: I/O error, dev sdc, sector 1044366564
[51355.766449] sd 2:0:0:0: [sdc] Unhandled error code
[51355.766451] sd 2:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[51355.766454] end_request: I/O error, dev sdc, sector 590063268
[51355.785976] RAID1 conf printout:
[51355.785980] --- wd:1 rd:2
[51355.785982] disk 0, wo:1, o:0, dev:sdc3
[51355.785984] disk 1, wo:0, o:1, dev:sdd3
[51355.817188] RAID1 conf printout:
[51355.817190] --- wd:1 rd:2
[51355.817192] disk 1, wo:0, o:1, dev:sdd3
[51470.910432] md: unbind<sdc3>
[51470.929640] md: export_rdev(sdc3)
[51472.988931] sd 2:0:0:0: [sdc] Unhandled error code
[51472.988936] sd 2:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[51472.988942] end_request: I/O error, dev sdc, sector 1953520044
[51473.000842] sd 2:0:0:0: [sdc] Unhandled error code
[51473.000846] sd 2:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[51473.000850] end_request: I/O error, dev sdc, sector 1953520044
[51473.000889] sd 2:0:0:0: [sdc] Unhandled error code
[51473.000891] sd 2:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[51473.000895] end_request: I/O error, dev sdc, sector 1953520044
[51473.003416] sd 2:0:0:0: [sdc] Unhandled error code
[51473.003420] sd 2:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[51473.003424] end_request: I/O error, dev sdc, sector 134239140
[51473.003429] Buffer I/O error on device sdc3, logical block 0
[51473.003432] Buffer I/O error on device sdc3, logical block 1
[51473.003436] Buffer I/O error on device sdc3, logical block 2
[51473.003439] Buffer I/O error on device sdc3, logical block 3
[51473.003442] Buffer I/O error on device sdc3, logical block 4
[51473.003445] Buffer I/O error on device sdc3, logical block 5
[51473.003448] Buffer I/O error on device sdc3, logical block 6
[51473.003450] Buffer I/O error on device sdc3, logical block 7
[51473.003455] Buffer I/O error on device sdc3, logical block 8
[51473.003458] Buffer I/O error on device sdc3, logical block 9
[51473.003581] sd 2:0:0:0: [sdc] Unhandled error code
[51473.003584] sd 2:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[51473.003590] end_request: I/O error, dev sdc, sector 134239308
[51473.003783] sd 2:0:0:0: [sdc] Unhandled error code
[51473.003787] sd 2:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[51473.003792] end_request: I/O error, dev sdc, sector 134239140
-- Configure bugmail: http://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
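Most of the log above is one three-line pattern repeated for every failed request, so it says little beyond "the link went away". A minimal Python sketch for condensing such an excerpt follows. It assumes plain dmesg or syslog lines in the formats shown in this bug; the regular expressions are written against these samples, not against a documented kernel format. It keeps the libata EH events (the `ataN` lines) with their timestamps and counts the `end_request` errors per device.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple

# Matches "[51355.765308] msg" (dmesg) and "Feb 14 ... kernel: [332583.490677] msg" (syslog).
TS_RE = re.compile(r"\[\s*(\d+\.\d+)\]\s*(.*)")
IOERR_RE = re.compile(r"end_request: I/O error, dev (\w+), sector (\d+)")
EH_RE = re.compile(r"^(ata\d+(?:\.\d+)?): (.*)")


def summarize(lines: Iterable[str]):
    """Split a kernel log excerpt into libata EH events and per-device I/O errors."""
    eh_events: List[Tuple[float, str, str]] = []  # (timestamp, port, message)
    counts: Counter = Counter()                   # device -> number of end_request errors
    sectors: Dict[str, Set[int]] = {}             # device -> distinct failed sectors
    for line in lines:
        m = TS_RE.search(line)
        if m is None:
            continue  # not a timestamped kernel line
        ts, msg = float(m.group(1)), m.group(2)
        io = IOERR_RE.search(msg)
        if io:
            dev = io.group(1)
            counts[dev] += 1
            sectors.setdefault(dev, set()).add(int(io.group(2)))
            continue
        eh = EH_RE.match(msg)
        if eh:
            eh_events.append((ts, eh.group(1), eh.group(2)))
        # Other lines ("Unhandled error code", "Result: ...") are dropped.
    return eh_events, counts, sectors


def report(lines: Iterable[str]) -> str:
    """Human-readable summary: EH timeline first, then one line per device."""
    events, counts, sectors = summarize(lines)
    out = [f"{ts:.6f} {port}: {msg}" for ts, port, msg in events]
    out += [f"{dev}: {n} I/O errors, {len(sectors[dev])} distinct sectors"
            for dev, n in counts.items()]
    return "\n".join(out)
```

Applied to this comment's excerpt, the ata3 events run from 51295.565885 (the frozen command) to 51355.765288 (EH complete): roughly a minute of reset attempts before the port was disabled and md dropped sdc3. The flood of sector numbers that follows reflects requests that failed because the device was gone, not necessarily bad media at those locations.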
http://bugzilla.novell.com/show_bug.cgi?id=570607 http://bugzilla.novell.com/show_bug.cgi?id=570607#c4 --- Comment #4 from Aaron Williams <aaronw@doofus.org> 2010-01-15 23:51:32 UTC --- Here's the latest SMART result after a long self-test on sdd, which failed earlier. sdc is unavailable after the failure, and I need to reboot to restore it.
smartctl -a /dev/sdd
smartctl 5.39 2009-08-08 r2872~ [x86_64-unknown-linux-gnu] (openSUSE RPM)
Copyright (C) 2002-9 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF INFORMATION SECTION ===
Model Family: Western Digital Caviar Black family
Device Model: WDC WD1001FALS-00J7B0
Serial Number: WD-WMATV0707931
Firmware Version: 05.00K05
User Capacity: 1,000,204,886,016 bytes
Device is: In smartctl database [for details use: -P show]
ATA Version is: 8
ATA Standard is: Exact ATA specification draft version not indicated
Local Time is: Fri Jan 15 15:46:43 2010 PST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x84) Offline data collection activity was suspended by an interrupting command from host. Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed without error or no self-test has ever been run.
Total time to complete Offline data collection: (19200) seconds.
Offline data collection capabilities: (0x7b) SMART execute Offline immediate. Auto Offline data collection on/off support. Suspend Offline collection upon new command. Offline surface scan supported. Self-test supported. Conveyance Self-test supported. Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode. Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported. General Purpose Logging supported.
Short self-test routine recommended polling time: ( 2) minutes.
Extended self-test routine recommended polling time: ( 221) minutes.
Conveyance self-test routine recommended polling time: ( 5) minutes.
SCT capabilities: (0x303f) SCT Status supported. SCT Feature Control supported. SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG    VALUE WORST THRESH TYPE     UPDATED WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f  200   200   051    Pre-fail Always  -           0
  3 Spin_Up_Time            0x0027  253   224   021    Pre-fail Always  -           5375
  4 Start_Stop_Count        0x0032  100   100   000    Old_age  Always  -           77
  5 Reallocated_Sector_Ct   0x0033  200   200   140    Pre-fail Always  -           0
  7 Seek_Error_Rate         0x002e  200   200   000    Old_age  Always  -           0
  9 Power_On_Hours          0x0032  090   090   000    Old_age  Always  -           7689
 10 Spin_Retry_Count        0x0032  100   253   000    Old_age  Always  -           0
 11 Calibration_Retry_Count 0x0032  100   253   000    Old_age  Always  -           0
 12 Power_Cycle_Count       0x0032  100   100   000    Old_age  Always  -           67
192 Power-Off_Retract_Count 0x0032  200   200   000    Old_age  Always  -           48
193 Load_Cycle_Count        0x0032  200   200   000    Old_age  Always  -           76
194 Temperature_Celsius     0x0022  115   103   000    Old_age  Always  -           35
196 Reallocated_Event_Count 0x0032  200   200   000    Old_age  Always  -           0
197 Current_Pending_Sector  0x0032  200   200   000    Old_age  Always  -           0
198 Offline_Uncorrectable   0x0030  200   200   000    Old_age  Offline -           0
199 UDMA_CRC_Error_Count    0x0032  200   200   000    Old_age  Always  -           0
200 Multi_Zone_Error_Rate   0x0008  200   200   000    Old_age  Offline -           0
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
Num Test_Description Status                   Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed without error  00%       7553            -
# 2 Extended offline Interrupted (host reset) 70%       7529            -
# 3 Extended offline Completed without error  00%       7380            -
# 4 Extended offline Completed without error  00%       5557            -
# 5 Extended offline Interrupted (host reset) 30%       5358            -
# 6 Extended offline Completed without error  00%       5033            -
# 7 Extended offline Completed without error  00%       4000            -
# 8 Extended offline Completed without error  00%       22              -
# 9 Short offline    Completed without error  00%       12              -
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
   1       0       0 Not_testing
   2       0       0 Not_testing
   3       0       0 Not_testing
   4       0       0 Not_testing
   5       0       0 Not_testing
Selective self-test flags (0x0): After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
As you can see, virtually nothing changed.
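Comparing these tables by eye across comments is error-prone. Below is a minimal Python sketch that pulls the attribute rows out of `smartctl -a` text. It assumes the ten-column layout shown here (ID, name, flag, value, worst, threshold, type, updated, when-failed, raw value). `WATCHED` is an illustrative choice of counters tied to media and link errors rather than wear; it is not a smartctl concept.

```python
from typing import Dict, NamedTuple


class Attr(NamedTuple):
    """One row of the smartctl attribute table (subset of columns)."""
    attr_id: int
    name: str
    value: int
    worst: int
    thresh: int
    raw: str


# Illustrative selection: media/link error counters rather than wear counters.
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector",
           "Offline_Uncorrectable", "UDMA_CRC_Error_Count")


def parse_attributes(text: str) -> Dict[str, Attr]:
    """Parse attribute rows from `smartctl -a` output into name -> Attr."""
    attrs: Dict[str, Attr] = {}
    for line in text.splitlines():
        parts = line.split()
        # Attribute rows have at least 10 columns and start with the numeric ID;
        # the header ("ID# ...") and self-test log rows ("# 1 ...") are skipped.
        if len(parts) < 10 or not parts[0].isdigit():
            continue
        attrs[parts[1]] = Attr(int(parts[0]), parts[1], int(parts[3]),
                               int(parts[4]), int(parts[5]), " ".join(parts[9:]))
    return attrs


def watched_raw(attrs: Dict[str, Attr]) -> Dict[str, int]:
    """Leading integer of each watched attribute's RAW_VALUE, if present."""
    return {name: int(attrs[name].raw.split()[0]) for name in WATCHED if name in attrs}
```

On the sdd output in this comment, `watched_raw` reports 0 for all four counters, which matches the reporter's reading that the drive looks clean from SMART's point of view.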
http://bugzilla.novell.com/show_bug.cgi?id=570607 http://bugzilla.novell.com/show_bug.cgi?id=570607#c5 --- Comment #5 from Aaron Williams <aaronw@doofus.org> 2010-01-16 09:48:52 UTC --- The thread at http://osdir.com/ml/linux-kernel/2009-08/msg08274.html seems like it might be related to my problem. I found I had the hddtemp service running and have since stopped it to see if this improves the situation.
http://bugzilla.novell.com/show_bug.cgi?id=570607 http://bugzilla.novell.com/show_bug.cgi?id=570607#c6 Tejun Heo <teheo@novell.com> changed:
What          |Removed |Added
----------------------------------------------------------------------------
Status        |NEW     |NEEDINFO
Info Provider |        |aaronw@doofus.org
--- Comment #6 from Tejun Heo <teheo@novell.com> 2010-01-20 02:12:46 UTC --- Aaron, did stopping hddtemp make any difference? Also, FLUSH_EXT (0xEA) failures are often caused by an unstable power supply. Can you please try the following?
* Get an extra power supply unit. It doesn't have to be an expensive one.
* Connect one hard drive to the new PSU while leaving everything else alone.
* Power up the new PSU by jumping the power switch connectors. http://modtown.co.uk/mt/article2.php?id=psumod
* Boot the system and see whether the failure pattern changes.
Thanks.
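The `cmd` and `res` lines in the libata exception report from comment #3 are taskfile dumps: the first byte of `cmd` is the ATA command opcode, and the first byte of `res` is the status register. Below is a minimal Python sketch that decodes just those two bytes. It assumes the `cmd xx/..:..:..:..:../..:..:..:..:../xx` layout seen in this bug. The opcode table is a small hand-picked subset of the ATA command set, not a complete list, and the status names are the classic ATA status-register bit names rather than libata's exact output.

```python
import re
from typing import List, Optional, Tuple, Union

# Small subset of ATA command opcodes; anything else is reported as "not in table".
ATA_OPCODES = {
    0x25: "READ DMA EXT",
    0x35: "WRITE DMA EXT",
    0x60: "READ FPDMA QUEUED",
    0x61: "WRITE FPDMA QUEUED",
    0xB0: "SMART",
    0xE7: "FLUSH CACHE",
    0xEA: "FLUSH CACHE EXT",
    0xEC: "IDENTIFY DEVICE",
}

# Classic ATA status register bits, most significant first.
STATUS_BITS = [(0x80, "BSY"), (0x40, "DRDY"), (0x20, "DF"), (0x10, "DSC"),
               (0x08, "DRQ"), (0x04, "CORR"), (0x02, "IDX"), (0x01, "ERR")]

# "cmd ea/00:00:00:00:00/00:00:00:00:00/a0" or "res 40/00:04:9c:74:8f/00:00:23:00:00/40"
TASKFILE_RE = re.compile(
    r"\b(cmd|res) ([0-9a-f]{2})/([0-9a-f:]+)/([0-9a-f:]+)/([0-9a-f]{2})\b")


def decode_taskfile(line: str) -> Optional[Tuple[str, Union[str, List[str]]]]:
    """Return ("cmd", opcode name) or ("res", [status bits]); None if no dump found."""
    m = TASKFILE_RE.search(line)
    if m is None:
        return None
    kind, first_byte = m.group(1), int(m.group(2), 16)
    if kind == "cmd":
        return kind, ATA_OPCODES.get(first_byte, f"opcode 0x{first_byte:02x} (not in table)")
    return kind, [name for bit, name in STATUS_BITS if first_byte & bit]
```

For the pair of lines in comment #3, this yields FLUSH CACHE EXT (opcode 0xEA, the FLUSH_EXT mentioned above) and a status of only DRDY, while the log itself marks the failure as `Emask 0x4 (timeout)`.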
http://bugzilla.novell.com/show_bug.cgi?id=570607 http://bugzilla.novell.com/show_bug.cgi?id=570607#c7 --- Comment #7 from Aaron Williams <aaronw@doofus.org> 2010-01-20 07:32:47 UTC --- Stopping hddtemp actually made a big difference. I still saw one failure, but the failures are far less frequent. The power supply in my computer is an 850 W Cooler Master, which should be plenty since I am not running SLI, and the failures often occur when the system is not under much load anyway. I have four hard drives; the other two are a WD Green and a WD VelociRaptor, both of which are pretty efficient. I could try moving one of the drives to an external enclosure to see if that makes any difference, but that will have to wait until next week, since I will be out of town for about a week starting tomorrow. I also have another 600 W power supply I can try. One interesting thing: I tried to get the SMART statistics by issuing several syncs and waiting for the disks to go idle. Of course, when I did so there was a burst of disk activity, and I immediately saw another failure on the drive I was querying (not in SMART, but in the usual I/O). I have managed to run another long self-test on both drives, and both indicate that there are no problems.
http://bugzilla.novell.com/show_bug.cgi?id=570607 http://bugzilla.novell.com/show_bug.cgi?id=570607#c8 --- Comment #8 from Tejun Heo <teheo@novell.com> 2010-01-20 07:53:03 UTC --- SMART tests are useful for detecting media errors but may not be very useful for failures like this. Can you move half of the drives to the separate PSU and see where the errors occur (i.e., are the errors localized to the drives on the main PSU)? Also, the high wattage printed on a PSU doesn't necessarily guarantee power quality. In many bug reports that turned out to be power problems, the PSUs were rated well above what was necessary but still failed to power the system properly. Please let me know what you find out. Thanks.
http://bugzilla.novell.com/show_bug.cgi?id=570607 http://bugzilla.novell.com/show_bug.cgi?id=570607#c9 --- Comment #9 from Aaron Williams <aaronw@doofus.org> 2010-02-14 22:06:52 UTC --- With the hard drives connected to a separate power supply, I just got the same failure. This indicates that the problem is NOT PSU-related; the other power supply is a known-good, high-quality unit.
Feb 14 14:00:50 flash kernel: [332580.048146] md: unbind<sdc3>
Feb 14 14:00:51 flash kernel: [332580.080439] md: export_rdev(sdc3)
Feb 14 14:00:54 flash kernel: [332583.490666] sd 2:0:0:0: [sdc] Unhandled error code
Feb 14 14:00:54 flash kernel: [332583.490673] sd 2:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Feb 14 14:00:54 flash kernel: [332583.490677] end_request: I/O error, dev sdc, sector 1953520044
Feb 14 14:00:54 flash kernel: [332583.509545] sd 2:0:0:0: [sdc] Unhandled error code
Feb 14 14:00:54 flash kernel: [332583.509551] sd 2:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Feb 14 14:00:54 flash kernel: [332583.509556] end_request: I/O error, dev sdc, sector 1953520044
Feb 14 14:00:54 flash kernel: [332583.517647] sd 2:0:0:0: [sdc] Unhandled error code
Feb 14 14:00:54 flash kernel: [332583.517651] sd 2:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Feb 14 14:00:54 flash kernel: [332583.517656] end_request: I/O error, dev sdc, sector 1953520044
Feb 14 14:00:54 flash kernel: [332583.518871] sd 2:0:0:0: [sdc] Unhandled error code
Feb 14 14:00:54 flash kernel: [332583.518876] sd 2:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Feb 14 14:00:54 flash kernel: [332583.518881] end_request: I/O error, dev sdc, sector 134239140
Feb 14 14:00:54 flash kernel: [332583.518885] Buffer I/O error on device sdc3, logical block 0
Feb 14 14:00:54 flash kernel: [332583.518889] Buffer I/O error on device sdc3, logical block 1
Feb 14 14:00:54 flash kernel: [332583.518892] Buffer I/O error on device sdc3, logical block 2
Feb 14 14:00:54 flash kernel: [332583.518895] Buffer I/O error on device sdc3, logical block 3
Feb 14 14:00:54 flash kernel: [332583.518899] Buffer I/O error on device sdc3, logical block 4
Feb 14 14:00:54 flash kernel: [332583.518902] Buffer I/O error on device sdc3, logical block 5
Feb 14 14:00:54 flash kernel: [332583.518905] Buffer I/O error on device sdc3, logical block 6
Feb 14 14:00:54 flash kernel: [332583.518909] Buffer I/O error on device sdc3, logical block 7
Feb 14 14:00:54 flash kernel: [332583.518913] Buffer I/O error on device sdc3, logical block 8
Feb 14 14:00:54 flash kernel: [332583.518955] sd 2:0:0:0: [sdc] Unhandled error code
Feb 14 14:00:54 flash kernel: [332583.518958] sd 2:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Feb 14 14:00:54 flash kernel: [332583.518962] end_request: I/O error, dev sdc, sector 134239308
Feb 14 14:00:54 flash kernel: [332583.518993] sd 2:0:0:0: [sdc] Unhandled error code
Feb 14 14:00:54 flash kernel: [332583.518996] sd 2:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Feb 14 14:00:54 flash kernel: [332583.519000] end_request: I/O error, dev sdc, sector 134239140
Feb 14 14:00:54 flash kernel: [332583.520117] sd 2:0:0:0: [sdc] Unhandled error code
Feb 14 14:00:54 flash kernel: [332583.520122] sd 2:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Feb 14 14:00:54 flash kernel: [332583.520126] end_request: I/O error, dev sdc, sector 0
Feb 14 14:00:54 flash kernel: [332583.520139] sd 2:0:0:0: [sdc] Unhandled error code
Feb 14 14:00:54 flash kernel: [332583.520142] sd 2:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Feb 14 14:00:54 flash kernel: [332583.520146] end_request: I/O error, dev sdc, sector 0
Feb 14 14:00:54 flash kernel: [332583.520169] sd 2:0:0:0: [sdc] Unhandled error code
Feb 14 14:00:54 flash kernel: [332583.520172] sd 2:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
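The reporter says the failing sector is often the RAID superblock, and this log again shows errors at sector 1953520044. Whether that sector holds array metadata depends on where sdc3 starts and ends and on the md metadata version, and neither appears in the logs. Below is a small Python sketch under stated assumptions: 512-byte logical sectors, and the "User Capacity" value from the smartctl output. It computes how close a sector is to the end of the disk and where md would place a 0.90 or 1.0 superblock for a given member partition. The partition start and length are inputs you would read from `fdisk -lu /dev/sdc`, and the metadata version from `mdadm --examine /dev/sdc3`; all helper names are illustrative.

```python
SECTOR_SIZE = 512                   # bytes per logical sector (assumed)
USER_CAPACITY = 1_000_204_886_016   # bytes, "User Capacity" reported by smartctl

# md 0.90 reserves the last 64 KiB-aligned 64 KiB block of each member.
MD_RESERVED_SECTORS = 64 * 1024 // SECTOR_SIZE  # 128 sectors


def disk_sectors(capacity: int = USER_CAPACITY) -> int:
    """Total number of logical sectors on the whole disk."""
    return capacity // SECTOR_SIZE


def distance_to_end(sector: int, capacity: int = USER_CAPACITY) -> int:
    """Sectors from `sector` (inclusive) to the end of the disk."""
    return disk_sectors(capacity) - sector


def sb_090_sector(part_start: int, part_sectors: int) -> int:
    """Absolute sector of an md 0.90 superblock on a member partition."""
    offset = (part_sectors & ~(MD_RESERVED_SECTORS - 1)) - MD_RESERVED_SECTORS
    return part_start + offset


def sb_10_sector(part_start: int, part_sectors: int) -> int:
    """Absolute sector of an md 1.0 superblock: 8 to 12 KiB before the end, 4 KiB aligned."""
    offset = (part_sectors - 8 * 2) & ~(4 * 2 - 1)
    return part_start + offset
```

With the figures in this bug, `disk_sectors()` gives 1,953,525,168 sectors and `distance_to_end(1953520044)` gives 5124 sectors, about 2.5 MiB before the end of the disk. Comparing that sector with `sb_090_sector` or `sb_10_sector` for the real sdc3 geometry would confirm or rule out the superblock theory.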
http://bugzilla.novell.com/show_bug.cgi?id=570607 http://bugzilla.novell.com/show_bug.cgi?id=570607#c10 --- Comment #10 from Tejun Heo <teheo@novell.com> 2010-02-16 08:22:20 UTC --- Aaron, thanks for testing. Can you please attach the full log? In general, please attach the full log file (as plain text) after a failure; the large number of SCSI and block error messages doesn't carry much information about what went wrong. I went through the log again and it's a bit strange. In log3, the disk aborted read requests at four different sectors and SMART reported Current_Pending_Sector at 7. That means the disk detected 7 unreliable sectors and scheduled them for reallocation at the next overwrite (without an overwrite they can't be remapped, because the original data can't be read), which seems correct. After that, the disk was kicked out of the array, and later, when it was re-added, the whole disk was overwritten during resync. This should have bumped up the reallocation counter. But according to the smartctl output in comment #4, the disk didn't actually reallocate anything during the overwrite; it simply cleared the pending counter. This could mean that after the overwrite, the disk firmware judged the sectors reliable enough that simply overwriting the failed region would correct the problem. If so, the firmware may be misjudging the nature of those failures and, by not remapping them, letting the same problem happen repeatedly. Can you please record the output of "smartctl -a" right after the failure and then again after the resync is complete? Let's see whether the behavior is consistent. Thanks.
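The reasoning in this comment rests on comparing two SMART readings. If Current_Pending_Sector drops after the resync overwrite while Reallocated_Sector_Ct does not rise, the firmware apparently rewrote the sectors in place instead of remapping them. Below is a minimal Python sketch of that comparison. It assumes the two snapshots (right after the failure, and after the resync) have already been reduced to dictionaries of raw values, for example by hand from the `smartctl -a` tables. The classification labels are illustrative, not smartctl output, and the rule is only a heuristic.

```python
from typing import Dict

PENDING = "Current_Pending_Sector"
REALLOC = "Reallocated_Sector_Ct"


def classify(before: Dict[str, int], after: Dict[str, int]) -> str:
    """Heuristic reading of raw SMART values taken after a failure and after resync."""
    d_pending = after[PENDING] - before[PENDING]
    d_realloc = after[REALLOC] - before[REALLOC]
    if d_pending < 0:
        # Pending sectors went away: were they remapped or just rewritten in place?
        return "pending sectors remapped" if d_realloc > 0 else "pending cleared without remapping"
    if d_pending > 0:
        return "new pending sectors"
    if d_realloc > 0:
        return "sectors reallocated"
    return "no change"
```

For the case described here, assuming the reallocation count started at 0 (7 pending sectors before the resync; 0 pending and still 0 reallocated afterwards, as in comment #4), `classify` returns "pending cleared without remapping", which is the pattern that suggests the firmware is not remapping the failing sectors.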
http://bugzilla.novell.com/show_bug.cgi?id=570607 http://bugzilla.novell.com/show_bug.cgi?id=570607#c11 --- Comment #11 from Aaron Williams <aaronw@doofus.org> 2010-02-27 21:51:02 UTC ---
Here's the output right after a failure. I'll restart the rebuild right now and report again when it finishes. I did not need to reboot in this case.

smartctl -a /dev/sdc
smartctl 5.39 2009-08-08 r2872~ [x86_64-unknown-linux-gnu] (openSUSE RPM)
Copyright (C) 2002-9 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Caviar Black family
Device Model:     WDC WD1001FALS-00J7B0
Serial Number:    WD-WMATV0705568
Firmware Version: 05.00K05
User Capacity:    1,000,204,886,016 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Sat Feb 27 13:45:26 2010 PST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x84) Offline data collection activity was suspended by an interrupting command from host. Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed without error or no self-test has ever been run.
Total time to complete Offline data collection: (19200) seconds.
Offline data collection capabilities: (0x7b) SMART execute Offline immediate. Auto Offline data collection on/off support. Suspend Offline collection upon new command. Offline surface scan supported. Self-test supported. Conveyance Self-test supported. Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode. Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported. General Purpose Logging supported.
Short self-test routine recommended polling time: ( 2) minutes.
Extended self-test routine recommended polling time: ( 221) minutes.
Conveyance self-test routine recommended polling time: ( 5) minutes.
SCT capabilities: (0x303f) SCT Status supported. SCT Feature Control supported. SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always   -           0
  3 Spin_Up_Time            0x0027   253   222   021    Pre-fail  Always   -           4016
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always   -           81
  5 Reallocated_Sector_Ct   0x0033   197   197   140    Pre-fail  Always   -           17
  7 Seek_Error_Rate         0x002e   200   199   000    Old_age   Always   -           51
  9 Power_On_Hours          0x0032   089   089   000    Old_age   Always   -           8635
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always   -           0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always   -           0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always   -           76
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always   -           57
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always   -           81
194 Temperature_Celsius     0x0022   117   101   000    Old_age   Always   -           33
196 Reallocated_Event_Count 0x0032   198   198   000    Old_age   Always   -           2
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always   -           1
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline  -           0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always   -           0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline  -           0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                    Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error   00%        8455             -
# 2  Extended offline    Completed without error   00%        7681             -
# 3  Extended offline    Interrupted (host reset)  70%        7677             -
# 4  Extended offline    Interrupted (host reset)  70%        7451             -
# 5  Extended offline    Completed without error   00%        7302             -
# 6  Extended offline    Completed without error   00%        7220             -
# 7  Extended offline    Interrupted (host reset)  30%        5287             -
# 8  Extended offline    Completed without error   00%        4965             -
# 9  Extended offline    Completed without error   00%        3947             -
#10  Extended offline    Completed without error   00%        21               -
#11  Short offline       Completed without error   00%        5                -
#12  Extended offline    Interrupted (host reset)  50%        2                -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
http://bugzilla.novell.com/show_bug.cgi?id=570607 http://bugzilla.novell.com/show_bug.cgi?id=570607#c12 --- Comment #12 from Aaron Williams <aaronw@doofus.org> 2010-02-28 23:05:13 UTC ---
Here's the SMART data again after rebuilding the drive:

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Caviar Black family
Device Model:     WDC WD1001FALS-00J7B0
Serial Number:    WD-WMATV0705568
Firmware Version: 05.00K05
User Capacity:    1,000,204,886,016 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Sun Feb 28 15:01:14 2010 PST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x84) Offline data collection activity was suspended by an interrupting command from host. Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed without error or no self-test has ever been run.
Total time to complete Offline data collection: (19200) seconds.
Offline data collection capabilities: (0x7b) SMART execute Offline immediate. Auto Offline data collection on/off support. Suspend Offline collection upon new command. Offline surface scan supported. Self-test supported. Conveyance Self-test supported. Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode. Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported. General Purpose Logging supported.
Short self-test routine recommended polling time: ( 2) minutes.
Extended self-test routine recommended polling time: ( 221) minutes.
Conveyance self-test routine recommended polling time: ( 5) minutes.
SCT capabilities: (0x303f) SCT Status supported. SCT Feature Control supported. SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always   -           0
  3 Spin_Up_Time            0x0027   253   222   021    Pre-fail  Always   -           4016
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always   -           81
  5 Reallocated_Sector_Ct   0x0033   197   197   140    Pre-fail  Always   -           17
  7 Seek_Error_Rate         0x002e   200   199   000    Old_age   Always   -           51
  9 Power_On_Hours          0x0032   089   089   000    Old_age   Always   -           8661
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always   -           0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always   -           0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always   -           76
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always   -           57
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always   -           81
194 Temperature_Celsius     0x0022   117   101   000    Old_age   Always   -           33
196 Reallocated_Event_Count 0x0032   198   198   000    Old_age   Always   -           2
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always   -           0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline  -           0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always   -           0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline  -           0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                    Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error   00%        8455             -
# 2  Extended offline    Completed without error   00%        7681             -
# 3  Extended offline    Interrupted (host reset)  70%        7677             -
# 4  Extended offline    Interrupted (host reset)  70%        7451             -
# 5  Extended offline    Completed without error   00%        7302             -
# 6  Extended offline    Completed without error   00%        7220             -
# 7  Extended offline    Interrupted (host reset)  30%        5287             -
# 8  Extended offline    Completed without error   00%        4965             -
# 9  Extended offline    Completed without error   00%        3947             -
#10  Extended offline    Completed without error   00%        21               -
#11  Short offline       Completed without error   00%        5                -
#12  Extended offline    Interrupted (host reset)  50%        2                -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

Everything seems the same except the current pending sector count went from 1 to 0.
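To check a statement like "everything seems the same except the current pending sector count", the attribute tables of two `smartctl -a` snapshots can be diffed mechanically instead of compared by eye. The Python below is a minimal sketch of that, assuming the ten-column attribute layout that smartctl 5.39 prints in the snapshots above; `parse_attributes` and `diff_attributes` are hypothetical helper names, and only the RAW_VALUE column is compared.

```python
from __future__ import annotations


def parse_attributes(text: str) -> dict[str, str]:
    """Map ATTRIBUTE_NAME -> RAW_VALUE for each row of the smartctl attribute table."""
    attrs: dict[str, str] = {}
    for line in text.splitlines():
        # Columns: ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED
        # WHEN_FAILED RAW_VALUE; maxsplit=9 keeps a RAW_VALUE containing spaces whole.
        fields = line.split(maxsplit=9)
        # Data rows start with a numeric ID and have a hex FLAG; this skips the header.
        if len(fields) == 10 and fields[0].isdigit() and fields[2].startswith("0x"):
            attrs[fields[1]] = fields[9].strip()
    return attrs


def diff_attributes(
    before: dict[str, str], after: dict[str, str], ignore: tuple[str, ...] = ()
) -> dict[str, tuple[str | None, str | None]]:
    """Return {name: (old_raw, new_raw)} for every attribute whose raw value changed."""
    return {
        name: (before.get(name), after.get(name))
        for name in sorted(set(before) | set(after))
        if name not in ignore and before.get(name) != after.get(name)
    }


# Rows copied from the two /dev/sdc snapshots (comments #11 and #12).
after_failure = """\
  5 Reallocated_Sector_Ct   0x0033   197   197   140    Pre-fail  Always   -           17
  9 Power_On_Hours          0x0032   089   089   000    Old_age   Always   -           8635
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always   -           1
"""
after_resync = """\
  5 Reallocated_Sector_Ct   0x0033   197   197   140    Pre-fail  Always   -           17
  9 Power_On_Hours          0x0032   089   089   000    Old_age   Always   -           8661
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always   -           0
"""
changed = diff_attributes(
    parse_attributes(after_failure),
    parse_attributes(after_resync),
    ignore=("Power_On_Hours",),  # advances with time, so not interesting here
)
print(changed)  # prints: {'Current_Pending_Sector': ('1', '0')}
```

Without the `ignore` argument the diff also reports Power_On_Hours (8635 vs 8661), which simply advances with time. Reallocated_Sector_Ct stays at 17 in both snapshots.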
http://bugzilla.novell.com/show_bug.cgi?id=570607 http://bugzilla.novell.com/show_bug.cgi?id=570607#c13 Tejun Heo <teheo@novell.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEEDINFO                    |NEW
      Info Provider|aaronw@doofus.org           |

--- Comment #13 from Tejun Heo <teheo@novell.com> 2010-03-02 06:33:28 UTC ---
Yeah, that's strange. It should at least have bumped the reallocated sector count. At any rate, your drive keeps growing bad sectors, which lead to I/O errors, and it gets kicked out of the array. When it rejoins, it doesn't seem to actually reallocate those failed sectors during resync, at least according to the SMART counters. You can try comparing the failed sectors across multiple failures, but if the drive keeps failing like that, there isn't much that can be done in software. Thanks.
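The reasoning in comments #10 and #13 comes down to comparing two raw SMART counters across a full overwrite: a pending sector that the firmware remaps should show up as an increase in Reallocated_Sector_Ct, while one it decides is fine after a successful write simply drops out of Current_Pending_Sector. The Python below is a minimal sketch of that check, assuming the raw values have already been read from `smartctl -A` before and after the resync; `interpret_resync` and its result strings are hypothetical, and it ignores sectors that newly become pending during the resync itself.

```python
from __future__ import annotations


def interpret_resync(before: dict[str, int], after: dict[str, int]) -> str:
    """Classify what a full overwrite (RAID resync) did to pending sectors.

    `before` and `after` map SMART attribute names to raw values read right
    after the failure and again once the resync has finished.
    """
    # Pending sectors that disappeared during the overwrite.
    resolved = before["Current_Pending_Sector"] - after["Current_Pending_Sector"]
    # Sectors the firmware actually remapped to spares in the same window.
    remapped = after["Reallocated_Sector_Ct"] - before["Reallocated_Sector_Ct"]

    if resolved <= 0:
        return "no pending sectors were resolved"
    if remapped >= resolved:
        return "pending sectors were remapped"
    if remapped == 0:
        # The write succeeded and the firmware kept using the original sectors.
        return "pending sectors were cleared without remapping"
    return "only some pending sectors were remapped"


# Raw values reported for /dev/sdc in comments #11 and #12.
after_failure = {"Current_Pending_Sector": 1, "Reallocated_Sector_Ct": 17}
after_resync = {"Current_Pending_Sector": 0, "Reallocated_Sector_Ct": 17}
print(interpret_resync(after_failure, after_resync))
# prints: pending sectors were cleared without remapping
```

With the values from this report, the sketch lands on the "cleared without remapping" branch, which is the behavior comment #13 calls strange.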
http://bugzilla.novell.com/show_bug.cgi?id=570607 http://bugzilla.novell.com/show_bug.cgi?id=570607#c14 Tejun Heo <teheo@novell.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |NEEDINFO
      Info Provider|                            |aaronw@doofus.org

--- Comment #14 from Tejun Heo <teheo@novell.com> 2010-03-02 07:02:58 UTC ---
Do you have kernel logs from the last failure? In the first two you posted, the failed sectors were 1337142436 and 1343897380 on sdd. The I/O errors that follow the FLUSH_EXT timeouts and the ensuing detach don't really point to the problematic sectors. If you have the logs at hand, checking whether the drives keep failing the same sectors might shed some light on what's going on. Thanks.
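Checking whether the drives keep failing the same sectors can be automated by pulling the sector numbers out of each failure's kernel log. The Python below is a minimal sketch, assuming log lines shaped like the `end_request: I/O error, dev sdc, sector N` and `[sdc] Result: hostbyte=...` messages quoted in this report; `failing_sectors`, `recurring_sectors`, and the script usage line are hypothetical. Following the remark that errors after the FLUSH_EXT timeout and the detach don't identify the bad sectors, it skips an error that directly follows a `DID_BAD_TARGET` result for the same device; that rule is a heuristic of this sketch, not something the kernel guarantees.

```python
from __future__ import annotations

import re
import sys
from collections import Counter, defaultdict
from pathlib import Path

# e.g. "end_request: I/O error, dev sdc, sector 134239308"
END_REQUEST = re.compile(r"end_request: I/O error, dev (\w+), sector (\d+)")
# e.g. "sd 2:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK"
RESULT = re.compile(r"\[(\w+)\] Result: hostbyte=(\w+)")


def failing_sectors(log_text: str) -> dict[str, list[int]]:
    """Collect {device: [sector, ...]} from end_request I/O errors in a kernel log.

    Heuristic (an assumption of this sketch): an end_request error that directly
    follows a hostbyte=DID_BAD_TARGET result for the same device is treated as
    fallout from the device being detached, not as a bad sector, and is skipped.
    """
    sectors: dict[str, list[int]] = defaultdict(list)
    last_hostbyte: dict[str, str] = {}
    for line in log_text.splitlines():
        if m := RESULT.search(line):
            last_hostbyte[m.group(1)] = m.group(2)
        elif m := END_REQUEST.search(line):
            dev, sector = m.group(1), int(m.group(2))
            # Consume the pending Result so it only applies to the next error.
            if last_hostbyte.pop(dev, None) != "DID_BAD_TARGET":
                sectors[dev].append(sector)
    return dict(sectors)


def recurring_sectors(logs: list[str]) -> dict[str, set[int]]:
    """Return, per device, the sectors that fail in more than one log."""
    counts: dict[str, Counter[int]] = defaultdict(Counter)
    for log_text in logs:
        for dev, secs in failing_sectors(log_text).items():
            counts[dev].update(set(secs))  # count each sector once per log
    repeated = {dev: {s for s, n in c.items() if n > 1} for dev, c in counts.items()}
    return {dev: secs for dev, secs in repeated.items() if secs}


if __name__ == "__main__":
    # Hypothetical usage: python recurring_sectors.py log1 log2 ...
    texts = [Path(p).read_text(encoding="utf-8", errors="replace") for p in sys.argv[1:]]
    for dev, secs in sorted(recurring_sectors(texts).items()):
        print(dev, sorted(secs))
```

Raw sector numbers are only comparable for the same physical disk. sdc and sdd are different drives, so the sdd sectors from the first two logs (1337142436, 1343897380) and the sdc sector from the latest one (1337811172) can't be matched directly, and device names can also be reassigned across reboots.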
http://bugzilla.novell.com/show_bug.cgi?id=570607 http://bugzilla.novell.com/show_bug.cgi?id=570607#c15 --- Comment #15 from Aaron Williams <aaronw@doofus.org> 2010-03-02 07:38:50 UTC ---
I have attached the latest logs. This failure was on sdc. If this were only one drive, I would think it's the drive, but I'm seeing the same problems with both sdc and sdd. The last SMART info was also for sdc. I see an error at sector 1337811172 and a corrected read error at 1203572032. I will try and attach the log file.
https://bugzilla.novell.com/show_bug.cgi?id=570607 https://bugzilla.novell.com/show_bug.cgi?id=570607#c16 Aaron Williams <aaronw@doofus.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|Kernel                      |Kernel
         AssignedTo|teheo@novell.com            |kernel-maintainers@forge.provo.novell.com
            Product|openSUSE 11.1               |openSUSE 11.3

--- Comment #16 from Aaron Williams <aaronw@doofus.org> 2010-08-26 02:13:09 UTC ---
I am still seeing this after upgrading to openSUSE 11.3, but it is now quite rare. It was quite common under 11.2, even after replacing all of the hard drives.
https://bugzilla.novell.com/show_bug.cgi?id=570607 https://bugzilla.novell.com/show_bug.cgi?id=570607#c17 Tejun Heo <teheo@novell.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |teheo@novell.com

--- Comment #17 from Tejun Heo <teheo@novell.com> 2010-08-26 08:42:11 UTC ---
Can you please post dmesg output after such failures on 11.3?
https://bugzilla.novell.com/show_bug.cgi?id=570607 https://bugzilla.novell.com/show_bug.cgi?id=570607#c18 --- Comment #18 from Aaron Williams <aaronw@doofus.org> 2010-08-26 16:18:14 UTC --- (In reply to comment #17)
Can you please post dmesg output after such failures on 11.3?
I will. It might take a while since it is far more rare in 11.3 than it was in 11.2.
https://bugzilla.novell.com/show_bug.cgi?id=570607 https://bugzilla.novell.com/show_bug.cgi?id=570607#c19 --- Comment #19 from Jiri Slaby <jslaby@novell.com> 2010-10-06 11:31:26 UTC --- (In reply to comment #18)
I will. It might take a while since it is far more rare in 11.3 than it was in 11.2.
Any updates? Is it still reproducible with the latest 11.3 kernel?
https://bugzilla.novell.com/show_bug.cgi?id=570607 https://bugzilla.novell.com/show_bug.cgi?id=570607#c20 Jiri Slaby <jslaby@novell.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEEDINFO                    |RESOLVED
                 CC|                            |jslaby@novell.com
       InfoProvider|aaronw@doofus.org           |
         Resolution|                            |NORESPONSE

--- Comment #20 from Jiri Slaby <jslaby@novell.com> 2010-11-07 14:26:21 UTC ---
(In reply to comment #19)
Any updates?
Apparently not.