Re: [opensuse] mdadm keeps breaking my array

24 Jan 2009

On Friday 23 January 2009 23:06:42 Lars Marowsky-Bree wrote:
> ...
> The kernel/md is not kicking the drive out of the array without a
> reason, and not without an error message. Check your logs as to what
> the reason is.

OK. The rebuild worked last night, and /dev/md0 is running on two disks ATM.

I've found the following in /var/log/messages...

Jan 23 17:21:18 barrowhillfarm kernel: ata7.00: exception Emask 0x10 SAct 0x0 SErr 0x80000 action 0xe frozen
Jan 23 17:21:18 barrowhillfarm kernel: ata7.00: irq_stat 0x01100010, PHY RDY changed
Jan 23 17:21:18 barrowhillfarm kernel: ata7: SError: { 10B8B }
Jan 23 17:21:18 barrowhillfarm kernel: ata7.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
Jan 23 17:21:18 barrowhillfarm kernel:          res 2a/2d:01:01:00:00/00:00:00:00:2a/00 Emask 0x12 (ATA bus error)
Jan 23 17:21:18 barrowhillfarm kernel: ata7.00: status: { DF DRQ }
Jan 23 17:21:18 barrowhillfarm kernel: ata7.00: error: { ABRT }
Jan 23 17:21:18 barrowhillfarm kernel: ata7: hard resetting link
Jan 23 17:21:25 barrowhillfarm kernel: ata7: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
Jan 23 17:21:25 barrowhillfarm kernel: ata7.00: configured for UDMA/100
Jan 23 17:21:25 barrowhillfarm kernel: ata7: EH complete
Jan 23 17:21:25 barrowhillfarm kernel: sd 6:0:0:0: [sdd] 1953525168 512-byte hardware sectors: (1000GB/931GiB)
Jan 23 17:21:25 barrowhillfarm kernel: sd 6:0:0:0: [sdd] Write Protect is off
Jan 23 17:21:25 barrowhillfarm kernel: sd 6:0:0:0: [sdd] Mode Sense: 00 3a 00 00
Jan 23 17:21:25 barrowhillfarm kernel: sd 6:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Jan 23 17:21:25 barrowhillfarm kernel: end_request: I/O error, dev sdd, sector 1953519813
Jan 23 17:21:25 barrowhillfarm kernel: md: super_written gets error=-5, uptodate=0
Jan 23 17:21:25 barrowhillfarm kernel: raid1: Disk failure on sdd1, disabling device.
Jan 23 17:21:25 barrowhillfarm kernel: raid1: Operation continuing on 1 devices.
Jan 23 17:21:25 barrowhillfarm kernel: md: recovery of RAID array md0
Jan 23 17:21:25 barrowhillfarm kernel: md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
Jan 23 17:21:25 barrowhillfarm kernel: md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
Jan 23 17:21:25 barrowhillfarm kernel: md: using 128k window, over a total of 976759864 blocks.
Jan 23 17:21:25 barrowhillfarm kernel: md: resuming recovery of md0 from checkpoint.
Jan 23 17:21:25 barrowhillfarm kernel: md: md0: recovery done.

This seems to imply that sdd has a bad sector at 1953519813, yet md seems
quite happy to rebuild the array. Does that mean that, by chance, the
rebuild didn't touch that bad sector, but that tomorrow md might try
writing there and trigger another failure?
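
For reference, the two sector numbers in the log can be compared directly. The following is only a rough sketch of that arithmetic, using the device size and the failing sector exactly as logged for sdd; the constant and function names are made up for illustration. It shows how close the failing sector is to the end of the disk. The log line right after the I/O error is md's super_written error, so the failed write may have been a superblock update rather than array data, but the sketch does not verify that.

```python
SECTOR_BYTES = 512            # "512-byte hardware sectors" in the log
TOTAL_SECTORS = 1953525168    # from "[sdd] 1953525168 512-byte hardware sectors"
FAILED_SECTOR = 1953519813    # from "end_request: I/O error, dev sdd, sector ..."


def distance_from_end(total_sectors, sector):
    """Number of sectors between the given sector and the end of the device."""
    return total_sectors - sector


sectors_left = distance_from_end(TOTAL_SECTORS, FAILED_SECTOR)
bytes_left = sectors_left * SECTOR_BYTES

# Report the distance in sectors and in MiB (2**20 bytes).
print(f"{sectors_left} sectors ({bytes_left / 2**20:.2f} MiB) from the end of sdd")
```

So the failing sector lies within the last few MiB of the whole disk.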

Are there any more detailed logs I should be looking for?
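
In the meantime, one way to pull just the relevant lines out of /var/log/messages is a small filter like the one below. This is a minimal sketch, not a definitive tool: the pattern and the function name are my own guesses, based only on the message prefixes seen in the excerpt above (ataN, sd, end_request, md, raid1), and continuation lines such as the "res ..." line would not match.

```python
import re

# Hypothetical pattern: kernel messages from libata, the SCSI disk layer,
# the block layer and md/raid1, as they appear in the log excerpt above.
STORAGE_PATTERN = re.compile(
    r"kernel:\s+(ata\d+(\.\d+)?:|sd \d+:\d+:\d+:\d+:|end_request:|md:|raid1:)"
)


def filter_storage_lines(lines):
    """Return only the lines that look like storage or md kernel messages."""
    return [line.rstrip("\n") for line in lines if STORAGE_PATTERN.search(line)]


# A few lines taken from the excerpt above, used here as sample input.
sample = [
    "Jan 23 17:21:18 barrowhillfarm kernel: ata7: hard resetting link",
    "Jan 23 17:21:25 barrowhillfarm kernel: end_request: I/O error, dev sdd, sector 1953519813",
    "Jan 23 17:21:25 barrowhillfarm kernel: md: super_written gets error=-5, uptodate=0",
    "Jan 23 17:21:25 barrowhillfarm kernel: raid1: Disk failure on sdd1, disabling device.",
]

for line in filter_storage_lines(sample):
    print(line)
```

On a real system the same function could be fed an open file object for /var/log/messages instead of the sample list.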

Thanks,

Bob
-- 
Registered Linux User #463880		FSFE Member #1300
GPG-FP: A6C1 457C 6DBA B13E 5524 F703 D12A FB79 926B 994E
openSUSE 11.1, Kernel 2.6.27.7-9-default, KDE 3.5.10
Intel Core2 Quad Q9400 2.66GHz, 4GB DDR RAM, nVidia GeForce 9200GS

Bob Williams