On Friday 23 January 2009 23:06:42 Lars Marowsky-Bree wrote:
> The kernel/md is not kicking the drive out of the array without a reason, and not without an error message. Check your logs as to what the reason is.
OK. The rebuild worked OK last night, and /dev/md0 is running on two disks ATM. I've found the following in /var/log/messages...

Jan 23 17:21:18 barrowhillfarm kernel: ata7.00: exception Emask 0x10 SAct 0x0 SErr 0x80000 action 0xe frozen
Jan 23 17:21:18 barrowhillfarm kernel: ata7.00: irq_stat 0x01100010, PHY RDY changed
Jan 23 17:21:18 barrowhillfarm kernel: ata7: SError: { 10B8B }
Jan 23 17:21:18 barrowhillfarm kernel: ata7.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
Jan 23 17:21:18 barrowhillfarm kernel: res 2a/2d:01:01:00:00/00:00:00:00:2a/00 Emask 0x12 (ATA bus error)
Jan 23 17:21:18 barrowhillfarm kernel: ata7.00: status: { DF DRQ }
Jan 23 17:21:18 barrowhillfarm kernel: ata7.00: error: { ABRT }
Jan 23 17:21:18 barrowhillfarm kernel: ata7: hard resetting link
Jan 23 17:21:25 barrowhillfarm kernel: ata7: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
Jan 23 17:21:25 barrowhillfarm kernel: ata7.00: configured for UDMA/100
Jan 23 17:21:25 barrowhillfarm kernel: ata7: EH complete
Jan 23 17:21:25 barrowhillfarm kernel: sd 6:0:0:0: [sdd] 1953525168 512-byte hardware sectors: (1000GB/931GiB)
Jan 23 17:21:25 barrowhillfarm kernel: sd 6:0:0:0: [sdd] Write Protect is off
Jan 23 17:21:25 barrowhillfarm kernel: sd 6:0:0:0: [sdd] Mode Sense: 00 3a 00 00
Jan 23 17:21:25 barrowhillfarm kernel: sd 6:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Jan 23 17:21:25 barrowhillfarm kernel: end_request: I/O error, dev sdd, sector 1953519813
Jan 23 17:21:25 barrowhillfarm kernel: md: super_written gets error=-5, uptodate=0
Jan 23 17:21:25 barrowhillfarm kernel: raid1: Disk failure on sdd1, disabling device.
Jan 23 17:21:25 barrowhillfarm kernel: raid1: Operation continuing on 1 devices.
Jan 23 17:21:25 barrowhillfarm kernel: md: recovery of RAID array md0
Jan 23 17:21:25 barrowhillfarm kernel: md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
Jan 23 17:21:25 barrowhillfarm kernel: md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
Jan 23 17:21:25 barrowhillfarm kernel: md: using 128k window, over a total of 976759864 blocks.
Jan 23 17:21:25 barrowhillfarm kernel: md: resuming recovery of md0 from checkpoint.
Jan 23 17:21:25 barrowhillfarm kernel: md: md0: recovery done.

This seems to imply that sdd has a bad sector at 1953519813, yet md seems quite happy to rebuild the array. Does that mean that, by chance, it didn't use that bad sector when rebuilding, but that tomorrow it might try writing there and trigger another failure? Are there any more detailed logs I should be looking for?

Thanks,
Bob

--
Registered Linux User #463880 FSFE Member #1300
GPG-FP: A6C1 457C 6DBA B13E 5524 F703 D12A FB79 926B 994E
openSUSE 11.1, Kernel 2.6.27.7-9-default, KDE 3.5.10
Intel Core2 Quad Q9400 2.66GHz, 4GB DDR RAM, nVidia GeForce 9200GS
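As a rough aid for reading the log above, here is a minimal Python sketch that pulls the disk size and the failing sector out of the quoted kernel messages and works out how far the error lies from the end of the disk. The names LOG, SIZE_RE, ERROR_RE and summarize are made up for illustration, and the script only does arithmetic on the numbers already in the log; it makes no claim about whether the sector is physically bad.

```python
import re

# Three lines copied from the /var/log/messages excerpt quoted above.
LOG = """\
Jan 23 17:21:25 barrowhillfarm kernel: sd 6:0:0:0: [sdd] 1953525168 512-byte hardware sectors: (1000GB/931GiB)
Jan 23 17:21:25 barrowhillfarm kernel: end_request: I/O error, dev sdd, sector 1953519813
Jan 23 17:21:25 barrowhillfarm kernel: md: super_written gets error=-5, uptodate=0
"""

# Hypothetical patterns, written only to match the two message formats above.
SIZE_RE = re.compile(r"\[(\w+)\] (\d+) 512-byte hardware sectors")
ERROR_RE = re.compile(r"end_request: I/O error, dev (\w+), sector (\d+)")


def summarize(log_text):
    """Return (device, sector, sectors_from_end) for each I/O error line.

    sectors_from_end is None when the log holds no size line for the device.
    """
    sizes = {dev: int(count) for dev, count in SIZE_RE.findall(log_text)}
    results = []
    for dev, sector_text in ERROR_RE.findall(log_text):
        sector = int(sector_text)
        total = sizes.get(dev)
        from_end = total - sector if total is not None else None
        results.append((dev, sector, from_end))
    return results


for dev, sector, from_end in summarize(LOG):
    print(dev, sector, from_end)
```

For the excerpt above this reports that the failing sector is 5355 sectors (about 2.6 MiB at 512 bytes per sector) before the end of sdd, and the md line logged in the same second shows that the write which failed was a superblock write (super_written, error=-5).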