Mailinglist Archive: opensuse (3618 mails)

Re: [opensuse] mdadm keeps breaking my array
  • From: Bob Williams <linux@xxxxxxxxxxxxxxxxxxxxx>
  • Date: Sat, 24 Jan 2009 11:23:59 +0000
  • Message-id: <200901241124.00025.linux@xxxxxxxxxxxxxxxxxxxxx>
On Friday 23 January 2009 23:06:42 Lars Marowsky-Bree wrote:
> The kernel/md is not kicking the drive out of the array without a
> reason, and not without an error message. Check your logs as to what
> the reason is.

OK. The rebuild worked OK last night, and /dev/md0 is running on two disks
ATM.

I've found the following in /var/log/messages...

Jan 23 17:21:18 barrowhillfarm kernel: ata7.00: exception Emask 0x10 SAct 0x0 SErr 0x80000 action 0xe frozen
Jan 23 17:21:18 barrowhillfarm kernel: ata7.00: irq_stat 0x01100010, PHY RDY changed
Jan 23 17:21:18 barrowhillfarm kernel: ata7: SError: { 10B8B }
Jan 23 17:21:18 barrowhillfarm kernel: ata7.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
Jan 23 17:21:18 barrowhillfarm kernel: res 2a/2d:01:01:00:00/00:00:00:00:2a/00 Emask 0x12 (ATA bus error)
Jan 23 17:21:18 barrowhillfarm kernel: ata7.00: status: { DF DRQ }
Jan 23 17:21:18 barrowhillfarm kernel: ata7.00: error: { ABRT }
Jan 23 17:21:18 barrowhillfarm kernel: ata7: hard resetting link
Jan 23 17:21:25 barrowhillfarm kernel: ata7: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
Jan 23 17:21:25 barrowhillfarm kernel: ata7.00: configured for UDMA/100
Jan 23 17:21:25 barrowhillfarm kernel: ata7: EH complete
Jan 23 17:21:25 barrowhillfarm kernel: sd 6:0:0:0: [sdd] 1953525168 512-byte hardware sectors: (1000GB/931GiB)
Jan 23 17:21:25 barrowhillfarm kernel: sd 6:0:0:0: [sdd] Write Protect is off
Jan 23 17:21:25 barrowhillfarm kernel: sd 6:0:0:0: [sdd] Mode Sense: 00 3a 00 00
Jan 23 17:21:25 barrowhillfarm kernel: sd 6:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Jan 23 17:21:25 barrowhillfarm kernel: end_request: I/O error, dev sdd, sector 1953519813
Jan 23 17:21:25 barrowhillfarm kernel: md: super_written gets error=-5, uptodate=0
Jan 23 17:21:25 barrowhillfarm kernel: raid1: Disk failure on sdd1, disabling device.
Jan 23 17:21:25 barrowhillfarm kernel: raid1: Operation continuing on 1 devices.
Jan 23 17:21:25 barrowhillfarm kernel: md: recovery of RAID array md0
Jan 23 17:21:25 barrowhillfarm kernel: md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
Jan 23 17:21:25 barrowhillfarm kernel: md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
Jan 23 17:21:25 barrowhillfarm kernel: md: using 128k window, over a total of 976759864 blocks.
Jan 23 17:21:25 barrowhillfarm kernel: md: resuming recovery of md0 from checkpoint.
Jan 23 17:21:25 barrowhillfarm kernel: md: md0: recovery done.

This seems to imply that sdd has a bad sector at 1953519813, yet md seems
quite happy to rebuild the array. Does that mean that, by chance, it
didn't use that bad sector when rebuilding, but tomorrow it might try
writing there and trigger another failure?
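One way to see whether each failure hits the same spot is to pull the failing sector numbers out of /var/log/messages after every incident and compare them with the disk size reported in the same log. A minimal sketch in Python follows; the function name `failing_sectors` and the regular expressions are hypothetical, written only to match the message format quoted above, not an existing tool.

```python
import re

# Hypothetical patterns matched to the syslog lines quoted above.
# \s+ also matches newlines, so lines wrapped by a mail client still match.
SIZE_RE = re.compile(r"\[(\w+)\]\s+(\d+)\s+512-byte\s+hardware\s+sectors")
ERR_RE = re.compile(r"end_request:\s+I/O\s+error,\s+dev\s+(\w+),\s+sector\s+(\d+)")


def failing_sectors(log_text):
    """Return (device, sector, total_sectors, sectors_from_end) tuples.

    total_sectors and sectors_from_end are None when the log excerpt
    does not contain a size line for that device.
    """
    # Map device name -> total number of 512-byte sectors.
    sizes = {dev: int(n) for dev, n in SIZE_RE.findall(log_text)}
    results = []
    for dev, sector in ERR_RE.findall(log_text):
        sector = int(sector)
        total = sizes.get(dev)
        from_end = total - sector if total is not None else None
        results.append((dev, sector, total, from_end))
    return results


sample = """\
Jan 23 17:21:25 barrowhillfarm kernel: sd 6:0:0:0: [sdd] 1953525168 512-byte hardware sectors: (1000GB/931GiB)
Jan 23 17:21:25 barrowhillfarm kernel: end_request: I/O error, dev sdd, sector 1953519813
"""

if __name__ == "__main__":
    for dev, sector, total, from_end in failing_sectors(sample):
        print(f"{dev}: sector {sector} of {total} ({from_end} sectors from the end)")
```

Run against the excerpt above it reports sector 1953519813 of 1953525168, i.e. 5355 sectors from the end of the disk. If later failures log the same sector number, that points at one fixed location on the disk rather than errors scattered across it.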

Are there any more detailed logs I should be looking at?

Thanks,

Bob
--
Registered Linux User #463880 FSFE Member #1300
GPG-FP: A6C1 457C 6DBA B13E 5524 F703 D12A FB79 926B 994E
openSUSE 11.1, Kernel 2.6.27.7-9-default, KDE 3.5.10
Intel Core2 Quad Q9400 2.66GHz, 4GB DDR RAM, nVidia GeForce 9200GS