Help with disk integrity and RAID-1 please

8 Oct 2005

Please forgive me if this shows up twice. I tried to send it once, but it
has taken an improbably long time and still has not shown up, so it's
time to try again.

Following a premature (3 months) disk failure, I created a RAID 1
array. I understand the basic idea of RAID, but have never used the
tools to do it before (not on Linux, not on anything).

As I built it, I knew there were many things I didn't know about, but
hoped I could learn slowly in "spare" time. For example: does RAID move
bad blocks on its elements, or does it just dump the doubtful device?
If RAID finds a disk problem, does it tell me about it, and if so how?
If RAID rejects a device, particularly if it's for "transient" reasons
like a single bad sector, can I re-prepare the disk manually and get it
back into service? If I have to replace a failed disk, how do I do
that?

Anyway, these questions are still unanswered (after about 3 months...)
and guess what: I'm pretty sure I have a drive failure. It makes odd
noises, like the other one did :( I poked around, and managed to work
out the existence of the mdadm command, and found this:

# mdadm --detail /dev/md0
/dev/md0:
        Version : 00.90.01
  Creation Time : Thu Sep  1 05:49:50 2005
     Raid Level : raid1
     Array Size : 156280192 (149.04 GiB 160.03 GB)
    Device Size : 156280192 (149.04 GiB 160.03 GB)
   Raid Devices : 2
  Total Devices : 1
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Sat Oct  8 09:38:25 2005
          State : clean, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0

           UUID : b829bc95:3f42a40e:5a8be8f6:4fadb25c
         Events : 0.1345011

    Number   Major   Minor   RaidDevice State
       0       0        0        -      removed
       1      34        1        1      active sync   /dev/hdg1
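
A minimal sketch of how the fields in this output can be read mechanically, assuming the text of `mdadm --detail` has been captured into a string. The helper names `parse_mdadm_detail` and `missing_mirrors` are hypothetical and not part of mdadm; the sample text is copied from the output above. It only compares "Raid Devices" with "Active Devices" and does not say why a device is missing.

```python
def parse_mdadm_detail(text):
    """Collect 'Key : Value' fields from `mdadm --detail` style output."""
    fields = {}
    for line in text.splitlines():
        # Split on the first " : " so values such as 05:49:50 or the
        # colon-separated UUID are kept whole.
        key, sep, value = line.partition(" : ")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def missing_mirrors(fields):
    """Number of array slots that currently have no active device."""
    return int(fields["Raid Devices"]) - int(fields["Active Devices"])


# Excerpt of the output shown above.
SAMPLE = """\
   Raid Devices : 2
  Total Devices : 1
          State : clean, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
"""

fields = parse_mdadm_detail(SAMPLE)
print(fields["State"], "- missing mirrors:", missing_mirrors(fields))
```

On the excerpt above this reports the state "clean, degraded" with one missing mirror, which matches the "removed" entry in slot 0 of the device table.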

I don't really know what I'm looking at, but the output looks bad,
right?

I also found this in dmesg's output:

md: Autodetecting RAID arrays.
md: autorun ...
md: considering hdg1 ...
md:  adding hdg1 ...
md:  adding hde1 ...
md: created md0
md: bind<hde1>
md: bind<hdg1>
md: running: <hdg1><hde1>
md: kicking non-fresh hde1 from array!
md: unbind<hde1>
md: export_rdev(hde1)
raid1: raid set md0 active with 1 out of 2 mirrors
md: ... autorun DONE.
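
A rough sketch of how these log lines could be scanned to see which members were bound to the array and which were then kicked, assuming the messages keep exactly the wording shown above (other kernel versions may phrase them differently). The function name `summarize_md_log` is hypothetical; the sketch only pattern-matches the text and does not diagnose the cause.

```python
import re

# Patterns taken from the message wording in the log above.
BOUND = re.compile(r"^md: bind<([^>]+)>")
KICKED = re.compile(r"^md: kicking non-fresh (\S+) from array!")


def summarize_md_log(lines):
    """Return (devices still in the array, devices kicked out)."""
    bound, kicked = [], []
    for line in lines:
        line = line.strip()
        m = BOUND.match(line)
        if m:
            bound.append(m.group(1))
            continue
        m = KICKED.match(line)
        if m:
            kicked.append(m.group(1))
    remaining = [dev for dev in bound if dev not in kicked]
    return remaining, kicked


# Excerpt of the dmesg output shown above.
LOG = """\
md: bind<hde1>
md: bind<hdg1>
md: running: <hdg1><hde1>
md: kicking non-fresh hde1 from array!
md: unbind<hde1>
raid1: raid set md0 active with 1 out of 2 mirrors
"""

remaining, kicked = summarize_md_log(LOG.splitlines())
print("still in array:", remaining, "| kicked:", kicked)
```

For the excerpt above this lists hdg1 as still in the array and hde1 as kicked, consistent with the "1 out of 2 mirrors" line.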

Which also looks bad, don't you think?

So, can anyone please tell me in the short term:

1) Is hde indeed out of the array as it appears?
2) How can I determine what the failure is? (Is it "a few" bad sectors,
too many to want to reuse the drive, or a more complete failure?)
3) Can I reformat, move bad sectors, clean up the drive (if it's a
minor failure) and get it back into service, and if so how?
4) If I elect/have to replace the drive, what do I do to make it take
up its ordained place in the md array?

Then in the longer term, where should I be looking for the docs so I
can know this for myself in future?

Many thanks,
Simon

"You can tell whether a man is clever by his answers. You can tell whether a man is wise by his questions."  Naguib Mahfouz



Simon Roberts