On Sat, 8 Oct 2005 09:26:28 -0700 (PDT), you wrote:
Please forgive me if this shows up twice. I tried to send it once, but it has taken an improbable time and still has not shown up, so it's time to try again.
Following a premature (3 months) disk failure, I created a RAID 1 array. I understand the basic idea of RAID, but have never used the tools to do it before (not on Linux, not on anything).
As I built it, I knew there were many things I didn't know about, but hoped I could learn slowly in "spare" time. For example: does RAID move bad blocks on its elements, or does it just dump the doubtful device? If RAID finds a disk problem, does it tell me about it, and if so how? If RAID rejects a device, particularly if it's for "transient" reasons like a single bad sector, can I re-prepare the disk manually and get it back into service? If I have to replace a failed disk, how do I do that?
Anyway, these questions are still unanswered (after about 3 months...) and guess what: I'm pretty sure I have a drive failure. It makes odd noises, like the other one did :( I poked around, and managed to work out the existence of the mdadm command, and found this:
# mdadm --detail /dev/md0
/dev/md0:
        Version : 00.90.01
  Creation Time : Thu Sep 1 05:49:50 2005
     Raid Level : raid1
     Array Size : 156280192 (149.04 GiB 160.03 GB)
    Device Size : 156280192 (149.04 GiB 160.03 GB)
   Raid Devices : 2
  Total Devices : 1
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Sat Oct 8 09:38:25 2005
          State : clean, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0

           UUID : b829bc95:3f42a40e:5a8be8f6:4fadb25c
         Events : 0.1345011

    Number   Major   Minor   RaidDevice State
       0       0        0        -      removed
       1      34        1        1      active sync   /dev/hdg1
I don't really know what I'm looking at, but the output looks bad, right?
I also found this in dmesg's output:
md: Autodetecting RAID arrays.
md: autorun ...
md: considering hdg1 ...
md:  adding hdg1 ...
md:  adding hde1 ...
md: created md0
md: bind<hde1>
md: bind<hdg1>
md: running: <hdg1><hde1>
md: kicking non-fresh hde1 from array!
md: unbind<hde1>
md: export_rdev(hde1)
raid1: raid set md0 active with 1 out of 2 mirrors
md: ... autorun DONE.
Which also looks bad, don't you think?
So, can anyone please tell me in the short term:
1) Is hde indeed out of the array as it appears?
Yes.
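If you want a quicker check than mdadm --detail, /proc/mdstat shows the same thing. For your array the status line should look something like "156280192 blocks [2/1] [_U]": two devices configured, one active, with the underscore marking the missing mirror. Below is a rough sketch of a helper that flags any array in that state. The function name degraded_arrays is my own invention, not part of any tool.

```shell
#!/bin/sh
# Report md arrays running with fewer members than configured.
# Reads /proc/mdstat-style text on stdin and looks for the "[n/m]"
# counter (n = configured devices, m = active devices).
# NOTE: degraded_arrays is a hypothetical helper name, not an md tool.
degraded_arrays() {
    awk '
        /^md[0-9]+ :/ { name = $1 }
        match($0, /\[[0-9]+\/[0-9]+\]/) {
            # Strip the brackets, then split "n/m" into its two counts.
            split(substr($0, RSTART + 1, RLENGTH - 2), n, "/")
            if (n[2] + 0 < n[1] + 0)
                print name ": " n[2] " of " n[1] " devices active"
        }
    '
}

# Typical use on a live system:
#   degraded_arrays < /proc/mdstat
```

It prints nothing when every array has all of its members, so it is easy to drop into a cron job that mails you only when there is output.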
2) How can I determine what the failure is? (is it "a few" bad sectors, too many to want to reuse the drive, or a more complete failure)
There is no such thing as a 'partial drive failure' on an IDE drive. Bad-sector marking and remapping are handled by the on-board electronics; if the alternate sector map is full, the drive is a short time away from complete failure. Since you describe odd noises, you don't even need to worry about that: it's junk.
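If you want to see what the drive itself reports before binning it, smartctl from the smartmontools package (assuming it is installed and the drive supports SMART) will show the remapped-sector count. A minimal sketch follows; reallocated_sectors is a made-up helper name, and it assumes the drive reports the attribute under smartctl's usual name, Reallocated_Sector_Ct, with the raw value in the last (tenth) column.

```shell
#!/bin/sh
# Print the raw Reallocated_Sector_Ct value from `smartctl -A` output
# given on stdin. Assumes the standard attribute table layout, where
# the raw value is the tenth column.
# NOTE: reallocated_sectors is a hypothetical helper name.
reallocated_sectors() {
    awk '$2 == "Reallocated_Sector_Ct" { print $10 }'
}

# Typical use (needs root):
#   smartctl -H /dev/hde                         # overall health verdict
#   smartctl -A /dev/hde | reallocated_sectors   # remapped-sector count
```

A count that is non-zero and climbing is the drive telling you the same thing the noises are.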
3) Can I reformat, move bad sectors, clean up the drive (if it's a minor failure) and get it back into service, and if so how?
See #2 above.
4) If I elect/have to replace the drive, what do I do to make it take up its ordained place in the md array?
Power down the system, replace the drive, power up the system. The only real recovery headache with a RAID is if the boot drive is the one that failed... In that case, you need to have made certain that ALL the disks are bootable (lilo can do that, I don't know about grub), or else have an alternate boot method.
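One caveat on the power-up step: as far as I know, md only autodetects partitions that already carry a RAID superblock, so a blank replacement disk still has to be partitioned and added by hand. The sketch below only prints the commands rather than running them. The device names are taken from your output and assume the new disk comes up as hde again; rebuild_plan is a made-up name. Check the sfdisk direction twice before running it for real, because copying the wrong way round wipes the good disk's partition table.

```shell
#!/bin/sh
# Print (do not run) the commands to bring a replacement disk into a
# RAID 1 array. Arguments: array, surviving disk, new disk, new partition.
# NOTE: rebuild_plan is a hypothetical helper; device names are assumptions.
rebuild_plan() {
    array=$1 good_disk=$2 new_disk=$3 new_part=$4
    # 1. Copy the partition layout from the surviving disk to the new one.
    echo "sfdisk -d $good_disk | sfdisk $new_disk"
    # 2. Add the new partition; md then resyncs the mirror onto it.
    echo "mdadm $array --add $new_part"
    # 3. Watch the resync progress.
    echo "cat /proc/mdstat"
}

rebuild_plan /dev/md0 /dev/hdg /dev/hde /dev/hde1
```

The array stays usable while it resyncs, but it is not redundant again until /proc/mdstat shows both members up.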
Then in the longer term, where should I be looking for the docs so I can know this for myself in future?
All of the docs on the Linux software RAID system that I've seen are lousy... The code is still evolving, and it seems to be written by people who aren't into docs. O'Reilly has 'Managing RAID on Linux', which isn't too bad but IS inaccurate in places. The way I did it was to put together a junk system and try things, meanwhile reading everything Google found on 'linux raid'. A real pain, but it's your data...

Mike-
--
Mornings: Evolution in action. Only the grumpy will survive.