[opensuse] Re-adding RAID drives
I've now got SUSE 10.2 set up with RAID and LVM on my server. However, I don't seem to be able to re-add a "failed" drive without rebooting. The drives are hot-swappable. When I use the command

mdadm /dev/md0/ -add /dev/sdxx

I get a "device busy" error message. Even removing the drive with the --remove option before the add command doesn't help. I still have to reboot to add the drive. Is there something I'm missing?

tnx jk

-- Use OpenOffice.org http://www.openoffice.org
-- To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org For additional commands, e-mail: opensuse+help@opensuse.org
If a drive/partition is marked as "failed", you need to remove it from the RAID first with:
mdadm --manage --remove /dev/mdx /dev/sdxx
Then you can add it again with:
mdadm --manage --add /dev/mdx /dev/sdxx
Hope it helps.
-- Rui Santos http://www.ruisantos.com/ Veni, vidi, Linux!
Rui Santos wrote:
If a drive/partition is marked as "failed", you need to remove it from the RAID first with:
Sorry - forgot this:
mdadm --manage --set-faulty /dev/mdx /dev/sdxx
mdadm --manage --remove /dev/mdx /dev/sdxx
Then you can add it again with:
mdadm --manage --add /dev/mdx /dev/sdxx
Hope it helps.
-- Rui Santos http://www.ruisantos.com/ Veni, vidi, Linux!
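Put together, the sequence suggested in this message (mark faulty, remove, add) can be sketched as a small shell helper. This is only a sketch under assumptions: readd_member is a made-up name, /dev/md0 and /dev/sdd2 stand in for your own array and member, and DRY_RUN defaults to printing the commands instead of running them. It uses --fail, which the mdadm manual lists as equivalent to --set-faulty.

```shell
#!/bin/sh
# Sketch only: cycle one member out of and back into an md array.
# readd_member is a hypothetical helper name, not part of mdadm.
# DRY_RUN=1 (the default) prints the commands; DRY_RUN=0 runs them.

readd_member() {
    array=$1     # e.g. /dev/md0
    member=$2    # e.g. /dev/sdd2

    run=
    if [ "${DRY_RUN:-1}" = "1" ]; then
        run=echo
    fi

    # 1. Mark the member faulty so md stops using it.
    $run mdadm --manage "$array" --fail "$member" || return 1
    # 2. Remove the faulty member from the array.
    $run mdadm --manage "$array" --remove "$member" || return 1
    # 3. Add it back; md then resyncs onto it.
    $run mdadm --manage "$array" --add "$member"
}
```

Calling `readd_member /dev/md0 /dev/sdd2` with the default settings just lists the three mdadm commands, which makes it easy to check the device names before running anything for real with DRY_RUN=0.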
If I go through the --set-faulty, --fail, --add sequence from the command line, I have no problem adding the drive back. However, if I simulate a drive failure by pulling the drive, that sequence fails with the error message "mdadm: add new device failed for /dev/sdd2 as 4: Invalid argument". If I then reboot the computer, I can then use --add to add the drive again. So, there appears to be some difference between using commands to remove a drive and an actual hardware failure.

-- Use OpenOffice.org http://www.openoffice.org
The --set-faulty and --fail options are the same... if you say you can execute a --set-faulty and then re-add the device, that is new to me.

About pulling out a hot-swap device: the device should be considered failed at that point. Before you put the drive back into the slot, do you use the --remove option on the already removed drive? At that time it should still be part of the RAID, but in faulty mode. You have to first remove it by issuing 'mdadm --manage --remove /dev/mdx /dev/sdxx'. Have you done this?

Only then are you able to plug the device back in and re-add it.

At least that's how I use it... I've never tried it on hot-swap though, but --set-faulty is supposed to do just that.

There's one other issue: the kernel driver for the device you use should be able to disconnect and re-connect the device cleanly. Check 'dmesg' to see if it happens as it should...
-- Rui Santos http://www.ruisantos.com/ Veni, vidi, Linux!
After I unplugged the drive, dmesg shows this.

raid5: Disk failure on sdd2, disabling device. Operation continuing on 3 devices
RAID5 conf printout:
 --- rd:4 wd:3 fd:1
 disk 0, o:1, dev:sda2
 disk 1, o:1, dev:sdb2
 disk 2, o:1, dev:sdc2
 disk 3, o:0, dev:sdd2
RAID5 conf printout:
 --- rd:4 wd:3 fd:1
 disk 0, o:1, dev:sda2
 disk 1, o:1, dev:sdb2
 disk 2, o:1, dev:sdc2
md: unbind<sdd2>
md: export_rdev(sdd2)

At this point, I run the remove option. Then after reinserting the drive, it shows this, even though the drives are on B.

scsi1: Someone reset channel A
[the same line repeats many more times]

Then when adding I get this

netfinity:~ # mdadm --add /dev/md0 /dev/sdd2
mdadm: add new device failed for /dev/sdd2 as 4: Invalid argument

If I now reboot, I'll be able to add the drive, using the same command, like so.

netfinity:~ # mdadm --add /dev/md0 /dev/sdd2
mdadm: re-added /dev/sdd2

-- Use OpenOffice.org http://www.openoffice.org
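Besides dmesg, /proc/mdstat shows what md itself currently thinks of each member; failed members are tagged with (F) there. A minimal sketch for listing them follows. failed_members is a hypothetical helper name, and the sample line in the comment is illustrative rather than taken from this system.

```shell
#!/bin/sh
# Sketch only: list members that md has marked as failed.
# Reads /proc/mdstat-style text on stdin and prints "array member" pairs.
# Example of a line it matches (illustrative):
#   md0 : active raid5 sdd2[3](F) sdc2[2] sdb2[1] sda2[0]

failed_members() {
    awk '
        $2 == ":" {
            for (i = 3; i <= NF; i++) {
                if ($i ~ /\(F\)$/) {
                    name = $i
                    sub(/\[.*/, "", name)   # drop the "[3](F)" suffix
                    print $1, name
                }
            }
        }
    '
}

# Usage sketch:
#   failed_members < /proc/mdstat
```

If the pulled drive no longer appears at all (the "md: unbind<sdd2>" line in the log suggests md has already dropped it), this prints nothing for that member, which is itself a useful hint that a further --remove has nothing left to act on.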
The Sunday 2007-06-24 at 23:06 -0400, James Knott wrote:
netfinity:~ # mdadm --add /dev/md0 /dev/sdd2
mdadm: add new device failed for /dev/sdd2 as 4: Invalid argument
If I now reboot, I'll be able to add the drive, using the same command, like so.
I would say that the kernel does not support hot swapping of disks. In fact, I read some time ago that it didn't and that this was a work in progress. I don't know what the current state is.

-- Cheers, Carlos E. R.
On Mon, 2007-06-25 at 11:49 +0200, Carlos E. R. wrote:
I would say that the kernel does not support hot swapping of disks. In fact, I read some time ago that it didn't and that this was a work in progress. I don't know what the current state is.
The kernel does indeed support hot swapping. However, the quality of this depends on the underlying hardware. For example, the intel ICHx chips require that there be a disk in the machine when it is powered on. After that you can hotswap. Don't forget the hotplug option in /etc/fstab (or wherever you put the mount options).

Also, the ICHx chipset seems to require time to sense that a swap has indeed happened. So I always:

1. umount the disks to be removed.
2. remove the disks.
3. wait, say, 30 seconds. During this time you should see messages in /var/log/messages that the disks have been removed. If you are using intel ICHx chips, you may even see messages about disks being removed and waiting for the hardware to calm down.
4. insert the new disks.

I do this in a system with 4 swappable SATA disks. I think it requires a kernel newer than 2.6.17. Otherwise you need to update the libata in your kernel to a newer version. Hotswapping (intel ICHx at least) did not really work in kernels older than 2.6.17.

I do not know about hot swapping on other hardware. My understanding is that on most hardware it works OK. Only the intel ICHx stuff is problematic, due to some odd hardware quirks.

One more thing: I have a udev rule so my SATA disks have the same kernel name when they are put in the same physical slot (I have 4 removable SATA disk bays). If you do not have this, sdd can become sde (and so on) after a swap. /var/log/messages tells what the inserted disk is known as. Perhaps it is as simple as that.
-- Roger Oberholtzer OPQ Systems / Ramböll RST Ramböll Sverige AB Kapellgränd 7 P.O. Box 4205 SE-102 65 Stockholm, Sweden Tel: Int +46 8-615 60 20 Fax: Int +46 8-31 42 23
Roger Oberholtzer wrote:
The kernel does indeed support hot swapping. However, the quality of this depends on the underlying hardware. For example, the intel ICHx chips require that there be a disk in the machine when it is powered on. After that you can hotswap. Don't forget the hotplug option in /etc/fstab (or wherever you put the mount options). Also, the ICHx chipset seems to require time to sense that a swap has indeed happened. So I always:
1. umount the disks to be removed.

Since the disks are part of a RAID array, they're not listed in fstab, except for the small slice of one used for /boot.
-- Use OpenOffice.org http://www.openoffice.org
On 6/25/07, Roger Oberholtzer wrote:
4. insert the new disks.
<snip>
One more thing: I have a udev rule so my SATA disks have the same kernel name when they are put in the same physical slot (I have 4 removable SATA disk bays). If you do not have this, sdd can become sde (and so on) after a swap. /var/log/messages tells what the inserted disk is known as. Perhaps it is as simple as that.
I am not a mdraid user, but I do follow the libata mailing list. Hotswap should work from an ATA level. As Roger says, the default behavior is to assign the drive a new /dev/sdx name that will not correspond with the original name.

I don't know what the fix is, but that is almost definitely the problem.

FYI: If you're just testing and you get the rebuild to work, make sure you test a reboot, because that will change the /dev/sdx value back to its original value.

Greg
-- Greg Freemyer The Norcross Group Forensics for the 21st Century
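One rough way to check whether a re-inserted disk came back under a different name is to compare the list of sd* disks before and after the swap. The sketch below assumes sysfs is mounted at /sys and that the disks show up under /sys/block; snapshot_disks and new_disks are made-up helper names.

```shell
#!/bin/sh
# Sketch only: spot a disk that reappeared under a new kernel name.
# Both function names are hypothetical, chosen for illustration.

# Print the current sd* disk names on one line, space separated.
snapshot_disks() {
    names=
    for path in /sys/block/sd*; do
        if [ -e "$path" ]; then
            names="$names ${path##*/}"
        fi
    done
    echo $names
}

# Print every name in the "after" list that is missing from "before".
new_disks() {
    before=$1
    after=$2
    for dev in $after; do
        found=no
        for old in $before; do
            if [ "$dev" = "$old" ]; then
                found=yes
            fi
        done
        if [ "$found" = no ]; then
            echo "$dev"
        fi
    done
}

# Usage sketch:
#   before=$(snapshot_disks)
#   ... swap the drive and give the kernel time to settle ...
#   after=$(snapshot_disks)
#   new_disks "$before" "$after"
```

If this reports, say, sde where sdd used to be, then an --add of /dev/sdd2 would be pointing at a name that no longer refers to the disk in that slot.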
On Mon, 2007-06-25 at 16:26 -0400, Greg Freemyer wrote:
I am not a mdraid user, but I do follow the libata mailing list. Hotswap should work from an ATA level. As Roger says the default behavior is to assign the drive a new /dev/sdx name that will not correspond with the original name.
I don't know what the fix is, but that is almost definitely the problem. FYI: If you're just testing and you get the rebuild to work, make sure you test a reboot, because that will change the /dev/sdx value back to its original value.
My udev rules for 4 removable disks are:

SUBSYSTEM=="block", BUS=="scsi", KERNEL=="sd*[0-9]", ID=="0:0:0:0", \
    SYMLINK="cameraA_p%n"
SUBSYSTEM=="block", BUS=="scsi", KERNEL=="sd*[0-9]", ID=="1:0:0:0", \
    SYMLINK="cameraB_p%n"
SUBSYSTEM=="block", BUS=="scsi", KERNEL=="sd*[0-9]", ID=="0:0:1:0", \
    SYMLINK="cameraC_p%n"
SUBSYSTEM=="block", BUS=="scsi", KERNEL=="sd*[0-9]", ID=="1:0:1:0", \
    SYMLINK="cameraD_p%n"

Each removable bay has an unchanging ID, which you can see in /var/log/messages when a disk is found. In my case, I make a symlink from the /dev/sdXN to a consistent name I like. I decided not to change the name the kernel will use just so I did not mess up something else. All my mount commands and such use the symlink name, not the sdXN kernel name.

I would imagine the same thing could be done when the disks are to be part of a RAID. I guess the removable aspect of the RAID is that you could replace a bad disk and rebuild it without a reboot? My RAID is not built on hot-swappable hardware. Too bad...

-- Roger Oberholtzer OPQ Systems / Ramböll RST Ramböll Sverige AB Kapellgränd 7 P.O. Box 4205 SE-102 65 Stockholm, Sweden Tel: Int +46 8-615 60 20 Fax: Int +46 8-31 42 23
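For the RAID case, a stable symlink like the ones created by these rules could be resolved to whatever kernel name the slot currently has before handing it to mdadm. The sketch below is an assumption about how that might look: current_name is a made-up helper, /dev/cameraD_p2 is what the fourth rule would create for partition 2 of the disk in that bay, and readlink -f is the GNU coreutils form.

```shell
#!/bin/sh
# Sketch only: map a stable per-slot symlink to the current kernel device.
# current_name is a hypothetical helper name.

current_name() {
    # $1: a stable symlink, e.g. /dev/cameraD_p2 from a rule like the ones above
    readlink -f "$1"
}

# Usage sketch (device and array names are placeholders):
#   dev=$(current_name /dev/cameraD_p2)
#   echo "slot D, partition 2 is currently $dev"
#   mdadm --manage /dev/md0 --add "$dev"
```

Resolving the link first is mainly useful for seeing in the logs which sdXN name the slot ended up with after a swap.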
Carlos E. R. wrote:
I would say that the kernel does not support hot swapping of disks. In fact, I read some time ago that it didn't and that this was a work in progress. I don't know what the current state is.
It apparently still doesn't work. :-( Oh well, this system is just for learning about such things. As it sits right now, I've got LVM running on top of RAID, containing everything but /boot.

-- Use OpenOffice.org http://www.openoffice.org
participants (5)
- Carlos E. R.
- Greg Freemyer
- James Knott
- Roger Oberholtzer
- Rui Santos