[opensuse] Recover md RAID-1.
On Sat, 30 Aug 2008 08:30:52 Andrew Joakimsen wrote:

How do I recover a broken MD RAID-1, since there is no way to make a backup with easy restore? I boot the system and it says:

md: radi5 personality registered
md: raid4 personality registered
md: md0 stopped.
md: bind<sdb1>
md: bind<sda1>
md: md0: raid array is not clean -- starting background reconstruction
raid1: raid set md0 active with 2 out of 2 mirrors
md0: bitmap file is out of date (8 < 9) -- forcing full recovery
md0: bitmap file is out of date, doing full recovery
md0: bitmap initialisation failed: -5
md0: failed to create bitmap (-5)
md: peers -> run() failed...
mdadm: failed to RUN_ARRAY /dev/md0: Input/output error
mdadm: device /dev/md0 already active - cannot assemble it
Trying manual resume from /dev/sdb2
Invoking userspace resume from /dev/sdb2
boot/82-resume.userspace.sh: line 48: /proc/splash: No such file or directory
resume: libgcrypt version: 1.4.0
Trying manual resume from /dev/sdb2
Invoking in-kernel resume from /dev/sdb2
PM: starting manual resum from disk
Waiting for device /dev/md0 to appear: ok
/dev/md0: unknown volume type
invalid root filesystem -- exiting to /bin/sh
$

Before this, the system was working fine. I shut it down correctly with the 'halt' command. I plugged in a USB KVM that makes the system hang at boot time (to provide needed info for Bug # 412476). After the system hung I powered it off, unplugged the KVM and booted it again, only to get the above messages.

-- To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org For additional commands, e-mail: opensuse+help@opensuse.org
First thing I would do is to mark /dev/sdb2 as failed and remove it from the array. Then try booting the array with just /dev/sdb1 as a member. Once the array is started, reassemble the array and let it rebuild the mirror.

I am assuming that /dev/sdb2 is the partition with the problem - it may in fact be /dev/sdb1 (in which case do the above for /dev/sdb1).

Reading man mdadm may be useful.

-- 
Rodney Baker VK5ZTV rodney.baker@iinet.net.au
What is the command? mdadm /dev/md0 --fail /dev/sdb2 does not work. It says "set device faulty failed for /dev/sdb2: no such device", but cat /dev/sdb2 shows there is a device there!

I have a 2nd system with the same config. I shut it down correctly with the 'halt' command, and when I bring it back up I get the same problems due to the crappy RAID.
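A possible reason --fail reports "no such device": in the boot log at the top of the thread the kernel binds sdb1 and sda1 to md0, while /dev/sdb2 only shows up in the resume lines, so it may not be a member of the array at all. The thread does not confirm this. A minimal sketch for pulling the member names out of such a log follows; the function name bound_members is made up for illustration, and the mdadm lines are left as comments because they need root and a real array.

```shell
# Hypothetical helper: list the devices named in "md: bind<...>" kernel
# messages read from standard input, one per line, sorted and de-duplicated.
bound_members() {
    grep -o 'bind<[^>]*>' | sed -e 's/^bind<//' -e 's/>$//' | sort -u
}

# Example: dmesg | bound_members
#
# To inspect the md superblock on a candidate member, or to fail and remove
# a member of a running array (device names are placeholders):
#   mdadm --examine /dev/sdb1
#   mdadm /dev/md0 --fail /dev/sdb1 --remove /dev/sdb1
```

Note that --fail and --remove act on a running array, so they may not help while md0 refuses to start.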
http://www.roumazeilles.net/news/en/wordpress/2007/02/07/testing-raid-1-on-o...
http://lists.us.dell.com/pipermail/linux-poweredge/2003-July/008898.html
Good Luck
Marco
Andrew Joakimsen wrote:
How do I recover a broken MD RAID-1.... Since there is no way to make a backup with easy restore? I boot the system and it says:
Boot up a rescue system, and tell us what "cat /proc/mdstat" says.

A broken RAID1 usually means one broken disk, and that the array is running degraded. It is typically fixed by hot-removing the failed drive, then hot-adding a new one.

/Per Jessen, Zürich
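As a rough illustration of what to look for in /proc/mdstat: a degraded two-disk mirror normally shows a status such as [U_] instead of [UU]. The sketch below picks out such arrays; degraded_arrays is a hypothetical helper name, and the device names in the commented mdadm lines are placeholders, not taken from this system.

```shell
# Hypothetical helper: print the name of every array in an mdstat-format file
# whose member status (e.g. "[U_]") shows a missing member.
# Reads /proc/mdstat unless another file is given as the first argument.
degraded_arrays() {
    awk '
        # Remember the array name from lines like "md0 : active raid1 ..."
        /^md[^ ]* :/      { name = $1 }
        # A status made only of U and _ with at least one _ means degraded
        /\[[U_]*_[U_]*\]/ { if (name != "") print name }
    ' "${1:-/proc/mdstat}"
}

# Hot-remove and hot-add on a running array (placeholders, not run here):
#   mdadm /dev/md0 --fail /dev/sdX1 --remove /dev/sdX1
#   mdadm /dev/md0 --add /dev/sdY1
```

Calling degraded_arrays with no argument reads the live /proc/mdstat; passing a saved copy makes it easy to check output pasted from a rescue system.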
I can't get the array to rebuild:
# mdadm -S /dev/md0
md0 stopped
unbind < sda2 >
export_rdev(sda2)
unbind < sdb1 >
export_rdev(sdb1)
stopped /dev/md0
# mdadm -A /dev/md0 /dev/sda2 /dev/sdb1
RAID array is not clean -- starting background reconstruction
RAID 1: RAID set md0 active with 2 out of 2 mirrors
md0: bitmap file is out of date (8 < 9) -- forcing full recovery
md0: bitmap file is out of date, doing full recovery
md0: bitmap initialisation failed: -5
md0: failed to create bitmap (-5)
md: perf->run() failed ...
mdadm: failed to RUN_ARRAY /dev/md0: input/output error
I can force-mount /dev/sda2 or /dev/sdb1 as ext3 and I can
read the data just fine.
If the data is fine, I would assemble the array in degraded mode using just one of the two drives, then hot-add the 2nd one later.

/Per Jessen, Zürich
Exactly. It's the only choice he has right now anyway, given that he has corrupted the mirror by mounting the sda/sdb devices separately (they now hold divergent data, and trying to assemble both into a mirror would crash).

Back up the data. Start the mirror in degraded mode with one drive. Wipe the other drive. Add it to the mirror (which will do a full resync).

And then don't do it again ;-)

Regards,
Lars
-- 
Teamlead Kernel, SuSE Labs, Research and Development SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)
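The sequence described above could look roughly like the sketch below. It only prints the commands instead of running them, because they are destructive. plan_degraded_rebuild is a made-up name, and which partition to keep and which to wipe is an assumption the reader has to settle (the thread itself mentions both sda1/sdb1 and sda2/sdb1). Back up the data before running any of the printed commands.

```shell
# Hypothetical helper: print (do not run) the commands for starting a mirror
# degraded on one member and then resyncing the other.
# Usage: plan_degraded_rebuild ARRAY KEEP_MEMBER STALE_MEMBER
plan_degraded_rebuild() {
    array=$1
    keep=$2
    stale=$3
    # Stop any half-assembled array first.
    echo "mdadm --stop $array"
    # --run starts the array even though a member is missing.
    echo "mdadm --assemble --run $array $keep"
    # Erase the md superblock so the stale copy is treated as a fresh disk.
    echo "mdadm --zero-superblock $stale"
    # Adding it back triggers a full resync from the kept member.
    echo "mdadm $array --add $stale"
}
```

For example, plan_degraded_rebuild /dev/md0 /dev/sda2 /dev/sdb1 prints the four commands in order; resync progress can then be followed in /proc/mdstat. Whether the bitmap error seen earlier goes away with a single member is not something the thread confirms.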
My solution was:
* Reinstall the OS on the first drive (screw this md raid stuff... no, --force does not work... obviously md RAID is worthless)
* # mount -o ro /dev/sda2 /old-sys -t ext3
* Copy data, everything's intact.
On Mon, Sep 1, 2008 at 8:31 AM, Lars Marowsky-Bree wrote:
And then don't do it again ;-)
This only happened because I was trying to get data for a bug report, No. 412476, so are you saying I should stop reporting bugs?
Blaming the pen when the writing is bad, huh? There's nothing wrong with MD RAID at all, it works very well indeed.

/Per Jessen, Zürich
While I can understand your frustrations, Andrew, be careful you do not throw the baby out with the bathwater. I have had moments with disk issues when using MD raid. Such situations occasionally involve increasing blood pressure, high anxiety, large amounts of panic etc. Having made mistakes in the past in such situations, I now make sure a clear step-by-step checklist is available for our servers using MD raid.

In closing, let me add that MD raid is very stable, useful, and cost-effective. I have numerous production systems where I have physical disks -> MD raid -> LVM -> OS file system. Works great, allows much flexibility, and keeps on humming.

-- 
--Moby
----- Original Message -----
From: "Andrew Joakimsen"
Solution:
* Reinstall OS on first drive (screw this md raid stuff... no, --force does not work... obviously md RAID is worthless)
Funny, it works great for me, and I have run into hairier problems than the one you described: for instance, raid0 arrays with one or more bad drives and raid5 arrays with more than one bad drive. I was able to reassemble and mount them at least well enough to recover data.

Sometimes that meant actually destroying and re-creating the raid metadata to make the payload data _appear_ uncorrupt again, and then accepting that some of the data was in fact corrupt, or maybe it wasn't. In one case the disk had a physically bad spot that was not used by any files, but because it sat within an array and within a filesystem, both were broken. Cloning the disk with dd_rescue onto a new disk that was physically all good allowed me to re-create a new raid structure and repair the filesystem while preserving the files.

I could not have done that purely from the info in the man page for mdadm, and no easy magic front-end exists either that I know of (unless you count paying a consultant). But googling for other people's experiences, and mostly a lot of experimentation on test disks and test data, showed me in more detail what some of the things that are mentioned but not explained in the man pages really mean, and what exactly certain commands and actions would do in certain situations.

After *that* I was able to make md raid do not only anything I wanted, but even more than what the expensive and easy-to-use hardware raid cards can do (except for the one unavoidable hardware feature of making an array appear to be a plain disk to the motherboard BIOS).

Rather the opposite of worthless. For starters, it's worth far more than the time and effort it takes to learn how to use it. So, if you don't want to put in that time learning how to use something, then yes indeed you should not use it.

Brian K. White brian@aljex.com http://www.myspace.com/KEYofR
Well said, "you gotta pay the price of admission -- if you want a ticket to ride"

-- 
David C. Rankin, J.D., P.E. Rankin Law Firm, PLLC 510 Ochiltree Street Nacogdoches, Texas 75961 Telephone: (936) 715-9333 Facsimile: (936) 715-9339 www.rankinlawfirm.com
participants (8)
- Andrew Joakimsen
- Brian K. White
- David C. Rankin
- Lars Marowsky-Bree
- Marco Palominos
- Moby
- Per Jessen
- Rodney Baker