[opensuse] Recover Linux Raid 0 with 2x2port 3Ware Raid 1 cards
Hi all,

I have an openSUSE 11.4 32-bit system installed on its own disk. Additionally attached are 2 mirrored 2-port 3Ware raid cards, striped as a Linux raid, then mounted as /home/Data (ext4). The softraid was mounted after the OS installation via fstab, because I wanted the data to live on the raid and not depend on, or be part of, the OS filesystem directly.

I've never had a softraid before. Hardware-raid recovery is painless with 3Ware, and immune to a failing OS. I have a backup of the 4 TB raid; it holds 2 TB of data and a restore takes quite a while.

Now I have lost my OS installation; the disk is unresponsive. Before I go ahead, I was wondering which way to go. I am planning to install a new system and just hope the soft raid will still be there when I partition. What should I watch out for? I have searched and read about ways to recover defective raids on TLDP.org and elsewhere, but my raid was fine; it was simply mounted as /home/Data. I will mount it after the new installation as /home/Data again. Will that work ok?

How does the raid setup work? Is the raid-partition info kept on the disks? There is no controller with a BIOS, as in hardware raid, to store the Linux softraid info. As far as I know, HW raids save the raid-partition info on the disks and on the controller. How is it with softraid?

:-) Al
On 08/31/2011 05:45 PM, LLLActive@GMX.Net wrote:
Hi all,
I have an openSUSE 11.4 32-bit system installed on its own disk. Additionally attached are 2 mirrored 2-port 3Ware raid cards, striped as a Linux raid, then mounted as /home/Data (ext4).
The softraid was mounted after the OS installation via fstab, because I wanted the data to live on the raid and not depend on, or be part of, the OS filesystem directly.
I've never had a softraid before. Hardware-raid recovery is painless with 3Ware, and immune to a failing OS. I have a backup of the 4 TB raid; it holds 2 TB of data and a restore takes quite a while.
Now I have lost my OS installation; the disk is unresponsive. Before I go ahead, I was wondering which way to go.
I am planning to install a new system and just hope the soft raid will still be there when I partition. What should I watch out for? I have searched and read about ways to recover defective raids on TLDP.org and elsewhere, but my raid was fine; it was simply mounted as /home/Data. I will mount it after the new installation as /home/Data again. Will that work ok? How does the raid setup work? Is the raid-partition info kept on the disks? There is no controller with a BIOS, as in hardware raid, to store the Linux softraid info. As far as I know, HW raids save the raid-partition info on the disks and on the controller. How is it with softraid?
:-) Al
Oh no! Seriously, I think you will be fine. Both dmraid (fake raid) and mdraid (software raid) are mature and robust enough to easily rebuild/recover from a mirrored raid failure, even if one of the disks in the array is completely cratered. If I understand your post correctly, you don't even have that problem. If I have this straight you have:

----------------
Computer OS Disk (failed)
3Ware - 2 port striped (OK) ]______ mirrored, mounted as /home/Data
3Ware - 2 port striped (OK) ]
----------------

If the OS disk is what died, you should be able to reload the OS on a new disk without even connecting the 3Ware controllers. After you are happy with the new install, just reconnect the 3Ware controllers and re-create the mirrored array using the same config as you had originally. I have never had much of a problem getting a new install to recognize and use an existing software raid array. You can even move them from machine to machine without a problem. Fakeraid (dmraid) is a bit more trouble to move from box to box due to the BIOS-dependent array setup and rebuild.

Others may need to fill in the fine points, because I have never re-created a mirrored set of striped arrays on a new install myself. However, from the standpoint of just reloading the OS, then re-creating the mirror of the 3Ware controllers and mounting it as /home/Data -- it doesn't seem that bad.

Good luck!

-- David C. Rankin, J.D., P.E.
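For what it's worth, on the new install the existing md array can usually be picked up with a few mdadm commands; a rough sketch, where the array name /dev/md0 and the exact steps are assumptions rather than anything from the original setup:

  # scan all devices for md superblocks and start any arrays found
  mdadm --assemble --scan
  # confirm the array came up
  cat /proc/mdstat
  # make the assembly persistent across reboots (openSUSE keeps this in /etc/mdadm.conf)
  mdadm --detail --scan >> /etc/mdadm.conf
  # then restore the old fstab entry, e.g.
  #   /dev/md0  /home/Data  ext4  defaults  1 2

YaST's partitioner should offer the equivalent steps graphically.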
On 9/1/11 3:36 AM, David C. Rankin wrote:
----------------
Computer OS Disk (failed)
3Ware - 2 port striped (OK) ]______ mirrored, mounted as /home/Data
3Ware - 2 port striped (OK) ]
----------------
Hi David and Joaquin,

Actually, the config is this (my previous description may be a little confusing):

----------------
Computer OS Disk (failed)
AMCC 9650SE-2LP Raid Controller (SATA II, PCI-e)
  3Ware - 2 Disks mirrored (OK) ]______|-- Linux striped the 2 mirrors
AMCC 9650SE-2LP Raid Controller (SATA II, PCI-e)
  3Ware - 2 Disks mirrored (OK) ]______|   (mounted as /home/Data)
----------------

Naturally, the controllers can only do Raid 1 or 0. I mirror 2 HDs on each controller. If one disk in either of the 2 mirrors fails, I can rebuild it with the HW raid controller. If both disks on the same controller mirror fail, I have a problem, but the chance of such a double failure on the same controller is slight. I have not had a 3Ware controller fail in the last 6-7 years. I am not sure what will happen when one controller does fail; I assume I can just replace the controller and it will pick up the config from the disks. Does anyone know if a controller replacement will pick up the mirror? The 4-port 3Ware controllers (9650SE-4LPML Raid Controller) do work if replaced, because the raid config is also on the disks, as far as I can recall; but I have not had a 3Ware fail yet, so I lack the experience.

I am considering replacing the 2x2-port cards with a 1x4-port card to have a HW-raid 10 without needing a Linux soft raid, for more safety; I just have to scratch the money together first.

I will try the openSUSE Live CD as Joaquin suggests, make another backup of the raid (to be sure), and then do a new install of the OS as David suggests. Thanks for the tips.

:-) Al
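For the record, the Linux end of a layout like that is typically defined with something along these lines; the /dev/sdb and /dev/sdc names for the two 3Ware units are assumptions, and on an array that already holds data you would of course never re-run the mkfs:

  # each device below is one 3Ware mirror unit as Linux sees it (assumed names)
  # note: --create writes fresh metadata, so don't run it casually on disks you are trying to recover
  mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/sdb /dev/sdc
  mkfs.ext4 /dev/md0
  mount /dev/md0 /home/Data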
On 9/1/2011 6:49 AM, LLLActive@GMX.Net wrote:
I am considering replacing the 2x2-port cards with a 1x4-port card to have a HW-raid 10 without needing a Linux soft raid, for more safety;
In my humble opinion software raid is much safer than hardware raid if you are worried about controller failure or disk failure.

With software raid:
1) Any compatible controller will do: on the motherboard, on an add-in card, in a new machine.
2) You can mix drive types, even drive sizes.
3) You can span controllers. (Your current setup kills the raid if one controller dies.)
4) Replacement controllers don't get harder and harder to find as time goes on.

Also, not related to failure issues:
5) Software raid improves as you upgrade the server - it's not stuck at the speed you bought it at.
6) It imposes very little load on the server (contrary to the FUD of the past).
7) It is easily configured in modern distros, especially when your machine has 4 or 6 SATA ports. YaST does it.

I've been down the 3Ware route. It's nice in that, if done correctly, your computer OS need never be aware that a raid is involved at all. But over the years I have migrated to software raid for the ease of maintenance, migration, and disk replacement. I've used raid controllers with their raid software turned off (via jumpers) just for the ports in the past, but my current servers have scads of SATA/eSATA ports, making this unnecessary. I still endeavor to put the drives of an array on different controllers, although I've never seen any measurable difference in doing so.

None of this is helpful in your current predicament, just a philosophical discussion.

--
_____________________________________
---This space for rent---
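To put a concrete command behind point 3: md can not only mirror across controllers, it can do the whole raid 10 in one array spanning both of them; a sketch with assumed device names:

  # two disks per controller; md's raid10 handles mirroring and striping in one layer
  mdadm --create /dev/md0 --level=10 --raid-devices=4 /dev/sdb /dev/sdc /dev/sdd /dev/sde
  # with the default 'near' layout adjacent devices form the mirror pairs,
  # so interleave the controllers in the device list if surviving a controller failure is the goal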
On Thu, Sep 1, 2011 at 12:29, John Andersen <jsamyth@gmail.com> wrote:
software raid for the ease of ... disk replacement.
How is that? Just the other day I replaced a failed SCSI drive in an HP server with the onboard RAID controller. You remove the failed drive while the system is running, and that is it. If it was a Linux software RAID I would have to run commands and hope they were correct.

-- Med Vennlig Hilsen, A. Helge Joakimsen
On 9/1/2011 5:14 PM, Joaquin Sosa wrote:
On Thu, Sep 1, 2011 at 12:29, John Andersen<jsamyth@gmail.com> wrote:
software raid for the ease of ... disk replacement.
How is that? Just the other day I replaced a failed SCSI drive in an HP server with the onboard RAID controller. You remove the failed drive while the system is running, and that is it.
If it was a Linux software RAID I would have to run commands and hope they were correct.
It was only easy because you were wise and planned ahead, and were lucky that your planning worked out. Meaning lucky the special spare drive wasn't lost or damaged in all the time it was sitting around, and the thing that broke was a thing you had a spare of. Did you have a spare controller? Did you have another motherboard/case that could accept this controller?

You just demonstrated how an inflexible solution can be somewhat mitigated by dint of effort and planning. Planning is wise, but that doesn't change the fact that it's an inherently inflexible solution that's waiting for you to slip up to burn you.

If you didn't happen to have any spare scsi disks on hand, most people today would not be able to lay their hands on one in less than 24 to 48 hours no matter how much they're willing to pay for overnight, Saturday delivery, special couriers etc. The only physically possible way to fix it immediately is to just already have the spares on hand, not lost, not damaged.

Conversely, a flexible solution requires active effort and an extremely improbable amount of bad luck to concoct a situation that can't be addressed somehow. You could very easily discover that your one special spare drive or spare controller was damaged in that last office move, or by some water or other mishandling that no one noticed until now, because no one even knew what that forgotten thing at the back of the drawer was for... but it's almost impossible that you can't put your hand on _some_, _any_, sort of hard drive or controller at any given time in any given place. Even in the middle of the night when the stores are closed you could rob something from any desktop. Heck, you could even go down to the 24-hour grocery store and buy 10 16G usb thumb drives to replace a dead 146G scsi disk. I wouldn't run that way one minute longer than necessary of course, but your array would be optimal and you could actually lose another scsi disk while waiting for the proper disk to get shipped.

It's the difference between one possible way to deal with a situation vs infinite possible ways to deal with a situation. No comparison. No contest.

-- bkw
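And the commands involved in a software raid disk swap are short enough to keep on a sticky note; a sketch with assumed array and device names:

  # mark the dying member failed and pull it out of the array
  mdadm /dev/md0 --fail /dev/sdb1 --remove /dev/sdb1
  # partition whatever replacement you scrounged up (if needed), then add it
  mdadm /dev/md0 --add /dev/sdc1
  # watch the rebuild
  cat /proc/mdstat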
Brian K. White wrote:
Planning is wise, but that doesn't change the fact that it's an inherently inflexible solution that's waiting for you to slip up to burn you.
If you didn't happen to have any spare scsi disks on hand, most people today would not be able to lay their hands on one in less than 24 to 48 hours no matter how much they're willing to pay for overnight,
If ordered before 17:00 and in stock, delivery usually follows the next day with regular Priority Mail. (that's from a normal web-shop available to everyone). Regardless, a hardware array will typically be configured with 1 spare drive sitting idle, so you'd be replacing the failed drive a while after your array had already recovered. No need to hurry at all.

-- Per Jessen, Zürich (18.0°C)
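For completeness, mdraid supports the same idle-spare arrangement; with assumed names:

  # a disk added to a healthy array simply sits there as a spare,
  # and is rebuilt onto automatically when a member fails
  mdadm /dev/md0 --add /dev/sdd1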
On 9/1/2011 11:38 PM, Per Jessen wrote:
Brian K. White wrote:
Planning is wise, but that doesn't change the fact that it's an inherently inflexible solution that's waiting for you to slip up to burn you.
If you didn't happen to have any spare scsi disks on hand, most people today would not be able to lay their hands on one in less than 24 to 48 hours no matter how much they're willing to pay for overnight,
If ordered before 17:00 and in stock, delivery usually follows the next day with regular Priority Mail. (that's from a normal web-shop available to everyone).
So posts the man from Zürich, a country of 15,940 square miles ;-)
Regardless, a hardware array will typically be configured with 1 spare drive sitting idle, so you'd be replacing the failed drive a while after your array had already recovered. No need to hurry at all.
Plus one on the hot spare(s).

But that raises another issue. When the customer called me several years ago because his system was down, I found that his hardware controller was reporting a degraded raid 5. Turns out the hot spare had been put into use 6 months earlier, and a second drive had failed. Since they seldom looked at the server, and never rebooted it, they got no warning of this at all.

If my Software Raid so much as hiccups, I get an email from mdadm.

If you buy all your disks for the array at the same time you have to be prepared for them to fail very close to each other.

--
_____________________________________
---This space for rent---
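For anyone wanting the same behaviour, that email comes from mdadm's monitor mode; roughly like this, where the address is obviously just an example:

  # in /etc/mdadm.conf
  MAILADDR admin@example.com

  # run the monitor as a daemon (most distros can start this for you at boot)
  mdadm --monitor --scan --daemonise
  # send a test alert once to confirm mail delivery actually works
  mdadm --monitor --scan --test --oneshot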
On Fri, Sep 2, 2011 at 16:16, John Andersen <jsamyth@gmail.com> wrote:
On 9/1/2011 11:38 PM, Per Jessen wrote:
Brian K. White wrote:
If you didn't happen to have any spare scsi disks on hand, most people
today would not be able to lay their hands on one in less than 24 to 48 hours no matter how much they're willing to pay for overnight,
If ordered before 17:00 and in stock, delivery usually follows the next day with regular Priority Mail. (that's from a normal web-shop available to everyone).
So posts the man from Zürich, a country of 15,940 square miles ;-)
Regardless, a hardware array will typically be configured with 1 spare drive sitting idle, so you'd be replacing the failed drive a while after your array had already recovered. No need to hurry at all.
Plus one on the hot spare(s).
Since they seldom looked at the server, and never rebooted it, they got no warning of this at all.
If my Software Raid so much as hiccups, I get an email from mdadm.
If you buy all your disks for the array at the same time you have to be prepared for them to fail very close to each other.
FWIW: In my scenario it's a 1U server with only 2 drive bays. It took a week to get the replacement drive, but when the drive failed I got an email alert from Nagios, a free/open-source tool for monitoring servers. It's bad practice to set up a server and not put these measures in place. If you really value your data you should have some system in place to monitor it, whether you log in every day and do the checks manually or you implement something to do it for you on an automated basis.

If you can't afford downtime, more important than what RAID you use is how you have planned for when (not if) downtime happens. Wouldn't you feel silly if you set up the best RAID in the world and suffered a data loss because someone tripped over the extension cord between the server and the battery backup?

-- Med Vennlig Hilsen, A. Helge Joakimsen
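The check itself doesn't have to be fancy. A minimal sketch of the kind of plugin that raises such an alert, written here against /proc/mdstat for md arrays; treat it as an illustration, not a stock Nagios plugin:

  #!/bin/bash
  # report CRITICAL if any md array shows a missing/failed member ('_' in the [UU] status)
  if grep -q '\[.*_.*\]' /proc/mdstat 2>/dev/null; then
      echo "CRITICAL: degraded md array"
      exit 2
  fi
  echo "OK: md arrays clean"
  exit 0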
Joaquin Sosa wrote:
If you can't afford downtime, more important than what RAID you use is how you have planned for when (not if) downtime happens. Wouldn't you feel silly if you set up the best RAID in the world and suffered a data loss because someone tripped over the extension cord between the server and the battery backup?
Poor datacentre design - workplace safety and all that.

-- Per Jessen, Zürich (20.7°C)
On 9/2/2011 4:34 PM, Joaquin Sosa wrote:
If you can't afford downtime, more important than what RAID you use is how you have planned for when (not if) downtime happens. Wouldn't you feel silly if you set up the best RAID in the world and suffered a data loss because someone tripped over the extension cord between the server and the battery backup?
Man you got that right!

-- bkw
John Andersen wrote:
On 9/1/2011 11:38 PM, Per Jessen wrote:
Brian K. White wrote:
Planning is wise, but that doesn't change the fact that it's an inherently inflexible solution that's waiting for you to slip up to burn you.
If you didn't happen to have any spare scsi disks on hand, most people today would not be able to lay their hands on one in less than 24 to 48 hours no matter how much they're willing to pay for overnight,
If ordered before 17:00 and in stock, delivery usually follows the next day with regular Priority Mail. (that's from a normal web-shop available to everyone).
So posts the man from Zürich, a country of 15,940 square miles ;-)
Point taken :-) (although the service is the same in Germany).
Regardless, a hardware array will typically be configured with 1 spare drive sitting idle, so you'd be replacing the failed drive a while after your array had already recovered. No need to hurry at all.
Plus one on the hot spare(s).
But that raises another issue. When the customer called me several years ago because his system was down, I found that his hardware controller was reporting a degraded raid 5. Turns out the hot spare had been put into use 6 months earlier, and a second drive had failed.
Since they seldom looked at the server, and never rebooted it, they got no warning of this at all.
SMART and SNMP monitoring. (or equivalent).
If you buy all your disks for the array at the same time you have to be prepared for them to fail very close to each other.
Yes, if you're planning on buying lots of disks, stage your purchases.

-- Per Jessen, Zürich (20.2°C)
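On the SMART side, smartd covers the "nobody ever looks at the server" problem; a sketch of /etc/smartd.conf entries, with an example address and an assumed pair of devices:

  # monitor all attributes, email on trouble, run a short self-test every Saturday at 03:00
  /dev/sda -a -m admin@example.com -s (S/../../6/03)
  /dev/sdb -a -m admin@example.com -s (S/../../6/03)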
John Andersen wrote:
Since they seldom looked at the server, and never rebooted it, they got no warning of this at all.
I take it they weren't using 3ware then? It happily sends emails on status changes and tells me when it's running a verify etc etc.
If my Software Raid so much as hiccups, I get an email from mdadm.
I use both 3ware and mdadm and am happy with both. I find 3ware a bit easier to drive but mdadm is more flexible.
If you buy all your disks for the array at the same time you have to be prepared for them to fail very close to each other.
One way to ameliorate this is to buy disks from more than one manufacturer, but it certainly is best to buy them at intervals.

Cheers, Dave
On 9/2/2011 2:38 AM, Per Jessen wrote:
Brian K. White wrote:
Planning is wise, but that doesn't change the fact that it's an inherently inflexible solution that's waiting for you to slip up to burn you.
If you didn't happen to have any spare scsi disks on hand, most people today would not be able to lay their hands on one in less than 24 to 48 hours no matter how much they're willing to pay for overnight,
If ordered before 17:00 and in stock, delivery usually follows the next day with regular Priority Mail. (that's from a normal web-shop available to everyone). Regardless, a hardware array will typically be configured with 1 spare drive sitting idle, so you'd be replacing the failed drive a while after your array had already recovered. No need to hurry at all.
A day is an eternity. A hot spare is both aging at the same rate as all the active drives, and is still, as I said, planning overcoming the limitations of the system, not a system that lacks the limitation in the first place. There is no getting around that simple math.

Very often, if everyone did everything they were supposed to, it's good enough. Counting on, depending on, requiring everyone, or even just yourself, always doing what they're supposed to is less than wise. And even when they do, it doesn't fix the inherent limitations.

A few months ago one of the hosting facilities where I have a rack of servers had a power issue which killed half of the sata ports in some of my machines. It's not clear from remote whether it's the backplanes or the controller cards that are fried. But it is clear, from having the on-site staff move drives around to different hot-swap bays, that the drives themselves weren't actually hurt at all. Spare drives, be they in the array as hot spares or on the shelf, wouldn't have helped at all. Not being tied to any special hardware helped. The array itself, as in the drives and the data on the drives, was actually still fine. They could all be moved to a neighboring machine that had different hardware and run.

Now go ahead and say "You're supposed to use UPSes to avoid having power issues."

-- bkw
On 9/1/2011 12:29 PM, John Andersen wrote:
On 9/1/2011 6:49 AM, LLLActive@GMX.Net wrote:
I am considering replacing the 2x2-port cards with a 1x4-port card to have a HW-raid 10 without needing a Linux soft raid, for more safety;
In my humble opinion software raid is much safer than hardware raid if you are worried about controller failure or disk failure.
With software raid:
1) Any compatible controller will do: on the motherboard, on an add-in card, in a new machine.
2) You can mix drive types, even drive sizes.
3) You can span controllers. (Your current setup kills the raid if one controller dies.)
4) Replacement controllers don't get harder and harder to find as time goes on.
Also, not related to failure issues:
5) Software raid improves as you upgrade the server - it's not stuck at the speed you bought it at.
6) It imposes very little load on the server (contrary to the FUD of the past).
7) It is easily configured in modern distros, especially when your machine has 4 or 6 SATA ports. YaST does it.
I've been down the 3Ware route. It's nice in that, if done correctly, your computer OS need never be aware that a raid is involved at all. But over the years I have migrated to software raid for the ease of maintenance, migration, and disk replacement. I've used raid controllers with their raid software turned off (via jumpers) just for the ports in the past, but my current servers have scads of SATA/eSATA ports, making this unnecessary. I still endeavor to put the drives of an array on different controllers, although I've never seen any measurable difference in doing so.
None of this is helpful in your current predicament, just a philosophical discussion.
+1 all the way.

I've used many hardware raid cards over the years on several varieties of unix. It was ok back in the old days when there was no other choice, but for years now I haven't been able to stand the hassle of dealing with hardware raid cards.

When you only have one machine, and only have one card, and it's nowhere near time to migrate or upgrade, a hardware card looks enticing. It's easy, moving day is years away, and you can afford the time to do a slow full copy to some different new hardware, since it's just your one machine and you have nothing else to do and only yourself, not a boatload of angry paying customers, to worry about. If you have the luxury of living in a well funded, or at least large and utterly homogeneous, environment -- "We're an [HP|Dell|Intel|FooXYZ] shop, we have an account with FooXYZ, and even though they don't make this FooXYZ card any more, and even though it's all proprietary and only fits in this one FooXYZ case, we have piles of 'em lying around..." -- then a hardware raid card looks enticing. If you're trying to grant the benefits of raid to an OS that can't do it in software, then a hardware raid card looks enticing.

In any other situation they're great right up until there's a problem, and then they're just the reason you're telling the customer or boss any of a dozen different excuses instead of handling the problem without even resorting to backups -- perhaps after some googling and some experimenting with small ramdisk images just to be sure before doing something with the real disks, and/or after taking dd_rescue clones of the real disks instead of even risking touching them at all. Excuses like "I can't get one of those cards sooner than 4 days from now (by the way it's $800)", or "The card and drives are ok but I don't have another motherboard handy that has the required PCI-X slot for the raid card", or "The raid card firmware doesn't provide an option to do what I know needs to happen to save 99% or 100% of the data and accept a tiny bit of corruption on a part of the disk that isn't even used by any files you wanted...", or "The boot message is ambiguous, and the card's manual doesn't even mention the topic, so we don't dare answer yes or no until we get the manufacturer's tech support on the phone, and we can't reach them right now for some reason", or "raid6 would have saved the day but they didn't have raid6 when this card was made", etc.

All of those show stoppers are not show stoppers if the array is software raid. Since it's hardware agnostic, you are free to piece together whatever solution is physically possible with any or all of the materials at hand (i.e. the local BestBuy out in the middle of nowhere, or even less than that: just the other PCs in someone's office, and your and their homes, in the middle of the night in the middle of nowhere). It's very difficult to encounter a situation that can't be met somehow when every scrap of computer tech within reach could potentially be used. You have to know a little more, but the important point is it's possible vs not-possible.

-- bkw
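On the dd_rescue point, cloning the members before experimenting costs nothing but time and scratch space; a sketch with assumed device names and paths:

  # image each member disk somewhere with enough room before touching the real thing
  dd_rescue /dev/sdb /backup/sdb.img
  dd_rescue /dev/sdc /backup/sdc.img
  # experiments can then be run against loop devices set up over the images
  losetup /dev/loop0 /backup/sdb.img
  losetup /dev/loop1 /backup/sdc.img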
Boot the openSUSE 11.4 GNOME live CD and the disk utility should see the RAID array and be able to start it and mount the resulting volume.

Why such an odd arrangement? It would be less of a headache to use RAID 0+1 native to your RAID card. The issue is that your hardware RAID cards are worthless in your configuration: say you used desktop-grade hardware in your system and it develops bad RAM, the MD-RAID driver will write the corrupt data to the array.
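From the live CD the same thing can also be done by hand, which allows a cautious read-only first look; the device names are assumptions:

  # assemble the stripe from its two members and mount it read-only
  mdadm --assemble /dev/md0 /dev/sdb /dev/sdc
  mount -o ro /dev/md0 /mnt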
On 9/1/11 4:21 AM, Joaquin Sosa wrote:
Boot the openSUSE 11.4 GNOME live CD and the disk utility should see the RAID array and be able to start it and mount the resulting volume.
Why such an odd arrangement? It would be less of a headache to use RAID 0+1 native to your RAID card.

It is a 2-port card, i.e. it can only do Raid 0 or 1.

The issue is that your hardware RAID cards are worthless in your configuration: say you used desktop-grade hardware in your system and it develops bad RAM, the MD-RAID driver will write the corrupt data to the array.

... ok, but how will bad RAM not do the same to any other setup?
I was a bit nervous about the SW raid. With HW raid, you open the 3Ware BIOS, rebuild, and start the OS. Painless. I had not used SW raids before.

Today I succeeded in recovering the SW-raid stripe. In the YaST Partitioner I just redefined the two HW mirrors as a Raid 0, without any formatting or mount point. When I first looked at the filesystem with the Gnome and KDE file managers, there was nothing on the /home/Data volume when I mounted it. In Gnome Nautilus, though, there was a new entry on the left for an unmounted volume, named exactly as I had named it in the previous installation; I felt a heartbeat skip. It wanted some authorisation before it could mount the volume, and that failed because I used the original root password, which it did not recognise. I went to the CLI terminal, did su -, looked, and it was all there. I made another full backup to a USB HD, just in case. Now I can try some alternatives.

Phew... I was relieved. It seems the SW raid also keeps the raid info on the disks.

Al
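The raid info is indeed on the disks: mdraid writes a superblock onto each member device, so the array definition travels with the disks themselves. It can be inspected directly; the device name below is an assumption:

  # show the md superblock stored on one member of the stripe
  mdadm --examine /dev/sdb
  # and the assembled array's own view of itself
  mdadm --detail /dev/md0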
participants (7)
- Brian K. White
- Dave Howorth
- David C. Rankin
- Joaquin Sosa
- John Andersen
- LLLActive@GMX.Net
- Per Jessen