[opensuse] Failed RAID Please Help
Hi ! I've got a Linux server with a failed 2-disk software RAID (and valuable data). fsck.ext3 tries to check it on startup but fails with "Unrecovered Read Error - Auto reallocate failed" errors. Looks like some blocks have gone bad.

fstab entries read:

/dev/md0 /home ext3 acl, user_xattrs
/dev/sda1 /data2 ext3 defaults 1 1
/dev/sda2 /data3 ext3 defaults 1 1

First of all, I could not figure out: is this software RAID 0 or RAID 1? (It is not mountable, so I cannot check; "md0" refers to the RAID device number, not to its level.) If this is RAID 1, I can simply remove the HDs one by one and eliminate the one with bad blocks. If this is RAID 0, the situation seems to be worse. Anyone have any idea how to recover it?

Now I am running "fsck.ext3 -p -v -c /dev/md0", but it seems it cannot eliminate the bad blocks. Anyone have any idea what to do next?

Thanks in advance for any suggestion(s)
Andrei Verovski (aka MacGuru) wrote:
Hi !
Hi Andrei,
I've got a Linux server with a failed 2-disk software RAID (and valuable data). fsck.ext3 tries to check it on startup but fails with "Unrecovered Read Error - Auto reallocate failed" errors. Looks like some blocks have gone bad.
fstab entries read:
/dev/md0 /home ext3 acl, user_xattrs
/dev/sda1 /data2 ext3 defaults 1 1
/dev/sda2 /data3 ext3 defaults 1 1
First of all, I could not figure out: is this software RAID 0 or RAID 1?
Execute 'cat /proc/mdstat'. That will give you intelligible information about it.
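For illustration (sample output, not from the thread; the member partitions and size are made up), a healthy two-disk RAID1 reports its level right on the md0 line:

  # cat /proc/mdstat
  Personalities : [raid1]
  md0 : active raid1 sdb3[1] sda3[0]
        488283136 blocks [2/2] [UU]

  unused devices: <none>

"[UU]" means both members are up; a failed member shows as "_" instead of "U".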
(It is not mountable, so I cannot check; "md0" refers to the RAID device number, not to its level.) If this is RAID 1, I can simply remove the HDs one by one and eliminate the one with bad blocks. If this is RAID 0, the situation seems to be worse.
Let's hope it is a RAID1, but I think that if it were RAID1, the failing HD would probably already have been kicked out of the array. Hope it's a RAID1, and let's hope I'm wrong...
Anyone have any idea how to recover it?
Well, first of all, cease any activity on any of the HDs that contain the array. DO NOT EVER run fsck on top of a failing HD. It can damage your precious files even further. Do the following (others may suggest another method, of course):

- Boot with the openSUSE 11.0 rescue system found on the install DVD/CD.
- Try to assemble all your arrays with mdadm. Check first whether they are already assembled at boot.
- Mount your RAID filesystem with the read-only attribute, like 'mount -t ext3 -o ro /dev/md0 /mnt'.
- cd into /mnt.
- tar the entire filesystem to another HD, like 'tar --preserve -zcvf destinationdir/destinationfile.tar.gz * > destinationdir/destinationfile.log.txt'.

Now, if you have unrecoverable errors, tar will report that the files containing the errors were padded with zeros. Check the log file to see which were the unlucky ones. At that point you have the array filesystem in a gzipped tar archive. Almost all files should be binary-identical to those on the array filesystem; the exceptions are the files tar padded with zeros. Check whether the padded files are expendable. If so, you just got lucky. If not, another approach is needed. Just try the few steps I mentioned and report back the results.
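A minimal sketch of that sequence from the rescue system, assuming the array is /dev/md0 built from /dev/sda3 and /dev/sdb3 (hypothetical member partitions) and that a healthy disk is mounted at /backup:

  # Assemble the array from its members (skip if it auto-assembled at boot)
  mdadm --assemble /dev/md0 /dev/sda3 /dev/sdb3

  # Mount read-only so nothing writes to the failing disk
  mount -t ext3 -o ro /dev/md0 /mnt
  cd /mnt

  # Archive everything to the healthy disk; tar's warnings about
  # zero-padded files go to stderr, hence the separate .err file
  tar --preserve -zcvf /backup/home.tar.gz . > /backup/home.log.txt 2> /backup/home.err.txt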
Now I am running "fsck.ext3 -p -v -c /dev/md0", but it seems it cannot eliminate the bad blocks.
fsck.ext3 will not correct/eliminate bad blocks. When an "Auto reallocate failed" error is reported, it usually means that the HD was not able to relocate those sectors to clean/safe ones. I guess others can explain it better or correct me: when a HD detects that a specific sector cannot be read, it retries the read a few times. If a retry succeeds, it copies the data to a free (never used) spare sector and remaps the internal sector map so that the bad sector never gets used again. The only thing fsck.ext3 can do is force more relocation attempts, but the result can be worse, and other files/sectors can be damaged.
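As an aside (a sketch, not from the thread): if smartmontools is installed, the drive's own view of its remapping state can be read directly. Attribute names vary by vendor, so treat the grep patterns as assumptions:

  # Reallocated_Sector_Ct  = sectors already remapped to spares
  # Current_Pending_Sector = unreadable sectors still waiting for a remap
  smartctl -a /dev/sda | grep -i -e reallocated -e pending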
Anyone have any idea what to do next?
Seems you have a few hours of work. Good luck...
Thanks in advance for any suggestion(s)
-- Rui Santos http://www.ruisantos.com/ Veni, vidi, Linux!
On Wed, Jul 23, 2008 at 9:03 AM, Andrei Verovski (aka MacGuru) wrote:
Hi !
I've got a Linux server with a failed 2-disk software RAID (and valuable data). fsck.ext3 tries to check it on startup but fails with "Unrecovered Read Error - Auto reallocate failed" errors. Looks like some blocks have gone bad.
fstab entries read:
/dev/md0 /home ext3 acl, user_xattrs
/dev/sda1 /data2 ext3 defaults 1 1
/dev/sda2 /data3 ext3 defaults 1 1
First of all, I could not figure out: is this software RAID 0 or RAID 1? (It is not mountable, so I cannot check; "md0" refers to the RAID device number, not to its level.) If this is RAID 1, I can simply remove the HDs one by one and eliminate the one with bad blocks.
Yes, if it is indeed software raid, you can simply unpower one of the drives and see if you can mount the other directly. If so, recover the data and rebuild. If not, try the other drive.

If it's raid0 you have bigger problems, about the same problems as if you had used LVM and skipped raid altogether, but even given the lack of redundancy, LVM makes more sense than raid0 on Linux. So I'm guessing no sane person would use raid0 just to concatenate drives on Linux, and you probably don't have raid0.

-- ----------JSA--------- Sig line deleted for the humor impaired.
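This works because old-style (0.90) md metadata lives at the end of the device, so each RAID1 member carries a plain filesystem image at its start. A sketch of the direct mount, assuming the surviving member is /dev/sdb1 (a hypothetical name):

  # With one drive unpowered, try the surviving member read-only
  mount -t ext3 -o ro /dev/sdb1 /mnt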
On Thu, 24 Jul 2008 03:30:54 John Andersen wrote:
[...snip...]
If it's raid0 you have bigger problems, about the same problems as if you had used LVM and skipped raid altogether, but even given the lack of redundancy, LVM makes more sense than raid0 on Linux. So I'm guessing no sane person would use raid0 just to concatenate drives on Linux, and you probably don't have raid0.
Hmmm; the last time I saw him, my doctor said he thought I was still sane, yet I'm using raid0 for exactly that purpose...

My previous experience with LVM was that it was a PITA to set up, and then it got corrupted due to a power outage. As a result, /home was completely hosed :-(. I learned from that - I won't use LVM again. /home is now on a raid1 array, with nightly backups to an external drive, and non-critical data (e.g. stuff downloaded from the net) goes onto a raid0 array that I used to concat three smaller partitions that were previously used for other purposes.

Regards,

-- =================================================== Rodney Baker VK5ZTV rodney.baker@iinet.net.au =================================================== "All flesh is grass" -- Isaiah Smoke a friend today.
On Wed, Jul 23, 2008 at 3:47 PM, Rodney Baker wrote:
On Thu, 24 Jul 2008 03:30:54 John Andersen wrote:
[...snip...]
If it's raid0 you have bigger problems, about the same problems as if you had used LVM and skipped raid altogether, but even given the lack of redundancy, LVM makes more sense than raid0 on Linux. So I'm guessing no sane person would use raid0 just to concatenate drives on Linux, and you probably don't have raid0.
Hmmm; the last time I saw him, my doctor said he thought I was still sane, yet I'm using raid0 for exactly that purpose...
My previous experience with LVM was that it was a PITA to set up, and then it got corrupted due to a power outage. As a result, /home was completely hosed :-(.
I learned from that - I won't use LVM again. /home is now on a raid1 array, with nightly backups to an external drive, and non-critical data (e.g. stuff downloaded from the net) goes onto a raid0 array that I used to concat three smaller partitions that
Don't assume from the fact that you have not YET had a failure on raid0 that it is any safer than LVM. It's about the same risk: loss of any one of the partitions may cause loss of ALL data.

Depending on what filesystem you format the raid0 with, it could be really serious to have just a couple of sectors go bad.

Raid0 composed of 3 drives TRIPLES your chance of loss, because a fault on any ONE drive may render the whole thing broken. If you had a 1 in 10000 chance of a drive failure previously, you now have roughly a 3 in 10000 chance.

Really, if I had three drives of approximately the same size, I would accept 2 drives' worth of storage and sacrifice the other drive to the gods of Raid5.

-- ----------JSA--------- There are 10 kinds of people in this world, those that can read binary and those that can't.
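A sketch of what that would look like, assuming three spare partitions /dev/sda5, /dev/sdb5 and /dev/sdc5 (hypothetical names) of similar size:

  # Three-member RAID5: capacity of two members, survives any single failure
  mdadm --create /dev/md1 --level=5 --raid-devices=3 /dev/sda5 /dev/sdb5 /dev/sdc5
  mkfs.ext3 /dev/md1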
----- Original Message -----
From: "John Andersen"
On Wed, Jul 23, 2008 at 3:47 PM, Rodney Baker wrote:
On Thu, 24 Jul 2008 03:30:54 John Andersen wrote:
[...snip...]
If it's raid0 you have bigger problems, about the same problems as if you had used LVM and skipped raid altogether, but even given the lack of redundancy, LVM makes more sense than raid0 on Linux. So I'm guessing no sane person would use raid0 just to concatenate drives on Linux, and you probably don't have raid0.
Hmmm; the last time I saw him, my doctor said he thought I was still sane, yet I'm using raid0 for exactly that purpose...
My previous experience with LVM was that it was a PITA to set up, and then it got corrupted due to a power outage. As a result, /home was completely hosed :-(.
I learned from that - I won't use LVM again. /home is now on a raid1 array, with nightly backups to an external drive, and non-critical data (e.g. stuff downloaded from the net) goes onto a raid0 array that I used to concat three smaller partitions that
Don't assume from the fact that you have not YET had a failure on raid0 that it is any safer than LVM. It's about the same risk: loss of any one of the partitions may cause loss of ALL data.
Depending on what filesystem you format the raid0 with, it could be really serious to have just a couple of sectors go bad.
Raid0 composed of 3 drives TRIPLES your chance of loss, because a fault on any ONE drive may render the whole thing broken. If you had a 1 in 10000 chance of a drive failure previously, you now have roughly a 3 in 10000 chance.
Indeed. Further, I think a more accurate and scarier way to represent it is: if the MTBF of one drive is 600,000 hours, then the MTBF of a 3-drive raid0 is only 200,000 hours. (600k is a typical estimate for commodity sata drives.) Worse, commodity sata drives are only rated for a duty cycle of about 30%. So, if you are running them 24/7 instead of 8/7, the individual MTBF drops to merely 200,000 hours, and the MTBF for the array drops to merely 66,666. The lifetime of the array is then only a little better than 10% of the nominal/advertised lifetime of a drive.

And, on top of all that, remember the M in MTBF: MEAN time before failure. That 66,666-hour estimate is the average, so half of all such arrays will die even sooner, much sooner. 7.6 years sounds like a long time, but that's total drive failure; data corruption happens long before that.

I don't know where they get those huge MTBF estimates anyway. I see drives fail all the time in as little as a year. Some last 10 years, true, but many last 1, 2, or 3. If your power conditioning, air temperature and cleanliness aren't all *perfect*, that surely drops all the numbers way down too. Running hot and suffering power fluctuations and surges, both on the power connector and on the data connector, definitely kills drives early, and what most people have in their homes is pretty bad power, pretty dusty air, and neither cool enough air nor enough airflow. Those ridiculously long MTBF estimates are probably simply what's required just to make a drive last a year or so in normal conditions.

Don't bet that your UPS does any power conditioning either. The cheap ones mostly don't. They are simply switches, and as long as there is power available from the wall, you are directly connected to the wall. Maybe there is a little surge absorption in play, like what a cheap power strip has, which is just about worthless for the purposes of this topic. Its value is that maybe you don't lose your whole room full of hardware when lightning hits your circuit. It does just about nothing for the 24/7 general dirtiness of most wall power, which gradually kills hardware a lot sooner than if the power were perfect 24/7 over the same period of time.

I'm seeing one out of ten drives die within 3 years even _in_ a perfectly controlled and protected environment: consistent low temperature, good strong airflow over the drives, 100% power-conditioning UPSes, a closed room (no constant influx of new dust) so the parts all stay clean. And that's with 100%-duty-cycle, 5-year-warranty u320 scsi drives, not just commodity ide and sata drives. By "die" I also mean merely that the raid card they are connected to has marked them bad, meaning it detected a single data discrepancy. That's a far cry from total drive failure, a lot easier to happen, and it happens a lot sooner on average.

Conversely, I have seen Linux's software raid mark drives bad when there was really nothing wrong with them. Depending on the controller, I've seen dmraid mark up to 50% of drives bad when they were really all 100% ok. Those same exact drives, on the same exact motherboards & cases, in the same exact server farm/power/air temp/etc., running the same exact OS & software, but plugged into a real raid card instead of using software raid, were fine and still are to this day, under heavier load actually, since the servers in question never made it out of testing/vetting while the drives were "dying" so often, but are in full production now. And that was just using raid10 in software, not even the extra complication of raid5.
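For reference, the arithmetic behind those figures is the textbook series-system approximation (assuming independent, identical drives, not something stated in the thread): with n drives in a stripe, the array fails as soon as any one drive fails, so

\[
\mathrm{MTBF}_{\mathrm{array}} \approx \frac{\mathrm{MTBF}_{\mathrm{drive}}}{n},
\qquad
\frac{600{,}000\ \mathrm{h}}{3} = 200{,}000\ \mathrm{h},
\qquad
\frac{200{,}000\ \mathrm{h}}{3} \approx 66{,}666\ \mathrm{h} \approx 7.6\ \mathrm{years}.
\]

The first division gives the per-drive MTBF after Brian's duty-cycle derating (a 30%-duty drive run 24/7 is assumed to see roughly a third of its nominal MTBF); the second divides by three again for the 3-drive stripe.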
Raid0 has its uses, but it definitely should be used with very open eyes and the acceptance that the array will likely die and all data will be gone in as little as a year or maybe three. Just do whatever you have to do to somehow arrange to be ok with that.

-- Brian K. White brian@aljex.com http://www.myspace.com/KEYofR +++++[>+++[>+++++>+++++++<<-]<-]>>+.>.+++++.+++++++.-.[>+<---]>++. filePro BBx Linux SCO FreeBSD #callahans Satriani Filk!
On Thu, 24 Jul 2008 10:20:31 John Andersen wrote: [...snip...]
Hmmm; the last time I saw him, my doctor said he thought I was still sane, yet I'm using raid0 for exactly that purpose...
Actually, I misspoke. I'm not using Raid 0 (which is data striping, used to increase performance by spreading the I/O across multiple interfaces and taking advantage of parallel reads/writes), but linear raid, which is different: it concatenates multiple partitions into a single logical volume, à la LVM (but managed differently, as I understand it).
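For context (a sketch, not from the thread; the partition names are hypothetical), a linear array like that is created with:

  # Concatenate three unequal partitions into one logical device
  mdadm --create /dev/md1 --level=linear --raid-devices=3 \
      /dev/sda7 /dev/sda8 /dev/sdb5
  mkfs.ext3 /dev/md1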
[...] Don't assume from the fact that you have not YET had a failure on raid0 that it is any safer than LVM. It's about the same risk: loss of any one of the partitions may cause loss of ALL data.
Depending on what filesystem you format the raid0 with, it could be really serious to have just a couple of sectors go bad.
Raid0 composed of 3 drives TRIPLES your chance of loss, because a fault on any ONE drive may render the whole thing broken. If you had a 1 in 10000 chance of a drive failure previously, you now have roughly a 3 in 10000 chance.
Really, if I had three drives of approximately the same size, I would accept 2 drives' worth of storage and sacrifice the other drive to the gods of Raid5.
Unfortunately they are 3 partitions spread across 2 physical drives (2 on /dev/sda, one on /dev/sdb) that are not even approximately the same size. I just recently replaced a 150GB drive with a 640GB unit; the 250GB drive that was previously the boot drive was moved to the secondary interface. To avoid reinstalling, I simply dd'd the 250GB drive to the 640GB drive, then repartitioned/reformatted the 250GB drive. In doing so I created a new 200GB /home partition as Raid 1, mirrored on both drives.

I also relocated a couple of other partitions on the larger drive (so that I could make them bigger), but that left me with three non-contiguous partitions (1x 100GB, 1x 42GB and one around 85GB) that I figured would be much more useful as one large partition of around 220GB, hence the use of linear raid to concat them together.

Yes, I know that it is not ideal, but the data stored on there is non-critical, and if it does disappear, I'm not going to lose much sleep over it. The critical stuff is all on a mirrored volume and backed up externally. Eventually I'll replace the 250GB unit with another 640GB unit and then I'll review the partitioning strategy.

Regards,

-- =================================================== Rodney Baker VK5ZTV rodney.baker@iinet.net.au =================================================== A hypothetical paradox: What would happen in a battle between an Enterprise security team, who always get killed soon after appearing, and a squad of Imperial Stormtroopers, who can't hit the broad side of a planet? -- Tom Galloway
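A sketch of the whole-disk copy Rodney mentions above, assuming the old 250GB disk shows up as /dev/sda and the new 640GB disk as /dev/sdb (hypothetical names; double-check them first, since dd will happily overwrite the wrong disk):

  # Clone the old drive onto the new one, block for block;
  # noerror keeps going past read errors, sync pads the bad blocks
  dd if=/dev/sda of=/dev/sdb bs=64k conv=noerror,sync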
The Wednesday 2008-07-23 at 11:00 -0700, John Andersen wrote:
First of all, I could not figure out: is this software RAID 0 or RAID 1? (It is not mountable, so I cannot check; "md0" refers to the RAID device number, not to its level.) If this is RAID 1, I can simply remove the HDs one by one and eliminate the one with bad blocks.
Yes, if it is indeed software raid, you can simply unpower one of the drives and see if you can mount the other directly. If so, recover the data and rebuild. If not, try the other drive.
It should be easier to remove it by command. If the array is mounted, it shouldn't let you remove the one that is working:

  mdadm --manage --set-faulty /dev/md0 /dev/device
  mdadm /dev/md0 -r /dev/device

The logs should tell you which HD is bad. Or use:

  mdadm --detail /dev/md0

which also says whether it is raid 0, 1, or whatever. It is much more verbose than "cat /proc/mdstat".

-- Cheers, Carlos E. R.
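Putting Carlos's commands into a complete replacement cycle (a sketch; /dev/sdb1 stands in for the failing member and is a hypothetical name):

  # Mark the failing member faulty, then pull it out of the array
  mdadm --manage /dev/md0 --fail /dev/sdb1
  mdadm --manage /dev/md0 --remove /dev/sdb1

  # After swapping in and partitioning a new disk, add it back;
  # a RAID1 then resyncs automatically
  mdadm --manage /dev/md0 --add /dev/sdb1

  # Watch the rebuild progress
  cat /proc/mdstat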
Participants (6):
- Andrei Verovski (aka MacGuru)
- Brian K. White
- Carlos E. R.
- John Andersen
- Rodney Baker
- Rui Santos