Help with disk integrity and RAID-1 please
Please forgive me if this shows up twice; I tried to send once but it has taken an improbable time and still not shown up, so it's time to try again.

Following a premature (3 months) disk failure, I created a RAID 1 array. I understand the basic idea of RAID, but have never used the tools to do it before (not on Linux, not on anything). As I built it, I knew there were many things I didn't know about, but hoped I could learn slowly in "spare" time. For example: does RAID move bad blocks on its elements, or does it just dump the doubtful device? If RAID finds a disk problem, does it tell me about it, and if so how? If RAID rejects a device, particularly if it's for "transient" reasons like a single bad sector, can I re-prepare the disk manually and get it back into service? If I have to replace a failed disk, how do I do that?

Anyway, these questions are still unanswered (after about 3 months...) and guess what: I'm pretty sure I have a drive failure. It makes odd noises, like the other one did :( I poked around, managed to work out the existence of the mdadm command, and found this:

# mdadm --detail /dev/md0
/dev/md0:
        Version : 00.90.01
  Creation Time : Thu Sep  1 05:49:50 2005
     Raid Level : raid1
     Array Size : 156280192 (149.04 GiB 160.03 GB)
    Device Size : 156280192 (149.04 GiB 160.03 GB)
   Raid Devices : 2
  Total Devices : 1
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Sat Oct  8 09:38:25 2005
          State : clean, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0

           UUID : b829bc95:3f42a40e:5a8be8f6:4fadb25c
         Events : 0.1345011

    Number   Major   Minor   RaidDevice   State
       0        0       0        -        removed
       1       34       1        1        active sync   /dev/hdg1

I don't really know what I'm looking at, but the output looks bad, right? I also found this in dmesg's output:

md: Autodetecting RAID arrays.
md: autorun ...
md: considering hdg1 ...
md:  adding hdg1 ...
md:  adding hde1 ...
md: created md0
md: bind<hde1>
md: bind<hdg1>
md: running: <hdg1><hde1>
md: kicking non-fresh hde1 from array!
md: unbind<hde1>
md: export_rdev(hde1)
raid1: raid set md0 active with 1 out of 2 mirrors
md: ... autorun DONE.

Which also looks bad, don't you think? So, can anyone please tell me in the short term:

1) Is hde indeed out of the array as it appears?
2) How can I determine what the failure is? (Is it "a few" bad sectors, too many to want to reuse the drive, or a more complete failure?)
3) Can I reformat, move bad sectors, clean up the drive (if it's a minor failure) and get it back into service, and if so how?
4) If I elect/have to replace the drive, what do I do to make it take up its ordained place in the md array?

Then in the longer term, where should I be looking for the docs so I can know this for myself in future?

Many thanks,
Simon

"You can tell whether a man is clever by his answers. You can tell whether a man is wise by his questions." Naguib Mahfouz
On Sat, 8 Oct 2005 09:26:28 -0700 (PDT), you wrote:
So, can anyone please tell me in the short term:
1) Is hde indeed out of the array as it appears?
Yes.
2) How can I determine what the failure is? (is it "a few" bad sectors, too many to want to reuse the drive, or a more complete failure)
There is no such thing as a 'partial drive failure' on an IDE drive. Bad sector marking/remapping is handled by the drive's on-board electronics: if the alternate sector map is full, the drive is a short time away from complete failure. Since you describe odd noises, you don't even need to worry about that - it's junk.
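If you want to gauge how far gone it is anyway, smartctl (from the smartmontools package) will show you the drive's own counters. A rough sketch, assuming the suspect disk is /dev/hde (substitute your own device):

smartctl -H /dev/hde        # overall health verdict from the drive itself
smartctl -A /dev/hde        # attribute table; watch Reallocated_Sector_Ct (ID 5)
smartctl -l error /dev/hde  # the error log the drive keeps on board

If the reallocated or pending sector counts keep climbing between runs, the spare-sector pool is being used up and the drive is on its way out.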
3) Can I reformat, move bad sectors, clean up the drive (if it's a minor failure) and get it back into service, and if so how?
See #2 above.
4) If I elect/have to replace the drive, what do I do to make it take up its ordained place in the md array?
Power down the system, replace the drive, power up the system. The only real recovery headache with a RAID is if the boot drive is the one that failed... In that case, you need to have made certain that ALL the disks are bootable (lilo can do that, I don't know about grub), or else have an alternate boot method.
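The mdadm side of it, roughly (a sketch only - adjust the names; this assumes the array is /dev/md0 and the dud member is /dev/hde1):

mdadm /dev/md0 --fail /dev/hde1    # mark it failed, if md still thinks it is active
mdadm /dev/md0 --remove /dev/hde1  # drop it from the array
                                   # ...power down, swap the disk, partition it, power up...
mdadm /dev/md0 --add /dev/hde1     # add the new partition; the mirror resyncs in the background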
Then in the longer term, where should I be looking for the docs so I can know this for myself in future?
All of the docs on the Linux software RAID system that I've seen are lousy... The code is still evolving, and it seems to be written by people who aren't into docs. O'Reilly has 'Managing RAID on Linux', which isn't too bad but IS inaccurate in places. The way I did it was to put together a junk system and try things, meanwhile reading everything Google found on 'linux raid'. A real pain, but it's your data...

Mike-

--
Mornings: Evolution in action. Only the grumpy will survive.
Aha, interesting (and rather depressing!), thanks.
I think I'll have to/elect to rebuild from scratch. I was contemplating
an upgrade to SuSE 10 (if I can ever get the images down :) I also have
my swap space on the bad drive, and it is, as you might have guessed,
my boot drive! So, all in all, I think I'll take the heavyweight
route.
Actually, I've had some other odd behavior that might be explained if
my swap partition were failing :(
Thanks for the info.
Cheers,
Simon
Following another post pointing out the existence of the smartctl test
interface, it looks as if this drive of mine might actually be OK. Is
there any possibility that I screwed up the configuration and, in
effect, dropped the other drive from the RAID array, rather than it
being kicked out for errors? If I did, how might I get it back? Can I
just zero its contents and add it to the array again? And any pointer
as to the command(s) to re-add it? (I know how to use dd to zero it.)
TIA,
Simon
Silly me: when I rub the sleep out of my eyes and run a long test, no,
the disk is indeed dying. It reported happy before I told it to do any
explicit tests, and again after a short test, but part way through a
long test it's complaining of seek errors and says it has only a day
to live.
Pretty cool utility the SMART stuff though! Ideal for managing an array
and preemptively replacing stuff before it's too late.
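For anyone following along, what I ran was roughly this (my suspect disk is /dev/hde - substitute your own):

smartctl -H /dev/hde           # the overall verdict
smartctl -t short /dev/hde     # quick test, a couple of minutes
smartctl -t long /dev/hde      # full surface scan, takes hours
smartctl -l selftest /dev/hde  # the results, once a test has finished

It was the long test that finally turned up the seek errors.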
Thanks,
Simon
The Sunday 2005-10-09 at 22:55 -0700, Simon Roberts wrote:
Pretty cool utility the SMART stuff though! Ideal for managing an array and preemptively replacing stuff before it's too late.
¡Right! It doesn't catch everything, of course, but it does a good job.

You can also start the daemon (rcsmartd start), which will do periodic tests - after you configure it, which is not very clear/easy.

You might use that failing disk (after safeguarding your data) to run SMART tests and see how the daemon warns you.

PS: it would be appreciated if you trimmed your quotes. This email was over 10 KB, most of it useless quoted text...

--
Cheers,
Carlos Robinson
Carlos E. R. wrote:
¡Right! It doesn't catch everything, of course, but it does a good job. You can also start the daemon (rcsmartd start), which will do periodic tests - after you configure it, wich is not very clear/easy.
Actually, it's not too bad: just add "DEVICESCAN -m <email> -n standby" and you've got a good start. (In fact, that's all I ever do.) Fill in an appropriate email address.

Per Jessen
http://www.spamchek.co.uk/ - managed anti-spam and anti-virus solution.
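Spelled out, that is a single line in /etc/smartd.conf, something like this (the address is a placeholder, obviously):

DEVICESCAN -m admin@example.com -n standby

DEVICESCAN makes smartd probe every disk it can find, -m says where to mail warnings, and -n standby skips the check while a disk is spun down so it doesn't get woken up just to be polled. Restart the daemon afterwards (rcsmartd restart) so it rereads the file.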
The Tuesday 2005-10-11 at 14:30 +0200, Per Jessen wrote:
Carlos E. R. wrote:
¡Right! It doesn't catch everything, of course, but it does a good job. You can also start the daemon (rcsmartd start), which will do periodic tests - after you configure it, wich is not very clear/easy.
Actually, it's not too bad: just add "DEVICESCAN -m <email> -n standby" and you've got a good start. (in fact, that's all I ever do). Fill in an appropriate email-address.
/dev/hda -H -l error -l selftest -s (S/../../2|4|6|7/22|L/../../5/23)
/dev/hdb -H -l error -l selftest -s (S/../../2|4|6|7/22|L/../../5/23)

Haven't got around to the mailing thing yet... I'd appreciate it if YaST included a module for this, or at least an /etc/sysconfig/smart file.

--
Cheers,
Carlos Robinson
* Carlos E. R.
/dev/hda -H -l error -l selftest -s (S/../../2|4|6|7/22|L/../../5/23)
/dev/hdb -H -l error -l selftest -s (S/../../2|4|6|7/22|L/../../5/23)
Haven't got around to the mailing thing yet...
"-m <preferred address>" will provide the mailing note: include on *each* config line /dev/hda -a -m root -o on -S on -s (S/../.././02|L/../../6/03) -M test /dev/hda -H -l error -l selftest -m root -t -I 194 -- Patrick Shanahan Registered Linux User #207535 http://wahoo.no-ip.org @ http://counter.li.org HOG # US1244711 Photo Album: http://wahoo.no-ip.org/gallery2
The Wednesday 2005-10-12 at 12:06 -0500, Patrick Shanahan wrote:
/dev/hda -H -l error -l selftest -s (S/../../2|4|6|7/22|L/../../5/23)
/dev/hdb -H -l error -l selftest -s (S/../../2|4|6|7/22|L/../../5/23)
Haven't got around to the mailing thing yet...
"-m <preferred address>"
will provide the mailing note: include on *each* config line
/dev/hda -a -m root -o on -S on -s (S/../.././02|L/../../6/03) -M test
/dev/hda -H -l error -l selftest -m root -t -I 194
You will agree that that is cryptic :-) Let's see if I can decipher it...

  -a      Equivalent to turning on all of the following Directives: '-H' to check the
          SMART health status, '-f' to report failures of Usage (rather than Prefail)
          Attributes, '-t' to track changes in both Prefailure and Usage Attributes,
          '-l selftest' to report increases in the number of Self-Test Log errors,
          '-l error' to report increases in the number of ATA errors, '-C 197' to
          report nonzero values of the current pending sector count, and '-U 198' to
          report nonzero values of the offline pending sector count. Note that -a is
          the default for ATA devices. If none of these other Directives is given,
          then -a is assumed.

  -m ADD  Send a warning email to the email address ADD if the '-H', '-l', '-f',
          '-C', or '-O' Directives detect a failure or a new error, or if a SMART
          command to the disk fails. This Directive only works in conjunction with
          these other Directives (or with the equivalent default '-a' Directive).

  -o VALUE
          Enables or disables SMART Automatic Offline Testing when smartd starts up
          and has no further effect. The valid arguments to this Directive are on and
          off. The delay between tests is vendor-specific, but is typically four
          hours. Note that SMART Automatic Offline Testing is not part of the ATA
          Specification. Please see the smartctl -o command-line option documentation
          for further information about this feature.

  -M TYPE These Directives modify the behavior of the smartd email warnings enabled
          with the '-m' email Directive described above. These '-M' Directives only
          work in conjunction with the '-m' Directive and can not be used without it.
          test - send a single test email immediately upon smartd startup. This
          allows one to verify that email is delivered correctly.

Instead of a configuration file, it is a bunch of unreadable cryptic options. That's not what I expect to find in a normal configuration file, which is usually very verbose and understandable. Let's see the next line:

/dev/hda -H -l error -l selftest -m root -t -I 194

  -H      Check the SMART health status of the disk. If any Prefailure Attributes are
          less than or equal to their threshold values, then disk failure is
          predicted in less than 24 hours, and a message at loglevel 'LOG_CRITICAL'
          will be logged to syslog. [Please see the smartctl -H command-line option.]

  -l TYPE Reports increases in the number of errors in one of the two SMART logs. The
          valid arguments to this Directive are:
          error - report if the number of ATA errors reported in the ATA Error Log
          has increased since the last check.
          selftest - report if the number of failed tests reported in the SMART
          Self-Test Log has increased since the last check, or if the timestamp
          associated with the most recent failed test has increased. Note that such
          errors will only be logged if you run self-tests on the disk (and it fails
          a test!). Self-Tests can be run automatically by smartd: please see the
          '-s' Directive below. Self-Tests can also be run manually by using the
          '-t short' and '-t long' options of smartctl and the results of the testing
          can be observed using the smartctl '-l selftest' command-line option.

  -t      Equivalent to turning on the two previous flags '-p' and '-u'. Tracks
          changes in all device Attributes (both Prefailure and Usage). [Please see
          the smartctl -A command-line option.]

  -p      Report anytime that a Prefail Attribute has changed its value since the
          last check, 30 minutes ago. [Please see the smartctl -A command-line
          option.]

  -u      Report anytime that a Usage Attribute has changed its value since the last
          check, 30 minutes ago. [Please see the smartctl -A command-line option.]

  -I ID   Ignore device Attribute ID when tracking changes in the Attribute values.
          ID must be a decimal integer in the range from 1 to 255. This Directive
          modifies the behavior of the '-p', '-u', and '-t' tracking Directives and
          has no effect without one of them.

Oh yeah, very easy :-P

I have changed my line to (looking at yours):

/dev/hda -a -m cer -s (S/../../2|4|6|7/22|L/../../5/23) -M test

and now I'm getting idiotic log entries:

Oct 12 22:40:39 nimrodel smartd[4769]: Device: /dev/hda, SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 61 to 62
Oct 12 23:10:39 nimrodel smartd[4769]: Device: /dev/hda, SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 62 to 59

What do I care that the raw error rate has changed? That is not dangerous at all, not even important. It is trivial. Let me see...

-a = -H -f -t -l selftest -l error -C 197 -U 198
-t = -p -u

So I have to remove "-t":

/dev/hda -H -f -l selftest -l error -C 197 -U 198 -m cer -s (S/../../2|4|6|7/22|L/../../5/23) -M test

--
Cheers,
Carlos Robinson
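For what it's worth, my reading of the '-s' field (from the same man page, so double-check it): each scheduled test is matched against a string of the form T/MM/DD/d/HH, where T is the test type (S = short, L = long), MM the month, DD the day of the month, d the day of the week (1 = Monday through 7 = Sunday) and HH the hour. So a line along the lines of Patrick's example,

/dev/hda -H -f -l error -l selftest -C 197 -U 198 -m root -M test -s (S/../.././02|L/../../6/03)

would run a short test every day at 02:00 and a long test every Saturday at 03:00, and mail root if anything looks wrong. Only a sketch of my understanding, of course.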
On Saturday 08 October 2005 18:26, Simon Roberts wrote:
4) If I elect/have to replace the drive, what do I do to make it take up its ordained place in the md array?
The procedure is not complicated at all:

1) Partition the new disk as required.
2) Add the new partition to the RAID with:

   mdadm /dev/md0 -a /dev/hda1

of course using the prepared partition instead of /dev/hda1. The new RAID partition will be synced in the background as you continue to work. That's all there is to it!

Jerry
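Spelled out for the array in this thread (a sketch - it assumes the replacement disk is hde, the surviving mirror is hdg, and both disks are the same size; otherwise partition by hand):

sfdisk -d /dev/hdg | sfdisk /dev/hde  # copy the partition layout from the good disk
mdadm /dev/md0 -a /dev/hde1           # add the new partition; the resync starts immediately
cat /proc/mdstat                      # watch the rebuild progress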
participants (6)
- Carlos E. R.
- Jerry Westrick
- Michael W Cocke
- Patrick Shanahan
- Per Jessen
- Simon Roberts