Ok, RAID-1 is beginning to scare me BIG TIME!!
----- Original Message -----
From: Chris Roubekas
To: suse-linux-e@suse.com
Sent: Saturday, October 09, 2004 11:47 PM
Subject: Ok, RAID-1 is beginning to scare me BIG TIME!!
Dear friends,
I recently posted a message asking for help on why my RAID-1 shows a state
of "dirty, no-errors" when I issue mdadm --detail /dev/md0. A friend on the list
said that when he changed from reiserfs to ext3 it stopped being dirty; he had
been having the same problems I was: running reiserfsck on /dev/md0 found errors,
and running reiserfsck --fix-fixable did not fix them. Having been through the
same tunnel of problems, I thought I should follow his steps and convert the
whole of md0 to ext3, to see whether I could avoid the problems I am faced with.
Well, I thought of switching from md0 to md1 and recreating it with ext3 instead
of the reiserfs it had before.
Since my RAID-1 on md0 is 2x200GB drives and I have no other drive to back up to,
I removed one of the disks from md0, created a new md1 with mdadm, and formatted
it as ext3. Then I copied all the data from md0 to md1, and finally I removed the
second drive from md0 and added it to md1 (roughly the sequence sketched below).
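In mdadm terms, what I did was roughly the following. I am writing this from memory, so take the exact device names, mount points, and the order of the two disks with a grain of salt:

# Drop one mirror half out of the old reiserfs array
mdadm /dev/md0 --fail /dev/hdc1
mdadm /dev/md0 --remove /dev/hdc1

# Build the new array degraded on that disk, and format it ext3
mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/hdc1 missing
mkfs.ext3 /dev/md1

# Copy everything across, then hand the remaining disk to md1
mount /dev/md1 /mnt/new
cp -a /raid/. /mnt/new/
umount /raid
mdadm --stop /dev/md0
mdadm /dev/md1 --add /dev/hdb1    # this kicks off the mirror reconstruction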
After the reconstruction was over, I restarted my machine and issued
cat /proc/mdstat, which reported:
server:/ # cat /proc/mdstat
Personalities : [raid1]
read_ahead 1024 sectors
md1 : active raid1 hdc1[1] hdb1[0]
199125568 blocks [2/2] [UU]
unused devices: <none>
Which made me very happy, as after the hard-drive lights stopped glowing
things appeared just great!
But then I issued mdadm --detail /dev/md1, and look what I got:
server:/ # mdadm --detail /dev/md1
/dev/md1:
Version : 00.90.00
Creation Time : Sat Oct 9 15:44:51 2004
Raid Level : raid1
Array Size : 199125568 (189.90 GiB 203.95 GB)
Device Size : 199125568 (189.90 GiB 203.95 GB)
Raid Devices : 2
Total Devices : 3
Preferred Minor : 1
Persistence : Superblock is persistent
Update Time : Sun Oct 10 02:09:02 2004
State : dirty, no-errors
Active Devices : 2
Working Devices : 2
Failed Devices : 1
Spare Devices : 0
Number Major Minor RaidDevice State
0 3 65 0 active sync /dev/hdb1
1 22 1 1 active sync /dev/hdc1
UUID : 22a79613:35bf4980:215dbc1e:910acdea
I am totally confused by what I see here, for the following reasons:
a) How can it say Total Devices : 3 when there are only 2??
b) Why am I still seeing State : dirty, no-errors???
Then I copied a file over from hda (which holds the root filesystem; my RAID is
pure storage, nothing more, shared out via Samba) and noticed that although the
directory mounted on /dev/md1 has the file, when I mount /dev/hdb1 or /dev/hdc1
directly the file is not there.
I tried issuing "sync" to see what would happen, and nothing!! The file still
doesn't appear on the individual drives!
I am so confused, and at the same time very, very scared, since I do not know
whether the data the users put on that RAID is actually being raid-ed (if there
is such a word...).
I had a look at dmesg, and among other messages I got the following ones
reporting on my md:
md: md driver 0.90.0 MAX_MD_DEVS=256, MD_SB_DISKS=27
md: Autodetecting RAID arrays.
[events: 0000000e]
[events: 0000000e]
md: autorun ...
md: considering hdc1 ...
md: adding hdc1 ...
md: adding hdb1 ...
md: created md1
md: bind<hdc1>
md: bind<hdb1>
md: running: <hdc1><hdb1>
md: hdc1's event counter: 0000000e
md: hdb1's event counter: 0000000e
md: RAID level 1 does not need chunksize! Continuing anyway.
kmod: failed to exec /sbin/modprobe -s -k md-personality-3, errno = 2
md: personality 3 is not loaded!
md :do_md_run() returned -22
md: md1 stopped.
md: unbind<hdc1>
md: export_rdev(hdc1)
md: unbind<hdb1>
md: export_rdev(hdb1)
md: ... autorun DONE.
Then a little below this point I get the following messages:
hdd: bad special flag: 0x03
md: Autodetecting RAID arrays.
md: autorun ...
md: ... autorun DONE.
md: Autodetecting RAID arrays.
md: autorun ...
md: ... autorun DONE.
(recovery.c, 254): journal_recover: JBD: recovery, exit status 0, recovered transactions 4810 to 4908
(recovery.c, 256): journal_recover: JBD: Replayed 113 and revoked 0/1 blocks
kjournald starting. Commit interval 5 seconds
EXT3 FS 2.4-0.9.18, 14 May 2002 on ide1(22,1), internal journal
EXT3-fs: recovery complete.
EXT3-fs: mounted filesystem with ordered data mode.
(recovery.c, 254): journal_recover: JBD: recovery, exit status 0, recovered transactions 4810 to 4921
(recovery.c, 256): journal_recover: JBD: Replayed 133 and revoked 1/2 blocks
kjournald starting. Commit interval 5 seconds
EXT3 FS 2.4-0.9.18, 14 May 2002 on ide0(3,65), internal journal
EXT3-fs: recovery complete.
EXT3-fs: mounted filesystem with ordered data mode.
It looks like md is operating on the one hand, but is having a near-impossible day trying to bring in hdc1 and hdb1.... ARRRGGG!!!! This is driving me crazy!
Please, please, please try to give me a hand, as I am about to lose my mind forever!
I am running a SuSE 8.1 box with 3 hard disks: two are 200GB in RAID-1, and the third is a small 8GB drive for Linux and the root filesystem.
What do you think?? Thank you for your reply!
Chris
Hey Chris,
Well, I am not an expert... but what I can tell you is this: when a file system
is mounted, it is dirty. If you have any data in the buffer cache, which is
almost unavoidable, then the file system cannot be 100% consistent. This is why,
when you try to run fsck on a mounted/active partition, you get warnings about
potential data loss.
As for mdadm reporting three devices: have you removed md0 from raidtab (if you
had one)? Have you rebooted the machine since you mucked with the raid devices?
It's possible the 3rd device is just a shadow of your old md0.
Are you seeing any sort of file corruption or errors in your logs/dmesg? If not,
then I would think you are possibly being overly concerned about something that
really isn't an issue.
- Herman
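To check for leftovers of the old array, you could do something like this (a sketch; the config file paths are the usual locations on a SuSE box, and whether either file exists on yours is an assumption):

grep md0 /etc/raidtab /etc/mdadm.conf    # any stale definition of md0?
mdadm --examine /dev/hdb1                # which array UUID does each member's
mdadm --examine /dev/hdc1                # on-disk superblock claim?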
On Sat, 2004-10-09 at 22:56, Chris Roubekas wrote:
Then I copied a file over from hda (which holds the root filesystem; my RAID is pure storage, nothing more, shared out via Samba) and noticed that although the directory mounted on /dev/md1 has the file, when I mount /dev/hdb1 or /dev/hdc1 directly the file is not there. I tried issuing "sync" to see what would happen, and nothing!! The file still doesn't appear on the individual drives!
Hmmm, Chris, you are not supposed to mount the RAID member drives without the
RAID. If I remember right, there are some special blocks that are not recognised
when you do it that way, and you run a big risk of really messing things up...
I know that it's possible to do, but I only do it in emergency cases, when I do
not need the raid any more.... I'm not sure what you are trying to do, but I've
been using software raid since way before 8.0, and have had no problems with it...
Jerry
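For the emergency case Jerry mentions: the old 0.90-format RAID superblock sits at the end of the partition, so a RAID-1 member carries an ordinary filesystem image in front of it and can be inspected read-only. A sketch, for emergencies only (the array must be stopped first, and mounting a member read-write would make the mirrors diverge):

mdadm --stop /dev/md1                    # make sure md is not using the disks
mount -o ro /dev/hdb1 /mnt/inspect       # read-only, never read-write
umount /mnt/inspect
mdadm --assemble /dev/md1 /dev/hdb1 /dev/hdc1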
On Saturday 09 October 2004 12:56 pm, Chris Roubekas wrote:
Update Time : Sun Oct 10 02:09:02 2004
State : dirty, no-errors
Active Devices : 2
Working Devices : 2
Failed Devices : 1
Spare Devices : 0
See the above? One drive has been previously marked as failed somewhere along
the way. You can go in and remove that, and your count will go down to 2 total.
It's reporting it correctly.
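In general you would drop a failed member with mdadm's --remove; the device name below is hypothetical, and if the stale failed slot no longer corresponds to any attached partition there may be nothing to pass to it:

mdadm /dev/md1 --remove /dev/hdX1    # hdX1: whichever member is marked faulty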
a) How can it say Total Devices : 3 when there are only 2??
b) Why am I still seeing State : dirty, no-errors???
Don't worry about the dirty bit. It means that files on that drive are
potentially not all flushed to disk yet, or are currently open.
-- John Andersen
Chris Roubekas wrote:
b) Why am I still seeing State : dirty, no-errors???
Every mounted drive is "dirty". When you umount a drive, the dirty flag is
reset, so that fsck knows it was a clean shutdown and does not need to check
the filesystem on the next boot. It is working as designed.
-- BMO
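For the ext3 volume the array now carries, you can see that flag for yourself once the filesystem is unmounted. A sketch (dumpe2fs comes with e2fsprogs):

umount /dev/md1
dumpe2fs -h /dev/md1 | grep -i state
# a cleanly unmounted ext3 volume reports: Filesystem state: clean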
On Saturday, 9 October 2004 22.56, Chris Roubekas wrote:
I recently posted a message asking for help on why my RAID-1 shows a state of "dirty, no-errors" when I issue mdadm --detail /dev/md0.
This is normal. It has nothing to do with the file system; it refers to the RAID superblock. It is set to "dirty" when you create the RAID, and set to clean when the superblock is updated (or when you shut down the RAID array).
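You can read that superblock state off each member directly (a sketch; the exact wording of the State line varies between mdadm versions):

mdadm --examine /dev/hdb1 | grep -i state
# while the array is running this typically shows a dirty/active state;
# after a clean stop or shutdown of the array it should show clean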
participants (6)
- Anders Johansson
- Chris Roubekas
- Daniel Podgurski
- Herman Knief
- Jerome R. Westrick
- John Andersen