Hey Chris,

Well, I am not an expert... but what I can tell you is this. When a file system is mounted, it is dirty. If you have any data in the buffer cache, which is almost unavoidable, then the file system cannot be 100% consistent. This is why, when you try to run fsck on a mounted/active partition, you get warnings about potential data loss.

As for mdadm reporting three devices: have you removed md0 from raidtab (if you had one)? Have you rebooted the machine since you mucked with the raid devices? It's possible the third device is just a shadow of your old md0.

Are you seeing any sort of file corruption or errors in your logs/dmesg? If not, then I would think you are being overly concerned about something that really isn't an issue.

- Herman

Chris Roubekas wrote:
----- Original Message -----
From: Chris Roubekas
To: suse-linux-e@suse.com
Sent: Saturday, October 09, 2004 11:47 PM
Subject: Ok Raid-1 is beginning to scare me BIG TIME!!
Dear friends,
I recently posted a message asking for help on why my RAID-1 reports a state of "dirty, no-errors" when I issue mdadm --detail /dev/md0. A friend on the list said that when he changed from reiserfs to ext3 it stopped being dirty; he had been through the same problems I was having, where running reiserfsck on /dev/md0 found errors, and running reiserfsck --fix-fixable did not fix them. Having been through that same tunnel of problems, I thought I should follow his steps and convert the entire md0 to ext3 to see if I could avoid the problems I am faced with.
Well, I thought of switching md0 to md1, recreating it with ext3 instead of the reiserfs it had before.
Since my RAID-1 on md0 is 2x200GB drives and I have no other drive to back up to, I removed one of the disks from md0, created a new md1 with mdadm, and formatted it as ext3. Then I copied all the data from md0 to md1, and finally I removed the second drive from md0 and added it to md1.
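The migration described above can be sketched roughly as follows. This is a dry-run sketch: `run` only echoes each command, the device names are taken from the message, the mount points are examples, and the exact option spellings for a 2004-era mdadm are an assumption. The --zero-superblock step is an addition: wiping md0's old superblock before reusing the disk is one way to avoid leaving a "shadow" device behind.

```shell
#!/bin/sh
# Dry-run sketch of the md0 (reiserfs) -> md1 (ext3) migration.
# `run` only echoes; drop the echo to perform the steps for real.
run() { echo "+ $*"; }

run mdadm /dev/md0 --fail /dev/hdc1 --remove /dev/hdc1            # take one mirror out of md0
run mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/hdc1 missing  # degraded md1
run mkfs.ext3 /dev/md1                                            # new ext3 filesystem
run mount /dev/md1 /mnt/new                                       # mount point is an example
run cp -a /storage/. /mnt/new/                                    # copy everything across
run mdadm --stop /dev/md0                                         # retire the old array
run mdadm --zero-superblock /dev/hdb1  # erase md0's superblock first (if your mdadm has this option)
run mdadm /dev/md1 --add /dev/hdb1                                # second mirror; resync starts
```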
After the reconstruction was over, I restarted my machine and issued cat /proc/mdstat, which reported:
server:/ # cat /proc/mdstat
Personalities : [raid1]
read_ahead 1024 sectors
md1 : active raid1 hdc1[1] hdb1[0]
      199125568 blocks [2/2] [UU]
unused devices: <none>
Which made me very happy, as I saw that after the hard-drive lights stopped glowing things appeared just great! But then I issued mdadm --detail /dev/md1, and see what I get:
server:/ # mdadm --detail /dev/md1
/dev/md1:
        Version : 00.90.00
  Creation Time : Sat Oct  9 15:44:51 2004
     Raid Level : raid1
     Array Size : 199125568 (189.90 GiB 203.95 GB)
    Device Size : 199125568 (189.90 GiB 203.95 GB)
   Raid Devices : 2
  Total Devices : 3
Preferred Minor : 1
    Persistence : Superblock is persistent

    Update Time : Sun Oct 10 02:09:02 2004
          State : dirty, no-errors
 Active Devices : 2
Working Devices : 2
 Failed Devices : 1
  Spare Devices : 0

    Number   Major   Minor   RaidDevice State
       0       3       65        0      active sync   /dev/hdb1
       1      22        1        1      active sync   /dev/hdc1
           UUID : 22a79613:35bf4980:215dbc1e:910acdea
I am totally confused by what I see here for the following reasons:
a) How is it possible that it says Total Devices : 3 when there are only 2??
b) Why am I still seeing State : dirty, no-errors???
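One way to dig into question (a) is to compare the "Raid Devices" and "Total Devices" counters that mdadm reports: when Total exceeds Raid, the extra entry is often a stale superblock left over from an old array. A minimal sketch, using a saved copy of the output above as sample text (in real use you would pipe `mdadm --detail /dev/md1` in instead):

```shell
#!/bin/sh
# Sketch: flag a possible stale/shadow member by comparing the
# "Raid Devices" and "Total Devices" counters from `mdadm --detail`.
# The sample text below is the output quoted in this message.
detail='Raid Devices : 2
Total Devices : 3
Failed Devices : 1'

raid=$(printf '%s\n' "$detail"  | awk -F: '/Raid Devices/  {gsub(/ /,"",$2); print $2}')
total=$(printf '%s\n' "$detail" | awk -F: '/Total Devices/ {gsub(/ /,"",$2); print $2}')

if [ "$total" -gt "$raid" ]; then
    echo "possible stale superblock: $total known devices, $raid in the array"
fi
# prints: possible stale superblock: 3 known devices, 2 in the array
```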
Then I copied a file from hda (which holds the root filesystem; my raid is purely storage and nothing more, as it is shared via Samba) to the directory that /dev/md1 is mounted on. The file is there, but when I mount /dev/hdb1 or /dev/hdc1 directly, I notice that the file is not there. I tried issuing "sync" to see what would happen, and I got nothing!! The file still doesn't appear on the drives!
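For what it's worth, mounting /dev/hdb1 or /dev/hdc1 while md1 is still active is itself likely to produce that stale view: the member mount and the array mount cache blocks independently, so the direct mount can show an old state of the filesystem. A safer check is to stop the array first; a dry-run sketch (commands are echoed, not executed, and the file path is just an example):

```shell
#!/bin/sh
# Dry-run sketch: verify that a file really reached a mirror WITHOUT
# mounting a member of a live array.  `run` only echoes each command.
run() { echo "+ $*"; }

run md5sum /raid/somefile.txt          # checksum through the array (path is an example)
run umount /dev/md1
run mdadm --stop /dev/md1              # stop the array first!
run mount -o ro /dev/hdb1 /mnt/check   # now the member can be read safely
run md5sum /mnt/check/somefile.txt     # should match the first checksum
run umount /mnt/check
run mdadm --assemble /dev/md1 /dev/hdb1 /dev/hdc1
```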
I am so confused and at the same time very, very scared, since I do not know if the data that the users put on that raid is actually being raid-ed (if there is such a word...).
I tried to see what dmesg looks like, and among other messages I got the following ones about my md:
md: md driver 0.90.0 MAX_MD_DEVS=256, MD_SB_DISKS=27
md: Autodetecting RAID arrays.
 [events: 0000000e]
 [events: 0000000e]
md: autorun ...
md: considering hdc1 ...
md:  adding hdc1 ...
md:  adding hdb1 ...
md: created md1
md: bind<hdb1>
md: bind<hdc1>
md: running: <hdc1><hdb1>
md: hdc1's event counter: 0000000e
md: hdb1's event counter: 0000000e
md: RAID level 1 does not need chunksize! Continuing anyway.
kmod: failed to exec /sbin/modprobe -s -k md-personality-3, errno = 2
md: personality 3 is not loaded!
md :do_md_run() returned -22
md: md1 stopped.
md: unbind<hdc1>
md: export_rdev(hdc1)
md: unbind<hdb1>
md: export_rdev(hdb1)
md: ... autorun DONE.

Then a little below this point I get the following messages:

hdd: bad special flag: 0x03
md: Autodetecting RAID arrays.
md: autorun ...
md: ... autorun DONE.
md: Autodetecting RAID arrays.
md: autorun ...
md: ... autorun DONE.
(recovery.c, 254): journal_recover: JBD: recovery, exit status 0, recovered transactions 4810 to 4908
(recovery.c, 256): journal_recover: JBD: Replayed 113 and revoked 0/1 blocks
kjournald starting.  Commit interval 5 seconds
EXT3 FS 2.4-0.9.18, 14 May 2002 on ide1(22,1), internal journal
EXT3-fs: recovery complete.
EXT3-fs: mounted filesystem with ordered data mode.
(recovery.c, 254): journal_recover: JBD: recovery, exit status 0, recovered transactions 4810 to 4921
(recovery.c, 256): journal_recover: JBD: Replayed 133 and revoked 1/2 blocks
kjournald starting.  Commit interval 5 seconds
EXT3 FS 2.4-0.9.18, 14 May 2002 on ide0(3,65), internal journal
EXT3-fs: recovery complete.
EXT3-fs: mounted filesystem with ordered data mode.

It looks like md is operating on the one hand, but is having a near-impossible day trying to bring in the hdc1 and hdb1 partitions.... ARRRGGG!!!! This is driving me crazy!
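The telling line in that log is "md: personality 3 is not loaded!": autorun failed because the raid1 module was not available at that moment (the array evidently got started later in the boot sequence). On a SuSE 8.x system the usual remedy is to load raid1 by hand and make sure it ends up in the initrd; a dry-run sketch, assuming the SuSE 8.x conventions of /etc/sysconfig/kernel's INITRD_MODULES and mk_initrd (commands are echoed, not executed):

```shell
#!/bin/sh
# Dry-run sketch: make the raid1 personality available at boot so
# md autodetection does not fail with "personality 3 is not loaded".
run() { echo "+ $*"; }

run modprobe raid1                               # load the RAID-1 personality now
run grep INITRD_MODULES /etc/sysconfig/kernel    # check what the initrd will load
# then add raid1 to the list, e.g. INITRD_MODULES="... raid1", and:
run mk_initrd                                    # rebuild the initrd (SuSE 8.x tool)
```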
Please, please, please try to give me a hand, as I am about to lose my mind forever!
I am running a SuSE 8.1 box with three hard drives: two are 200GB in RAID-1, and the third is a small 8GB drive for Linux and the root filesystem.
What do you think?? Thank you for your reply! Chris