New subject: [SLE] Ok Raid-1 is beggining to scare me BIG TIME!!

9 Oct 2004

      ----- Original Message ----- 
From: Chris Roubekas 
To: suse-linux-e@suse.com 
Sent: Saturday, October 09, 2004 11:47 PM
Subject: Ok Raid-1 is beggining to scare me BIG TIME!!

Dear friends,

    I recently posted a message asking for help on why my RAID-1 has a state
of "dirty,no-errors" when issuing mdadm --detail /dev/md0. A friend in the list
said that when he changed from reiserfs to ext3 it stopped being dirty as he was
experiencing the same problems that I was when running reiserfsck on the /dev/md0
drive and found errors and upon running reiserfsck --fix-fixable the errors where not 
fixed. So being through this same tunnel of problems, I though I should follow his 
steps and try changing the entire md0 to ext3 to see if I will be able to avoide the
problems that I am faced with.

    Well I thought of switching the md0 to md1 and recreating it in ext3 as opposed
to reiserfs which it was before.

    Since my raid-1 in md0 is 2x200GB drives and I have no other drive to backup
I removed one of the disks of md0, created a new md1 with mdadm and formated
it in ext3 format. Then I copied all the data from md0 to md1 and finally I removed the
second drive from md0 and added it to md1.

    After the process of reconstruction was over, I restarted my machine and issued 

cat /proc/mdstat which reported:

server:/ # cat /proc/mdstat
Personalities : [raid1]
read_ahead 1024 sectors
md1 : active raid1 hdc1[1] hdb1[0]
      199125568 blocks [2/2] [UU]

unused devices: <none>

Which made me very happy as I saw that after the hard-drive lights stopped glowing
things appeared just great!
But then, I issued a mdadm --detail /dev/md0 and see what I get:

server:/ # mdadm --detail /dev/md1
/dev/md1:
        Version : 00.90.00
  Creation Time : Sat Oct  9 15:44:51 2004
     Raid Level : raid1
     Array Size : 199125568 (189.90 GiB 203.95 GB)
    Device Size : 199125568 (189.90 GiB 203.95 GB)
   Raid Devices : 2
  Total Devices : 3
Preferred Minor : 1
    Persistence : Superblock is persistent

    Update Time : Sun Oct 10 02:09:02 2004
          State : dirty, no-errors
 Active Devices : 2
Working Devices : 2
 Failed Devices : 1
  Spare Devices : 0

    Number   Major   Minor   RaidDevice State
       0       3       65        0      active sync   /dev/hdb1
       1      22        1        1      active sync   /dev/hdc1
           UUID : 22a79613:35bf4980:215dbc1e:910acdea

I am totally confused by what I see here for the following reasons:

    a) How is it possible to say Total Devices 3 when there are only 2??
    b) Why am I seeing the State: Dirty, no-errors still???

Then I copied a file from hda (which is the root filesystem; my raid is totally
storage and nothing more as it is shared by Samba) and noticed that although
the directory that is mounted on /dev/md1 has the file, when I mount /dev/hdb1 or hdc1
I notice that the file is not there.
I tried issuing "sync" to see what would happen and I got nothing!! The file
still doesn't appear on the drives!

I am so confused and at the same time very very scared since I do not know if 
the data that the users put on that raid are actually being raid-ed (if there is such
a word...).

Tried to see how dmesg looks like and I got among other messages
 the following ones which report stuff about my md :

md: md driver 0.90.0 MAX_MD_DEVS=256, MD_SB_DISKS=27
md: Autodetecting RAID arrays.
 [events: 0000000e]
 [events: 0000000e]
md: autorun ...
md: considering hdc1 ...
md:  adding hdc1 ...
md:  adding hdb1 ...
md: created md1
md: bind
md: bind
md: running: <hdc1><hdb1>
md: hdc1's event counter: 0000000e
md: hdb1's event counter: 0000000e
md: RAID level 1 does not need chunksize! Continuing anyway.
kmod: failed to exec /sbin/modprobe -s -k md-personality-3, errno = 2
md: personality 3 is not loaded!
md :do_md_run() returned -22
md: md1 stopped.
md: unbind
md: export_rdev(hdc1)
md: unbind
md: export_rdev(hdb1)
md: ... autorun DONE.
Then a little bellow this point I get the following messages:
hdd: bad special flag: 0x03
md: Autodetecting RAID arrays.
md: autorun ...
md: ... autorun DONE.
md: Autodetecting RAID arrays.
md: autorun ...
md: ... autorun DONE.
(recovery.c, 254): journal_recover: JBD: recovery, exit status 0, recovered transactions 4810 to 4908
(recovery.c, 256): journal_recover: JBD: Replayed 113 and revoked 0/1 blocks
kjournald starting.  Commit interval 5 seconds
EXT3 FS 2.4-0.9.18, 14 May 2002 on ide1(22,1), internal journal
EXT3-fs: recovery complete.
EXT3-fs: mounted filesystem with ordered data mode.
(recovery.c, 254): journal_recover: JBD: recovery, exit status 0, recovered transactions 4810 to 4921
(recovery.c, 256): journal_recover: JBD: Replayed 133 and revoked 1/2 blocks
kjournald starting.  Commit interval 5 seconds
EXT3 FS 2.4-0.9.18, 14 May 2002 on ide0(3,65), internal journal
EXT3-fs: recovery complete.
EXT3-fs: mounted filesystem with ordered data mode.
It looks like md is operating on one hand but is having a near to impossible day
trying to import the hdc1 and hdb1 directories....

ARRRGGG!!!! THis is driving me crazy!

Please please please try to give me a hand as I am about to loose my mind for ever!

I am running a SuSE8.1 box with 3 HD 2 are 200GB in Raid-1 and the third is 
a small drive of 8GB for Linux and root filesystem.

What do you think??
Thank you for your reply!
Chris

Ok Raid-1 is beggining to scare me BIG TIME!!

Chris Roubekas

Herman Knief

Jerome R. Westrick

John Andersen

Daniel Podgurski

Anders Johansson

tags

participants (6)