[Bug 775746] New: mdadm degraded array on boot, random device partition missing
https://bugzilla.novell.com/show_bug.cgi?id=775746#c0

           Summary: mdadm degraded array on boot, random device partition
                    missing
    Classification: openSUSE
           Product: openSUSE 12.1
           Version: Final
          Platform: x86-64
        OS/Version: openSUSE 12.1
            Status: NEW
          Severity: Critical
          Priority: P5 - None
         Component: Other
        AssignedTo: bnc-team-screening@forge.provo.novell.com
        ReportedBy: j.langley@gmx.net
         QAContact: qa-bugs@suse.de
          Found By: ---
           Blocker: ---

User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:14.0) Gecko/20100101 Firefox/14.0.1

I have two software RAID devices configured with YaST. Regularly, but not
always, I get a degraded array after boot. One random partition is missing
from one array; in most cases it is sdb4 or sda4 - they appear to alternate.
Sometimes one partition is missing in both arrays.

I searched the web and found similar problems, but not exactly the same one. I
also booted with raid=noautodetect, as some Ubuntu forums suggested, but
without success. /tmp/initrd/lib/udev/rules.d/64-md-raid.rules is in place. I
also tried putting it in /etc/udev/rules.d/64-md-raid.rules, but that made no
difference.

The random partition is kicked even though it is clean, so I suspect a bug in
mdadm used with udev and systemd.

linux-dioz:/ # mdadm --examine /dev/sdb4
/dev/sdb4:
          Magic : a92b4efc
        Version : 1.0
    Feature Map : 0x1
     Array UUID : 909d58d0:3a4ee94e:8897cfe8:d7aefeea
           Name : linux-99ig:1
  Creation Time : Fri Oct 21 21:36:33 2011
     Raid Level : raid1
   Raid Devices : 2
 Avail Dev Size : 878198512 (418.76 GiB 449.64 GB)
     Array Size : 878198512 (418.76 GiB 449.64 GB)
   Super Offset : 878198768 sectors
          State : clean
    Device UUID : 46360291:b7806b6c:ce1f892e:fa56fc78
Internal Bitmap : -8 sectors from superblock
    Update Time : Mon Aug 13 22:24:16 2012
       Checksum : 5c2804a9 - correct
         Events : 93354
    Device Role : Active device 1
    Array State : AA ('A' == active, '.' == missing)

linux-dioz:~ # mdadm -V
mdadm - v3.2.2 - 17th June 2011

linux-dioz:~ # uname -r
3.1.10-1.16-desktop

linux-dioz:~ # cat /proc/cmdline
root=/dev/md0 noresume splash=silent quiet vga=795 raid=noautodetect

linux-dioz:~ # cat /etc/fstab
/dev/md0  /      xfs  defaults  1 1
/dev/md1  /home  xfs  defaults  1 2

linux-dioz:~ # cat /etc/mdadm.conf
DEVICE containers partitions
ARRAY /dev/md/0 UUID=7c60e2b2:804071ee:1ef2019b:e3fce998
ARRAY /dev/md/1 UUID=909d58d0:3a4ee94e:8897cfe8:d7aefeea

linux-dioz:~ # cat /tmp/initrd/etc/mdadm.conf
AUTO -all
ARRAY /dev/md0 metadata=1.0 name=linux:0 UUID=7c60e2b2:804071ee:1ef2019b:e3fce998

linux-dioz:~ # mdadm --detail /dev/md0
/dev/md0:
        Version : 1.0
  Creation Time : Fri Oct 21 19:56:37 2011
     Raid Level : raid1
     Array Size : 41945016 (40.00 GiB 42.95 GB)
  Used Dev Size : 41945016 (40.00 GiB 42.95 GB)
   Raid Devices : 2
  Total Devices : 2
    Persistence : Superblock is persistent
  Intent Bitmap : Internal
    Update Time : Mon Aug 13 22:04:19 2012
          State : active
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0
           Name : linux:0
           UUID : 7c60e2b2:804071ee:1ef2019b:e3fce998
         Events : 5124

    Number   Major   Minor   RaidDevice State
       0       8        2        0      active sync   /dev/sda2
       1       8       18        1      active sync   /dev/sdb2

linux-dioz:~ # mdadm --detail /dev/md1
/dev/md1:
        Version : 1.0
  Creation Time : Fri Oct 21 21:36:33 2011
     Raid Level : raid1
     Array Size : 439099256 (418.76 GiB 449.64 GB)
  Used Dev Size : 439099256 (418.76 GiB 449.64 GB)
   Raid Devices : 2
  Total Devices : 2
    Persistence : Superblock is persistent
  Intent Bitmap : Internal
    Update Time : Mon Aug 13 22:05:20 2012
          State : active
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0
           Name : linux-99ig:1
           UUID : 909d58d0:3a4ee94e:8897cfe8:d7aefeea
         Events : 93354

    Number   Major   Minor   RaidDevice State
       0       8        4        0      active sync   /dev/sda4
       1       8       20        1      active sync   /dev/sdb4

linux-dioz:~ # mount
/dev/md0 on / type xfs (rw,relatime,attr2,delaylog,noquota)
/dev/md1 on /home type xfs (rw,relatime,attr2,delaylog,noquota)
linux-dioz:~ # dmesg | grep md
[ 0.000000] Command line: root=/dev/md0 noresume splash=silent quiet vga=795 raid=noautodetect
[ 0.000000] Kernel command line: root=/dev/md0 noresume splash=silent quiet vga=795 raid=noautodetect
[ 1.354683] ata1: SATA max UDMA/133 cmd 0x9f0 ctl 0xbf0 bmdma 0xe000 irq 21
[ 1.354687] ata2: SATA max UDMA/133 cmd 0x970 ctl 0xb70 bmdma 0xe008 irq 21
[ 1.355625] ata3: SATA max UDMA/133 cmd 0x9e0 ctl 0xbe0 bmdma 0xcc00 irq 20
[ 1.355629] ata4: SATA max UDMA/133 cmd 0x960 ctl 0xb60 bmdma 0xcc08 irq 20
[ 3.024577] md: bind<sda2>
[ 3.075631] md: bind<sdb2>
[ 3.078527] md: raid1 personality registered for level 1
[ 3.078799] md/raid1:md0: active with 2 out of 2 mirrors
[ 3.078988] created bitmap (1 pages) for device md0
[ 3.079187] md0: bitmap initialized from disk: read 1/1 pages, set 0 of 641 bits
[ 3.149541] md0: detected capacity change from 0 to 42951696384
[ 3.151403] md0: unknown partition table
[ 3.392302] md: raid0 personality registered for level 0
[ 3.396557] md: raid10 personality registered for level 10
[ 3.524158] md: raid6 personality registered for level 6
[ 3.524162] md: raid5 personality registered for level 5
[ 3.524164] md: raid4 personality registered for level 4
[ 3.720134] XFS (md0): Mounting Filesystem
[ 3.861353] XFS (md0): Ending clean mount
[ 4.612088] systemd[1]: systemd 37 running in system mode. (+PAM +LIBWRAP +AUDIT +SELINUX +SYSVINIT +LIBCRYPTSETUP; suse)
[ 5.005999] systemd[1]: Set hostname to <linux-dioz.site>.
[ 9.145112] EDAC amd64: DRAM ECC disabled.
[ 9.145123] EDAC amd64: ECC disabled in the BIOS or no ECC capability, module will not load.
[ 10.461748] md: md1 stopped.
[ 10.645936] md: bind<sdb4>
[ 10.675967] md: bind<sda4>
[ 10.676014] md: kicking non-fresh sdb4 from array!
[ 10.676020] md: unbind<sdb4>
[ 10.679034] md: export_rdev(sdb4)
[ 10.833257] md/raid1:md1: active with 1 out of 2 mirrors
[ 10.854573] created bitmap (4 pages) for device md1
[ 10.854830] md1: bitmap initialized from disk: read 1/1 pages, set 113 of 6701 bits
[ 10.900278] md1: detected capacity change from 0 to 449637638144
[ 10.900540] boot.md[462]: Starting MD RAID mdadm: /dev/md/1 has been started with 1 drive (out of 2).
[ 11.002196] md1: unknown partition table
[ 11.399485] boot.md[462]: ..done
[ 11.500712] systemd-fsck[934]: /sbin/fsck.xfs: XFS file system.
[ 11.548664] XFS (md1): Mounting Filesystem
[ 12.068339] XFS (md1): Ending clean mount
[ 5458.784139] ata2.00: cmd 61/01:00:e8:2f:20/00:00:05:00:00/40 tag 0 ncq 512 out
[ 9758.290839] md: bind<sdb4>
[ 9758.339918] md: recovery of RAID array md1
[ 9758.339922] md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
[ 9758.339927] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
[ 9758.339934] md: using 128k window, over a total of 439099256k.
[ 9897.447202] md: md1: recovery done.

linux-dioz:~ # dmesg | grep sdb
[ 2.302027] sd 1:0:0:0: [sdb] 976773168 512-byte logical blocks: (500 GB/465 GiB)
[ 2.302230] sd 1:0:0:0: [sdb] Write Protect is off
[ 2.302234] sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
[ 2.302284] sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 2.328030] sdb: sdb1 sdb2 sdb3 sdb4
[ 2.328465] sd 1:0:0:0: [sdb] Attached SCSI disk
[ 3.075631] md: bind<sdb2>
[ 10.645936] md: bind<sdb4>
[ 10.676014] md: kicking non-fresh sdb4 from array!
[ 10.676020] md: unbind<sdb4>
[ 10.679034] md: export_rdev(sdb4)
[ 11.177828] Adding 1051644k swap on /dev/sdb1. Priority:0 extents:1 across:1051644k
[ 5458.784082] dhfis 0x1 dmafis 0x1 sdbfis 0x0
[ 5458.784093] ata2: tag : dhfis dmafis sdbfis sactive
[ 9758.290839] md: bind<sdb4>
[ 9758.339510] disk 1, wo:1, o:1, dev:sdb4
[ 9897.575473] disk 1, wo:0, o:1, dev:sdb4

Reproducible: Sometimes

Steps to Reproduce:
1. Create an array and verify it is running well; move some data around.
2. Reboot.
3. Check the arrays again with "mdadm --detail /dev/md1".

Actual Results:
[ 10.645936] md: bind<sdb4>
[ 10.676014] md: kicking non-fresh sdb4 from array!

Expected Results:
[ 3.075631] md: bind<sdb2>
[ 10.645936] md: bind<sdb4>

I have had data loss several times now, since the missing partition
alternates!

-- 
Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
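[Editorial note] The "kicking non-fresh sdb4" message above comes from md comparing the per-member Events counters shown by mdadm --examine. A hedged, self-contained sketch of that comparison; the here-docs stand in for real --examine output, and the lagging value 93200 is invented for illustration:

```shell
# md marks a mirror member "non-fresh" when its Events counter lags the
# other member's. events_of extracts the counter from --examine output.
events_of() { awk -F: '/Events/ { gsub(/ /, "", $2); print $2 }'; }

a=$(events_of <<'EOF'
         Events : 93354
EOF
)
b=$(events_of <<'EOF'
         Events : 93200
EOF
)
if [ "$a" -eq "$b" ]; then
    echo "members in sync"
else
    echo "event counters differ ($a vs $b); the stale member gets kicked"
fi
```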
https://bugzilla.novell.com/show_bug.cgi?id=775746#c1

--- Comment #1 from John Langley <j.langley@gmx.net> 2012-08-13 22:35:49 UTC ---

"mdadm /dev/md1 --add /dev/sdb4" starts the array again, and it works
perfectly for hours - until the next reboot.

After the reboot today I ended up in a shell:

[ 11.263189] input: Venus USB2.0 Camera as /devices/pci0000:00/0000:00:0b.1/usb1/1-6/1-6:1.0/input/input13
[ 11.263360] usbcore: registered new interface driver uvcvideo
[ 11.263363] USB Video Class driver (1.1.1)
[ 11.343139] Adding 1051644k swap on /dev/sda1. Priority:0 extents:1 across:1051644k
[ 11.354278] md: bind<sda4>
[ 11.418756] Adding 1051644k swap on /dev/sdb1. Priority:0 extents:1 across:1051644k
[ 11.515482] nvidia 0000:03:00.0: PCI INT A -> Link[APC5] -> GSI 16 (level, low) -> IRQ 16
[ 11.515492] nvidia 0000:03:00.0: setting latency timer to 64
[ 11.515498] vgaarb: device changed decodes: PCI:0000:03:00.0,olddecodes=io+mem,decodes=none:owns=io+mem
[ 11.515806] NVRM: loading NVIDIA UNIX x86_64 Kernel Module 295.59 Wed Jun 6 21:19:40 PDT 2012
[ 11.550355] md: could not open unknown-block(8,20).
[ 11.550455] md: md_import_device returned -16
[ 11.550766] md: could not open unknown-block(8,20).
[ 11.550857] md: md_import_device returned -16
[ 11.601393] boot.md[448]: Starting MD RAID mdadm: /dev/md/1 is already in use.
[ 11.601940] boot.md[448]: ..failed
[ 11.602543] systemd[1]: md.service: control process exited, code=exited status=1
[ 11.608358] systemd[1]: Unit md.service entered failed state.
[ 97.101581] systemd[1]: Job dev-md1.device/start timed out.
[ 97.101819] systemd[1]: Job remote-fs-pre.target/start failed with result 'dependency'.
[ 97.101826] systemd[1]: Job local-fs.target/start failed with result 'dependency'.
[ 97.101831] systemd[1]: Triggering OnFailure= dependencies of local-fs.target.
[ 97.102766] systemd[1]: Job home.mount/start failed with result 'dependency'.
[ 97.102774] systemd[1]: Job dev-md1.device/start failed with result 'timeout'.
[ 97.390477] systemd[1]: Startup finished in 5s 50ms 326us (kernel) + 1min 32s 340ms 36us (userspace) = 1min 37s 390ms 362us.
[ 351.569324] md/raid1:md1: active with 1 out of 2 mirrors
[ 351.569509] created bitmap (4 pages) for device md1
[ 351.569729] md1: bitmap initialized from disk: read 1/1 pages, set 7 of 6701 bits
[ 351.605682] md1: detected capacity change from 0 to 449637638144
[ 369.175955] md1: unknown partition table

But the individual disks of the array were actually fine:

/dev/sda4:
          Magic : a92b4efc
        Version : 1.0
    Feature Map : 0x1
     Array UUID : 909d58d0:3a4ee94e:8897cfe8:d7aefeea
           Name : linux-99ig:1
  Creation Time : Fri Oct 21 21:36:33 2011
     Raid Level : raid1
   Raid Devices : 2
 Avail Dev Size : 878198512 (418.76 GiB 449.64 GB)
     Array Size : 878198512 (418.76 GiB 449.64 GB)
   Super Offset : 878198768 sectors
          State : clean
    Device UUID : 82cdb9ef:19aaa29b:1d7b1d7c:ea687817
Internal Bitmap : -8 sectors from superblock
    Update Time : Mon Aug 13 22:46:08 2012
       Checksum : d62737a5 - correct
         Events : 93354
    Device Role : Active device 0
    Array State : AA ('A' == active, '.' == missing)

/dev/sdb4:
          Magic : a92b4efc
        Version : 1.0
    Feature Map : 0x1
     Array UUID : 909d58d0:3a4ee94e:8897cfe8:d7aefeea
           Name : linux-99ig:1
  Creation Time : Fri Oct 21 21:36:33 2011
     Raid Level : raid1
   Raid Devices : 2
 Avail Dev Size : 878198512 (418.76 GiB 449.64 GB)
     Array Size : 878198512 (418.76 GiB 449.64 GB)
   Super Offset : 878198768 sectors
          State : clean
    Device UUID : 46360291:b7806b6c:ce1f892e:fa56fc78
Internal Bitmap : -8 sectors from superblock
    Update Time : Mon Aug 13 22:46:08 2012
       Checksum : 5c2809c9 - correct
         Events : 93354
    Device Role : Active device 1
    Array State : AA ('A' == active, '.' == missing)

"mdadm --run /dev/md1" started the array.
"mdadm /dev/md1 --add /dev/sdb4" fixed the array, and it works well again.

The array only becomes degraded on reboot. Mostly sdb4 is missing, but I have
also seen sda4 (md1), or sda2 or sdb2 (md0). Always just one partition!
https://bugzilla.novell.com/show_bug.cgi?id=775746

kk zhang <kkzhang@suse.com> changed:

  CC: added kkzhang@suse.com
  AssignedTo: bnc-team-screening@forge.provo.novell.com -> nfbrown@suse.com
https://bugzilla.novell.com/show_bug.cgi?id=775746#c2

--- Comment #2 from John Langley <j.langley@gmx.net> 2012-08-19 23:32:07 UTC ---

Since my previous comment everything has worked well - no problems on boot.
But today my dmesg log showed this:

[43299.808066] ata2: EH in SWNCQ mode,QC:qc_active 0x1 sactive 0x1
[43299.808076] ata2: SWNCQ:qc_active 0x1 defer_bits 0x0 last_issue_tag 0x0
[43299.808086] ata2: ATA_REG 0x40 ERR_REG 0x0
[43299.808090] ata2: tag : dhfis dmafis sdbfis sactive
[43299.808096] ata2: tag 0x0: 1 1 0 1
[43299.808114] ata2.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
[43299.808123] ata2.00: failed command: WRITE FPDMA QUEUED
[43299.808137] ata2.00: cmd 61/01:00:e8:2f:20/00:00:05:00:00/40 tag 0 ncq 512 out
[43299.808146] ata2.00: status: { DRDY }
[43299.808156] ata2: hard resetting link
[43299.808160] ata2: nv: skipping hardreset on occupied port
[43300.262069] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[43300.265222] ata2.00: failed to IDENTIFY (INIT_DEV_PARAMS failed, err_mask=0x80)
[43300.265232] ata2.00: revalidation failed (errno=-5)
[43305.262059] ata2: hard resetting link
[43305.262068] ata2: nv: skipping hardreset on occupied port
[43305.716090] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[43305.723383] ata2.00: configured for UDMA/133
[43305.723395] ata2.00: device reported invalid CHS sector 0
[43305.723409] ata2: EH complete
[50448.736069] ata1: EH in SWNCQ mode,QC:qc_active 0x1 sactive 0x1
[50448.736079] ata1: SWNCQ:qc_active 0x1 defer_bits 0x0 last_issue_tag 0x0
[50448.736090] ata1: ATA_REG 0x40 ERR_REG 0x0
[50448.736094] ata1: tag : dhfis dmafis sdbfis sactive
[50448.736099] ata1: tag 0x0: 1 1 0 1
[50448.736118] ata1.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
[50448.736127] ata1.00: failed command: WRITE FPDMA QUEUED
[50448.736141] ata1.00: cmd 61/01:00:e8:2f:20/00:00:05:00:00/40 tag 0 ncq 512 out
[50448.736150] ata1.00: status: { DRDY }
[50448.736160] ata1: hard resetting link
[50448.736164] ata1: nv: skipping hardreset on occupied port
[50449.190060] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[50449.193226] ata1.00: failed to IDENTIFY (INIT_DEV_PARAMS failed, err_mask=0x80)
[50449.193237] ata1.00: revalidation failed (errno=-5)
[50454.190038] ata1: hard resetting link
[50454.190043] ata1: nv: skipping hardreset on occupied port
[50454.644053] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[50454.650338] ata1.00: configured for UDMA/133
[50454.650352] ata1: EH complete
https://bugzilla.novell.com/show_bug.cgi?id=775746#c3

--- Comment #3 from John Langley <j.langley@gmx.net> 2012-08-19 23:38:58 UTC ---

Created an attachment (id=502803)
 --> (http://bugzilla.novell.com/attachment.cgi?id=502803)
dmesg output

This is the full dmesg output.
https://bugzilla.novell.com/show_bug.cgi?id=775746#c4

Neil Brown <nfbrown@suse.com> changed:

  Status: NEW -> RESOLVED
  Resolution: FIXED

--- Comment #4 from Neil Brown <nfbrown@suse.com> 2012-08-20 07:21:57 UTC ---

I think the fix to your problem is the patch below. If the line that is added
is already in your udev rules files, or if adding it (to
/lib/udev/rules.d/64-md-raid.rules and running mkinitrd) doesn't fix the
problem, please re-open.

diff --git a/udev-md-raid.rules b/udev-md-raid.rules
index f564f70..814c897 100644
--- a/udev-md-raid.rules
+++ b/udev-md-raid.rules
@@ -28,7 +28,7 @@
 ENV{DEVTYPE}=="partition", GOTO="md_ignore_state"
 # never leave state 'inactive'
 ATTR{md/metadata_version}=="external:[A-Za-z]*", ATTR{md/array_state}=="inactive", GOTO="md_ignore_state"
 TEST!="md/array_state", GOTO="md_end"
-ATTR{md/array_state}=="|clear|inactive", GOTO="md_end"
+ATTR{md/array_state}=="|clear|inactive", ENV{SYSTEMD_READY}="0", GOTO="md_end"
 LABEL="md_ignore_state"
 IMPORT{program}="/sbin/mdadm --detail --export $tempnode"

This ensures systemd waits for the array to be properly assembled. This patch
is in the mdadm in Factory.

The errors in comment 2 look like some sort of hardware problem, maybe a
loose or bad cable or something. They are not directly related to RAID.
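[Editorial note] For readers applying comment 4's advice, a hedged sketch of checking whether the added ENV{SYSTEMD_READY}="0" rule is already present before regenerating the initrd. A temp file stands in for /lib/udev/rules.d/64-md-raid.rules so the check is self-contained; the status messages are illustrative, not mdadm output:

```shell
# Check a 64-md-raid.rules file for the SYSTEMD_READY fix from the patch.
RULES=$(mktemp)
cat > "$RULES" <<'EOF'
ATTR{md/array_state}=="|clear|inactive", ENV{SYSTEMD_READY}="0", GOTO="md_end"
EOF

if grep -q 'SYSTEMD_READY' "$RULES"; then
    status="fix present - just confirm the initrd copy matches"
else
    status="fix missing - apply the patch and run mkinitrd"
fi
echo "$status"
rm -f "$RULES"
```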
https://bugzilla.novell.com/show_bug.cgi?id=775746#c5

John Langley <j.langley@gmx.net> changed:

  Status: RESOLVED -> REOPENED
  Resolution: FIXED -> (cleared)

--- Comment #5 from John Langley <j.langley@gmx.net> 2012-09-04 18:44:13 UTC ---

The suggested patch is already in the udev files:

/lib/udev/rules.d/64-md-raid.rules
/tmp/initrd/lib/udev/rules.d/64-md-raid.rules
(after extracting the initrd with "zcat /boot/initrd-3.1.10-1.16-desktop | cpio -idv")

Since my last message the problem had suddenly disappeared - until a few days
ago. Same problem as described before.
https://bugzilla.novell.com/show_bug.cgi?id=775746#c6

--- Comment #6 from John Langley <j.langley@gmx.net> 2012-09-08 12:54:51 UTC ---

Created an attachment (id=504952)
 --> (http://bugzilla.novell.com/attachment.cgi?id=504952)
dmesg output, hard drive not recognized

In very few cases one hard drive is missing completely. I get the boot message
"ata2.00: revalidation failed (errno=-19)"; please refer to the attached file.
The system also states that it is doing a fast boot - I don't know whether
that could be related.

By the way, I put the same hard drives into a different computer with new SATA
cables, and the problem persists.
https://bugzilla.novell.com/show_bug.cgi?id=775746#c7

Neil Brown <nfbrown@suse.com> changed:

  CC: added nfbrown@suse.com
  Component: Other -> Kernel
  AssignedTo: nfbrown@suse.com -> kernel-maintainers@forge.provo.novell.com

--- Comment #7 from Neil Brown <nfbrown@suse.com> 2012-09-12 02:49:17 UTC ---

This is really looking like a problem with the Nvidia SATA driver rather than
with md. This:

[ 11.550355] md: could not open unknown-block(8,20).
[ 11.550455] md: md_import_device returned -16
[ 11.550766] md: could not open unknown-block(8,20).
[ 11.550857] md: md_import_device returned -16

should have alerted me to that. It suggests that mdadm found /dev/sdb4, but by
the time the md kernel driver tried to open it, it had disappeared. Other
error messages seem to confirm that.

The drives seem to work fine once boot has completed, but maybe while
everything is booting and trying to identify all the devices, some race is
causing confusion.

Unfortunately I cannot help with the nv SATA driver, so I'm reassigning to the
default assignee. It should probably be reassigned to someone who knows about
SATA drivers or libata or something like that.
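[Editorial note] The race described here - one component still holding a device while md tries to open it exclusively - can be imitated with an ordinary advisory file lock. This is an illustration only, not the kernel's actual open path: flock(1) from util-linux and a temp file stand in for the exclusive open of /dev/sdb4:

```shell
# A "holder" process keeps the device locked for a moment while a second
# non-blocking open attempt fails, the same shape of failure as
# "md_import_device returned -16" (EBUSY).
DEV=$(mktemp)

# The holder (think udev/mdadm) takes the lock and keeps it briefly.
( flock -n 9 && sleep 1 ) 9>"$DEV" &
holder=$!
sleep 0.3

# Meanwhile "md" tries to grab the device exclusively and fails.
if flock -n "$DEV" -c true; then
    result="open succeeded"
else
    result="device busy (EBUSY), as in md_import_device returned -16"
fi
echo "$result"
wait "$holder"
rm -f "$DEV"
```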
https://bugzilla.novell.com/show_bug.cgi?id=775746#c8

--- Comment #8 from John Langley <j.langley@gmx.net> 2012-09-13 10:50:00 UTC ---

I don't think the Nvidia driver causes the problems, since the other machine
didn't use sata_nv. But when searching for "ata2.00: revalidation failed
(errno=-19)" I read elsewhere about a race condition between the SATA and USB
drivers. See also
http://johnbokma.com/mexit/2008/08/05/fixing-the-vostro-hang-issue.html
They suggest forcing the SATA driver to load before USB. I will try that.
https://bugzilla.novell.com/show_bug.cgi?id=775746

Jeff Mahoney <jeffm@suse.com> changed:

  CC: added jeffm@suse.com
  AssignedTo: kernel-maintainers@forge.provo.novell.com -> jlee@suse.com
https://bugzilla.novell.com/show_bug.cgi?id=775746#c9

--- Comment #9 from Joey Lee <jlee@suse.com> 2012-09-26 04:25:10 UTC ---

Per the return error from md_import_device:

[ 11.550355] md: could not open unknown-block(8,20).
[ 11.550455] md: md_import_device returned -16

-16 is EBUSY, "Device or resource busy", so this looks like a race condition:
something grabs the block device before the md driver does. We need to find a
way (maybe a log?) to catch which component grabs it.
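[Editorial note] For reference when reading these dmesg lines, a small sketch mapping the negative return codes seen in this bug to their errno names. This covers only the three codes that appear in the thread, not the full errno table:

```shell
# Translate the negative codes md and libata print in dmesg into errno
# names: EIO=5, EBUSY=16, ENODEV=19.
errno_name() {
    case "$1" in
        -5)  echo "EIO: I/O error" ;;
        -16) echo "EBUSY: device or resource busy" ;;
        -19) echo "ENODEV: no such device" ;;
        *)   echo "unknown ($1)" ;;
    esac
}

errno_name -16   # the md_import_device failure
errno_name -19   # the "revalidation failed" boot message
```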
https://bugzilla.novell.com/show_bug.cgi?id=775746#c10

Joey Lee <jlee@suse.com> changed:

  Status: REOPENED -> NEEDINFO
  InfoProvider: nfbrown@suse.com

--- Comment #10 from Joey Lee <jlee@suse.com> 2012-09-26 07:13:48 UTC ---

Hi Neil,

I found a patchset "[PATCH 0/3] Fix mdadm vs udev race in Incremental and
Assemble" from Jes Sorensen:
http://www.digipedia.pl/usenet/thread/19071/35509/

[PATCH 1/3] Remove race for starting container devices.
    5fc8cff3a4177dfbab594947283117620b4b8c9c
[PATCH 2/3] Don't tell sysfs to launch the container as we are doing it ourselves
    382afe49b10cf3e5a4764cee74649d1cd8c91813
[PATCH 3/3] Hold the map lock while performing Assemble to avoid races with udev
    eafa60fd6ec35ac7c0a01a17c3018af4c90046ef

Those patches are already merged in mdadm git, but not included in mdadm
v3.2.2 in openSUSE 12.1. Do you think they are worth a try? If so, I will
backport those patches for testing. Thanks a lot!
https://bugzilla.novell.com/show_bug.cgi?id=775746#c11

Neil Brown <nfbrown@suse.com> changed:

  InfoProvider: nfbrown@suse.com -> j.langley@gmx.net

--- Comment #11 from Neil Brown <nfbrown@suse.com> 2012-09-26 07:32:35 UTC ---

If it is a race with mdadm, it can probably be fixed by adding "udev-trigger"
to the end of the line that starts "# Should-Start:" in /etc/init.d/boot.md.
That fixes similar problems.

However, the fact that it says "unknown-block(8,20)", and that we see errors
like:

[43299.808156] ata2: hard resetting link

makes it look like there is some recurrent hardware issue. I thought it might
be a driver issue, but maybe not.

John, can you try making that change to boot.md and see if it improves the
situation? It ensures that all udev triggers have fired before boot.md tries
to assemble anything.
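[Editorial note] The edit suggested in comment 11 can be scripted. A hedged sketch; a temp file stands in for the real /etc/init.d/boot.md, and the sed expression is one possible way to perform the change, not taken from the thread:

```shell
# Append "udev-trigger" to the "# Should-Start:" line of an LSB init
# script header, idempotently.
SCRIPT=$(mktemp)
cat > "$SCRIPT" <<'EOF'
# Should-Start: boot.scsidev boot.multipath
EOF

# Only append if the token is not already present.
grep -q 'udev-trigger' "$SCRIPT" || \
    sed -i 's/^# Should-Start:.*/& udev-trigger/' "$SCRIPT"

line=$(cat "$SCRIPT")
echo "$line"
rm -f "$SCRIPT"
```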
https://bugzilla.novell.com/show_bug.cgi?id=775746#c12

--- Comment #12 from Joey Lee <jlee@suse.com> 2012-09-26 08:30:55 UTC ---

(In reply to comment #11)
> If it is a race with mdadm it can probably be fixed by adding
> 'udev-trigger' to the end of the line that start '# Should-Start:' in
> /etc/init.d/boot.md.

I just checked and found that 'udev-trigger' is not in openSUSE 12.2; one
needs to use '/sbin/udevadm trigger' instead.
https://bugzilla.novell.com/show_bug.cgi?id=775746#c13

--- Comment #13 from Joey Lee <jlee@suse.com> 2012-09-26 08:43:51 UTC ---

(In reply to comment #12)
> (In reply to comment #11)
> > If it is a race with mdadm it can probably be fixed by adding
> > 'udev-trigger' to the end of the line that start '# Should-Start:' in
> > /etc/init.d/boot.md.
>
> I just checked and found the 'udev-trigger' is not in openSUSE 12.2, need
> use '/sbin/udevadm trigger'

And in openSUSE 12.1 as well.
https://bugzilla.novell.com/show_bug.cgi?id=775746#c14

--- Comment #14 from Neil Brown <nfbrown@suse.com> 2012-09-26 11:07:23 UTC ---

You looked in the wrong place. It is in
/lib/systemd/system/udev-trigger.service. You only need that extra bit in
boot.md when systemd is being used; that is why it is "Should-Start", not
"Required-Start".
https://bugzilla.novell.com/show_bug.cgi?id=775746#c15

--- Comment #15 from John Langley <j.langley@gmx.net> 2012-10-06 16:45:15 UTC ---

I put this into /etc/init.d/boot.md:

# Should-Start: boot.scsidev boot.multipath udev-trigger

But the problems persist. Should it read "udev-trigger" or
"udev-trigger.service"?
https://bugzilla.novell.com/show_bug.cgi?id=775746#c16

--- Comment #16 from Neil Brown <nfbrown@suse.com> 2012-10-07 22:41:21 UTC ---

"udev-trigger" is correct.

I still think it is some hardware-related issue. Do you still get "could not
open unknown-block..."? That indicates something odd.

Can you attach the complete boot messages (dmesg) immediately after boot?
https://bugzilla.novell.com/show_bug.cgi?id=775746#c17

--- Comment #17 from John Langley <j.langley@gmx.net> 2012-11-09 15:09:52 UTC ---

Created an attachment (id=512689)
 --> (http://bugzilla.novell.com/attachment.cgi?id=512689)
dmesg log 16.10.2012

Attached is another dmesg file; I will add two more later.

Regarding a hardware error: I swapped all parts (different PC, different ATA
cables) except the hard disks, and the errors persist. But I ordered two new
hard disks, which will arrive this week; I will let you know whether the
errors persist with them. In fact, one hard disk (sda) has been reporting a
smartd error for some days (refer to the dmesg output).

Is it possible that the errors are related to the unusual way I set up my RAID
arrays? Instead of having one partition per hard disk configured as a RAID
device, with the RAID device then partitioned for file systems - as done with
hardware RAID - I did it the other way around: each hard disk has four
partitions, and only (sda2,sdb2) and (sda4,sdb4) respectively are assembled
into RAID arrays: (sda2,sdb2) => md0, (sda4,sdb4) => md1.

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1            2048     2105343     1051648   82  Linux swap / Solaris
/dev/sda2   *     2105344    85995519    41945088   fd  Linux raid autodetect
/dev/sda3        85995520    98574335     6289408   a0  IBM Thinkpad hibernation
/dev/sda4        98574336   976773119   439099392   fd  Linux raid autodetect
https://bugzilla.novell.com/show_bug.cgi?id=775746#c18

--- Comment #18 from John Langley <j.langley@gmx.net> 2012-11-09 15:11:17 UTC ---

Created an attachment (id=512690)
 --> (http://bugzilla.novell.com/attachment.cgi?id=512690)
dmesg output 24.10.2012
https://bugzilla.novell.com/show_bug.cgi?id=775746#c19

--- Comment #19 from John Langley <j.langley@gmx.net> 2012-11-09 15:12:01 UTC ---

Created an attachment (id=512691)
 --> (http://bugzilla.novell.com/attachment.cgi?id=512691)
dmesg output 8.11.2012
https://bugzilla.novell.com/show_bug.cgi?id=775746#c20

Herbert Meier <herbert@women-at-work.org> changed:

  CC: added herbert@women-at-work.org

--- Comment #20 from Herbert Meier <herbert@women-at-work.org> 2012-12-07 21:35:07 UTC ---

I collected the following dmesg output yesterday on a fully updated openSUSE
12.2 release (kernel 3.4.11-2.16-desktop). sdb1/sdb5 are the root/swap
partitions; while sdb6 triggers the unknown-block error, sdb3 can be added to
the second RAID device. There are also no ata-related errors.

md0 : active raid5 sda7[0] sdb6[2] sdc2[1]
md126 : active raid1 sda3[0] sdb3[1]

[ 2.209089] EXT4-fs (sdb1): mounted filesystem with ordered data mode. Opts: (null)
...
[ 12.443459] md: bind<sda7>
[ 12.446701] md: bind<sda3>
[ 12.485187] md: bind<sdc2>
[ 13.723996] Adding 1959892k swap on /dev/sdb5. Priority:-1 extents:1 across:1959892k
[ 15.845492] scsi_verify_blk_ioctl: 502 callbacks suppressed
[ 15.845496] mdadm: sending ioctl 1261 to a partition!
[ 15.845498] mdadm: sending ioctl 1261 to a partition!
[ 16.133358] mdadm: sending ioctl 800c0910 to a partition!
[ 16.133362] mdadm: sending ioctl 800c0910 to a partition!
[ 16.133365] mdadm: sending ioctl 1261 to a partition!
[ 16.133367] mdadm: sending ioctl 1261 to a partition!
[ 16.133521] mdadm: sending ioctl 1261 to a partition!
[ 16.133524] mdadm: sending ioctl 1261 to a partition!
[ 16.133674] mdadm: sending ioctl 1261 to a partition!
[ 16.133676] mdadm: sending ioctl 1261 to a partition!
[ 16.600163] md: md127 stopped.
[ 16.600755] md: bind<sdb6>
[ 16.602197] md: could not open unknown-block(8,22).
[ 16.602347] md: md_import_device returned -16
[ 16.602641] md: could not open unknown-block(8,22).
[ 16.602787] md: md_import_device returned -16
[ 16.611152] md: bind<sdb3>
[ 16.666080] md: raid1 personality registered for level 1
[ 16.666211] bio: create slab <bio-1> at 1
[ 16.666253] md/raid1:md126: active with 2 out of 2 mirrors
[ 16.666266] md126: detected capacity change from 0 to 15011020800
[ 16.667302] md126: unknown partition table
[ 19.470740] EXT4-fs (sdc1): mounted filesystem with ordered data mode. Opts: (null)
[ 19.547976] EXT4-fs (sdc4): mounted filesystem with ordered data mode. Opts: (null)
[ 19.664781] kjournald starting. Commit interval 5 seconds
[ 19.665127] EXT3-fs (sda5): using internal journal
[ 19.665129] EXT3-fs (sda5): mounted filesystem with ordered data mode
[ 19.839153] EXT4-fs (sda6): mounted filesystem with ordered data mode. Opts: (null)
[ 19.981646] EXT4-fs (sda8): mounted filesystem with ordered data mode. Opts: (null)
[ 20.070497] EXT4-fs (sda2): mounted filesystem with ordered data mode. Opts: (null)
[ 20.172760] EXT4-fs (md126): mounted filesystem with ordered data mode. Opts: (null)
[ 21.308920] EXT4-fs (sdb7): mounted filesystem with ordered data mode. Opts: (null)
[ 21.505698] EXT4-fs (sdb2): mounted filesystem with ordered data mode. Opts: (null)
https://bugzilla.novell.com/show_bug.cgi?id=775746#c21

Jeffrey Cheung <jcheung@suse.com> changed:

  Status: NEEDINFO -> CLOSED
  CC: added jcheung@suse.com
  InfoProvider: (cleared)
  Resolution: WONTFIX

--- Comment #21 from Jeffrey Cheung <jcheung@suse.com> 2014-02-07 03:53:20 UTC ---

With the release of the gnumeric update on January 27th, 2014, the SUSE
sponsored maintenance of openSUSE 12.2 ended. openSUSE 12.2 is now officially
discontinued and out of support by SUSE. This bug was created against openSUSE
12.1, so it is closed as WONTFIX.
participants (1)
-
bugzilla_noreply@novell.com