[Bug 770351] New: Functional system using Intel RAID (BIOS) fails upgrade to 12.1
https://bugzilla.novell.com/show_bug.cgi?id=770351
https://bugzilla.novell.com/show_bug.cgi?id=770351#c0

           Summary: Functional system using Intel RAID (BIOS) fails upgrade
                    to 12.1
    Classification: openSUSE
           Product: openSUSE 12.1
           Version: Final
          Platform: x86-64
        OS/Version: openSUSE 11.4
            Status: NEW
          Severity: Critical
          Priority: P5 - None
         Component: Update Problems
        AssignedTo: bnc-team-screening@forge.provo.novell.com
        ReportedBy: novell@roblucke.com
         QAContact: jsrain@suse.com
          Found By: ---
           Blocker: ---

User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:13.0) Gecko/20100101 Firefox/13.0.1

I have two systems that I (stupidly) tried to upgrade to openSUSE 12.1 from 11.4 at the same time. One is an HP Z600 workstation (x86_64) and the other is an XW4200 workstation (i686), both with Intel RAID controllers on the motherboard, supporting RAID0 and RAID1. The Z600 has an Intel ICH8R/ICH9R/ICH10R/DO/5 Series 3400 SATA RAID Controller.

Both systems failed the update at the reboot point (the software installed and apparently the initrd was built). GRUB comes up and offers the appropriate options (I have a Windows 7 partition active on the Z600, and that boots properly from GRUB).

The XW4200 system has two 750 GB disks under RAID control, which present a single RAID1 LUN to the system at boot. The LUN is partitioned into /boot, swap, and / partitions. The controller presents a second LUN that is RAID0. This all functioned fine under 11.4.

After the upgrade completes, at boot time, I get the messages:

  doing fast boot
  FATAL: Module ata_piix not found.
  FATAL: Error running install command for ata_piix
  Creating device nodes with udev
  Trying manual resume from /dev/disk/by-id/md-uuid-[UUID]-part1
  resume device /dev/disk/by-id/md-uuid-[UUID]-part1 not found (ignoring)
  [repeated previous messages]
  Waiting for device /dev/disk/by-id/md-uuid-[UUID]-part3 to appear ......
  Could not find /dev/disk/by-id/md-uuid-[UUID]-part3
  Want me to fall back to /dev/disk/by-id/md-uuid-[UUID]-part3? (Y/n)

Answering "Y" to the question repeats the wait message. Answering "n" to the question or failing the second search drops to the shell (in the initrd, I assume).

Doing "cat /proc/mdstat" shows:

  Personalities : [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
  md125 : inactive sda[1] sdb[0]
        335544320 blocks super external:-md127/0
  md126 : active raid0 sda[1] sdb[0]
        1129604352 blocks super external:/md127/1 16k chunks
  md127 : inactive sda[1](S) sdb[0](S)
        496 blocks super external:imsm

This is essentially a RAID1 LUN, split into /boot, swap, and / -- along with a RAID0 LUN.

Doing "cat /proc/partitions" yields:

    8     0  732574584 sda
    8     1    2095104 sda1
    8     2     200704 sda2
    8     3  165462016 sda3
  [same entries for sdb, with 16, 17, 18, 19 for minor numbers]
    9   126 1129604352 md126
  259     0 1127425024 md126p1

Doing "ls -al /dev/md/" yields:

  lrwxrwxrwx 1 root root  8 Jul  7 14:35 RAID0-Vol0 -> ../md126
  lrwxrwxrwx 1 root root 10 Jul  7 14:35 RAID0-Vol0p1 -> ../md126p1

Doing "ls -al /dev/md*" yields:

  brw-rw---- 1 root disk   9, 125 Jul  7 14:35 /dev/md125
  brw-rw---- 1 root disk   9, 126 Jul  7 14:35 /dev/md126
  brw-rw---- 1 root disk 259,   0 Jul  7 14:35 /dev/md126p1
  brw-rw---- 1 root disk   9, 127 Jul  7 14:35 /dev/md127

[Yes, I am typing all of this by hand ...]
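In principle the member arrays inside the IMSM container can be started from the container itself; the following is only a sketch of that step (device names taken from the output above, and the exact behaviour may differ with the mdadm version shipped in the 12.1 initrd):

  # md127 is the IMSM container and only carries metadata; asking mdadm to
  # handle it incrementally should start the member arrays (md125 and md126)
  mdadm --incremental /dev/md127
  cat /proc/mdstat        # md125 should now show as an active raid1 array
  ls /dev/md125p*         # partition nodes for /boot, swap and / should then appear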
If I do a "mdadm --examine --scan", I get: ARRAY metadata=imsm UUID=8f3d66f2:5c2e7601:4ec4fa30:e6d4be45 ARRAY /dev/md/RAID1-Vol0 container=[Same UUID as metadata] member=0 \ UUID=73dae693:17f852dc:f7010e82:ab9aca07 ARRAY /dev/md/RAID0-Vol0 container=[Same UUID as metadata] member=1 \ UUID=26c6ca9a:223de7d5:94e684cf:315fdae7 I assume (ASSuME) that the active device is the /boot partition on the RAID1 device. What is not happening is activation of the three system partitions. I have tried using "init=/sbin/sysvinit" and selecting "Sysvinit" in the GRUB menu with no luck. I can (and have) done the following: mdadm --stop /dev/md12[567] at this point /proc/mdstat is empty. Next I have executed: mdadm --examine --scan > /etc/mdadm.conf Also, using "mdadm --examine /dev/sd[ab]" yields the expected Intel RAID labels on the disks for the two 750 GB physical disks and the expected information for the presented LUNs (Volumes?). RAID1-Vol0 shows 171.80 GB and RAID0-Vol0 shows 578.36 GB. /proc/partitions shows only the partitions on /dev/sda [123] and /dev/sdb [123] as listed above. Now, I should have all of the devices deactivated and only the kernel/initrd in memory. Further, I have placed the information into /etc/mdadm.conf that the normal system should expect (I hope). I execute: mdadm --assemble /dev/md125 mdadm --assemble /dev/md127 mdadm: Container /dev/md127 has been assembled with 2 drives So far so good. Maybe. Oops, wrong order. The container is now started, and I need to rerun: mdadm --assemble /dev/md125 mdadm: array /dev/md125 now has 2 devices Now we might be mostly there. The devices are not running according to /proc/mdstat: md125 : inactive sda[1] sdb[0] 335544320 blocks super external:-md127/0 md127 : inactive sdb[1](S) sda[0](S) 496 blocks super external:imsm This is where I am stuck. My question is: "How do I get my systems back?" Also, is the "ata_piix" error message a red-herring (i.e. not really a problem)? I've seen issues like this before in other race conditions (primarily for resume functionality) when checks are made before the actual RAID startup (assuming the disk is physical). It looks like for some reason the MD devices are not being started properly (the partitions are not marked with the Linux RAID auto type) before the rest of the boot process in the initrd proceeds (and then fails). Should the existing mdadm configuration be propagated into the initrd at upgrade time? Reproducible: Always Steps to Reproduce: 1.Install 11.4 system using Intel RAID 2.Upgrade to 12.1 3. Actual Results: Unbootable system. Expected Results: A bootable system. I am more than willing to work through this problem. Both of my systems are down until I can get this fixed or make the system boot. The rescue disk suffers from the same issue as the system itself. There is currently no way to recover the systems without assistance. -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=770351
https://bugzilla.novell.com/show_bug.cgi?id=770351#c1
--- Comment #1 from Rob Lucke
https://bugzilla.novell.com/show_bug.cgi?id=770351
https://bugzilla.novell.com/show_bug.cgi?id=770351#c
--- Comment from kk zhang
https://bugzilla.novell.com/show_bug.cgi?id=770351
https://bugzilla.novell.com/show_bug.cgi?id=770351#c2
--- Comment #2 from Rob Lucke
https://bugzilla.novell.com/show_bug.cgi?id=770351
https://bugzilla.novell.com/show_bug.cgi?id=770351#c3
--- Comment #3 from Neil Brown
https://bugzilla.novell.com/show_bug.cgi?id=770351
https://bugzilla.novell.com/show_bug.cgi?id=770351#c4
--- Comment #4 from Bernhard Wiedemann
https://bugzilla.novell.com/show_bug.cgi?id=770351
https://bugzilla.novell.com/show_bug.cgi?id=770351#c5
--- Comment #5 from Swamp Workflow Management
https://bugzilla.novell.com/show_bug.cgi?id=770351
https://bugzilla.novell.com/show_bug.cgi?id=770351#c6
--- Comment #6 from Bernhard Wiedemann