[Bug 770351] New: Functional system using Intel RAID (BIOS) fails upgrade to 12.1
https://bugzilla.novell.com/show_bug.cgi?id=770351
https://bugzilla.novell.com/show_bug.cgi?id=770351#c0

           Summary: Functional system using Intel RAID (BIOS) fails upgrade
                    to 12.1
    Classification: openSUSE
           Product: openSUSE 12.1
           Version: Final
          Platform: x86-64
        OS/Version: openSUSE 11.4
            Status: NEW
          Severity: Critical
          Priority: P5 - None
         Component: Update Problems
        AssignedTo: bnc-team-screening@forge.provo.novell.com
        ReportedBy: novell@roblucke.com
         QAContact: jsrain@suse.com
          Found By: ---
           Blocker: ---

User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:13.0) Gecko/20100101 Firefox/13.0.1

I have two systems that I (stupidly) tried to upgrade to openSUSE 12.1 from 11.4 at the same time. One is an HP Z600 workstation (x86_64) and the other is an XW4200 workstation (i686), both with Intel RAID controllers on the motherboard, supporting RAID0 and RAID1. The Z600 has an Intel ICH8R/ICH9R/ICH10R/DO/5 Series 3400 SATA RAID Controller.

Both systems failed the update at the reboot point (the software installed and apparently the initrd was built). GRUB comes up and offers the appropriate options (I have a Windows 7 partition active on the Z600, and that boots properly from GRUB).

The XW4200 system has two 750 GB disks under RAID control, which present a single RAID1 LUN to the system at boot. The LUN is partitioned into /boot, swap, and / partitions. The controller presents a second LUN that is RAID0. This all functioned fine under 11.4.

After the upgrade completes, at boot time, I get the messages:

  doing fast boot
  FATAL: Module ata_piix not found.
  FATAL: Error running install command for ata_piix
  Creating device nodes with udev
  Trying manual resume from /dev/disk/by-id/md-uuid-[UUID]-part1
  resume device /dev/disk/by-id/md-uuid-[UUID]-part1 not found (ignoring)
  [repeated previous messages]
  Waiting for device /dev/disk/by-id/md-uuid-[UUID]-part3 to appear ......
  Could not find /dev/disk/by-id/md-uuid-[UUID]-part3
  Want me to fall back to /dev/disk/by-id/md-uuid-[UUID]-part3? (Y/n)

Answering "Y" to the question repeats the wait message. Answering "n" to the question or failing the second search drops to the shell (in the initrd, I assume).

Doing "cat /proc/mdstat" shows:

  Personalities : [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
  md125 : inactive sda[1] sdb[0]
        335544320 blocks super external:-md127/0
  md126 : active raid0 sda[1] sdb[0]
        1129604352 blocks super external:/md127/1 16k chunks
  md127 : inactive sda[1](S) sdb[0](S)
        496 blocks super external:imsm

This is essentially a RAID1 LUN, split into /boot, swap, and / -- along with a RAID0 LUN.

Doing "cat /proc/partitions" yields:

    8     0  732574584 sda
    8     1    2095104 sda1
    8     2     200704 sda2
    8     3  165462016 sda3
  [same entries for sdb, with 16, 17, 18, 19 for minor numbers]
    9   126 1129604352 md126
  259     0 1127425024 md126p1

Doing "ls -al /dev/md/" yields:

  lrwxrwxrwx 1 root root  8 Jul  7 14:35 RAID0-Vol0 -> ../md126
  lrwxrwxrwx 1 root root 10 Jul  7 14:35 RAID0-Vol0p1 -> ../md126p1

Doing "ls -al /dev/md*" yields:

  brw-rw---- 1 root disk   9, 125 Jul  7 14:35 /dev/md125
  brw-rw---- 1 root disk   9, 126 Jul  7 14:35 /dev/md126
  brw-rw---- 1 root disk 259,   0 Jul  7 14:35 /dev/md126p1
  brw-rw---- 1 root disk   9, 127 Jul  7 14:35 /dev/md127

[Yes, I am typing all of this by hand ...]
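In principle the member arrays inside the IMSM container can be started from the container itself; the following is only a sketch of that step (device names taken from the output above, and the exact behaviour may differ with the mdadm version shipped in the 12.1 initrd):

  # md127 is the IMSM container and only carries metadata; asking mdadm to
  # handle it incrementally should start the member arrays (md125 and md126)
  mdadm --incremental /dev/md127
  cat /proc/mdstat        # md125 should now show as an active raid1 array
  ls /dev/md125p*         # partition nodes for /boot, swap and / should then appear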
If I do a "mdadm --examine --scan", I get: ARRAY metadata=imsm UUID=8f3d66f2:5c2e7601:4ec4fa30:e6d4be45 ARRAY /dev/md/RAID1-Vol0 container=[Same UUID as metadata] member=0 \ UUID=73dae693:17f852dc:f7010e82:ab9aca07 ARRAY /dev/md/RAID0-Vol0 container=[Same UUID as metadata] member=1 \ UUID=26c6ca9a:223de7d5:94e684cf:315fdae7 I assume (ASSuME) that the active device is the /boot partition on the RAID1 device. What is not happening is activation of the three system partitions. I have tried using "init=/sbin/sysvinit" and selecting "Sysvinit" in the GRUB menu with no luck. I can (and have) done the following: mdadm --stop /dev/md12[567] at this point /proc/mdstat is empty. Next I have executed: mdadm --examine --scan > /etc/mdadm.conf Also, using "mdadm --examine /dev/sd[ab]" yields the expected Intel RAID labels on the disks for the two 750 GB physical disks and the expected information for the presented LUNs (Volumes?). RAID1-Vol0 shows 171.80 GB and RAID0-Vol0 shows 578.36 GB. /proc/partitions shows only the partitions on /dev/sda [123] and /dev/sdb [123] as listed above. Now, I should have all of the devices deactivated and only the kernel/initrd in memory. Further, I have placed the information into /etc/mdadm.conf that the normal system should expect (I hope). I execute: mdadm --assemble /dev/md125 mdadm --assemble /dev/md127 mdadm: Container /dev/md127 has been assembled with 2 drives So far so good. Maybe. Oops, wrong order. The container is now started, and I need to rerun: mdadm --assemble /dev/md125 mdadm: array /dev/md125 now has 2 devices Now we might be mostly there. The devices are not running according to /proc/mdstat: md125 : inactive sda[1] sdb[0] 335544320 blocks super external:-md127/0 md127 : inactive sdb[1](S) sda[0](S) 496 blocks super external:imsm This is where I am stuck. My question is: "How do I get my systems back?" Also, is the "ata_piix" error message a red-herring (i.e. not really a problem)? I've seen issues like this before in other race conditions (primarily for resume functionality) when checks are made before the actual RAID startup (assuming the disk is physical). It looks like for some reason the MD devices are not being started properly (the partitions are not marked with the Linux RAID auto type) before the rest of the boot process in the initrd proceeds (and then fails). Should the existing mdadm configuration be propagated into the initrd at upgrade time? Reproducible: Always Steps to Reproduce: 1.Install 11.4 system using Intel RAID 2.Upgrade to 12.1 3. Actual Results: Unbootable system. Expected Results: A bootable system. I am more than willing to work through this problem. Both of my systems are down until I can get this fixed or make the system boot. The rescue disk suffers from the same issue as the system itself. There is currently no way to recover the systems without assistance. -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=770351
https://bugzilla.novell.com/show_bug.cgi?id=770351#c1
--- Comment #1 from Rob Lucke
https://bugzilla.novell.com/show_bug.cgi?id=770351
https://bugzilla.novell.com/show_bug.cgi?id=770351#c
--- Comment from kk zhang
https://bugzilla.novell.com/show_bug.cgi?id=770351
https://bugzilla.novell.com/show_bug.cgi?id=770351#c2
--- Comment #2 from Rob Lucke
https://bugzilla.novell.com/show_bug.cgi?id=770351
https://bugzilla.novell.com/show_bug.cgi?id=770351#c3
--- Comment #3 from Neil Brown
https://bugzilla.novell.com/show_bug.cgi?id=770351
https://bugzilla.novell.com/show_bug.cgi?id=770351#c4
--- Comment #4 from Bernhard Wiedemann
https://bugzilla.novell.com/show_bug.cgi?id=770351
https://bugzilla.novell.com/show_bug.cgi?id=770351#c5
--- Comment #5 from Swamp Workflow Management
https://bugzilla.novell.com/show_bug.cgi?id=770351
https://bugzilla.novell.com/show_bug.cgi?id=770351#c6
--- Comment #6 from Bernhard Wiedemann