https://bugzilla.novell.com/show_bug.cgi?id=832501
https://bugzilla.novell.com/show_bug.cgi?id=832501#c0

Summary: boot on raid device is not started if degraded; fix provided
Classification: openSUSE
Product: openSUSE 12.3
Version: Final
Platform: x86-64
OS/Version: openSUSE 12.3
Status: NEW
Severity: Major
Priority: P5 - None
Component: Other
AssignedTo: bnc-team-screening@forge.provo.novell.com
ReportedBy: peter.maloney@brockmann-consult.de
QAContact: qa-bugs@suse.de
Found By: ---
Blocker: ---

Created an attachment (id=550422)
--> (http://bugzilla.novell.com/attachment.cgi?id=550422)
patch for /var/mkinitrd/scripts/, maybe not src files

User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:21.0) Gecko/20100101 Firefox/21.0

If your /boot is on a separate RAID device from your /, mkinitrd does not add any information to the initrd to start the RAID device, so boot will fail. I don't know why booting works when the RAID is clean; perhaps systemd starts it in that case.

Ubuntu 12.04 (grub 1.99) can boot with a degraded RAID as long as you manually fix the metadata version of the device (change it to 0.90, possibly 1.0, but not 1.2, which is the default on the command line and in the Ubuntu installer), so I was sad to see that the latest openSUSE does not work (even though previous versions did). But I was happy to see that openSUSE works with my fix and without changing the metadata, because openSUSE uses grub 2.00 and the installer uses metadata 1.0 instead of 1.2.

I have fixed the problem on my machine by editing the mkinitrd scripts. I don't know whether I did a clean job that will work on other systems, so please validate it. I have also added some extra output in verbose mode.

In my solution, I check whether mdadm.conf exists and, if not, generate one. The openSUSE installer did not generate one for me in my most hackish of tests, so this seems like a good way to prevent some problems, even if they are the user's fault.

I am not sure whether my solution has a problem when you have no mdadm.conf, or when your mdadm.conf has entries for arrays that should not be required for boot; in that case the initrd will try to start them too. I check /sys/devices/virtual/block/ for md devices before trying to handle them; if there are devices but no mdadm.conf, I read the output of <(mdadm -D --scan) instead of the file. (A sketch of this logic appears at the end of this report.)

Reproducible: Always

Steps to Reproduce:

Set up a test machine:
2 x 16 GB virtual disks
md0 is raid1 on sda1 and sdb1, mounted on /boot as ext4
md1 is raid1 on sda2 and sdb2, used as an LVM PV
/dev/suse is the LVM VG containing PV /dev/md1
/dev/suse/root is from VG /dev/suse, mounted on / as ext4
/dev/suse/swap is from VG /dev/suse, used as swap

On the command line, you could create the devices like this:

mdadm --create /dev/md0 -n 2 -x 0 -l 1 -e 1.0 missing /dev/sdb1
mdadm --create /dev/md1 -n 2 -x 0 -l 1 -e 1.0 missing /dev/sdb2
mkfs.ext4 -L boot /dev/md0
pvcreate /dev/md1
vgcreate suse /dev/md1
lvcreate -L 4GB -n swap suse
lvcreate -l 100%FREE -n root suse
mkfs.ext4 -L root /dev/suse/root
mkswap /dev/suse/swap

After the machine is up, run this to ensure it is ready to boot with either disk missing:

grub2-install /dev/sda
grub2-install /dev/sdb
mkinitrd
grub2-mkconfig -o /boot/grub2/grub.cfg

Then shut it down and remove a disk (I removed the 2nd for most of my tests, because VirtualBox snapshots mess up if you boot from the disk you add afterwards). A quick way to double-check the initrd contents before pulling the disk is shown below.
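To verify that mdadm and an mdadm.conf actually made it into the new initrd, you can list its contents. This is a quick check, assuming the usual openSUSE layout of a gzip-compressed cpio archive reachable via /boot/initrd (adjust the path to your kernel version):

zcat /boot/initrd | cpio -it | grep -i mdadm

If nothing matches, the initrd has no way to assemble the array, which is consistent with the problem described here.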
Then boot it up.

Actual Results:
You get a very long wait (at least 60 seconds) and then you get emergency mode. Normal startup was blocked because fsck could not open /dev/md0; it could not open it because the device node /dev/md0 exists and the array is assembled, but it is not running (as if --run was not used when assembling).

Expected Results:
You get a successful boot with degraded arrays.

The systemd log from the failed boot shows something like this:

Jul 30 12:29:11 peterrouter.bc.local systemd[1]: Job dev-disk-by\x2duuid-a16b10b0\x2dd038\x2d4946\x2dad88\x2d97c0617bbf8c.device/start timed out.
Jul 30 12:29:11 peterrouter.bc.local systemd[1]: Timed out waiting for device dev-disk-by\x2duuid-a16b10b0\x2dd038\x2d4946\x2dad88\x2d97c0617bbf8c.device.
Jul 30 12:29:11 peterrouter.bc.local systemd[1]: Dependency failed for /boot.
Jul 30 12:29:11 peterrouter.bc.local systemd[1]: Dependency failed for Local File Systems.
Jul 30 12:29:11 peterrouter.bc.local systemd[1]: Dependency failed for Remote File Systems (Pre).
Jul 30 12:29:11 peterrouter.bc.local systemd[1]: Job remote-fs-pre.target/start failed with result 'dependency'.
Jul 30 12:29:11 peterrouter.bc.local systemd[1]: Job local-fs.target/start failed with result 'dependency'.
Jul 30 12:29:11 peterrouter.bc.local systemd[1]: Triggering OnFailure= dependencies of local-fs.target.
Jul 30 12:29:11 peterrouter.bc.local systemd[1]: Job boot.mount/start failed with result 'dependency'.
Jul 30 12:29:11 peterrouter.bc.local systemd[1]: Dependency failed for File System Check on /dev/disk/by-uuid/a16b10b0-d038-4946-ad88-97c0617bbf8c.
Jul 30 12:29:11 peterrouter.bc.local systemd[1]: Job systemd-fsck@dev-disk-by\x2duuid-a16b10b0\x2dd038\x2d4946\x2dad88\x2d97c0617bbf8c.service/start failed with result 'dependency'.
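For reference, here is a minimal sketch of the approach described above. It is NOT the attached patch verbatim; the staging variable ($tmp_mnt) and paths are illustrative assumptions, not the real mkinitrd script internals:

#!/bin/bash
# Build time (mkinitrd setup side): make sure an mdadm.conf describing
# the arrays ends up in the initrd, generating one when none exists.
tmp_mnt=${tmp_mnt:-/tmp/initrd-staging}   # assumed initrd staging directory

# Only act when the running system actually has md devices.
if ls /sys/devices/virtual/block/md* >/dev/null 2>&1; then
    mkdir -p "$tmp_mnt/etc"
    if [ -f /etc/mdadm.conf ]; then
        cp /etc/mdadm.conf "$tmp_mnt/etc/mdadm.conf"
    else
        # No mdadm.conf (e.g. the installer never wrote one):
        # derive it from the currently running arrays.
        mdadm -D --scan > "$tmp_mnt/etc/mdadm.conf"
    fi
fi

# Boot time (inside the initrd): assemble every array listed in the
# embedded config. The key detail for this bug is --run, which starts
# an array even when member devices are missing, i.e. degraded.
if [ -f /etc/mdadm.conf ]; then
    mdadm --assemble --scan --run --config=/etc/mdadm.conf
fi

Without --run, a degraded array is assembled but left inactive, which matches the "assembled but not running" state that blocks fsck in the log above.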