Bug ID: 916014
Summary: EXT4 on LVM on LUKS on RAID5 doesn't ask for password on boot
Classification: openSUSE
Product: openSUSE Distribution
Version: 13.2
Hardware: x86-64
OS: openSUSE 13.2
Status: NEW
Severity: Critical
Priority: P5 - None
Component: Bootloader
Assignee: jsrain@suse.com
Reporter: darren@freemaninstruments.com
QA Contact: jsrain@suse.com
Found By: ---
Blocker: ---

I have been using LVM on LUKS for a while now. It's great while it's working,
but almost always requires some fiddling to get a freshly installed system to
boot. Fiddling often entails manually bringing the system far enough up that it
can pull in updated packages. For some reason, openSUSE almost always ships
broken in this regard, creating a lot of extra work.

My laptop has been upgraded from 12.x up to 13.1 and then 13.2. I've managed to
get it running fine. But now I'm installing a new file server and I've
absolutely had it with trying to get openSUSE to boot. Compared to the laptop,
this one adds a RAID5 under the LUKS, although I have an older server running
12.2 in this configuration just fine.

Here is a summary of what I've been through over the last couple of days.

Six discs with MBR partitioning, each with a 256 MB partition 1 and the rest of
the space as partition 2. The first partitions are combined as RAID1 for /boot;
the second partitions are combined as RAID5 to hold the LUKS/LVM. GRUB2 is
installed into the MBR of /dev/sda. I intend to install it on all of the drives
once it's working. Temporarily, drive 6 is external via USB until another drive
is sourced. Drive 6 is the exception to the layout above: it doesn't have the
/boot first partition on it.

* 13.2 installer asks to unlock LUKS, correctly does so but then doesn't
discover the logical volumes inside. Hit "rescan devices", it asks for the
password again and then does find the logical volumes. Installation proceeds as
normal.

* 13.2 won't boot. It says it is waiting on a certain UUID to appear, which I
guess is the logical volume for root. It will never appear. Some clever
individual has decided that there should be no time-out, so it waits forever
and you don't get a shell.

* Manually added rd.timeout=60 and rd.shell to the kernel command line. Since we
are now using grub2, every change to the command line takes a fair bit of
effort from the rescue system.

* Now Dracut drops to a shell. But even though I can easily access the logical
volume and create a symlink /dev/root that points to /dev/system/root, when I
exit the shell it always says "transaction is terminal" or similar. Other
systems running Dracut (Arch Linux) actually boot at this point, but this one
won't.
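
For anyone trying to reproduce this, the timeout and shell options mentioned
above could be made persistent roughly as follows. This is only a sketch: it
assumes the stock /etc/default/grub layout and that grub2-mkconfig is run from
a chroot of the installed system. The "..." stands for whatever options are
already on that line; it is a placeholder, not something to type literally.

```shell
# /etc/default/grub (fragment, illustrative)
# "..." = the options already present on this line; keep them as they are.
GRUB_CMDLINE_LINUX_DEFAULT="... rd.timeout=60 rd.shell"

# Afterwards, regenerate the GRUB2 configuration from inside the chroot:
#   grub2-mkconfig -o /boot/grub2/grub.cfg
```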

Dracut is an abomination, by the way. How does it work? Nobody knows.

* I have tried passing luks=1 luks.uuid= luks.name= to the kernel, no
improvement. After much hacking I got it to the point where software RAID was
no longer getting included in the initramfs. So I tried reinstalling a second
time, this time without including the update repo during install, and including
the timeout/shell options in the bootloader installation.

* Still wouldn't work. Despite being able to chroot to the installed system
(from rescue) and run YaST2 for package management and bootloader installation,
I just can't get it to either ask for a password to unlock LUKS on boot, or
actually do anything further once provided with /dev/root symlink.

* It is definitely including MD RAID, LVM and LUKS in the initramfs. From the
shell I can set everything up. It's just a matter of issuing the right
commands. I note that the related scripts all have the same priority of 90, and
I wonder if that's why it can't figure out that they have to run one at a time
and in the correct order.

* I tried installing 13.1, thinking that maybe the upgrade path is why my
laptop works. The installer doesn't have the bug where the first time you
unlock LUKS, it doesn't find the logical volumes. It finds them first go.
Installation is smooth.

* 13.1 doesn't boot. This time I get the previous init system, which is much
easier to work with. It loops 255 times over several minutes hoping to discover
the root device, then asks for the root password two or three times. (Clever -
there is no /etc/passwd.) Then it gave me a shell anyway. Finally! I thought. I
can fix this.

* Couldn't run cryptsetup luksOpen - the root filesystem is read-only in the
13.1 rescue environment. So cryptsetup can't write a file it needs and won't
perform the action.
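
The manual bring-up from the emergency shell described above amounts to roughly
the following order of operations. This is a sketch, not a transcript of what I
typed: /dev/md1 and the mapping name cr_md1 are made-up placeholders, while
system/root is the volume group and logical volume mentioned above. Inside the
initramfs the LVM tools are usually reached through the "lvm" wrapper, which is
what the sketch assumes. By default it only prints the commands; set DRY_RUN=0
to actually run them.

```shell
#!/bin/sh
# Sketch only. Device and mapping names are illustrative placeholders.
RAID_DEV="${RAID_DEV:-/dev/md1}"   # hypothetical name of the RAID5 array
LUKS_NAME="${LUKS_NAME:-cr_md1}"   # hypothetical name for the unlocked mapping
VG="${VG:-system}"                 # volume group, as in /dev/system/root
LV="${LV:-root}"                   # logical volume holding the root filesystem

# Print each command; execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

bring_up_root() {
    # 1. Assemble the MD arrays so the RAID5 device exists.
    run mdadm --assemble --scan || return 1
    # 2. Unlock LUKS on top of the array (prompts for the passphrase).
    run cryptsetup luksOpen "$RAID_DEV" "$LUKS_NAME" || return 1
    # 3. Activate the logical volumes inside the unlocked container.
    run lvm vgchange -ay "$VG" || return 1
    # 4. Point /dev/root at the root logical volume.
    run ln -s "/dev/$VG/$LV" /dev/root
}

bring_up_root
```

The point of the ordering is that each layer only exists once the one below it
is up: RAID first, then LUKS, then LVM. That is the sequence the initramfs
would have to follow on its own.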

My God. openSUSE used to work most of the time, and be quite easily fixed. This
is ridiculous. The 13.x series appears to be incredibly fragile and inadequately
tested. I would expect this to be easily reproducible in a VM; it doesn't seem
to depend on my hardware.

Tomorrow I will try using 13.2 rescue to see if I can repair 13.1. Then I might
install 13.2 and try booting with the USB drive failed from the array and
disconnected, in case USB is delaying assembly of the array until it's too late
to unlock it.

But my other server is actually running a RAID4 with the parity drive on USB.
So at some point I made this work on 12.2. I remember it being a lot of effort,
but that's always been the case. In my experience, openSUSE just never ever
boots first go on LVM/LUKS. It's still an OS that's worth fixing after install,
assuming it can be done.

