Bug ID 1017695
Summary Boot on broken RAID 1 with missing disk fails. Looks linked to dracut pre-mount hook script (start job running with no-limit timeout)
Classification openSUSE
Product openSUSE Distribution
Version Leap 42.2
Hardware All
OS openSUSE 42.2
Status NEW
Severity Critical
Priority P5 - None
Component Basesystem
Assignee bnc-team-screening@forge.provo.novell.com
Reporter cedric@solucionjava.com
QA Contact qa-bugs@suse.de
CC 9b3e05a5@opayq.com, arvidjaar@gmail.com, harald@redhat.com, mchang@suse.com, meissner@suse.com, trenn@suse.com
Depends on 1010852
Found By Community User
Blocker ---

Created attachment 708146 [details]
Boot screen

Here is the case:

I have 2 disks set up in RAID 1 as follows:
/dev/md0 (/dev/sda1 + /dev/sdb1) --> /boot
/dev/md1 (/dev/sda2 + /dev/sdb2) --> /
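
For reference, the arrays can be recreated with mdadm roughly like this (a sketch only; the exact metadata version and options the installer used are an assumption on my part):

    # Sketch: recreate the two RAID 1 arrays (options assumed, not the installer's exact call)
    mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1   # /boot
    mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sda2 /dev/sdb2   # /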

My Scenario:
1) Start the server with RAID 1 and the arrays fully synchronized
2) Shut down the server normally
3) Remove one of the disks (/dev/sdb)
4) Try booting again with just one disk (the array state can be checked as shown below)
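
If you can reach the dracut emergency shell while the boot hangs, the array state can be inspected there; a sketch, assuming mdadm is available in the initramfs:

    # Quick check from the emergency shell (sketch)
    cat /proc/mdstat          # md1 shows up inactive / with a missing member
    mdadm --detail /dev/md1   # reports the degraded/inactive state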

My scenario works fine on openSUSE 11.3 through 42.1.

On openSUSE 42.2, when I try to boot with one disk only, GRUB loads fine (even
though it is also located on an array), but later on the boot waits forever for
the root device (/dev/md1). I would expect it to break the array and go on.

I tested it with a fresh install of 42.2 from DVD (with and without zypper
update), as well as with an online distribution upgrade from 42.1. The issue is
the same in all scenarios.

If the array was already broken before shutdown, the system does boot fine
(with the broken array).
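
For anyone reproducing that case, the array can be degraded by hand before the shutdown; a sketch using the device names from my layout above:

    # Mark the second member of each array as failed, then remove it (sketch)
    mdadm /dev/md0 --fail /dev/sdb1 --remove /dev/sdb1
    mdadm /dev/md1 --fail /dev/sdb2 --remove /dev/sdb2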

After several days of testing different possible solutions I found a
workaround: add "waitdev=30" to the linux GRUB boot command line.

I also noticed that adding a new (empty) disk solves the problem as well (the
system starts with the broken array).

The problem seems to be located in the dracut pre-mount script.
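
I have not dug into the shipped script itself, but as a stopgap a custom pre-mount hook could force-start the degraded array; a rough sketch, where the path and logic are my own assumption, not the actual dracut-pre-mount code:

    #!/bin/sh
    # Hypothetical hook, e.g. installed via a custom dracut module (path assumed)
    # If the root array never assembled completely, start it with missing members.
    if ! mdadm --detail /dev/md1 >/dev/null 2>&1; then
        mdadm --assemble --scan --run   # --run permits starting a degraded array
    fi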

This is critical for me since I have already had several cases where a hard
shutdown (electrical/battery power failure) damaged a disk, leading to a
scenario similar to the one described above.

For the moment I will use the "waitdev=n" parameter, but I would prefer to have
this bug fixed.

Well, at least I am leaving it documented here, hoping it can save someone else some time...

