http://bugzilla.opensuse.org/show_bug.cgi?id=921570
Bug ID: 921570
Summary: after dist upgrade system boots to emergency mode
Classification: openSUSE
Product: openSUSE Distribution
Version: 13.2
Hardware: x86-64
OS: SUSE Other
Status: NEW
Severity: Major
Priority: P5 - None
Component: Kernel
Assignee: kernel-maintainers@forge.provo.novell.com
Reporter: steffen.hau@rz.uni-mannheim.de
QA Contact: qa-bugs@suse.de
Found By: ---
Blocker: ---
I've started upgrading systems from openSUSE 13.1 to 13.2. Virtual machines without special features such as mdadm RAID devices or multipathed FC LUNs upgraded fine.
But systems (IBM HS22 blades) with mdadm RAID devices boot to emergency mode. The systems have three RAID 1 devices (swap, / and /srv or /home). Swap and / are assembled correctly, but the third md device is missing and does not appear in /proc/mdstat either. Manually running "mdadm -A --scan" brings it up, and "systemctl default" then continues booting. To dig deeper into this issue, I've installed openSUSE 13.2 from scratch on a spare blade, and there md2 is assembled correctly.
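For anyone trying to reproduce this, a minimal sketch of how the "missing from /proc/mdstat" check could be scripted. The array names md0 and md1 and the sample text are assumptions for illustration; only md2 is named in this report.

```python
import re


def active_md_arrays(mdstat_text):
    """Return the set of md device names listed in /proc/mdstat-style text."""
    # Array lines look like: "md1 : active raid1 sdb2[1] sda2[0]"
    return set(re.findall(r"^(md\d+)\s*:", mdstat_text, flags=re.MULTILINE))


def missing_md_arrays(mdstat_text, expected):
    """Return the expected array names that do not appear in the text."""
    return sorted(set(expected) - active_md_arrays(mdstat_text))


# Hypothetical sample, NOT taken from the affected host: two arrays are
# assembled and a third expected one (md2) is absent.
SAMPLE = """\
Personalities : [raid1]
md1 : active raid1 sdb2[1] sda2[0]
      20970432 blocks super 1.0 [2/2] [UU]

md0 : active raid1 sdb1[1] sda1[0]
      4193216 blocks super 1.0 [2/2] [UU]

unused devices: <none>
"""

if __name__ == "__main__":
    print(missing_md_arrays(SAMPLE, ["md0", "md1", "md2"]))
```

On a real system the text would come from reading /proc/mdstat in the emergency shell; the sample string just keeps the sketch self-contained.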
This is the content of /etc/systemd/system/ of the upgraded system:
/etc/systemd/system/dbus-org.opensuse.Network.AUTO4.service
/etc/systemd/system/dbus-org.opensuse.Network.DHCP4.service
/etc/systemd/system/dbus-org.opensuse.Network.DHCP6.service
/etc/systemd/system/dbus-org.opensuse.Network.Nanny.service
/etc/systemd/system/default.target
/etc/systemd/system/default.target.wants/sysstat.service
/etc/systemd/system/default.target.wants/systemd-readahead-collect.service
/etc/systemd/system/default.target.wants/systemd-readahead-replay.service
/etc/systemd/system/getty.target.wants/getty@tty1.service
/etc/systemd/system/multi-user.target.wants/acpid.service
/etc/systemd/system/multi-user.target.wants/apache2.service
/etc/systemd/system/multi-user.target.wants/auditd.service
/etc/systemd/system/multi-user.target.wants/cron.service
/etc/systemd/system/multi-user.target.wants/dsmc.service
/etc/systemd/system/multi-user.target.wants/irqbalance.service
/etc/systemd/system/multi-user.target.wants/mcelog.service
/etc/systemd/system/multi-user.target.wants/ntpd.service
/etc/systemd/system/multi-user.target.wants/postfix.service
/etc/systemd/system/multi-user.target.wants/remote-fs.target
/etc/systemd/system/multi-user.target.wants/smartd.service
/etc/systemd/system/multi-user.target.wants/sshd.service
/etc/systemd/system/multi-user.target.wants/syslog-ng.service
/etc/systemd/system/multi-user.target.wants/wicked.service
/etc/systemd/system/network-online.target.wants/wicked.service
/etc/systemd/system/network.service
/etc/systemd/system/sysinit.target.wants/multipathd.service
/etc/systemd/system/syslog.service
/etc/systemd/system/system-update.target.wants/systemd-readahead-drop.service
/etc/systemd/system/timers.target.wants/logrotate.timer
/etc/systemd/system/wickedd.service.wants/wickedd-auto4.service
/etc/systemd/system/wickedd.service.wants/wickedd-dhcp4.service
/etc/systemd/system/wickedd.service.wants/wickedd-dhcp6.service
/etc/systemd/system/wickedd.service.wants/wickedd-nanny.service
/etc/systemd/system/wicked.service.wants/wickedd.service
I've made the scratch system identical to the problematic system (same installed packages, configuration files in /etc, active systemd services, and so on) and it still assembles md2. I have no further ideas where to look for possible causes. Please let me know what information I should provide to help you find the cause.
--- Comment #1 from Steffen Hau steffen.hau@rz.uni-mannheim.de ---
I hit the submit button a bit too early.
After enabling multipathd.service on the spare blade, it also boots to emergency mode. Disabling multipathd.service and rebooting makes the system boot normally again. The same applies to the upgraded system.
I have no clue why multipathd prevents the system from assembling md2. In emergency mode I checked "multipath -ll" and "dmsetup ls --tree", and neither shows the disk devices used for md2 as being in use.
I'll attach the journalctl -b output for both a failed boot (multipathd enabled) and a successful boot (multipathd disabled). Disabling multipathd is not an option, as the system is equipped with an FC LUN from our SAN.
--- Comment #2 from Steffen Hau steffen.hau@rz.uni-mannheim.de ---
Created attachment 626158 --> http://bugzilla.opensuse.org/attachment.cgi?id=626158&action=edit
Output of journactl -b
Output of journalctl -b for both a failed and a successful boot on the upgraded and the spare host
Steffen Hau steffen.hau@rz.uni-mannheim.de changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #626158|Output of journactl -b      |Output of journalctl -b
        description|                            |
Steffen Hau steffen.hau@rz.uni-mannheim.de changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P5 - None                   |P2 - High
          Component|Kernel                      |Basesystem
           Assignee|kernel-maintainers@forge.pr |bnc-team-screening@forge.pr
                   |ovo.novell.com              |ovo.novell.com
                 OS|SUSE Other                  |openSUSE 13.2
Bernhard Wiedemann bwiedemann@suse.com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |crrodriguez@opensuse.org,
                   |                            |rmilasan@suse.com
           Assignee|bnc-team-screening@forge.pr |hare@suse.com
                   |ovo.novell.com              |
--- Comment #3 from Bernhard Wiedemann bwiedemann@suse.com ---
The failed one has:
kernel: device-mapper: multipath: version 1.7.0 loaded
kernel: device-mapper: multipath service-time: version 0.2.0 loaded
kernel: device-mapper: table: 253:0: multipath: error getting device
kernel: device-mapper: ioctl: error adding target to table
kernel: device-mapper: table: 253:0: multipath: error getting device
kernel: device-mapper: ioctl: error adding target to table
--- Comment #4 from Steffen Hau steffen.hau@rz.uni-mannheim.de ---
These messages do not seem to be the cause. When I disable the multipathd systemd unit and start it manually after a reboot, the same messages appear:
Mär 10 17:16:34 testhost kernel: device-mapper: multipath: version 1.7.0 loaded
Mär 10 17:16:34 testhost kernel: device-mapper: multipath service-time: version 0.2.0 loaded
Mär 10 17:16:34 testhost kernel: device-mapper: table: 253:0: multipath: error getting device
Mär 10 17:16:34 testhost kernel: device-mapper: ioctl: error adding target to table
Mär 10 17:16:34 testhost kernel: device-mapper: table: 253:0: multipath: error getting device
Mär 10 17:16:34 testhost kernel: device-mapper: ioctl: error adding target to table
But the devices are available:
testhost:~ # multipath -ll
360050768018085377800000000000084 dm-0 IBM,2145
size=1000G features='1 queue_if_no_path' hwhandler='0' wp=rw
|-+- policy='service-time 0' prio=50 status=active
| |- 1:0:6:0 sdd 8:48 active ready running
| `- 2:0:6:0 sdf 8:80 active ready running
`-+- policy='service-time 0' prio=10 status=enabled
  |- 1:0:1:0 sdc 8:32 active ready running
  `- 2:0:0:0 sde 8:64 active ready running
testhost:~ # dmsetup ls --tree
360050768018085377800000000000084-part1 (253:1)
 └─360050768018085377800000000000084 (253:0)
    ├─ (8:64)
    ├─ (8:32)
    ├─ (8:80)
    └─ (8:48)
I see the device-mapper lines on all systems where multipathd is enabled (including 13.1), so they look harmless to me.
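The check against the "multipath -ll" output above (are the md member disks among the multipath paths?) can be sketched roughly as follows. The regular expression is an assumption about the path-line layout shown in this comment, and sda/sdb are taken to be the parent disks of the md2 members.

```python
import re

# Path lines in the output above look like:
#   "| |- 1:0:6:0 sdd 8:48 active ready running"
# i.e. H:C:T:L, kernel device name, major:minor.
PATH_LINE = re.compile(r"\d+:\d+:\d+:\d+\s+(\S+)\s+(\d+:\d+)\s")


def multipath_path_devices(output):
    """Return the kernel device names used as paths in 'multipath -ll' text."""
    return {match.group(1) for match in PATH_LINE.finditer(output)}


def claimed_by_multipath(output, disks):
    """Return those of the given disks that show up as multipath paths."""
    return sorted(set(disks) & multipath_path_devices(output))


# Output copied from this comment.
SAMPLE = """\
360050768018085377800000000000084 dm-0 IBM,2145
size=1000G features='1 queue_if_no_path' hwhandler='0' wp=rw
|-+- policy='service-time 0' prio=50 status=active
| |- 1:0:6:0 sdd 8:48 active ready running
| `- 2:0:6:0 sdf 8:80 active ready running
`-+- policy='service-time 0' prio=10 status=enabled
  |- 1:0:1:0 sdc 8:32 active ready running
  `- 2:0:0:0 sde 8:64 active ready running
"""

if __name__ == "__main__":
    print(sorted(multipath_path_devices(SAMPLE)))
    print(claimed_by_multipath(SAMPLE, ["sda", "sdb"]))
```

An empty result for sda and sdb matches the observation that multipath does not hold the local disks at the time the output was taken; it says nothing about what happened earlier during boot.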
http://bugzilla.opensuse.org/show_bug.cgi?id=921570#c6
--- Comment #6 from Steffen Hau steffen.hau@rz.uni-mannheim.de ---
I've just installed the dracut update, but that didn't help. md2 is still not assembled when multipathd.service is enabled. I'm now waiting for the multipath-tools update.
http://bugzilla.opensuse.org/show_bug.cgi?id=921570#c7
--- Comment #7 from Steffen Hau steffen.hau@rz.uni-mannheim.de ---
I just wanted to ask when the scheduled multipath-tools update will arrive.
This issue has prevented me from upgrading a lot of bare-metal servers that depend on multipath for over half a year now. The issue should be easy to reproduce: I only had to enable multipathd.service and md2 was missing.
Could you please have another look, or provide the updated multipath-tools package?
Peter B auxsvr@gmail.com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |auxsvr@gmail.com
http://bugzilla.opensuse.org/show_bug.cgi?id=921570#c9
--- Comment #9 from Steffen Hau steffen.hau@rz.uni-mannheim.de ---
Dear Johannes,
I wasn't able to find a package to download via the provided link. "Download package" says "no data".
I've found http://download.opensuse.org/distribution/leap/42.1-Current/repo/oss/suse/x8... but this one requires libdevmapper.so.1.02(DM_1_02_97)(64bit). So I've also fetched http://download.opensuse.org/distribution/leap/42.1-Current/repo/oss/suse/x8.... I'll try this out and report back whether it fixes the issue.
http://bugzilla.opensuse.org/show_bug.cgi?id=921570#c10
Steffen Hau steffen.hau@rz.uni-mannheim.de changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
              Flags|needinfo?(steffen.hau@rz.un |
                   |i-mannheim.de)              |
--- Comment #10 from Steffen Hau steffen.hau@rz.uni-mannheim.de ---
With both updates applied, the system still boots to emergency mode. I'll try switching from multipathd.service to multipathd.socket.
http://bugzilla.opensuse.org/show_bug.cgi?id=921570#c12
--- Comment #12 from Hannes Reinecke hare@suse.com ---
If you were to enable multipath on a non-multipathed root system you need to blacklist the root filesystem.
Also, due to the timing involved multipath might claim the device for MD, so if MD references the devices as raw block devices (i.e. using /dev/sdX) it won't be able to start.
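To check the second point, one could look for ARRAY lines in mdadm.conf that name raw block devices through the devices= keyword. The following is a rough sketch under that assumption; the sample configuration and its UUID are hypothetical, it is not known whether the affected hosts use devices= at all, and abbreviated keywords are not handled.

```python
def arrays_with_raw_devices(mdadm_conf_text):
    """Map each ARRAY that uses devices= to the device paths it lists."""
    result = {}
    for line in mdadm_conf_text.splitlines():
        words = line.split()
        # Only full "ARRAY <name> ..." lines are considered here.
        if len(words) < 2 or words[0].upper() != "ARRAY":
            continue
        for word in words[2:]:
            key, _, value = word.partition("=")
            if key.lower() == "devices":
                result[words[1]] = value.split(",")
    return result


# Hypothetical configuration for illustration only.
SAMPLE = """\
DEVICE containers partitions
ARRAY /dev/md0 UUID=00000000:00000000:00000000:00000001
ARRAY /dev/md2 devices=/dev/sda3,/dev/sdb3
"""

if __name__ == "__main__":
    print(arrays_with_raw_devices(SAMPLE))
```

Arrays identified only by UUID do not appear in the result, since they do not depend on a particular /dev/sdX name.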
http://bugzilla.opensuse.org/show_bug.cgi?id=921570#c13
--- Comment #13 from Steffen Hau steffen.hau@rz.uni-mannheim.de ---
(In reply to Hannes Reinecke from comment #12)
> If you were to enable multipath on a non-multipathed root system you need to blacklist the root filesystem.
I don't see why I should have to do that. openSUSE 13.1 works fine with swap, / and /srv each on RAID 1 md devices plus additional multipathed FC LUNs. openSUSE 13.2 also assembles swap and /, but misses /srv if multipathd.service is enabled.
> Also, due to the timing involved multipath might claim the device for MD, so if MD references the devices as raw block devices (i.e. using /dev/sdX) it won't be able to start.
I've already written that I checked that point. While in emergency mode, dmsetup does not report /dev/sd[a,b]3 as claimed (see comment #4). I can manually assemble the missing array and continue booting.
http://bugzilla.opensuse.org/show_bug.cgi?id=921570#c14
--- Comment #14 from Steffen Hau steffen.hau@rz.uni-mannheim.de ---
The issue does not exist in 42.1. We will skip 13.2. You can close this issue.
Ben K Benjamin.nm@yahoo.com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |Benjamin.nm@yahoo.com