[Bug 921570] New: after dist upgrade system boots to emergency mode
http://bugzilla.opensuse.org/show_bug.cgi?id=921570

Bug ID: 921570
Summary: after dist upgrade system boots to emergency mode
Classification: openSUSE
Product: openSUSE Distribution
Version: 13.2
Hardware: x86-64
OS: SUSE Other
Status: NEW
Severity: Major
Priority: P5 - None
Component: Kernel
Assignee: kernel-maintainers@forge.provo.novell.com
Reporter: steffen.hau@rz.uni-mannheim.de
QA Contact: qa-bugs@suse.de
Found By: ---
Blocker: ---

I've started upgrading systems from openSUSE 13.1 to 13.2. Virtual machines without special setups such as mdadm RAID devices or multipathed FC LUNs upgraded fine, but systems (IBM HS22 blades) with mdadm RAID devices boot to emergency mode.

The systems have three RAID 1 devices (swap, / and /srv or /home). Swap and / are assembled correctly, but the third md device is missing and does not appear in /proc/mdstat either. Manually running "mdadm -A --scan" brings it up, and "systemctl default" then continues booting.

To dig deeper into this issue, I've installed openSUSE 13.2 from scratch on a spare blade, and there md2 is assembled correctly.

This is the content of /etc/systemd/system/ on the upgraded system:

/etc/systemd/system/dbus-org.opensuse.Network.AUTO4.service
/etc/systemd/system/dbus-org.opensuse.Network.DHCP4.service
/etc/systemd/system/dbus-org.opensuse.Network.DHCP6.service
/etc/systemd/system/dbus-org.opensuse.Network.Nanny.service
/etc/systemd/system/default.target
/etc/systemd/system/default.target.wants/sysstat.service
/etc/systemd/system/default.target.wants/systemd-readahead-collect.service
/etc/systemd/system/default.target.wants/systemd-readahead-replay.service
/etc/systemd/system/getty.target.wants/getty@tty1.service
/etc/systemd/system/multi-user.target.wants/acpid.service
/etc/systemd/system/multi-user.target.wants/apache2.service
/etc/systemd/system/multi-user.target.wants/auditd.service
/etc/systemd/system/multi-user.target.wants/cron.service
/etc/systemd/system/multi-user.target.wants/dsmc.service
/etc/systemd/system/multi-user.target.wants/irqbalance.service
/etc/systemd/system/multi-user.target.wants/mcelog.service
/etc/systemd/system/multi-user.target.wants/ntpd.service
/etc/systemd/system/multi-user.target.wants/postfix.service
/etc/systemd/system/multi-user.target.wants/remote-fs.target
/etc/systemd/system/multi-user.target.wants/smartd.service
/etc/systemd/system/multi-user.target.wants/sshd.service
/etc/systemd/system/multi-user.target.wants/syslog-ng.service
/etc/systemd/system/multi-user.target.wants/wicked.service
/etc/systemd/system/network-online.target.wants/wicked.service
/etc/systemd/system/network.service
/etc/systemd/system/sysinit.target.wants/multipathd.service
/etc/systemd/system/syslog.service
/etc/systemd/system/system-update.target.wants/systemd-readahead-drop.service
/etc/systemd/system/timers.target.wants/logrotate.timer
/etc/systemd/system/wickedd.service.wants/wickedd-auto4.service
/etc/systemd/system/wickedd.service.wants/wickedd-dhcp4.service
/etc/systemd/system/wickedd.service.wants/wickedd-dhcp6.service
/etc/systemd/system/wickedd.service.wants/wickedd-nanny.service
/etc/systemd/system/wicked.service.wants/wickedd.service

I've made the scratch system identical to the problematic system (identical installed packages, configuration files in /etc, active systemd services, and so on), and it still assembles md2. I have no more ideas where to search for possible causes. Please let me know what information I should provide to help you find the cause.

--
You are receiving this mail because:
You are on the CC list for the bug.
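The manual check described in the report (is the third array present in /proc/mdstat?) could be scripted roughly as follows. This is only a minimal sketch and not part of the bug report: the helper name list_missing_md is hypothetical, and the array names md0 and md1 are assumptions, since the report names only md2.

```shell
#!/bin/sh
# Sketch: report which expected md arrays are absent from an mdstat listing.
# list_missing_md is a hypothetical helper; md0/md1 in the usage note are assumed names.

# list_missing_md MDSTAT_FILE ARRAY...
# Prints each ARRAY that has no "<name> :" line in MDSTAT_FILE.
list_missing_md() {
    mdstat_file=$1
    shift
    for array in "$@"; do
        # Assembled arrays appear as lines like "md2 : active raid1 sdb3[1] sda3[0]".
        if ! grep -q "^${array} :" "$mdstat_file"; then
            printf '%s\n' "$array"
        fi
    done
}
```

In emergency mode one might run `list_missing_md /proc/mdstat md0 md1 md2`. If it prints md2, the workaround from the report applies: `mdadm -A --scan` followed by `systemctl default`.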
http://bugzilla.opensuse.org/show_bug.cgi?id=921570

--- Comment #1 from Steffen Hau <steffen.hau@rz.uni-mannheim.de> ---
I hit the submit button a bit too early. After enabling multipathd.service on the spare blade, it also boots to emergency mode. Disabling multipathd.service and rebooting makes the system boot normally again. The same applies to the upgraded system.

I have no clue why multipathd prevents the system from assembling md2. In emergency mode I checked "multipath -ll" and "dmsetup ls --tree", and neither shows the disk devices used for md2 as in use. I'll attach the journalctl -b output for both a failed boot (multipathd enabled) and a successful boot (multipathd disabled).

Disabling multipathd is not an option, as the system is equipped with an FC LUN from our SAN.
http://bugzilla.opensuse.org/show_bug.cgi?id=921570

--- Comment #2 from Steffen Hau <steffen.hau@rz.uni-mannheim.de> ---
Created attachment 626158
  --> http://bugzilla.opensuse.org/attachment.cgi?id=626158&action=edit
Output of journactl -b

Output of journalctl -b for both a failed and a successful boot on the upgraded and the spare host
http://bugzilla.opensuse.org/show_bug.cgi?id=921570

Steffen Hau <steffen.hau@rz.uni-mannheim.de> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #626158|Output of journactl -b      |Output of journalctl -b
        description|                            |
http://bugzilla.opensuse.org/show_bug.cgi?id=921570

Steffen Hau <steffen.hau@rz.uni-mannheim.de> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P5 - None                   |P2 - High
          Component|Kernel                      |Basesystem
           Assignee|kernel-maintainers@forge.pr |bnc-team-screening@forge.pr
                   |ovo.novell.com              |ovo.novell.com
                 OS|SUSE Other                  |openSUSE 13.2
http://bugzilla.opensuse.org/show_bug.cgi?id=921570

Bernhard Wiedemann <bwiedemann@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |crrodriguez@opensuse.org,
                   |                            |rmilasan@suse.com
           Assignee|bnc-team-screening@forge.pr |hare@suse.com
                   |ovo.novell.com              |

--- Comment #3 from Bernhard Wiedemann <bwiedemann@suse.com> ---
The failed one has:

kernel: device-mapper: multipath: version 1.7.0 loaded
kernel: device-mapper: multipath service-time: version 0.2.0 loaded
kernel: device-mapper: table: 253:0: multipath: error getting device
kernel: device-mapper: ioctl: error adding target to table
kernel: device-mapper: table: 253:0: multipath: error getting device
kernel: device-mapper: ioctl: error adding target to table
http://bugzilla.opensuse.org/show_bug.cgi?id=921570

--- Comment #4 from Steffen Hau <steffen.hau@rz.uni-mannheim.de> ---
These messages do not seem to be the cause. When I disable the multipathd systemd unit and start it manually after a reboot, the same messages are shown:

Mär 10 17:16:34 testhost kernel: device-mapper: multipath: version 1.7.0 loaded
Mär 10 17:16:34 testhost kernel: device-mapper: multipath service-time: version 0.2.0 loaded
Mär 10 17:16:34 testhost kernel: device-mapper: table: 253:0: multipath: error getting device
Mär 10 17:16:34 testhost kernel: device-mapper: ioctl: error adding target to table
Mär 10 17:16:34 testhost kernel: device-mapper: table: 253:0: multipath: error getting device
Mär 10 17:16:34 testhost kernel: device-mapper: ioctl: error adding target to table

But the devices are available:

testhost:~ # multipath -ll
360050768018085377800000000000084 dm-0 IBM,2145
size=1000G features='1 queue_if_no_path' hwhandler='0' wp=rw
|-+- policy='service-time 0' prio=50 status=active
| |- 1:0:6:0 sdd 8:48 active ready running
| `- 2:0:6:0 sdf 8:80 active ready running
`-+- policy='service-time 0' prio=10 status=enabled
  |- 1:0:1:0 sdc 8:32 active ready running
  `- 2:0:0:0 sde 8:64 active ready running
testhost:~ # dmsetup ls --tree
360050768018085377800000000000084-part1 (253:1)
 └─360050768018085377800000000000084 (253:0)
    ├─ (8:64)
    ├─ (8:32)
    ├─ (8:80)
    └─ (8:48)

I can see the device-mapper lines on all systems (including 13.1) where multipathd is enabled, so they look harmless to me.
http://bugzilla.opensuse.org/show_bug.cgi?id=921570
http://bugzilla.opensuse.org/show_bug.cgi?id=921570#c6

--- Comment #6 from Steffen Hau <steffen.hau@rz.uni-mannheim.de> ---
I've just installed the dracut update, but that didn't help. md2 is still not assembled when multipathd.service is enabled. I'm now waiting for the multipath-tools update.
http://bugzilla.opensuse.org/show_bug.cgi?id=921570
http://bugzilla.opensuse.org/show_bug.cgi?id=921570#c7

--- Comment #7 from Steffen Hau <steffen.hau@rz.uni-mannheim.de> ---
I just wanted to ask when the scheduled multipath-tools update will arrive. For over half a year now, this issue has prevented me from updating many bare-metal servers that depend on multipath.

The issue should be easy to reproduce: I only had to enable multipathd.service, and md2 was missing. Could you please have another look or provide the updated multipath-tools package?
http://bugzilla.opensuse.org/show_bug.cgi?id=921570

Peter B <auxsvr@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |auxsvr@gmail.com
http://bugzilla.opensuse.org/show_bug.cgi?id=921570
http://bugzilla.opensuse.org/show_bug.cgi?id=921570#c9

--- Comment #9 from Steffen Hau <steffen.hau@rz.uni-mannheim.de> ---
Dear Johannes,

I wasn't able to find a package to download via the provided link. "Download package" says "no data". I've found http://download.opensuse.org/distribution/leap/42.1-Current/repo/oss/suse/x8... but this one requires libdevmapper.so.1.02(DM_1_02_97)(64bit). So I've also fetched http://download.opensuse.org/distribution/leap/42.1-Current/repo/oss/suse/x8....

I'll try this out and report back whether it fixes the issue.
http://bugzilla.opensuse.org/show_bug.cgi?id=921570
http://bugzilla.opensuse.org/show_bug.cgi?id=921570#c10

Steffen Hau <steffen.hau@rz.uni-mannheim.de> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
              Flags|needinfo?(steffen.hau@rz.un |
                   |i-mannheim.de)              |

--- Comment #10 from Steffen Hau <steffen.hau@rz.uni-mannheim.de> ---
With both updates applied, the system still boots to emergency mode. I'll try changing from multipathd.service to multipathd.socket.
http://bugzilla.opensuse.org/show_bug.cgi?id=921570
http://bugzilla.opensuse.org/show_bug.cgi?id=921570#c12

--- Comment #12 from Hannes Reinecke <hare@suse.com> ---
If you were to enable multipath on a non-multipathed root system you need to blacklist the root filesystem.

Also, due to the timing involved multipath might claim the device for MD, so if MD references the devices as raw block devices (ie using /dev/sdX) it won't be able to start.
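A blacklist of the kind suggested in comment #12 is normally written as a blacklist section in /etc/multipath.conf. The sketch below only prints such a section for review and does not modify any file. The helper name print_blacklist and the WWID in the usage note are hypothetical placeholders, and nothing in this thread establishes that blacklisting would resolve the bug.

```shell
#!/bin/sh
# Sketch: print a multipath.conf blacklist section for the given WWIDs.
# print_blacklist is a hypothetical helper; the WWIDs of the local disks
# must be looked up on the affected system itself.

# print_blacklist WWID...
print_blacklist() {
    printf 'blacklist {\n'
    for wwid in "$@"; do
        # One wwid entry per local (non-multipathed) disk.
        printf '    wwid "%s"\n' "$wwid"
    done
    printf '}\n'
}
```

For example, `print_blacklist EXAMPLE-LOCAL-DISK-WWID` prints a section that could be reviewed and then merged into /etc/multipath.conf by hand.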
http://bugzilla.opensuse.org/show_bug.cgi?id=921570
http://bugzilla.opensuse.org/show_bug.cgi?id=921570#c13

--- Comment #13 from Steffen Hau <steffen.hau@rz.uni-mannheim.de> ---
(In reply to Hannes Reinecke from comment #12)
> If you were to enable multipath on a non-multipathed root system you need to
> blacklist the root filesystem.

I don't see why I should have to do that. openSUSE 13.1 works fine with swap, / and /srv each on RAID 1 md devices plus additional multipathed FC LUNs. openSUSE 13.2 also assembles swap and /, but misses /srv if multipathd.service is enabled.

> Also, due to the timing involved multipath might claim the device for MD, so
> if MD references the devices as raw block devices (ie using /dev/sdX) it won't
> be able to start.

I've already written that I have checked that point. While in emergency mode, dmsetup does not report /dev/sd[a,b]3 as claimed (see #c4). I can manually assemble the missing array and continue booting.
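The check described in comment #13 (are the md member partitions held by device-mapper?) can be made mechanical by searching saved `dmsetup ls --tree` output for a device number. This is a rough sketch under assumptions: the helper name dm_claims is hypothetical, and the numbers 8:3 and 8:19 in the usage note assume the members really are /dev/sda3 and /dev/sdb3 under the standard sd numbering (sda is 8:0, sdb is 8:16).

```shell
#!/bin/sh
# Sketch: test whether a major:minor device number shows up in saved
# `dmsetup ls --tree` output. dm_claims is a hypothetical helper.

# dm_claims TREE_FILE MAJOR:MINOR
# Exit status 0 if "(MAJOR:MINOR)" occurs anywhere in TREE_FILE, 1 otherwise.
dm_claims() {
    # -F matches the string literally, so "(8:3)" does not match "(8:32)".
    grep -Fq "($2)" "$1"
}
```

A possible use in emergency mode: save the tree with `dmsetup ls --tree > /tmp/dm-tree.txt`, then run `dm_claims /tmp/dm-tree.txt 8:3 && echo "sda3 is held by device-mapper"` and the same for 8:19.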
http://bugzilla.opensuse.org/show_bug.cgi?id=921570
http://bugzilla.opensuse.org/show_bug.cgi?id=921570#c14

--- Comment #14 from Steffen Hau <steffen.hau@rz.uni-mannheim.de> ---
The issue does not exist in 42.1. We will skip 13.2. You can close this issue.
http://bugzilla.opensuse.org/show_bug.cgi?id=921570

Ben K <Benjamin.nm@yahoo.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |Benjamin.nm@yahoo.com