[Bug 851993] New: boot failure after 12.3 -> 13.1 upgrade if /boot is on RAID
https://bugzilla.novell.com/show_bug.cgi?id=851993
https://bugzilla.novell.com/show_bug.cgi?id=851993#c0

           Summary: boot failure after 12.3 -> 13.1 upgrade if /boot is on RAID
    Classification: openSUSE
           Product: openSUSE 13.1
           Version: Final
          Platform: x86-64
        OS/Version: openSUSE 13.1
            Status: NEW
          Severity: Critical
          Priority: P5 - None
         Component: Basesystem
        AssignedTo: bnc-team-screening@forge.provo.novell.com
        ReportedBy: ar16@imapmail.org
         QAContact: qa-bugs@suse.de
          Found By: ---
           Blocker: ---

User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:25.0) Gecko/20100101 Firefox/25.0

Starting with a working 12.3 server with boot-on-RAID & lvm-on-RAID for other partitions, I in-place-upgraded to 13.1. There were no errors on upgrade, and a mkinitrd is successful.

On reboot, it no longer boots.

It appears to hang at an LVM2/Dbus timeout waiting for the /boot device that's on RAID; here's a log snippet:

http://pastebin.com/KcNPriuV

The upgrade from 12.3 -> 13.1 moves LVM2 from sysvinit to systemd init; the unit files are new.

There's at least 1 issue with the LVM2 systemd unit (https://forums.opensuse.org/newthread.php?do=newthread&f=668)

mdadm 3.3 has issues too (http://www.spinics.net/lists/raid/msg44943.html)

There's obviously a dependency problem. Finding/fixing it has been elusive so far. Any suggestions how to specifically identify which dependency is broken?

Fyi, all machines I've upgraded 12.3 -> 13.1 with boot&lvm-on-RAID fail in this manner. All machines I've upgraded 12.3 -> 13.1 with boot&LVM, but *NO* RAID, boot without fail.

On the forums, a comment:
There was a problem with offline upgrade and raid: the DVD used a different device name for the raid than the running system, so that the raid is not found at some point. I don't know if it has been solved
points to a possibly related comment:

----------------------------------------
Date: Thu, 21 Mar 2013 15:43:37 +0100
From: Claudio ML <>
To: opensuse at opensuse.org
Subject: [opensuse] Usual problem with upgrade from DVD to 12.3 with system using mdadm

Hello all,

I have seen that the same problem as in previous versions of OpenSuSE is also in 12.3. The problem is: if you try to upgrade from a DVD an OpenSuSE with software raid (mdadm), the installer doesn't see the previous version of openSuSE on the hard disk. This is caused because the kernel doesn't recognize the md arrays correctly. For example, if I have md0 (/ root file system), md1 (/var), md2 (/srv), the installer sees these as md125, md126 and md127. So, no DVD upgrade is possible for these machines. I have tried to edit the fstab on the machine manually, adapting it to what the installer sees, but after the upgrade the system becomes un-bootable.

Anyone have a good workaround for this problem?

Cordially,
Claudio.
----------------------------------------

this, too, may be of interest:

Bug 849752 - 13.1RC2 Cannot boot from RAID1 volume(s)
https://bugzilla.novell.com/show_bug.cgi?id=849752#c9

Reproducible: Always

Steps to Reproduce:
1.
2.
3.

--
Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
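Editor's sketch, not part of the original report: the quoted mail describes arrays that appear as md125, md126 and md127 instead of md0, md1 and md2. Assuming the exposure comes from fstab entries that use kernel-assigned md names, the following lists such entries. The function name md_named_mounts is hypothetical.

```shell
# List fstab entries whose device field is a kernel-assigned md name
# (for example /dev/md0 or /dev/md125). These are the names the quoted
# mail reports as differing between the installed system and the DVD.
# Usage: md_named_mounts [fstab-path]   (defaults to /etc/fstab)
md_named_mounts() {
    awk '$1 !~ /^#/ && $1 ~ /^\/dev\/md[0-9]+$/ { print $1, $2 }' "${1:-/etc/fstab}"
}
```

Running md_named_mounts with no argument prints one "device mountpoint" pair per affected entry. Whether switching those entries to a stable identifier avoids the failure in this bug is not established by the thread.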
https://bugzilla.novell.com/show_bug.cgi?id=851993
https://bugzilla.novell.com/show_bug.cgi?id=851993#c1
A R
https://bugzilla.novell.com/show_bug.cgi?id=851993
https://bugzilla.novell.com/show_bug.cgi?id=851993#c2
Andrey Borzenkov
Appears to hang at an LVM2/Dbus timeout waiting for the /boot device that's on RAID; here's a log snippet: http://pastebin.com/KcNPriuV
Please, never truncate any logs you provide. In your logs boot.md is missing, and it is not clear whether you omitted it or it really did not start. Please boot with the kernel parameter systemd.log_level=debug and attach the output of "journalctl -b" after entering emergency mode. Also, /proc/mdstat at this point would be interesting.
The upgrade from 12.3 -> 13.1 moves LVM2 from sysvinit to systemd init; the unit files are new.
According to the logs you provided, LVM2 activation worked fine; it is the /boot device that was missing, and it is not on LVM.
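Editor's sketch: before collecting the requested logs it can help to confirm that the debug parameter actually reached the kernel. The helper below checks a command line string for an exact parameter. The name has_param is hypothetical.

```shell
# Return success if a space-separated kernel command line contains the
# exact parameter given as the second argument.
# Usage: has_param "<cmdline>" <param>
has_param() {
    case " $1 " in
        *" $2 "*) return 0 ;;
    esac
    return 1
}

# Example on the running system (prints nothing if the parameter is absent):
#   has_param "$(cat /proc/cmdline)" systemd.log_level=debug && echo "debug logging requested"
```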
https://bugzilla.novell.com/show_bug.cgi?id=851993
https://bugzilla.novell.com/show_bug.cgi?id=851993#c3
--- Comment #3 from A R
https://bugzilla.novell.com/show_bug.cgi?id=851993
https://bugzilla.novell.com/show_bug.cgi?id=851993#c4
--- Comment #4 from Andrey Borzenkov
I can't actually enter emergency mode during a failed boot process; I see the msg advising to enter emergency mode, but can't get to it.
Yes, apparently something is wrong with emergency.service as well. Could you attach /usr/lib/systemd/system/emergency.service? And to be sure that no other copy is shadowing it, run "find /path/to/root -name emergency.service".
I can grab serial console output, if that's sufficient. Is it?
Yes; in this case add "systemd.log_target=console console=xxx" (with appropriate definition for your serial console).
And/or, I can boot, fail, reboot to a rescue disk, chroot to this env, and perhaps get at the system.journal that way.
Yes, as long as you have configured a persistent journal. This was not the default in 12.3 or 13.1. In this case you do not need chroot; just mount the filesystem containing /var/log/journal and use "journalctl -D /path/to/var/log/journal -b".
What's preferred?
Well, the journal may hold some extra output, so it would be better to get it.
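Editor's sketch: the advice above only works if a persistent journal exists on the installed system. With journald's default Storage=auto, that is the case when /var/log/journal exists. The helper below checks for it under a mounted root. The name journal_dir and the mount point /mnt are assumptions.

```shell
# Print the persistent journal directory under a mounted root filesystem,
# or fail if there is none.
# Usage: journal_dir /mnt
journal_dir() {
    dir="${1%/}/var/log/journal"
    if [ -d "$dir" ]; then
        printf '%s\n' "$dir"
    else
        echo "no persistent journal under $1" >&2
        return 1
    fi
}

# From a rescue system, with the installed root mounted on /mnt (assumed):
#   d=$(journal_dir /mnt) && journalctl -D "$d"
```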
https://bugzilla.novell.com/show_bug.cgi?id=851993
https://bugzilla.novell.com/show_bug.cgi?id=851993#c5
--- Comment #5 from A R
Could you attach /usr/lib/systemd/system/emergency.service
cat /usr/lib/systemd/system/emergency.service
# This file is part of systemd.
#
# systemd is free software; you can redistribute it and/or modify it
# under the terms of the GNU Lesser General Public License as published by
# the Free Software Foundation; either version 2.1 of the License, or
# (at your option) any later version.

[Unit]
Description=Emergency Shell
Documentation=man:sulogin(8)
DefaultDependencies=no
Conflicts=shutdown.target
Before=shutdown.target

[Service]
Environment=HOME=/root
WorkingDirectory=/root
ExecStartPre=-/bin/plymouth quit
ExecStartPre=-/bin/echo -e 'Welcome to emergency mode! After logging in, type "journalctl -xb" to view\\nsystem logs, "systemctl reboot" to reboot, "systemctl default" to try again\\nto boot into default mode.'
ExecStart=-/usr/sbin/sulogin
ExecStopPost=/usr/bin/systemctl --fail --no-block default
Type=idle
StandardInput=tty-force
StandardOutput=inherit
StandardError=inherit
KillMode=process
IgnoreSIGPIPE=no
SendSIGHUP=yes

cat /usr/lib/systemd/system/emergency.target
# This file is part of systemd.
#
# systemd is free software; you can redistribute it and/or modify it
# under the terms of the GNU Lesser General Public License as published by
# the Free Software Foundation; either version 2.1 of the License, or
# (at your option) any later version.

[Unit]
Description=Emergency Mode
Documentation=man:systemd.special(7)
Requires=emergency.service
After=emergency.service
AllowIsolate=yes
What's preferred? Well, journal may hold some extra output, so would be better to get it.
Here's the boot config

title openSUSE SERIAL CONSOLE
    root (hd0,0)
    kernel /vmlinuz root=/dev/VG_SVR/LV_SVR_DOM0_ROOT rootfstype=ext4 noresume showopts ide=nodma apm=off edd=off powersaved=off nohz=off highres=off processor.max_cstate=1 x11failsafe vga=0x31a console=tty0 console=ttyS0,57600n8 nomodeset debug ignore_loglevel log_buf_len=10M print_fatal_signals=1 LOGLEVEL=8 earlyprintk=vga,keep sched_debug systemd.log_level=debug systemd.log_target=syslog-or-kmsg
    initrd /initrd
journalctl -D /path/to/var/log/journal -b.
When I exec

journalctl -D /var/log/journal/e3d557b51266cc26571ec214493ee60e -b

I just get,

-- Logs begin at Fri 2013-11-22 12:07:59 PST, end at Sat 2013-11-23 23:27:24 PST. --

Instead, I've captured the output of

journalctl -D /var/log/journal/e3d557b51266cc26571ec214493ee60e/system.journal

it's here, in 2 parts:

part1: http://pastebin.com/m5cCQ32G
part2: http://pastebin.com/F9FEqg65
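Editor's sketch: one likely reason "-b" printed only the header is that, without an argument, it selects the boot of the system running journalctl (here presumably the rescue system), which has no entries in the copied journal. This is an assumption, not something the thread confirms. Separately, journalctl takes a directory with -D and a single journal file with --file. The helper below picks the matching option for a path. The name journal_source_args is hypothetical.

```shell
# Print the journalctl source option that matches the given path:
# -D for a directory, --file for a single journal file.
# Usage: journal_source_args <path>
journal_source_args() {
    if [ -d "$1" ]; then
        printf '%s %s\n' -D "$1"
    elif [ -f "$1" ]; then
        printf '%s %s\n' --file "$1"
    else
        return 1
    fi
}

# Example (path must not contain whitespace for this unquoted use):
#   journalctl $(journal_source_args /mnt/var/log/journal)
```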
https://bugzilla.novell.com/show_bug.cgi?id=851993
https://bugzilla.novell.com/show_bug.cgi?id=851993#c6
--- Comment #6 from Andrey Borzenkov
cat /usr/lib/systemd/system/emergency.service
Looks fine.
part1: http://pastebin.com/m5cCQ32G part2: http://pastebin.com/F9FEqg65
It smells like emergency mode is aborted due to a weird interaction with rsyslog.service. Could you please try to disable it and see if you can enter the emergency shell? This will make debugging much easier.

rm /path/to/etc/systemd/system/syslog.service

and try to reboot. You should be able to enter the emergency shell now.
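Editor's sketch: a more cautious variant of the rm above moves the syslog.service link aside so it can be restored after debugging. It assumes systemd ignores a file whose name ends in ".disabled", since that is not a unit suffix. The name park_syslog_alias is hypothetical.

```shell
# Move the syslog.service alias link aside under a mounted root instead
# of deleting it.
# Usage: park_syslog_alias /path/to/root
park_syslog_alias() {
    link="${1%/}/etc/systemd/system/syslog.service"
    # -L is true for a symlink even when its target does not resolve
    if [ -L "$link" ] || [ -e "$link" ]; then
        mv "$link" "$link.disabled"
    else
        echo "nothing to do: $link not found" >&2
    fi
}
```

To undo, rename syslog.service.disabled back to syslog.service.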
https://bugzilla.novell.com/show_bug.cgi?id=851993
https://bugzilla.novell.com/show_bug.cgi?id=851993#c7
--- Comment #7 from A R
https://bugzilla.novell.com/show_bug.cgi?id=851993
https://bugzilla.novell.com/show_bug.cgi?id=851993#c8
--- Comment #8 from Andrey Borzenkov
I can now enter emergency mode @ prompt.
OK, I opened bnc#852021 for the rsyslog issue. Now it is no longer a systemd problem; the MD array is not assembled during boot. I guess Neil is in a much better position to debug it further :) But you can attach the output of "udevadm info --export-db", /etc/fstab and /proc/mdstat for a start.
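Editor's sketch: the three items requested above can be gathered in one step from the emergency shell. The output directory and the name collect_md_diag are assumptions. Each source is skipped quietly if it is not available.

```shell
# Collect the udev database, /etc/fstab and /proc/mdstat into a directory.
# Usage: collect_md_diag [output-dir]   (defaults to /tmp/md-diag)
collect_md_diag() {
    out="${1:-/tmp/md-diag}"
    mkdir -p "$out" || return 1
    if command -v udevadm >/dev/null 2>&1; then
        udevadm info --export-db > "$out/udev-db.txt" 2>&1
    fi
    [ -r /etc/fstab ] && cp /etc/fstab "$out/fstab"
    [ -r /proc/mdstat ] && cat /proc/mdstat > "$out/mdstat"
    ls "$out"    # show what was collected
}
```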
https://bugzilla.novell.com/show_bug.cgi?id=851993
https://bugzilla.novell.com/show_bug.cgi?id=851993#c9
--- Comment #9 from A R
I opened bnc#852021 for rsyslog issue.
thanks.
... you can attach output of ...
udevadm info --export-db --> http://pastebin.com/ch9zDjJh

cat /etc/fstab
cat /proc/mdstat --> http://pastebin.com/fniwz0K1
https://bugzilla.novell.com/show_bug.cgi?id=851993
https://bugzilla.novell.com/show_bug.cgi?id=851993#c10
--- Comment #10 from Andrey Borzenkov
https://bugzilla.novell.com/show_bug.cgi?id=851993
https://bugzilla.novell.com/show_bug.cgi?id=851993#c11
--- Comment #11 from A R
Could you also attach /etc/mdadm.conf?
cat /etc/mdadm.conf --> http://pastebin.com/FkZqd7wg
https://bugzilla.novell.com/show_bug.cgi?id=851993
https://bugzilla.novell.com/show_bug.cgi?id=851993#c12
--- Comment #12 from Andrey Borzenkov
https://bugzilla.novell.com/show_bug.cgi?id=851993
https://bugzilla.novell.com/show_bug.cgi?id=851993#c13
--- Comment #13 from A R
https://bugzilla.novell.com/show_bug.cgi?id=851993
https://bugzilla.novell.com/show_bug.cgi?id=851993#c14
--- Comment #14 from A R
In 13.1 MD is incrementally assembled when each device appears. mdadm --incremental is run from udev rule when these links are not yet created
Are you referring to assembly in:

cat /usr/lib/udev/rules.d/64-md-raid-assembly.rules
...
# remember you can limit what gets auto/incrementally assembled by
# mdadm.conf(5)'s 'AUTO' and selectively whitelist using 'ARRAY'
ACTION=="add", RUN+="/sbin/mdadm --incremental $devnode --offroot"
...

?

"DEVICE ..." assembly specification is needed. To get back "DEVICE ..." function, does this udev rule need to simply move later in exec, AFTER the link creation? Or is there a new/different udev rule required?
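Editor's sketch: the quoted explanation suggests that DEVICE lines which rely on udev-created links may not match at the time "mdadm --incremental" runs. As a first check, the helper below lists the DEVICE lines in an mdadm.conf so they can be reviewed. mdadm.conf keywords are case-insensitive and may be abbreviated to three characters, which the pattern allows for. The name device_lines is hypothetical, and the reporter's actual file is only available via the pastebin link.

```shell
# Print the DEVICE lines of an mdadm.conf. Exits non-zero when the file
# has none.
# Usage: device_lines [mdadm.conf-path]   (defaults to /etc/mdadm.conf)
device_lines() {
    grep -iE '^DEV[A-Z]*[[:space:]]' "${1:-/etc/mdadm.conf}"
}
```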
https://bugzilla.novell.com/show_bug.cgi?id=851993
https://bugzilla.novell.com/show_bug.cgi?id=851993#c15
Neil Brown
https://bugzilla.novell.com/show_bug.cgi?id=851993
https://bugzilla.novell.com/show_bug.cgi?id=851993#c16
--- Comment #16 from Neil Brown
https://bugzilla.novell.com/show_bug.cgi?id=851993
https://bugzilla.novell.com/show_bug.cgi?id=851993#c17
Neil Brown
https://bugzilla.novell.com/show_bug.cgi?id=851993
https://bugzilla.novell.com/show_bug.cgi?id=851993#c18
A R
https://bugzilla.novell.com/show_bug.cgi?id=851993
https://bugzilla.novell.com/show_bug.cgi?id=851993#c19
Neil Brown
https://bugzilla.novell.com/show_bug.cgi?id=851993
https://bugzilla.novell.com/show_bug.cgi?id=851993#c20
A R
It looks like md1 holds your root filesystem - correct?
yes md0 <- "/boot" md1 <- "/" md2 <- additional storage
What does
zcat /boot/initrd | cpio -i --to-stdout etc/mdadm.conf
report? Please double check it gives the same uuid as mdadm -Db /dev/md125
Same UUID returned for both

zcat /boot/initrd | cpio -i --to-stdout etc/mdadm.conf
AUTO -all
ARRAY /dev/md125 metadata=1.0 name=<none>:server_root UUID=a1b...c3d
61591 blocks

and

mdadm -Db /dev/md125
ARRAY /dev/md125 metadata=1.0 name=<none>:server_root UUID=a1b...c3d
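Editor's sketch: the comparison above can be done mechanically by extracting the UUID field from the ARRAY lines of both outputs. The filter name array_uuids is hypothetical. The two commands in the usage comment are the ones quoted in the thread.

```shell
# Read mdadm.conf-style text on stdin and print the UUID of every ARRAY line.
array_uuids() {
    sed -n 's/^ARRAY[[:space:]].*UUID=\([^[:space:]]*\).*/\1/p'
}

# Usage, comparing the initrd copy with the running array:
#   zcat /boot/initrd | cpio -i --to-stdout etc/mdadm.conf 2>/dev/null | array_uuids
#   mdadm -Db /dev/md125 | array_uuids
```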
https://bugzilla.novell.com/show_bug.cgi?id=851993
https://bugzilla.novell.com/show_bug.cgi?id=851993#c21
--- Comment #21 from Neil Brown
https://bugzilla.novell.com/show_bug.cgi?id=851993
https://bugzilla.novell.com/show_bug.cgi?id=851993#c22
--- Comment #22 from A R
So the array is getting called md125 because that is what the mdadm.conf on the initrd says, and the initrd says that because that is what it is currently called.
Catch22, then. But, apparently, not a surprise.
I guess it would be nice to copy the name out of /etc/mdadm.conf onto the initrd. Though given that the name is "server_root", you really want /dev/md_server_root, but that is a little way off working perfectly yet.
Ideally, 'it' does what you 'tell it' to do -- in /etc/mdadm.conf. It's not a big deal, but does cause some head-scratching ...
I don't think I'm comfortable getting the mkinitrd to automatically change the name of the root md device as it is too error prone.
Perhaps, semi-auto, then? I.e., a cmd-line flag/option to mdadm that writes the /dev/<name> as spec'd in /etc/mdadm.conf to the mkinitrd?
You could do it manually by: ...
Thanks, noted.
6/ edit the command line to change "root=/dev/md125" to "root=/dev/md0"
just fyi, my root fs is on LVM. grub contains

title Xen
    root (hd0,0)
    kernel /xen.gz ...
    module /vmlinuz-xen root=/dev/VG_SVR/LV_SVR_DOM0_ROOT ...
    module /initrd-xen

i.e., NOT "root=/dev/md125". Not sure if LVM mount 'vs' MD assembly is of any issue.
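Editor's sketch: when root= names a logical volume rather than an md device, the md array sits one layer below it. The helper pulls the root= value out of a boot line so it can be compared with the LVM layout. The name root_param is hypothetical.

```shell
# Print the value of the root= parameter from a kernel or module line.
# Runs in a subshell so that "set -f" does not leak to the caller.
# Usage: root_param "<boot line>"
root_param() (
    set -f    # no glob expansion while splitting the line into words
    for word in $1; do
        case "$word" in
            root=*) printf '%s\n' "${word#root=}"; exit 0 ;;
        esac
    done
    exit 1
)

# On the installed system, "pvs -o pv_name,vg_name" then shows which
# physical volume (for example an md device) backs that volume group.
```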
https://bugzilla.novell.com/show_bug.cgi?id=851993
https://bugzilla.novell.com/show_bug.cgi?id=851993#c23
Neil Brown
https://bugzilla.novell.com/show_bug.cgi?id=851993
https://bugzilla.novell.com/show_bug.cgi?id=851993#c24
--- Comment #24 from Bernhard Wiedemann
https://bugzilla.novell.com/show_bug.cgi?id=851993
https://bugzilla.novell.com/show_bug.cgi?id=851993#c25
--- Comment #25 from Swamp Workflow Management
https://bugzilla.novell.com/show_bug.cgi?id=851993
https://bugzilla.novell.com/show_bug.cgi?id=851993#c26
Matjaz Godec
https://bugzilla.novell.com/show_bug.cgi?id=851993
https://bugzilla.novell.com/show_bug.cgi?id=851993#c27
Stephan Brunsch
https://bugzilla.novell.com/show_bug.cgi?id=851993
https://bugzilla.novell.com/show_bug.cgi?id=851993#c28
--- Comment #28 from Neil Brown
https://bugzilla.novell.com/show_bug.cgi?id=851993
https://bugzilla.novell.com/show_bug.cgi?id=851993#c29
--- Comment #29 from Neil Brown
https://bugzilla.novell.com/show_bug.cgi?id=851993
https://bugzilla.novell.com/show_bug.cgi?id=851993#c30
--- Comment #30 from Stephan Brunsch
https://bugzilla.novell.com/show_bug.cgi?id=851993
https://bugzilla.novell.com/show_bug.cgi?id=851993#c31
--- Comment #31 from Bernhard Wiedemann
https://bugzilla.novell.com/show_bug.cgi?id=851993
https://bugzilla.novell.com/show_bug.cgi?id=851993#c33
--- Comment #33 from Bernhard Wiedemann
https://bugzilla.novell.com/show_bug.cgi?id=851993
https://bugzilla.novell.com/show_bug.cgi?id=851993#c35
Neil Brown
I created a mdadm.conf in the initrd but it did not help
chrooting in md did not help. raid is found but cannot be mounted => missing personality ...
Adding "-f md" as an option to mkinitrd should ensure that the modules required for md are on the initrd. This will avoid the "missing personality" problem.

The mdadm.conf on the initrd needs to identify the array that root is on: md1 in your case. So something like:

ARRAY /dev/md1 UUID=........

should be sufficient (with the correct uuid in there of course).

Does that work?
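Editor's sketch: two small helpers for the steps above. array_line formats the ARRAY entry from a device name and UUID, and md_modules filters an initrd file listing for md personality modules so the effect of "-f md" can be checked. Both names are hypothetical, and the module name pattern is an assumption about how the kernel modules are named.

```shell
# Format an mdadm.conf ARRAY entry.
# Usage: array_line /dev/md1 <uuid>
array_line() {
    printf 'ARRAY %s UUID=%s\n' "$1" "$2"
}

# Read a file listing on stdin and print entries that look like md modules.
md_modules() {
    grep -E '/(md-mod|raid[0-9]+|linear|multipath)\.ko'
}

# Usage, listing the initrd contents without extracting them:
#   zcat /boot/initrd | cpio -it 2>/dev/null | md_modules
```

The UUID to pass to array_line is the one reported by "mdadm -Db" for the root array.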
https://bugzilla.novell.com/show_bug.cgi?id=851993
https://bugzilla.novell.com/show_bug.cgi?id=851993#c36
Stephan Brunsch
https://bugzilla.novell.com/show_bug.cgi?id=851993
https://bugzilla.novell.com/show_bug.cgi?id=851993#c37
Neil Brown
https://bugzilla.novell.com/show_bug.cgi?id=851993
https://bugzilla.novell.com/show_bug.cgi?id=851993#c38
Neil Brown
https://bugzilla.novell.com/show_bug.cgi?id=851993
https://bugzilla.novell.com/show_bug.cgi?id=851993#c39
Neil Brown