[Bug 1202589] New: during reboot/shutdown when stopping MD devices: infinite loop due to busy MD device
http://bugzilla.opensuse.org/show_bug.cgi?id=1202589

Bug ID: 1202589
Summary: during reboot/shutdown when stopping MD devices: infinite loop due to busy MD device
Classification: openSUSE
Product: openSUSE Tumbleweed
Version: Current
Hardware: Other
OS: Other
Status: NEW
Severity: Normal
Priority: P5 - None
Component: Basesystem
Assignee: screening-team-bugs@suse.de
Reporter: walter.haidinger@gmx.at
QA Contact: qa-bugs@suse.de
Found By: ---
Blocker: ---

Preliminary note: I'm still trying to figure out which package or change is causing this and when exactly it started. I'm updating on a daily basis. Unfortunately, so far I can only describe the symptom.

Problem: Shutdown scripts run in an infinite loop (for at least ~15 minutes) upon reboot or shutdown. The console shows:

    Stopping MD /dev/md1 (9:1).
    Could not stop MD /dev/md1: Device or resource busy
    Not all MD devices stopped, 1 left.
    Detaching DM devices.
    Not all DM devices detached, 1 left.
    Stopping MD devices.
    Stopping MD /dev/md2 (9:2).
    md: md2 stopped.
    Stopping MD /dev/md1 (9:1).
    Could not stop MD /dev/md1: Device or resource busy
    Not all MD devices stopped, 1 left.
    Detaching DM devices.
    ...

The cycle repeats.

Now, /dev/md1 holds /dev/ssd/root and thus it might be busy because of some other dependency (e.g. a fs which couldn't be unmounted and was remounted ro, or something else).

The point is that reboot/shutdown worked until about a week ago: busy MD devices were logged, but the sequence then continued. This is what it looked like:

    Stopping MD /dev/md1 (9:1).
    Could not stop MD /dev/md1: Device or resource busy
    Not all MD devices stopped, 1 left.
    Detaching DM deSuccessfully changed into root pivot.
    Returning to initrd...
    dracut: Taking over mdmon processes.
    dracut Warning: Killing all remaining processes
    dracut Warning: Killing all remaining processes
    dracut Warning: Unmounted /oldroot.
    dracut Warning: Unmounted /oldroot.
    dracut: Disassembling device-mapper devices
    dracut: Waiting for mdraid devices to be clean.
    dracut: Disassembling mdraid devices.
    md1: detected capacity change from 440139776 to 0
    md: md1 stopped.
    mdadm: stopped /dev/md1
    Rebooting.

Any hints on how to debug this further are appreciated.

--
You are receiving this mail because:
You are on the CC list for the bug.
http://bugzilla.opensuse.org/show_bug.cgi?id=1202589#c1

--- Comment #1 from Walter Haidinger <walter.haidinger@gmx.at> ---
Just to clarify, because the bug was assigned to Neil: I doubt it's an md-related issue. The md device is rightfully busy and cannot be stopped. Rather, the shutdown/reboot procedure should eventually give up and continue (as it did previously).

Btw, I just rebooted and the problem is still reproducible.
http://bugzilla.opensuse.org/show_bug.cgi?id=1202589#c5

--- Comment #5 from Walter Haidinger <walter.haidinger@gmx.at> ---
md config is as follows:

    # linear array of two older disks used as emergency spare for md4
    md2 : active linear sdi2[0] sdh[1]
          15626694215 blocks super 1.2 0k rounding

    # data VG: main data array (3,5" disks)
    md4 : active raid6 md2[4](S) sdg1[6] sde1[7] sdf1[8] sdd1[5]
          31251688448 blocks super 1.2 level 6, 512k chunk, algorithm 2 [4/4] [UUUU]
          bitmap: 2/8 pages [8KB], 1048576KB chunk

    # /boot
    md0 : active raid1 sdb1[3](W) sdc1[2] sda1[4]
          1959884 blocks super 1.0 [3/3] [UUU]

    # ssd VG with /dev/ssd/root mounted (sda and sdc are ssd, sdb is a 2,5" disk)
    md1 : active raid1 sdb2[2](W) sda2[4] sdc2[3]
          220069888 blocks super 1.2 [3/3] [UUU]

And yes, stopping md2 is present in the cycle. But md2 isn't part of DM.

However, I noticed something else. Occasionally there's another log message:
"block device autoloading is deprecated and will be removed."

A recent cycle:

    ...
    md: md2 stopped.
    Stopping MD /dev/md1 (9:1).
    Could not stop MD /dev/md1: Device or resource busy
    Not all MD devices stopped, 1 left.
    Detaching DM devices.
    Not all DM devices detached, 1 left.
    Stopping MD devices.
    Stopping MD /dev/md2 (9:2).
    block device autoloading is deprecated and will be removed.
    md: md2 stopped.
    Stopping MD /dev/md1 (9:1).
    Could not stop MD /dev/md1: Device or resource busy
    Not all MD devices stopped, 1 left.
    Detaching DM devices.
    Not all DM devices detached, 1 left.
    Stopping MD devices.
    Stopping MD /dev/md2 (9:2).
    md: md2 stopped.
    Stopping MD /dev/md1 (9:1).
    Could not stop MD /dev/md1: Device or resource busy
    Not all MD devices stopped, 1 left.
    Detaching DM devices.
    Not all DM devices detached, 1 left.
    Stopping MD devices.
    Stopping MD /dev/md2 (9:2).
    md: md2 stopped.
    Stopping MD /dev/md1 (9:1).
    ...
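For readers following the layout in comment #5, here is a minimal sketch that picks out md arrays used as members of other md arrays. It assumes the one-line-per-array form shown in /proc/mdstat; the sample text is copied from the comment, and the function names `parse_members` and `nested_arrays` are hypothetical helpers, not part of mdadm or any other tool.

```python
import re

# Array lines copied from the mdstat output quoted in comment #5
# (block counts and comments left out for brevity).
MDSTAT = """\
md2 : active linear sdi2[0] sdh[1]
md4 : active raid6 md2[4](S) sdg1[6] sde1[7] sdf1[8] sdd1[5]
md0 : active raid1 sdb1[3](W) sdc1[2] sda1[4]
md1 : active raid1 sdb2[2](W) sda2[4] sdc2[3]
"""

# "mdN : active <level> <members...>"
ARRAY_LINE = re.compile(r"^(md\d+) : active \S+ (.+)$")
# "name[slot]" with an optional one-letter flag such as (S) or (W)
MEMBER = re.compile(r"^(\w+)\[\d+\](?:\((\w)\))?$")


def parse_members(text):
    """Map each array name to a list of (member, flag) tuples."""
    arrays = {}
    for line in text.splitlines():
        match = ARRAY_LINE.match(line.strip())
        if not match:
            continue  # skip anything that is not an array line
        members = []
        for token in match.group(2).split():
            member = MEMBER.match(token)
            if member:
                members.append((member.group(1), member.group(2)))
        arrays[match.group(1)] = members
    return arrays


def nested_arrays(arrays):
    """List (parent, child, flag) for md devices that are members of another array."""
    return [
        (parent, name, flag)
        for parent, members in arrays.items()
        for name, flag in members
        if name in arrays
    ]


if __name__ == "__main__":
    print(nested_arrays(parse_members(MDSTAT)))
```

On the sample above, the only nested relationship is md2 acting as a spare (S) of md4, which is the pairing questioned later in the thread.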
http://bugzilla.opensuse.org/show_bug.cgi?id=1202589#c8

--- Comment #8 from Walter Haidinger <walter.haidinger@gmx.at> ---
Well, I downgraded to dracut 303 and rebuilt the initrd, but the issue persists. There is also no /run/initramfs/shutdown...
http://bugzilla.opensuse.org/show_bug.cgi?id=1202589#c9

--- Comment #9 from Walter Haidinger <walter.haidinger@gmx.at> ---
Ok, I see. /run/initramfs/shutdown gets created upon shutdown by /usr/lib/dracut/dracut-initramfs-restore.

Reverted back to latest dracut 309 and manually running /usr/lib/dracut/dracut-initramfs-restore nicely populates /run/initramfs including creating run/initramfs/shutdown.
http://bugzilla.opensuse.org/show_bug.cgi?id=1202589#c10

--- Comment #10 from Franck Bui <fbui@suse.com> ---
(In reply to Walter Haidinger from comment #9)
> Reverted back to latest dracut 309 and manually running /usr/lib/dracut/dracut-initramfs-restore nicely populates /run/initramfs including creating run/initramfs/shutdown.

And does the system shut down properly in this case?
http://bugzilla.opensuse.org/show_bug.cgi?id=1202589#c12

--- Comment #12 from Walter Haidinger <walter.haidinger@gmx.at> ---
No, the system does not shut down correctly (i.e. pivot root to the initramfs), even with dracut 303. I have not even found a workaround so far. Currently I use a serial connection and reboot the system via SysRq to "break" the loop.

And yes, I've of course rebooted (in fact multiple times) to apply any changes to the initrd.

I'll try to debug the shutdown process as suggested in dracut-shutdown.service(8), with the latest dracut 309 that is.
http://bugzilla.opensuse.org/show_bug.cgi?id=1202589#c13

--- Comment #13 from Walter Haidinger <walter.haidinger@gmx.at> ---
To clarify: no, the system does not shut down correctly even if dracut-initramfs-restore is run manually. It is as if /run/initramfs were wiped before that call during shutdown.
http://bugzilla.opensuse.org/show_bug.cgi?id=1202589#c14

--- Comment #14 from Neil Brown <nfbrown@suse.com> ---
I think the problem relates to md2. It shouldn't get restarted, as seems to be happening. Can you (temporarily) remove md2 from md4 and see if that avoids the problem?

Is it possible to enable some udev tracing during the shutdown to see if something is happening there?

I don't think the problem is with dracut. systemd runs systemd-shutdown and tries to shut down devices until it cannot make any more progress, and only then does it switch to dracut in the initrd. As it does seem to make progress on every loop (it always stops md2), it just keeps going.
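The "retry while progress is made" behaviour described in comment #14 can be illustrated with a toy model. This is only a sketch of the reasoning, not systemd-shutdown's actual code: the function `run_rounds`, its parameters, and the idea that something re-activates a device between rounds are all assumptions made for illustration.

```python
def run_rounds(devices, busy, restarted, max_rounds=10):
    """Toy model of a shutdown loop that retries while it makes progress.

    devices:   names of devices active when shutdown starts
    busy:      devices that can never be stopped (e.g. md1 holding the root fs)
    restarted: devices that become active again after being stopped
               (hypothetical, modelling the suspicion about md2)
    Returns (status, rounds), status being one of
    "all_stopped", "gave_up" or "still_looping".
    """
    active = set(devices)
    for round_no in range(1, max_rounds + 1):
        # Stop everything that is not busy in this round.
        stopped = {dev for dev in active if dev not in busy}
        active -= stopped
        if not active:
            return "all_stopped", round_no
        if not stopped:
            # No progress: this is where control would go back to the initrd.
            return "gave_up", round_no
        # Progress was made, so another round follows. If something
        # re-activates a stopped device, the next round "progresses" again.
        active |= set(restarted)
    return "still_looping", max_rounds


if __name__ == "__main__":
    # md2 stays stopped: the loop gives up in round 2 and hands over.
    print(run_rounds({"md1", "md2"}, busy={"md1"}, restarted=set()))
    # md2 keeps coming back: every round counts as progress.
    print(run_rounds({"md1", "md2"}, busy={"md1"}, restarted={"md2"}))
```

In this model the busy md1 alone is harmless, since the loop stops as soon as a round stops nothing. It only spins forever when some other device is stopped anew in every round.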
http://bugzilla.opensuse.org/show_bug.cgi?id=1202589#c15

--- Comment #15 from Walter Haidinger <walter.haidinger@gmx.at> ---
I removed md2, but now the cycle shows md4 and the infinite loop still persists:

    Stopping MD /dev/md1 (9:1).
    Could not stop MD /dev/md1: Device or resource busy
    Not all MD devices stopped, 1 left.
    Detaching DM devices.
    Not all DM devices detached, 1 left.
    Stopping MD devices.
    Stopping MD /dev/md4 (9:4).
    md: md4 stopped.
    Stopping MD /dev/md1 (9:1).
    Could not stop MD /dev/md1: Device or resource busy
    Not all MD devices stopped, 1 left.
    Detaching DM devices.
    Not all DM devices detached, 1 left.
    Stopping MD devices.
    Stopping MD /dev/md4 (9:4).
    md: md4 stopped.
    Stopping MD /dev/md1 (9:1).
    Could not stop MD /dev/md1: Device or resource busy
    Not all MD devices stopped, 1 left.
    Detaching DM devices.
    Not all DM devices detached, 1 left.
    Stopping MD devices.
    ...

The breakpoint set as described in dracut-shutdown.service(8) wasn't hit, but that is to be expected if systemd never hands control back to the initramfs.

How do I enable that udev tracing during shutdown?
http://bugzilla.opensuse.org/show_bug.cgi?id=1202589#c16

--- Comment #16 from Neil Brown <nfbrown@suse.com> ---
> How do I enable that udev tracing during shutdown?

I don't know for certain. Running
    udevadm control -l debug
will tell udev to log everything, but it goes to the journal, not the console. I guess you could then look at the journal after you reboot.
http://bugzilla.opensuse.org/show_bug.cgi?id=1202589#c17

--- Comment #17 from Walter Haidinger <walter.haidinger@gmx.at> ---
Well, for a start I sent SIGRTMIN+22 (loglevel=debug) and SIGRTMIN+27 (log=console) to systemd right before reboot. The signals are described in systemd(1). I will look into how to enable specific udev debugging.

I also reattached md2 as a spare for md4. A sync error is shown now.

What's odd is that the system now did return to the initrd...

Captured serial console logs:

    Couldn't finalize remaining DM devices, MD devices, trying again.
    Stopping MD devices.
    sd-device-enumerator: Scan all dirs
    sd-device-enumerator: Scanning /sys/bus
    sd-device-enumerator: Scanning /sys/class
    Stopping MD /dev/md2 (9:2).
    Failed to sync MD block device /dev/md2, ignoring: Input/output error
    Stopping MD /dev/md1 (9:1).
    Could not stop MD /dev/md1: Device or resource busy
    Not all MD devices stopped, 1 left.
    Detaching DM devices.
    sd-device-enumerator: Scan all dirs
    sd-device-enumerator: Scanning /sys/bus
    sd-device-enumerator: Scanning /sys/class
    Not all DM devices detached, 1 left.
    Couldn't finalize remaining DM devices, MD devices, trying again.
    Stopping MD devices.
    sd-device-enumerator: Scan all dirs
    sd-device-enumerator: Scanning /sys/bus
    sd-device-enumerator: Scanning /sys/class
    Stopping MD /dev/md2 (9:2).
    Failed to sync MD block device /dev/md2, ignoring: Input/output error
    Stopping MD /dev/md1 (9:1).
    Could not stop MD /dev/md1: Device or resource busy
    Not all MD devices stopped, 1 left.
    Detaching DM devices.
    sd-device-enumerator: Scan all dirs
    sd-device-enumerator: Scanning /sys/bus
    sd-device-enumerator: Scanning /sys/class
    Not all DM devices detached, 1 left.
    Couldn't finalize remaining DM devices, MD devices, trying again.
    Stopping MD devices.
    sd-device-enumerator: Scan all dirs
    sd-device-enumerator: Scanning /sys/bus
    sd-device-enumerator: Scanning /sys/Successfully changed into root pivot.
    Returning to initrd...
    dracut: Taking over mdmon processes.
    dracut Warning: Killing all remaining processes
    EXT4-fs (dm-1): unmounting filesystem.
    dracut Warning: Unmounted /oldroot.
    dracut: Disassembling device-mapper devices
    dracut: Waiting for mdraid devices to be clean.
    dracut: Disassembling mdraid devices.
    md1: detected capacity change from 440139776 to 0
    md: md1 stopped.
    mdadm: stopped /dev/md1
    Rebooting.
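The two signals mentioned in comment #17 can be sent from a small script. The sketch below assumes a Linux system where PID 1 is systemd and the caller is root. The function name `enable_shutdown_debug` and its `send` parameter are hypothetical, added only so the logic can be tried without signalling the real init process.

```python
import os
import signal

# Signal offsets as quoted in comment #17; systemd(1) is the authoritative
# reference for what each real-time signal does.
DEBUG_LOGLEVEL = signal.SIGRTMIN + 22  # loglevel=debug
LOG_TO_CONSOLE = signal.SIGRTMIN + 27  # log=console


def enable_shutdown_debug(pid=1, send=os.kill):
    """Send both signals to `pid` (1 is systemd on the reporter's system).

    `send` defaults to os.kill; pass a stand-in function to do a dry run.
    """
    for signum in (DEBUG_LOGLEVEL, LOG_TO_CONSOLE):
        send(pid, signum)


if __name__ == "__main__":
    # Dry run: print what would be sent instead of signalling PID 1.
    enable_shutdown_debug(send=lambda pid, signum: print(f"kill({pid}, {signum})"))
```

Calling `enable_shutdown_debug()` with the defaults right before a reboot would mirror what the reporter did by hand. The dry run in the `__main__` guard only prints the calls.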
http://bugzilla.opensuse.org/show_bug.cgi?id=1202589#c20

Walter Haidinger <walter.haidinger@gmx.at> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |DUPLICATE

--- Comment #20 from Walter Haidinger <walter.haidinger@gmx.at> ---
The issue doesn't appear with the recent kernel-default-5.19.7-1.1.x86_64 package, which incorporates the reverted patch mentioned in #1202534.

*** This bug has been marked as a duplicate of bug 1202534 ***