[Bug 1129487] New: Since kernel 5.0.1-1 boot is being delayed by "a stop job is running for udev Kernel Device Manager"
http://bugzilla.opensuse.org/show_bug.cgi?id=1129487 Bug ID: 1129487 Summary: Since kernel 5.0.1-1 boot is being delayed by "a stop job is running for udev Kernel Device Manager" Classification: openSUSE Product: openSUSE Tumbleweed Version: Current Hardware: x86-64 OS: Other Status: NEW Severity: Normal Priority: P5 - None Component: Kernel Assignee: kernel-maintainers@forge.provo.novell.com Reporter: paul.pgp-7@gmx.com QA Contact: qa-bugs@suse.de Found By: --- Blocker: --- TW snapshot 20190314 brought in V 5.0.1-1-default kernel. Since then, at every boot, I'm getting a 17sec delay part way through the boot process. I'm not using plymouth and see the message "a stop job is running for udev Kernel Device Manager" which lasts for 17sec (out of the 1m 30s timeout limit). Boot then continues as normal. There have been no changes to this machine other than a "zypper dup" from 20190312 to 20190314. If I boot using the previous 4.20.13-1 kernel I don't get the delay. This is running on a quite elderly AMD Athlon 64x2 based system. I guess this may be hardware specific. I initially posted to the openSuse user forum ( https://forums.opensuse.org/showthread.php/535308-Increased-boot-time-since-... ) where other users reported not seeing the problem. -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.opensuse.org/show_bug.cgi?id=1129487
http://bugzilla.opensuse.org/show_bug.cgi?id=1129487#c3
--- Comment #3 from Paul Tannington
http://bugzilla.opensuse.org/show_bug.cgi?id=1129487
http://bugzilla.opensuse.org/show_bug.cgi?id=1129487#c4
--- Comment #4 from Franck Bui
It's surprising that udev itself takes so long to start.
So I misread your initial comment: the hang happens just before switching to rootfs when all services are requested to stop. Since the full debug logs have a negative side effect for reproducing, can you try to enable udev debug logs only and see if you can still reproduce ? (append "rd.udev.log_priority=debug" to the kernel command line). Also frpm the last logs you showed:
Mar 21 08:59:25 Orion-15 kernel: sd 6:0:0:3: [sdg] Attached SCSI removable disk
Can you try to remove this disk before booting and see if it helps ? Thanks. -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.opensuse.org/show_bug.cgi?id=1129487
http://bugzilla.opensuse.org/show_bug.cgi?id=1129487#c5
--- Comment #5 from Paul Tannington
http://bugzilla.opensuse.org/show_bug.cgi?id=1129487
http://bugzilla.opensuse.org/show_bug.cgi?id=1129487#c6
--- Comment #6 from Franck Bui
The entire log is some 868k - I can attach it if required.
Yes please attach the entire log preferably by running "journalctl -b -o short-monotonic". -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.opensuse.org/show_bug.cgi?id=1129487
http://bugzilla.opensuse.org/show_bug.cgi?id=1129487#c7
--- Comment #7 from Paul Tannington
http://bugzilla.opensuse.org/show_bug.cgi?id=1129487
http://bugzilla.opensuse.org/show_bug.cgi?id=1129487#c8
--- Comment #8 from Franck Bui
http://bugzilla.opensuse.org/show_bug.cgi?id=1129487
http://bugzilla.opensuse.org/show_bug.cgi?id=1129487#c9
--- Comment #9 from Paul Tannington
http://bugzilla.opensuse.org/show_bug.cgi?id=1129487
http://bugzilla.opensuse.org/show_bug.cgi?id=1129487#c10
Franck Bui
From the debug logs provided in comment #7:
[ 8.388246] Orion-15 systemd-udevd[276]: sdd: Device (SEQNUM=1405, ACTION=add) is queued
[ 8.391547] Orion-15 systemd-udevd[290]: sdd: Processing device (SEQNUM=1405, ACTION=add)
[ 8.521679] Orion-15 systemd[1]: Stopping udev Kernel Device Manager...
[ 26.132085] Orion-15 kernel: sd 6:0:0:0: [sdd] Attached SCSI removable disk [ 26.147772] Orion-15 systemd-udevd[290]: sdd: GROUP 6 /usr/lib/udev/rules.d/50-udev-default.rules:59 [ 26.148431] Orion-15 systemd-udevd[290]: sdd: IMPORT builtin 'usb_id' /usr/lib/udev/rules.d/54 [...] [...] [ 26.157478] Orion-15 systemd-udevd[290]: sdd: Device (SEQNUM=1405, ACTION=add) processed [ 26.158906] Orion-15 systemd-udevd[290]: sdd: sd-device-monitor: Passed 1117 byte to netlink monitor
[ 26.356092] Orion-15 systemd[1]: Switching root.
It seems that udev worker whose PID=290 is stuck until the following kernel trace is emitted:
[ 26.132085] Orion-15 kernel: sd 6:0:0:0: [sdd] Attached SCSI removable disk
I don't know why the kernel event happened long before sdd seems usable. Can this be due to a change brought by the new kernel ? Also I'm not sure where the udev worker is stuck... it should be in worker_lock_block_device()[1] either stuck at open() (although it's unlikely because O_NONBLOCK is passed) or at flock(). [1] https://github.com/openSUSE/systemd/blob/openSUSE-Factory/src/udev/udevd.c#L... -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.opensuse.org/show_bug.cgi?id=1129487
http://bugzilla.opensuse.org/show_bug.cgi?id=1129487#c11
--- Comment #11 from Paul Tannington
http://bugzilla.opensuse.org/show_bug.cgi?id=1129487
http://bugzilla.opensuse.org/show_bug.cgi?id=1129487#c12
--- Comment #12 from Franck Bui
Shall I downgrade those also and still try? (Which will now have to wait until tomorrow.)
Yes please also downgrade all dependencies. -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.opensuse.org/show_bug.cgi?id=1129487
http://bugzilla.opensuse.org/show_bug.cgi?id=1129487#c13
Martin Wilck
It seems that udev worker whose PID=290 is stuck until the following kernel trace is emitted:
[ 26.132085] Orion-15 kernel: sd 6:0:0:0: [sdd] Attached SCSI removable disk
Yes, and strangely, it's stuck before it even starts doing anything.
I don't know why the kernel event happened long before sdd seems usable. Can this be due to a change brought by the new kernel ?
Between the generation of the ADD uevent and the "Attached SCSI removable disk" message, there's at least one sd_revalidate_disk() call. But that's definitely not new. I can see one change in sd_revalidate_disk() that came with v5.0, e4a056987 "scsi: sd: fix entropy gathering for most rotational disks". I don't know why it would matter, but once we know if it's udev or the kernel, we could try it.
Also I'm not sure where the udev worker is stuck... it should be in worker_lock_block_device()[1] either stuck at open() (although it's unlikely because O_NONBLOCK is passed) or at flock().
The SCSI device might be blocked, because it's not yet fully initialized after the ADD event. I haven't checked all details, but that might explain it. -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.opensuse.org/show_bug.cgi?id=1129487
http://bugzilla.opensuse.org/show_bug.cgi?id=1129487#c14
--- Comment #14 from Martin Wilck
http://bugzilla.opensuse.org/show_bug.cgi?id=1129487
http://bugzilla.opensuse.org/show_bug.cgi?id=1129487#c15
--- Comment #15 from Paul Tannington
Yes please also downgrade all dependencies.
With the downgraded packages: udev-239-5.1.x86_64 libudev1-239-5.1.x86_64 systemd-239-5.1.x86_64 systemd-sysvinit-239-5.1.x86_64 and using kernel: kernel-default-5.0.5-1.2.x86_64 I've booted/rebooted this system 12 times without seeing the issue. @Martin, OK, I'll revert udev/systemd to the current V241 and get back to you later today. Thanks guys. -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.opensuse.org/show_bug.cgi?id=1129487
http://bugzilla.opensuse.org/show_bug.cgi?id=1129487#c16
Paul Tannington
@Paul,
When the system hangs 17s during boot, could you please hit Alt-Sysrq-w and provide us the journalctl output?
It might help us figure out what's hanging.
This is so very weird... Having updated the udev/systemd packages back to 241-1.1 (still with the 5.0.5-1.2 kernel) upon reboot it was normal without the "a stop job is runing ..." I subsequently rebooted a couple of times and also did a few cold starts, on each occasion boot was normal. The only changes made were the downgrade to 239 followed a short time later by the upgrade back to 241 - prior to those I was seeing the delay on every boot. I'm sorry, I feel I've wasted your time. If in the course of normal use over the next few days it happens I'll use the Alt-SysRq-w sequence and attach the journal. Apologies once more and many thanks. -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.opensuse.org/show_bug.cgi?id=1129487
http://bugzilla.opensuse.org/show_bug.cgi?id=1129487#c17
Martin Wilck
I'm sorry, I feel I've wasted your time.
If in the course of normal use over the next few days it happens I'll use the Alt-SysRq-w sequence and attach the journal.
Apologies once more and many thanks.
Weird indeed. No need to apologize. I'm closing this for now; re-open if you see the issue again. -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.opensuse.org/show_bug.cgi?id=1129487
http://bugzilla.opensuse.org/show_bug.cgi?id=1129487#c18
--- Comment #18 from Martin Wilck
http://bugzilla.opensuse.org/show_bug.cgi?id=1129487
http://bugzilla.opensuse.org/show_bug.cgi?id=1129487#c19
--- Comment #19 from Paul Tannington
Btw if you are using btrfs, you could attempt to reproduce with a snapshot. If you feel like it, that is.
ext4 I'm afraid, otherwise I'd have gave it a go. Thanks. -- You are receiving this mail because: You are on the CC list for the bug.
participants (1)
-
bugzilla_noreply@novell.com