Re: systemd-vconsole-setup.service failure
Hit reply instead of reply all. See below.
-- Tony
On Tue, Jan 9, 2024, 3:10 PM Tony Walker <tony.walker.iu@gmail.com> wrote:
Thanks and good points, Martin!
TL;DR I submitted a bug report (https://bugzilla.opensuse.org/show_bug.cgi?id=1218623), but missed an existing one (https://bugzilla.opensuse.org/show_bug.cgi?id=1218618). Not surprisingly, the maintainers were ahead of me and my understanding. I assumed that Plymouth was holding onto the tty and preventing systemd-vconsole-setup from resetting the tty. A simple view of my fix is that it kills Plymouth if it takes too long. However, it appears that Plymouth may be leaving the tty in graphics mode.
Longer story...
> However, I am not quite certain whether your solution is correct. At least we should try to understand why it works.
I am not sure either. ;-) Systemd and the boot process are outside my area of expertise. I started using Linux in the mid 90's and learned all about rc scripts. When Debian moved to systemd, I just went along for the ride and became a dinosaur who knew just enough to start and stop daemons. This seemed like a good excuse to learn, so please correct me.
> plymouth gets special treatment from systemd. Rather than stopping "plymouth-start.service", systemd runs "plymouth-quit.service". The reason is AFAIR that the plymouth splash screen is supposed to keep being displayed while the system is switching root. For this reason, plymouthd uses "KillMode=none". I suppose if you replace this with "mixed", plymouthd will be stopped when the initramfs shuts down. Do you see any difference in the visual appearance of the boot screen after applying your changes?
Yes, there is a brief (~1-2s) view of the console before and after Plymouth. That is, this is a crude fix. My thinking was that there are other ways to fix that. Part of my reasoning is that Fedora uses the exact settings I proposed, so if I could find the time to dig into Fedora, I might find some other changes elsewhere. Of course, because Fedora sticks its head in a toilet, we don't have to. Still, I thought the Fedora folks would probably have a good implementation given their proximity to its developers and it might be good to see what they do.
Crudely killing Plymouth when initramfs is done is exactly what I intended. My understanding is that systemd-vconsole-setup is called by systemd-localed.service, localectl, etc. My thinking was that Plymouth was holding the tty while the next boot step was trying to configure it. A classic race condition (I'm a programmer not an admin, so I always go for synchronization first).
> > 2) add "IgnoreOnIsolate=true" to the "[Unit]" section, and
> The effect of this directive is unclear to me for the case at hand. The man page says "If true, this unit will not be stopped when isolating another unit". But this unit hasn't been stopped before your changes, because it had "KillMode=none" set. Is this really needed to solve the issue?
I honestly don't know. My testing says no. However, I saw this and thought it might fix a corner case that I never saw but thought might happen eventually. Basically, I am completely ignorant of what Plymouth does. I thought Plymouth might fork some other processes and, because I am ignorant, IgnoreOnIsolate=false might cause systemd to kill Plymouth due to timing. That is, I wanted to kill Plymouth but not too quickly.
When I looked at the Fedora (rawhide) package, I saw that they do use IgnoreOnIsolate=true. Again, I don't know and we don't need to do what Fedora does. However, it did make me feel better.
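For reference, the two unit-file changes discussed above ("KillMode=mixed" and "IgnoreOnIsolate=true") could also be written as a systemd drop-in rather than as an edit to the packaged unit. This is only a minimal sketch and not what was tested in this thread: the function name write_plymouth_dropin and the file name override.conf are made up for illustration, and it is an assumption that dracut picks up a drop-in from /etc when the initrd is rebuilt.

```shell
# Sketch only: write the two settings from this thread into a drop-in
# directory passed as $1. Nothing is reloaded or rebuilt here.
# write_plymouth_dropin and override.conf are hypothetical names.
write_plymouth_dropin() {
    dropin_dir="$1"
    mkdir -p "$dropin_dir" || return 1
    cat > "$dropin_dir/override.conf" <<'EOF'
[Unit]
IgnoreOnIsolate=true

[Service]
KillMode=mixed
EOF
}
```

Run as root, this would be called with /etc/systemd/system/plymouth-start.service.d as the argument, followed by "systemctl daemon-reload" and "dracut -f". Unlike an edit under /usr/lib, a drop-in is not overwritten by a package update.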
> On my system, systemd-vconsole-setup.service runs successfully very early during initramfs processing. It prints the "device or resource busy" error when it's started again later, after switching root, probably because the plymouthd instance from the initrd is still running ("KillMode=none"). It's probably unnecessary to start systemd-vconsole-setup again after switching root, because the first invocation from the initramfs should have already set the fonts, keymaps, etc.
Good point. I saw that too and did not bother to look at the logs from my last Debian system to see if the same thing happens there. However, I suspect so. As I said above, my understanding is that systemd-vconsole-setup is called by systemd-localed.service, localectl, etc. As such, I would expect to see repeated service starts during boot while the tty is reconfigured for different stages. Again, this is outside of my expertise, so I may be wrong.
> Actually, looking at my boot log, it is weird to see that systemd-vconsole-setup is started and stopped more than 10 times during a single boot, while it should really be sufficient to do this once and for all very early at system startup.
As I said above, my assumption is it will be restarted several times. I am very likely wrong.
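One way to put a number on the repeated starts is to count the "Starting Virtual Console Setup..." lines in the journal for the current boot. A small sketch, assuming the journal text is piped in on stdin; the function name count_vconsole_starts is made up for illustration, and the match string is taken from the log quoted in this thread.

```shell
# Sketch only: count start messages for the service in journal text
# read from stdin. count_vconsole_starts is a hypothetical name.
count_vconsole_starts() {
    # grep -c prints 0 and exits non-zero when nothing matches,
    # so "|| true" keeps the function's exit status at 0.
    grep -c 'Starting Virtual Console Setup' || true
}
```

It would typically be fed from "journalctl -b -u systemd-vconsole-setup.service". The count only says how often the unit was started, not which other unit or udev event triggered each start.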
> This looks like a systemd issue to me. I suggest you open a bug so that it can be properly analyzed.
I searched for the bug and opened one: https://bugzilla.opensuse.org/show_bug.cgi?id=1218623. Unfortunately, I missed an existing one: https://bugzilla.opensuse.org/show_bug.cgi?id=1218618. Sorry! I did find similar reports: https://bugzilla.opensuse.org/show_bug.cgi?id=1215282 and https://bugzilla.opensuse.org/show_bug.cgi?id=943312. In those bug reports, it appears that Plymouth leaves the tty in graphics mode. My thinking was that killing Plymouth would do some "garbage collection" and reset the terminal if this is the root of my problem. Either way, it seemed I had a partial solution.
Feel free to teach me or ask me questions.
--
Tony Walker <tony.walker.iu@gmail.com>
PGP Key @ https://tonywalker1.github.io/ or https://keys.openpgp.org/
9F46 D66D FF6C 182D A5AC 11E1 8559 98D1 7543 319C
On Tue, Jan 9, 2024 at 4:46 AM Martin Wilck <mwilck@suse.com> wrote:
> On Sun, 2024-01-07 at 12:35 -0500, Tony Walker wrote:
> > I finally found some time to debug and find a solution...
> > The problem seems to be a race-condition caused by plymouth not terminating quickly enough.
> Thanks for debugging this and finding a workaround. It has been bothering me, too, but I never took the time to take a deeper look.
> However, I am not quite certain whether your solution is correct. At least we should try to understand why it works.
> > To make the error with vconsole-setup stop, I modified /usr/lib/systemd/system/plymouth-start.service as follows:
> > 1) change "KillMode" from none to "mixed",
> plymouth gets special treatment from systemd. Rather than stopping "plymouth-start.service", systemd runs "plymouth-quit.service". The reason is AFAIR that the plymouth splash screen is supposed to keep being displayed while the system is switching root. For this reason, plymouthd uses "KillMode=none". I suppose if you replace this with "mixed", plymouthd will be stopped when the initramfs shuts down. Do you see any difference in the visual appearance of the boot screen after applying your changes?
> > 2) add "IgnoreOnIsolate=true" to the "[Unit]" section, and
> The effect of this directive is unclear to me for the case at hand. The man page says "If true, this unit will not be stopped when isolating another unit". But this unit hasn't been stopped before your changes, because it had "KillMode=none" set. Is this really needed to solve the issue?
> > 3) run "dracut -f".
> > I rebooted a few times and the errors were not generated.
> > Let me know if this works for you and I will submit a bug and patch.
> On my system, systemd-vconsole-setup.service runs successfully very early during initramfs processing. It prints the "device or resource busy" error when it's started again later, after switching root, probably because the plymouthd instance from the initrd is still running ("KillMode=none"). It's probably unnecessary to start systemd-vconsole-setup again after switching root, because the first invocation from the initramfs should have already set the fonts, keymaps, etc.
> Actually, looking at my boot log, it is weird to see that systemd-vconsole-setup is started and stopped more than 10 times during a single boot, while it should really be sufficient to do this once and for all very early at system startup.
> This looks like a systemd issue to me. I suggest you open a bug so that it can be properly analyzed.
> Regards, Martin
> > -- Tony Walker <tony.walker.iu@gmail.com> PGP Key @ https://tonywalker1.github.io/ or https://keys.openpgp.org/ 9F46 D66D FF6C 182D A5AC 11E1 8559 98D1 7543 319C
> > On Fri, Jan 5, 2024 at 6:58 PM Tony Walker <tony.walker.iu@gmail.com> wrote:
> > > I saw this too. I haven't had time to debug it, but my hunch is a missing dependency in its unit file. That would cause a race condition where vconsole-setup tries to start before its dependencies are running.
> > > Hope this helps.
> > > -- Tony
> > > On Fri, Jan 5, 2024, 6:24 PM Joe Salmeri <jmscdba@gmail.com> wrote:
> > > > Yesterday I updated several TW machines and also TW vms from 20231120 to 20231228.
> > > > After rebooting I ran systemctl --failed which shows
> > > >   UNIT                           LOAD   ACTIVE SUB    DESCRIPTION
> > > > ● systemd-vconsole-setup.service loaded failed failed Virtual Console Setup
> > > > systemctl status systemd-vconsole-setup.service provides the details
> > > > × systemd-vconsole-setup.service - Virtual Console Setup
> > > >      Loaded: loaded (/usr/lib/systemd/system/systemd-vconsole-setup.service; static)
> > > >      Active: failed (Result: exit-code) since Thu 2024-01-04 17:10:27 EST; 23h ago
> > > >    Duration: 5h 2.219s
> > > >        Docs: man:systemd-vconsole-setup.service(8)
> > > >              man:vconsole.conf(5)
> > > >    Main PID: 1105 (code=exited, status=1/FAILURE)
> > > >         CPU: 5ms
> > > > systemd[1]: Starting Virtual Console Setup...
> > > > systemd-vconsole-setup[1105]: No usable source console found: Device or resource busy
> > > > systemd[1]: systemd-vconsole-setup.service: Main process exited, code=exited, status=1/FAILURE
> > > > systemd[1]: systemd-vconsole-setup.service: Failed with result 'exit-code'.
> > > > systemd[1]: Failed to start Virtual Console Setup.
> > > > Not sure what is causing the "No usable source console found: Device or resource busy" error but it happens on multiple TW machines and also on my TW vms.
> > > > systemctl restart systemd-vconsole-setup.service
> > > > Restarts the service with no errors or problems so wondering if something timewise is not ready during the boot process ?
> > > > Is this a known issue ?
> > > > -- Regards,
> > > > Joe
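To check whether another machine shows the same failure as in the log quoted above, the journal can be filtered for the exact error string. A minimal sketch, assuming journal text on stdin; the function name console_busy_errors is made up for illustration.

```shell
# Sketch only: print journal lines carrying the error from this thread.
# The exit status is 0 if at least one line matched, non-zero otherwise.
# console_busy_errors is a hypothetical name.
console_busy_errors() {
    grep 'No usable source console found: Device or resource busy'
}
```

Fed from "journalctl -b -u systemd-vconsole-setup.service", the timestamps of the matching lines would show whether the failure falls before or after the switch to the real root.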
On 1/9/24 15:56, Tony Walker wrote:
> plymouth gets special treatment from systemd. Rather than stopping > "plymouth-start.service", systemd runs "plymouth-quit.service". The > reason is AFAIR that the plymouth splash screen is supposed to keep > being displayed while the system is switching root. For this reason, > plymouthd uses "KillMode=none". I suppose if you replace this with > "mixed", plymouthd will be stopped when the initramfs shuts down. Do > you see any difference in the visual appearance of the boot screen > after applying your changes?
plymouth-start.service probably should still use a different KillMode considering that systemd complains about the service at startup and since the message says that KillMode=none is deprecated and will eventually be removed.
systemd[1]: /usr/lib/systemd/system/plymouth-start.service:15: Unit uses KillMode=none. This is unsafe, as it disables systemd's process lifecycle management for the service. Please update the service to use a safer KillMode=, such as 'mixed' or 'control-group'. Support for KillMode=none is deprecated and will eventually be removed.
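To see which installed unit files would trigger that warning, the unit directories can be searched for the setting. A minimal sketch; the function name units_with_killmode_none is made up for illustration, and it only looks at literal "KillMode=none" lines in the files it is pointed at, not at what systemd ends up with after applying drop-ins.

```shell
# Sketch only: list files under the given directories that contain a
# "KillMode=none" line. units_with_killmode_none is a hypothetical name.
units_with_killmode_none() {
    grep -rlE '^[[:space:]]*KillMode[[:space:]]*=[[:space:]]*none[[:space:]]*$' "$@" 2>/dev/null
}
```

A typical call would pass /usr/lib/systemd/system and /etc/systemd/system as arguments.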
-- Regards, Joe
On Wed, 2024-01-10 at 14:36 -0500, Joe Salmeri wrote:
> plymouth-start.service probably should still use a different KillMode considering that systemd complains about the service at startup and since the message says that KillMode=none is deprecated and will eventually be removed.
Well, as already noted, the point of this KillMode value is that plymouthd wants to survive the switch root step. If this doesn't work, the splash image will go away for a few seconds and the screen will show some text messages, which (I assume) will confuse and/or annoy plymouthd's target audience.
I am unsure if recent versions of systemd make this possible by some means other than "KillMode=none".
Martin
On 1/10/24 15:36, Martin Wilck via openSUSE Factory wrote:
> On Wed, 2024-01-10 at 14:36 -0500, Joe Salmeri wrote:
> > plymouth-start.service probably should still use a different KillMode considering that systemd complains about the service at startup and since the message says that KillMode=none is deprecated and will eventually be removed.
> Well, as already noted, the point of this KillMode value is that plymouthd wants to survive the switch root step. If this doesn't work, the splash image will go away for a few seconds and the screen will show some text messages, which (I assume) will confuse and/or annoy plymouthd's target audience.
> I am unsure if recent versions of systemd make this possible by some means other than "KillMode=none".
Thanks Martin, I appreciate your input, and you would certainly know much better than I would :-) I just wanted to point out that the journal contained that message about the KillMode=none deprecation. I've seen that message for over a year (and probably longer), so maybe that's one reason it hasn't been removed yet.
-- Regards, Joe
participants (3)
- Joe Salmeri
- Martin Wilck
- Tony Walker