[Bug 1020327] New: [Build 20170116] openQA test fails in consoletest_finish
http://bugzilla.opensuse.org/show_bug.cgi?id=1020327

Bug ID: 1020327
Summary: [Build 20170116] openQA test fails in consoletest_finish
Classification: openSUSE
Product: openSUSE Tumbleweed
Version: Current
Hardware: Other
URL: http://openqa.opensuse.org/tests/335884/modules/consoletest_finish/steps/11
OS: Other
Status: NEW
Severity: Normal
Priority: P5 - None
Component: KDE Workspace (Plasma)
Assignee: opensuse-kde-bugs@opensuse.org
Reporter: dimstar@opensuse.org
QA Contact: qa-bugs@suse.de
Found By: ---
Blocker: ---

## Observation

openQA test in scenario opensuse-Tumbleweed-NET-x86_64-update_13.2@64bit fails in [consoletest_finish](http://openqa.opensuse.org/tests/335884/modules/consoletest_finish/steps/11)

## Reproducible

Fails since (at least) Build [20170115](http://openqa.opensuse.org/tests/334735)

## Expected result

Last good: [20170112](http://openqa.opensuse.org/tests/334127) (or more recent)

## Further details

As far as I have been able to debug this, KDE/Plasma ends up on tty8 instead of tty7, which causes this issue to show up the way it does. openQA assumes the X session is on tty7 when it switches between console mode and X mode.

A potential candidate that was updated in this snapshot round is systemd (not blaming it yet, only guessing).

-- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.opensuse.org/show_bug.cgi?id=1020327
Dominique Leuenberger
http://bugzilla.opensuse.org/show_bug.cgi?id=1020327#c1
--- Comment #1 from Franck Bui
http://bugzilla.opensuse.org/show_bug.cgi?id=1020327#c2
--- Comment #2 from Franck Bui
http://bugzilla.opensuse.org/show_bug.cgi?id=1020327#c3
Fabian Vogt
http://bugzilla.opensuse.org/show_bug.cgi?id=1020327#c4
--- Comment #4 from Franck Bui
I did some investigation on this (based solely on log files though, haven't tried to reproduce it locally yet).
Before 232:
Dec 30 05:26:27 linux-oikq display-manager[652]: Starting service kdm..done
Dec 30 05:26:27 linux-oikq kdm[695]: plymouth is running
Dec 30 05:26:27 linux-oikq kdm[695]: plymouth is active on VT 7, reusing for :0
Dec 30 05:26:27 linux-oikq kdm[695]: plymouth should quit after server startup

With 232:
Jan 17 03:25:43 linux-oikq display-manager[1412]: Starting service kdm..done
Jan 17 03:25:43 linux-oikq kdm[1507]: plymouth is running
Jan 17 03:25:43 linux-oikq kdm[1507]: plymouth is running
Jan 17 03:25:43 linux-oikq systemd[1]: Received SIGRTMIN+21 from PID 270 (plymouthd).
Jan 17 03:25:43 linux-oikq kdm[1507]: plymouth is NOT running
SIGRTMIN+21 is a signal from plymouth to systemd that is only emitted during quit. However, kdm hadn't told plymouth to quit yet, which is only the case if it does not have an active VT. And this is easily confirmed by looking at the video: Plymouth is missing!
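The difference between the two journal excerpts can be checked mechanically. The following is a minimal sketch that scans the kdm and systemd lines quoted above; the function and variable names are illustrative assumptions, and the patterns only cover the message wording seen in these two excerpts.

```python
import re

# Lines copied from the two journal excerpts quoted in this comment.
BEFORE_232 = [
    "Dec 30 05:26:27 linux-oikq kdm[695]: plymouth is running",
    "Dec 30 05:26:27 linux-oikq kdm[695]: plymouth is active on VT 7, reusing for :0",
    "Dec 30 05:26:27 linux-oikq kdm[695]: plymouth should quit after server startup",
]
WITH_232 = [
    "Jan 17 03:25:43 linux-oikq kdm[1507]: plymouth is running",
    "Jan 17 03:25:43 linux-oikq kdm[1507]: plymouth is running",
    "Jan 17 03:25:43 linux-oikq systemd[1]: Received SIGRTMIN+21 from PID 270 (plymouthd).",
    "Jan 17 03:25:43 linux-oikq kdm[1507]: plymouth is NOT running",
]

# Patterns match only the wording seen above; other versions may log differently.
VT_REUSED = re.compile(r"kdm\[\d+\]: plymouth is active on VT (\d+), reusing")
QUIT_SIGNAL = re.compile(r"Received SIGRTMIN\+21 from PID \d+ \(plymouthd\)")


def summarize(lines):
    """Return (reused_vt, plymouth_quit_seen) for a list of journal lines."""
    reused_vt = None
    quit_seen = False
    for line in lines:
        match = VT_REUSED.search(line)
        if match:
            reused_vt = int(match.group(1))
        if QUIT_SIGNAL.search(line):
            quit_seen = True
    return reused_vt, quit_seen
```

Run on the first excerpt, this reports that VT 7 was reused and no quit signal was seen; on the second, no VT was reused and the quit signal appears before kdm gives up on plymouth.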
I'm not sure I understand. Basically, if KDM notices that plymouth is active on tty7, it reuses that terminal; otherwise KDM selects tty8 instead of tty7 (which should be free). Could you explain the logic here?

Just in case: plymouth is required to quit once systemd has reached "graphical.target".

$ systemctl cat plymouth-quit.service
# /usr/lib/systemd/system/plymouth-quit.service
[Unit]
...
Conflicts=graphical.target

and display-manager.service uses Type=forking (not sure why). So depending on how KDM is ordered and when exactly it is started during boot, it may or may not see plymouth running. This probably explains why we see different behaviors.
http://bugzilla.opensuse.org/show_bug.cgi?id=1020327#c5
Fabian Vogt
http://bugzilla.opensuse.org/show_bug.cgi?id=1020327#c7
--- Comment #7 from Franck Bui
From that, there're a couple issues:
- Why does plymouth-start.service want systemd-vconsole-setup.service? I don't see the point, as vconsole-setup will automatically configure any tty detected by udev.
- systemd-vconsole-setup.service is embedded in the initramfs, and this doesn't seem to be needed.
- Why does plymouth crash in this case (might be fixed in the plymouth git repo, as Fabian reported better behavior on IRC)?
- Why does KDM select either tty7 or tty8? In both cases tty7 seems to have been activated by plymouth.

I'll add the relevant logs.

Cheers.
http://bugzilla.opensuse.org/show_bug.cgi?id=1020327#c8
--- Comment #8 from Franck Bui
http://bugzilla.opensuse.org/show_bug.cgi?id=1020327#c9
--- Comment #9 from Franck Bui
http://bugzilla.opensuse.org/show_bug.cgi?id=1020327#c10
--- Comment #10 from Fabian Vogt
Ok, here's my best bet:
Since systemd v232, systemd-vconsole-setup.service no longer has the RemainAfterExit=true property.
Since plymouth-start.service has Wants=systemd-vconsole-setup.service, the latter is now started several times within the initramfs: it's pulled in by initrd.target and by initrd-switch-root.target. And the second time, vconsole-setup is configuring tty7, which was probably already activated by plymouth-start.
This explains why cryptlvm succeeds, as systemd-vconsole-setup fails there and thus it does not confuse plymouth.
This has the bad effect of "confusing" plymouth, as vconsole-setup is now configuring tty7 after plymouth has started using it. This actually makes plymouth crash.
The crash may happen either right before switching to the rootfs or after. In the former case systemd will start plymouth again after switching to the new rootfs; otherwise it won't.
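To illustrate why dropping RemainAfterExit=true makes the service run again, here is a toy model of a oneshot unit. It is a deliberately simplified sketch and not systemd's real state machine; the class and function names are made up for illustration.

```python
class OneshotService:
    """Toy model of a Type=oneshot unit; not systemd's real state machine."""

    def __init__(self, remain_after_exit):
        self.remain_after_exit = remain_after_exit
        self.active = False
        self.runs = 0

    def start(self):
        """Handle one start request; return True if the process actually ran."""
        # A start request for a unit that is still active is a no-op.
        if self.active:
            return False
        self.runs += 1
        # The process exits right away. The unit only stays "active"
        # afterwards when RemainAfterExit=true.
        self.active = self.remain_after_exit
        return True


def count_runs(remain_after_exit, start_requests):
    """Count how often the service runs for a number of start requests."""
    service = OneshotService(remain_after_exit)
    for _ in range(start_requests):
        service.start()
    return service.runs
```

With two start requests (for example, one per target that pulls the unit in), the model runs the service once when RemainAfterExit=true and twice when it is false, which matches the behaviour change described above.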
Now regarding KDM, not sure if it's expected but it selects:
- tty7 if plymouth is not running (IOW if it crashed after switching to new rootfs)
- tty8 if plymouth is running (IOW if plymouth crashed before switching to the new rootfs)
So far I haven't seen plymouthd crash during tests. Plymouth is running in *both* cases ("plymouth is running" is printed with systemd 228 and 232) and kdm selects tty8 because tty7 is blocked by plymouth but not active (as it got confused).
From that, there're a couple issues:
- why does plymouth-start.service wants systemd-vconsole-setup.service: I don't see the point as vconsole-setup will automatically configure any tty detected by udev ?
Probably due to ordering. While plymouth is running, the active ttys can't be configured. However, every tty has to be configured, so this needs to happen *before* plymouth starts.
- systemd-vconsole-setup.service is embedded in initramfs and this doesn't seem to be needed.
It is because of above. There are various bug reports about wrong keyboard layout if systemd-vconsole-setup fails during the initrd phase.
- why does plymouth crash in this case (might be fixed in plymouth git repo as Fabian reported a better behavior on IRC) ?
I will try that again with everything set to the default and plymouth from git master.
- Why does KDM select either tty7 or tty8 ? in both case tty7 seems to have been activated by plymouth ?
kdm does not get tty7 (or does not consider it a valid answer) when asking plymouth for its active VT, so it tries the next one instead. This is good, as using tty7 would probably fail here.
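The VT selection described in this exchange can be modelled roughly as follows. This is only a sketch of the behaviour as discussed in this thread, not kdm's actual code; `choose_vt`, its parameters, and the default first VT of 7 are assumptions made for illustration.

```python
def choose_vt(plymouth_active_vt, busy_vts, first_vt=7):
    """Pick a VT for the X server.

    plymouth_active_vt: VT number plymouth reports as active, or None when
        plymouth gives no usable answer.
    busy_vts: set of VT numbers that are already in use.
    """
    # If plymouth reports an active VT, reuse it for the X server.
    if plymouth_active_vt is not None:
        return plymouth_active_vt
    # Otherwise take the first VT, starting at first_vt, that is not busy.
    vt = first_vt
    while vt in busy_vts:
        vt += 1
    return vt
```

In this model, a plymouth that holds tty7 but does not report it as active leads to tty8 being chosen, which is the case openQA trips over.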
http://bugzilla.opensuse.org/show_bug.cgi?id=1020327#c11
--- Comment #11 from Fabian Vogt
http://bugzilla.opensuse.org/show_bug.cgi?id=1020327#c12
--- Comment #12 from Franck Bui
The root cause is that systemd-vconsole-setup touches the same consoles more than once.
Not exactly. It seems to first configure tty1 and then tty7; however, plymouth is already on tty7 at that time. This is due to plymouth-start.service having Wants=systemd-vconsole-setup.service *and* systemd-vconsole-setup.service having RemainAfterExit=no (since v232).

A possible fix might be:

- Remove Wants=systemd-vconsole-setup.service from plymouth-start.service. I'm not sure I understand the point of it, actually. If plymouth does this in order to configure the console it is going to use (tty7), then that's probably wrong, as the active console when vconsole-setup runs will be a different one (tty1).
- Include /usr/lib/udev/rules.d/90-vconsole.rules in the initramfs. This seems to be enough to run vconsole-setup on tty7 when plymouth activates (or switches to) it but before it actually uses it, so no crash happens in this case.

But someone with knowledge of plymouth and the console/tty stuff should double check.
http://bugzilla.opensuse.org/show_bug.cgi?id=1020327#c13
Dominique Leuenberger
http://bugzilla.opensuse.org/show_bug.cgi?id=1020327#c14
--- Comment #14 from Fabian Vogt
Ok, here's my best bet:
Since systemd v232, systemd-vconsole-setup.service no longer has the RemainAfterExit=true property.
Confirmed. After changing that back it works fine. This proves that systemd-vconsole-setup configures consoles more than once, doesn't it?
- include in the initramfs /usr/lib/udev/rules.d/90-vconsole.rules This seems to be enough for running vconsole-setup on tty7 when plymouth activates (or switches to) it but before it actually uses it, so no crash happens in this case.
This is already in the initrd.
http://bugzilla.opensuse.org/show_bug.cgi?id=1020327#c15
--- Comment #15 from Franck Bui
(In reply to Franck Bui from comment #7)
Ok, here's my best bet:
Since systemd v232, systemd-vconsole-setup.service no longer has the RemainAfterExit=true property.
Confirmed. After changing that back it works fine. This proves that systemd-vconsole-setup configures consoles more than once, doesn't it?
Indeed. But it seems that the second time systemd-vconsole-setup.service is started is not well defined and can happen after plymouthd started using the VT. And this confuses plymouthd somehow. OTOH I'm wondering if 90-vconsole.rules is not enough for initializing the console, IOW if there are any wrong side effects to drop systemd-vconsole-setup.service from initrd...
- include in the initramfs /usr/lib/udev/rules.d/90-vconsole.rules This seems to be enough for running vconsole-setup on tty7 when plymouth activates (or switches to) it but before it actually uses it, so no crash happens in this case.
This is already in the initrd.
Weird, it was not present on one of the systems I used for testing.
http://bugzilla.opensuse.org/show_bug.cgi?id=1020327#c16
--- Comment #16 from Franck Bui
http://bugzilla.opensuse.org/show_bug.cgi?id=1020327#c17
--- Comment #17 from Fabian Vogt
Ok let's try to make some progress...
systemd-vconsole-setup.service is started several times during early boot.
This was actually already the case with v228, where the service was started 3 times, I think:
- 1 time started by plymouth-start.service due to the Wants=systemd-vconsole-setup.service dependency in the service file.
This is before plymouth.
- 2 times started by 90-vconsole.rules. I don't really understand why the rule is called a second time here, but it seems that one of the vtconsole devices is removed (after being added the first time) and then added back again right after. This happens while udev coldplugs all devices (systemd-udev-trigger.service).
AFAIK this only configures the newly added devices, so this is always ok as well. This means that none of those invocations by v228 can affect plymouth, which is the correct behaviour.
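The count of three runs under v228 can be reproduced with a small tally. This is a sketch under the assumption that the udev rule fires once per "add" event for a vtconsole device; the device name `vtcon0` and the helper names are hypothetical.

```python
def rule_invocations(events):
    """Count vconsole-setup runs triggered by udev 'add' events."""
    return sum(1 for action, _device in events if action == "add")


# The add/remove/add sequence described above, with a hypothetical device name.
COLDPLUG_EVENTS = [
    ("add", "vtcon0"),
    ("remove", "vtcon0"),
    ("add", "vtcon0"),
]

# One start pulled in by plymouth-start.service via Wants=.
STARTS_FROM_WANTS = 1


def total_runs(events, starts_from_wants=STARTS_FROM_WANTS):
    """Total number of console setup runs during early boot in this model."""
    return starts_from_wants + rule_invocations(events)
```

Here `total_runs(COLDPLUG_EVENTS)` gives the three runs mentioned for v228; the extra run seen with v232 would come on top of that.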
Now with v232, systemd-vconsole-setup.service has RemainAfterExit=false. I don't know whether this change is correct or not, but it has the downside of starting systemd-vconsole-setup.service one more time, late during early boot.
This is incorrect and needs to be changed back (or the service must not be started at all) AFAICS. Once a console got configured, it stays configured.
And this new console configuration seems to confuse plymouth for some reason and makes it crash.
Even if configuring the console several times is not nice, I don't think plymouth is supposed to crash in any case (BTW I think there is already a bug open for the plymouth crash), and this bug should be fixed.
@Fabian, in comment #11, you reported that plymouth git master (and also the version in Base:System) has a fix that prevents plymouth from crashing.
So my suggestion here is to fix plymouth by either upgrading Base:System to git master or by identifying the fix and backporting it to Factory.
I already requested that, zaitor had an update already prepared and now submitted it as sr#451329.
In the meantime I think we could also drop the Wants=systemd-vconsole-setup.service from plymouth-start.service, because the console setup is supposed to be done via a udev rule (which should happen before plymouth is started).
@Fabian, WDYT ?
udev alone is not enough, otherwise the .service wouldn't be needed at all. When the service fails, bugs like 927250 and its three dups happen. (Or the udev behaviour is buggy, in which case the .service should be removed entirely, I guess.)

I'd proceed this way for now:

- Change back RemainAfterExit in systemd
- Update plymouth to sr#451329
- Submit both to Factory

And wait for the first few test runs to come back. If that works reliably, we can look into optimizing it further.
http://bugzilla.opensuse.org/show_bug.cgi?id=1020327#c18
--- Comment #18 from Franck Bui
(In reply to Franck Bui from comment #16)
Now with v232, systemd-vconsole-setup.service has RemainAfterExit=false. I don't know whether this change is correct or not, but it has the downside of starting systemd-vconsole-setup.service one more time, late during early boot.
This is incorrect and needs to be changed back (or the service must not be started at all) AFAICS.
The service is *only* started because plymouth is pulling it.
Once a console got configured, it stays configured.
This assertion is false in both cases (v228 and v232). The console is configured several times by udev because it receives an "add/remove/add" sequence for one console. And most importantly, no application is supposed to *crash*.
And this new console configuration seems to confuse plymouth for some reason and makes it crash.
Even if configuring the console several times is not nice, I don't think plymouth is supposed to crash in any case (BTW I think there is already a bug open for the plymouth crash), and this bug should be fixed.
@Fabian, in comment #11, you reported that plymouth git master (and also the version in Base:System) has a fix that prevents plymouth from crashing.
So my suggestion here is to fix plymouth by either upgrading Base:System to git master or by identifying the fix and backporting it to Factory.
I already requested that, zaitor had an update already prepared and now submitted it as sr#451329.
Good.
In the meantime I think we could also drop the Wants=systemd-vconsole-setup.service from plymouth-start.service, because the console setup is supposed to be done via a udev rule (which should happen before plymouth is started).
@Fabian, WDYT ?
udev alone is not enough, otherwise the .service wouldn't be needed at all.
udev should be enough. The .service is mostly useless now (I think no other service is requiring it but plymouth).
When the service fails, bugs like 927250 and its three dups happen. (Or the udev behaviour is buggy, in which case the .service should be removed entirely, I guess)
I'd proceed this way for now:
- Change back RemainAfterExit in systemd
I'm not sure about this change at all, and it should be discussed upstream first. But IMHO this is not the (root) issue here; it just exposes the shortcoming in plymouth.
- Update plymouth to sr#451329
This seems to me to be the real fix here: plymouth won't be crashing anymore.
http://bugzilla.opensuse.org/show_bug.cgi?id=1020327#c19
--- Comment #19 from Fabian Vogt
When the service fails, bugs like 927250 and its three dups happen. (Or the udev behaviour is buggy, in which case the .service should be removed entirely, I guess)
Can you comment on this^^^^?
I'd proceed this way for now:
- Change back RemainAfterExit in systemd
I'm not sure about this change at all, and it should be discussed upstream first. But IMHO this is not the (root) issue here; it just exposes the shortcoming in plymouth.
AFAICS it's still wrong. Either systemd-vconsole-setup.service needs to go completely or it needs to set RemainAfterExit=true.
http://bugzilla.opensuse.org/show_bug.cgi?id=1020327#c20
--- Comment #20 from Franck Bui
(In reply to Franck Bui from comment #18)
When the service fails, bugs like 927250 and its three dups happen. (Or the udev behaviour is buggy, in which case the .service should be removed entirely, I guess)
Can you comment on this^^^^?
Sorry but I can't parse other bugs currently, my plate is full ;)
I'd proceed this way for now:
- Change back RemainAfterExit in systemd
I'm not sure about this change at all, and it should be discussed upstream first. But IMHO this is not the (root) issue here; it just exposes the shortcoming in plymouth.
AFAICS it's still wrong. Either systemd-vconsole-setup.service needs to go completely or set RemainAfterExit=true.
The only reason I can see for the existence of systemd-vconsole-setup.service is that other (3rd party) services mention it. And plymouth is one of those...
http://bugzilla.opensuse.org/show_bug.cgi?id=1020327#c29
--- Comment #29 from Fabian Vogt
http://bugzilla.opensuse.org/show_bug.cgi?id=1020327#c30
--- Comment #30 from Franck Bui
So, I'm back after some further analysis on this.
Thanks a lot for investigating deeper in this.
The second one is 0001-let-it-become-a-real-daemon.patch, which totally breaks console locking. With broken console locking you cannot guarantee anything anymore, and systemd-vconsole-setup breaks the console that plymouth uses, which ultimately led to the crash in plymouth's terminal keyboard driver.
Is the crash in plymouth's driver fixable ? With the broken patch removed, will systemd-vconsole-setup fail when opening the vtconsole if it has been locked by plymouth ?
The patch is AFAICS wrong as the reason it got introduced was a misconfiguration of systemd services (bsc#892526). Removal of this patch means however that systemd-vconsole-setup cannot configure the console while plymouth is running (which it never did with RemainAfterExit=true before v232) so we likely need the Wants= and After= for systemd-vconsole-setup.service in plymouth-start.service to not bring back console font/keymap issues.
Well I don't see any need to keep systemd-vconsole-setup.service but the vtconsole stuff is an obscure area to me so I may miss some useful use cases. If we agree on the fact that this service is unneeded I can open an issue upstream and ask to remove the service completely. If upstream doesn't agree then we could at least ask for the correctness of setting RemainAfterExists=no.
Franck's idea of removing the Wants= from plymouth-start.service worked because systemd-vconsole-setup.service didn't get pulled in from anything else and so never ran at all. However, issue #1 prevented it from working altogether in openQA.
Yeah I realized that this morning.
Now with 0001-let-it-become-a-real-daemon.patch removed, everything will work just fine *if* we make sure that plymouth gets stopped when needed (e.g. by YaST firstboot, X and other display servers).
Result of this is in https://build.opensuse.org/package/show/home:favogt:ply-hell/plymouth Please review and test!
I will give it a test. Thanks!
http://bugzilla.opensuse.org/show_bug.cgi?id=1020327#c31
--- Comment #31 from Fabian Vogt
(In reply to Fabian Vogt from comment #29)
So, I'm back after some further analysis on this.
Thanks a lot for investigating deeper in this.
The second one is 0001-let-it-become-a-real-daemon.patch, which totally breaks console locking. With broken console locking you cannot guarantee anything anymore, and systemd-vconsole-setup breaks the console that plymouth uses, which ultimately led to the crash in plymouth's terminal keyboard driver.
Is the crash in plymouth's driver fixable ?
As it is "just" an assert, the error could just be ignored. However, I consider simultaneous access to the VT to be undefined behaviour, as it causes all kinds of weird issues. Just ignoring this would, for instance, allow systemd-vconsole-setup to switch the keyboard layout *while* the decryption password is being entered, or to change the fontmap while a text splash is displayed, etc.
With the broken patch removed, will systemd-vconsole-setup fail when opening the vtconsole if it has been locked by plymouth ?
I don't know, a quick test with plymouth running on tty1 showed that it returns 0.
The patch is AFAICS wrong as the reason it got introduced was a misconfiguration of systemd services (bsc#892526). Removal of this patch means however that systemd-vconsole-setup cannot configure the console while plymouth is running (which it never did with RemainAfterExit=true before v232) so we likely need the Wants= and After= for systemd-vconsole-setup.service in plymouth-start.service to not bring back console font/keymap issues.
Well I don't see any need to keep systemd-vconsole-setup.service but the vtconsole stuff is an obscure area to me so I may miss some useful use cases.
If we agree on the fact that this service is unneeded I can open an issue upstream and ask to remove the service completely. If upstream doesn't agree then we could at least ask for the correctness of setting RemainAfterExists=no.
*RemainAfterExit=true
AFAIK upstream rewrote the service (and binary) completely, so that may require some retesting with the newest version (which they'll probably demand anyway).
Franck's idea of removing the Wants= from plymouth-start.service worked because systemd-vconsole-setup.service didn't get pulled in from anything else and so never ran at all. However, issue #1 prevented it from working altogether in openQA.
Yeah I realized that this morning.
Now with 0001-let-it-become-a-real-daemon.patch removed, everything will work just fine *if* we make sure that plymouth gets stopped when needed (e.g. by YaST firstboot, X and other display servers).
Result of this is in https://build.opensuse.org/package/show/home:favogt:ply-hell/plymouth Please review and test!
I will give it a test.
Make sure to get at least rev 17, I did some changes in the patch to avoid DRM initialization delays.
Thanks !
http://bugzilla.opensuse.org/show_bug.cgi?id=1020327#c32
--- Comment #32 from Franck Bui
Make sure to get at least rev 17, I did some changes in the patch to avoid DRM initialization delays.
So I gave it a test and Xorg was always started on tty7; however, I could still see that plymouth crashed:

Jan 26 11:15:35 localhost systemd[1]: plymouth-start.service: Main process exited, code=dumped, status=6/ABRT

I thought that should have been fixed by your changes, but apparently this doesn't work as expected or I misunderstood your changes.
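When checking several test runs for this crash, the systemd exit message can be picked apart with a small parser. This is a minimal sketch that assumes the message format shown in the journal line above; `parse_exit` and the field names are illustrative.

```python
import re

# Matches systemd's "Main process exited" message as it appears above.
EXIT_LINE = re.compile(
    r"systemd\[\d+\]: (?P<unit>\S+): Main process exited, "
    r"code=(?P<code>\w+), status=(?P<status>\d+)/(?P<name>\w+)"
)


def parse_exit(line):
    """Return a dict describing the exit, or None if the line does not match."""
    match = EXIT_LINE.search(line)
    if match is None:
        return None
    return {
        "unit": match.group("unit"),
        "code": match.group("code"),
        "status": int(match.group("status")),
        "name": match.group("name"),
        # code=dumped means the process was killed by a signal and dumped core.
        "core_dumped": match.group("code") == "dumped",
    }


# The journal line reported in this comment.
SAMPLE = (
    "Jan 26 11:15:35 localhost systemd[1]: plymouth-start.service: "
    "Main process exited, code=dumped, status=6/ABRT"
)
```

For the sample line this yields the unit plymouth-start.service with signal 6 (ABRT), which is what an aborting process looks like in the journal.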
http://bugzilla.opensuse.org/show_bug.cgi?id=1020327#c33
--- Comment #33 from Fabian Vogt
(In reply to Fabian Vogt from comment #31)
Make sure to get at least rev 17, I did some changes in the patch to avoid DRM initialization delays.
So I gave it a test and Xorg was always started on tty7, however I could still see plymouth crashed:
Jan 26 11:15:35 localhost systemd[1]: plymouth-start.service: Main process exited, code=dumped, status=6/ABRT
I thought that should have been fixed by your changes, but apparently this doesn't work as expected or I misunderstood your changes.
Plymouth does not crash here; it always shows up successfully. Did you rebuild the initrd, use the correct packages, etc.? What's your setup for testing?

I'm using a current TW (minimal X) with systemd-232 from Base:System and zypper dup'd to my branch. All this in a KVM VM with cirrus and qxl for testing.