[Bug 1207243] New: Update from 0800 UTC 230118 does not complete startup, kernel 6.1.4
http://bugzilla.opensuse.org/show_bug.cgi?id=1207243

            Bug ID: 1207243
           Summary: Update from 0800 UTC 230118 does not complete startup,
                    kernel 6.1.4
    Classification: openSUSE
           Product: openSUSE Tumbleweed
           Version: Current
          Hardware: x86-64
                OS: openSUSE Tumbleweed
            Status: NEW
          Severity: Major
          Priority: P5 - None
         Component: Maintenance
          Assignee: screening-team-bugs@suse.de
          Reporter: chris@twoten.is
        QA Contact: qa-bugs@suse.de
          Found By: ---
           Blocker: ---

In both normal and rescue mode the laptop (Dell XPS 15 7590) does not complete
the startup process. In normal mode there's no feedback visible: the screen
goes black and the fan runs periodically, but there is no apparent response
from the OS, e.g. I cannot switch tty and caps lock doesn't activate. In rescue
mode the failure is similar but the screen remains visible; the last message
was tagged T390 and referenced nvme0 (system SSD).

Failed boots appear to log possibly related lines:

2023-01-18T08:11:40.832204+00:00 linux-nyw8 kernel: [760085.543888][T12422] block nvme0n1: No UUID available providing old NGUID
2023-01-18T08:11:40.856207+00:00 linux-nyw8 kernel: [760085.568170][T12438] block nvme0n1: No UUID available providing old NGUID
2023-01-18T08:11:40.884204+00:00 linux-nyw8 kernel: [760085.594053][T12455] block nvme0n1: No UUID available providing old NGUID
2023-01-18T08:11:40.908217+00:00 linux-nyw8 kernel: [760085.617856][T12471] block nvme0n1: No UUID available providing old NGUID

Booting the last used kernel (6.0.12) restores operation.

--
You are receiving this mail because:
You are on the CC list for the bug.

http://bugzilla.opensuse.org/show_bug.cgi?id=1207243

Andreas Stieger <Andreas.Stieger@gmx.de> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|Maintenance                 |Kernel
           Assignee|screening-team-bugs@suse.de |kernel-bugs@opensuse.org
            Summary|Update from 0800 UTC 230118 |kernel 6.1.4 fails to boot:
                   |does not complete startup,  |nvme0n1: No UUID available
                   |kernel 6.1.4                |providing old NGUID

http://bugzilla.opensuse.org/show_bug.cgi?id=1207243#c1

Daniel Wagner <daniel.wagner@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |daniel.wagner@suse.com

--- Comment #1 from Daniel Wagner <daniel.wagner@suse.com> ---
This message is triggered when user space tries to read

  /sys/class/nvme/nvme0/nvme0n1/uuid

The fact that the normal boot process doesn't work indicates that systemd is
waiting for a device with a given uuid to appear.

Can you provide information on the nvme device?

  nvme id-ctrl /dev/nvme0

http://bugzilla.opensuse.org/show_bug.cgi?id=1207243#c2

--- Comment #2 from Daniel Wagner <daniel.wagner@suse.com> ---
FWIW, the nvme log is likely to be a red herring. The warning has been there
for a long time (since v4.12).

http://bugzilla.opensuse.org/show_bug.cgi?id=1207243#c3

--- Comment #3 from Chris Puttick <chris@twoten.is> ---
Installed nvme-cli, nvme id-ctrl /dev/nvme0 output:

NVME Identify Controller:
vid       : 0x1c5c
ssvid     : 0x1c5c
sn        : AD9BN4767109YBG55
mn        : PC601 NVMe SK hynix 512GB
fr        : 80002111
rab       : 4
ieee      : ace42e
cmic      : 0
mdts      : 6
cntlid    : 0x1
ver       : 0x10300
rtd3r     : 0x7a120
rtd3e     : 0x1e8480
oaes      : 0x200
ctratt    : 0
rrls      : 0
cntrltype : 0
fguid     : 00000000-0000-0000-0000-000000000000
crdt1     : 0
crdt2     : 0
crdt3     : 0
nvmsr     : 0
vwci      : 0
mec       : 0
oacs      : 0x17
acl       : 3
aerl      : 7
frmw      : 0x16
lpa       : 0x2
elpe      : 255
npss      : 4
avscc     : 0x1
apsta     : 0x1
wctemp    : 359
cctemp    : 360
mtfa      : 0
hmpre     : 0
hmmin     : 0
tnvmcap   : 0
unvmcap   : 0
rpmbs     : 0
edstt     : 30
dsto      : 1
fwug      : 0
kas       : 0
hctma     : 0x1
mntmt     : 273
mxtmt     : 358
sanicap   : 0x2
hmminds   : 0
hmmaxd    : 0
nsetidmax : 0
endgidmax : 0
anatt     : 0
anacap    : 0
anagrpmax : 0
nanagrpid : 0
pels      : 0
domainid  : 0
megcap    : 0
sqes      : 0x66
cqes      : 0x44
maxcmd    : 0
nn        : 1
oncs      : 0x5f
fuses     : 0
fna       : 0
vwc       : 0x1
awun      : 0
awupf     : 0
icsvscc   : 1
nwpc      : 0
acwu      : 0
ocfs      : 0
sgls      : 0
mnan      : 0
maxdna    : 0
maxcna    : 0
subnqn    :
ioccsz    : 0
iorcsz    : 0
icdoff    : 0
fcatt     : 0
msdbd     : 0
ofcs      : 0
ps 0 : mp:6.3000W operational enlat:5 exlat:5 rrt:0 rrl:0
       rwt:0 rwl:0 idle_power:- active_power:- active_power_workload:-
ps 1 : mp:2.4000W operational enlat:30 exlat:30 rrt:1 rrl:1
       rwt:1 rwl:1 idle_power:- active_power:- active_power_workload:-
ps 2 : mp:1.9000W operational enlat:100 exlat:100 rrt:2 rrl:2
       rwt:2 rwl:2 idle_power:- active_power:- active_power_workload:-
ps 3 : mp:0.0500W non-operational enlat:1000 exlat:1000 rrt:3 rrl:3
       rwt:3 rwl:3 idle_power:- active_power:- active_power_workload:-
ps 4 : mp:0.0040W non-operational enlat:1000 exlat:9000 rrt:3 rrl:3
       rwt:3 rwl:3 idle_power:- active_power:- active_power_workload:-

http://bugzilla.opensuse.org/show_bug.cgi?id=1207243#c4

--- Comment #4 from Chris Puttick <chris@twoten.is> ---
NB: parsing the logs shows that the nvme entry only occurred during the failed
boots (but it may still be a red herring).

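A check of this kind could be sketched as follows, assuming the syslog-style
format of the lines quoted in the original report. The function name
count_nguid_warnings is made up for illustration. It only counts matching
lines in one file and does not split them by boot, so the timestamps still
have to be compared against the boot times by hand.

```shell
#!/bin/sh
# Hypothetical helper: count "No UUID available" warnings in a log file.
# $1 is the log file to scan, e.g. /var/log/messages.
count_nguid_warnings() {
    # grep -c prints the number of matching lines. It exits non-zero when
    # that number is 0, so "|| true" keeps the function usable under "set -e".
    grep -c 'No UUID available providing old NGUID' "$1" || true
}
```

Usage would be along the lines of `count_nguid_warnings /var/log/messages`.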
http://bugzilla.opensuse.org/show_bug.cgi?id=1207243#c5

--- Comment #5 from Daniel Wagner <daniel.wagner@suse.com> ---
If you didn't see it before, it means user space was not trying to read the
uuid. Hmm, I think we need to figure out why first.

Did you update just the kernel or also parts of the user space?

http://bugzilla.opensuse.org/show_bug.cgi?id=1207243#c6

--- Comment #6 from Daniel Wagner <daniel.wagner@suse.com> ---
I just realized that I also see these warnings on my own system, which is also
running kernel 6.0.12-1-default. Could still be a red herring...

http://bugzilla.opensuse.org/show_bug.cgi?id=1207243#c7

--- Comment #7 from Chris Puttick <chris@twoten.is> ---
I ran zypper ref && zypper dup for the first time since last year.

http://bugzilla.opensuse.org/show_bug.cgi?id=1207243#c8

--- Comment #8 from Daniel Wagner <daniel.wagner@suse.com> ---
With 6.0 the printk_ratelimited() was replaced with dev_warn_ratelimited(),
and that also seems to make the message appear. I didn't see this message in
older logs either. So this explains why we are suddenly seeing it.

http://bugzilla.opensuse.org/show_bug.cgi?id=1207243#c9

--- Comment #9 from Daniel Wagner <daniel.wagner@suse.com> ---
Would it be possible to upgrade just the kernel and see if the regression is
caused by the kernel?

http://bugzilla.opensuse.org/show_bug.cgi?id=1207243#c10

--- Comment #10 from Daniel Wagner <daniel.wagner@suse.com> ---
Or are you already running the new user space with the older kernel
successfully? In that case we know it's the kernel.

http://bugzilla.opensuse.org/show_bug.cgi?id=1207243#c11

--- Comment #11 from Chris Puttick <chris@twoten.is> ---
zypper dup shows no updates available in the currently running OS. All I did
to restore operation was use the boot menu to select the previous kernel, so I
assume everything else is running the latest versions.

http://bugzilla.opensuse.org/show_bug.cgi?id=1207243#c14

--- Comment #14 from Chris Puttick <chris@twoten.is> ---
Rescue/maintenance/failsafe, whatever the menu item is named this year.

Have just run zypper dup and got kernel 6.1.7 with the same outcome. 6.0.12
still boots as expected.

I can't easily see log entries pertaining to the failed boots. journalctl only
seems to have log entries for the current boot stored. /var/log/messages logs
the shutdown before the restart but nothing else until the known good boot
following the failed boot.

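Seeing only the current boot in journalctl is typically what happens when
journald is not storing its logs persistently. One way to change that is a
drop-in setting Storage=persistent, sketched below. This is an assumption
about the cause, not something confirmed in this thread, and the function
name write_persistent_journal_dropin and the file name persistent.conf are
made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: write a journald drop-in that enables persistent storage.
# $1 is the drop-in directory, e.g. /etc/systemd/journald.conf.d (needs root).
write_persistent_journal_dropin() {
    dropin_dir=$1
    mkdir -p "$dropin_dir" || return 1
    # Storage=persistent makes journald keep its logs under /var/log/journal,
    # so entries from earlier (including failed) boots survive a reboot.
    printf '[Journal]\nStorage=persistent\n' > "$dropin_dir/persistent.conf"
}
```

After calling it with /etc/systemd/journald.conf.d and restarting
systemd-journald, `journalctl --list-boots` should list earlier boots and
`journalctl -b -1` should show the log of the previous one.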
http://bugzilla.opensuse.org/show_bug.cgi?id=1207243#c15

--- Comment #15 from Chris Puttick <chris@twoten.is> ---
Have now persuaded journalctl to be in a sensible configuration, but it doesn't
log anything particularly insightful about the failure per se. The last line
logged on failed boots (both normal and recovery) is

systemd-journald[610]: Time spent on flushing to /var/log/journal/

On a normal boot with the older kernel, this line is followed by

systemd-journald[605]: Received client request to flush runtime journal.
systemd-journald[605]: File /var/log/journal/84403c8f9fc24658abd0248aa7f8cc73/system.journal corrupted or uncleanly shut down, renaming and replacing.
kernel: thermal LNXTHERM:00: registered as thermal_zone10
kernel: ACPI: thermal: Thermal Zone [THM] (25 C)
kernel: intel-lpss 0000:00:15.0: enabling device (0000 -> 0002)
kernel: mc: Linux media interface: v0.10
kernel: idma64 idma64.0: Found Intel integrated DMA 64-bit
kernel: mei_me 0000:00:16.0: enabling device (0000 -> 0002)
kernel: ACPI: bus type thunderbolt registered

I took a pic of the lines showing on the failed boot in recovery mode and will
attach it.

http://bugzilla.opensuse.org/show_bug.cgi?id=1207243#c16

--- Comment #16 from Chris Puttick <chris@twoten.is> ---
Created attachment 864312
  --> http://bugzilla.opensuse.org/attachment.cgi?id=864312&action=edit
photo of screen following crashed start up sequence

http://bugzilla.opensuse.org/show_bug.cgi?id=1207243#c17

--- Comment #17 from Daniel Wagner <daniel.wagner@suse.com> ---
This looks like the system is stuck switching to the desktop environment.

http://bugzilla.opensuse.org/show_bug.cgi?id=1207243#c18

--- Comment #18 from Chris Puttick <chris@twoten.is> ---
This problem has persisted through multiple updates, only the 6.0.12 kernel
completes starting up - if it's not kernel related as such, what do I need to
do to identify the actual cause (and why does it boot ok with 6.0.12)?

http://bugzilla.opensuse.org/show_bug.cgi?id=1207243#c19

--- Comment #19 from Daniel Wagner <daniel.wagner@suse.com> ---
As user space is clearly starting (there is systemd output), I think enabling
the verbose debug output flags for the kernel and systemd would help a lot.
And if possible, try to start without the graphical environment; IIRC there is
a different systemd target for this. All of this can be set on the kernel
command line, though I don't remember the details offhand.

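For reference, a minimal sketch of what such a kernel command line addition
could look like. The parameters are the documented systemd and kernel ones,
but this particular selection, the values, and the variable name debug_params
are assumptions for illustration and were not given in this thread. They
would be appended to the "linux" line after pressing "e" on the boot entry in
GRUB; removing "quiet" (and any splash option) from that line is likely also
needed to see the output.

```shell
#!/bin/sh
# Illustrative only: parameters to append to the "linux" line of the boot entry.
# The variable name debug_params is hypothetical.
debug_params="systemd.unit=multi-user.target"         # boot without the graphical target
debug_params="$debug_params systemd.log_level=debug"  # verbose systemd logging
debug_params="$debug_params systemd.log_target=kmsg"  # route systemd logs to the kernel log
debug_params="$debug_params log_buf_len=1M"           # larger kernel log buffer
debug_params="$debug_params ignore_loglevel"          # print all kernel messages on the console

printf '%s\n' "$debug_params"
```

If the system boots to a text login with systemd.unit=multi-user.target, that
would point at the graphical stack rather than at early boot.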