[Bug 1208938] New: Boot fails with kernel 6.2.1-1-default
http://bugzilla.opensuse.org/show_bug.cgi?id=1208938

Bug ID: 1208938
Summary: Boot fails with kernel 6.2.1-1-default
Classification: openSUSE
Product: openSUSE Tumbleweed
Version: Current
Hardware: Other
OS: Other
Status: NEW
Severity: Normal
Priority: P5 - None
Component: Kernel
Assignee: kernel-bugs@opensuse.org
Reporter: martti.laaksonen@sci.fi
QA Contact: qa-bugs@suse.de
Found By: ---
Blocker: ---

Created attachment 865281
  --> http://bugzilla.opensuse.org/attachment.cgi?id=865281&action=edit
kernel-6.2.1-1-default netconsole amdgpu crash

After updating my TW system this morning (zypper ref; zypper dup) the system won't boot anymore with the latest kernel 6.2.1-1-default. The boot seems to start normally, then the display goes into power save mode and the system seems to hang altogether. I wasn't even able to ssh to the system, which to me suggests that it was really hung.

Booting with the previous kernel 6.1.2-1-default works just fine.

I managed to set up netconsole, and with that I was able to see that it crashed when initializing(?) amdgpu. The netconsole log is attached.

--
You are receiving this mail because:
You are the assignee for the bug.
http://bugzilla.opensuse.org/show_bug.cgi?id=1208938#c1

--- Comment #1 from Martti Laaksonen <martti.laaksonen@sci.fi> ---
Tested also with kernel 6.2.1-1-vanilla and 6.2.2-1.g62a3141-default (from Kernel:stable repo). Same result. Netconsole logs attached.
http://bugzilla.opensuse.org/show_bug.cgi?id=1208938#c2

--- Comment #2 from Martti Laaksonen <martti.laaksonen@sci.fi> ---
Created attachment 865287
  --> http://bugzilla.opensuse.org/attachment.cgi?id=865287&action=edit
kernel-6.2.1-1-vanilla netconsole amdgpu crash
http://bugzilla.opensuse.org/show_bug.cgi?id=1208938#c3

--- Comment #3 from Martti Laaksonen <martti.laaksonen@sci.fi> ---
Created attachment 865288
  --> http://bugzilla.opensuse.org/attachment.cgi?id=865288&action=edit
kernel-6.2.2-1.g62a3141-default netconsole amdgpu crash
http://bugzilla.opensuse.org/show_bug.cgi?id=1208938#c7

Benjamin Sabatini <sunscape1@hotmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |sunscape1@hotmail.com

--- Comment #7 from Benjamin Sabatini <sunscape1@hotmail.com> ---
Just wanted to add that I also had this issue. The same kernel parameter allowed me to boot.
http://bugzilla.opensuse.org/show_bug.cgi?id=1208938#c11

Danil S <S48GS@hotmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |S48GS@hotmail.com

--- Comment #11 from Danil S <S48GS@hotmail.com> ---
With kernel 6.2.4-1-default, boot works without amd_iommu=off. With kernel 6.2.2 it did not work without turning off the IOMMU.

However, with kernel 6.2.4-1-default the integrated AMD GPU does not work.

journalctl boot log:

21:23:49 localhost kernel: kfd kfd: amdgpu: Allocated 3969056 bytes on gart
21:23:49 localhost kernel: amdgpu: sdma_bitmap: 3
21:23:49 localhost kernel: memmap_init_zone_device initialised 131072 pages in 0ms
21:23:49 localhost kernel: amdgpu: HMM registered 512MB device memory
21:23:49 localhost kernel: kfd kfd: amdgpu: Failed to resume IOMMU for device 1002:15d8
21:23:49 localhost kernel: kfd kfd: amdgpu: device 1002:15d8 NOT added due to errors
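When comparing boots with and without the workaround, it can help to confirm which parameters the running kernel actually received. The following is a minimal sketch, assuming the command line is read from /proc/cmdline; the helper names `has_kernel_param` and `read_cmdline` are made up for illustration, and only `amd_iommu=off` comes from this thread.

```python
def has_kernel_param(cmdline, name, value=None):
    """Return True if `name` (optionally with `value`) appears on a kernel command line.

    `cmdline` is the whitespace-separated string as found in /proc/cmdline.
    Quoted parameters containing spaces are not handled by this sketch.
    """
    for token in cmdline.split():
        key, _sep, val = token.partition("=")
        if key == name and (value is None or val == value):
            return True
    return False


def read_cmdline(path="/proc/cmdline"):
    """Read the command line of the running kernel (Linux only)."""
    with open(path) as f:
        return f.read().strip()


if __name__ == "__main__":
    # Report whether the workaround from this thread is active on this boot.
    print(has_kernel_param(read_cmdline(), "amd_iommu", "off"))
```

Run it once after booting with the parameter and once without to make sure the boot entry being tested is the one intended.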
http://bugzilla.opensuse.org/show_bug.cgi?id=1208938#c13

--- Comment #13 from Danil S <S48GS@hotmail.com> ---
I was not correct in my previous comment.

Everything works in kernel 6.2.4-1-default without bugs, including the AMD GPU. I had the Nvidia driver installed and for some reason it was starting X11 on the discrete GPU.

The error message

kfd kfd: amdgpu: Failed to resume IOMMU for device 1002:15d8
kfd kfd: amdgpu: device 1002:15d8 NOT added due to errors

is still there, but everything works.

> May I ask you to test Kernel:HEAD

I tested kernel 6.3.0-rc2-2.gaf164d0-default: everything works the same as in 6.2.4-1-default. The same error as above is in the boot log, but Wayland/X11 starts and works on the AMD GPU.

I was not able to build/install the Nvidia driver on 6.3.0:

/tmp/selfgz11967/NVIDIA-Linux-x86_64-525.89.02/kernel/nvidia/nv-mmap.c: In function 'nvidia_mmap_numa':
/tmp/selfgz11967/NVIDIA-Linux-x86_64-525.89.02/kernel/nvidia/nv-mmap.c:455:19: error: assignment of read-only member 'vm_flags'

But that is up to Nvidia, I think, and not related to this bug report.

With kernel 6.2.4-1-default the Nvidia driver works, so I just went back to that kernel version.
http://bugzilla.opensuse.org/show_bug.cgi?id=1208938#c14

--- Comment #14 from Danil S <S48GS@hotmail.com> ---
Comparing the boot log from my previous kernel 6.0.0-1-default:
https://paste.opensuse.org/pastes/454f677811ee
with the one from 6.2.4-1-default now:
https://paste.opensuse.org/pastes/679ecfd8376f

The only difference is:

Old:
amdgpu: HMM registered 512MB device memory
amdgpu: Topology: Add APU node [0x15d8:0x1002]
kfd kfd: amdgpu: added device 1002:15d8

New:
amdgpu: HMM registered 512MB device memory
kfd kfd: amdgpu: Failed to resume IOMMU for device 1002:15d8
kfd kfd: amdgpu: device 1002:15d8 NOT added due to errors
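The comparison above can be repeated mechanically for other kernel versions. This is only a sketch of one possible approach, not the method used in the comment: it assumes each log line optionally starts with a journalctl-style `HH:MM:SS host kernel:` prefix or a dmesg-style `[ seconds] host kernel:` prefix, and the function names `kfd_lines` and `only_in` are hypothetical.

```python
import re

# Optional "HH:MM:SS host kernel: " or "[  9.289017] host kernel: " prefix.
_PREFIX = re.compile(r"^(?:\[\s*\d+\.\d+\]|\d{2}:\d{2}:\d{2})\s+\S+\s+kernel:\s+")


def kfd_lines(log_text):
    """Return amdgpu/kfd messages with timestamp and host prefix removed."""
    messages = []
    for line in log_text.splitlines():
        msg = _PREFIX.sub("", line.strip())
        if msg.startswith(("amdgpu:", "kfd kfd:")):
            messages.append(msg)
    return messages


def only_in(old_log, new_log):
    """Return (messages only in the old log, messages only in the new log)."""
    old, new = kfd_lines(old_log), kfd_lines(new_log)
    removed = [m for m in old if m not in new]
    added = [m for m in new if m not in old]
    return removed, added
```

Calling `only_in(old_text, new_text)` on two saved boot logs returns the messages that disappeared and the ones that appeared, ignoring timestamps and host names. Messages that embed varying numbers (for example memory sizes) would still show up as differences.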
http://bugzilla.opensuse.org/show_bug.cgi?id=1208938#c15

--- Comment #15 from Martti Laaksonen <martti.laaksonen@sci.fi> ---
(In reply to Jiri Slaby from comment #12)
> May I ask you to test Kernel:HEAD [1] to see if it's upstream or backports-to-6.2 problem?
>
> [1] https://download.opensuse.org/repositories/Kernel:/HEAD/standard/
Can confirm what Danil S already said. With kernel-default-6.3~rc2-2.1.gaf164d0 kfd error is still present, but otherwise everything seems to work more or less the same.

[ 0.000000] bender kernel: Linux version 6.3.0-rc2-2.gaf164d0-default (geeko@buildhost) (gcc (SUSE Linux) 12.2.1 20230124 [revision 193f7e62815b4089dfaed4c2bd34fd4f10209e27], GNU ld (GNU Binutils; openSUSE Tumbleweed) 2.40.20230127-2) #1 SMP PREEMPT_DYNAMIC Mon Mar 13 14:08:49 UTC 2023 (af164d0)
[ 0.000000] bender kernel: Command line: BOOT_IMAGE=/boot/vmlinuz-6.3.0-rc2-2.gaf164d0-default root=/dev/mapper/system-root resume=/dev/system/swap mitigations=auto
...
[ 9.289017] bender kernel: kfd kfd: amdgpu: Allocated 3969056 bytes on gart
[ 9.289079] bender kernel: amdgpu: sdma_bitmap: 3
[ 9.316755] bender kernel: memmap_init_zone_device initialised 524288 pages in 4ms
[ 9.316766] bender kernel: amdgpu: HMM registered 2048MB device memory
[ 9.316852] bender kernel: kfd kfd: amdgpu: Failed to resume IOMMU for device 1002:15d8
[ 9.317049] bender kernel: kfd kfd: amdgpu: device 1002:15d8 NOT added due to errors
[ 9.317119] bender kernel: amdgpu 0000:0b:00.0: amdgpu: SE 1, SH per SE 1, CU per SH 11, active_cu_number 11
http://bugzilla.opensuse.org/show_bug.cgi?id=1208938#c16

--- Comment #16 from Danil S <S48GS@hotmail.com> ---
I think I found what is not working, and I think it is connected to this amd_iommu feature and this bug report:

Resume (un-suspend) from suspend to RAM does not work when the AMD GPU is used as the main GPU. It worked before the 6.2 kernel; I used it a lot.

On the 6.2.4 kernel: suspend works and the system powers off. On un-suspend the system wakes up but freezes and displays just a static frame on screen. Keyboard and mouse do not work, and NumLock does not respond either. I tried pressing the power button and waiting to see if the system would react; this is visible at the end of the log. So it looks like the system freezes on "using" the AMD GPU, I think.

Suspend and un-suspend work in kernel 6.2.4 only when the main GPU is Nvidia, not AMD.

The only lines related to amd in the log:
```
13:11:22 home-danil kernel: amdgpu 0000:07:00.0: [drm] *ERROR* flip_done timed out
13:11:22 home-danil kernel: amdgpu 0000:07:00.0: [drm] *ERROR* [CRTC:67:crtc-0] commit wait timed out
13:11:12 home-danil kernel: amdgpu 0000:07:00.0: [drm] *ERROR* [CRTC:67:crtc-0] flip_done timed out
```

Log from this sequence: I pressed "suspend", the system turned off, I pressed un-suspend; the log continues after un-suspend until the system restart:
https://paste.opensuse.org/pastes/57f648c0e816
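The three amdgpu lines quoted in this comment are not in chronological order (13:11:22 appears before 13:11:12). A small sketch for pulling such `[drm] *ERROR*` lines out of a saved journal and sorting them by time follows. It assumes the `HH:MM:SS host kernel:` line format seen in this comment and that all lines come from the same day; the name `drm_errors` is hypothetical.

```python
import re

# Captures the timestamp and the amdgpu "[drm] *ERROR*" message of one journal line.
_DRM_ERROR = re.compile(
    r"^(\d{2}:\d{2}:\d{2})\s+\S+\s+kernel:\s+(amdgpu \S+ \[drm\] \*ERROR\* .*)$"
)


def drm_errors(log_text):
    """Return (timestamp, message) pairs for amdgpu DRM errors, oldest first.

    Sorting is by the HH:MM:SS string only, so the input must not span midnight.
    Lines with equal timestamps keep their original relative order.
    """
    hits = []
    for line in log_text.splitlines():
        match = _DRM_ERROR.match(line.strip())
        if match:
            hits.append((match.group(1), match.group(2)))
    return sorted(hits, key=lambda hit: hit[0])
```

Applied to the excerpt above, the `[CRTC:67:crtc-0] flip_done timed out` message at 13:11:12 comes first, followed by the two messages at 13:11:22.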
http://bugzilla.opensuse.org/show_bug.cgi?id=1208938#c17

--- Comment #17 from Danil S <S48GS@hotmail.com> ---
Update:

On kernel 6.2.8-1-default everything works, no bugs. Suspend/unsuspend also works with no problems on AMD.

There is still the same error:

kernel: kfd kfd: amdgpu: Failed to resume IOMMU for device 1002:15d8
kernel: kfd kfd: amdgpu: device 1002:15d8 NOT added due to errors

But since everything works, maybe it's just a placeholder, I don't know.

Sorry for the flood.