[Bug 1202727] New: Upstream kernel commit 30de14b1884ba makes s390 to stop working on qemu
https://bugzilla.suse.com/show_bug.cgi?id=1202727

            Bug ID: 1202727
           Summary: Upstream kernel commit 30de14b1884ba makes s390 to stop working on qemu
    Classification: openSUSE
           Product: openSUSE Tumbleweed
           Version: Current
          Hardware: S/390-64
                OS: Other
            Status: NEW
          Severity: Normal
          Priority: P5 - None
         Component: Kernel
          Assignee: kernel-bugs@opensuse.org
          Reporter: mpdesouza@suse.com
        QA Contact: qa-bugs@suse.de
          Found By: ---
           Blocker: ---

If the following commit is reverted

commit 30de14b1884ba609fc1acfba5b40309e3a6ccefe
Author: Sven Schnelle <svens@linux.ibm.com>
Date: Fri Apr 8 14:51:26 2022 +0200

    s390: current_stack_pointer shouldn't be a function

qemu can boot s390 correctly. The commit was added in 5.18-rc3. With this commit reverted, qemu-kvm can boot v6.0-rc2 without any problem.

I found this bug while trying to debug a livepatch issue:
https://github.com/SUSE/qa_test_klp/issues/17

I enabled dyndebug while running the livepatch tests, but it didn't show anything useful for the bug:

[ 863.108916] livepatch: enabling patch 'klp_tc_10_livepatch'
[ 863.112892] livepatch: 'klp_tc_10_livepatch': starting patching transition
[ 864.310918] livepatch: 'klp_tc_10_livepatch': patching complete
[ 864.972595] livepatch: 'klp_tc_10_livepatch': starting unpatching transition
[ 866.395047] livepatch: 'klp_tc_10_livepatch': unpatching complete
[ 1003.993309] livepatch: enabling patch 'klp_tc_10_livepatch'
[ 1003.993320] livepatch: 'klp_tc_10_livepatch': initializing patching transition
[ 1003.997171] livepatch: 'klp_tc_10_livepatch': starting patching transition
[ 1003.997396] livepatch: klp_try_switch_task: swapper/1:0 is running
[ 1003.997435] livepatch: klp_try_switch_task: swapper/2:0 is running
[ 1003.997467] livepatch: klp_try_switch_task: swapper/3:0 is running
[ 1003.997485] livepatch: klp_try_switch_task: swapper/4:0 is running
[ 1003.997504] livepatch: klp_try_switch_task: swapper/5:0 is running
[ 1003.997509] livepatch: klp_try_switch_task: swapper/6:0 is running
[ 1003.997542] livepatch: klp_try_switch_task: swapper/7:0 is running
[ 1003.997575] livepatch: klp_try_switch_task: swapper/8:0 is running
[ 1003.997610] livepatch: klp_try_switch_task: swapper/9:0 is running
[ 1005.430335] livepatch: 'klp_tc_10_livepatch': completing patching transition
[ 1005.430843] livepatch: 'klp_tc_10_livepatch': patching complete
[ 1006.878375] livepatch: 'klp_tc_10_livepatch': initializing unpatching transition
[ 1006.878421] livepatch: 'klp_tc_10_livepatch': starting unpatching transition
[ 1006.878644] livepatch: klp_try_switch_task: swapper/1:0 is running
[ 1006.878664] livepatch: klp_try_switch_task: swapper/2:0 is running
[ 1006.878693] livepatch: klp_try_switch_task: swapper/3:0 is running
[ 1006.878722] livepatch: klp_try_switch_task: swapper/4:0 is running
[ 1006.878748] livepatch: klp_try_switch_task: swapper/5:0 is running
[ 1006.878780] livepatch: klp_try_switch_task: swapper/6:0 is running
[ 1006.878811] livepatch: klp_try_switch_task: swapper/7:0 is running
[ 1006.878830] livepatch: klp_try_switch_task: swapper/8:0 is running
[ 1006.878853] livepatch: klp_try_switch_task: swapper/9:0 is running
[ 1008.390362] livepatch: 'klp_tc_10_livepatch': completing unpatching transition
[ 1008.394954] livepatch: 'klp_tc_10_livepatch': unpatching complete

I believe the two issues may be linked, given the nature of the patch, which can have some impact on ftrace.

Any advice would be welcome. Thanks!

--
You are receiving this mail because:
You are on the CC list for the bug.
https://bugzilla.suse.com/show_bug.cgi?id=1202727

Takashi Iwai <tiwai@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mbenes@suse.com,
                   |                            |tiwai@suse.com
https://bugzilla.suse.com/show_bug.cgi?id=1202727#c1

--- Comment #1 from Marcos de Souza <mpdesouza@suse.com> ---
I added the wrong error output. Now with the error:

(none):/home/kgrtst/mpdesouza/qa_test_klp # ./run.sh
== Patch caller of graph traced callee ==
[14:28:06] Test Case 10: Patch caller of graph traced callee
[14:28:06] *** Compiling kernel live patch
[14:28:07] make: Entering directory '/home/kgrtst/mpdesouza/linux'
[14:28:26] CC [M] /tmp/live-patch/tc_10/klp_tc_10_livepatch.o
[14:28:37] MODPOST /tmp/live-patch/tc_10/Module.symvers
[14:28:38] CC [M] /tmp/live-patch/tc_10/klp_tc_10_livepatch.mod.o
[14:28:42] LD [M] /tmp/live-patch/tc_10/klp_tc_10_livepatch.ko
[14:28:42] make: Leaving directory '/home/kgrtst/mpdesouza/linux'
[14:28:42] *** Compile test support module
[14:28:42] make: Entering directory '/home/kgrtst/mpdesouza/linux'
[14:29:03] CC [M] /tmp/live-patch/tc_10/klp_test_support_mod.o
[14:29:14] MODPOST /tmp/live-patch/tc_10/Module.symvers
[14:29:14] CC [M] /tmp/live-patch/tc_10/klp_test_support_mod.mod.o
[14:29:18] LD [M] /tmp/live-patch/tc_10/klp_test_support_mod.ko
[14:29:19] make: Leaving directory '/home/kgrtst/mpdesouza/linux'
[14:29:19] *** Load test support module
[ 246.523106] klp_test_support_mod: loading out-of-tree module taints kernel.
[14:29:19] *** Enable graph tracing for orig_do_sleep
[14:29:19] *** Starting uninterruptible sleeper
[14:29:19] *** Inserting live patch
[ 247.204554] klp_tc_10_livepatch: tainting kernel with TAINT_LIVEPATCH
[ 247.205261] livepatch: enabling patch 'klp_tc_10_livepatch'
[ 247.209206] livepatch: 'klp_tc_10_livepatch': starting patching transition
[14:29:20] *** Check that live patch is blocked
[ 248.318683] livepatch: 'klp_tc_10_livepatch': patching complete
[14:29:21] TEST CASE ABORT patching finished prematurely
[14:29:21] *** Removing patches
[ 249.012780] livepatch: 'klp_tc_10_livepatch': starting unpatching transition
[14:29:21] *** Disabling and removing module klp_tc_10_livepatch
[ 250.402688] livepatch: 'klp_tc_10_livepatch': unpatching complete
[14:29:24] *** Removing module klp_test_support_mod
[14:29:24] rmmod: ERROR: Module klp_test_support_mod is in use
https://bugzilla.suse.com/show_bug.cgi?id=1202727#c2

Marcos de Souza <mpdesouza@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |claudio.fontana@suse.com,
                   |                            |ptesarik@suse.com

--- Comment #2 from Marcos de Souza <mpdesouza@suse.com> ---
Adding Petr and Claudio
https://bugzilla.suse.com/show_bug.cgi?id=1202727#c3

--- Comment #3 from Claudio Fontana <claudio.fontana@suse.com> ---
Hi Marcos,

does this problem affect bare metal also, or does this only show up with QEMU/kvm?
https://bugzilla.suse.com/show_bug.cgi?id=1202727#c4

--- Comment #4 from Marcos de Souza <mpdesouza@suse.com> ---
(In reply to Claudio Fontana from comment #3)
> Hi Marcos,
>
> does this problem affect bare metal also, or does this only show up with QEMU/kvm?

Unfortunately, I couldn't test on bare metal.
https://bugzilla.suse.com/show_bug.cgi?id=1202727#c5

--- Comment #5 from Marcos de Souza <mpdesouza@suse.com> ---
To illustrate, I'm testing on an s390x machine, using the current v6.0-rc4:

commit 7e18e42e4b280c85b76967a9106a13ca61c16179 (HEAD, tag: v6.0-rc4, origin/master, origin/HEAD)
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date: Sun Sep 4 13:10:01 2022 -0700

    Linux 6.0-rc4

When I run qemu-system-s390x, it does not show any output:

qemu-system-s390x -kernel arch/s390/boot/vmlinux -nographic

But if the problematic commit (30de14b1884ba609fc1acfba5b40309e3a6ccefe) is reverted on top of 6.0-rc4, it panics as expected:

commit 82364b4f5173954e5e6d324531a950e3b898db09 (HEAD)
Author: Marcos <mpdesouza@suse.com>
Date: Mon Sep 5 17:45:27 2022 +0000

    Revert "s390: current_stack_pointer shouldn't be a function"

    This reverts commit 30de14b1884ba609fc1acfba5b40309e3a6ccefe.

qemu-system-s390x -kernel arch/s390/boot/vmlinux -nographic
KASLR disabled: CPU has no PRNG
[ 0.422349] Linux version 6.0.0-rc4-00001-g82364b4f5173 (kgrtst@s390zlpa) (gcc (SUSE Linux) 7.5.0, GNU ld (GNU Binutils; SUSE Linux Enterprise 15) 2.35.1.20201123-7.18) #36 SMP Mon Sep 5 17:51:20 UTC 2022
[ 0.423773] setup: Linux is running under KVM in 64-bit mode
[ 0.462163] setup: The maximum memory size is 128MB
[ 0.462679] setup: Relocating AMODE31 section of size 0x00003000
[ 0.464744] cpu: 1 configured CPUs, 0 standby CPUs
...

I hope this helps to explain what the problem is and how to reproduce it.
https://bugzilla.suse.com/show_bug.cgi?id=1202727#c6

--- Comment #6 from Marcos de Souza <mpdesouza@suse.com> ---
One more interesting piece of info: on the s390x machine the qemu version is 3.1.1.1. But on my x86 laptop with openSUSE Tumbleweed and qemu 7.0.0, I can reproduce the same success and failure when using qemu-system-s390x.

I believe this is not a problem in qemu itself, but as I don't have a machine on which to install and boot the problematic kernel, I can't be sure.

Any help debugging the issue is welcome.
https://bugzilla.suse.com/show_bug.cgi?id=1202727#c7

--- Comment #7 from Marcos de Souza <mpdesouza@suse.com> ---
More info: The kernel is compiled with gcc 7.5.0, from SLE 15-SP1
https://bugzilla.suse.com/show_bug.cgi?id=1202727#c8

--- Comment #8 from Petr Tesařík <ptesarik@suse.com> ---
Let me think...

One difference between the inline function and a global register value is that the function contained an "asm volatile", so the compiler was not allowed to re-order execution. Is KLP using the value of "current_stack_pointer" in a context that might change %r15 by any chance?

If that's the case, you may have to save the stack pointer into another variable like this:

const unsigned long current_sp = current_stack_pointer;

The "const" qualifier may be needed to tell the compiler that the value is initialized once and never changed.
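
The two forms contrasted in comment #8 can be sketched roughly as below. This is a minimal illustration, not the kernel's actual definitions: the names sp_regvar, sp_via_asm and sp_snapshot_distance are hypothetical, and the x86-64 and aarch64 branches are assumptions added only so the sketch builds outside s390. It relies on the GCC global register variable extension.

```c
/*
 * Illustrative only: not the kernel's definitions. All names here are
 * hypothetical; the non-s390 branches exist so the sketch builds elsewhere.
 */
#if defined(__s390x__)
#define SP_REG "r15"
#define READ_SP(out) __asm__ __volatile__("lgr %0,%%r15" : "=d" (out))
#elif defined(__x86_64__)
#define SP_REG "rsp"
#define READ_SP(out) __asm__ __volatile__("mov %%rsp,%0" : "=r" (out))
#elif defined(__aarch64__)
#define SP_REG "sp"
#define READ_SP(out) __asm__ __volatile__("mov %0, sp" : "=r" (out))
#else
#error "sketch covers s390x, x86-64 and aarch64 only"
#endif

/*
 * Form 1: a global register variable bound to the stack pointer.
 * Every use reads the register at whatever point the compiler
 * schedules that use.
 */
register unsigned long sp_regvar __asm__(SP_REG);

/*
 * Form 2: an inline function wrapping a volatile asm statement.
 * The asm must be emitted on every call and is not reordered against
 * other volatile asm statements.
 */
static inline unsigned long sp_via_asm(void)
{
	unsigned long sp;

	READ_SP(sp);
	return sp;
}

/*
 * Snapshot pattern from comment #8: copy the register into an
 * ordinary const variable once, then work with the copy. Returns the
 * absolute distance between the snapshot and a later asm-based read.
 */
static unsigned long sp_snapshot_distance(void)
{
	const unsigned long snap = sp_regvar;
	const unsigned long now = sp_via_asm();

	return snap > now ? snap - now : now - snap;
}
```

Both forms read the same register; the difference that matters for this bug is how much freedom the compiler has in placing the read. The distance returned by sp_snapshot_distance() depends on inlining and on the frame layout, so only a loose bound on it is meaningful.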
https://bugzilla.suse.com/show_bug.cgi?id=1202727#c9

Marcos de Souza <mpdesouza@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
              Flags|                            |needinfo?(ptesarik@suse.com
                   |                            |)

--- Comment #9 from Marcos de Souza <mpdesouza@suse.com> ---
(In reply to Petr Tesařík from comment #8)
> Let me think...
>
> One difference between the inline function and a global register value is that the function contained an "asm volatile", so the compiler was not allowed to re-order execution. Is KLP using the value of "current_stack_pointer" in a context that might change %r15 by any chance?
>
> If that's the case, you may have to save the stack pointer into another variable like this:
>
> const unsigned long current_sp = current_stack_pointer;
>
> The "const" qualifier may be needed to tell the compiler that the value is initialized once and never changed.

But do you see a reason for that commit to make the boot process get stuck?
https://bugzilla.suse.com/show_bug.cgi?id=1202727#c10

Petr Tesařík <ptesarik@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |CONFIRMED
           Assignee|kernel-bugs@opensuse.org    |ptesarik@suse.com
              Flags|needinfo?(ptesarik@suse.com |
                   |)                           |

--- Comment #10 from Petr Tesařík <ptesarik@suse.com> ---
(In reply to Marcos de Souza from comment #9)
> But do you see a reason for that commit to make the boot process stuck?

Of course, something similar might also be in the boot path.

However, I misunderstood this bug. I saw "livepatch" in the log dump, so I thought it was affecting only KLP. If normal boot is also broken, then let me debug it there.
https://bugzilla.suse.com/show_bug.cgi?id=1202727#c11

Petr Tesařík <ptesarik@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|CONFIRMED                   |IN_PROGRESS

--- Comment #11 from Petr Tesařík <ptesarik@suse.com> ---
I was able to reproduce a boot crash with kernel 6.0.0-rc6-1.g2132e28-default (from Kernel:HEAD), even when running under QEMU. However, as soon as I attached a debugger, the kernel booted fine.

So, I have configured QEMU to save a crash dump, but I am unable to process the dump file, because the current crash utility does not like the debuginfo (DWARF version too new). I'm afraid that fixing that would require an upgrade of the embedded gdb, which is a non-trivial task.

It's getting a bit frustrating at this point...
https://bugzilla.suse.com/show_bug.cgi?id=1202727#c12

--- Comment #12 from Petr Tesařík <ptesarik@suse.com> ---
Hm, the issue seems to be more complicated. I installed a new Tumbleweed QEMU VM in a SLES15 SP4 z/VM system. The 6.0.0-rc6 kernel boots fine there! For reference, this is qemu-6.2.0-150400.37.5.3.

That means I can currently reproduce the failure only in a fully emulated QEMU VM. Unless someone else can reproduce it in a native environment, I suspect this is a QEMU bug.
https://bugzilla.suse.com/show_bug.cgi?id=1202727#c13

--- Comment #13 from Petr Tesařík <ptesarik@suse.com> ---
I was able to attach gdb to a crashed fully emulated VM (accel=tcg) and read the kernel log. It looks a bit weird to me:

Linux version 6.0.0-rc6-1.g2132e28-default (geeko@buildhost) (gcc (SUSE Linux) 12.2.1 20220830 [revision e927d1cf141f221c5a32574bde0913307e140984], GNU ld (GNU Binutils; openSUSE:Factory:zSystems) 2.39.0.20220810-1) #1 SMP Sun Sep 18 20:58:57 UTC 2022 (2132e28)
setup: Linux is running under KVM in 64-bit mode
setup: The maximum memory size is 2048MB
setup: Relocating AMODE31 section of size 0x00003000
cpu: 1 configured CPUs, 0 standby CPUs
Write protected kernel read-only data: 16940k
Kernel stack overflow.
CPU: 0 PID: 0 Comm: swapper Not tainted 6.0.0-rc6-1.g2132e28-default #1 openSUSE Tumbleweed (unreleased) 23ff4c3ab0b429314c21910f4f8cb21b0ff3e788
Hardware name: QEMU 2964 QEMU (KVM/Linux)
Krnl PSW : 0400c00180000000 00000000006d020c (mpihelp_add_n+0xc/0x70)
           R:0 T:1 IO:0 EX:0 Key:0 M:0 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
Krnl GPRS: 0000000000000000 0000000001262a28 ffffffff00000005 ffffffff00000005
           0000000000004000 0000000000000000 00000000011abb08 0000000000000000
           0000000000000002 0000000000000001 0000000000004000 000000007fbea000
           0000000000000000 0000000000000000 00000000006d9668 0000000000000000
Krnl Code: 00000000006d01fe: 0707 bcr 0,%r7
           00000000006d0200: c00400000000 brcl 0,00000000006d0200
          #00000000006d0206: eb9bf0600024 stmg %r9,%r11,96(%r15)
          >00000000006d020c: 1355 lcr %r5,%r5
           00000000006d020e: b90400b2 lgr %r11,%r2
           00000000006d0212: b9140015 lgfr %r1,%r5
           00000000006d0216: 1305 lcr %r0,%r5
           00000000006d0218: b90400a1 lgr %r10,%r1
Call Trace:
 [<0000000000115824>] show_regs+0x54/0x90
 [<0000000000101120>] kernel_stack_overflow+0x40/0x60
 [<0000000000000200>] 0x200
Last Breaking-Event-Address:
 [<0000000000000000>] 0x0
Kernel panic - not syncing: Corrupt kernel stack, can't continue.
https://bugzilla.suse.com/show_bug.cgi?id=1202727#c14

--- Comment #14 from Petr Tesařík <ptesarik@suse.com> ---
Another update. My VM boots fine if the kernel is booted directly, using the -kernel and -initrd options to qemu-system-s390x. However, when I try to kexec the same kernel manually, that fails somewhere in the boot process (but not as early as with grub-emu).
https://bugzilla.suse.com/show_bug.cgi?id=1202727#c15

--- Comment #15 from Petr Tesařík <ptesarik@suse.com> ---
For reference, this is how the kexec attempt fails:

[ 9.625911][ T119] Unable to handle kernel pointer dereference in virtual kernel address space [ 9.626024][ T119] Failing address: 6f2abf8710e68000 TEID: 6f2abf8710e68803 [ 9.626075][ T119] Fault in home space mode while using kernel ASCE. [ 9.626294][ T119] AS:0000000001828007 R3:0000000000000024 [ 9.626617][ T119] Oops: 0038 ilc:3 [#1] SMP [ 9.626770][ T119] Modules linked in: [ 9.626968][ T119] CPU: 0 PID: 119 Comm: systemd-gpt-aut Not tainted 6.0.0-rc6-1.g2132e28-default #1 openSUSE Tumbleweed (unreleased) 23ff4c3ab0b429314c21910f4f8cb21b0ff3e788 [ 9.627130][ T119] Hardware name: QEMU 2964 QEMU (KVM/Linux) [ 9.627244][ T119] Krnl PSW : 0404e00180000000 000000000040e320 (__mod_memcg_lruvec_state+0x30/0xe0) [ 9.627583][ T119] R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3 [ 9.627805][ T119] Krnl GPRS: 0000000000000001 0000000000000130 0000000002ec5a40 0000000000000130 [ 9.627862][ T119] 0000000000000001 6f2abf8710e67235 0000000000000000 0000000008100073 [ 9.627913][ T119] 0700000002f3a200 000000000038d360 00000372000ba6c0 0000000000000001 [ 9.627963][ T119] 0000000002f3a200 0000000000000000 00000380001836e8 00000380001836b8 [ 9.628697][ T119] Krnl Code: 000000000040e312: b90400b4 lgr %r11,%r4 [ 9.628697][ T119] 000000000040e316: eb330003000d sllg %r3,%r3,3 [ 9.628697][ T119] #000000000040e31c: b9040013 lgr %r1,%r3 [ 9.628697][ T119] >000000000040e320: e31050100108 ag %r1,4112(%r5) [ 9.628697][ T119] 000000000040e326: e31003b80008 ag %r1,952 [ 9.628697][ T119] 000000000040e32c: e34010000008 ag %r4,0(%r1) [ 9.628697][ T119] 000000000040e332: e34010000024 stg %r4,0(%r1) [ 9.628697][ T119] 000000000040e338: e33020900008 ag %r3,144(%r2) [ 9.629286][ T119] Call Trace: [ 9.629710][ T119] [<000000000040e320>] __mod_memcg_lruvec_state+0x30/0xe0 [
9.629940][ T119] [<000000000040f4c2>] __mod_lruvec_page_state+0xa2/0xe0 [ 9.629997][ T119] [<000000000012fcaa>] page_table_alloc+0x16a/0x290 [ 9.630051][ T119] [<000000000038d360>] __do_fault+0x80/0xf0 [ 9.630099][ T119] [<000000000039369c>] __handle_mm_fault+0xcfc/0x1240 [ 9.630157][ T119] [<0000000000393cae>] handle_mm_fault+0xce/0x220 [ 9.630199][ T119] [<0000000000129508>] do_exception+0x1d8/0x4d0 [ 9.630549][ T119] [<0000000000129e2a>] do_dat_exception+0x2a/0x50 [ 9.630672][ T119] [<0000000000a5ab54>] __do_pgm_check+0xf4/0x1b0 [ 9.630805][ T119] [<0000000000a6b64c>] pgm_check_handler+0x11c/0x170 [ 9.630868][ T119] [<00000000006faee4>] __clear_user+0x24/0x70 [ 9.630953][ T119] ([<00000000004b85c8>] load_elf_binary+0xc18/0x1a10) [ 9.631022][ T119] [<00000000004341b0>] bprm_execve+0x2a0/0x640 [ 9.631074][ T119] [<000000000043474e>] do_execveat_common.isra.0+0x1fe/0x270 [ 9.631125][ T119] [<0000000000434a7e>] __s390x_sys_execve+0x5e/0x70 [ 9.631175][ T119] [<0000000000a5ade4>] __do_syscall+0x1d4/0x200 [ 9.631224][ T119] [<0000000000a6b4c2>] system_call+0x82/0xb0 [ 9.631303][ T119] Last Breaking-Event-Address: [ 9.631343][ T119] [<000000000040f40c>] __mod_lruvec_state+0x4c/0x60 [ 9.632144][ T119] ---[ end trace 0000000000000000 ]--- [ 9.736079][ T116] Unable to handle kernel pointer dereference in virtual kernel address space [ 9.736181][ T116] Failing address: 0000000000000000 TEID: 0000000000000483 [ 9.736218][ T116] Fault in home space mode while using kernel ASCE. 
[ 9.736296][ T116] AS:0000000001828007 R3:000000007ffec007 S:000000007fff1800 P:000000000000003d [ 9.736473][ T116] Oops: 0004 ilc:3 [#2] SMP [ 9.736539][ T116] Modules linked in: [ 9.736632][ T116] CPU: 0 PID: 116 Comm: dracut-rootfs-g Tainted: G D 6.0.0-rc6-1.g2132e28-default #1 openSUSE Tumbleweed (unreleased) 23ff4c3ab0b429314c21910f4f8cb21b0ff3e788 [ 9.736696][ T116] Hardware name: QEMU 2964 QEMU (KVM/Linux) [ 9.736726][ T116] Krnl PSW : 0404d00180000000 000000000040f4f2 (__mod_lruvec_page_state+0xd2/0xe0) [ 9.736809][ T116] R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 RI:0 EA:3 [ 9.736870][ T116] Krnl GPRS: ffffffffffffffff 00000000042d5b00 0000000000000000 0000000000000026 [ 9.736908][ T116] 0000000000000001 00000372000b9f40 00000000044b1e00 0000000008000075 [ 9.736945][ T116] 0700000008100073 00000000003939d8 00000372000b9f40 000000007fffe1c0 [ 9.736981][ T116] 0000000002d62200 0000000000000000 0000038000443c40 0000038000443c10 [ 9.737104][ T116] Krnl Code: 000000000040f4e2: c0f4fffb0927 brcl 15,0000000000370730 [ 9.737104][ T116] 000000000040f4e8: ec51ffff00d9 aghik %r5,%r1,-1 [ 9.737104][ T116] #000000000040f4ee: a7f4ffc6 brc 15,000000000040f47a [ 9.737104][ T116] >000000000040f4f2: e3b020880024 stg %r11,136(%r2) [ 9.737104][ T116] 000000000040f4f8: a7f4ffe2 brc 15,000000000040f4bc [ 9.737104][ T116] 000000000040f4fc: 0707 bcr 0,%r7 [ 9.737104][ T116] 000000000040f4fe: 0707 bcr 0,%r7 [ 9.737104][ T116] 000000000040f500: c00400349b40 brcl 0,0000000000aa2b80 [ 9.737371][ T116] Call Trace: [ 9.737422][ T116] [<000000000040f4f2>] __mod_lruvec_page_state+0xd2/0xe0 [ 9.737490][ T116] [<000000000012fcaa>] page_table_alloc+0x16a/0x290 [ 9.737542][ T116] [<00000000003939d8>] __handle_mm_fault+0x1038/0x1240 [ 9.737589][ T116] [<0000000000393cae>] handle_mm_fault+0xce/0x220 [ 9.737632][ T116] [<0000000000129508>] do_exception+0x1d8/0x4d0 [ 9.737676][ T116] [<0000000000129e2a>] do_dat_exception+0x2a/0x50 [ 9.737719][ T116] [<0000000000a5ab54>] 
__do_pgm_check+0xf4/0x1b0 [ 9.737768][ T116] [<0000000000a6b64c>] pgm_check_handler+0x11c/0x170 [ 9.737816][ T116] Last Breaking-Event-Address: [ 9.737878][ T116] [<000000000040f4b8>] __mod_lruvec_page_state+0x98/0xe0 [ 9.737930][ T116] ---[ end trace 0000000000000000 ]--- [ 9.829692][ T118] Unable to handle kernel pointer dereference in virtual kernel address space [ 9.829814][ T118] Failing address: 0000000000000000 TEID: 0000000000000483 [ 9.829861][ T118] Fault in home space mode while using kernel ASCE. [ 9.829993][ T118] AS:0000000001828007 R3:000000007ffec007 S:000000007fff1800 P:000000000000003d [ 9.830118][ T118] Oops: 0004 ilc:3 [#3] SMP [ 9.830214][ T118] Modules linked in: [ 9.830265][ T118] CPU: 0 PID: 118 Comm: systemd-fstab-g Tainted: G D 6.0.0-rc6-1.g2132e28-default #1 openSUSE Tumbleweed (unreleased) 23ff4c3ab0b429314c21910f4f8cb21b0ff3e788 [ 9.830340][ T118] Hardware name: QEMU 2964 QEMU (KVM/Linux) [ 9.830373][ T118] Krnl PSW : 0404d00180000000 000000000040f4f2 (__mod_lruvec_page_state+0xd2/0xe0) [ 9.830458][ T118] R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 RI:0 EA:3 [ 9.830521][ T118] Krnl GPRS: ffffffffffffffff 0000000002eb9e40 0000000000000000 0000000000000026 [ 9.830562][ T118] 0000000000000001 00000372000b9c40 00000000044b3700 0000000008100073 [ 9.830599][ T118] 070003ff861e0000 00000000003939d8 00000372000b9c40 000000007fffe1c0 [ 9.830635][ T118] 0000000002f38000 0000000000000000 0000038000133c40 0000038000133c10 [ 9.830723][ T118] Krnl Code: 000000000040f4e2: c0f4fffb0927 brcl 15,0000000000370730 [ 9.830723][ T118] 000000000040f4e8: ec51ffff00d9 aghik %r5,%r1,-1 [ 9.830723][ T118] #000000000040f4ee: a7f4ffc6 brc 15,000000000040f47a [ 9.830723][ T118] >000000000040f4f2: e3b020880024 stg %r11,136(%r2) [ 9.830723][ T118] 000000000040f4f8: a7f4ffe2 brc 15,000000000040f4bc [ 9.830723][ T118] 000000000040f4fc: 0707 bcr 0,%r7 [ 9.830723][ T118] 000000000040f4fe: 0707 bcr 0,%r7 [ 9.830723][ T118] 000000000040f500: c00400349b40 brcl 
0,0000000000aa2b80 [ 9.830999][ T118] Call Trace: [ 9.831035][ T118] [<000000000040f4f2>] __mod_lruvec_page_state+0xd2/0xe0 [ 9.831088][ T118] [<000000000012fcaa>] page_table_alloc+0x16a/0x290 [ 9.831136][ T118] [<00000000003939d8>] __handle_mm_fault+0x1038/0x1240 [ 9.831194][ T118] [<0000000000393cae>] handle_mm_fault+0xce/0x220 [ 9.831235][ T118] [<0000000000129508>] do_exception+0x1d8/0x4d0 [ 9.831276][ T118] [<0000000000129e2a>] do_dat_exception+0x2a/0x50 [ 9.831317][ T118] [<0000000000a5ab54>] __do_pgm_check+0xf4/0x1b0 [ 9.831362][ T118] [<0000000000a6b64c>] pgm_check_handler+0x11c/0x170 [ 9.831406][ T118] Last Breaking-Event-Address: [ 9.831431][ T118] [<000000000040f4b8>] __mod_lruvec_page_state+0x98/0xe0 [ 9.831757][ T118] ---[ end trace 0000000000000000 ]--- [ 99.908357][ T1] Unable to handle kernel pointer dereference in virtual kernel address space [ 99.908447][ T1] Failing address: 0000000000000000 TEID: 0000000000000483 [ 99.908488][ T1] Fault in home space mode while using kernel ASCE. 
[ 99.908682][ T1] AS:0000000001828007 R3:000000007ffec007 S:000000007fff1800 P:000000000000003d [ 99.909027][ T1] Oops: 0004 ilc:3 [#4] SMP [ 99.909229][ T1] Modules linked in: [ 99.909316][ T1] CPU: 0 PID: 1 Comm: systemd Tainted: G D 6.0.0-rc6-1.g2132e28-default #1 openSUSE Tumbleweed (unreleased) 23ff4c3ab0b429314c21910f4f8cb21b0ff3e788 [ 99.909411][ T1] Hardware name: QEMU 2964 QEMU (KVM/Linux) [ 99.909464][ T1] Krnl PSW : 0404d00180000000 000000000040f4f2 (__mod_lruvec_page_state+0xd2/0xe0) [ 99.909662][ T1] R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 RI:0 EA:3 [ 99.909749][ T1] Krnl GPRS: 0000000000000001 0000000003b7ba00 0000000000000020 0000000000000026 [ 99.909790][ T1] 0000000000000001 00000372000ba240 00000000044b0f00 0000000008100073 [ 99.909851][ T1] 0700000008100073 00000000003938f2 00000372000ba240 000000007fffe1c0 [ 99.909894][ T1] 0000000003a6c400 0000000000000000 000003800000bc40 000003800000bc10 [ 99.909985][ T1] Krnl Code: 000000000040f4e2: c0f4fffb0927 brcl 15,0000000000370730 [ 99.909985][ T1] 000000000040f4e8: ec51ffff00d9 aghik %r5,%r1,-1 [ 99.909985][ T1] #000000000040f4ee: a7f4ffc6 brc 15,000000000040f47a [ 99.909985][ T1] >000000000040f4f2: e3b020880024 stg %r11,136(%r2) [ 99.909985][ T1] 000000000040f4f8: a7f4ffe2 brc 15,000000000040f4bc [ 99.909985][ T1] 000000000040f4fc: 0707 bcr 0,%r7 [ 99.909985][ T1] 000000000040f4fe: 0707 bcr 0,%r7 [ 99.909985][ T1] 000000000040f500: c00400349b40 brcl 0,0000000000aa2b80 [ 99.910420][ T1] Call Trace: [ 99.910467][ T1] [<000000000040f4f2>] __mod_lruvec_page_state+0xd2/0xe0 [ 99.910551][ T1] [<000000000012fcaa>] page_table_alloc+0x16a/0x290 [ 99.910607][ T1] [<00000000003938f2>] __handle_mm_fault+0xf52/0x1240 [ 99.910658][ T1] [<0000000000393cae>] handle_mm_fault+0xce/0x220 [ 99.910720][ T1] [<0000000000129508>] do_exception+0x1d8/0x4d0 [ 99.910827][ T1] [<0000000000129e2a>] do_dat_exception+0x2a/0x50 [ 99.910873][ T1] [<0000000000a5ab54>] __do_pgm_check+0xf4/0x1b0 [ 99.910922][ T1] 
[<0000000000a6b64c>] pgm_check_handler+0x11c/0x170 [ 99.910969][ T1] Last Breaking-Event-Address: [ 99.910995][ T1] [<000000000040f4b8>] __mod_lruvec_page_state+0x98/0xe0 [ 99.911607][ T1] ---[ end trace 0000000000000000 ]--- [ 99.912219][ T1] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
https://bugzilla.suse.com/show_bug.cgi?id=1202727#c16

--- Comment #16 from Petr Tesařík <ptesarik@suse.com> ---
So, to recap what I have tested:

z/VM, boot by kexec:                 WORKS
QEMU accel=kvm, direct boot:         WORKS
QEMU accel=kvm, boot by kexec:       WORKS
QEMU accel=kvm, boot by grub-emu:    WORKS
QEMU accel=tcg, direct boot:         WORKS
QEMU accel=tcg, boot by kexec:       FAILS
QEMU accel=tcg, boot by grub-emu:    FAILS

Given that grub-emu uses kexec to boot the kernel, it's at least somewhat consistent...
https://bugzilla.suse.com/show_bug.cgi?id=1202727#c17

--- Comment #17 from Petr Tesařík <ptesarik@suse.com> ---
I have also rebuilt the kernel without the offending commit, but I was not able to boot it with kexec. FTR I was not able to kexec a rebuild of the kernel with the commit included either, so I'm probably building it incorrectly.
https://bugzilla.suse.com/show_bug.cgi?id=1202727#c18

--- Comment #18 from Miroslav Beneš <mbenes@suse.com> ---
There is commit e3c11025bcd2 ("s390: avoid using global register for current_stack_pointer") now.
https://bugzilla.suse.com/show_bug.cgi?id=1202727#c19

--- Comment #19 from Petr Tesařík <ptesarik@suse.com> ---
I believe a GCC bug can explain the weird nature of this bug, but not the failure to boot Tumbleweed under QEMU in TCG mode. AFAICS Tumbleweed kernels are built with gcc-12.2, which includes the fix from gcc commit 3ad7fed1cc87.
https://bugzilla.suse.com/show_bug.cgi?id=1202727#c20

--- Comment #20 from Marcos de Souza <mpdesouza@suse.com> ---
(In reply to Miroslav Beneš from comment #18)
> There is commit e3c11025bcd2 ("s390: avoid using global register for current_stack_pointer") now.

When building with this commit applied, it works. I can boot the built kernel.

My env: SLE15-SP1 gcc 7.5

I can successfully boot kernel v6.1-rc6, which has the commit applied.
https://bugzilla.suse.com/show_bug.cgi?id=1202727#c21

--- Comment #21 from Petr Tesařík <ptesarik@suse.com> ---
(In reply to Marcos de Souza from comment #20)
> (In reply to Miroslav Beneš from comment #18)
> > There is commit e3c11025bcd2 ("s390: avoid using global register for current_stack_pointer") now.
>
> When building with this commit applied, it works. I can boot the built kernel.
>
> My env: SLE15-SP1 gcc 7.5
>
> I can sucessfully boot kernel v6.1-rc6, which has the commit applied.

This is expected, because it effectively reverts the commit that you bisected earlier. However, the root cause must be different from the explanation given in the upstream commit:

    Unfortunately on s390 it uncovers old gcc bug which is fixed only since gcc-9.1 [gcc commit 3ad7fed1cc87].
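
The version reasoning in comments #19 and #21 can be written down as a small check. This is only a sketch, assuming the fix is present from gcc 9.1 onward as the quoted upstream commit message states; gcc_predates_9_1 is a hypothetical helper, not something from the kernel or from gcc.

```c
/*
 * Hypothetical helper: returns 1 if the given gcc major.minor version
 * is older than 9.1 (the first release said to carry the fix from gcc
 * commit 3ad7fed1cc87), and 0 otherwise.
 */
static int gcc_predates_9_1(int major, int minor)
{
	return major < 9 || (major == 9 && minor < 1);
}
```

Under that assumption, the gcc 7.5 build from SLE15-SP1 falls on the affected side, while the gcc 12.2 used for Tumbleweed kernels does not, which is why the gcc bug alone does not account for the Tumbleweed failure under TCG.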
https://bugzilla.suse.com/show_bug.cgi?id=1202727

Ihno Krumreich <ihno@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P5 - None                   |P2 - High
                 CC|                            |ihno@suse.com
https://bugzilla.suse.com/show_bug.cgi?id=1202727

Ihno Krumreich <ihno@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jkosina@suse.com
              Flags|                            |needinfo?(jkosina@suse.com)
https://bugzilla.suse.com/show_bug.cgi?id=1202727

Ihno Krumreich <ihno@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|ptesarik@suse.com           |