http://bugzilla.opensuse.org/show_bug.cgi?id=1202138

Bug ID: 1202138
Summary: PowerPC machine crashes frequently after upgrading to Leap 15.4
Classification: openSUSE
Product: openSUSE Distribution
Version: Leap 15.4
Hardware: PowerPC-64
OS: Other
Status: NEW
Severity: Normal
Priority: P5 - None
Component: Kernel
Assignee: kernel-bugs@opensuse.org
Reporter: marius.kittler@suse.com
QA Contact: qa-bugs@suse.de
Found By: ---
Blocker: ---

After upgrading a PowerPC machine (the openQA worker qa-power8-4-kvm.qa.suse.de) from Leap 15.3 to Leap 15.4, that machine crashes frequently. It usually does not stay up for more than a few hours. Downgrading the machine (by rolling back to the last BTRFS snapshot with Leap 15.3) lets it run stably again.

Note that there is a second machine (the openQA worker qa-power8-5-kvm.qa.suse.de) that we managed to operate without crashes on Leap 15.4. Both machines seem very similar to me, so I'm not sure why one is crashing and the other is not.

Note that initially neither machine booted on Leap 15.4. The boot seemed to get stuck at some point, and the kernel quite frequently logged messages like

```
[ 197.877239][ C62] watchdog: BUG: soft lockup - CPU#62 stuck for 25s! [swapper/62:0]
```

So I added the kernel parameter `nmi_watchdog=0`. With this parameter both machines could boot on Leap 15.4 (but, as mentioned, only qa-power8-5 runs stably without crashing). The kernel command line used is therefore:

```
root=UUID=89ca2dff-86af-478b-8d4c-2a45ca689fd5 nospec kvm.nested=1 kvm_intel.nested=1 kvm_amd.nested=1 kvm-arm.nested=1 crashkernel=210M nmi_watchdog=0
```

I'm not sure what other details could be relevant. Unfortunately, the journal does not contain any interesting messages right before the crash.
Via SOL I was once able to see a kernel panic being logged:

```
QA-Power8-4-kvm login: [ 365.807470][ T3923] EXT4-fs error (device sdb1) in ext4_free_inode:362: Corrupt filesystem
[ 438.050890][ T94] Kernel panic - not syncing: corrupted stack end detected inside scheduler
[ 438.051046][ T94] CPU: 16 PID: 94 Comm: ksof
```

(The filesystem error is likely just a symptom of the crashes.)

Any advice on what I could try? Maybe another kernel parameter? Maybe booting Leap 15.4 but with the kernel version from 15.3 (not sure how I'd do that, though - so any advice would be welcome if that idea sounds helpful)?

By the way, this is `/proc/cpuinfo` on the problematic machine:

```
root=UUID=eebe647f-e867-416e-a0fa-7a6732bfcf9d nospec kvm.nested=1 kvm_intel.nested=1 kvm_amd.nested=1 kvm-arm.nested=1 crashkernel=210M
martchus@QA-Power8-4-kvm:~> cat /proc/cpuinfo
processor : 0
cpu : POWER8, altivec supported
clock : 3857.000000MHz
revision : 2.0 (pvr 004d 0200)
[… repeated 7 more times with processor 8, 16, 24, 32, 40, 48 and 56]
timebase : 512000000
platform : PowerNV
model : 8348-21C
machine : PowerNV 8348-21C
firmware : OPAL
MMU : Hash
```

On the stable machine it looks very similar, but it is actually a model with more processors:

```
processor : 0
cpu : POWER8 (raw), altivec supported
clock : 3857.000000MHz
revision : 2.0 (pvr 004d 0200)
[… repeated 13 more times with processor 8, 16, 24, …]
timebase : 512000000
platform : PowerNV
model : 8335-GCA
machine : PowerNV 8335-GCA
firmware : OPAL
MMU : Hash
```

This is the relevant ticket on the openQA infrastructure tracker (for additional context): https://progress.opensuse.org/issues/114565

--
You are receiving this mail because:
You are the assignee for the bug.