Bug ID | 1172541 |
---|---|
Summary | Mysterious crashes with kernel 5.6.14 |
Classification | openSUSE |
Product | openSUSE Tumbleweed |
Version | Current |
Hardware | x86-64 |
OS | openSUSE Factory |
Status | NEW |
Severity | Normal |
Priority | P5 - None |
Component | Kernel |
Assignee | kernel-bugs@opensuse.org |
Reporter | jimc@jfcarter.net |
QA Contact | qa-bugs@suse.de |
Found By | --- |
Blocker | --- |
Version: kernel-default-5.6.14-1.1.x86_64 and aarch64 for Tumbleweed

I did a dist-upgrade on 2020-06-02. After I waited for download.opensuse.org to recover from its illness, the upgrade went smoothly. But after I rebooted all the machines, 5 to 10 minutes later two of them went catatonic, and a third (my laptop) froze up a few minutes after being booted up two days later. After I power cycled the gateway machine, it again froze about five minutes after booting. On those hosts I reverted (by the grub menu) to kernel-default-5.6.12-1.3.x86_64 and aarch64. There were no further catatonic incidents. The other hosts continued (on 5.6.14) for up to 36 hours and did not go catatonic (but I eventually reverted them also).

Here are the symptoms: On two machines I was working on the console (XFCE) when it happened. It seemed perfectly normal, but suddenly keystrokes had no effect and continuously updating apps (load meter etc.) ceased updating. On the gateway, network thru traffic was no longer forwarded. After I rebooted it I checked syslog (debug level). After the last normal message (e.g. DHCP lease renewed) there was a burst of \0's, and the dmesg dump appeared immediately after, starting with "microcode updated early to revision..." and "Jun 4 08:47:03 jacinth kernel: [ 0.000000] Linux version 5.6.12-1-default (geeko@buildhost) (gcc version 9.3.1 20200406 [revision 6db837a5288ee3ca5ec504fbd5a765817e556ac2] (SUSE Linux)) #1 SMP Tue May 12 17:44:12 UTC 2020 (9bff61b)", i.e. the very first expected messages from dmesg. (I carefully booted it into 5.6.12 not 5.6.14.)

I have not been able to find any useful symptoms that might shed light on what is killing the machines. Machines running 5.6.12 did not crash, both before the dist-upgrade and after I reverted.

For what it's worth, here is some data about the various machines.

These went catatonic:

jacinth
    Gateway, directory server, never sleeps. Intel NUC6CAYH, Celeron(R) CPU J3455 @ 1.50GHz, Intel HD Graphics 500 (i915). It runs hostapd as a wireless access point, driver 8812au from rtl8812au-kmp-default-5.6.4.2+git20200318.49e98ff_k5.6.12_1-1.7.x86_64
xena
    Laptop, wireless, powered off when the human is sleeping. Acer Aspire A515-54-51D1, Intel(R) Core(TM) i5-8265U CPU @ 1.60GHz, Intel UHD Graphics 620 (Whiskey Lake) (i915)
diamond
    Normal desktop. When it crashed nobody was using it. At night it hibernates. Intel NUC7i5BNH, Core(TM) i5-7260U CPU @ 2.20GHz, Intel Iris Plus Graphics 640 (i915)

These survived up to 36 hours on 5.6.14 (called "non-catatonic" below):

claude
    Webserver (1.5 hits/min). VM (KVM) on Jacinth, x86_64, video=cirrus
holly
    Desktop replacement. Raspberry Pi-3B, Broadcom BCM2837 @1.2GHz and VideoCore IV (vc4). aarch64. RPi's can't sleep.
iris
    Audio-video player (17kbyte/sec), hibernates when unused. Intel NUC6CAYH, Celeron(R) CPU J3455 @ 1.50GHz (like Jacinth), Intel HD Graphics 500 (i915)
oso
    Development VM (KVM) on Diamond, x86_64, video=cirrus
petra
    Development VM (KVM) on Xena, x86_64, video=cirrus
surya
    Cloud server, VM (KVM) at Linode, never sleeps :-), Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz, no GPU at all.

Thinking that X-Windows activity might be correlated with catatonia, for about 6 hours I set the non-catatonic machines (except Surya, has no console) to show screensaver eye candy continuously. I was wrong; none of them crashed.

Iris hardware is identical to Jacinth, yet it did not go catatonic while Jacinth failed more times than any other host.

I don't really know what effective action could be taken, beyond trying to guess which kernel patch may have been the culprit. Anyway, thank you for whatever you can do with this info.
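
In case it helps anyone check their own logs for the same signature (a burst of \0's after the last normal message, followed by the first boot messages), here is a minimal Python sketch. The function name find_nul_bursts and the min_run threshold are hypothetical choices for illustration, and the sample log text is made up rather than copied from the affected hosts.

```python
import re


def find_nul_bursts(data: bytes, min_run: int = 16):
    """Find runs of NUL bytes in raw syslog contents.

    Returns a list of (offset, length, line_before, line_after) tuples.
    min_run is an arbitrary threshold (an assumption, not a known value)
    used to ignore stray single NUL bytes.
    """
    pattern = re.compile(rb"\x00{%d,}" % min_run)
    bursts = []
    for m in pattern.finditer(data):
        # Last text line written before the NUL run started.
        before = data[:m.start()].rstrip(b"\n").rsplit(b"\n", 1)[-1]
        # First text line written after the NUL run ended.
        after = data[m.end():].split(b"\n", 1)[0]
        bursts.append((
            m.start(),
            m.end() - m.start(),
            before.decode("utf-8", "replace"),
            after.decode("utf-8", "replace"),
        ))
    return bursts


def scan_file(path: str, min_run: int = 16):
    """Read a log file as bytes and report NUL bursts in it."""
    with open(path, "rb") as f:
        return find_nul_bursts(f.read(), min_run)


# Invented sample data, only to show the shape of the result.
sample = (
    b"Jun  4 08:40:00 jacinth dhcpd: lease renewed\n"
    + b"\x00" * 32
    + b"Jun  4 08:47:03 jacinth kernel: [    0.000000] Linux version 5.6.12-1-default\n"
)

for offset, length, before, after in find_nul_bursts(sample):
    print(f"NUL burst at byte {offset}, {length} bytes long")
    print(f"  last line before: {before}")
    print(f"  first line after: {after}")
```

To use it on a real log, call scan_file() with the path of the syslog file on the affected host. The timestamp on the "last line before" gives an upper bound on how long the machine ran before it froze.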