[Bug 1174737] New: kernel-default-5.3.18-lp152.33.1.x86_64 hard lockup
http://bugzilla.opensuse.org/show_bug.cgi?id=1174737

            Bug ID: 1174737
           Summary: kernel-default-5.3.18-lp152.33.1.x86_64 hard lockup
    Classification: openSUSE
           Product: openSUSE Distribution
           Version: Leap 15.2
          Hardware: Other
                OS: Other
            Status: NEW
          Severity: Normal
          Priority: P5 - None
         Component: Kernel
          Assignee: kernel-bugs@opensuse.org
          Reporter: novell@tower-net.de
        QA Contact: qa-bugs@suse.de
          Found By: ---
           Blocker: ---

Created attachment 840209
  --> http://bugzilla.opensuse.org/attachment.cgi?id=840209&action=edit
boot-no-kdump.log

The latest kernel update, kernel-default-5.3.18-lp152.33.1.x86_64 (33.1), produces hard lockups after anywhere from a few minutes up to an hour of work. I can't see any error messages in the logs. Also, when I boot the kernel with kdump enabled, it no longer crashes.

The older kernel package "kernel-default-5.3.18-lp152.26.2.x86_64" (26.2) works without problems (also without kdump enabled).

I'm attaching two boot journal logs: one with kdump enabled, the other with kdump disabled, which locked up.

If I can provide any more information, please ask.

Thanks
Markus

--
You are receiving this mail because:
You are on the CC list for the bug.
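For readers comparing the two packages named in the report, here is a minimal sketch of splitting such a package name into its parts and ordering the release numbers. The regular expression and the helper names (`parse_kernel_package`, `release_key`) are hypothetical and derived only from the two names quoted above; this is not the ordering logic used by rpm or zypper.

```python
import re
from typing import NamedTuple, Tuple


class KernelPackage(NamedTuple):
    flavor: str   # e.g. "default"
    version: str  # upstream kernel version, e.g. "5.3.18"
    release: str  # distribution release, e.g. "lp152.33.1"
    arch: str     # e.g. "x86_64"


# Assumption: names look like kernel-<flavor>-<version>-<release>.<arch>,
# as in the two package names quoted in the report.
_PKG_RE = re.compile(
    r"^kernel-(?P<flavor>[a-z]+)-(?P<version>\d+(?:\.\d+)*)"
    r"-(?P<release>[^-]+)\.(?P<arch>[^.]+)$"
)


def parse_kernel_package(name: str) -> KernelPackage:
    """Split a kernel package name into flavor, version, release and arch."""
    m = _PKG_RE.match(name)
    if m is None:
        raise ValueError(f"unrecognized kernel package name: {name!r}")
    return KernelPackage(**m.groupdict())


def release_key(release: str) -> Tuple[int, ...]:
    """Numeric parts after the 'lp152'-style prefix, usable for ordering."""
    return tuple(int(p) for p in release.split(".")[1:] if p.isdigit())


affected = parse_kernel_package("kernel-default-5.3.18-lp152.33.1.x86_64")
working = parse_kernel_package("kernel-default-5.3.18-lp152.26.2.x86_64")
print(affected.release, release_key(affected.release))
print(working.release, release_key(working.release))
```

Both packages share the upstream version 5.3.18, so only the release part distinguishes the kernel that locks up from the one that works.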
http://bugzilla.opensuse.org/show_bug.cgi?id=1174737#c1

--- Comment #1 from Markus Kolb <novell@tower-net.de> ---
Created attachment 840210
  --> http://bugzilla.opensuse.org/attachment.cgi?id=840210&action=edit
boot-kdump.log
http://bugzilla.opensuse.org/show_bug.cgi?id=1174737#c2

Neil Rickert <nwr10cst-oslnx@yahoo.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |nwr10cst-oslnx@yahoo.com

--- Comment #2 from Neil Rickert <nwr10cst-oslnx@yahoo.com> ---
I had a system crash yesterday, as described here:
https://lists.opensuse.org/opensuse/2020-07/msg00616.html

I'm not finding any relevant logs. I suspect that it might be the same problem, but I cannot be sure.
http://bugzilla.opensuse.org/show_bug.cgi?id=1174737#c7

--- Comment #7 from Markus Kolb <novell@tower-net.de> ---
I think it is fixed in this version:
https://download.opensuse.org/repositories/Kernel:/openSUSE-15.2/standard/x8...

So far it hasn't crashed. When will there be an officially released version?
http://bugzilla.opensuse.org/show_bug.cgi?id=1174737#c8

--- Comment #8 from Markus Kolb <novell@tower-net.de> ---
I cheered too soon... It has happened once again, although with the 33.1 version it happened quite often.
I'll try to collect a dump, some console output, or anything else useful in the next few days...
http://bugzilla.opensuse.org/show_bug.cgi?id=1174737#c9

--- Comment #9 from Neil Rickert <nwr10cst-oslnx@yahoo.com> ---
I had another system lockup/crash today.

Once again, I was running a virtual machine (with Tumbleweed). Everything froze, including my main desktop (the host system on which the VM was running).

On reboot, I went with kernel 5.3.18-lp152.26-default so that I can try to avoid a repeat.
http://bugzilla.opensuse.org/show_bug.cgi?id=1174737#c10

--- Comment #10 from Markus Kolb <novell@tower-net.de> ---
Created attachment 840298
  --> http://bugzilla.opensuse.org/attachment.cgi?id=840298&action=edit
dmesg.txt

Hi,

I've got an oops in the dmesg.txt and a 130 MB vmcore, together with the vmlinux-5.3.18-lp152.93.gdeee35b-default.gz.

Should I upload it somewhere?
http://bugzilla.opensuse.org/show_bug.cgi?id=1174737#c11

--- Comment #11 from Markus Kolb <novell@tower-net.de> ---
(In reply to Neil Rickert from comment #9)
> I had another system lockup/crash today.

Hey Neil, do you also have Intel graphics based on the i915 driver?
http://bugzilla.opensuse.org/show_bug.cgi?id=1174737#c12

--- Comment #12 from Neil Rickert <nwr10cst-oslnx@yahoo.com> ---
> do you also have Intel graphics based on the i915 driver?

Yes, I do.
http://bugzilla.opensuse.org/show_bug.cgi?id=1174737#c13

--- Comment #13 from Markus Kolb <novell@tower-net.de> ---
Created attachment 840310
  --> http://bugzilla.opensuse.org/attachment.cgi?id=840310&action=edit
dmesg.txt

The next dmesg.txt from a new kdump...
http://bugzilla.opensuse.org/show_bug.cgi?id=1174737#c18

--- Comment #18 from Markus Kolb <novell@tower-net.de> ---
Created attachment 840387
  --> http://bugzilla.opensuse.org/attachment.cgi?id=840387&action=edit
lspci-v.txt

(In reply to Thomas Zimmermann from comment #17)
[...]
> When you go for a coffee, do you suspend the screen? Is the compile job still visible?

Yes, right, I forgot about that: the screen blanks and sleeps after 10 min, and the display switches off after 30 min. Brightness reduction is also enabled.
I'm trying to reproduce a crash with this switched off.

> > Here it happened always during typing or clicking in VS Code, Firefox or Terminals, but that's over 90% of the use-case.
>
> The stack traces point to the graphics driver. I can only guess that typing increases the number of pageflips, which then leads to the error.

OK. Sounds reasonable.

> Could you post the output of 'sudo lspci -v', please? Maybe it's related to a certain HW generation.

Attached...
http://bugzilla.opensuse.org/show_bug.cgi?id=1174737#c21

--- Comment #21 from Markus Kolb <novell@tower-net.de> ---
Yesterday I used the 33.1 version (which seems to be the most crash-prone) for the whole day with display power management disabled and no lid close (which would trigger S3 mode).
At the end I switched power management on and waited for the display to turn off, then used the system for maybe 30 minutes without a crash.
Then I closed the lid for S3, opened it again to wake up, and could work for only 5-10 minutes before a crash occurred when I pressed the Enter key in VS Code.

I've tried to find out whether there is a concrete relation to S3 mode, but it is not reproducible with S3 mode alone. So a fresh boot followed by going to S3 and waking up is no guarantee of a crash within a short time.

Today I'll try working with display power management enabled but without closing the lid, so no S3.
I'll let you know whether or not it crashes in this situation.

Afterwards I'll install Takashi's development kernel.
http://bugzilla.opensuse.org/show_bug.cgi?id=1174737#c22

--- Comment #22 from Markus Kolb <novell@tower-net.de> ---
The crash problem still exists in Takashi's development kernel.
http://bugzilla.opensuse.org/show_bug.cgi?id=1174737#c23

--- Comment #23 from Takashi Iwai <tiwai@suse.com> ---
(In reply to Markus Kolb from comment #22)
> The crash problem still exists in Takashi's development kernel.

OK, thanks.

Have you tested the recent upstream kernel (e.g. 5.7.x) at all? It's found in the OBS Kernel:stable repo.
  http://download.opensuse.org/repositories/Kernel:/stable/standard/

If that repo has already moved to 5.8.x, the latest 5.7.x kernel is found in my kernel archive repo, OBS home:tiwai:kernel:5.7,
  http://download.opensuse.org/repositories/home:/tiwai:/kernel:/5.7/standard/

Ubuntu's bug report mentioned that the bug disappeared after 5.7.x, so this should be confirmed on our side, too.
http://bugzilla.opensuse.org/show_bug.cgi?id=1174737#c24

--- Comment #24 from Markus Kolb <novell@tower-net.de> ---
(In reply to Takashi Iwai from comment #23)
> (In reply to Markus Kolb from comment #22)
> > The crash problem still exists in Takashi's development kernel.
>
> OK, thanks.
>
> Have you tested the recent upstream kernel (e.g. 5.7.x) at all? It's found in the OBS Kernel:stable repo.
>   http://download.opensuse.org/repositories/Kernel:/stable/standard/
>
> If that repo has already moved to 5.8.x, the latest 5.7.x kernel is found in my kernel archive repo, OBS home:tiwai:kernel:5.7,
>   http://download.opensuse.org/repositories/home:/tiwai:/kernel:/5.7/standard/
>
> Ubuntu's bug report mentioned that the bug disappeared after 5.7.x, so this should be confirmed on our side, too.

In 5.8 from Kernel:/stable it doesn't crash. (As a side note, the virtualbox kernel modules don't compile against it. They work only with the virtualbox development branch at the moment.)

And I'm pretty sure now that it is related to power management!
It doesn't crash in 5.3.18-33.1 if there was no S3 suspend. It sometimes takes a very long time after wake-up before the crash occurs, but a prior S3 suspend seems to be a requirement for the bug.
The display power management settings don't make any difference.
I didn't check any other suspend modes.
I must have remembered it wrong in #c16.

Now I'll also check your 5.7...
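To check the "no crash without a prior S3 suspend" observation against a saved kernel log, the following is a minimal sketch that counts completed S3 suspend/resume cycles in a boot's log. It assumes the kernel prints "PM: suspend entry (deep)" when entering suspend-to-RAM and "PM: suspend exit" on wake-up; those strings, the sample lines, and the helper name `count_s3_cycles` are assumptions for illustration and should be verified against your own journal.

```python
import re
from typing import Iterable

# Assumption: these are the kernel's suspend markers; "deep" denotes S3
# (suspend-to-RAM). Check your own log before relying on them.
SUSPEND_ENTRY = re.compile(r"PM: suspend entry \((?P<state>[^)]+)\)")
SUSPEND_EXIT = re.compile(r"PM: suspend exit")


def count_s3_cycles(lines: Iterable[str]) -> int:
    """Count suspend entries in 'deep' state that are followed by an exit."""
    cycles = 0
    in_deep = False
    for line in lines:
        entry = SUSPEND_ENTRY.search(line)
        if entry:
            # Remember whether the suspend being entered is S3.
            in_deep = entry.group("state") == "deep"
        elif in_deep and SUSPEND_EXIT.search(line):
            cycles += 1
            in_deep = False
    return cycles


# Hypothetical sample lines, not taken from the attached logs.
sample = [
    "[  100.000000] PM: suspend entry (deep)",
    "[  100.500000] PM: suspend exit",
    "[  200.000000] PM: suspend entry (s2idle)",
    "[  200.500000] PM: suspend exit",
]
print(count_s3_cycles(sample))
```

A boot log with a count of zero that still ends in a lockup would contradict the S3 hypothesis; a count above zero is consistent with it but does not prove it.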
http://bugzilla.opensuse.org/show_bug.cgi?id=1174737#c25

--- Comment #25 from Markus Kolb <novell@tower-net.de> ---
Created attachment 840475
  --> http://bugzilla.opensuse.org/attachment.cgi?id=840475&action=edit
bootlog-5.7.log

In your 5.7.12-1.g9c98feb I always get a

i915 0000:00:02.0: GPU HANG: ecode 8:1:85dffffb, in X [1282]
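When several boot logs need to be compared, a hang line like the one above can be picked out mechanically. This is a minimal sketch, assuming other hang messages follow the same layout as the single line quoted in this comment; the pattern and the names `GpuHang` and `parse_gpu_hang` are hypothetical and not part of the i915 driver or any tool.

```python
import re
from typing import NamedTuple, Optional


class GpuHang(NamedTuple):
    device: str   # PCI address, e.g. "0000:00:02.0"
    ecode: str    # error code as printed, e.g. "8:1:85dffffb"
    process: str  # process name reported by the driver, e.g. "X"
    pid: int      # process ID in brackets, e.g. 1282


# Assumption: the layout matches the one line quoted in the report.
GPU_HANG_RE = re.compile(
    r"i915 (?P<device>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): GPU HANG: "
    r"ecode (?P<ecode>[0-9a-f:]+), in (?P<process>\S+) \[(?P<pid>\d+)\]"
)


def parse_gpu_hang(line: str) -> Optional[GpuHang]:
    """Return the fields of an i915 GPU HANG line, or None if it is not one."""
    m = GPU_HANG_RE.search(line)
    if m is None:
        return None
    return GpuHang(m["device"], m["ecode"], m["process"], int(m["pid"]))


hang = parse_gpu_hang(
    "i915 0000:00:02.0: GPU HANG: ecode 8:1:85dffffb, in X [1282]"
)
print(hang)
```

Grouping results by `ecode` across boots would show whether every hang on 5.7.12 reports the same error code.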
http://bugzilla.opensuse.org/show_bug.cgi?id=1174737#c26

--- Comment #26 from Markus Kolb <novell@tower-net.de> ---
Created attachment 840476
  --> http://bugzilla.opensuse.org/attachment.cgi?id=840476&action=edit
sys-class-drm-card0-error
http://bugzilla.opensuse.org/show_bug.cgi?id=1174737#c28

--- Comment #28 from Markus Kolb <novell@tower-net.de> ---
Created attachment 840503
  --> http://bugzilla.opensuse.org/attachment.cgi?id=840503&action=edit
Xorg.0.log.old

intel(0): Using Kernel Mode Setting driver: i915, version 1.6.0 20200313

I have no manual config for Xorg.
http://bugzilla.opensuse.org/show_bug.cgi?id=1174737#c30

--- Comment #30 from Markus Kolb <novell@tower-net.de> ---
Hey Takashi,

;-)

uname -a ; uptime
Linux linux-zbgf 5.3.18-lp152.1.g1b51c72-default #1 SMP Mon Aug 10 17:46:58 UTC 2020 (1b51c72) x86_64 x86_64 x86_64 GNU/Linux
 15:30:20 up 21:02,  1 user,  load average: 0.14, 0.23, 0.51

I think that's good work.
http://bugzilla.opensuse.org/show_bug.cgi?id=1174737#c31

--- Comment #31 from Takashi Iwai <tiwai@suse.com> ---
Good to hear; at least it seems to improve things.

I have now rebased onto the latest SLE15-SP2 and pushed for merge. It'll be merged once the internal build test finishes.