Comment # 2 on bug 1172541 from
I installed kernel-default-5.7.0-3.1.gad96a07.x86_64 from OBS 
Kernel:stable.  Unfortunately the most revealing machines could not use
it: Xena uses EFI boot and can't be switched to "legacy", and the
kernel's signature is not acceptable to the EFI checker.  Jacinth is the Wi-Fi
access point and needs rtl8812au-kmp-default-5.6.4.2+$git.k$version, which
is not available for kernel 5.7.0.  Anyway the other machines have been
running about 24 hours with no catatonia.  I'll check back when the
production version is out.  

I was reading changelogs at 
https://cdn.kernel.org/pub/linux/kernel/v5.x/ChangeLog-$version.
5.6.13 and 5.6.16 had nothing relevant; 5.6.15 had a fix for i915 that
also seemed irrelevant; 5.6.14 had a hit:
Commit f18d3beac6bc2e1bddfba202f5327200acbda54c by Chris Wilson.
This is a fix (possibly gone awry) for the following issue and 2 others:
https://gitlab.freedesktop.org/drm/intel/-/issues/1746 opened by Arnout
Engelen ("1 month ago", so about early 2020-05), titled
"i915_gem_evict_something stalls".  He has kernel 5.7.0-rc1 and gets
what sounds a lot like my problem, including no useful log output.  He
and Chris Wilson worked out which kconfig items to turn on (requiring a
kernel rebuild), and he got 400 MB of log files which he gzipped and
posted :-)  He cross-references two seemingly relevant bugs from kernels
5.5.2 and 5.3.8.  I never saw the actual changelog for 5.7.0-rc1.

Having got a hit for i915, and since I was becoming impatient reading
all the non-hits, I decided to limit the search to i915 only.  The
next one was filed under "5.7".
Commit e7cea7905815ac938e6e90b0cb6b91bcd22f6a15
Linus pulls a bunch of DRM fixes, i915 among them, including something
from Chris Wilson closely related to the one above under 5.6.14.
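
The i915-only search described above can be approximated with a short
script instead of reading every stanza by hand.  This is only a sketch:
it assumes the ChangeLog files are plain "git log" output in which each
entry starts with a "commit <40 hex digits>" line, and the function
names, the sample text and its placeholder hashes are all made up for
illustration (they are not real kernel commits).

```python
import re

# Assumption: each changelog entry begins with "commit <40 hex digits>".
COMMIT_RE = re.compile(r"^commit ([0-9a-f]{40})$", re.MULTILINE)


def changelog_url(version):
    """Build the changelog URL for a 5.x version, e.g. '5.6.14'."""
    return "https://cdn.kernel.org/pub/linux/kernel/v5.x/ChangeLog-" + version


def commits_mentioning(changelog_text, keyword):
    """Return (hash, subject) for every entry that contains keyword."""
    hits = []
    matches = list(COMMIT_RE.finditer(changelog_text))
    for i, m in enumerate(matches):
        # An entry runs from its "commit" line to the next one (or EOF).
        if i + 1 < len(matches):
            end = matches[i + 1].start()
        else:
            end = len(changelog_text)
        entry = changelog_text[m.start():end]
        if keyword.lower() not in entry.lower():
            continue
        # The subject is the first non-empty indented line of the entry.
        subject = ""
        for line in entry.splitlines():
            if line.startswith("    ") and line.strip():
                subject = line.strip()
                break
        hits.append((m.group(1), subject))
    return hits


# Made-up sample in the assumed format; hashes are placeholders.
SAMPLE = (
    "commit " + "a" * 40 + "\n"
    "Author: Someone <someone@example.com>\n"
    "Date:   Wed May 20 08:00:00 2020 +0200\n"
    "\n"
    "    net: placeholder subject\n"
    "\n"
    "commit " + "b" * 40 + "\n"
    "Author: Someone Else <else@example.com>\n"
    "Date:   Wed May 20 09:00:00 2020 +0200\n"
    "\n"
    "    drm/i915: placeholder subject\n"
    "\n"
    "    Placeholder body text.\n"
)

for commit_hash, subject in commits_mentioning(SAMPLE, "i915"):
    print(commit_hash[:12], subject)
```

To run it against a real file, the text could be fetched first, for
example with urllib.request.urlopen(changelog_url("5.6.14")), and passed
to commits_mentioning() in place of SAMPLE.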

I'm a little confused by the timing and version interlocking, but I'm
calling it like this: Arnout Engelen is running NixOS with a
self-compiled kernel, which was 5.7.0-rc1.  Chris Wilson came up with a
fix which landed in 5.6.14 and bit me.  I'm guessing that something was
wrong with the fix, and Chris Wilson improved it (I never saw the 
changelog stanza for this).  Linus eventually got it in time for 5.7.0
(post-rc1).

I hope I haven't fixated on a plausible but irrelevant issue while the
real problem lies in some of the changes that I skipped over.

About the stack traceback: if I read the changelog right, in rare cases
it's possible for the i915 driver to get into an infinite loop looking
for tasks to evict from an assignment table.  That would explain sudden
catatonia.  But there's no "oops" and nothing is written to syslog or
dmesg.

