--- Comment #2 from James Carter firstname.lastname@example.org ---
I installed kernel-default-5.7.0-3.1.gad96a07.x86_64 from OBS Kernel:stable. Unfortunately the most revealing machines could not use it. Xena uses EFI boot and can't be switched to "legacy", and the kernel's signature is not acceptable to the EFI checker. Jacinth is the Wi-Fi access point and needs rtl8812au-kmp-default-126.96.36.199+$git.k$version, which is not available for kernel 5.7.0. Anyway, the other machines have been running for about 24 hours with no catatonia. I'll check back when the production version is out.
I was reading the changelogs at https://cdn.kernel.org/pub/linux/kernel/v5.x/ChangeLog-$version. 5.6.13 and 5.6.16 had nothing relevant, and 5.6.15 had a fix for i915 that also seemed irrelevant. 5.6.14 had a hit: commit f18d3beac6bc2e1bddfba202f5327200acbda54c by Chris Wilson. This is a fix (possibly gone awry) for the following issue and 2 others: https://gitlab.freedesktop.org/drm/intel/-/issues/1746, opened by Arnout Engelen ("1 month ago", so about early May 2020) and titled "i915_gem_evict_something stalls". He has kernel 5.7.0-rc1 and gets what sounds a lot like my problem, including no useful log output. He and Chris Wilson worked out which kconfig items to turn on (which means rebuilding the kernel), and he got 400 MB of log files, which he gzipped and posted :-) He cross-references two seemingly relevant bugs from kernels 5.5.2 and 5.3.8. I never saw the actual changelog for 5.7.0-rc1.
Having got a hit for i915, and since I was becoming impatient reading all the non-hits, I decided to limit the search to i915 only. The next hit was filed under "5.7": commit e7cea7905815ac938e6e90b0cb6b91bcd22f6a15, in which Linus pulls a bunch of DRM fixes. These include i915 fixes, among them something from Chris Wilson closely related to the 5.6.14 commit above.
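For anyone who wants to repeat the i915-only search without reading every stanza by hand, here is a rough Python sketch. It assumes the ChangeLog files are plain `git log` style text in which each entry starts with a `commit <40 hex digits>` line followed by an `Author:` line and an indented subject; I have not verified that against every file. The helper names (`changelog_url`, `fetch_changelog`, `find_commits`) and the `SAMPLE` text are made up for illustration, and the sample commits are not real ones.

```python
import re
import urllib.request
from typing import Iterator, List, NamedTuple, Tuple


class Commit(NamedTuple):
    sha: str
    author: str
    subject: str


# Assumption: every changelog entry begins with a line "commit <40 hex digits>".
_COMMIT_RE = re.compile(r"^commit ([0-9a-f]{40})$", re.MULTILINE)


def changelog_url(version: str) -> str:
    """Build the ChangeLog URL for a 5.x kernel version such as '5.6.14'."""
    return f"https://cdn.kernel.org/pub/linux/kernel/v5.x/ChangeLog-{version}"


def fetch_changelog(version: str) -> str:
    """Download one changelog as text (needs network access)."""
    with urllib.request.urlopen(changelog_url(version), timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")


def split_commits(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (sha, body) pairs, where body runs up to the next commit line."""
    matches = list(_COMMIT_RE.finditer(text))
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        yield m.group(1), text[m.end():end]


def find_commits(text: str, keyword: str) -> List[Commit]:
    """Return the commits whose entry mentions keyword (case-insensitive)."""
    hits = []
    for sha, body in split_commits(text):
        if keyword.lower() not in body.lower():
            continue
        author = ""
        subject = ""
        for line in body.splitlines():
            if line.startswith("Author:"):
                author = line[len("Author:"):].strip()
            elif line.startswith("    ") and line.strip() and not subject:
                # First indented, non-blank line is taken as the subject.
                subject = line.strip()
        hits.append(Commit(sha, author, subject))
    return hits


# Made-up sample in the assumed format; these are NOT real kernel commits.
SAMPLE = "\n".join([
    "commit " + "1" * 40,
    "Author: Example Author <author@example.org>",
    "Date:   Wed Jan 1 00:00:00 2020 +0000",
    "",
    "    drm/i915: hypothetical fix subject",
    "",
    "commit " + "2" * 40,
    "Author: Another Author <another@example.org>",
    "Date:   Thu Jan 2 00:00:00 2020 +0000",
    "",
    "    net: unrelated hypothetical change",
    "",
])

if __name__ == "__main__":
    for commit in find_commits(SAMPLE, "i915"):
        print(commit.sha[:12], commit.subject)
```

To run it against a real file, something like `find_commits(fetch_changelog("5.6.14"), "i915")` should list the candidates, but a keyword match only narrows the reading list. Each hit still has to be read to judge whether it is relevant.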
I'm a little confused by the timing and version interlocking, but I'm calling it like this: Arnout Engelen is running NixOS with a self-compiled kernel, which was 5.7.0-rc1. Chris Wilson came up with a fix which landed in 5.6.14 and bit me. I'm guessing that something was wrong with the fix, and Chris Wilson improved it (I never saw the changelog stanza for this). Linus eventually got it in time for 5.7.0 (post-rc1).
I hope I haven't fixated on a plausible but irrelevant issue while the real problem is in one of the changes that I skipped over.
About the stack traceback: if I read the changelog right, in rare cases it's possible for the i915 driver to get into an infinite loop looking for tasks to evict from an assignment table. That would explain sudden catatonia. But there's no "oops" and nothing is written to syslog or dmesg.