--- Comment #4 from James Carter email@example.com ---
I installed kernel-default-5.7.1-1.2.x86_64 (except aarch64 on one machine). The three machines that froze up on kernel-default-5.6.14-1.1.x86_64 also froze up on 5.7.1. I kept the other six on 5.7.1 for about 48 hours with no freezeups. Again, there is no trace of the problem in the logs when it happens.
I have two machines with identical hardware (Intel NUC6CAYH, Celeron J3455, Intel HD Graphics 500, 2x4GB RAM, Realtek RTL8111/8168/8411 NIC), so that I can quickly replace the more critical one if it fails. The critical one is the main router, music library, directory server, etc. The other is a leaf node that does audio performance. The router is always the first to freeze up, whereas the leaf node has never frozen yet, despite substantial net traffic for the music. I suspect that hardware is irrelevant, but it's way too soon to blow off this possibility. I suspect that the roles exercise a vulnerable data path on the router, but there is no evidence of this either.
However, when I put 5.7.1 on my "non-failing" hosts I got a rash of connection failures. This is like chasing ghosts; here's the issue that I worked on first, since it was the most mission-critical. I don't seriously expect anyone to figure it out, but I'm getting it on record in case it provides a clue. My publicly exposed webserver is on a VM (running 5.7.1 on the Celeron). IPv4 and IPv6 connections from the wild side come into the main router (running 5.6.12) and get DNATed to the webserver. HTTP and HTTPS on IPv4 work. IPv6 from the wild side used to work but does not now; with the "comorbid" network issues, though, IPv6 is too complicated to give any clues. A tester on the webserver VM attempts https://www.jfcarter.net/ and it times out, on both IPv4 and IPv6. I did lots of troubleshooting, including tcpdumps and reboots (into 5.7.1). The client would initiate a TLS connection with SNI, the server would send its cert, which the client would accept as valid, but the server then just sat there, not sending a TLS session ticket. Finally I tried a scorched-earth solution: I rebooted all the leaf nodes and VMs into 5.6.12. The net issues miraculously vanished (not seen for 2 days), and specifically, web service was back to normal.
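For anyone wanting to reproduce the stalled HTTPS connection without reading tcpdumps, a small probe along these lines might help show which coarse stage times out. This is only a sketch using Python's standard socket and ssl modules. The function name probe_https and the stage labels are made up for illustration, and it cannot tell a missing session ticket apart from any other stall after the certificate exchange.

```python
import socket
import ssl


def probe_https(host, port=443, timeout=10.0):
    """Return the first stage that fails or times out, or "ok".

    Hypothetical helper; the stage names are illustrative labels only.
    """
    # Stage 1: plain TCP connect.
    try:
        raw = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        return "tcp-connect"

    # Stage 2: TLS handshake, sending SNI via server_hostname.
    # The timeout set on the raw socket carries over to the TLS socket.
    context = ssl.create_default_context()
    try:
        tls = context.wrap_socket(raw, server_hostname=host)
    except OSError:  # covers ssl.SSLError and socket timeouts
        raw.close()
        return "tls-handshake"

    # Stage 3: send a minimal request and wait for the first response byte.
    with tls:
        try:
            request = f"HEAD / HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
            tls.sendall(request.encode("ascii"))
            if tls.recv(1):
                return "ok"
        except OSError:
            pass
        return "http-response"
```

Running probe_https("www.jfcarter.net") from the webserver VM under each kernel and comparing the returned stage would give a quick yes/no symptom that does not need a tcpdump.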
It looks like kernel 5.7.2 is now out. I'm going to install it on one vulnerable machine and see if it freezes. But in the meantime I'm going to learn to use git, and I'll try to use the bisect feature to find the commit that messed it up. I wish there were a symptom that would appear immediately and leave the machine alive enough that you could log in remotely and boot back to 5.6.12. But the world isn't arranged for our convenience.
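To get a feel for how much testing a bisection costs, here is a minimal sketch of the idea in Python. It is a plain binary search over a linear, oldest-to-newest list of candidates, not git's actual implementation (git bisects over a commit graph). The names bisect_first_bad and is_bad are hypothetical. The search assumes the first candidate is known good (5.6.12 here) and the last is known bad.

```python
def bisect_first_bad(candidates, is_bad):
    """Find the first bad candidate in an oldest-to-newest list.

    Assumes candidates[0] is known good and candidates[-1] is known bad.
    Returns (first_bad_candidate, number_of_tests_run).
    """
    good, bad = 0, len(candidates) - 1
    tests = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        tests += 1
        if is_bad(candidates[mid]):
            bad = mid    # breakage is at mid or earlier
        else:
            good = mid   # breakage is after mid
    return candidates[bad], tests
```

The number of tests grows roughly as log2 of the number of candidates. The catch with a freeze that takes hours or days to appear is that every "good" verdict needs a long soak, and one premature "good" sends the search into the wrong half.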
If I ever accidentally uninstall 5.6.12 I'll have a real problem. Several times I've wanted to revert a bad update, but my only recourse was either to wait for SuSE to release a fixed version or to fix it myself, e.g. https://bugzilla.opensuse.org/show_bug.cgi?id=1172256 . The present issue (bug 1172541) finally got me moving to set up an archive of all RPMs installed on any of my machines, keeping versions that are no longer installed for a month or two in case I need to revert. Did I reinvent the wheel, that is, does SuSE have such an archive server? If not, maybe you should.
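The retention rule for such an archive can be sketched in a few lines of Python. This is an illustration only, not the script actually in use. It assumes the archive records, for each RPM file, the last time it was seen installed on any machine. The name select_expired and the 60-day default are assumptions.

```python
from datetime import datetime, timedelta


def select_expired(last_seen_installed, now, keep_days=60):
    """Return archived RPM names that may be deleted.

    last_seen_installed maps an RPM file name to the datetime it was last
    seen installed on any machine (hypothetical bookkeeping format).
    An RPM is kept until it has gone uninstalled for more than keep_days.
    """
    cutoff = now - timedelta(days=keep_days)
    return sorted(
        name for name, seen in last_seen_installed.items() if seen < cutoff
    )
```

A package that is still installed somewhere gets its timestamp refreshed on every scan, so it never falls behind the cutoff. Only versions that have been gone everywhere for the full retention period are offered for deletion.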