On 5/22/21 1:20 PM, Kays wrote:
Hello everyone,

I am quite new to the openSUSE community, and I hope this is the right place to raise this issue. Please let me know if it is not!

Since the beginning of last year, when I switched to openSUSE TW (from Debian testing/unstable), I have repeatedly run into issues with the NVIDIA proprietary drivers, something I was absolutely not used to coming from Debian, and I wanted to ask how to deal with them. (Fortunately, apart from that, TW has so far always been rock-solid!)

For me (and many others) the problems started with kernel 5.9, which broke CUDA. At that time I was quite new to openSUSE and rolling distros, so I made a couple of mistakes that wasted a lot of time. In particular, I did not know how to get back to an older kernel or driver version. For some reason I noticed the problem only long after the introduction of 5.9, and (I don't quite remember that part) either I no longer had any pre-5.9 kernel on the system or the older kernel had some other issue. Anyway, I was stuck with 5.9 and couldn't get CUDA, which I depended on, to work (also, why is CUDA not included in the TW repos?).
One thing that, as far as I'm aware, complicated things further is that after a driver or kernel upgrade or downgrade, the NVIDIA kernel modules were for some reason not correctly built for all of the installed kernels. Maybe I also misunderstand how this works, but with my limited experience on openSUSE I always ran into some problem (like a version mismatch) when booting into an older kernel. Fortunately, after a very long time of trying to figure out how to get out of the mess I had got myself into, the 5.9-compatible driver was finally released (although there was also some confusion about which version first included the fix), and I could check off that box. I thought this was a once-in-a-lifetime problem due to a breaking change in the kernel that would most likely not reappear in the near future, also because I had learned some tricks, like how to use snapper and tumbleweed-cli, and felt safe about potential future issues. Great!

Unfortunately, I was wrong about this being a one-time thing. Now, with kernel 5.12 and NVIDIA 460.80, I have a new issue: NVenc is completely broken. And again I only noticed the issue when it was too late (my bad). As far as I am aware, the NVIDIA 460.80 driver was released on May 16, and my last snapper snapshot is from May 17, unless I want to roll back to November 2020. I am not entirely sure whether the kernel or the driver broke NVenc, but it doesn't matter, because with kernels I have a similar story: I have only two versions of 5.12 installed and an old 5.8 version that for some reason will not boot. It seems I'm stuck with a broken combination again.

End of rant/problem description. Of course, I know that some very simple measures, like checking for malfunctions immediately after the updates, would have saved me a lot of trouble. I will definitely be more careful in the future.

However, I would still like to hear your thoughts about how to deal with this issue better. What are the best practices when dealing with new driver and kernel versions?
And when the system is already borked, what should I do? Snapper has saved me a couple of times already, but are there any other solutions? In particular, I think it should be possible to install an older kernel from the TW repo history, but can I also do this for the NVIDIA driver? I'm not aware of any older version of the driver present in the repos. For me as a newcomer this really is a bit of a mess...

Thank you in advance for your thoughts and help!
Joe

Joe, I am an unsophisticated, long-term user of Tumbleweed with Nvidia, and I haven't had significant problems with the driver for a long time. (Knock on wood.)

* I installed the hard way: https://en.opensuse.org/SDB:NVIDIA_the_hard_way . I have read too many reports of problems with the nvidia repositories.

* I follow the reports of the webmaster at http://rglinuxtech.com/ , who has made it his business to keep track of compatibility issues between the nvidia drivers and new kernels.

* I download the drivers from https://download.nvidia.com/XFree86/Linux-x86_64/ , and always expect to boot to a terminal and run  # sh NVIDIA-Linux-x86-etc.run  after every kernel update (followed by mkinitrd).
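
As a rough way to tell whether the installer still needs to be run for a given kernel, one could check each kernel directory under /lib/modules for an nvidia.ko file. The following is only a minimal sketch and not part of the SDB procedure: the function name nvidia_module_status is made up for illustration, and it assumes that the module is installed somewhere below /lib/modules/<kernel-version>/ under a file name starting with "nvidia.ko" (possibly compressed, e.g. nvidia.ko.xz). The exact location can differ between the .run installer and the repository packages.

```python
from pathlib import Path


def nvidia_module_status(modules_root="/lib/modules"):
    """Map each kernel directory under modules_root to True if an nvidia.ko* file exists in it.

    Hypothetical helper: the default path and the file-name pattern are assumptions.
    """
    status = {}
    root = Path(modules_root)
    if not root.is_dir():
        return status
    for kernel_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        # Search the whole tree of this kernel; matches nvidia.ko, nvidia.ko.xz, ...
        status[kernel_dir.name] = any(kernel_dir.rglob("nvidia.ko*"))
    return status


if __name__ == "__main__":
    for version, present in nvidia_module_status().items():
        label = "nvidia module found" if present else "NO nvidia module"
        print(f"{version}: {label}")
```

A kernel reported without a module is one where booting would likely fall back to a terminal until the installer (and mkinitrd) has been run for it. Finding the file does not prove that the module version matches the installed user-space driver.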

I am much less expert than other contributors to this mailing list, but my Tumbleweed-with-proprietary-nvidia works, and has worked for quite a while with this approach.

If the rglinuxtech site warns of problems, I take care and may delay dup'ing to a new kernel.