[Bug 1174204] New: API mismatch in NVIDIA driver after update to 450.57
http://bugzilla.opensuse.org/show_bug.cgi?id=1174204

            Bug ID: 1174204
           Summary: API mismatch in NVIDIA driver after update to 450.57
    Classification: openSUSE
           Product: openSUSE Distribution
           Version: Leap 15.2
          Hardware: Other
                OS: Other
            Status: NEW
          Severity: Normal
          Priority: P5 - None
         Component: X11 3rd Party Driver
          Assignee: gfx-bugs@suse.de
          Reporter: marix@marix.org
        QA Contact: sndirsch@suse.com
          Found By: ---
           Blocker: ---

After applying the NVIDIA driver update to 450.57 I end up with an unusable
NVIDIA driver and X.org falling back to 1024p with software rendering. The
journal shows the following:

NVRM: API mismatch: the client has the version 450.57, but
NVRM: this kernel module has the version 440.100.  Please
NVRM: make sure that this kernel module and all NVIDIA driver
NVRM: components have the same version.

However, in the meantime I have completely removed the driver, rebooted with
Nouveau (which I am also using to write this) and re-installed the driver, so
it is completely unclear to me where the old kernel module could come from.
Zypper also claims that all my packages are on the same version:

S  | Name                       | Type       | Version                             | Arch   | Repository
---+----------------------------+------------+-------------------------------------+--------+------------------------
   | nvidia-computeG04          | package    | 390.138-lp152.14.1                  | x86_64 | nVidia Graphics Drivers
i+ | nvidia-computeG05          | package    | 450.57-lp152.28.1                   | x86_64 | nVidia Graphics Drivers
   | nvidia-firmware-installer  | package    | 1.1-lp152.1.1                       | noarch | hardware
   | nvidia-firmware-installer  | srcpackage | 1.1-lp152.1.1                       | noarch | hardware
   | nvidia-gfxG04-kmp-default  | package    | 390.138_k5.3.18_lp152.19-lp152.14.1 | x86_64 | nVidia Graphics Drivers
   | nvidia-gfxG04-kmp-preempt  | package    | 390.138_k5.3.18_lp152.19-lp152.14.1 | x86_64 | nVidia Graphics Drivers
i+ | nvidia-gfxG05-kmp-default  | package    | 450.57_k5.3.18_lp152.19-lp152.28.1  | x86_64 | nVidia Graphics Drivers
   | nvidia-gfxG05-kmp-preempt  | package    | 450.57_k5.3.18_lp152.19-lp152.28.1  | x86_64 | nVidia Graphics Drivers
   | nvidia-glG04               | package    | 390.138-lp152.14.1                  | x86_64 | nVidia Graphics Drivers
i+ | nvidia-glG05               | package    | 450.57-lp152.28.1                   | x86_64 | nVidia Graphics Drivers
   | nvidia-texture-tools       | package    | 2.0.8-lp152.3.9                     | x86_64 | Haupt-Repository (OSS)
   | pcp-pmda-nvidia-gpu        | package    | 4.3.1-lp152.4.3                     | x86_64 | Haupt-Repository (OSS)
   | skelcd-EULA-NVIDIA-compute | package    | 2020.05.04-lp152.1.1                | x86_64 | Haupt-Repository (OSS)
   | x11-video-nvidiaG04        | package    | 390.138-lp152.14.1                  | x86_64 | nVidia Graphics Drivers
i+ | x11-video-nvidiaG05        | package    | 450.57-lp152.28.1                   | x86_64 | nVidia Graphics Drivers

I have tried explicitly running mkinitrd, but this did not change the
situation.
One additional thing I ran into: when removing the driver to switch to
Nouveau, the system still behaves as before the uninstallation, and after a
reboot lsmod still shows the Nvidia driver as loaded:

nvidia_drm             53248  0
nvidia_modeset       1118208  1 nvidia_drm
nvidia              20721664  1 nvidia_modeset
ipmi_msghandler        69632  1 nvidia
drm_kms_helper        229376  2 nvidia_drm,nouveau
drm                   544768  5 drm_kms_helper,nvidia_drm,ttm,nouveau

Only explicitly invoking mkinitrd actually stops the Nvidia driver from being
loaded on boot and gives me a working Nouveau driver.

--
You are receiving this mail because:
You are on the CC list for the bug.
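The two versions in the journal excerpt of this report can be pulled out mechanically, which may help when comparing several machines. This is a minimal sketch that assumes the message has exactly the wording quoted in the report; the function name `nvrm_mismatch` is made up for illustration and is not part of any NVIDIA or openSUSE tool.

```shell
#!/bin/sh
# Hypothetical helper: extract both versions from an "NVRM: API mismatch"
# message read on stdin. Assumes the exact wording quoted in this report.
nvrm_mismatch() {
    # Join the wrapped NVRM lines into one, then capture the two versions.
    tr '\n' ' ' | sed -n \
        's/.*client has the version \([0-9.]*[0-9]\), but.*kernel module has the version \([0-9.]*[0-9]\)\..*/client=\1 module=\2/p'
}
```

Fed with the four NVRM lines from the report, it prints `client=450.57 module=440.100`. On a live system something like `journalctl -k -b | nvrm_mismatch` should work, provided the kernel log for the current boot contains the message.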
http://bugzilla.opensuse.org/show_bug.cgi?id=1174204#c1

Stefan Dirsch <sndirsch@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P5 - None                   |P3 - Medium
             Status|NEW                         |IN_PROGRESS
                 CC|                            |marix@marix.org
           Assignee|gfx-bugs@suse.de            |sndirsch@suse.com
              Flags|                            |needinfo?(marix@marix.org)

--- Comment #1 from Stefan Dirsch <sndirsch@suse.com> ---
It seems the kernel module build of 450 failed, or the 440 module is being
preferred for some reason. I suggest uninstalling the
nvidia-gfxG05-kmp-default package and removing all remaining nvidia modules
below /lib/modules:

  cd /lib/modules
  find . -name 'nvidia*.ko' -print | xargs rm

and then reinstalling the nvidia-gfxG05-kmp-default package. Then check this:

  find /lib/modules -name 'nvidia*.ko'
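Before deleting anything, it can help to see which kernel directories actually hold nvidia modules. The following is only a sketch: `nvidia_modules_by_kernel` is a hypothetical name, and the optional argument lets it run against a copy of the tree instead of /lib/modules (pass the directory without a trailing slash).

```shell
#!/bin/sh
# Hypothetical helper: list nvidia*.ko modules per kernel directory.
# ROOT defaults to /lib/modules; pass another directory to try it on a copy.
nvidia_modules_by_kernel() {
    root=${1:-/lib/modules}
    # Quote the pattern so the shell hands it to find unexpanded.
    find "$root" -name 'nvidia*.ko' | LC_ALL=C sort | while IFS= read -r path; do
        rel=${path#"$root"/}      # e.g. 5.3.18-lp152.19-default/updates/nvidia.ko
        kernel=${rel%%/*}         # first path component is the kernel version
        printf '%s\t%s\n' "$kernel" "${rel#*/}"
    done
}
```

Each output line holds the kernel directory and the module path below it, separated by a tab, so leftovers for a kernel other than the expected one stand out.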
http://bugzilla.opensuse.org/show_bug.cgi?id=1174204#c3

Mark Scott <george.spiggott@talktalk.net> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |george.spiggott@talktalk.net

--- Comment #3 from Mark Scott <george.spiggott@talktalk.net> ---
Dear Stefan,

I too have been hit by the same issue as above, and your solution worked for
me.

Thanks
http://bugzilla.opensuse.org/show_bug.cgi?id=1174204

Valur Olafsson <valurolafsson@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |valurolafsson@gmail.com
http://bugzilla.opensuse.org/show_bug.cgi?id=1174204#c4

Matthias Bach <marix@marix.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
              Flags|needinfo?(marix@marix.org)  |

--- Comment #4 from Matthias Bach <marix@marix.org> ---
Created attachment 839785
  --> http://bugzilla.opensuse.org/attachment.cgi?id=839785&action=edit
Result of nvidia-bug-report.sh

Sadly this didn't fix the issue for me.

One interesting thing I noted: before removing the modules I had a
/lib/modules/5.3.18-lp152.20.7-default/updates/nvidia.ko, along with many
modules for Leap 15.1 and 15.2 kernels. After removing all modules and running
the driver installation I have
/lib/modules/5.3.18-lp152.19-default/updates/nvidia.ko. So I did actually have
a module for a higher kernel version lying around.
http://bugzilla.opensuse.org/show_bug.cgi?id=1174204#c5

--- Comment #5 from James Rome <jamesrome@alum.mit.edu> ---
I have the same issue in 15.2. I get no graphics at all. The NVidia drivers
got updated. Now I cannot activate them with

# prime-select nvidia

It says it cannot query the GPU. I uninstalled and reinstalled the packages,
and prime-select still fails.

Help please.
http://bugzilla.opensuse.org/show_bug.cgi?id=1174204#c6

James Rome <jamesrome@alum.mit.edu> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jamesrome@alum.mit.edu

--- Comment #6 from James Rome <jamesrome@alum.mit.edu> ---
Can we delete all the 4.4 and 4.12 files in /lib/modules?
http://bugzilla.opensuse.org/show_bug.cgi?id=1174204#c7

--- Comment #7 from James Rome <jamesrome@alum.mit.edu> ---
(In reply to James Rome from comment #6)
> Can we delete all the 4.4 and 4.12 files in /lib/modules?

And I do not have an nvidia file in /lib/modules:

drwxr-xr-x 1 root root  14 Aug 18  2018 4.12.14-lp150.12.10-default
drwxr-xr-x 1 root root  14 Oct  8  2018 4.12.14-lp150.12.13-default
drwxr-xr-x 1 root root  14 Oct 16  2018 4.12.14-lp150.12.16-default
drwxr-xr-x 1 root root  14 Nov  7  2018 4.12.14-lp150.12.19-default
drwxr-xr-x 1 root root  14 Dec 15  2018 4.12.14-lp150.12.22-default
drwxr-xr-x 1 root root  14 Jan 17  2019 4.12.14-lp150.12.25-default
drwxr-xr-x 1 root root  14 Feb 19  2019 4.12.14-lp150.12.28-default
drwxr-xr-x 1 root root  24 Aug  7  2018 4.12.14-lp150.12.4-default
drwxr-xr-x 1 root root  14 Apr 12  2019 4.12.14-lp150.12.45-default
drwxr-xr-x 1 root root  14 May 16  2019 4.12.14-lp150.12.48-default
drwxr-xr-x 1 root root  14 May 27  2019 4.12.14-lp150.12.58-default
drwxr-xr-x 1 root root  14 Jun 17  2019 4.12.14-lp150.12.61-default
drwxr-xr-x 1 root root  14 Aug 18  2018 4.12.14-lp150.12.7-default
drwxr-xr-x 1 root root  14 Sep 22  2019 4.12.14-lp151.28.10-default
drwxr-xr-x 1 root root  14 Oct 10  2019 4.12.14-lp151.28.13-default
drwxr-xr-x 1 root root  14 Oct 30  2019 4.12.14-lp151.28.16-default
drwxr-xr-x 1 root root  14 Nov 13  2019 4.12.14-lp151.28.20-default
drwxr-xr-x 1 root root  14 Dec  9  2019 4.12.14-lp151.28.25-default
drwxr-xr-x 1 root root  14 Mar  8 10:06 4.12.14-lp151.28.32-default
drwxr-xr-x 1 root root  14 Mar 25 18:30 4.12.14-lp151.28.36-default
drwxr-xr-x 1 root root  14 Jul 16  2019 4.12.14-lp151.28.4-default
drwxr-xr-x 1 root root  14 Apr 20 11:14 4.12.14-lp151.28.40-default
drwxr-xr-x 1 root root  14 Jun 11 15:02 4.12.14-lp151.28.44-default
drwxr-xr-x 1 root root  14 Jul  3 10:36 4.12.14-lp151.28.48-default
drwxr-xr-x 1 root root  14 Jul  3 12:44 4.12.14-lp151.28.52-default
drwxr-xr-x 1 root root  14 Aug 11  2019 4.12.14-lp151.28.7-default
drwxr-xr-x 1 root root 278 Jul 30  2017 4.4.27-2-default
drwxr-xr-x 1 root root 278 May 26  2018 4.4.76-1-default
drwxr-xr-x 1 root root 292 Jul 16 12:53 5.3.18-lp152.19-default
drwxr-xr-x 1 root root 292 Jul 16 12:53 5.3.18-lp152.19-preempt
drwxr-xr-x 1 root root 462 Jul 16 12:53 5.3.18-lp152.20.7-default
drwxr-xr-x 1 root root 292 Jul 15 18:23 5.3.18-lp152.20.7-preempt
drwxr-xr-x 1 root root 484 Jul 16 12:53 5.3.18-lp152.26-default
drwxr-xr-x 1 root root 314 Jul 15 18:19 5.3.18-lp152.26-preempt
http://bugzilla.opensuse.org/show_bug.cgi?id=1174204#c8

--- Comment #8 from James Rome <jamesrome@alum.mit.edu> ---
I wish this was editable.

There are NVidia modules in /lib/modules/5.3.18-lp152.19-preempt/updates.
Surely /lib/modules/5.3.18-lp152.26-preempt/updates would be the newer
location, but nothing is there.
http://bugzilla.opensuse.org/show_bug.cgi?id=1174204#c9

--- Comment #9 from Matthias Bach <marix@marix.org> ---
(In reply to Matthias Bach from comment #4)
> Sadly this didn't fix the issue for me.

I just realised I failed. I only ran `find /lib/modules -name nvidia.ko
-delete`. Will retry with `find /lib/modules -name nvidia*.ko -delete`.
http://bugzilla.opensuse.org/show_bug.cgi?id=1174204#c10

--- Comment #10 from Matthias Bach <marix@marix.org> ---
(In reply to Matthias Bach from comment #9)
> (In reply to Matthias Bach from comment #4)
> > Sadly this didn't fix the issue for me.
>
> I just realised I failed. I only ran `find /lib/modules -name nvidia.ko
> -delete`. Will retry with `find /lib/modules -name nvidia*.ko -delete`.

So doing this properly does fix the issue. Thanks!

It is still weird, though, that I had
/lib/modules/5.3.18-lp152.20.7-default/updates/nvidia*.ko when the current
package builds /lib/modules/5.3.18-lp152.19-default/updates/nvidia*.ko, which
now gets linked from
/lib/modules/5.3.18-lp152.20.7-default/weak-updates/updates/nvidia*.ko.
http://bugzilla.opensuse.org/show_bug.cgi?id=1174204#c11

--- Comment #11 from Stefan Dirsch <sndirsch@suse.com> ---
(In reply to Matthias Bach from comment #10)
> So doing this properly does fix the issue. Thanks!

Good!

> Still weird that I had
> /lib/modules/5.3.18-lp152.20.7-default/updates/nvidia*.ko though

I assume these were still the 440.100 ones, which for some reason were not
removed when the old package was uninstalled.

> when the current package builds
> /lib/modules/5.3.18-lp152.19-default/updates/nvidia*.ko

That's correct.

> which now gets linked from
> /lib/modules/5.3.18-lp152.20.7-default/weak-updates/updates/nvidia*.ko.

That's how it is supposed to be: symlinks are created for all kernels sharing
the same kABI. That is our weak-updates concept.
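To see the weak-updates links described in this comment on a given system, each symlink can be resolved to the module it points at. This sketch assumes the `<kernel>/weak-updates/updates/` layout quoted in comment #10 and relies on `readlink -f` from GNU coreutils; `check_weak_links` is an illustrative name, not an existing tool.

```shell
#!/bin/sh
# Hypothetical helper: show where each weak-updates symlink for an nvidia
# module points, and flag links whose target no longer exists.
check_weak_links() {
    root=${1:-/lib/modules}
    find "$root" -path '*/weak-updates/*' -name 'nvidia*.ko' -type l |
    LC_ALL=C sort | while IFS= read -r link; do
        if [ -e "$link" ]; then
            # -e follows the link, so the target exists; print where it leads.
            printf 'ok       %s -> %s\n' "$link" "$(readlink -f "$link")"
        else
            printf 'dangling %s\n' "$link"
        fi
    done
}
```

On a healthy installation every line should start with `ok` and end in a path below the fixed kernel directory, here 5.3.18-lp152.19-default.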
http://bugzilla.opensuse.org/show_bug.cgi?id=1174204#c12

--- Comment #12 from Stefan Dirsch <sndirsch@suse.com> ---
@James Rome

Please follow the instructions in comment #1. They make sure nothing is left
below /lib/modules.
http://bugzilla.opensuse.org/show_bug.cgi?id=1174204

Stefan Dirsch <sndirsch@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
              Flags|                            |needinfo?(jamesrome@alum.mit.edu)
http://bugzilla.opensuse.org/show_bug.cgi?id=1174204#c13

--- Comment #13 from James Rome <jamesrome@alum.mit.edu> ---
Yes, using

  find /lib/modules -name nvidia*.ko -delete

and removing and reinstalling the drivers fixed it.
http://bugzilla.opensuse.org/show_bug.cgi?id=1174204#c14

Stefan Dirsch <sndirsch@suse.com> changed:

           What    |Removed                           |Added
----------------------------------------------------------------------------
              Flags|needinfo?(jamesrome@alum.mit.edu) |

--- Comment #14 from Stefan Dirsch <sndirsch@suse.com> ---
Ok. So at least we have a workaround. But now I'm afraid this happens to
everyone for this update 440.100 --> 450.57. :-(
http://bugzilla.opensuse.org/show_bug.cgi?id=1174204#c15

Stefan Dirsch <sndirsch@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|API mismatch in NVIDIA      |NVIDIA driver after update
                   |driver after update to      |440.100 --> 450.57 fails
                   |450.57                      |due to remaining old kernel
                   |                            |modules

--- Comment #15 from Stefan Dirsch <sndirsch@suse.com> ---
Now I know what happens. Up to 440.100, kernel modules were mistakenly rebuilt
and installed for the kernel against which they had been built locally.
Currently this is 5.3.18-lp152.20.7. With 450.57 I switched this back to our
weak-modules concept, i.e. kernel modules are installed for a fixed kernel
version (here: 5.3.18-lp152.19, even if it doesn't exist on the system), and
weak-modules symlinks are then created for all other installed kernels.

Example               440.100 packages    450.57 packages
------------------------------------------------------------------------------
.19 fixed GA kernel   no kernel modules   450.57 modules
.20 build kernel      440.100 modules     440.100 modules (no weak symlinks
                                          created) ***
.85 another kernel    no kernel modules   weak symlinks to .19 fixed kernel
                                          (450.57 modules)

*** because modules with the same name already exist

As a fix I could remove the old modules before installing the new ones.
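The situation marked *** in this comment can be checked for directly: any regular nvidia*.ko file outside the fixed kernel directory is a leftover that may keep the weak symlinks from being created. The sketch below assumes the directory layout from this thread; `find_blocking_modules` and both of its parameters are hypothetical names chosen for illustration.

```shell
#!/bin/sh
# Hypothetical helper: list regular nvidia*.ko files that live outside the
# fixed kernel directory. Per comment #15, such leftovers share names with the
# new modules and keep the weak symlinks from being created.
find_blocking_modules() {
    root=$1     # modules root, e.g. /lib/modules
    fixed=$2    # fixed kernel directory, e.g. 5.3.18-lp152.19-default
    # -type f skips the weak-updates symlinks; ! -path skips the fixed kernel.
    find "$root" -type f -name 'nvidia*.ko' ! -path "$root/$fixed/*" | LC_ALL=C sort
}
```

For the reporter's system, `find_blocking_modules /lib/modules 5.3.18-lp152.19-default` would be expected to list the old files below 5.3.18-lp152.20.7-default/updates; empty output means nothing is in the way.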
http://bugzilla.opensuse.org/show_bug.cgi?id=1174204#c16

Stefan Dirsch <sndirsch@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|IN_PROGRESS                 |RESOLVED
         Resolution|---                         |FIXED

--- Comment #16 from Stefan Dirsch <sndirsch@suse.com> ---
Fixed and pushed packages towards nvidia. Consider this a reliable workaround
as long as this update is not available yet:

  rpm -e nvidia-gfxG05-kmp-default --nodeps
  find /lib/modules -name 'nvidia*.ko' -delete
  zypper in nvidia-gfxG05-kmp-default

Fixed packages contain the following RPM changelog:

Thu Jul 16 19:36:52 UTC 2020 - Stefan Dirsch <sndirsch@suse.com>

- remove still existing old kernel modules during installation of
  new modules, since otherwise weak-modules doesn't work (boo#1174204)
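Because the workaround deletes files as root, it may be worth previewing the commands first. The following is a sketch of the three steps from this comment wrapped in a dry-run switch; `run`, `nvidia_workaround` and the `DRY_RUN` variable are illustrative names and not part of the packages.

```shell
#!/bin/sh
# Sketch of the comment #16 workaround with a dry-run default.
# run() and nvidia_workaround() are illustrative names, not existing tools.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '+ %s\n' "$*"      # only show what would be executed
    else
        "$@"
    fi
}

nvidia_workaround() {
    pkg=${1:-nvidia-gfxG05-kmp-default}
    root=${2:-/lib/modules}
    # Stop at the first failing step instead of continuing blindly.
    run rpm -e "$pkg" --nodeps &&
    run find "$root" -name 'nvidia*.ko' -delete &&
    run zypper in "$pkg"
}
```

Calling `nvidia_workaround` only prints the three commands; `DRY_RUN=0 nvidia_workaround`, run as root, executes them in order.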
http://bugzilla.opensuse.org/show_bug.cgi?id=1174204

Vadim Krevs <vkrevs@yahoo.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |vkrevs@yahoo.com