[Bug 1182666] New: TW 20210222 - Update broken for NVIDIA
http://bugzilla.opensuse.org/show_bug.cgi?id=1182666

            Bug ID: 1182666
           Summary: TW 20210222 - Update broken for NVIDIA
    Classification: openSUSE
           Product: openSUSE Tumbleweed
           Version: Current
          Hardware: Other
                OS: Other
            Status: NEW
          Severity: Major
          Priority: P5 - None
         Component: X11 3rd Party Driver
          Assignee: gfx-bugs@suse.de
          Reporter: axel.braun@gmx.de
        QA Contact: sndirsch@suse.com
          Found By: ---
           Blocker: ---

upgrading 20210220 to 20210222 fails and ends in terminal window

more /var/log/Xorg.0.log | grep EE
        (WW) warning, (EE) error, (NI) not implemented, (??) unknown.
[    69.552] (EE) Failed to load module "intel" (module does not exist, 0)
[    69.558] (EE) NVIDIA: Failed to initialize the NVIDIA kernel module. Please see the
[    69.558] (EE) NVIDIA:     system's kernel log for additional error messages and
[    69.558] (EE) NVIDIA:     consult the NVIDIA README for details.
[    69.563] (EE) NVIDIA: Failed to initialize the NVIDIA kernel module. Please see the
[    69.563] (EE) NVIDIA:     system's kernel log for additional error messages and
[    69.563] (EE) NVIDIA:     consult the NVIDIA README for details.
[    69.568] (EE) NVIDIA: Failed to initialize the NVIDIA kernel module. Please see the
[    69.568] (EE) NVIDIA:     system's kernel log for additional error messages and
[    69.568] (EE) NVIDIA:     consult the NVIDIA README for details.
[    69.573] (EE) NVIDIA: Failed to initialize the NVIDIA kernel module. Please see the
[    69.573] (EE) NVIDIA:     system's kernel log for additional error messages and
[    69.573] (EE) NVIDIA:     consult the NVIDIA README for details.
[    69.573] (EE) No devices detected.
[    69.573] (EE)
[    69.573] (EE) no screens found(EE)
[    69.573] (EE)
[    69.573] (EE) Please also check the log file at "/var/log/Xorg.0.log" for additional information.
[    69.573] (EE)
[    69.591] (EE) Server terminated with error (1). Closing log file.

journalctl -xb | grep nvidia
Feb 24 12:08:45 X1E kernel: audit: type=1400 audit(1614164925.078:6): apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe" pid=552 comm="apparmor_parser"
Feb 24 12:08:45 X1E kernel: audit: type=1400 audit(1614164925.078:7): apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe//kmod" pid=552 comm="apparmor_parser"
Feb 24 12:08:55 X1E suse-prime[1323]: Boot: setting-up nvidia card
Feb 24 12:08:56 X1E suse-prime[1392]: trying switch ON nvidia: [bbswitch] NVIDIA card is ON
Feb 24 12:08:56 X1E prime-select[1394]: modprobe: FATAL: Module nvidia_drm not found in directory /lib/modules/5.10.16-1-default

-- 
You are receiving this mail because:
You are on the CC list for the bug.
http://bugzilla.opensuse.org/show_bug.cgi?id=1182666#c1

Stefan Dirsch <sndirsch@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P5 - None                   |P3 - Medium
             Status|NEW                         |IN_PROGRESS
                 CC|                            |axel.braun@gmx.de
              Flags|                            |needinfo?(axel.braun@gmx.de)

--- Comment #1 from Stefan Dirsch <sndirsch@suse.com> ---
Looks like rebuild of kernel module didn't work for some reason. I don't see any build failures against TW kernels in my test environment. So maybe you're also running kernels from a different repository. Please attach the result of running nvidia-bug-report.sh.
http://bugzilla.opensuse.org/show_bug.cgi?id=1182666#c2

Axel Braun <axel.braun@gmx.de> changed:

           What    |Removed                       |Added
----------------------------------------------------------------------------
              Flags|needinfo?(axel.braun@gmx.de)  |

--- Comment #2 from Axel Braun <axel.braun@gmx.de> ---
Created attachment 846445
  --> http://bugzilla.opensuse.org/attachment.cgi?id=846445&action=edit
bug report as requested

Hi Stefan, this is the bug report using the last working snapshot - not sure if that helps. I use standard kernel. As Philip Raets reported, reinstalling the nvidia-driver, which triggers a rebuild, fixed it.
http://bugzilla.opensuse.org/show_bug.cgi?id=1182666#c3

Stefan Dirsch <sndirsch@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|IN_PROGRESS                 |RESOLVED
         Resolution|---                         |WORKSFORME

--- Comment #3 from Stefan Dirsch <sndirsch@suse.com> ---
No, this doesn't help. Maybe the kernel module wasn't rebuilt because you only updated the kernel package and not the kernel-<flavor>-devel package, which triggers the rebuild. I don't know. In any case, apparently this can't be investigated any longer. Closing.
http://bugzilla.opensuse.org/show_bug.cgi?id=1182666#c4

Axel Braun <axel.braun@gmx.de> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|WORKSFORME                  |---
           Assignee|gfx-bugs@suse.de            |axel.braun@gmx.de

--- Comment #4 from Axel Braun <axel.braun@gmx.de> ---
Created attachment 846460
  --> http://bugzilla.opensuse.org/attachment.cgi?id=846460&action=edit
error log

unfortunately the error is reproducible - find the log with failed x-Session attached. The devel kernel is installed as well, and the drivers are recompiled. Anyhow, no graphical interface
http://bugzilla.opensuse.org/show_bug.cgi?id=1182666 http://bugzilla.opensuse.org/show_bug.cgi?id=1182666#c5 Axel Braun <axel.braun@gmx.de> changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #846445|0 |1 is obsolete| | --- Comment #5 from Axel Braun <axel.braun@gmx.de> --- Created attachment 846461 --> http://bugzilla.opensuse.org/attachment.cgi?id=846461&action=edit installation log -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.opensuse.org/show_bug.cgi?id=1182666#c6

Stefan Dirsch <sndirsch@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|REOPENED                    |IN_PROGRESS
              Flags|                            |needinfo?(axel.braun@gmx.de)

--- Comment #6 from Stefan Dirsch <sndirsch@suse.com> ---
Ok. You should try to figure out why the nvidia kernel module doesn't load. Something like this:

  sudo dmesg -c > /dev/null
  sudo modprobe nvidia
  dmesg

Where are the modules? Do they exist?

  find /lib/modules/ -name 'nvidia*.ko' -exec ls -l {} \;
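The check requested in comment #6 can be narrowed to a single kernel release. The following is only a sketch: the function names and the optional arguments are made up for illustration, and the `nvidia*.ko*` pattern (which would also match compressed modules) is an assumption. It lists only modules that actually resolve, so a dangling symlink does not count as a module being present.

```sh
#!/bin/sh
# Hypothetical helpers, not part of any openSUSE package.

# Print the NVIDIA modules that exist for one kernel release.
# $1: modules root (assumed default /lib/modules)
# $2: kernel release (assumed default: the running kernel)
nvidia_modules_for() {
    local root=${1:-/lib/modules}
    local release=${2:-$(uname -r)}
    # The pattern is quoted so the shell does not expand it before find runs.
    # "test -e" follows symlinks, so dangling links are filtered out.
    find "$root/$release" -name 'nvidia*.ko*' -exec test -e {} \; -print 2>/dev/null | sort
}

# Report the result and return non-zero when nothing usable was found.
check_nvidia_modules() {
    local root=${1:-/lib/modules}
    local release=${2:-$(uname -r)}
    local found
    found=$(nvidia_modules_for "$root" "$release")
    if [ -n "$found" ]; then
        printf 'modules for %s:\n%s\n' "$release" "$found"
    else
        printf 'no nvidia modules for %s\n' "$release"
        return 1
    fi
}
```

Calling `check_nvidia_modules` without arguments inspects the running kernel; passing a release such as `5.10.16-1-default` as the second argument inspects another tree.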
http://bugzilla.opensuse.org/show_bug.cgi?id=1182666#c7

--- Comment #7 from Stefan Dirsch <sndirsch@suse.com> ---
(In reply to Axel Braun from comment #5)
> Created attachment 846461 [details]
> installation log

Looks like kernel module rebuild was successful.
http://bugzilla.opensuse.org/show_bug.cgi?id=1182666 http://bugzilla.opensuse.org/show_bug.cgi?id=1182666#c8 Axel Braun <axel.braun@gmx.de> changed: What |Removed |Added ---------------------------------------------------------------------------- Flags|needinfo?(axel.braun@gmx.de | |) | --- Comment #8 from Axel Braun <axel.braun@gmx.de> --- Created attachment 846465 --> http://bugzilla.opensuse.org/attachment.cgi?id=846465&action=edit in search for the modules.... -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.opensuse.org/show_bug.cgi?id=1182666#c9

--- Comment #9 from Stefan Dirsch <sndirsch@suse.com> ---
You're using

  kernel 5.10.16-1-default

Old kernel modules are still available in the 5.10.14-1-default directory. weak-updates symlinks are available for 5.10.16-1-default, but they point to non-existing modules in 5.10.9-1-default.
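The situation described in comment #9 (weak-updates symlinks pointing at modules that no longer exist) can be checked for without changing anything. This is a read-only sketch; the function name and the optional root argument are hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: print every symlink below <root>/*/weak-updates
# whose target no longer exists, in "link -> target" form.
# $1: modules root (assumed default /lib/modules)
dangling_weak_updates() {
    local root=${1:-/lib/modules}
    local link
    # -type l selects symlinks; "[ -e ]" follows the link, so it fails
    # when the target is missing.
    find "$root"/*/weak-updates -type l 2>/dev/null | sort |
    while IFS= read -r link; do
        [ -e "$link" ] || printf '%s -> %s\n' "$link" "$(readlink "$link")"
    done
}
```

An empty result means that all weak-updates links resolve, or that no weak-updates directories exist at all.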
http://bugzilla.opensuse.org/show_bug.cgi?id=1182666 http://bugzilla.opensuse.org/show_bug.cgi?id=1182666#c10 --- Comment #10 from Axel Braun <axel.braun@gmx.de> --- (In reply to Stefan Dirsch from comment #9)
> You're using
>
> kernel 5.10.16-1-default
>
> Old kernel modules are still available in the 5.10.14-1-default directory.
> weak-updates symlinks are available for 5.10.16-1-default but are symlinking
> to non-existing modules in 5.10.9-1-default.

Yes, I have seen this as well, but the question is: how come? You have seen the installation log, where it compiled the kernels, but obviously it did not put them into the right place.
http://bugzilla.opensuse.org/show_bug.cgi?id=1182666#c11

--- Comment #11 from Stefan Dirsch <sndirsch@suse.com> ---
# rpm --triggers -qp nvidia-gfxG05-kmp-default-460.39_k5.10.9_1-34.1.x86_64.rpm
[...]
triggerpostun scriptlet (using /bin/sh) -- kernel-default
for dir in $(find /lib/modules -mindepth 1 -maxdepth 1 -type d); do
    if [ ! -d $dir/kernel ]; then
        test -d $dir/updates && rm -f $dir/updates/nvidia*.ko
    fi
done

Maybe checking for /lib/modules/5.10.16-1-default/kernel no longer works and this file doesn't exist although 5.10.16-1-default is the currently installed and running kernel.
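To see which files the selection logic of the scriptlet in comment #11 would pick, a dry run can help. The sketch below mirrors that logic but only prints paths and never deletes. The function name is hypothetical, and it iterates with a shell glob rather than `find`, so hidden directories and a few edge cases are treated differently from the real scriptlet.

```sh
#!/bin/sh
# Hypothetical dry run: list the nvidia*.ko files that live in module
# trees without a kernel/ subdirectory, i.e. the files the scriptlet's
# condition would select for removal. Nothing is deleted here.
# $1: modules root (assumed default /lib/modules)
would_remove_nvidia_modules() {
    local root=${1:-/lib/modules}
    local dir module
    for dir in "$root"/*/; do
        dir=${dir%/}
        [ -d "$dir" ] || continue
        # A tree without kernel/ is treated as belonging to a kernel
        # that is no longer installed.
        if [ ! -d "$dir/kernel" ] && [ -d "$dir/updates" ]; then
            for module in "$dir"/updates/nvidia*.ko; do
                if [ -e "$module" ] || [ -L "$module" ]; then
                    printf '%s\n' "$module"
                fi
            done
        fi
    done
}
```

If the tree of the running kernel shows up in the output, the `kernel/` check is the likely culprit; if it does not, the modules were removed by something else.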
http://bugzilla.opensuse.org/show_bug.cgi?id=1182666 http://bugzilla.opensuse.org/show_bug.cgi?id=1182666#c12 --- Comment #12 from Stefan Dirsch <sndirsch@suse.com> --- (In reply to Axel Braun from comment #10)
> Yes, seen this as well, but question is - how comes? You have seen the
> installation log, where it compiled the kernels, but obviously did not put it
> into the right place

Or removed it again right away. See comment #11.
http://bugzilla.opensuse.org/show_bug.cgi?id=1182666 http://bugzilla.opensuse.org/show_bug.cgi?id=1182666#c13 --- Comment #13 from Axel Braun <axel.braun@gmx.de> --- (In reply to Stefan Dirsch from comment #11)
> Maybe checking for /lib/modules/5.10.16-1-default/kernel no longer works and
> this file doesn't exist although 5.10.16-1-default is the currently installed
> and running kernel.
The directory /lib/modules/5.10.16-1-default/kernel exists and looks OK. Surprisingly there is still a bunch of modules from the day onwards where I set-up the system (starting at 5.5.x kernels...) - this is not automatically cleaned up? Anything else I can check/lookup/try? -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.opensuse.org/show_bug.cgi?id=1182666 http://bugzilla.opensuse.org/show_bug.cgi?id=1182666#c14 --- Comment #14 from Stefan Dirsch <sndirsch@suse.com> --- Ok. Then I don't know why the modules could have been removed immediately again right after build and installation. -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.opensuse.org/show_bug.cgi?id=1182666#c15

Michael Hirmke <opensuse@mike.franken.de> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |opensuse@mike.franken.de

--- Comment #15 from Michael Hirmke <opensuse@mike.franken.de> ---
Not sure if this is the same problem, but: On Leap 15.2 I currently have kernel 5.3.18-lp152.63-default. When adding the NVidia repo and installing nvidia-glG05, the files can be found in 5.3.18-lp152.19-default/updates afterwards - and nothing was copied or linked to 5.3.18-lp152.63-default. So no module is loaded on reboot. After manually copying or linking the modules to 5.3.18-lp152.63-default/updates and running "depmod -a", the modules are loaded on the next reboot.
http://bugzilla.opensuse.org/show_bug.cgi?id=1182666#c16

--- Comment #16 from Michael Hirmke <opensuse@mike.franken.de> ---
ls -l 5.3.18-lp152.19-default
total 36
-rw-r--r-- 1 root root  271 Feb 25 13:07 modules.alias
-rw-r--r-- 1 root root  213 Feb 25 13:07 modules.alias.bin
-rw-r--r-- 1 root root    0 Feb 25 13:07 modules.builtin.bin
-rw-r--r-- 1 root root  172 Feb 25 13:07 modules.dep
-rw-r--r-- 1 root root  321 Feb 25 13:07 modules.dep.bin
-rw-r--r-- 1 root root    0 Feb 25 13:07 modules.devname
-rw-r--r-- 1 root root   55 Feb 25 13:07 modules.softdep
-rw-r--r-- 1 root root 3768 Feb 25 13:07 modules.symbols
-rw-r--r-- 1 root root 4321 Feb 25 13:07 modules.symbols.bin
drwxr-xr-x 2 root root 4096 Feb 25 13:07 updates

ls -l 5.3.18-lp152.19-default/updates/
total 100944
-rw-r--r-- 1 root root  5772760 Feb 25 13:07 nvidia-drm.ko
-rw-r--r-- 1 root root  2246168 Feb 25 13:07 nvidia-modeset.ko
-rw-r--r-- 1 root root 42845424 Feb 25 13:07 nvidia-uvm.ko
-rw-r--r-- 1 root root 52490664 Feb 25 13:07 nvidia.ko

So everything has been newly created during installation of nvidia-glG05.
http://bugzilla.opensuse.org/show_bug.cgi?id=1182666 http://bugzilla.opensuse.org/show_bug.cgi?id=1182666#c17 --- Comment #17 from Stefan Dirsch <sndirsch@suse.com> --- (In reply to Michael Hirmke from comment #15)
> Not sure if this is the same problem, but:
> On Leap 15.2 I actually have kernel 5.3.18-lp152.63-default.
> When adding the NVidia repo and installing nvidia-glG05, the files can be
> found in 5.3.18-lp152.19-default/updates afterwards

This is correct behaviour.

> - and nothing was copied or linked to 5.3.18-lp152.63-default.

That's an issue. weak-updates(2) should have created symlinks from 5.3.18-lp152.63-default/weak-updates to 5.3.18-lp152.19-default/updates.

> So no module is loaded on reboot.
> After manually copying or linking the modules to
> 5.3.18-lp152.63-default/updates and running "depmod -a" the modules are
> loaded on next reboot.

Yes, good workaround.
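The manual workaround from comment #15 (linking the modules into the running kernel's updates directory, then running "depmod -a") could be scripted roughly as follows. This is a sketch under assumptions: the function name and its three arguments are hypothetical, it creates symlinks rather than copies, and it leaves the depmod step to the caller.

```sh
#!/bin/sh
# Hypothetical helper: symlink the NVIDIA modules built for one kernel
# release into the updates/ directory of another release.
# $1: modules root, $2: release that has the modules, $3: release that lacks them
link_nvidia_modules() {
    local root=$1 src=$2 dst=$3
    local module
    mkdir -p "$root/$dst/updates" || return 1
    for module in "$root/$src"/updates/nvidia*.ko; do
        # Skip the unexpanded pattern when nothing matches.
        [ -f "$module" ] || continue
        ln -sf "$module" "$root/$dst/updates/"
    done
}
```

On the Leap system from comment #15 this would be run as root, for example `link_nvidia_modules /lib/modules 5.3.18-lp152.19-default 5.3.18-lp152.63-default`, followed by `depmod -a 5.3.18-lp152.63-default` so the module index for that kernel is rebuilt.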
http://bugzilla.opensuse.org/show_bug.cgi?id=1182666#c18

--- Comment #18 from Stefan Dirsch <sndirsch@suse.com> ---
Created attachment 846518
  --> http://bugzilla.opensuse.org/attachment.cgi?id=846518&action=edit
I can't reproduce this issue on my TW installation

I've updated all the kernel packages

  kernel-default
  kernel-default-devel
  kernel-devel

from 5.9.1.-1 to 5.10.16-1 and kernel modules are installed to the correct directory and are working. (We are not working with weak-updates on TW since kernels are not considered compatible.)
http://bugzilla.opensuse.org/show_bug.cgi?id=1182666 Stefan Dirsch <sndirsch@suse.com> changed: What |Removed |Added ---------------------------------------------------------------------------- Assignee|axel.braun@gmx.de |sndirsch@suse.com -- You are receiving this mail because: You are on the CC list for the bug.
Stefan Dirsch <sndirsch@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|TW 20210222 - Update broken |TW 20210222 - Kernel update
                   |for NVIDIA                  |breaks NVIDIA kernel
                   |                            |modules
http://bugzilla.opensuse.org/show_bug.cgi?id=1182666 http://bugzilla.opensuse.org/show_bug.cgi?id=1182666#c19 --- Comment #19 from Michael Hirmke <opensuse@mike.franken.de> --- I've retried it several times on Leap - no weak-update mechanism! -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.opensuse.org/show_bug.cgi?id=1182666#c20

--- Comment #20 from Axel Braun <axel.braun@gmx.de> ---
After cleaning /lib/modules/ from some old 5.5, 5.6 etc. versions, I triggered a reinstall from the command line for

  nvidia-gfxG05-kmp-default | 460.39_k5.10.9_1-34.1
  nvidia-glG05              | 460.39-34.1
  nvidia-computeG05         | 460.39-34.1
  x11-video-nvidiaG05       | 460.39-34.1

Which brought back the modules:

X1E:/home/docb # find /lib/modules -name nvidia*.ko -exec ls -l {} \;
-rw-r--r-- 1 root root  3746408 16. Feb 15:12 /lib/modules/5.10.14-1-default/updates/nvidia-drm.ko
-rw-r--r-- 1 root root  2188704 16. Feb 15:12 /lib/modules/5.10.14-1-default/updates/nvidia-modeset.ko
-rw-r--r-- 1 root root 40549672 16. Feb 15:12 /lib/modules/5.10.14-1-default/updates/nvidia-uvm.ko
-rw-r--r-- 1 root root 49432264 16. Feb 15:12 /lib/modules/5.10.14-1-default/updates/nvidia.ko
-rw-r--r-- 1 root root  3746392 25. Feb 16:20 /lib/modules/5.10.16-1-default/updates/nvidia-drm.ko
-rw-r--r-- 1 root root  2188840 25. Feb 16:20 /lib/modules/5.10.16-1-default/updates/nvidia-modeset.ko
-rw-r--r-- 1 root root 40550216 25. Feb 16:20 /lib/modules/5.10.16-1-default/updates/nvidia-uvm.ko
-rw-r--r-- 1 root root 49432392 25. Feb 16:20 /lib/modules/5.10.16-1-default/updates/nvidia.ko

and no weak-updates...

This did not solve the root cause of the issue, but at least I can work again...
http://bugzilla.opensuse.org/show_bug.cgi?id=1182666#c21

--- Comment #21 from Stefan Dirsch <sndirsch@suse.com> ---
Ok. So the behaviour of nvidia packages is fine. As mentioned before, TW doesn't use the weak-updates mechanism. Just updating the kernel alone doesn't work as expected for you (TW) and for Michael (Leap), but it does for me (TW) and apparently also for others (TW and Leap IIRC), as I've seen on the factory ML.
http://bugzilla.opensuse.org/show_bug.cgi?id=1182666 http://bugzilla.opensuse.org/show_bug.cgi?id=1182666#c22 --- Comment #22 from Axel Braun <axel.braun@gmx.de> --- Then it is interesting (probably) to learn where these weak-update entries come from. This machine was set-up last May with TW and keeps rolling since then. In the ML are some more users that had issues with *this* update and fixed it by re-installation -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.opensuse.org/show_bug.cgi?id=1182666 http://bugzilla.opensuse.org/show_bug.cgi?id=1182666#c23 --- Comment #23 from Michael Hirmke <opensuse@mike.franken.de> --- (In reply to Stefan Dirsch from comment #21)
> Ok. So the behaviour of nvidia packages is fine. As mentioned before TW
> doesn't use weak-updates mechanism. Just updating the kernel alone doesn't
> work as expected for you (TW) and for Michael (Leap) (but it does for me (TW)
> and apparently also for others (TW and Leap IIRC) as I've seen on the factory
> ML).

Guess I found the reason. After installing again, I saw:

  Warning: /lib/modules/5.3.18-lp152.63-default is inconsistent
  Warning: weak-updates symlinks might not be created

This probably happens because I already have an updates directory in it, which contains drivers for my DV card.
http://bugzilla.opensuse.org/show_bug.cgi?id=1182666#c24

--- Comment #24 from Michael Hirmke <opensuse@mike.franken.de> ---
After moving my updates directory out of the way, a reinstall of the nvidia drivers gave the desired results:

ls -la weak-updates/updates/
total 12
drwxr-xr-x 2 root root 4096 Feb 26 20:25 .
drwxr-xr-x 3 root root 4096 Feb 26 20:25 ..
lrwxrwxrwx 1 root root   58 Feb 26 20:25 nvidia-drm.ko -> /lib/modules/5.3.18-lp152.19-default/updates/nvidia-drm.ko
lrwxrwxrwx 1 root root   62 Feb 26 20:25 nvidia-modeset.ko -> /lib/modules/5.3.18-lp152.19-default/updates/nvidia-modeset.ko
lrwxrwxrwx 1 root root   58 Feb 26 20:25 nvidia-uvm.ko -> /lib/modules/5.3.18-lp152.19-default/updates/nvidia-uvm.ko
lrwxrwxrwx 1 root root   54 Feb 26 20:25 nvidia.ko -> /lib/modules/5.3.18-lp152.19-default/updates/nvidia.ko

So the question is how to handle this combination with own drivers and the weak-update mechanism of the nvidia drivers? Just copying my drivers to weak-updates/updates? Or is there a better way?
http://bugzilla.opensuse.org/show_bug.cgi?id=1182666 http://bugzilla.opensuse.org/show_bug.cgi?id=1182666#c25 --- Comment #25 from Stefan Dirsch <sndirsch@suse.com> --- Ok. But I did not understand where your DV card modules are located exactly. You should not copy anything below weak-updates dir. The KMP or the kernel itself creates symlinks below this directory to compatible modules in different kernel module trees. -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.opensuse.org/show_bug.cgi?id=1182666 http://bugzilla.opensuse.org/show_bug.cgi?id=1182666#c26 --- Comment #26 from Michael Hirmke <opensuse@mike.franken.de> --- (In reply to Stefan Dirsch from comment #25)
> Ok. But I did not understand where your DV card modules are located exactly.
>
> You should not copy anything below weak-updates dir. The KMP or the kernel
> itself creates symlinks below this directory to compatible modules in
> different kernel module trees.
I copied them to /lib/modules/<ver>/updates/media/... with <ver> being the actual kernel version. Btw. - it should read DVB card. This is a TechnoTrend S2-6400, where no official drivers are available. -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.opensuse.org/show_bug.cgi?id=1182666 http://bugzilla.opensuse.org/show_bug.cgi?id=1182666#c27 --- Comment #27 from Stefan Dirsch <sndirsch@suse.com> --- Hmm. This shouldn't be a problem for weak-updates, but I'm not familiar with this /usr/lib/module-init-tools/weak-modules2 script. -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.opensuse.org/show_bug.cgi?id=1182666 http://bugzilla.opensuse.org/show_bug.cgi?id=1182666#c29 Stefan Dirsch <sndirsch@suse.com> changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |guenter.stoehr@gs-consult-a | |c.de --- Comment #29 from Stefan Dirsch <sndirsch@suse.com> --- *** Bug 1183480 has been marked as a duplicate of this bug. *** -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.opensuse.org/show_bug.cgi?id=1182666#c30

B <kerossin@pm.me> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |kerossin@pm.me

--- Comment #30 from B <kerossin@pm.me> ---
I believe I encountered the same problem when upgrading to snapshot 20210315. kernel-default changed from 5.11.4-1.2 to 5.11.4-1.3. /lib/modules/5.11.4-1-default/updates/ usually contains the kernel modules, but after the upgrade this directory is empty and there's a new directory /lib/modules/5.11.4-1-default/weak-updates/updates/ which has broken links:

nvidia-drm.ko -> /lib/modules/5.10.16-1-default/updates/nvidia-drm.ko
nvidia.ko -> /lib/modules/5.10.16-1-default/updates/nvidia.ko
nvidia-modeset.ko -> /lib/modules/5.10.16-1-default/updates/nvidia-modeset.ko
nvidia-uvm.ko -> /lib/modules/5.10.16-1-default/updates/nvidia-uvm.ko

I don't have any 5.10 kernels installed anymore. There are just old dirs left over in /lib/modules, but they don't have the modules anymore, so I don't know why it tries to link to them.

Also, this bug seems to happen when an existing kernel is reinstalled because the SUSE patch number gets bumped. In this case the upstream kernel version 5.11.4 stayed the same, but the SUSE patch number went from 1.2 to 1.3, so the kernel was reinstalled. I believe the same thing happened with 20210222 when Axel reported this: the kernel remained 5.10.16 but the SUSE patch number also changed 1.2->1.3.
http://bugzilla.opensuse.org/show_bug.cgi?id=1182666 http://bugzilla.opensuse.org/show_bug.cgi?id=1182666#c31 --- Comment #31 from B <kerossin@pm.me> --- Also, I don't think I encountered this issue with snapshot 20210222 and kernel 5.10.16 because I skipped it and updated later when 5.11 was out. -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.opensuse.org/show_bug.cgi?id=1182666 http://bugzilla.opensuse.org/show_bug.cgi?id=1182666#c32 --- Comment #32 from Stefan Dirsch <sndirsch@suse.com> --- Thanks. Sounds like an interesting observation. So kernels with same version number but different patchlevel 1.2/1.3 result in the same kernel modules path. Maybe I can reproduce the issue with such kernels now. -- You are receiving this mail because: You are on the CC list for the bug.
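The observation in comments #30 and #32 can be made concrete with a small helper. It is a sketch that assumes the directory name is built from the package version, the part of the release before its first dot, and the flavor; that rule is inferred only from the examples in this thread (5.11.4-1.2 and 5.11.4-1.3 both using /lib/modules/5.11.4-1-default), and the function name is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: derive the module directory from a kernel package
# version-release string such as 5.11.4-1.2.
# $1: version-release, $2: flavor (assumed default "default")
kernel_module_dir() {
    local verrel=$1 flavor=${2:-default}
    local version release
    version=${verrel%%-*}      # 5.11.4
    release=${verrel#*-}       # 1.2
    release=${release%%.*}     # 1 (drops the rebuild counter)
    printf '/lib/modules/%s-%s-%s\n' "$version" "$release" "$flavor"
}
```

With this rule, `kernel_module_dir 5.11.4-1.2` and `kernel_module_dir 5.11.4-1.3` print the same path, which is why an update between the two replaces files in place instead of creating a new module tree.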
http://bugzilla.opensuse.org/show_bug.cgi?id=1182666 http://bugzilla.opensuse.org/show_bug.cgi?id=1182666#c33 Michael Pujos <pujos.michael@gmail.com> changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |pujos.michael@gmail.com --- Comment #33 from Michael Pujos <pujos.michael@gmail.com> --- I had this issue for the first time, updating a kernel just differing from the previous by the patch level. So you can bet that's it. -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.opensuse.org/show_bug.cgi?id=1182666#c34

--- Comment #34 from Stefan Dirsch <sndirsch@suse.com> ---
Hmm. After updating

  kernel-default
  kernel-default-devel
  kernel-devel

from 5.11.2-1 to 5.11.4-1, kernel modules have been built and installed and no dangling symlinks created. X works fine, modules are loaded and active.

# find . -name nvidia* | xargs ls -ld
drwxr-xr-x 2 root root     4096 Feb 24 23:42 ./5.10.16-1-default/kernel/drivers/net/ethernet/nvidia
drwxr-xr-x 2 root root     4096 Mar  8 11:15 ./5.11.2-1-default/kernel/drivers/net/ethernet/nvidia
-rw-r--r-- 1 root root  3761000 Mar 16 21:59 ./5.11.2-1-default/updates/nvidia-drm.ko
-rw-r--r-- 1 root root 49582696 Mar 16 21:59 ./5.11.2-1-default/updates/nvidia.ko
-rw-r--r-- 1 root root  2203392 Mar 16 21:59 ./5.11.2-1-default/updates/nvidia-modeset.ko
-rw-r--r-- 1 root root 40844136 Mar 16 21:59 ./5.11.2-1-default/updates/nvidia-uvm.ko
drwxr-xr-x 2 root root     4096 Mar 16 22:03 ./5.11.4-1-default/kernel/drivers/net/ethernet/nvidia
-rw-r--r-- 1 root root  3761000 Mar 16 22:06 ./5.11.4-1-default/updates/nvidia-drm.ko
-rw-r--r-- 1 root root 49582696 Mar 16 22:06 ./5.11.4-1-default/updates/nvidia.ko
-rw-r--r-- 1 root root  2203248 Mar 16 22:06 ./5.11.4-1-default/updates/nvidia-modeset.ko
-rw-r--r-- 1 root root 40844608 Mar 16 22:06 ./5.11.4-1-default/updates/nvidia-uvm.ko

Unfortunately I don't have any kernel packages available for testing with same release but different patch number ...
http://bugzilla.opensuse.org/show_bug.cgi?id=1182666 http://bugzilla.opensuse.org/show_bug.cgi?id=1182666#c35 --- Comment #35 from B <kerossin@pm.me> --- In the current TW OSS repo Zypper doesn't show 5.11.4-1.2 but I can see from the browser both 1.2 and 1.3. https://download.opensuse.org/tumbleweed/repo/oss/x86_64/kernel-default-5.11... -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.opensuse.org/show_bug.cgi?id=1182666 http://bugzilla.opensuse.org/show_bug.cgi?id=1182666#c36 --- Comment #36 from Stefan Dirsch <sndirsch@suse.com> --- Tried installing same kernel packages version 5.11.4-1 (rpm --force), but still same result. :-( -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.opensuse.org/show_bug.cgi?id=1182666 http://bugzilla.opensuse.org/show_bug.cgi?id=1182666#c37 --- Comment #37 from Stefan Dirsch <sndirsch@suse.com> --- I only see 5.11.4-1.3 in http://download.opensuse.org/tumbleweed/repo/oss/ TW repo. -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.opensuse.org/show_bug.cgi?id=1182666#c38

--- Comment #38 from Stefan Dirsch <sndirsch@suse.com> ---
Ok. I was able to download 5.11.4-1.2 and 5.11.4-1.3 packages and installed both. Result then is

# find /lib/modules/ -name nvidia*|xargs ls -ld
drwxr-xr-x 2 root root     4096 Feb 24 23:42 /lib/modules/5.10.16-1-default/kernel/drivers/net/ethernet/nvidia
drwxr-xr-x 2 root root     4096 Mar 16 23:01 /lib/modules/5.11.4-1-default/kernel/drivers/net/ethernet/nvidia
-rw-r--r-- 1 root root  3761000 Mar 16 23:01 /lib/modules/5.11.4-1-default/updates/nvidia-drm.ko
-rw-r--r-- 1 root root 49582696 Mar 16 23:01 /lib/modules/5.11.4-1-default/updates/nvidia.ko
-rw-r--r-- 1 root root  2203248 Mar 16 23:01 /lib/modules/5.11.4-1-default/updates/nvidia-modeset.ko
-rw-r--r-- 1 root root 40844608 Mar 16 23:01 /lib/modules/5.11.4-1-default/updates/nvidia-uvm.ko

So still works for me.
http://bugzilla.opensuse.org/show_bug.cgi?id=1182666 http://bugzilla.opensuse.org/show_bug.cgi?id=1182666#c39 --- Comment #39 from Stefan Dirsch <sndirsch@suse.com> --- Of course I first installed 5.11.4-1.2 and then later 5.11.4-1.3 kernel packages. -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.opensuse.org/show_bug.cgi?id=1182666 http://bugzilla.opensuse.org/show_bug.cgi?id=1182666#c40 --- Comment #40 from Stefan Dirsch <sndirsch@suse.com> --- But anyway I kind of doubt it's really a good idea having two kernels installed using the same kernel modules directory, i.e. with a massive number of file conflicts. Since both use /lib/modules/5.11.4-1-default ... -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.opensuse.org/show_bug.cgi?id=1182666#c41

--- Comment #41 from B <kerossin@pm.me> ---
I did some testing myself: removed all kernel 5.11.4 packages, installed 5.11.4-1.2 from RPMs, then installed 1.3 from the repo, but couldn't reproduce the problem.

One observation I made is that when kernel-default is installed, the broken links in weak-updates/updates/ are generated. Later, when kernel-default-devel is being installed, those links are removed and the proper modules are generated in updates/. It looks like the packages are always installed in the same correct order:

1. kernel-default
2. kernel-devel
3. kernel-default-devel

so a random installation order doesn't seem to be a possible cause. There seems to be some other weird, fairly rare fault that happens during kernel-default-devel installation.
http://bugzilla.opensuse.org/show_bug.cgi?id=1182666#c42

Paolo Stivanin <pstivanin@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |pstivanin@suse.com

--- Comment #42 from Paolo Stivanin <pstivanin@suse.com> ---
I was on 20210307, all good. Then updated to 20210312 and drivers were not loaded. Also with 20210315 they weren't being loaded.

$ find /lib/modules/ -name nvidia*|xargs ls -ld
drwxr-xr-x 1 root root       30 Mar  8 17:59 /lib/modules/5.11.2-1-default/kernel/drivers/net/ethernet/nvidia
-rw-r--r-- 1 root root  3761168 Mar  8 18:03 /lib/modules/5.11.2-1-default/updates/nvidia-drm.ko
-rw-r--r-- 1 root root 49582648 Mar  8 18:03 /lib/modules/5.11.2-1-default/updates/nvidia.ko
-rw-r--r-- 1 root root  2203328 Mar  8 18:03 /lib/modules/5.11.2-1-default/updates/nvidia-modeset.ko
-rw-r--r-- 1 root root 40844304 Mar  8 18:03 /lib/modules/5.11.2-1-default/updates/nvidia-uvm.ko
drwxr-xr-x 1 root root       30 Mar 16 16:32 /lib/modules/5.11.4-1-default/kernel/drivers/net/ethernet/nvidia
lrwxrwxrwx 1 root root       52 Mar 16 16:32 /lib/modules/5.11.4-1-default/weak-updates/updates/nvidia-drm.ko -> /lib/modules/5.10.16-1-default/updates/nvidia-drm.ko
lrwxrwxrwx 1 root root       48 Mar 16 16:32 /lib/modules/5.11.4-1-default/weak-updates/updates/nvidia.ko -> /lib/modules/5.10.16-1-default/updates/nvidia.ko
lrwxrwxrwx 1 root root       56 Mar 16 16:32 /lib/modules/5.11.4-1-default/weak-updates/updates/nvidia-modeset.ko -> /lib/modules/5.10.16-1-default/updates/nvidia-modeset.ko
lrwxrwxrwx 1 root root       52 Mar 16 16:32 /lib/modules/5.11.4-1-default/weak-updates/updates/nvidia-uvm.ko -> /lib/modules/5.10.16-1-default/updates/nvidia-uvm.ko

I then proceeded with reinstalling all nvidia related packages, and that did solve the issue for me
http://bugzilla.opensuse.org/show_bug.cgi?id=1182666 http://bugzilla.opensuse.org/show_bug.cgi?id=1182666#c43 --- Comment #43 from Stefan Dirsch <sndirsch@suse.com> --- (In reply to B from comment #41)
I did some testing myself, removed all kernel 5.11.4 packages, installed 5.11.4-1.2 from RPMs then installed 1.3 from repo but couldn't reproduce the problem.
One observation I made is that when kernel-default is installed the broken links in weak-updates/updates/ are generated, then later when kernel-default-devel is being installed those links are removed and the proper modules are generated in updates/ and it looks like the packages are always installed in the same correct order: 1. kernel-default 2. kernel-devel 3. kernel-default-devel so cause from random installation order doesn't seem possible. There seems to be some other weird fairly rare fault during kernel-default-devel installation that happens.
Actually this makes sense. On TW, the rebuild of the nvidia kernel module is triggered by an update of kernel-default-devel, and at that time broken weak-updates symlinks are removed as well:

    # get rid of broken weak-updates symlinks created in some %post apparently;
    # either by kmp itself or by kernel package update
    for i in $(find /lib/modules/*/weak-updates -type l 2> /dev/null); do
      test -e $(readlink -f $i) || rm $i
    done
    [...]
    (build and install kernel module)

So it seems users are updating kernel-default, but not kernel-default-devel. I'm not sure why. kernel-default-devel is simply needed to (re)build the kernel module.
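For anyone who wants to check their own system before touching anything, here is a minimal sketch of the same dangling-symlink test in a read-only form. The function name list_broken_weak_updates and the optional module-root argument are made up for this example and are not part of the nvidia package scripts; it only prints the links the cleanup loop would delete.

```shell
#!/bin/sh
# Hypothetical helper (not from the nvidia KMP scripts): print every symlink
# below <modroot>/*/weak-updates whose target no longer exists.
# Nothing is removed; this is a dry-run variant of the cleanup loop.
list_broken_weak_updates() {
    modroot=${1:-/lib/modules}
    find "$modroot"/*/weak-updates -type l 2>/dev/null |
    while IFS= read -r link; do
        # test -e follows the symlink, so it fails for a dangling link
        [ -e "$link" ] || printf '%s\n' "$link"
    done
}
```

Calling list_broken_weak_updates with no argument inspects /lib/modules; passing a directory lets you try it on a scratch copy first.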
http://bugzilla.opensuse.org/show_bug.cgi?id=1182666 http://bugzilla.opensuse.org/show_bug.cgi?id=1182666#c44 --- Comment #44 from Axel Braun <axel.braun@gmx.de> --- (In reply to Stefan Dirsch from comment #43)
So it seems users are updating kernel-default, but not kernel-default-devel.
Hm, zypper dup should resolve/force this. Maybe it does not?
http://bugzilla.opensuse.org/show_bug.cgi?id=1182666 http://bugzilla.opensuse.org/show_bug.cgi?id=1182666#c45 --- Comment #45 from B <kerossin@pm.me> --- I used zypper dup when I encountered this problem. Also, I found there are Zypper logs at /var/log/zypp/history. I pulled out all kernel* package changes from that run: 2021-03-16 16:16:00|command|root@kero-pc|'zypper' 'dup' '-l'| 2021-03-16 16:16:12|install|kernel-macros|5.11.4-1.3|noarch||download.opensuse.org-oss|094c2698eee92f5e2ed2141b24402d723312363176761fb7b9051bcc6ed708b3| 2021-03-16 16:16:21|install|kernel-devel|5.11.4-1.3|noarch||download.opensuse.org-oss|2a2fef6ba3147345f27515e6a1ee6a250b35fd161749fa257b244d04887b7f6e| 2021-03-16 16:16:24|remove |kernel-source|5.11.4-1.2|noarch|| 2021-03-16 16:16:25|remove |kernel-devel|5.11.4-1.2|noarch|| 2021-03-16 16:16:25|remove |kernel-default-devel|5.11.4-1.2|x86_64|| 2021-03-16 16:16:55|install|kernel-source|5.11.4-1.3|noarch||download.opensuse.org-oss|3f88e7ef0ccc6967b56ffb11a7b0a46eda8869820b7f2067acb41922fe3124a5| # 2021-03-16 16:18:28 kernel-default-devel-5.11.4-1.3.x86_64.rpm installed ok # Additional rpm output: # /usr/src/kernel-modules/nvidia-460.56-default / *** removed all the make output when compiling modules *** 2021-03-16 16:18:28|install|kernel-default-devel|5.11.4-1.3|x86_64||download.opensuse.org-oss|e7d65e28cc78c8a2224581b1a08f804ef1943353ba7864b7fd670fb1c55d8b7e| 2021-03-16 16:18:33|remove |kernel-default|5.11.4-1.2|x86_64|| 2021-03-16 16:18:38|install|kernel-syms|5.11.4-1.3|x86_64||download.opensuse.org-oss|5ed22f7ab2fd92cab01455afe73e89d2f57256673ac3459360498f34a045b9d0| 2021-03-16 16:19:12|install|kernel-default|5.11.4-1.3|x86_64||download.opensuse.org-oss|d1fa5e4c267fe3bb23f02258b940b4b6060f873ab480cee7a93c131258653abb| Now when I was testing the updates yesterday, on the last few runs I checked the kernel package updates either by zypper in and specifying the packages or by zypper dup were handled in the correct order - that is 
kernel-default is installed first and later kernel-default-devel is installed. But according to the timestamps, the logs from the original zypper dup (which produced the problem) say that kernel-default-devel was installed first, "2021-03-16 16:18:28|install|kernel-default-devel|5.11.4-1.3" (and there's output of it actually compiling the modules), then the old kernel-default was removed, "2021-03-16 16:18:33|remove |kernel-default|5.11.4-1.2|x86_64||", and then the new one was installed, "2021-03-16 16:19:12|install|kernel-default|5.11.4-1.3".

I ran a test just now:

1) Uninstalled the kernel-default and kernel-default-devel 5.11.4 packages.
2) sudo zypper in --oldpackage kernel-default-5.11.4-1.2.x86_64.rpm kernel-default-devel-5.11.4-1.2.x86_64.rpm
   Installed -1.2 from RPMs; modules compiled and in the updates/ dir, no broken links.
3) sudo zypper in kernel-default-devel
   Upgraded ONLY kernel-default-devel to -1.3 from the repo; again modules compiled and where they need to be.
4) sudo zypper in kernel-default
   Now upgraded the kernel to -1.3 from the repo; updates/ is empty and the broken links in weak-updates/updates/ are present.

So for some reason zypper dup sometimes installs kernel packages in the wrong order when there are a lot of packages. Is it possible that another package depends on kernel-default-devel and pulls it in to be installed before kernel-default?
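Pulling the kernel entries out of /var/log/zypp/history by hand is tedious, so here is a rough sketch of how the ordering could be extracted. It assumes the pipe-separated layout visible in the excerpts in this comment (timestamp|action|name|version|...); the function name kernel_events is invented for the example.

```shell
#!/bin/sh
# Hypothetical helper: print install/remove events for kernel-* packages
# in the order they appear in a zypp history file.
# Assumes the "timestamp|action|name|version|..." layout seen in the thread.
kernel_events() {
    history=${1:-/var/log/zypp/history}
    awk -F'|' '
        /^#/ { next }                       # skip "Additional rpm output" lines
        {
            action = $2
            gsub(/ /, "", action)           # the log pads "remove " with a space
            if ((action == "install" || action == "remove") && $3 ~ /^kernel-/)
                print $1, action, $3, $4
        }' "$history"
}
```

Run as root (the history file is usually not world-readable), e.g. kernel_events | tail -n 20, and compare the position of kernel-default-devel against the remove/install pair of kernel-default.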
http://bugzilla.opensuse.org/show_bug.cgi?id=1182666#c46
--- Comment #46 from Guenter Stoehr <guenter.stoehr@gs-consult-ac.de> ---
Created attachment 847339
  --> http://bugzilla.opensuse.org/attachment.cgi?id=847339&action=edit
2021-03-11_zypper-history.log and 2021-03-15_zypper-history.log

Find attached 2 extracts of /var/log/zypp/history from the PC of related bug 1183480:

1. 2021-03-11_zypper-history.log: Updating Leap 15.2 from 5.3.18-lp152.63-default to 5.3.18-lp152.66-default. The nvidia drivers don't work.
2. 2021-03-15_zypper-history.log: Fixing the problem by re-installing "kernel-default-devel" first (didn't work) and then "kernel-devel" second. Then it worked.

Hope this has some use... :)
http://bugzilla.opensuse.org/show_bug.cgi?id=1182666 http://bugzilla.opensuse.org/show_bug.cgi?id=1182666#c47 --- Comment #47 from B <kerossin@pm.me> ---
rpm -q --whatrequires kernel-default
no package requires kernel-default

rpm -q --whatrequires kernel-default-devel
nvidia-gfxG05-kmp-default-460.56_k5.10.16_1-35.1.x86_64
kernel-syms-5.11.4-1.2.x86_64
kernel-syms-5.11.4-1.3.x86_64

rpm -q --requires kernel-syms-5.11.4-1.3.x86_64
kernel-default-devel = 5.11.4-1
kernel-devel = 5.11.4-1

rpm -q --requires kernel-default-devel-5.11.4-1.3.x86_64
kernel-devel = 5.11.4-1
There seems to be no strict requirement for kernel-default to be installed before kernel-default-devel, so it must have been a coincidence that the order was correct the few times I looked at it while installing.

When kernel-default is uninstalled, it cleans out the /lib/modules/$kernel/updates/ dir where the modules are. So with a minor patch version bump to 5.11.4-1.3, the older package 5.11.4-1.2 first needs to be removed, and if you get unlucky:

1) kernel-default-devel-5.11.4-1.3 is installed
2) kernel-default-5.11.4-1.2 is removed
3) kernel-default-5.11.4-1.3 is installed

you are now left without those extra kernel modules.

Some possible fixes I see:

1) Modify the kernel-default uninstall script to not clean out /lib/modules/$kernel/updates/. This is probably a lazy and bad idea because it will leave junk behind when you normally uninstall kernels.
2) Modify the kernel-default uninstall script so that it somehow knows that what is happening is not a regular uninstall but a minor patch upgrade, and does not delete those kernel modules.
3) Make it so that minor patch updates (5.11.4-1.2->5.11.4-1.3) work like regular updates (for example 5.11.4->5.11.6), where kernels are not replaced but installed alongside each other and use different directories.

Basically every idea involves modifications to the kernel packages, not Nvidia, and I have no idea how easy it would be to implement those.
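To see quickly whether a system has ended up in the "left without modules" state described here, something like the following sketch could be used. It is only an illustration: the function name report_nvidia_modules is invented, and it assumes the modules are plain nvidia*.ko files under updates/, as in the listings earlier in this bug.

```shell
#!/bin/sh
# Hypothetical diagnostic: for every installed kernel (a directory below
# <modroot> that has a kernel/ subdir), report whether nvidia*.ko files
# exist in its updates/ directory.
report_nvidia_modules() {
    modroot=${1:-/lib/modules}
    for dir in "$modroot"/*/; do
        dir=${dir%/}
        # directories without kernel/ belong to no installed kernel package
        [ -d "$dir/kernel" ] || continue
        if ls "$dir"/updates/nvidia*.ko >/dev/null 2>&1; then
            echo "$(basename "$dir"): nvidia modules present"
        else
            echo "$(basename "$dir"): nvidia modules MISSING"
        fi
    done
}
```

A kernel reported as MISSING right after an update would match the unlucky ordering from the steps above; reinstalling the nvidia KMP package (as others in this bug did) rebuilds the modules.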
http://bugzilla.opensuse.org/show_bug.cgi?id=1182666#c48
--- Comment #48 from Stefan Dirsch <sndirsch@suse.com> ---
Thanks a lot for your input, @B <kerossin@pm.me>!

To me your arguments make 100% sense. I vote for

3) Make it that minor patch updates (5.11.4-1.2->5.11.4-1.3) work like regular updates (for example 5.11.4->5.11.6) where kernels are not replaced but installed alongside each other and use different directories.

I would call it broken to use the same /lib/modules/<kernel-version-omitting-patch-version> directory for different kernel patch versions, especially when you are supposed to install both at the same time and switch between them for booting.
http://bugzilla.opensuse.org/show_bug.cgi?id=1182666 http://bugzilla.opensuse.org/show_bug.cgi?id=1182666#c49 --- Comment #49 from Stefan Dirsch <sndirsch@suse.com> --- (In reply to Stefan Dirsch from comment #48)
Thanks a lot for your input @B <kerossin@pm.me> !
To me your arguments make 100% sense. I vote for
3) Make it that minor patch updates (5.11.4-1.2->5.11.4-1.3) work like regular updates (for example 5.11.4->5.11.6) where kernels are not replaced but installed alongside each other and use different directories.
I would call it broken to use the same /lib/modules/<kernel-version-omitting-patch-version> directory for different kernel patchversions - especially when you are supposed to install both at the same time and switch between for booting.
But then kernel versions (uname -r) would also need to be 5.11.4-1.2 instead of 5.11.4-1 for 5.11.4-1.2 RPMs.
http://bugzilla.opensuse.org/show_bug.cgi?id=1182666#c50
--- Comment #50 from Stefan Dirsch <sndirsch@suse.com> ---
Hmm. I could not reproduce that the contents of /lib/modules/<kernel-version>/updates get removed when updating from kernel-default 5.11.4-1.2 to 5.11.4-1.3.
http://bugzilla.opensuse.org/show_bug.cgi?id=1182666 http://bugzilla.opensuse.org/show_bug.cgi?id=1182666#c51 --- Comment #51 from B <kerossin@pm.me> --- (In reply to Stefan Dirsch from comment #50)
Hmm. I could not reproduce that /lib/modules/<kernel-version>/updates contents gets removed when updating from kernel-default 5.11.4-1.2 to 5.11.4-1.3.
Did you first upgrade kernel-default-devel and only then kernel-default on a separate zypper run?
sudo rpm -eh -vv kernel-default-5.11.4-1.3.x86_64 *** removed part of output *** D: %postun(kernel-default-5.11.4-1.3.x86_64): waitpid(18562) rc 18562 status 0 D: Plugin: calling hook scriptlet_post in syslog plugin D: %triggerpostun(nvidia-gfxG05-kmp-default-460.56_k5.10.16_1-35.1.x86_64):
rpm -q --scripts nvidia-gfxG05-kmp-default-460.56_k5.10.16_1-35.1.x86_64
http://bugzilla.opensuse.org/show_bug.cgi?id=1182666 http://bugzilla.opensuse.org/show_bug.cgi?id=1182666#c52 --- Comment #52 from B <kerossin@pm.me> --- Did some more digging, looking at the rpm scripts in kernel-default there doesn't seem to be anything that would manipulate /lib/modules/$kernel/updates/ directory so I tried removing kernel-default with the rpm tool with verbose output: scriptlet start fdio: 2 writes, 172 total bytes in 0.000011 secs D: %triggerpostun(nvidia-gfxG05-kmp-default-460.56_k5.10.16_1-35.1.x86_64): execv(/bin/sh) pid 19832 D: Plugin: calling hook scriptlet_fork_post in prioreset plugin D: Plugin: calling hook scriptlet_fork_post in selinux plugin ++ find /lib/modules -mindepth 1 -maxdepth 1 -type d + for dir in $(find /lib/modules -mindepth 1 -maxdepth 1 -type d) + '[' '!' -d /lib/modules/5.10.12-1-default/kernel ']' + test -d /lib/modules/5.10.12-1-default/updates + rm -f '/lib/modules/5.10.12-1-default/updates/nvidia*.ko' + for dir in $(find /lib/modules -mindepth 1 -maxdepth 1 -type d) + '[' '!' -d /lib/modules/5.10.12-1khz/kernel ']' + for dir in $(find /lib/modules -mindepth 1 -maxdepth 1 -type d) + '[' '!' -d /lib/modules/5.10.14-1-default/kernel ']' + test -d /lib/modules/5.10.14-1-default/updates + rm -f '/lib/modules/5.10.14-1-default/updates/nvidia*.ko' + for dir in $(find /lib/modules -mindepth 1 -maxdepth 1 -type d) + '[' '!' -d /lib/modules/5.10.16-1-default/kernel ']' + test -d /lib/modules/5.10.16-1-default/updates + rm -f '/lib/modules/5.10.16-1-default/updates/nvidia*.ko' + for dir in $(find /lib/modules -mindepth 1 -maxdepth 1 -type d) + '[' '!' -d /lib/modules/5.11.2-1-default/kernel ']' + test -d /lib/modules/5.11.2-1-default/updates + rm -f '/lib/modules/5.11.2-1-default/updates/nvidia*.ko' + for dir in $(find /lib/modules -mindepth 1 -maxdepth 1 -type d) + '[' '!' -d /lib/modules/5.11.2-1khz/kernel ']' + for dir in $(find /lib/modules -mindepth 1 -maxdepth 1 -type d) + '[' '!' 
-d /lib/modules/5.11.4-1-default/kernel ']'
+ test -d /lib/modules/5.11.4-1-default/updates
+ rm -f /lib/modules/5.11.4-1-default/updates/nvidia-drm.ko /lib/modules/5.11.4-1-default/updates/nvidia.ko /lib/modules/5.11.4-1-default/updates/nvidia-modeset.ko /lib/modules/5.11.4-1-default/updates/nvidia-uvm.ko
+ for dir in $(find /lib/modules -mindepth 1 -maxdepth 1 -type d)
+ '[' '!' -d /lib/modules/5.11.4-1-1khz/kernel ']'
*** removed part of output ***

So it seems that when kernel-default is being removed, the post-uninstall script of nvidia-gfxG05-kmp-default is triggered. This looks like the right script:

postuninstall scriptlet (using /bin/sh):
flavor=default
if [ "$1" = 0 ] ; then
  # Avoid accidental removal of G<n+1> alternative (bnc#802624)
  if [ ! -f /usr/lib/nvidia/alternate-install-present-$flavor ]; then
    /usr/sbin/update-alternatives --remove alternate-install-present /usr/lib/nvidia/alternate-install-present-$flavor
    # get rid of *all* nvidia kernel modules when uninstalling package (boo#1180010)
    for dir in $(find /lib/modules -mindepth 1 -maxdepth 1 -type d); do
      test -d $dir/updates && rm -f $dir/updates/nvidia*.ko
    done
  fi
  # cleanup of bnc# 1000625
  rm -f /usr/lib/tmpfiles.d/nvidia-logind-acl-trick-G05.conf
fi

It seems clear that if the Nvidia kernel modules are installed before a minor patch version kernel "upgrade", there won't be any module files afterwards.

Also, if anyone tries to test this, be careful with the rpm command: even though I told it to erase a specific version of kernel-default, for some reason it removed all of them. Luckily I run some custom kernels and those weren't touched.
http://bugzilla.opensuse.org/show_bug.cgi?id=1182666#c53
--- Comment #53 from Stefan Dirsch <sndirsch@suse.com> ---
Hmm. I believe the behaviour is correct. You need to look for the %triggerpostun script, not the %postun script.

    # rpm --triggers -q nvidia-gfxG05-kmp-default
    triggerpostun scriptlet (using /bin/sh) -- kernel-default
    for dir in $(find /lib/modules -mindepth 1 -maxdepth 1 -type d); do
      if [ ! -d $dir/kernel ]; then
        test -d $dir/updates && rm -f $dir/updates/nvidia*.ko
      fi
    done

Modules in updates/ are only removed if there is no kernel/ subdir any longer, i.e. no kernel modules are installed any longer.
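The effect of this trigger can be tried out safely on a scratch directory instead of /lib/modules. The sketch below replays the same check (no kernel/ subdir means the nvidia modules in updates/ are deleted); the function name simulate_triggerpostun and the mandatory directory argument are invented for this example, and the loop uses a shell glob rather than find so that paths are quoted properly.

```shell
#!/bin/sh
# Hypothetical re-implementation of the %triggerpostun logic for experiments.
# The module root must be passed explicitly so that the real /lib/modules is
# never touched by accident.
simulate_triggerpostun() {
    modroot=${1:?usage: simulate_triggerpostun <scratch-module-root>}
    for dir in "$modroot"/*/; do
        dir=${dir%/}
        [ -d "$dir" ] || continue
        # same condition as the trigger: only clean up when kernel/ is gone
        if [ ! -d "$dir/kernel" ] && [ -d "$dir/updates" ]; then
            rm -f "$dir"/updates/nvidia*.ko
        fi
    done
    return 0
}
```

Creating two fake kernel directories under a mktemp -d directory, one with and one without kernel/, and then running the function shows which updates/ directory keeps its nvidia*.ko files.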
http://bugzilla.opensuse.org/show_bug.cgi?id=1182666#c54
--- Comment #54 from Stefan Dirsch <sndirsch@suse.com> ---
Also, during an update %triggerpostun is the last scriptlet to be executed.

https://docs.fedoraproject.org/en-US/packaging-guidelines/Scriptlets/#orderi...

At that time a /lib/modules/<kernel-version>/kernel/ directory should exist.
http://bugzilla.opensuse.org/show_bug.cgi?id=1182666 http://bugzilla.opensuse.org/show_bug.cgi?id=1182666#c55 --- Comment #55 from B <kerossin@pm.me> --- (In reply to Stefan Dirsch from comment #53)
Hmm. I believe the behaviour is correct. You need to look for %triggerpostun script, not the %postun script.
My bad, this was probably my first time seriously looking into triggers/scripts of an RPM. But yes, this looks correct, I just wanted to find out for myself how it happens and to confirm that.
# rpm --triggers -q nvidia-gfxG05-kmp-default triggerpostun scriptlet (using /bin/sh) -- kernel-default for dir in $(find /lib/modules -mindepth 1 -maxdepth 1 -type d); do if [ ! -d $dir/kernel ]; then test -d $dir/updates && rm -f $dir/updates/nvidia*.ko fi done
modules in updates/ are only removed if there is no kernel/ subdir any longer, i.e. no modules are longer installed.
Also during an update %triggerpostun is the last which is being executed.
https://docs.fedoraproject.org/en-US/packaging-guidelines/Scriptlets/#orderi...
At that time there should exist a /lib/modules/<kernel-version>/kernel/ directory.
Now this is where I think the problem is. In this particular situation, a minor kernel version bump seems to be handled neither like a regular kernel update nor like a normal package update. This "update" is basically a normal uninstall of the old version: there's no /lib/modules/$kernel/kernel/ directory anymore because it belongs to the kernel-default package, the triggerpostun then deletes those Nvidia modules, and only then does a normal installation of the new kernel-default begin.

So I think we're back to the same solutions, where something should be done on the kernel packaging side. I think it's potentially not just a problem with the Nvidia drivers; other packages that handle modules in a similar fashion could be affected as well.
http://bugzilla.opensuse.org/show_bug.cgi?id=1182666#c56
--- Comment #56 from Stefan Dirsch <sndirsch@suse.com> ---
No, that's just a package update. First the files of the new package are installed, then the files of the old package which are not part of the new package are uninstalled. Finally, the %triggerpostun of nvidia-gfxG05-kmp-default runs. At that point there is a kernel/ subdir. Check this out:

https://docs.fedoraproject.org/en-US/packaging-guidelines/Scriptlets/#orderi...
http://bugzilla.opensuse.org/show_bug.cgi?id=1182666 http://bugzilla.opensuse.org/show_bug.cgi?id=1182666#c57 --- Comment #57 from B <kerossin@pm.me> --- (In reply to Stefan Dirsch from comment #56)
No, that's just a package update. First the files of the new package are installed, then the files of the old package which are not part of the new package are uninstalled. At last the %triggerpostun of nvidia-gfxG05-kmp-default is running. At that point there is kernel/ subdir. Check this out.
https://docs.fedoraproject.org/en-US/packaging-guidelines/Scriptlets/#ordering
Yes, I read the link, but what I'm saying is that I observed it differently. When you run zypper dup, the summary for package updates usually says "The following X packages are going to be upgraded: ...", but with kernel-default 5.11.4-1.2 -> 5.11.4-1.3 it was:

"The following packages will be removed: kernel-default-5.11.4-1.2"
"The following packages will be installed: kernel-default-5.11.4-1.3"

So in this specific situation that ordering doesn't apply: a package gets removed, a new one gets installed, and that's how those Nvidia modules in /lib/modules/$kernel/updates/ get deleted.
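If this remove-then-install theory is right, affected runs should be recognisable in /var/log/zypp/history: within a single zypper command, kernel-default-devel is installed (and the modules built) before a kernel-default package is removed. A rough sketch of such a check follows. The function name risky_kernel_order is invented, the history layout is assumed to be the pipe-separated one from the excerpts in this bug, and a match only points at a suspicious run; it proves nothing about the cause.

```shell
#!/bin/sh
# Hypothetical check: flag zypper runs in which kernel-default was removed
# after kernel-default-devel had already been installed in the same run.
# Exit status is 0 if at least one such run is found, 1 otherwise.
risky_kernel_order() {
    history=${1:-/var/log/zypp/history}
    awk -F'|' '
        /^#/ { next }                                  # rpm output, not an event
        { action = $2; gsub(/ /, "", action) }         # "remove " is space-padded
        action == "command" { devel = "" }             # a new zypper run starts
        action == "install" && $3 == "kernel-default-devel" { devel = $1 }
        action == "remove" && $3 == "kernel-default" && devel != "" {
            print $1 ": kernel-default " $4 " removed after kernel-default-devel was installed at " devel
            found = 1
        }
        END { exit (found ? 0 : 1) }
    ' "$history"
}
```

On a system that showed the problem, running risky_kernel_order as root should list the timestamp of the affected update.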
http://bugzilla.opensuse.org/show_bug.cgi?id=1182666#c58
--- Comment #58 from Michael Pujos <pujos.michael@gmail.com> ---
Created attachment 847362
  --> http://bugzilla.opensuse.org/attachment.cgi?id=847362&action=edit
lizypp logs for 5.11.4-1.2 to 5.11.4-1.3 kernel update

I've extracted the libzypp logs for my update from 5.11.4-1.2 to 5.11.4-1.3 that caused this issue a few days ago.

At 15:15:38 the nvidia modules are compiled, and just after (15:15:44) |kernel-default|5.11.4-1.2| is removed.

I'm no expert in how exactly this is supposed to work, but this log shows the exact ordering of the install/uninstall of all kernel-* packages and the nvidia module compilation.
http://bugzilla.opensuse.org/show_bug.cgi?id=1182666#c59
--- Comment #59 from Michael Pujos <pujos.michael@gmail.com> ---
I also just updated to 5.11.6 without issue, so I'm pretty certain it is a kernel version patch-level issue.
# 2021-03-11 19:29:11 kernel-devel-5.3.18-lp152.66.2.noarch.rpm installed ok # Additional rpm output: # Changing symlink /usr/src/linux from linux-5.3.18-lp152.63 to
# Additional rpm output: # Changing symlink /usr/src/linux-obj/x86_64/default from ../../linux-5.3.18-lp152.63-obj/x86_64/default to ../../linux-5.3.18-lp152.66-obj/x86_64/default # <<<
2021-03-11 19:29:25|install|kernel-default-devel|5.3.18-lp152.66.2|x86_64||repo-update|b441cf0df1888d8f6d0d9c0065ada4eeed07f190540104bee04efda96f482ea5| <<< 2021-03-11 19:29:25|install|yast2-security|4.2.19-lp152.2.12.1|noarch||repo-update|c38b83c76852a3a481564af8495164bf55574253c7d24a1351f900c208ae2a16| {...] 2021-03-11 19:29:32|install|glibc-locale|2.26-lp152.26.6.1|x86_64||repo-update|91eb4247586fffb10446e894db6b25e29aeb1d14496e7e02e8a98b1d8ac50be1| 2021-03-11 19:33:40|command|root@gs3lnx|'zypper' 'update'|
# 2021-03-11 19:34:12 nvidia-gfxG05-kmp-default-460.56_k5.3.18_lp152.19-lp152.35.1.x86_64.rpm installed ok # Additional rpm output: # make: *** No rule to make target 'kernelrelease'. Stop. # make: Entering directory '/usr/src/linux-5.3.18-lp152.66-obj/x86_64/default' # make: *** No rule to make target 'modules'. Stop. # make: Leaving directory '/usr/src/linux-5.3.18-lp152.66-obj/x86_64/default' # /usr/src/kernel-modules/nvidia-460.56-default / # make[1]: *** /lib/modules//source: No such file or directory. Stop. # make: *** [Makefile:80: modules] Error 2 # / # rm: cannot remove '/lib/modules//updates/nvidia*.ko': No such file or
2021-03-11 19:35:05|install|kernel-default|5.3.18-lp152.66.2|x86_64||repo-update|5e59d75b409d042614573c01acef5fcdf2c05c3332f9ed75fdc58a09391bc540| <<< ################ It seems to me, that kernel-devel and kernel-default-devel are installed first,
http://bugzilla.opensuse.org/show_bug.cgi?id=1182666 http://bugzilla.opensuse.org/show_bug.cgi?id=1182666#c60 --- Comment #60 from Guenter Stoehr <guenter.stoehr@gs-consult-ac.de> --- Sorry for writing another comment to this bug, but I have a problem and some questions. And I don't know whether the other bug is still active. I updated the PC of my son from >>> linux-5.3.18-lp152.63 to linux-5.3.18-lp152.66.<<< So this is not a patch-level-issue, as I suppose. Is that correct? But the zypper-log contains the trouble very clearly (extract from /var/log/zypp/history), as far as I can understand it: I marked the critical points with >>> <<< ################ 2021-03-11 19:29:02|command|root@gs3lnx|'zypper' 'update'| # 2021-03-11 19:29:02 ImageMagick-config-7-SUSE-7.0.7.34-lp152.12.12.1.x86_64.rpm installed ok # Additional rpm output: # update-alternatives: warning: forcing reinstallation of alternative /etc/ImageMagick-7-SUSE because link group ImageMagick-7 is broken # 2021-03-11 19:29:02|install|ImageMagick-config-7-SUSE|7.0.7.34-lp152.12.12.1|x86_64||repo-update|d1232473707f855d5799989fdcef875a7ffd33c10fe2152a10a4a9e89d9d4635| 2021-03-11 19:29:03|install|glibc|2.26-lp152.26.6.1|x86_64||repo-update|19d26c98080eedea689b4c4dacf5a78a3576e17fed2ee5cc793f2fcdbf069948| 2021-03-11 19:29:03|install|glibc-32bit|2.26-lp152.26.6.1|x86_64||repo-update|85c1d794a63b5ad712f9063b8229c4265fc1b2b97672af9814a447f15e2eee6d| 2021-03-11 19:29:04|install|kernel-macros|5.3.18-lp152.66.2|noarch||repo-update|69aba45ed532b7790e241572beaf932525f2bde457cab4322a34cbea1e5a5629| 2021-03-11 19:29:04|install|python3-bind|9.16.6-lp152.14.13.1|noarch||repo-update|2177f66e8880fa8568c8f1b9a96bea301b6881e2e8aabafbcd02049a64b97889| 2021-03-11 19:29:04|install|python3-six|1.14.0-lp152.4.3.1|noarch||repo-update|4e9ba0e615207528f5b49213c24036fab8a5a96748be895ecafed9dafd5e6c1d| 2021-03-11 
19:29:04|install|yast2-logs|4.2.92-lp152.2.22.1|x86_64||repo-update|63455eb22c18f0b44bd6f55b7562471ee645a186ff8f422607c57e3eeb8d4dea| 2021-03-11 19:29:04|install|glibc-extra|2.26-lp152.26.6.1|x86_64||repo-update|bcb9faa40f8e294321f73dbc9364f2b284c2ba729476993770b34dfcf8a709a9| 2021-03-11 19:29:04|install|libavahi-common3-32bit|0.7-lp152.3.6.1|x86_64||repo-update|a0c70d299c7665334985664c88fda902cc7c33bab292ab7448c28e40d8d551ce| linux-5.3.18-lp152.66 # 2021-03-11 19:29:11|install|kernel-devel|5.3.18-lp152.66.2|noarch||repo-update|59c6fde99dcecd80e8062de952a51a79dee9a6cc927b1dc00ab927d3c71151ca| <<< # 2021-03-11 19:29:11 yast2-4.2.92-lp152.2.22.1.x86_64.rpm installed ok # Additional rpm output: # Updating /etc/sysconfig/yast2 ... # linux-5.3.18-lp152.63 2021-03-11 19:29:11|install|yast2|4.2.92-lp152.2.22.1|x86_64||repo-update|fb1d0545ead88acac0b882ff1f18cb234e242fd5b434c4b8e8fa968f997970b6| [...] 2021-03-11 19:29:24|install|libavahi-client3-32bit|0.7-lp152.3.6.1|x86_64||repo-update|64f11eeee8405f32e029b912213781de2b19daaad2a7777fee2c7cf9fedc0c64| # 2021-03-11 19:29:25 kernel-default-devel-5.3.18-lp152.66.2.x86_64.rpm installed ok directory # install: cannot stat '/usr/src/kernel-modules/nvidia-460.56-default/nvidia*.ko': No such file or directory # depmod: WARNING: could not open modules.order at /lib/modules/5.3.18-lp152.19-default: No such file or directory # depmod: WARNING: could not open modules.builtin at /lib/modules/5.3.18-lp152.19-default: No such file or directory # EFI variables are not supported on this system # # Modprobe blacklist files have been created at /etc/modprobe.d to prevent Nouveau from loading. This can be reverted by deleting /etc/modprobe.d/nvidia-*.conf. # # *** Reboot your computer and verify that the NVIDIA graphics driver can be loaded. 
*** # # grep: /etc/sysconfig/kernel: No such file or directory # 2021-03-11 19:34:12|install|nvidia-gfxG05-kmp-default|460.56_k5.3.18_lp152.19-lp152.35.1|x86_64||download.nvidia.com-leap|616e8a54d9b5664bcb2b1e92327caa837ceef11a09ea9e081956d46a2dadd97d| # 2021-03-11 19:34:14 nvidia-computeG05-460.56-lp152.35.1.x86_64.rpm installed ok # Additional rpm output: # /sbin/ldconfig: File /usr/lib64/libkfontinstui.so.5 is empty, not checked. # /sbin/ldconfig: File /usr/lib64/libjavascriptcoregtk-4.0.so.18 is empty, not checked. # /sbin/ldconfig: File /usr/lib64/libwebkit2gtk-4.0.so.37.49.9 is empty, not checked. # /sbin/ldconfig: File /usr/lib64/libkfontinstui.so.5.18.6 is empty, not checked. # /sbin/ldconfig: File /usr/lib64/libwebkit2gtk-4.0.so.37 is empty, not checked. # /sbin/ldconfig: File /usr/lib64/libkfontinst.so.5 is empty, not checked. # /sbin/ldconfig: File /usr/lib64/libkfontinst.so.5.18.6 is empty, not checked. # /sbin/ldconfig: File /usr/lib64/libjavascriptcoregtk-4.0.so.18.17.13 is empty, not checked. # /sbin/ldconfig: File /usr/lib64/libkfontinstui.so.5 is empty, not checked. # /sbin/ldconfig: File /usr/lib64/libjavascriptcoregtk-4.0.so.18 is empty, not checked. # /sbin/ldconfig: File /usr/lib64/libwebkit2gtk-4.0.so.37.49.9 is empty, not checked. # /sbin/ldconfig: File /usr/lib64/libkfontinstui.so.5.18.6 is empty, not checked. # /sbin/ldconfig: File /usr/lib64/libwebkit2gtk-4.0.so.37 is empty, not checked. # /sbin/ldconfig: File /usr/lib64/libkfontinst.so.5 is empty, not checked. # /sbin/ldconfig: File /usr/lib64/libkfontinst.so.5.18.6 is empty, not checked. # /sbin/ldconfig: File /usr/lib64/libjavascriptcoregtk-4.0.so.18.17.13 is empty, not checked. <<< [...] then the nvidia-drivers and kernel-default is installed later. And the installation of nvidia fails and those files in usr/lib64 are empty. What are those files good for? How can I fix it? Is it sufficient to re-install those libs? 
You can find the whole zypper history of that day as an attachment to comment #46.

And I have another Leap 15.2 system to update. Same configuration, that means linux-5.3.18-lp152.63 and an nvidia graphics card. I started the update and stopped it when zypper had told me what it intended to update:

================================================
The following 4 NEW packages are going to be installed:
kernel-default 5.3.18-lp152.66.2 x86_64 Main Update Repository openSUSE
kernel-default-devel 5.3.18-lp152.66.2 x86_64 Main Update Repository openSUSE
kernel-devel 5.3.18-lp152.66.2 noarch Main Update Repository openSUSE
================================================

Is that the right order to install those packages? Or do I have to expect the same trouble again?

Thank you in advance for some answers.
http://bugzilla.opensuse.org/show_bug.cgi?id=1182666 http://bugzilla.opensuse.org/show_bug.cgi?id=1182666#c61 --- Comment #61 from Stefan Dirsch <sndirsch@suse.com> --- (In reply to Michael Pujos from comment #58)
Created attachment 847362 [details] lizypp logs for 5.11.4-1.2 to 5.11.4-1.3 kernel update
I've extracted the libzypp logs for my update from 5.11.4-1.2 to 5.11.4-1.3 that caused this issue a few days ago.
at 15:15:38, nvidia modules are compiled and just after (15:15:44) |kernel-default|5.11.4-1.2| is removed.
I'm no expert in how exactly this is supposed to work, but this log shows the exact ordering of install/uninstall of all kernel- package and the nvidia module compilation
This log shows me that kernel-default-5.11.4-1.2 gets removed

    2021-03-15 15:15:44|remove |kernel-default|5.11.4-1.2|x86_64||

but no kernel-default-5.11.4-1.3 is being installed. This explains why the just-created nvidia modules are removed again right after installation.
http://bugzilla.opensuse.org/show_bug.cgi?id=1182666 http://bugzilla.opensuse.org/show_bug.cgi?id=1182666#c62 --- Comment #62 from Stefan Dirsch <sndirsch@suse.com> --- (In reply to Guenter Stoehr from comment #60)
Sorry for writing another comment to this bug, but I have a problem and some questions. And I don't know whether the other bug is still active. I updated the PC of my son from >>> linux-5.3.18-lp152.63 to linux-5.3.18-lp152.66.<<< So this is not a patch-level-issue, as I suppose. Is that correct?
No, this also looks like a patch-level issue: patch lp152.63 vs. lp152.66. But in the logs you attached I see an update to lp152.62
19:29:04|install|kernel-macros|5.3.18-lp152.66.2|noarch||repo- update|69aba45ed532b7790e241572beaf932525f2bde457cab4322a34cbea1e5a5629| [...] # 2021-03-11 19:29:11 kernel-devel-5.3.18-lp152.66.2.noarch.rpm installed ok # Additional rpm output: # Changing symlink /usr/src/linux from linux-5.3.18-lp152.63 to linux-5.3.18-lp152.66 [...] # 2021-03-11 19:29:25 kernel-default-devel-5.3.18-lp152.66.2.x86_64.rpm installed ok
# Additional rpm output: # Changing symlink /usr/src/linux-obj/x86_64/default from ../../linux-5.3.18-lp152.63-obj/x86_64/default to ../../linux-5.3.18-lp152.66-obj/x86_64/default 2021-03-11 nvidia-gfxG05-kmp-default-460.56_k5.3.18_lp152.19-lp152.35.1.x86_64.rpm installed ok # Additional rpm output: # make: *** No rule to make target 'kernelrelease'. Stop. # make: Entering directory '/usr/src/linux-5.3.18-lp152.66-obj/x86_64/default'
Hmm. It seems the installation of kernel-default-devel-5.3.18-lp152.66.2.x86_64.rpm failed if

    make -sC /usr/src/linux-obj/x86_64/default kernelrelease

doesn't work. I suggest reinstalling the kernel-default-devel package.
# Additional rpm output:
# /sbin/ldconfig: File /usr/lib64/libkfontinstui.so.5 is empty, not checked.
# /sbin/ldconfig: File /usr/lib64/libjavascriptcoregtk-4.0.so.18 is empty, not checked.
# /sbin/ldconfig: File /usr/lib64/libwebkit2gtk-4.0.so.37.49.9 is empty, not checked.
# /sbin/ldconfig: File /usr/lib64/libkfontinstui.so.5.18.6 is empty, not checked.
# /sbin/ldconfig: File /usr/lib64/libwebkit2gtk-4.0.so.37 is empty, not checked.
# /sbin/ldconfig: File /usr/lib64/libkfontinst.so.5 is empty, not checked.
# /sbin/ldconfig: File /usr/lib64/libkfontinst.so.5.18.6 is empty, not checked.
# /sbin/ldconfig: File /usr/lib64/libjavascriptcoregtk-4.0.so.18.17.13 is empty, not checked.
# /sbin/ldconfig: File /usr/lib64/libkfontinstui.so.5 is empty, not checked.
# /sbin/ldconfig: File /usr/lib64/libjavascriptcoregtk-4.0.so.18 is empty, not checked.
# /sbin/ldconfig: File /usr/lib64/libwebkit2gtk-4.0.so.37.49.9 is empty, not checked.
# /sbin/ldconfig: File /usr/lib64/libkfontinstui.so.5.18.6 is empty, not checked.
# /sbin/ldconfig: File /usr/lib64/libwebkit2gtk-4.0.so.37 is empty, not checked.
# /sbin/ldconfig: File /usr/lib64/libkfontinst.so.5 is empty, not checked.
# /sbin/ldconfig: File /usr/lib64/libkfontinst.so.5.18.6 is empty, not checked.
# /sbin/ldconfig: File /usr/lib64/libjavascriptcoregtk-4.0.so.18.17.13 is empty, not checked.
This is unrelated to this ticket.
It seems to me that kernel-devel and kernel-default-devel are installed first, then the nvidia drivers, and kernel-default is installed later.
Kind of, yes.
And the installation of nvidia fails
Yes. See above.
and those files in /usr/lib64 are empty. What are those files good for? How can I fix it? Is it sufficient to re-install those libs?
That's unrelated to this ticket.
You can find the whole zypper-history of that day as attachment to comment #46
Ok. I remember.
And I have another Leap 15.2 system to update. Same configuration, that is, linux-5.3.18-lp152.63 and an NVIDIA graphics card. I started the update and stopped it once zypper had told me what it intended to update:
================================================
The following 4 NEW packages are going to be installed:
kernel-default 5.3.18-lp152.66.2 x86_64 Main Update Repository openSUSE
kernel-default-devel 5.3.18-lp152.66.2 x86_64 Main Update Repository openSUSE
kernel-devel 5.3.18-lp152.66.2 noarch Main Update Repository openSUSE
================================================
Is that the right order to install those packages? Or do I have to expect the same trouble again?
I don't know. In the worst case you need to reinstall the nvidia-gfxG05-kmp-default or kernel-default-devel package.
http://bugzilla.opensuse.org/show_bug.cgi?id=1182666 http://bugzilla.opensuse.org/show_bug.cgi?id=1182666#c63 --- Comment #63 from Michael Pujos <pujos.michael@gmail.com> ---
Here's the rest of the log just following that |remove |kernel-default|5.11.4-1.2 line (last line of attached log), up to where we see kernel-default|5.11.4-1.3 is installed, but this does not trigger a rebuild of the nvidia modules.
2021-03-15 15:15:44|install|libopenssl-1_1-devel|1.1.1j-2.1|x86_64||http-download.opensuse.org-ade0eb17|6f3d4ea741ac48158b40c3c2223fdb7d1346da3ea74f8c0bb61fc01fd5cdb68b|
2021-03-15 15:15:45|install|npm10|10.24.0-2.1|x86_64||http-download.opensuse.org-ade0eb17|303806cd668f0407ea4822fde38f1e8aebde063488662cee25dfc0d45ebd2204|
2021-03-15 15:15:46|install|rust|1.50.0-1.1|x86_64||http-download.opensuse.org-ade0eb17|5790f4557b4eab5102a1bb7c8a779934bfaf9e9927d91d273601824ca4830592|
2021-03-15 15:15:46|install|python38-pycurl|7.43.0.6-2.1|x86_64||http-download.opensuse.org-ade0eb17|6c1d4370f5a50d14d4dea06f67210ecee462fbb51eaf8b2de3ae024be744c44f|
2021-03-15 15:15:47|install|libcurl-devel|7.75.0-1.1|x86_64||http-download.opensuse.org-ade0eb17|c01a9056d7f13bccbc4cf3e7c0e3bdc921629e719a7a6d97b38f4cea50703146|
2021-03-15 15:15:47|install|git-core|2.30.2-1.1|x86_64||http-download.opensuse.org-ade0eb17|31e0a6b16da948467e078b6fa9e10d1cc3ee8c12551bc253140a5b965f5cb940|
2021-03-15 15:15:47|install|curl|7.75.0-1.1|x86_64||http-download.opensuse.org-ade0eb17|6afb3a9f22da80bb34fe4e237a413cc6ad918753de966613b9890bb2c9739f9a|
2021-03-15 15:15:47|install|pulseaudio-module-zeroconf|14.2-3.1|x86_64||http-download.opensuse.org-ade0eb17|5a9df3a75af295d438b7e7f37f771d930249cd22d778a406822cef4dbb246326|
2021-03-15 15:15:47|install|pulseaudio-module-gconf|14.2-3.1|x86_64||http-download.opensuse.org-ade0eb17|e67d781b28723ea732189ac52366bf8010884b0f73f5000e03dc7f251cbb565d|
2021-03-15 15:15:47|install|pulseaudio-module-bluetooth|14.2-3.1|x86_64||http-download.opensuse.org-ade0eb17|0ecd0a3bba8445dcf1ee109d5a42772dcb00e41918ca96c85ff93fb625db3468|
2021-03-15 15:15:48|install|pulseaudio-lang|14.2-3.1|noarch||http-download.opensuse.org-ade0eb17|8af1d485c65c0aa0b483c7958a348a3fceec9f7afdae569c92c9fba8dfb98f8b|
2021-03-15 15:15:48|install|pulseaudio-gdm-hooks|14.2-3.1|x86_64||http-download.opensuse.org-ade0eb17|2d9aa2f5e0563f0fd8f09befc74c943a1b56cec2e290a2085f3510da6cb76835|
2021-03-15 15:15:48|install|pulseaudio-utils|14.2-3.1|x86_64||http-download.opensuse.org-ade0eb17|692b6473000c5c5ae4259ddc21bf06d4ca3a2fc58d02515b5b7117fb49f7d18a|
2021-03-15 15:15:48|install|qemu-ui-spice-core|5.2.0-10.2|x86_64||http-download.opensuse.org-ade0eb17|60822f6f809945a651eb444ff87ba485dba02168011e085817a7937a846f63d4|
2021-03-15 15:15:48|install|nano-lang|5.6.1-1.1|noarch||http-download.opensuse.org-ade0eb17|d27f324be44e15558419434238b336ea748fdb2ce5eb8c08b8eafe3018749f07|
2021-03-15 15:15:48|install|kernel-syms|5.11.4-1.3|x86_64||http-download.opensuse.org-ade0eb17|5ed22f7ab2fd92cab01455afe73e89d2f57256673ac3459360498f34a045b9d0|
2021-03-15 15:15:48|install|snapper-zypp-plugin|0.8.15-3.1|x86_64||http-download.opensuse.org-ade0eb17|f64c00f4f0b613675da798d7775b6516299e63c237d41c3236ad3ec39ebd5a63|
2021-03-15 15:16:31|install|kernel-default|5.11.4-1.3|x86_64||http-download.opensuse.org-ade0eb17|d1fa5e4c267fe3bb23f02258b940b4b6060f873ab480cee7a93c131258653abb|
... ...
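To spot this pattern in a longer history file without reading it by eye, a small script can pair each explicit remove entry with a later install entry of the same package name. This is only a sketch: the function name `find_split_updates` is made up for illustration, the field layout (date|action|name|version|...) is taken from the log lines quoted in this thread, and it assumes that an ordinary upgrade is logged as a single install entry with no preceding remove.

```python
def find_split_updates(history_lines):
    """Map package name -> (removed_version, installed_version).

    A package appears in the result when the log contains an explicit
    'remove' entry for it followed later by an 'install' entry, i.e. the
    update was carried out as two separate steps.
    """
    removed = {}
    split = {}
    for line in history_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and '# ... installed ok' comments
        fields = line.split("|")
        if len(fields) < 4:
            continue
        action = fields[1].strip()  # the log pads 'remove ' with a space
        name, version = fields[2], fields[3]
        if action == "remove":
            removed[name] = version
        elif action == "install" and name in removed:
            split[name] = (removed.pop(name), version)
    return split


# Three entries copied from the logs quoted in this bug report.
sample = [
    "2021-03-15 15:15:44|remove |kernel-default|5.11.4-1.2|x86_64||",
    "2021-03-15 15:15:48|install|kernel-syms|5.11.4-1.3|x86_64||http-download.opensuse.org-ade0eb17|5ed22f7ab2fd92cab01455afe73e89d2f57256673ac3459360498f34a045b9d0|",
    "2021-03-15 15:16:31|install|kernel-default|5.11.4-1.3|x86_64||http-download.opensuse.org-ade0eb17|d1fa5e4c267fe3bb23f02258b940b4b6060f873ab480cee7a93c131258653abb|",
]

print(find_split_updates(sample))
```

For the three sample entries only kernel-default is reported, because kernel-syms has no preceding remove entry.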
http://bugzilla.opensuse.org/show_bug.cgi?id=1182666
Paolo Stivanin <pstivanin@suse.com> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|pstivanin@suse.com          |
http://bugzilla.opensuse.org/show_bug.cgi?id=1182666 http://bugzilla.opensuse.org/show_bug.cgi?id=1182666#c64 --- Comment #64 from Stefan Dirsch <sndirsch@suse.com> --- (In reply to B from comment #57)
(In reply to Stefan Dirsch from comment #56)
No, that's just a package update. First the files of the new package are installed, then the files of the old package which are not part of the new package are uninstalled. Finally, the %triggerpostun of nvidia-gfxG05-kmp-default runs. At that point the kernel/ subdir exists. Check this out.
https://docs.fedoraproject.org/en-US/packaging-guidelines/Scriptlets/#ordering
Yes, I read the link, but what I'm saying is that I observed it differently. When you run zypper dup the summary for package updates usually says "The following X packages are going to be upgraded: ..." but with the kernel-default 5.11.4-1.2 -> 5.11.4-1.3 it was: "The following packages will be removed: kernel-default-5.11.4-1.2" "The following packages will be installed: kernel-default-5.11.4-1.3"
So in this specific situation that ordering doesn't apply: one package gets removed, a new one is installed, and that's how those NVIDIA modules in /lib/modules/$kernel/updates/ get deleted.
Phew. If this is true, it would mean zypper handles package updates differently from RPM. I verified with rpm -vv that things are done in the right order during the kernel-default update, and the nvidia modules weren't removed afterwards when updating kernel-default 5.11.4-1.2 -> 5.11.4-1.3.
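The difference between the two orderings discussed here can be illustrated with a toy model. This is not the actual scriptlet: the function names, the two path strings, and the rule that the trigger cleans up whenever kernel/ is missing are all assumptions made for illustration, based only on the description in this thread.

```python
KERNEL_DIR = "kernel/"               # stands in for /lib/modules/$kernel/kernel/
NVIDIA_MODULE = "updates/nvidia.ko"  # stands in for the modules under updates/


def run_trigger(fs):
    """Toy stand-in for the cleanup trigger of the nvidia KMP.

    Assumption: the trigger removes the nvidia modules only when the
    kernel/ subdir is gone, i.e. when the kernel looks uninstalled.
    """
    if KERNEL_DIR not in fs:
        fs.discard(NVIDIA_MODULE)


def rpm_style_upgrade(fs):
    """One transaction: new files in, stale old files out, then the trigger."""
    fs = set(fs)
    fs.add(KERNEL_DIR)  # files of the new package are installed first
    # old files that are not part of the new package would be removed here;
    # kernel/ belongs to both packages, so it stays in place
    run_trigger(fs)     # the trigger sees kernel/ and leaves the modules alone
    return fs


def remove_then_install(fs):
    """Two separate steps, as seen in the zypper summary for kernel-default."""
    fs = set(fs)
    fs.discard(KERNEL_DIR)  # step 1: the old kernel package is removed
    run_trigger(fs)         # the trigger runs with kernel/ missing
    fs.add(KERNEL_DIR)      # step 2: the new kernel package is installed
    return fs


before = {KERNEL_DIR, NVIDIA_MODULE}
print(sorted(rpm_style_upgrade(before)))
print(sorted(remove_then_install(before)))
```

In this model the single-transaction upgrade keeps the nvidia module, while the remove-then-install sequence ends with only kernel/ left, which matches the behaviour reported for the zypper update.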
http://bugzilla.opensuse.org/show_bug.cgi?id=1182666 http://bugzilla.opensuse.org/show_bug.cgi?id=1182666#c65 --- Comment #65 from Stefan Dirsch <sndirsch@suse.com> ---
Indeed.
2021-03-15 15:15:44|remove |kernel-default|5.11.4-1.2|x86_64||
[...]
2021-03-15 15:16:31|install|kernel-default|5.11.4-1.3|x86_64||http-download.opensuse.org-ade0eb17|d1fa5e4c267fe3bb23f02258b940b4b6060f873ab480cee7a93c131258653abb|
...
Oh. Well. This means I can forget about cleaning up the nvidia modules and need to revert what I implemented for boo#1164520.
http://bugzilla.opensuse.org/show_bug.cgi?id=1182666 http://bugzilla.opensuse.org/show_bug.cgi?id=1182666#c66 --- Comment #66 from B <kerossin@pm.me> --- (In reply to Stefan Dirsch from comment #64)
(In reply to B from comment #57)
(In reply to Stefan Dirsch from comment #56)
No, that's just a package update. First the files of the new package are installed, then the files of the old package which are not part of the new package are uninstalled. Finally, the %triggerpostun of nvidia-gfxG05-kmp-default runs. At that point the kernel/ subdir exists. Check this out.
https://docs.fedoraproject.org/en-US/packaging-guidelines/Scriptlets/#ordering
Yes, I read the link, but what I'm saying is that I observed it differently. When you run zypper dup the summary for package updates usually says "The following X packages are going to be upgraded: ..." but with the kernel-default 5.11.4-1.2 -> 5.11.4-1.3 it was: "The following packages will be removed: kernel-default-5.11.4-1.2" "The following packages will be installed: kernel-default-5.11.4-1.3"
So in this specific situation that ordering doesn't apply: one package gets removed, a new one is installed, and that's how those NVIDIA modules in /lib/modules/$kernel/updates/ get deleted.
Phew. If this is true, it would mean zypper handles package updates differently from RPM. I verified with rpm -vv that things are done in the right order during the kernel-default update, and the nvidia modules weren't removed afterwards when updating kernel-default 5.11.4-1.2 -> 5.11.4-1.3.
I looked it up, and zypper also has extra flags for more verbosity:
-v, --verbose Increase verbosity. For debugging output specify this option twice.
I'll try to do some installations/updates with zypper -vv, maybe it will show what's happening more precisely.
http://bugzilla.opensuse.org/show_bug.cgi?id=1182666 http://bugzilla.opensuse.org/show_bug.cgi?id=1182666#c67 --- Comment #67 from B <kerossin@pm.me> --- zypper -vv didn't really give more information on how those installations are handled; it just had more info on repo-related stuff.
sudo zypper -vv in --oldpackage kernel-default-5.11.4-1.3.x86_64.rpm
Parts of output:
The following NEW package is going to be installed:
kernel-default 5.11.4-1.3 x86_64 Plain RPM files cache openSUSE
The following package is going to be REMOVED:
kernel-default 5.11.4-1.2 x86_64 openSUSE
The following package requires a system reboot:
kernel-default 5.11.4-1.3 x86_64 Plain RPM files cache openSUSE
1 new package to install, 1 to remove.
Checking for file conflicts: ....................[done]
(1/2) Removing kernel-default-5.11.4-1.2.x86_64 ....................[done]
(2/2) Installing: kernel-default-5.11.4-1.3.x86_64 ....................[done]
CommitResult (total 2, done 2, error 0, skipped 0, updateMessages 0)
The summary and the progress show it happening in 2 steps: 1) remove old, 2) install new. Compare this to regular package updates:
sudo zypper dup
The following 2 packages are going to be upgraded:
iscan iscan-data
Checking for file conflicts: ....................[done]
(1/2) Installing: iscan-data-1.39.1-5.30.noarch ....................[done]
(2/2) Installing: iscan-2.30.4-5.30.x86_64 ....................[done]
http://bugzilla.opensuse.org/show_bug.cgi?id=1182666 http://bugzilla.opensuse.org/show_bug.cgi?id=1182666#c68
Stefan Dirsch <sndirsch@suse.com> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|IN_PROGRESS                 |RESOLVED
         Resolution|---                         |FIXED
--- Comment #68 from Stefan Dirsch <sndirsch@suse.com> ---
Fixed this issue now. Should fix this issue for TW and Leap.
-------------------------------------------------------------------
Thu Mar 18 15:07:02 UTC 2021 - Stefan Dirsch <sndirsch@suse.com>
- Unfortunately removing no longer used kernel modules via %triggerpostun doesn't work since kernel updates are not considered "atomar" when using YaST/zypper (only safe when using rpm) [boo#1182666, boo#1164520]
I hope there will be still a driver update this week.
http://bugzilla.opensuse.org/show_bug.cgi?id=1182666 http://bugzilla.opensuse.org/show_bug.cgi?id=1182666#c69 --- Comment #69 from B <kerossin@pm.me> --- (In reply to Stefan Dirsch from comment #68)
Fixed this issue now. Should fix this issue for TW and Leap.
------------------------------------------------------------------- Thu Mar 18 15:07:02 UTC 2021 - Stefan Dirsch <sndirsch@suse.com>
- Unfortunately removing no longer used kernel modules via %triggerpostun doesn't work since kernel updates are not considered "atomar" when using YaST/zypper (only safe when using rpm) [boo#1182666, boo#1164520]
I hope there will be still a driver update this week.
I created kernel bug report 1183739; maybe the kernel maintainers could make it so that kernels are installed into their own directories, or propose some other fix, because never removing those modules is not ideal.
http://bugzilla.opensuse.org/show_bug.cgi?id=1182666 http://bugzilla.opensuse.org/show_bug.cgi?id=1182666#c70 --- Comment #70 from Stefan Dirsch <sndirsch@suse.com> --- Thanks, but since you've mentioned nvidia in there it will be reassigned to me immediately without asking. :-(
http://bugzilla.opensuse.org/show_bug.cgi?id=1182666 http://bugzilla.opensuse.org/show_bug.cgi?id=1182666#c71
Sebastian Turzański <dpbasti@wp.pl> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dpbasti@wp.pl
--- Comment #71 from Sebastian Turzański <dpbasti@wp.pl> ---
I think I may have a similar issue:
kernel: nvidia: disagrees about version of symbol module_layout
This happened after the June 2021 updates. I thought I had 2 kernel versions installed, but then made sure only 1 was present and forced a reinstall of nvidia, but I still get the same error. My kernel packages:
sudo rpm -qa |grep kernel
kernel-firmware-qcom-20210503-1.2.noarch
kernel-firmware-network-20210503-1.2.noarch
kernel-firmware-radeon-20210503-1.2.noarch
kernel-firmware-ath11k-20210503-1.2.noarch
kernel-firmware-realtek-20210503-1.2.noarch
kernel-firmware-sound-20210503-1.2.noarch
kernel-firmware-usb-network-20210503-1.2.noarch
kernel-firmware-qlogic-20210503-1.2.noarch
kernel-firmware-chelsio-20210503-1.2.noarch
kernel-firmware-bnx2-20210503-1.2.noarch
kernel-firmware-all-20210503-1.2.noarch
kernel-firmware-ti-20210503-1.2.noarch
kernel-firmware-marvell-20210503-1.2.noarch
kernel-firmware-atheros-20210503-1.2.noarch
kernel-firmware-liquidio-20210503-1.2.noarch
kernel-firmware-bluetooth-20210503-1.2.noarch
kernel-firmware-mediatek-20210503-1.2.noarch
kernel-firmware-serial-20210503-1.2.noarch
kernel-firmware-intel-20210503-1.2.noarch
kernel-firmware-nfp-20210503-1.2.noarch
kernel-firmware-nvidia-20210503-1.2.noarch
kernel-firmware-mellanox-20210503-1.2.noarch
kernel-firmware-dpaa2-20210503-1.2.noarch
kernel-firmware-iwlwifi-20210503-1.2.noarch
kernel-firmware-amdgpu-20210503-1.2.noarch
purge-kernels-service-0-8.1.noarch
kernel-firmware-i915-20210503-1.2.noarch
kernel-firmware-ueagle-20210503-1.2.noarch
kernel-firmware-brcm-20210503-1.2.noarch
kernel-firmware-mwifiex-20210503-1.2.noarch
kernel-firmware-media-20210503-1.2.noarch
kernel-firmware-ath10k-20210503-1.2.noarch
kernel-firmware-platform-20210503-1.2.noarch
kernel-syms-5.12.9-1.1.x86_64
kernel-firmware-prestera-20210503-1.2.noarch
kernel-devel-5.12.9-1.1.noarch
kernel-default-devel-5.12.9-1.1.x86_64
kernel-source-5.12.9-1.1.noarch
kernel-default-5.12.9-1.1.x86_64
kernel-macros-5.12.9-1.1.noarch