[Bug 1182666] New: TW 20210222 - Update broken for NVIDIA
http://bugzilla.opensuse.org/show_bug.cgi?id=1182666

Bug ID: 1182666
Summary: TW 20210222 - Update broken for NVIDIA
Classification: openSUSE
Product: openSUSE Tumbleweed
Version: Current
Hardware: Other
OS: Other
Status: NEW
Severity: Major
Priority: P5 - None
Component: X11 3rd Party Driver
Assignee: gfx-bugs@suse.de
Reporter: axel.braun@gmx.de
QA Contact: sndirsch@suse.com
Found By: ---
Blocker: ---

Upgrading from 20210220 to 20210222 fails and ends in a terminal window:

more /var/log/Xorg.0.log | grep EE
(WW) warning, (EE) error, (NI) not implemented, (??) unknown.
[ 69.552] (EE) Failed to load module "intel" (module does not exist, 0)
[ 69.558] (EE) NVIDIA: Failed to initialize the NVIDIA kernel module. Please see the
[ 69.558] (EE) NVIDIA: system's kernel log for additional error messages and
[ 69.558] (EE) NVIDIA: consult the NVIDIA README for details.
[ 69.563] (EE) NVIDIA: Failed to initialize the NVIDIA kernel module. Please see the
[ 69.563] (EE) NVIDIA: system's kernel log for additional error messages and
[ 69.563] (EE) NVIDIA: consult the NVIDIA README for details.
[ 69.568] (EE) NVIDIA: Failed to initialize the NVIDIA kernel module. Please see the
[ 69.568] (EE) NVIDIA: system's kernel log for additional error messages and
[ 69.568] (EE) NVIDIA: consult the NVIDIA README for details.
[ 69.573] (EE) NVIDIA: Failed to initialize the NVIDIA kernel module. Please see the
[ 69.573] (EE) NVIDIA: system's kernel log for additional error messages and
[ 69.573] (EE) NVIDIA: consult the NVIDIA README for details.
[ 69.573] (EE) No devices detected.
[ 69.573] (EE)
[ 69.573] (EE) no screens found(EE)
[ 69.573] (EE)
[ 69.573] (EE) Please also check the log file at "/var/log/Xorg.0.log" for additional information.
[ 69.573] (EE)
[ 69.591] (EE) Server terminated with error (1). Closing log file.
journalctl -xb | grep nvidia
Feb 24 12:08:45 X1E kernel: audit: type=1400 audit(1614164925.078:6): apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe" pid=552 comm="apparmor_parser"
Feb 24 12:08:45 X1E kernel: audit: type=1400 audit(1614164925.078:7): apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe//kmod" pid=552 comm="apparmor_parser"
Feb 24 12:08:55 X1E suse-prime[1323]: Boot: setting-up nvidia card
Feb 24 12:08:56 X1E suse-prime[1392]: trying switch ON nvidia: [bbswitch] NVIDIA card is ON
Feb 24 12:08:56 X1E prime-select[1394]: modprobe: FATAL: Module nvidia_drm not found in directory /lib/modules/5.10.16-1-default

-- You are receiving this mail because: You are on the CC list for the bug.
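The modprobe error above means that no nvidia_drm module (file nvidia-drm.ko) was found for the running kernel 5.10.16-1-default. A quick way to see what is actually there is to list every nvidia*.ko below /lib/modules/<version>, including dangling symlinks. This is only a sketch: nvidia_modules_for is a hypothetical helper name, and its optional second argument exists just so it can be tried on a scratch directory.

```shell
# nvidia_modules_for KVER [ROOT]
# Hypothetical helper: lists every nvidia*.ko (regular file or symlink) below
# ROOT/KVER and marks dangling symlinks as "broken".
# Returns 1 if no usable module was found. ROOT defaults to /lib/modules.
nvidia_modules_for() {
  kver=$1
  root=${2:-/lib/modules}
  dir=$root/$kver
  if [ ! -d "$dir" ]; then
    echo "no module directory $dir" >&2
    return 1
  fi
  usable=0
  # find does not follow symlinks, so links to vanished targets are listed too
  for ko in $(find "$dir" -name 'nvidia*.ko' 2>/dev/null); do
    if [ -e "$ko" ]; then        # [ -e ] follows the symlink
      echo "ok     $ko"
      usable=1
    else
      echo "broken $ko"
    fi
  done
  [ "$usable" -eq 1 ]
}
```

On an affected system it would be run as nvidia_modules_for "$(uname -r)"; "broken" lines correspond to the stale weak-updates symlinks discussed later in this report.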
--- Comment #1 from Stefan Dirsch
--- Comment #2 from Axel Braun
--- Comment #3 from Stefan Dirsch
--- Comment #4 from Axel Braun
--- Comment #5 from Axel Braun
--- Comment #6 from Stefan Dirsch
--- Comment #7 from Stefan Dirsch
Created attachment 846461 (installation log)
Looks like the kernel module rebuild was successful.
--- Comment #8 from Axel Braun
--- Comment #9 from Stefan Dirsch
--- Comment #10 from Axel Braun
You're using kernel 5.10.16-1-default.
Old kernel modules are still available in the 5.10.14-1-default directory. The weak-updates symlinks exist for 5.10.16-1-default, but they point to non-existing modules in 5.10.9-1-default.
Yes, I've seen this as well, but the question is: how did this happen? You have seen the installation log, where it compiled the kernel modules, but it obviously did not put them into the right place.
--- Comment #11 from Stefan Dirsch
--- Comment #12 from Stefan Dirsch
Yes, I've seen this as well, but the question is: how did this happen? You have seen the installation log, where it compiled the kernel modules, but it obviously did not put them into the right place.
Or removed them again right away. See comment #11.
--- Comment #13 from Axel Braun
Maybe checking for /lib/modules/5.10.16-1-default/kernel no longer works and this directory doesn't exist, although 5.10.16-1-default is the currently installed and running kernel.
The directory /lib/modules/5.10.16-1-default/kernel exists and looks OK. Surprisingly, there is still a bunch of modules left over from the day I set up the system (starting with 5.5.x kernels...). Is this not cleaned up automatically? Is there anything else I can check, look up, or try?
--- Comment #14 from Stefan Dirsch
--- Comment #15 from Michael Hirmke
--- Comment #16 from Michael Hirmke
--- Comment #17 from Stefan Dirsch
Not sure if this is the same problem, but: on Leap 15.2 I currently have kernel 5.3.18-lp152.63-default. After adding the NVIDIA repo and installing nvidia-glG05, the files can be found in 5.3.18-lp152.19-default/updates afterwards
This is correct behaviour.
- and nothing was copied or linked to 5.3.18-lp152.63-default.
That's an issue. weak-updates(2) should have created symlinks from 5.3.18-lp152.63-default/weak-updates to 5.3.18-lp152.19-default/updates .
So no module is loaded on reboot. After manually copying or linking the modules to 5.3.18-lp152.63-default/updates and running "depmod -a", the modules are loaded on the next reboot.
Yes, good workaround.
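The workaround endorsed above (link the modules into the running kernel's updates/ directory, then run depmod) might be scripted roughly as follows. This is a sketch that assumes the KMP put its modules into <build-kernel>/updates, as described in this comment; link_nvidia_modules is a hypothetical name, and the optional ROOT argument only exists for dry runs on a scratch tree.

```shell
# link_nvidia_modules OLD_KVER NEW_KVER [ROOT]
# Hypothetical helper: symlinks every nvidia*.ko from ROOT/OLD_KVER/updates
# into ROOT/NEW_KVER/updates. ROOT defaults to /lib/modules.
# depmod is deliberately not called here; run "depmod -a NEW_KVER" afterwards.
link_nvidia_modules() {
  old=$1
  new=$2
  root=${3:-/lib/modules}
  src=$root/$old/updates
  dst=$root/$new/updates
  linked=0
  mkdir -p "$dst" || return 1
  for ko in "$src"/nvidia*.ko; do
    [ -e "$ko" ] || continue          # glob matched nothing
    ln -sf "$ko" "$dst/${ko##*/}"     # ${ko##*/} is the bare file name
    linked=$((linked + 1))
  done
  if [ "$linked" -eq 0 ]; then
    echo "no nvidia*.ko in $src" >&2
    return 1
  fi
  echo "linked $linked module(s) into $dst"
}
```

For the Leap case above this would be link_nvidia_modules 5.3.18-lp152.19-default 5.3.18-lp152.63-default followed by depmod -a 5.3.18-lp152.63-default. Symlinks rather than copies keep following whatever the KMP installs into the same directory later; they still have to be removed by hand once a proper fix is in place.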
--- Comment #18 from Stefan Dirsch
--- Comment #19 from Michael Hirmke
--- Comment #20 from Axel Braun
--- Comment #21 from Stefan Dirsch
--- Comment #22 from Axel Braun
--- Comment #23 from Michael Hirmke
Ok. So the behaviour of the nvidia packages is fine. As mentioned before, TW doesn't use the weak-updates mechanism. Just updating the kernel alone doesn't work as expected for you (TW) and for Michael (Leap), but it does for me (TW) and apparently also for others (TW and Leap, IIRC), as I've seen on the factory ML.
I guess I found the reason. After installing again, I saw:
Warning: /lib/modules/5.3.18-lp152.63-default is inconsistent
Warning: weak-updates symlinks might not be created
This probably happens because I already have an updates directory in it, which contains drivers for my DV card.
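Whether the hand-copied driver in updates/ really causes this warning is Michael's guess, not something confirmed in this report. A read-only way to see which files under updates/ no installed RPM owns could look like this; it is a sketch, unowned_in_updates is a hypothetical helper name, and rpm -qf is used only to ask which package owns a file.

```shell
# unowned_in_updates KVER [ROOT]
# Hypothetical helper: prints files below ROOT/KVER/updates that rpm does not
# attribute to any installed package, e.g. a hand-copied driver.
# ROOT defaults to /lib/modules.
unowned_in_updates() {
  kver=$1
  root=${2:-/lib/modules}
  dir=$root/$kver/updates
  [ -d "$dir" ] || return 0
  find "$dir" -type f | while IFS= read -r f; do
    # rpm -qf exits non-zero when no installed package owns the file
    # (and also when rpm itself is unavailable)
    rpm -qf "$f" >/dev/null 2>&1 || echo "$f"
  done
}
```

For the Leap system above, unowned_in_updates 5.3.18-lp152.63-default would be expected to list only the hand-copied DVB driver files, since the NVIDIA KMP installs into 5.3.18-lp152.19-default/updates.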
--- Comment #24 from Michael Hirmke
--- Comment #25 from Stefan Dirsch
--- Comment #26 from Michael Hirmke
Ok. But I did not understand where your DV card modules are located exactly.
You should not copy anything below weak-updates dir. The KMP or the kernel itself creates symlinks below this directory to compatible modules in different kernel module trees.
I copied them to /lib/modules/<ver>/updates/media/..., with <ver> being the current kernel version. By the way, it should read DVB card. This is a TechnoTrend S2-6400, for which no official drivers are available.
--- Comment #27 from Stefan Dirsch
--- Comment #29 from Stefan Dirsch
--- Comment #30 from B
--- Comment #31 from B
--- Comment #32 from Stefan Dirsch
--- Comment #33 from Michael Pujos
--- Comment #34 from Stefan Dirsch
--- Comment #35 from B
--- Comment #36 from Stefan Dirsch
--- Comment #37 from Stefan Dirsch
--- Comment #38 from Stefan Dirsch
--- Comment #39 from Stefan Dirsch
--- Comment #40 from Stefan Dirsch
--- Comment #41 from B
--- Comment #42 from Paolo Stivanin
--- Comment #43 from Stefan Dirsch
I did some testing myself: I removed all kernel 5.11.4 packages, installed 5.11.4-1.2 from RPMs, then installed 1.3 from the repo, but couldn't reproduce the problem.
One observation I made is that when kernel-default is installed, the broken links in weak-updates/updates/ are generated; later, when kernel-default-devel is installed, those links are removed and the proper modules are generated in updates/. It also looks like the packages are always installed in the same, correct order:
1. kernel-default
2. kernel-devel
3. kernel-default-devel
so a random installation order doesn't seem to be the cause. There seems to be some other, fairly rare fault that happens during the kernel-default-devel installation.
Actually, this makes sense. The rebuild of the nvidia kernel module on TW is triggered by an update of kernel-default-devel, and at this time broken weak-updates symlinks are removed as well:

  # get rid of broken weak-updates symlinks created in some %post apparently;
  # either by kmp itself or by kernel package update
  for i in $(find /lib/modules/*/weak-updates -type l 2> /dev/null); do
    test -e $(readlink -f $i) || rm $i
  done
  [...]
  (build and install kernel module)

So it seems users are updating kernel-default, but not kernel-default-devel. Not sure why. kernel-default-devel is simply needed to (re)build the kernel module.
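One detail in the quoted loop looks fragile: GNU readlink -f prints nothing when a directory in the link target no longer exists (for example once /lib/modules/5.10.9-1-default is gone), so test -e then receives a single argument and evaluates to true, and that link is kept. A more defensive variant, offered only as a sketch and not as what the package ships (remove_broken_weak_links is a hypothetical name), tests the symlink itself, since [ -e ] follows it:

```shell
# remove_broken_weak_links [ROOT]
# Hypothetical, more defensive variant of the cleanup loop quoted above.
# [ -e LINK ] follows the symlink, so it is false when the target cannot be
# resolved; no readlink call is needed. ROOT defaults to /lib/modules.
remove_broken_weak_links() {
  root=${1:-/lib/modules}
  find "$root"/*/weak-updates -type l 2>/dev/null |
    while IFS= read -r link; do
      if [ ! -e "$link" ]; then
        echo "removing $link"
        rm -f "$link"
      fi
    done
}
```

Valid links are left alone; only links whose targets cannot be resolved are printed and removed.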
--- Comment #44 from Axel Braun
So it seems users are updating kernel-default, but not kernel-default-devel.
Hm, zypper dup should resolve/force this. Maybe it does not?
--- Comment #45 from B
--- Comment #46 from Guenter Stoehr
--- Comment #47 from B
rpm -q --whatrequires kernel-default
no package requires kernel-default

rpm -q --whatrequires kernel-default-devel
nvidia-gfxG05-kmp-default-460.56_k5.10.16_1-35.1.x86_64
kernel-syms-5.11.4-1.2.x86_64
kernel-syms-5.11.4-1.3.x86_64

rpm -q --requires kernel-syms-5.11.4-1.3.x86_64
kernel-default-devel = 5.11.4-1
kernel-devel = 5.11.4-1

rpm -q --requires kernel-default-devel-5.11.4-1.3.x86_64
kernel-devel = 5.11.4-1
There seems to be no strict requirement for kernel-default to be installed before kernel-default-devel, so it must have been a coincidence that the order was correct the few times I looked at it. When kernel-default is uninstalled, it cleans out the /lib/modules/$kernel/updates/ directory where the modules are. So with a minor patch version bump (5.11.4-1.2 -> 5.11.4-1.3), the older package 5.11.4-1.2 has to be removed first, and if you get unlucky:
1) kernel-default-devel-5.11.4-1.3 is installed
2) kernel-default-5.11.4-1.2 is removed
3) kernel-default-5.11.4-1.3 is installed
you are now left without those extra kernel modules.

Some possible fixes I see:
1) Modify the kernel-default uninstall script to not clean out /lib/modules/$kernel/updates/. This is probably a lazy and bad idea, because it would leave junk behind when you uninstall kernels normally.
2) Modify the kernel-default uninstall script so that it somehow knows that this is not a regular uninstall but a minor patch upgrade, and does not delete those kernel modules.
3) Make minor patch updates (5.11.4-1.2 -> 5.11.4-1.3) work like regular updates (for example 5.11.4 -> 5.11.6), where kernels are not replaced but installed alongside each other in different directories.

Basically every idea involves modifications to the kernel packages, not to the Nvidia packages, and I have no idea how easy it would be to implement them.
--- Comment #48 from Stefan Dirsch
--- Comment #49 from Stefan Dirsch
Thanks a lot for your input, @B! To me your arguments make 100% sense. I vote for
3) Make it that minor patch updates (5.11.4-1.2->5.11.4-1.3) work like regular updates (for example 5.11.4->5.11.6) where kernels are not replaced but installed alongside each other and use different directories.
I would call it broken to use the same /lib/modules/<kernel-version-omitting-patch-version> directory for different kernel patch versions, especially when you are supposed to install both at the same time and switch between them for booting.
But then kernel versions (uname -r) would also need to be 5.11.4-1.2 instead of 5.11.4-1 for 5.11.4-1.2 RPMs.
--- Comment #50 from Stefan Dirsch
--- Comment #51 from B
Hmm. I could not reproduce that the contents of /lib/modules/<kernel-version>/updates get removed when updating from kernel-default 5.11.4-1.2 to 5.11.4-1.3.
Did you first upgrade kernel-default-devel and only then kernel-default in a separate zypper run?
sudo rpm -eh -vv kernel-default-5.11.4-1.3.x86_64
*** removed part of output ***
D: %postun(kernel-default-5.11.4-1.3.x86_64): waitpid(18562) rc 18562 status 0
D: Plugin: calling hook scriptlet_post in syslog plugin
D: %triggerpostun(nvidia-gfxG05-kmp-default-460.56_k5.10.16_1-35.1.x86_64):
rpm -q --scripts nvidia-gfxG05-kmp-default-460.56_k5.10.16_1-35.1.x86_64
--- Comment #52 from B
--- Comment #53 from Stefan Dirsch
--- Comment #54 from Stefan Dirsch
--- Comment #55 from B
Hmm. I believe the behaviour is correct. You need to look for %triggerpostun script, not the %postun script.
My bad, this was probably my first time seriously looking into triggers/scripts of an RPM. But yes, this looks correct, I just wanted to find out for myself how it happens and to confirm that.
# rpm --triggers -q nvidia-gfxG05-kmp-default
triggerpostun scriptlet (using /bin/sh) -- kernel-default
for dir in $(find /lib/modules -mindepth 1 -maxdepth 1 -type d); do
  if [ ! -d $dir/kernel ]; then
    test -d $dir/updates && rm -f $dir/updates/nvidia*.ko
  fi
done
Modules in updates/ are only removed if there is no kernel/ subdirectory any longer, i.e. no kernel modules are installed any longer.
Also, during an update %triggerpostun is the last scriptlet to be executed.
https://docs.fedoraproject.org/en-US/packaging-guidelines/Scriptlets/#orderi...
At that time there should exist a /lib/modules/<kernel-version>/kernel/ directory.
Now this is where I think the problem is. In this particular situation, a minor kernel version bump seems to be handled neither like a regular kernel update nor like a normal package update. This "update" is basically a normal uninstall of the old version: there's no /lib/modules/$kernel/kernel/ directory anymore because it belongs to the kernel-default package, so the triggerpostun deletes those Nvidia modules, and then a normal installation of the new kernel-default begins. So I think we're back to the same solutions, where something should be done on the kernel packaging side. Potentially this is not just a problem with the Nvidia drivers; other packages that handle modules in a similar fashion could be affected as well.
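To make the disagreement about ordering concrete, the sketch below replays the trigger logic quoted above against a scratch directory under two orderings: an rpm-style upgrade, where the kernel/ directory is still present when %triggerpostun runs, and the remove-then-install sequence described in this comment. It models only which directories exist, not a real RPM transaction; nvidia_trigger and replay_order are hypothetical names, and the kernel version is illustrative.

```shell
# nvidia_trigger ROOT
# Same logic as the %triggerpostun quoted above, pointed at ROOT instead of
# /lib/modules (quoting added).
nvidia_trigger() {
  for dir in $(find "$1" -mindepth 1 -maxdepth 1 -type d); do
    if [ ! -d "$dir/kernel" ]; then
      test -d "$dir/updates" && rm -f "$dir"/updates/nvidia*.ko
    fi
  done
  return 0
}

# replay_order MODE   (prints "kept" or "removed")
#   upgrade        : kernel/ is still present when the trigger runs
#   remove-install : old kernel-default erased first, trigger runs,
#                    then the new kernel-default is installed
replay_order() {
  root=$(mktemp -d)
  k=$root/5.11.4-1-default
  mkdir -p "$k/kernel" "$k/updates"
  touch "$k/updates/nvidia.ko"        # stands in for the freshly built module
  case $1 in
    upgrade)
      nvidia_trigger "$root"
      ;;
    remove-install)
      rm -rf "$k/kernel"
      nvidia_trigger "$root"
      mkdir -p "$k/kernel"
      ;;
    *)
      echo "unknown mode: $1" >&2
      rm -rf "$root"
      return 2
      ;;
  esac
  if [ -e "$k/updates/nvidia.ko" ]; then echo kept; else echo removed; fi
  rm -rf "$root"
}
```

Under these assumptions, replay_order upgrade prints kept and replay_order remove-install prints removed, which is exactly the difference argued over in comments #56 and #57.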
--- Comment #56 from Stefan Dirsch
--- Comment #57 from B
No, that's just a package update. First the files of the new package are installed, then the files of the old package which are not part of the new package are uninstalled. At last the %triggerpostun of nvidia-gfxG05-kmp-default is running. At that point there is kernel/ subdir. Check this out.
https://docs.fedoraproject.org/en-US/packaging-guidelines/Scriptlets/#ordering
Yes, I read the link, but what I'm saying is that I observed it differently. When you run zypper dup, the summary for package updates usually says "The following X packages are going to be upgraded: ...", but for the kernel-default 5.11.4-1.2 -> 5.11.4-1.3 update it was:
"The following packages will be removed: kernel-default-5.11.4-1.2"
"The following packages will be installed: kernel-default-5.11.4-1.3"
So in this specific situation that ordering doesn't apply: a package gets removed, a new one gets installed, and that's how those Nvidia modules in /lib/modules/$kernel/updates/ get deleted.
--- Comment #58 from Michael Pujos
--- Comment #59 from Michael Pujos
# 2021-03-11 19:29:11 kernel-devel-5.3.18-lp152.66.2.noarch.rpm installed ok
# Additional rpm output:
# Changing symlink /usr/src/linux from linux-5.3.18-lp152.63 to

# Additional rpm output:
# Changing symlink /usr/src/linux-obj/x86_64/default from ../../linux-5.3.18-lp152.63-obj/x86_64/default to ../../linux-5.3.18-lp152.66-obj/x86_64/default
# <<<

2021-03-11 19:29:25|install|kernel-default-devel|5.3.18-lp152.66.2|x86_64||repo-update|b441cf0df1888d8f6d0d9c0065ada4eeed07f190540104bee04efda96f482ea5| <<<
2021-03-11 19:29:25|install|yast2-security|4.2.19-lp152.2.12.1|noarch||repo-update|c38b83c76852a3a481564af8495164bf55574253c7d24a1351f900c208ae2a16|
[...]
2021-03-11 19:29:32|install|glibc-locale|2.26-lp152.26.6.1|x86_64||repo-update|91eb4247586fffb10446e894db6b25e29aeb1d14496e7e02e8a98b1d8ac50be1|
2021-03-11 19:33:40|command|root@gs3lnx|'zypper' 'update'|

# 2021-03-11 19:34:12 nvidia-gfxG05-kmp-default-460.56_k5.3.18_lp152.19-lp152.35.1.x86_64.rpm installed ok
# Additional rpm output:
# make: *** No rule to make target 'kernelrelease'. Stop.
# make: Entering directory '/usr/src/linux-5.3.18-lp152.66-obj/x86_64/default'
# make: *** No rule to make target 'modules'. Stop.
# make: Leaving directory '/usr/src/linux-5.3.18-lp152.66-obj/x86_64/default'
# /usr/src/kernel-modules/nvidia-460.56-default /
# make[1]: *** /lib/modules//source: No such file or directory. Stop.
# make: *** [Makefile:80: modules] Error 2
# /
# rm: cannot remove '/lib/modules//updates/nvidia*.ko': No such file or

2021-03-11 19:35:05|install|kernel-default|5.3.18-lp152.66.2|x86_64||repo-update|5e59d75b409d042614573c01acef5fcdf2c05c3332f9ed75fdc58a09391bc540| <<<
################
It seems to me that kernel-devel and kernel-default-devel are installed first,
--- Comment #60 from Guenter Stoehr
--- Comment #61 from Stefan Dirsch
Created attachment 847362 (libzypp logs for 5.11.4-1.2 to 5.11.4-1.3 kernel update)
I've extracted the libzypp logs for my update from 5.11.4-1.2 to 5.11.4-1.3 that caused this issue a few days ago.
At 15:15:38 the nvidia modules are compiled, and just after that (15:15:44) kernel-default 5.11.4-1.2 is removed.
I'm no expert in how exactly this is supposed to work, but this log shows the exact order of installs and uninstalls of all kernel-* packages and the nvidia module compilation.
This log shows me that kernel-default-5.11.4-1.2 gets removed:

2021-03-15 15:15:44|remove |kernel-default|5.11.4-1.2|x86_64||

but no kernel-default-5.11.4-1.3 is being installed. This explains why the just-created nvidia modules are removed again right after installation.
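To check the ordering on another system, the relevant entries can be pulled out of the libzypp history file. This is a sketch that assumes the pipe-separated layout visible in the excerpts in this report (date and time, action, package name, version, architecture, ...); kernel_history is a hypothetical helper name, and the package list is just the one discussed in this bug.

```shell
# kernel_history [FILE]
# Hypothetical helper: prints the install/remove entries for the kernel and
# nvidia KMP packages from a libzypp history file, in logged order.
# FILE defaults to /var/log/zypp/history.
# Assumed field layout: date time|action|name|version|arch|...
kernel_history() {
  awk -F'|' '
    $2 ~ /^(install|remove) *$/ &&
    $3 ~ /^(kernel-default|kernel-default-devel|kernel-devel|nvidia-gfxG05-kmp-default)$/ {
      action = $2
      sub(/ +$/, "", action)      # the log pads "remove " with a space
      print $1, action, $3, $4
    }' "${1:-/var/log/zypp/history}"
}
```

Running kernel_history | tail -n 20 shows the most recent kernel-related steps; a remove of the old kernel-default logged after kernel-default-devel was installed matches the sequence described above.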
--- Comment #62 from Stefan Dirsch
Sorry for writing another comment on this bug, but I have a problem and some questions, and I don't know whether the other bug is still active. I updated my son's PC from linux-5.3.18-lp152.63 to linux-5.3.18-lp152.66. So this is not a patch-level issue, I suppose. Is that correct?
No, this also looks like a patch-level issue: patch lp152.63 vs. lp152.66. But in the logs you attached I see an update to lp152.62
19:29:04|install|kernel-macros|5.3.18-lp152.66.2|noarch||repo-update|69aba45ed532b7790e241572beaf932525f2bde457cab4322a34cbea1e5a5629|
[...]
# 2021-03-11 19:29:11 kernel-devel-5.3.18-lp152.66.2.noarch.rpm installed ok
# Additional rpm output:
# Changing symlink /usr/src/linux from linux-5.3.18-lp152.63 to linux-5.3.18-lp152.66
[...]
# 2021-03-11 19:29:25 kernel-default-devel-5.3.18-lp152.66.2.x86_64.rpm installed ok
# Additional rpm output:
# Changing symlink /usr/src/linux-obj/x86_64/default from ../../linux-5.3.18-lp152.63-obj/x86_64/default to ../../linux-5.3.18-lp152.66-obj/x86_64/default
2021-03-11 nvidia-gfxG05-kmp-default-460.56_k5.3.18_lp152.19-lp152.35.1.x86_64.rpm installed ok
# Additional rpm output:
# make: *** No rule to make target 'kernelrelease'. Stop.
# make: Entering directory '/usr/src/linux-5.3.18-lp152.66-obj/x86_64/default'
Hmm. It seems the installation of kernel-default-devel-5.3.18-lp152.66.2.x86_64.rpm failed if "make -sC /usr/src/linux-obj/x86_64/default kernelrelease" doesn't work. I suggest reinstalling the kernel-default-devel package.
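The check suggested here can be wrapped in a small function and run before reinstalling anything. This is a sketch; devel_tree_ok is a hypothetical name, and the default directory follows the /usr/src/linux-obj/x86_64/default path from the log above, with uname -m standing in for x86_64.

```shell
# devel_tree_ok [OBJDIR]
# Hypothetical helper: runs "make -sC OBJDIR kernelrelease" and prints the
# reported release. Returns 1 if OBJDIR is missing or make fails.
# OBJDIR defaults to /usr/src/linux-obj/<machine>/default.
devel_tree_ok() {
  objdir=${1:-/usr/src/linux-obj/$(uname -m)/default}
  if [ ! -d "$objdir" ]; then
    echo "missing $objdir" >&2
    return 1
  fi
  if ! rel=$(make -sC "$objdir" kernelrelease 2>/dev/null); then
    echo "'make kernelrelease' failed in $objdir" >&2
    return 1
  fi
  echo "$rel"
}
```

On success it prints the release the devel tree was prepared for, which should match the installed kernel-default. If it fails, reinstalling the package as suggested, for example with zypper install --force kernel-default-devel, is the next step.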
# Additional rpm output:
# /sbin/ldconfig: File /usr/lib64/libkfontinstui.so.5 is empty, not checked.
# /sbin/ldconfig: File /usr/lib64/libjavascriptcoregtk-4.0.so.18 is empty, not checked.
# /sbin/ldconfig: File /usr/lib64/libwebkit2gtk-4.0.so.37.49.9 is empty, not checked.
# /sbin/ldconfig: File /usr/lib64/libkfontinstui.so.5.18.6 is empty, not checked.
# /sbin/ldconfig: File /usr/lib64/libwebkit2gtk-4.0.so.37 is empty, not checked.
# /sbin/ldconfig: File /usr/lib64/libkfontinst.so.5 is empty, not checked.
# /sbin/ldconfig: File /usr/lib64/libkfontinst.so.5.18.6 is empty, not checked.
# /sbin/ldconfig: File /usr/lib64/libjavascriptcoregtk-4.0.so.18.17.13 is empty, not checked.
# /sbin/ldconfig: File /usr/lib64/libkfontinstui.so.5 is empty, not checked.
# /sbin/ldconfig: File /usr/lib64/libjavascriptcoregtk-4.0.so.18 is empty, not checked.
# /sbin/ldconfig: File /usr/lib64/libwebkit2gtk-4.0.so.37.49.9 is empty, not checked.
# /sbin/ldconfig: File /usr/lib64/libkfontinstui.so.5.18.6 is empty, not checked.
# /sbin/ldconfig: File /usr/lib64/libwebkit2gtk-4.0.so.37 is empty, not checked.
# /sbin/ldconfig: File /usr/lib64/libkfontinst.so.5 is empty, not checked.
# /sbin/ldconfig: File /usr/lib64/libkfontinst.so.5.18.6 is empty, not checked.
# /sbin/ldconfig: File /usr/lib64/libjavascriptcoregtk-4.0.so.18.17.13 is empty, not checked.
This is unrelated to this ticket.
It seems to me that kernel-devel and kernel-default-devel are installed first, then the nvidia drivers, and kernel-default is installed later.
Kind of, yes.
And the installation of nvidia fails
Yes. See above.
and those files in /usr/lib64 are empty. What are those files good for? How can I fix it? Is it sufficient to reinstall those libs?
That's unrelated to this ticket.
You can find the whole zypper-history of that day as attachment to comment #46
Ok. I remember.
And I have another Leap 15.2 system to update. Same configuration, that is, linux-5.3.18-lp152.63 and an NVIDIA graphics card. I started the update and stopped it when zypper had told me what it intended to update:
================================================
The following 4 NEW packages are going to be installed:
  kernel-default 5.3.18-lp152.66.2 x86_64 Main Update Repository openSUSE
  kernel-default-devel 5.3.18-lp152.66.2 x86_64 Main Update Repository openSUSE
  kernel-devel 5.3.18-lp152.66.2 noarch Main Update Repository openSUSE
================================================
Is that the right order to install those packages? Or do I have to expect the same trouble again?
I don't know. In the worst case you need to reinstall the nvidia-gfxG05-kmp-default or kernel-default-devel package.
--- Comment #63 from Michael Pujos
--- Comment #64 from Stefan Dirsch
(In reply to Stefan Dirsch from comment #56)
No, that's just a package update. First the files of the new package are installed, then the files of the old package which are not part of the new package are uninstalled. At last the %triggerpostun of nvidia-gfxG05-kmp-default is running. At that point there is kernel/ subdir. Check this out.
https://docs.fedoraproject.org/en-US/packaging-guidelines/Scriptlets/#ordering
Yes, I read the link, but what I'm saying is that I observed it differently. When you run zypper dup, the summary for package updates usually says "The following X packages are going to be upgraded: ...", but for the kernel-default 5.11.4-1.2 -> 5.11.4-1.3 update it was:
"The following packages will be removed: kernel-default-5.11.4-1.2"
"The following packages will be installed: kernel-default-5.11.4-1.3"
So in this specific situation that ordering doesn't apply, a package gets removed, a new one installed and that's how those Nvidia modules in /lib/modules/$kernel/updates/ get deleted.
Phew. If this is true, it would mean that zypper handles package updates differently from RPM. I verified with rpm -vv that things are done in the right order during the kernel-default update, and the nvidia modules weren't removed afterwards when updating kernel-default 5.11.4-1.2 -> 5.11.4-1.3.
--- Comment #65 from Stefan Dirsch
--- Comment #66 from B
(In reply to B from comment #57)
(In reply to Stefan Dirsch from comment #56)
No, that's just a package update. First the files of the new package are installed, then the files of the old package which are not part of the new package are uninstalled. At last the %triggerpostun of nvidia-gfxG05-kmp-default is running. At that point there is kernel/ subdir. Check this out.
https://docs.fedoraproject.org/en-US/packaging-guidelines/Scriptlets/#ordering
Yes, I read the link, but what I'm saying is that I observed it differently. When you run zypper dup, the summary for package updates usually says "The following X packages are going to be upgraded: ...", but for the kernel-default 5.11.4-1.2 -> 5.11.4-1.3 update it was:
"The following packages will be removed: kernel-default-5.11.4-1.2"
"The following packages will be installed: kernel-default-5.11.4-1.3"
So in this specific situation that ordering doesn't apply, a package gets removed, a new one installed and that's how those Nvidia modules in /lib/modules/$kernel/updates/ get deleted.
Puh. If this is true, it would mean zypper would handle package updates different than RPM. I verified with rpm -vv that things are done in the right order during kernel-default Update and nvidia modules weren't removed afterwards when updating kernel-default 5.11.4-1.2 -> 5.11.4.-1.3.
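To make the two orderings concrete, here is a minimal Python sketch that models the filesystem as a set of paths. It is an illustration, not the real scriptlet. It assumes that kernel-default 5.11.4-1.2 and 5.11.4-1.3 both place their modules in the same directory (/lib/modules/5.11.4-1-default, following the naming seen in the initial report), and it reduces the %triggerpostun of nvidia-gfxG05-kmp-default to one hypothetical rule: if kernel/ under that directory is empty, delete the NVIDIA modules under updates/. The module file names are illustrative.

```python
# Minimal model of the two update orderings discussed in this bug.
# ASSUMPTIONS (hypothetical, for illustration only):
#  - both kernel builds install into the same /lib/modules directory
#  - the KMP's %triggerpostun is reduced to a single rule: if kernel/ is
#    empty, delete the NVIDIA modules under updates/
KDIR = "/lib/modules/5.11.4-1-default"

OLD_KERNEL = {f"{KDIR}/kernel/fs/ext4.ko", f"{KDIR}/kernel/drivers/a.ko"}
NEW_KERNEL = {f"{KDIR}/kernel/fs/ext4.ko", f"{KDIR}/kernel/drivers/b.ko"}
NVIDIA_MODULES = {f"{KDIR}/updates/nvidia.ko", f"{KDIR}/updates/nvidia-drm.ko"}


def triggerpostun(fs: set[str]) -> set[str]:
    """Hypothetical cleanup: drop updates/ modules if kernel/ is empty."""
    if not any(p.startswith(f"{KDIR}/kernel/") for p in fs):
        return fs - NVIDIA_MODULES
    return fs


def rpm_upgrade(fs: set[str]) -> set[str]:
    # 1. install the files of the new package
    fs = fs | NEW_KERNEL
    # 2. remove files of the old package that the new one does not ship
    fs = fs - (OLD_KERNEL - NEW_KERNEL)
    # 3. the KMP's %triggerpostun runs last; kernel/ is still populated
    return triggerpostun(fs)


def remove_then_install(fs: set[str]) -> set[str]:
    # 1. erase the old package completely
    fs = fs - OLD_KERNEL
    # 2. %triggerpostun fires while kernel/ is empty
    fs = triggerpostun(fs)
    # 3. only now is the new package installed
    return fs | NEW_KERNEL


start = OLD_KERNEL | NVIDIA_MODULES
print("rpm upgrade keeps NVIDIA modules:", NVIDIA_MODULES <= rpm_upgrade(start))  # True
print("remove+install keeps NVIDIA modules:", NVIDIA_MODULES <= remove_then_install(start))  # False
```

Under these assumptions, the RPM upgrade ordering leaves the modules in place, while the remove-then-install sequence from the zypper summary lets the trigger run at a moment when kernel/ is empty, so the modules are deleted.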
I found that zypper also has flags for more verbosity:
-v, --verbose Increase verbosity. For debugging output specify this option twice.
I'll try some installations/updates with zypper -vv; maybe it will show more precisely what's happening.
http://bugzilla.opensuse.org/show_bug.cgi?id=1182666#c67
--- Comment #67 from B
sudo zypper -vv in --oldpackage kernel-default-5.11.4-1.3.x86_64.rpm
Parts of output:

The following NEW package is going to be installed:
  kernel-default 5.11.4-1.3 x86_64 Plain RPM files cache openSUSE

The following package is going to be REMOVED:
  kernel-default 5.11.4-1.2 x86_64 openSUSE

The following package requires a system reboot:
  kernel-default 5.11.4-1.3 x86_64 Plain RPM files cache openSUSE

1 new package to install, 1 to remove.
Checking for file conflicts: ...[done]
(1/2) Removing kernel-default-5.11.4-1.2.x86_64 ...[done]
(2/2) Installing: kernel-default-5.11.4-1.3.x86_64 ...[done]
CommitResult (total 2, done 2, error 0, skipped 0, updateMessages 0)

The summary and the progress output show it happening in two steps: 1) remove the old package, 2) install the new one. Compare this with a regular package update:
sudo zypper dup
The following 2 packages are going to be upgraded:
  iscan iscan-data

Checking for file conflicts: ...[done]
(1/2) Installing: iscan-data-1.39.1-5.30.noarch ...[done]
(2/2) Installing: iscan-2.30.4-5.30.x86_64 ...[done]
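The difference between the two logs can be checked mechanically. Below is a minimal Python sketch that assumes only the progress-line shapes shown above ("(i/n) Removing NAME-VERSION-RELEASE.ARCH" and "(i/n) Installing: NAME-VERSION-RELEASE.ARCH"). The name-splitting rule is a hypothetical helper that expects no '-' inside version or release. The sketch flags packages that are erased in one step and installed again in a later step.

```python
import re

# Matches zypper progress lines like the ones quoted above, e.g.
#   "(1/2) Removing kernel-default-5.11.4-1.2.x86_64 ...[done]"
#   "(2/2) Installing: kernel-default-5.11.4-1.3.x86_64 ...[done]"
# ASSUMPTION: only these two line shapes matter; other output is ignored.
STEP = re.compile(r"^\((\d+)/(\d+)\) (Removing|Installing:) (\S+)")
# HYPOTHETICAL split of "name-version-release.arch" into its name part;
# assumes version and release contain no '-'.
NVRA = re.compile(r"^(?P<name>.+)-[^-]+-[^-]+\.[^.]+$")


def split_steps(output: str) -> list[tuple[int, str, str]]:
    """Return (step number, action, package name) in execution order."""
    steps = []
    for line in output.splitlines():
        m = STEP.match(line.strip())
        if m:
            nvra = NVRA.match(m.group(4))
            name = nvra.group("name") if nvra else m.group(4)
            action = "remove" if m.group(3) == "Removing" else "install"
            steps.append((int(m.group(1)), action, name))
    return steps


def removed_before_install(output: str) -> set[str]:
    """Names erased in one step and installed again in a later step."""
    removed, flagged = set(), set()
    for _, action, name in split_steps(output):
        if action == "remove":
            removed.add(name)
        elif name in removed:
            flagged.add(name)
    return flagged
```

Fed the kernel log above, `removed_before_install` returns {"kernel-default"}. Fed the iscan log, it returns an empty set, because both packages appear only as "Installing:" steps.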
http://bugzilla.opensuse.org/show_bug.cgi?id=1182666#c68
Stefan Dirsch
http://bugzilla.opensuse.org/show_bug.cgi?id=1182666#c69
--- Comment #69 from B
Fixed this issue now. Should fix this issue for TW and Leap.
-------------------------------------------------------------------
Thu Mar 18 15:07:02 UTC 2021 - Stefan Dirsch
- Unfortunately removing no longer used kernel modules via %triggerpostun doesn't work since kernel updates are not considered "atomic" when using YaST/zypper (only safe when using rpm) [boo#1182666, boo#1164520]
I hope there will be still a driver update this week.
I created kernel bug report 1183739. Maybe the kernel maintainers could make it so that kernels are installed into their own directories, or propose some other fix, because never removing those modules is not ideal.
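The proposal hinges on both kernel builds sharing one module directory. Here is a small sketch of that under an explicit assumption: the directory name is built from the package version plus the release with its trailing rebuild counter dropped. This matches /lib/modules/5.10.16-1-default from the initial report but is not verified against the kernel spec file. `per_build_dir` is purely hypothetical. It only illustrates the "own directory" idea and is not an existing openSUSE layout.

```python
def current_dir(version: str, release: str, flavor: str = "default") -> str:
    # ASSUMPTION: the trailing rebuild counter of the release ("1.3" -> "1")
    # is not part of the /lib/modules directory name.
    base_release = release.rsplit(".", 1)[0]
    return f"/lib/modules/{version}-{base_release}-{flavor}"


def per_build_dir(version: str, release: str, flavor: str = "default") -> str:
    # HYPOTHETICAL layout for the proposal: keep the full release so every
    # build of the same kernel version gets its own directory.
    return f"/lib/modules/{version}-{release}-{flavor}"


old, new = ("5.11.4", "1.2"), ("5.11.4", "1.3")
print(current_dir(*old) == current_dir(*new))      # True: both builds collide
print(per_build_dir(*old) == per_build_dir(*new))  # False: separate directories
```

With a shared directory, removing the old build and installing the new one both act on the same path, which is why the ordering matters at all. With per-build directories, the old build's removal would not touch the new build's updates/ subdirectory.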
http://bugzilla.opensuse.org/show_bug.cgi?id=1182666#c70
--- Comment #70 from Stefan Dirsch
http://bugzilla.opensuse.org/show_bug.cgi?id=1182666#c71
Sebastian Turzański