[Bug 1082704] New: compiling nvidia kernel modules uses wrong kernel tree?
http://bugzilla.opensuse.org/show_bug.cgi?id=1082704

Bug ID: 1082704
Summary: compiling nvidia kernel modules uses wrong kernel tree?
Classification: openSUSE
Product: openSUSE Tumbleweed
Version: Current
Hardware: Other
OS: Other
Status: NEW
Severity: Normal
Priority: P5 - None
Component: X11 3rd Party Driver
Assignee: xorg-maintainer-bugs@forge.provo.novell.com
Reporter: P.Suetterlin@royac.iac.es
QA Contact: sndirsch@suse.com
Found By: ---
Blocker: ---

My desktop has an NVidia GTX 1060. I use the proprietary driver from the nvidia repo and recent Tumbleweed. The driver's kernel modules are nvidia-gfxG04-kmp-default-390.25_k4.15.2_1-10.1.x86_64, and the kernel (TW 20180221) is 4.15.4-1-default.

To check some other issue, I rebooted into the previous kernel (4.15.2-1-default) today. The drivers wouldn't load. I have

/lib/modules/4.15.2-1-default/updates/nvidia.ko
/lib/modules/4.15.4-1-default/weak-updates/updates/nvidia.ko -> /lib/modules/4.15.2-1-default/updates/nvidia.ko

However,

modinfo /lib/modules/4.15.2-1-default/updates/nvidia.ko
filename:       /lib/modules/4.15.2-1-default/updates/nvidia.ko
alias:          char-major-195-*
version:        390.25
supported:      external
license:        NVIDIA
srcversion:     B5B1CA3087B567ADFADC070
alias:          pci:v000010DEd00000E00sv*sd*bc04sc80i00*
alias:          pci:v000010DEd*sv*sd*bc03sc02i00*
alias:          pci:v000010DEd*sv*sd*bc03sc00i00*
depends:        ipmi_msghandler
retpoline:      Y
name:           nvidia
vermagic:       4.15.4-1-default SMP preempt mod_unload modversions

or

strings /lib/modules/4.15.2-1-default/updates/nvidia.ko | grep 4\\.15
4.15.4-1H
/usr/src/linux-4.15.4-1/include/linux/dma-mapping.h
/usr/src/linux-4.15.4-1/include/linux/dma-mapping.h
......

It seems it was compiled using the 4.15.4 kernel tree, but installed in the 4.15.2 modules directory? No manual compiles etc., just the standard 'zypper dup'.

-- 
You are receiving this mail because:
You are on the CC list for the bug.
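The mismatch described in the report, a module installed under /lib/modules/4.15.2-1-default whose vermagic says 4.15.4-1-default, can be checked mechanically. The following is a minimal bash sketch; `check_module_kver` is a hypothetical helper, not part of any openSUSE package, and it assumes that modules sit under /lib/modules/<release>/ and that the first word of vermagic is the kernel release the module was built for.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of any openSUSE package: compare the kernel
# release a module was built for (first word of its vermagic) with the
# release directory it is installed under (/lib/modules/<release>/...).
check_module_kver() {
    local path=$1 vermagic=$2
    local dir_kver built_kver

    # "/lib/modules/4.15.2-1-default/updates/nvidia.ko" -> "4.15.2-1-default"
    dir_kver=${path#/lib/modules/}
    dir_kver=${dir_kver%%/*}

    # "4.15.4-1-default SMP preempt mod_unload modversions" -> "4.15.4-1-default"
    built_kver=${vermagic%% *}

    if [ "$dir_kver" = "$built_kver" ]; then
        echo "OK: $path built for $built_kver"
    else
        echo "MISMATCH: $path installed for $dir_kver but built for $built_kver"
        return 1
    fi
}
```

On an affected system it could be called as `check_module_kver "$f" "$(modinfo -F vermagic "$f")"` for each nvidia.ko under /lib/modules/*/updates/. The check is only meaningful for those real files, not for the weak-updates symlinks, which by design point at a module built for another release.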
http://bugzilla.opensuse.org/show_bug.cgi?id=1082704#c3
--- Comment #3 from Stefan Dirsch ---

http://bugzilla.opensuse.org/show_bug.cgi?id=1082704#c4
--- Comment #4 from Peter Sütterlin ---

http://bugzilla.opensuse.org/show_bug.cgi?id=1082704#c5
--- Comment #5 from Peter Sütterlin ---

http://bugzilla.opensuse.org/show_bug.cgi?id=1082704#c7
--- Comment #7 from Peter Sütterlin ---
My guess is that you have various versions of kernel-default-devel, kernel-devel and kernel-source installed.
Yes, indeed.

lux:~ # rpm -q kernel-default-devel kernel-devel
kernel-default-devel-4.15.2-1.4.x86_64
kernel-default-devel-4.15.4-1.5.x86_64
kernel-devel-4.15.2-1.4.noarch
kernel-devel-4.15.4-1.5.noarch

nvidia-gfxG04-kmp-default requires kernel-default-devel, and this (also) gets updated with every new kernel version when running 'zypper dup'. So I'd assume every user of the nvidia repo would be in that situation?
I cannot investigate that issue without direct access to the system. If you want to investigate the issue yourself, you would need to run the %post of nvidia-gfxG04-kmp-default manually. Please check it out via

rpm --scripts -q nvidia-gfxG04-kmp-default
I had a look at those before, and also at the makefiles etc., but so far could not spot where it decides to use the latest devel version... going through it again now. There's no mention of any update-alternatives, though. The postinstall scriptlet only consists of

-----
postinstall scriptlet (using /bin/sh):
nvr=nvidia-gfxG04-kmp-default-390.25_k4.15.2_1-10.1
wm2=/usr/lib/module-init-tools/weak-modules2
if [ -x $wm2 ]; then
    INITRD_IN_POSTTRANS=1 /bin/bash -${-/e/} $wm2 --add-kmp $nvr
fi
-----

And running that does not compile anything; I think it only creates the links.

So the suspicion is that it is the kernel-default-devel package. The zypper log shows that this one did actually compile the nvidia modules. I checked its postinstall, but that only does the ..obj links. Still, the log has

# 2018-02-22 23:24:25 kernel-default-devel-4.15.4-1.5.x86_64.rpm installed ok
# Additional rpm output:
# Changing symlink /usr/src/linux-obj/x86_64/default from ../../linux-4.15.2-1-obj/x86_64/default to ../../linux-4.15.4-1-obj/x86_64/default
# /usr/src/kernel-modules/nvidia-390.25-default /
# rm -f -r conftest
# make[1]: Entering directory '/usr/src/linux-4.15.2-1'

The 'Changing symlink' line is from the post script. No idea why it starts compiling now. Does it call dkms or something?
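The 'Changing symlink' line shows that the unversioned /usr/src/linux-obj/x86_64/default path is repointed at the tree of each newly installed kernel-default-devel. A small bash sketch to see which release that path currently resolves to; `obj_link_release` is a hypothetical helper, and it assumes the relative symlink target format shown in the log (../../linux-<version>-obj/<arch>/<flavor>).

```bash
#!/usr/bin/env bash
# Hypothetical helper: report which kernel release the unversioned
# <srcdir>/linux-obj/<arch>/<flavor> symlink currently resolves to.
# Assumes a relative target like ../../linux-4.15.4-1-obj/x86_64/default,
# as printed in the zypper log.
obj_link_release() {
    local srcdir=$1 arch=$2 flavor=$3 target
    target=$(readlink "$srcdir/linux-obj/$arch/$flavor") || return 1
    target=${target##*/linux-}   # -> 4.15.4-1-obj/x86_64/default
    target=${target%%-obj/*}     # -> 4.15.4-1
    echo "$target-$flavor"       # -> 4.15.4-1-default
}
```

On the reporter's system, `obj_link_release /usr/src x86_64 default` would be expected to print the newest installed release, regardless of which kernel is currently booted.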
http://bugzilla.opensuse.org/show_bug.cgi?id=1082704#c8
--- Comment #8 from Stefan Dirsch ---

http://bugzilla.opensuse.org/show_bug.cgi?id=1082704#c9
--- Comment #9 from Peter Sütterlin ---

http://bugzilla.opensuse.org/show_bug.cgi?id=1082704#c10
--- Comment #10 from Stefan Dirsch ---

http://bugzilla.opensuse.org/show_bug.cgi?id=1082704#c11
--- Comment #11 from Peter Sütterlin ---
Ok. You're right. I was testing on Leap 42.3. Indeed there we have the rebuild in %post and in the %triggers. On TW only in the %triggers.
one mystery solved :)
I also believe it's correct to install to /lib/modules/$kver/updates instead of the hardcoded path in the trigger scripts for TW.
I think it's wrong in both TW and Leap.

kver=$(make -sC /usr/src/linux-obj/$arch/$flavor kernelrelease)

will always point to the latest installed kernel. This *can* be the one the modules were originally compiled for; in that case $kver=4.4.76-1-$flavor (in your case). But if it's different, it would overwrite the original module with a 'broken' one. The error likely doesn't show up in Leap because Leap doesn't really do big kernel jumps.

In principle it's easy: if you compile against $kver, then also install in /lib/modules/$kver .....
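The principle stated above, compile against $kver and install in /lib/modules/$kver, can be sketched by deriving both paths from one explicitly passed release. `kmp_paths` is a hypothetical helper, not the actual kmp-trigger.sh from the package; it assumes the per-release obj tree naming /usr/src/linux-<version>-obj/<arch>/<flavor> seen in the zypper log, and it only computes paths (the compile and depmod steps are omitted).

```bash
#!/usr/bin/env bash
# Hypothetical sketch (not the packaged kmp-trigger.sh): derive the kernel
# build tree AND the module install directory from ONE kernel release, so a
# module compiled against $kver always lands in /lib/modules/$kver.
kmp_paths() {
    local kver=$1 arch=$2 flavor=$3
    # ${kver%$flavor} turns "4.15.2-1-default" into "4.15.2-1-"
    local build_dir="/usr/src/linux-${kver%$flavor}obj/$arch/$flavor"
    local install_dir="/lib/modules/$kver/updates"
    printf 'build=%s\ninstall=%s\n' "$build_dir" "$install_dir"
}
```

For example, `kmp_paths 4.15.2-1-default x86_64 default` yields the 4.15.2-1 obj tree and /lib/modules/4.15.2-1-default/updates. Passing the release in explicitly avoids the mismatch in the report, where the build used the tree behind the unversioned linux-obj symlink (the newest kernel) while the install directory was fixed at package build time.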
http://bugzilla.opensuse.org/show_bug.cgi?id=1082704#c12
--- Comment #12 from Stefan Dirsch ---

http://bugzilla.opensuse.org/show_bug.cgi?id=1082704#c13
--- Comment #13 from Peter Sütterlin ---
The reason the issue doesn't show up on Leap 42.3 is that there we are kABI compatible. And without using the fixed tree, the weak-updates mechanism wouldn't work.
On TW we're no longer kABI compatible.
Ah! Then you wouldn't use/need it in TW at all...
Anyway, I've changed this now. But I haven't done any testing yet.
Mon Feb 26 16:22:07 UTC 2018 - sndirsch@suse.com
- rebuilded kernel modules in %trigger of TW packages should go to the tree against which the kernel module gets builded, not the hardcoded one during build of the package; introduced kmp-trigger.sh/kmp-trigger-old.sh script snippets for this based on kmp-post.sh/kmp-post-old.sh (boo#1082704)
--> obs://X11:Drivers:Video/nvidia-gfxG04
You can do a manual build if you want (check the README file).
I had a look at the kmp-post.sh; it looks fine to me. I just manually compiled the 4.15.2-1-default version of the modules, based on the old TW script, changing $kver and the related directory names (linux-obj -> linux-${kver%$flavor}obj). It went fine; depmod $kver doesn't give errors. Thanks!
http://bugzilla.opensuse.org/show_bug.cgi?id=1082704#c14
--- Comment #14 from Stefan Dirsch ---

http://bugzilla.opensuse.org/show_bug.cgi?id=1082704#c15
--- Comment #15 from Peter Sütterlin ---

http://bugzilla.opensuse.org/show_bug.cgi?id=1082704#c16
--- Comment #16 from Stefan Dirsch ---

http://bugzilla.opensuse.org/show_bug.cgi?id=1082704#c17
--- Comment #17 from Peter Sütterlin ---

http://bugzilla.opensuse.org/show_bug.cgi?id=1082704#c18
--- Comment #18 from Peter Sütterlin ---

http://bugzilla.opensuse.org/show_bug.cgi?id=1082704#c19
--- Comment #19 from Stefan Dirsch ---

http://bugzilla.opensuse.org/show_bug.cgi?id=1082704#c21
--- Comment #21 from Sebastian Turzański ---