[opensuse-factory] nvidia on tumbleweed
The Nvidia repo is activated, and these packages are installed:

    nvidia-gfxG04-kmp-default-390.25_k4.15.2_1-10.1.x86_64
    nvidia-computeG04-390.25-10.1.x86_64
    nvidia-glG04-390.25-10.1.x86_64
    x11-video-nvidiaG04-390.25-10.1.x86_64

The module is built, but trying to load it fails with:

    modprobe -v nvidia
    install PATH=$PATH:/bin:/usr/bin; if /sbin/modprobe --ignore-install nvidia; then if /sbin/modprobe nvidia_uvm; then if [ ! -c /dev/nvidia-uvm ]; then mknod -m 660 /dev/nvidia-uvm c $(cat /proc/devices | while read major device; do if [ "$device" == "nvidia-uvm" ]; then echo $major; break; fi ; done) 0; chown :video /dev/nvidia-uvm; fi; fi; if [ ! -c /dev/nvidiactl ]; then mknod -m 660 /dev/nvidiactl c 195 255; chown :video /dev/nvidiactl; fi; devid=-1; for dev in $(ls -d /sys/bus/pci/devices/*); do vendorid=$(cat $dev/vendor); if [ "$vendorid" == "0x10de" ]; then class=$(cat $dev/class); classid=${class%%00}; if [ "$classid" == "0x0300" -o "$classid" == "0x0302" ]; then devid=$((devid+1)); if [ ! -c /dev/nvidia${devid} ]; then mknod -m 660 /dev/nvidia${devid} c 195 ${devid}; chown :video /dev/nvidia${devid}; fi; fi; fi; done; /sbin/modprobe nvidia_drm; if [ ! -c /dev/nvidia-modeset ]; then mknod -m 660 /dev/nvidia-modeset c 195 254; chown :video /dev/nvidia-modeset; fi; fi NVreg_DeviceFileUID=0 NVreg_DeviceFileGID=33 NVreg_DeviceFileMode=0660
    insmod /lib/modules/4.15.2-1-default/updates/nvidia.ko NVreg_DeviceFileUID=0 NVreg_DeviceFileGID=33 NVreg_DeviceFileMode=0660
    [ 519.347560] nvidia: disagrees about version of symbol kmem_cache_alloc_trace
    [ 519.348076] nvidia: Unknown symbol kmem_cache_alloc_trace (err -22)
    [ 519.348621] nvidia: disagrees about version of symbol kmem_cache_alloc
    [ 519.349138] nvidia: Unknown symbol kmem_cache_alloc (err -22)
    [ 519.349726] nvidia: disagrees about version of symbol kmem_cache_free
    [ 519.350334] nvidia: Unknown symbol kmem_cache_free (err -22)
    modprobe: ERROR: could not insert 'nvidia': Invalid argument

Any ideas?
-- To unsubscribe, e-mail: opensuse-factory+unsubscribe@opensuse.org To contact the owner, e-mail: opensuse-factory+owner@opensuse.org
This is interesting...

Markus Koßmann wrote:
Nvidia repo is activated, nvidia-gfxG04-kmp-default-390.25_k4.15.2_1-10.1.x86_64 nvidia-computeG04-390.25-10.1.x86_64 nvidia-glG04-390.25-10.1.x86_64 x11-video-nvidiaG04-390.25-10.1.x86_64
I have the same versions here. But I'm already running kernel 4.15.4 (TW 20180221). No problems there.
    [ 519.347560] nvidia: disagrees about version of symbol kmem_cache_alloc_trace
    [ 519.348076] nvidia: Unknown symbol kmem_cache_alloc_trace (err -22)
    [ 519.348621] nvidia: disagrees about version of symbol kmem_cache_alloc
    [ 519.349138] nvidia: Unknown symbol kmem_cache_alloc (err -22)
    [ 519.349726] nvidia: disagrees about version of symbol kmem_cache_free
    [ 519.350334] nvidia: Unknown symbol kmem_cache_free (err -22)
    modprobe: ERROR: could not insert 'nvidia': Invalid argument

However, if I look at /var/log/zypp/history and search for nvidia, I do find

    # depmod: WARNING: //lib/modules/4.15.2-1-default/updates/nvidia-modeset.ko disagrees about version of symbol kmem_cache_alloc_trace
    # depmod: WARNING: //lib/modules/4.15.2-1-default/updates/nvidia-uvm.ko disagrees about version of symbol kmem_cache_alloc_trace
    # depmod: WARNING: //lib/modules/4.15.2-1-default/updates/nvidia-uvm.ko disagrees about version of symbol kmem_cache_alloc
    # depmod: WARNING: //lib/modules/4.15.2-1-default/updates/nvidia-uvm.ko disagrees about version of symbol kmem_cache_free
    # depmod: WARNING: //lib/modules/4.15.2-1-default/updates/nvidia.ko disagrees about version of symbol kmem_cache_alloc_trace
    # depmod: WARNING: //lib/modules/4.15.2-1-default/updates/nvidia.ko disagrees about version of symbol kmem_cache_alloc
    # depmod: WARNING: //lib/modules/4.15.2-1-default/updates/nvidia.ko disagrees about version of symbol kmem_cache_free
Any ideas ?
Not sure what to make from the above - looks as if it had been compiled against the wrong kernel?
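Editorial sketch: the depmod warnings quoted above can be condensed into one line per module and symbol, which makes it easier to see that all three nvidia modules disagree about the same few symbols. The function name summarize_depmod_warnings is made up for this illustration, and the log format is assumed to match the lines quoted in this thread.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): list each unique
# "module symbol" pair that depmod complained about in a log file.
summarize_depmod_warnings() {
    # $1: path to a log such as /var/log/zypp/history
    # The sed expression keeps only the module file name and the symbol name.
    sed -n 's|^.*depmod: WARNING: .*/\([^/ ]*\.ko\) disagrees about version of symbol \([A-Za-z0-9_]*\).*$|\1 \2|p' "$1" \
        | LC_ALL=C sort -u
}

# Example call (path taken from the thread):
#   summarize_depmod_warnings /var/log/zypp/history
```

LC_ALL=C is set only to make the sort order predictable; it does not affect which lines are matched.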
Peter Suetterlin wrote:
I just tried

    strings /lib/modules/4.15.2-1-default/updates/nvidia.ko | grep 4\\.15

Guess what I see? Tons of references to /usr/src/linux-4.15.4-1
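Editorial sketch: the same check can be narrowed to print only the distinct kernel source trees a module refers to. This variant swaps strings for grep -a (GNU grep assumed), so binutils is not required. The function name is hypothetical, and the path pattern is assumed from the /usr/src/linux-4.15.4-1 reference reported above.

```shell
#!/bin/sh
# Hypothetical helper: print the distinct /usr/src/linux-<version>-<release>
# paths embedded in a module file. grep -a treats the binary as text.
kernel_sources_in_module() {
    # $1: path to a .ko file
    LC_ALL=C grep -a -o '/usr/src/linux-[0-9][0-9.]*-[0-9][0-9]*' "$1" \
        | LC_ALL=C sort -u
}

# Example call (path taken from the thread):
#   kernel_sources_in_module /lib/modules/4.15.2-1-default/updates/nvidia.ko
```

If more than one version shows up, or the version differs from the directory the module sits in, the module was probably built against a different kernel than the one it is installed for. Running modinfo -F vermagic on the same file should give a second opinion.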
On Saturday, 24 February 2018, 18:53:38, Peter Suetterlin wrote:
Any ideas ?
Not sure what to make from the above - looks as if it had been compiled against the wrong kernel?

Well, before I noticed that problem I booted into runlevel 3 and forced a reinstallation of the nvidia rpms with zypper in --force. So I am quite sure that it was built against the kernel where it failed to load.

If you look into /lib/modules for nvidia.ko you will find:

    /lib/modules # find . -name nvidia.ko | xargs ls -l
    -rw-r--r-- 1 root root 29440208 Feb 24 18:01 ./4.15.2-1-default/updates/nvidia.ko
    lrwxrwxrwx 1 root root 47 Feb 19 18:22 ./4.15.3-1-default/weak-updates/updates/nvidia.ko -> /lib/modules/4.15.2-1-default/updates/nvidia.ko
    lrwxrwxrwx 1 root root 47 Feb 21 20:10 ./4.15.4-1-default/weak-updates/updates/nvidia.ko -> /lib/modules/4.15.2-1-default/updates/nvidia.ko

So it seems to be compiled into the 4.15.2 module tree and then linked into the other module trees.
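Editorial sketch: the find listing above separates the one real nvidia.ko from the weak-updates symlinks by eye. A small function can label each copy explicitly. The name list_module_copies is invented for this example. It only inspects the file system and makes no claim about how the KMP scripts create the links.

```shell
#!/bin/sh
# Hypothetical helper: for every copy of a module under a modules root,
# report whether it is a real file or a symlink, and where a symlink points.
list_module_copies() {
    # $1: modules root (e.g. /lib/modules)
    # $2: module file name (e.g. nvidia.ko)
    find "$1" -name "$2" | LC_ALL=C sort | while IFS= read -r path; do
        if [ -L "$path" ]; then
            printf 'link %s -> %s\n' "$path" "$(readlink "$path")"
        else
            printf 'file %s\n' "$path"
        fi
    done
}

# Example call:
#   list_module_copies /lib/modules nvidia.ko
```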
Markus Koßmann wrote:
On Saturday, 24 February 2018, 18:53:38, Peter Suetterlin wrote:
Any ideas ?
Not sure what to make from the above - looks as if it had been compiled against the wrong kernel?
Well, before I noticed that problem I booted into runlevel 3 and forced a reinstallation of the nvidia rpms with zypper in --force. So I am quite sure that it was built against the kernel where it failed to load.
Really? Have you tried the strings command from the other post, strings /lib/modules/4.15.2-1-default/updates/nvidia.ko | grep 4\\.15
If you look into /lib/modules for nvidia.ko you will find:

    /lib/modules # find . -name nvidia.ko | xargs ls -l
    -rw-r--r-- 1 root root 29440208 Feb 24 18:01 ./4.15.2-1-default/updates/nvidia.ko
    lrwxrwxrwx 1 root root 47 Feb 19 18:22 ./4.15.3-1-default/weak-updates/updates/nvidia.ko -> /lib/modules/4.15.2-1-default/updates/nvidia.ko
    lrwxrwxrwx 1 root root 47 Feb 21 20:10 ./4.15.4-1-default/weak-updates/updates/nvidia.ko -> /lib/modules/4.15.2-1-default/updates/nvidia.ko
So it seems to be compiled into the 4.15.2 module tree and then linked into the other module trees
As it is in my case, so you'd *assume* it was compiled against 4.15.2, as the 4.15.4 modules directory only has the weak-update links. It looks as if the compile script is hardcoded to put it into /lib/modules/4.15.2-1-default, but uses the latest installed kernel headers for compilation.

I have just booted to 4.15.2, which is still around. The nvidia module won't load...
* Markus Koßmann
On Saturday, 24 February 2018, 18:53:38, Peter Suetterlin wrote:
Any ideas ?
Not sure what to make from the above - looks as if it had been compiled against the wrong kernel?

Well, before I noticed that problem I booted into runlevel 3 and forced a reinstallation of the nvidia rpms with zypper in --force. So I am quite sure that it was built against the kernel where it failed to load.

If you look into /lib/modules for nvidia.ko you will find:

    /lib/modules # find . -name nvidia.ko | xargs ls -l
    -rw-r--r-- 1 root root 29440208 Feb 24 18:01 ./4.15.2-1-default/updates/nvidia.ko
    lrwxrwxrwx 1 root root 47 Feb 19 18:22 ./4.15.3-1-default/weak-updates/updates/nvidia.ko -> /lib/modules/4.15.2-1-default/updates/nvidia.ko
    lrwxrwxrwx 1 root root 47 Feb 21 20:10 ./4.15.4-1-default/weak-updates/updates/nvidia.ko -> /lib/modules/4.15.2-1-default/updates/nvidia.ko
So it seems to be compiled into the 4.15.2 module tree and then linked into the other module trees
no problem at all with "sh NVIDIA-Linux-x86_64-390.25.run -aqs"

--
(paka)Patrick Shanahan Plainfield, Indiana, USA @ptilopteri
http://en.opensuse.org openSUSE Community Member facebook/ptilopteri
Registered Linux User #207535 @ http://linuxcounter.net
Photos: http://wahoo.no-ip.org/piwigo paka @ IRCnet freenode
Markus Koßmann wrote:
On Saturday, 24 February 2018, 18:53:38, Peter Suetterlin wrote:
Any ideas ?
Not sure what to make from the above - looks as if it had been compiled against the wrong kernel?

Well, before I noticed that problem I booted into runlevel 3 and forced a reinstallation of the nvidia rpms with zypper in --force. So I am quite sure that it was built against the kernel where it failed to load.
I've opened a bug report: https://bugzilla.opensuse.org/show_bug.cgi?id=1082704

I wonder if the forced reinstall would work if you remove all kernel-devel packages other than the 4.15.2 ones, i.e., only keep

    kernel-default-devel-4.15.2-1.4.x86_64
    kernel-devel-4.15.2-1.4.noarch

and/or check where /usr/src/linux is pointing to...
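Editorial sketch: checking where /usr/src/linux points can be turned into a yes/no test. The function name src_link_matches is hypothetical, and the linux-<version>-<release> naming of the link target is assumed from the /usr/src/linux-4.15.4-1 path mentioned earlier in the thread.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the kernel source symlink points at
# the tree for the wanted kernel version. Returns 2 if $1 is not a symlink.
src_link_matches() {
    # $1: symlink to inspect (e.g. /usr/src/linux)
    # $2: wanted version-release without flavor (e.g. 4.15.2-1)
    target=$(readlink "$1") || return 2
    [ "$(basename "$target")" = "linux-$2" ]
}

# Example call:
#   src_link_matches /usr/src/linux 4.15.2-1 && echo "sources match 4.15.2-1"
```

To see which devel packages are installed alongside, something like rpm -qa 'kernel*devel*' should list them.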
Markus Koßmann wrote:
On Saturday, 24 February 2018, 18:53:38, Peter Suetterlin wrote:
Any ideas ?
Not sure what to make from the above - looks as if it had been compiled against the wrong kernel?

Well, before I noticed that problem I booted into runlevel 3 and forced a reinstallation of the nvidia rpms with zypper in --force. So I am quite sure that it was built against the kernel where it failed to load.

If you look into /lib/modules for nvidia.ko you will find:

    /lib/modules # find . -name nvidia.ko | xargs ls -l
    -rw-r--r-- 1 root root 29440208 Feb 24 18:01 ./4.15.2-1-default/updates/nvidia.ko
    lrwxrwxrwx 1 root root 47 Feb 19 18:22 ./4.15.3-1-default/weak-updates/updates/nvidia.ko -> /lib/modules/4.15.2-1-default/updates/nvidia.ko
    lrwxrwxrwx 1 root root 47 Feb 21 20:10 ./4.15.4-1-default/weak-updates/updates/nvidia.ko -> /lib/modules/4.15.2-1-default/updates/nvidia.ko
So it seems to be compiled into the 4.15.2 module tree and then linked into the other module trees
Not sure if you solved the issue or are following the bug report.

The problem is/was a kmp compile script that is triggered when a new kernel-default-devel gets installed. That script (you can check it with rpm -q --triggers nvidia-gfxG04-kmp-default) does compile the nvidia modules against the new kernel, but places them in the old location, which is hardcoded in the script...

I attach a script that I used to recompile my modules for the kernels I have.

    ./compile_nvidia_kmp.sh                    # this will compile against the latest installed
                                               # kernel, but place the modules properly
    ./compile_nvidia_kmp.sh 4.15.2-1-default   # modules for older kernel

HTH
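Editorial sketch: the attached compile_nvidia_kmp.sh is not reproduced in the archive, so the following is NOT that script. It only illustrates the principle behind the fix described above, using generic kbuild conventions. The build tree and the install location are both derived from one kernel version variable, so they cannot drift apart. The function name and the source directory argument are placeholders, and the commands are printed rather than executed.

```shell
#!/bin/sh
# Hypothetical sketch, NOT the attached compile_nvidia_kmp.sh: print (rather
# than run) generic kbuild commands in which the build tree and the install
# location both come from the same kernel version.
print_kmp_build() {
    # $1: kernel version including flavor (e.g. 4.15.2-1-default)
    # $2: module source directory (placeholder; depends on the package)
    kver=$1
    src=$2
    # Build against the headers that belong to exactly this kernel.
    echo "make -C /lib/modules/$kver/build M=$src modules"
    # modules_install takes the kernel release from the same build tree,
    # so the modules end up under that kernel's own updates directory.
    echo "make -C /lib/modules/$kver/build M=$src INSTALL_MOD_DIR=updates modules_install"
}

# Example call (the source path is a placeholder):
#   print_kmp_build 4.15.2-1-default /path/to/nvidia/module/source
```

Whether the real trigger script or the eventual fix uses these exact make invocations is not known from the thread. The point is only that a hardcoded target directory is avoided.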
On 25/02/18 04:53, Peter Suetterlin wrote:
This is interesting...
Markus Koßmann wrote:
Nvidia repo is activated, nvidia-gfxG04-kmp-default-390.25_k4.15.2_1-10.1.x86_64 nvidia-computeG04-390.25-10.1.x86_64 nvidia-glG04-390.25-10.1.x86_64 x11-video-nvidiaG04-390.25-10.1.x86_64
I have the same versions here. But I'm already running kernel 4.15.4 (TW 20180221). No problems there.

    [ 519.347560] nvidia: disagrees about version of symbol kmem_cache_alloc_trace
    [ 519.348076] nvidia: Unknown symbol kmem_cache_alloc_trace (err -22)
    [ 519.348621] nvidia: disagrees about version of symbol kmem_cache_alloc
    [ 519.349138] nvidia: Unknown symbol kmem_cache_alloc (err -22)
    [ 519.349726] nvidia: disagrees about version of symbol kmem_cache_free
    [ 519.350334] nvidia: Unknown symbol kmem_cache_free (err -22)
    modprobe: ERROR: could not insert 'nvidia': Invalid argument

However, if I look at /var/log/zypp/history and search for nvidia, I do find

    # depmod: WARNING: //lib/modules/4.15.2-1-default/updates/nvidia-modeset.ko disagrees about version of symbol kmem_cache_alloc_trace
    # depmod: WARNING: //lib/modules/4.15.2-1-default/updates/nvidia-uvm.ko disagrees about version of symbol kmem_cache_alloc_trace
    # depmod: WARNING: //lib/modules/4.15.2-1-default/updates/nvidia-uvm.ko disagrees about version of symbol kmem_cache_alloc
    # depmod: WARNING: //lib/modules/4.15.2-1-default/updates/nvidia-uvm.ko disagrees about version of symbol kmem_cache_free
    # depmod: WARNING: //lib/modules/4.15.2-1-default/updates/nvidia.ko disagrees about version of symbol kmem_cache_alloc_trace
    # depmod: WARNING: //lib/modules/4.15.2-1-default/updates/nvidia.ko disagrees about version of symbol kmem_cache_alloc
    # depmod: WARNING: //lib/modules/4.15.2-1-default/updates/nvidia.ko disagrees about version of symbol kmem_cache_free
I can almost bet my life on it but I suspect that you still have kernel-default 4.15.2 installed -- have a look in Yast2 under the tab "Versions".
Any ideas ?
Not sure what to make from the above - looks as if it had been compiled against the wrong kernel?
BC

--
Always be nice to people on your way up -- you'll see the same people on your way down.
Basil Chupin wrote:
On 25/02/18 04:53, Peter Suetterlin wrote:
# depmod: WARNING: //lib/modules/4.15.2-1-default/updates/nvidia.ko disagrees about version of symbol kmem_cache_free
I can almost bet my life on it but I suspect that you still have kernel-default 4.15.2 installed -- have a look in Yast2 under the tab "Versions".
Yes, of course I have. It is the default for purge-kernels to keep the latest, second latest, and running kernels, IIRC. In my case that is 4.15.2-1-default and 4.15.4-1-default.

(If it were not installed, I couldn't boot it to verify that the modules don't load with that version, could I?)
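Editorial sketch: on openSUSE the retention policy mentioned above is, as far as I know, set by the multiversion.kernels key in /etc/zypp/zypp.conf, with latest,latest-1,running as the usual default. The helper below just prints that value from a given file. The function name is made up, and the key name is an assumption not stated in the thread.

```shell
#!/bin/sh
# Hypothetical helper: print the kernel-retention setting from a
# zypp.conf-style file. Commented-out lines are ignored.
kept_kernels() {
    # $1: config file (e.g. /etc/zypp/zypp.conf)
    sed -n 's/^[[:space:]]*multiversion\.kernels[[:space:]]*=[[:space:]]*//p' "$1"
}

# Example call:
#   kept_kernels /etc/zypp/zypp.conf
```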
On 26/02/18 19:48, Peter Suetterlin wrote:
Basil Chupin wrote:
On 25/02/18 04:53, Peter Suetterlin wrote:
# depmod: WARNING: //lib/modules/4.15.2-1-default/updates/nvidia.ko disagrees about version of symbol kmem_cache_free
I can almost bet my life on it but I suspect that you still have kernel-default 4.15.2 installed -- have a look in Yast2 under the tab "Versions".
Yes, of course I have. It is the default for purge-kernels to keep the latest, second latest, and running kernels, IIRC. In my case that is 4.15.2-1-default and 4.15.4-1-default.
(If it were not installed, I couldn't boot it to verify that the modules don't load with that version, could I?)
You are absolutely correct. I should have paid more attention to the kernel number you mention (above); I saw what I wanted to see thus leading to my post :-(.

I had trouble with an old kernel (which ended with the number 2) when I switched to one of the latest kernels in '.../Kernel:/stable/standard/' with the old kernel hanging around and causing hassles -- until I manually deleted it.

So, sorry for the kerfuffle :-).

BC

--
Always be nice to people on your way up -- you'll see the same people on your way down.
Basil Chupin wrote:
On 26/02/18 19:48, Peter Suetterlin wrote:
Basil Chupin wrote:
On 25/02/18 04:53, Peter Suetterlin wrote:
# depmod: WARNING: //lib/modules/4.15.2-1-default/updates/nvidia.ko disagrees about version of symbol kmem_cache_free
I can almost bet my life on it but I suspect that you still have kernel-default 4.15.2 installed -- have a look in Yast2 under the tab "Versions".
Yes, of course I have. It is the default for purge-kernels to keep the latest, second latest, and running kernels, IIRC. In my case that is 4.15.2-1-default and 4.15.4-1-default.
(If it were not installed, I couldn't boot it to verify that the modules don't load with that version, could I?)
You are absolutely correct. I should have paid more attention to the kernel number you mention (above); I saw what I wanted to see thus leading to my post :-(.
Been there, done that myself, no worries ;^>

The fix for the issue is in the pipeline, BTW.
participants (4)
- Basil Chupin
- Markus Koßmann
- Patrick Shanahan
- Peter Suetterlin