[Bug 1173733] New: Compute capabilities of NVIDIA drivers cannot be initialised by non-root users
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733

Bug ID: 1173733
Summary: Compute capabilities of NVIDIA drivers cannot be initialised by non-root users
Classification: openSUSE
Product: openSUSE Distribution
Version: Leap 15.2
Hardware: Other
OS: Other
Status: NEW
Severity: Normal
Priority: P5 - None
Component: X11 3rd Party Driver
Assignee: gfx-bugs@suse.de
Reporter: marix@marix.org
QA Contact: sndirsch@suse.com
Found By: ---
Blocker: ---

After an upgrade from openSUSE 15.1 to openSUSE 15.2, the compute capabilities of the proprietary NVIDIA drivers can no longer be used by normal users unless root has used them before.

This can be easily verified by running an arbitrary application that invokes `clGetPlatformIDs`. That function will return an error code of -1001. It will also show in Boinc, which reports: No usable GPUs found.

Running the same code as root will succeed, and afterwards non-privileged users can also fully utilise the GPU.

What I was able to find out is that until root has used the compute capabilities of the GPU, the `nvidia-uvm` kernel module is not loaded and the device files `/dev/nvidia-uvm` and `/dev/nvidia-uvm-tools` are missing. Loading the `nvidia-uvm` kernel module on its own is not sufficient.

Running a simple GPU-utilising application shows that the application (or rather one of the driver components it invokes) attempts to run `nvidia-modprobe` before returning the error. According to its help, `nvidia-modprobe` is a setuid program that "is used to create, in a Linux distribution-independent way, NVIDIA Linux device files and load the NVIDIA kernel module, on behalf of NVIDIA Linux driver components". However, this program seems not to be installed setuid in Leap 15.2.

Manually making `nvidia-modprobe` setuid actually fixes the issue, that is, normal users like myself or the boinc user can afterwards successfully utilise the GPU without root having run anything on it.
-- You are receiving this mail because: You are on the CC list for the bug.
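The two symptoms described in the report (a helper that is not setuid, and missing UVM device nodes) can be checked with a few lines of shell. This is only a minimal sketch: the function names `is_setuid` and `missing_uvm_nodes` are made up for illustration, and the location of `nvidia-modprobe` is assumed to be discoverable via `command -v`.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given file has its set-user-ID bit set.
is_setuid() {
    # test -u is true if the file exists and is setuid
    [ -u "$1" ]
}

# Hypothetical helper: print the UVM device nodes that are missing below a
# directory (defaults to /dev). The node names are the ones from the report.
missing_uvm_nodes() {
    devdir=${1:-/dev}
    for node in nvidia-uvm nvidia-uvm-tools; do
        # -c: path exists and is a character device
        [ -c "$devdir/$node" ] || echo "$devdir/$node"
    done
}
```

On an affected system one would expect `is_setuid "$(command -v nvidia-modprobe)"` to fail and `missing_uvm_nodes` to list both device files until root has used the GPU once.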
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733#c1
Stefan Dirsch
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733#c2
--- Comment #2 from Matthias Bach
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733#c3
--- Comment #3 from Matthias Bach
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733#c4
--- Comment #4 from Matthias Bach
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733#c5
--- Comment #5 from Stefan Dirsch
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733#c6
--- Comment #6 from Stefan Dirsch
I attached the files you asked for. Please be aware that only loading nvidia-uvm is not sufficient. The proper creation of the corresponding device files is also essential.
/usr/lib/tmpfiles.d/nvidia-logind-acl-trick-G05.conf makes sure that device nodes are created during boot and that permissions are set when a user logs in. Things are complicated. (boo#1000625)
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733#c7
--- Comment #7 from Matthias Bach
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733#c8
Stefan Dirsch
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733#c9
Stefan Dirsch
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733#c20
--- Comment #20 from Matthias Bach
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733#c21
Stefan Dirsch
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733#c22
Stefan Dirsch
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733#c23
--- Comment #23 from Matthias Bach
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733#c24
--- Comment #24 from Stefan Dirsch
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733#c25
Mister Pend
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733#c26
--- Comment #26 from Mister Pend
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733#c27
--- Comment #27 from Stefan Dirsch
I haven't applied the tarball suggested (the reports above gave me pause), but did notice one discrepancy with your tarball: you seem to be missing this final line from /usr/lib/tmpfiles.d/nvidia-logind-acl-trick-G05.conf:
L /run/udev/static_node-tags/uaccess/nvidia0 - - - - /dev/nvidia0
Thanks! Good catch! This is definitely needed and may explain the black screen Matthias sees now with SDDM.
Also, my test machine (working) is lacking /dev/nvidia-uvm-tools, but CUDA still functions fine:
Looks like whether you need this device depends on which functionality you require. I figured out that it has existed since driver version 364 (March 2016). It's weird that we got no reports earlier, and now with Leap 15.2 several arrive on about the same day even!
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733#c28
--- Comment #28 from Stefan Dirsch
Created attachment 839480 [details] sample files from two machines
sample files from two machines, one working one not
You're using different NVreg_DeviceFileGID on your machines (not using our default of 33). Also on the broken machine /usr/lib/tmpfiles.d/nvidia-logind-acl-trick.conf is missing completely.

# diff -u -r prodmachine\ \(broken\)/ testmachine\ \(works\)/
diff -u -r "prodmachine (broken)/etc/modprobe.d/50-nvidia-default.conf" "testmachine (works)/etc/modprobe.d/50-nvidia-default.conf"
--- "prodmachine (broken)/etc/modprobe.d/50-nvidia-default.conf" 2020-07-08 11:22:44.000000000 +0200
+++ "testmachine (works)/etc/modprobe.d/50-nvidia-default.conf" 2020-07-08 11:23:42.000000000 +0200
@@ -1,2 +1,2 @@
-options nvidia NVreg_DeviceFileUID=0 NVreg_DeviceFileGID=483 NVreg_DeviceFileMode=0660
+options nvidia NVreg_DeviceFileUID=0 NVreg_DeviceFileGID=484 NVreg_DeviceFileMode=0660
 install nvidia PATH=$PATH:/bin:/usr/bin; if /sbin/modprobe --ignore-install nvidia; then if /sbin/modprobe nvidia_uvm; then if [ ! -c /dev/nvidia-uvm ]; then mknod -m 660 /dev/nvidia-uvm c $(cat /proc/devices | while read major device; do if [ "$device" == "nvidia-uvm" ]; then echo $major; break; fi ; done) 0; chown :video /dev/nvidia-uvm; fi; fi; if [ ! -c /dev/nvidiactl ]; then mknod -m 660 /dev/nvidiactl c 195 255; chown :video /dev/nvidiactl; fi; devid=-1; for dev in $(ls -d /sys/bus/pci/devices/*); do vendorid=$(cat $dev/vendor); if [ "$vendorid" == "0x10de" ]; then class=$(cat $dev/class); classid=${class%%00}; if [ "$classid" == "0x0300" -o "$classid" == "0x0302" ]; then devid=$((devid+1)); if [ ! -c /dev/nvidia${devid} ]; then mknod -m 660 /dev/nvidia${devid} c 195 ${devid}; chown :video /dev/nvidia${devid}; fi; fi; fi; done; /sbin/modprobe nvidia_drm; if [ ! -c /dev/nvidia-modeset ]; then mknod -m 660 /dev/nvidia-modeset c 195 254; chown :video /dev/nvidia-modeset; fi; fi
\ No newline at end of file
Only in testmachine (works)/usr/lib/tmpfiles.d: nvidia-logind-acl-trick.conf
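To see whether the GID in the `options nvidia` line matches the local video group, the value can be pulled out of the modprobe file and compared with the group database. This is a rough sketch, not part of the packaging: `nvreg_gid` and `gid_of_group` are illustrative names, and `getent` is assumed to be available.

```shell
#!/bin/sh
# Hypothetical helper: print the NVreg_DeviceFileGID value from a modprobe
# config file such as /etc/modprobe.d/50-nvidia-default.conf.
nvreg_gid() {
    sed -n 's/^options nvidia .*NVreg_DeviceFileGID=\([0-9][0-9]*\).*/\1/p' "$1"
}

# Hypothetical helper: print the numeric GID of a group, e.g. "video".
gid_of_group() {
    getent group "$1" | cut -d: -f3
}
```

Comparing `nvreg_gid /etc/modprobe.d/50-nvidia-default.conf` with `gid_of_group video` on each machine would show whether the configured value (483 and 484 in the diff above) actually corresponds to the video group there.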
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733#c29
--- Comment #29 from Stefan Dirsch
getfacl /dev/nvidia* [...]
@Mister Pend Seems only gdm has access to your nvidia devices? It should look different once a regular user has logged into the session.
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733#c30
--- Comment #30 from Mister Pend
You're using different NVreg_DeviceFileGID on your machines (not using our default of 33). Also on the broken machine /usr/lib/tmpfiles.d/nvidia-logind-acl-trick.conf is missing completely.
I'm not sure how, I'm not doing anything out of the ordinary here - drivers were installed from the NVIDIA repository as per my ansible tasklist (NVIDIA repository added 'https://download.nvidia.com/opensuse/leap/15.2', package 'x11-video-nvidiaG05' installed via zypper). And yes, that file is missing completely on a clean installed machine - I suspect its presence on the working machine is due to driver package variances during its life before I tested the upgrade on it.
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733#c31
--- Comment #31 from Mister Pend
getfacl /dev/nvidia* [...] @Mister Pend Seems only gdm has access to your nvidia devices? It should look different once a regular user has logged into the session.
Correct, I had SSH'ed into the test machine cause I was too lazy to set up VNC or walk across to the other end of my workshop :P once a regular user has logged on, they show as having access as well
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733#c32
--- Comment #32 from Stefan Dirsch
Correct, I had SSH'ed into the test machine cause I was too lazy to set up VNC or walk across to the other end of my workshop :P once a regular user has logged on, they show as having access as well
That's fine then! :-)
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733#c33
--- Comment #33 from Stefan Dirsch
(In reply to Stefan Dirsch from comment #28)
You're using different NVreg_DeviceFileGID on your machines (not using our default of 33). Also on the broken machine /usr/lib/tmpfiles.d/nvidia-logind-acl-trick.conf is missing completely.
I'm not sure how, I'm not doing anything out of the ordinary here - drivers were installed from the NVIDIA repository as per my ansible tasklist (NVIDIA repository added 'https://download.nvidia.com/opensuse/leap/15.2', package 'x11-video-nvidiaG05' installed via zypper). And yes, that file is missing completely on a clean installed machine - I suspect its presence on the working machine is due to driver package variances during its life before I tested the upgrade on it.
Then there is something fishy. You must have manually edited /etc/modprobe.d/50-nvidia-default.conf in order to have a different NVreg_DeviceFileGID there. 33 is the group ID of the video group. That's why we use it here. It probably no longer matters, since permissions are meanwhile set via udev/logind (ACLs). So I guess you can ignore this. /usr/lib/tmpfiles.d/nvidia-logind-acl-trick-G05.conf is created in %post of nvidia-gfxG05-kmp-default and only removed in %postun on uninstall, not during an update.
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733#c34
--- Comment #34 from Stefan Dirsch
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733#c35
Nikolai Nikolaevskii
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733#c36
--- Comment #36 from Stefan Dirsch
So is this issue solved or not?
Honestly I can't say.
My wild shot: GDM starts with root privileges, SDDM starts with ordinary user privileges. Maybe I am wrong.
I'm afraid you are. AFAIK the sddm chooser runs as root, but then gets replaced by the user Xsession running as a regular user, so with autologin enabled it may look like X is not working from the beginning when permissions for /dev/nvidia0 are not available. It's similar with gdm, whose chooser runs as the gdm user and then starts a second Xserver running the Xsession as a regular user.
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733#c37
--- Comment #37 from Matthias Bach
I haven't applied the tarball suggested (the reports above gave me pause), but did notice one discrepancy with your tarball: you seem to be missing this final line from /usr/lib/tmpfiles.d/nvidia-logind-acl-trick-G05.conf:
L /run/udev/static_node-tags/uaccess/nvidia0 - - - - /dev/nvidia0
Thanks! Good catch! This is definitely needed and may explain the black screen Matthias sees now with SDDM.
I can confirm that adding this line to the file resolves the black-screen issue. Though I am still without compute capabilities.

(In reply to Stefan Dirsch from comment #24)
Hmm. You rebooted your machine afterwards, right? You could try running the script code in modprobe file manually for testing. I'm running out of ideas why things are not working for you.
Yes, I rebooted my machine. I have extracted the code from the install section of the modprobe file, and running it via /bin/sh (which on my system means Bash) creates the missing files.

root@eddie:~ # ls -l /dev/nvidia*
crw-rw----+ 1 root video 195, 254 Jul 8 18:26 /dev/nvidia-modeset
crw-rw---- 1 root video 239, 0 Jul 8 18:29 /dev/nvidia-uvm
crw-rw---- 1 root video 239, 1 Jul 8 18:29 /dev/nvidia-uvm-tools
crw-rw----+ 1 root video 195, 0 Jul 8 18:26 /dev/nvidia0
crw-rw----+ 1 root video 195, 255 Jul 8 18:26 /dev/nvidiactl

In consequence, it seems that the modprobe file is for some reason not properly applied by my machine despite being present. Could this be an issue of initialisation order? Could it be that some trigger condition for the modprobe is not being matched? Although it would surprise me if those had changed since 15.1.
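The part of the install line that creates /dev/nvidia-uvm looks up the dynamically assigned major number in /proc/devices. A more readable sketch of that lookup is shown here, assuming `awk` is available; `uvm_major` is an illustrative name and not something the package ships.

```shell
#!/bin/sh
# Hypothetical helper: print the major number registered for nvidia-uvm.
# $1 is a file in /proc/devices format and defaults to /proc/devices itself.
uvm_major() {
    # Lines look like "<major> <name>"; stop at the first match.
    awk '$2 == "nvidia-uvm" { print $1; exit }' "${1:-/proc/devices}"
}
```

With the module loaded, `mknod -m 660 /dev/nvidia-uvm c "$(uvm_major)" 0` followed by `chown :video /dev/nvidia-uvm` mirrors what the install line does. In the listing above the major number is 239 and /dev/nvidia-uvm-tools uses minor 1.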
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733#c38
--- Comment #38 from Matthias Bach
getfacl /dev/nvidia* [...] @Mister Pend Seems only gdm has access to your nvidia devices? It should look different once a regular user has logged into the session.
Just to make this explicit, it's completely valid to utilise the compute capabilities (nvenc, CUDA, OpenCL) without a running X session, i.e. for a dedicated machine-learning host that for noise reasons you don't want to have right next to your desk. It's one of the big advantages of the NVIDIA cards over AMD that you can have a truly headless system with them. The first generation of Tesla cards didn't even have graphics outlets, and in consequence wouldn't work on Windows (which I assume is the only reason why a lot of supercomputers now could run giant display farms).
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733#c39
--- Comment #39 from Cor Blom
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733#c40
--- Comment #40 from Cor Blom
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733#c41
--- Comment #41 from Stefan Dirsch
(In reply to Stefan Dirsch from comment #27)
I haven't applied the tarball suggested (the reports above gave me pause), but did notice one discrepancy with your tarball: you seem to be missing this final line from /usr/lib/tmpfiles.d/nvidia-logind-acl-trick-G05.conf:
L /run/udev/static_node-tags/uaccess/nvidia0 - - - - /dev/nvidia0
Thanks! Good catch! This is definitely needed and may explain the black screen Matthias sees now with SDDM.
I can confirm that adding this line to the file resolves the black-screen issue. Though I am still without compute capabilities.
Thanks. At least this we could fix again.
I have extracted the code from the install section of the modprobe file and running this via /bin/sh (which on my system means Bash) will create the missing files.
root@eddie:~ # ls -l /dev/nvidia*
crw-rw----+ 1 root video 195, 254 Jul 8 18:26 /dev/nvidia-modeset
crw-rw---- 1 root video 239, 0 Jul 8 18:29 /dev/nvidia-uvm
crw-rw---- 1 root video 239, 1 Jul 8 18:29 /dev/nvidia-uvm-tools
crw-rw----+ 1 root video 195, 0 Jul 8 18:26 /dev/nvidia0
crw-rw----+ 1 root video 195, 255 Jul 8 18:26 /dev/nvidiactl
Ah. Thanks for checking this!
In consequence, it seems that the modprobe file is for some reason not properly applied by my machine despite being present. Could this be an issue of initialisation order? Could it be that some trigger condition for the modprobe is not being matched? Although it would surprise me if those had changed since 15.1.
That could be a good catch! The modprobe file is marked as %config, so possibly the old one has been backed up as .rpmsave but is nevertheless preferred over the new one. This would at least explain the behaviour. Could you check this, remove the old modprobe file, and test again?
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733#c42
--- Comment #42 from Stefan Dirsch
(In reply to Stefan Dirsch from comment #29)
getfacl /dev/nvidia* [...] @Mister Pend Seems only gdm has access to your nvidia devices? It should look different once a regular user has logged into the session.
Just to make this explicit, it's completely valid to utilise the compute capabilities (nvenc, CUDA, OpenCL) without a running X session, i.e. for a dedicated machine-learning host that for noise reasons you don't want to have right next to your desk. It's one of the big advantages of the NVIDIA cards over AMD that you can have a truly headless system with them. The first generation of Tesla cards didn't even have graphics outlets, and in consequence wouldn't work on Windows (which I assume is the only reason why a lot of supercomputers now could run giant display farms).
Sure, but we don't cover this use case. We can only set permissions when a user logs into an Xsession.
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733#c43
--- Comment #43 from Stefan Dirsch
I have built rev 71 of X11:Drivers:Video, i.e. before the version update to 450.57, which, if I am correct, contains all corrections mentioned in this report.
Yes, that's perfect! Thanks for doing this! Unfortunately I cannot provide RPMs for testing for legal reasons here.

Could you check if you have two modprobe files like

/etc/modprobe.d/50-nvidia-default.conf
/etc/modprobe.d/50-nvidia-default.conf.rpmsave

See my comment #41. If yes, please remove the older file and try again (rebooting is the easiest).
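Checking for leftover package-manager backups next to the modprobe file is easy to script. The snippet here is only a sketch: `stale_modprobe_files` is a made-up name, and looking for `.rpmnew` in addition to `.rpmsave` is an assumption added for completeness.

```shell
#!/bin/sh
# Hypothetical helper: list nvidia-related .rpmsave/.rpmnew leftovers in a
# modprobe.d directory (defaults to /etc/modprobe.d).
stale_modprobe_files() {
    dir=${1:-/etc/modprobe.d}
    for f in "$dir"/*nvidia*.rpmsave "$dir"/*nvidia*.rpmnew; do
        # An unmatched glob stays literal, so test that the file exists.
        if [ -e "$f" ]; then
            echo "$f"
        fi
    done
}
```

An empty result from `stale_modprobe_files` means there is nothing to remove before rebooting and testing again.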
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733
Stefan Dirsch
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733#c44
--- Comment #44 from Matthias Bach
Could you check if you have two modprobe files like
/etc/modprobe.d/50-nvidia-default.conf
/etc/modprobe.d/50-nvidia-default.conf.rpmsave
See my comment#41. If yes, please remove the older file and try again (reboot is the easiest).
I only have the following:

# ls /etc/modprobe.d/*nvidia*
/etc/modprobe.d/50-nvidia-default.conf
/etc/modprobe.d/nvidia-default.conf

I do remember removing some rpmsave file at some point during my debugging, but my last rounds of tests were definitely already performed without that file present.
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733#c45
Stefan Dirsch
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733#c48
Cor Blom
Could you check if you have two modprobe files like
/etc/modprobe.d/50-nvidia-default.conf
/etc/modprobe.d/50-nvidia-default.conf.rpmsave
See my comment#41. If yes, please remove the older file and try again (reboot is the easiest).
Yes, I had both. Removed the .rpmsave one and rebooted. It made no difference. I have backed up both files and can provide them, if necessary.
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733#c51
--- Comment #51 from Matthias Bach
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733#c52
--- Comment #52 from Matthias Bach
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733#c53
Stefan Dirsch
Still, I don't fully understand _why_ this solves the problem. I always assumed the driver installation to trigger this.
This would have happened if I could provide a real KMP package to you ... on the other side Cor Blom tested a real KMP package.
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733#c54
Matthias Bach
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733#c55
--- Comment #55 from Stefan Dirsch
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733#c56
--- Comment #56 from Cor Blom
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733#c57
--- Comment #57 from Stefan Dirsch
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733#c58
--- Comment #58 from Matthias Bach
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733#c59
--- Comment #59 from Mister Pend
Based on the previous test I have been able to solve the problem for myself. The solution is as obvious as hidden in plain sight. All I had to do was execute the following:
/sbin/mkinitrd
Still, I don't fully understand _why_ this solves the problem. I always assumed the driver installation to trigger this.
Maybe Mister Pend or Cor Blom can confirm this behaviour.
Sorry, I've tried on separate systems and can't confirm this. Executing mkinitrd doesn't error, but doesn't solve the issue. Even after a further reboot.
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733#c60
--- Comment #60 from Matthias Bach
(In reply to Matthias Bach from comment #52)
Based on the previous test I have been able to solve the problem for myself. The solution is as obvious as hidden in plain sight. All I had to do was execute the following:
/sbin/mkinitrd
Still, I don't fully understand _why_ this solves the problem. I always assumed the driver installation to trigger this.
Maybe Mister Pend or Cor Blom can confirm this behaviour.
Sorry, I've tried on separate systems and can't confirm this. Executing mkinitrd doesn't error, but doesn't solve the issue. Even after a further reboot.
After the last driver update, and having worked around bug 1174204, I now also have the issue again. /sbin/mkinitrd didn't help this time, so it must have been something else that fixed it for me back then. Still, I have that weird effect that unloading the modules after boot actually makes the system work as expected afterwards, until the next reboot.
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733#c61
Stefan Dirsch
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733#c62
Stefan Dirsch
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733#c63
--- Comment #63 from Cor Blom
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733#c64
--- Comment #64 from Matthias Bach
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733#c65
--- Comment #65 from Stefan Dirsch
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733#c66
--- Comment #66 from Bernhard Wiedemann
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733#c67
--- Comment #67 from Stefan Dirsch
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733#c68
--- Comment #68 from Stefan Dirsch
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733#c69
--- Comment #69 from Matthias Bach
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733#c70
Stefan Dirsch
2) or make sure they are not added at all
# cat > /etc/dracut.conf.d/50-nvidia-default.conf << EOF
omit_dracutmodules+="nvidia nvidia-drm nvidia-modeset nvidia-uvm"
EOF
Sorry, that was wrong. It should have been

# cat > /etc/dracut.conf.d/50-nvidia-default.conf << EOF
omit_drivers+="nvidia nvidia-drm nvidia-modeset nvidia-uvm"
EOF

Could you please try this as well? I believe this should result in what Cor Blom achieved in comment #56.
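For anyone scripting this, the corrected setting can be written by a small function so that the target path is explicit. This is a sketch only: `write_omit_conf` is an illustrative name, and the setting itself is copied verbatim from the comment above.

```shell
#!/bin/sh
# Hypothetical helper: write the omit_drivers setting to the file given as $1,
# e.g. /etc/dracut.conf.d/50-nvidia-default.conf (needs root for that path).
write_omit_conf() {
    # Quoted 'EOF' keeps the shell from expanding anything in the body.
    cat > "$1" << 'EOF'
omit_drivers+="nvidia nvidia-drm nvidia-modeset nvidia-uvm"
EOF
}
```

After writing the file, the initrd has to be rebuilt (the thread uses /sbin/mkinitrd), and `lsinitrd | grep nvidia | grep -v firmware` can then be used to confirm that no nvidia kernel modules remain in it.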
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733
Stefan Dirsch
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733#c71
steve edmonds
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733#c72
--- Comment #72 from Mister Pend
Check which nvidia modules are in your initrd by running
Looks like no nvidia-uvm on a clean installed system:

-rw-r--r-- 1 root root 1484 Jul 17 22:49 etc/modprobe.d/50-nvidia-default.conf
-rw-r--r-- 1 root root 18 Jul 17 05:43 etc/modprobe.d/nvidia-default.conf
-rw-r--r-- 1 root root 119664 Jul 17 22:49 lib/modules/5.3.18-lp152.19-default/updates/nvidia-drm.ko
-rw-r--r-- 1 root root 27465704 Jul 17 22:49 lib/modules/5.3.18-lp152.19-default/updates/nvidia.ko
-rw-r--r-- 1 root root 1574168 Jul 17 22:49 lib/modules/5.3.18-lp152.19-default/updates/nvidia-modeset.ko
lrwxrwxrwx 1 root root 54 Jul 17 22:49 lib/modules/5.3.18-lp152.26-default/weak-updates/updates/nvidia-drm.ko -> ../../../5.3.18-lp152.19-default/updates/nvidia-drm.ko
lrwxrwxrwx 1 root root 50 Jul 17 22:49 lib/modules/5.3.18-lp152.26-default/weak-updates/updates/nvidia.ko -> ../../../5.3.18-lp152.19-default/updates/nvidia.ko
lrwxrwxrwx 1 root root 58 Jul 17 22:49 lib/modules/5.3.18-lp152.26-default/weak-updates/updates/nvidia-modeset.ko -> ../../../5.3.18-lp152.19-default/updates/nvidia-modeset.ko
2) or make sure they are not added at all
Ran your command in comment #70, followed by a mkinitrd and a reboot. And at this point, blank screen, X doesn't seem to load :(
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733#c73
--- Comment #73 from Mister Pend
Sorry to jump in here, please correct me if wrong: am I correct in assuming that this bug is what makes my applications requiring CUDA (Davinci Resolve, Blender) fail to run after a DUP from Leap 15.1 to 15.2? I have tried the suggested workaround of providing a systemd unit, but that failed to work.
It would seem likely. A workaround (not a solution, but a workaround) that works for me was adding the following to the root crontab:

@reboot nvidia-modprobe -u -c=0

(or running "nvidia-modprobe -u -c=0" as an elevated user once every boot). After this it may start working for you.
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733#c74
--- Comment #74 from steve edmonds
(In reply to steve edmonds from comment #71)
Sorry to jump in here, please correct me if wrong: am I correct in assuming that this bug is what makes my applications requiring CUDA (Davinci Resolve, Blender) fail to run after a DUP from Leap 15.1 to 15.2? I have tried the suggested workaround of providing a systemd unit, but that failed to work.
It would seem likely. A workaround (not a solution, but a workaround) that works for me was adding the following to the root crontab:
@reboot nvidia-modprobe -u -c=0
(or running "nvidia-modprobe -u -c=0" as an elevated user once every boot). After this it may start working for you.
Unfortunately that has not worked for me either. I have not tried modifying the initrd, as I am not quite sure which action to take.

sudo lsinitrd | grep nvidia | grep -v firmware gives only

-rw-r--r-- 1 root root 1483 Jul 20 15:04 etc/modprobe.d/50-nvidia-default.conf
-rw-r--r-- 1 root root 18 Jul 17 07:43 etc/modprobe.d/nvidia-default.conf

whereas on my functioning Leap 15.1 I have

-rw-r--r-- 1 root root 1483 Jul 17 21:10 etc/modprobe.d/50-nvidia-default.conf
-rw-r--r-- 1 root root 18 Jul 17 07:42 etc/modprobe.d/nvidia-default.conf
-rw-r--r-- 1 root root 116160 Jul 17 21:11 lib/modules/4.12.14-lp151.27-default/updates/nvidia-drm.ko
-rw-r--r-- 1 root root 27452392 Jul 17 21:11 lib/modules/4.12.14-lp151.27-default/updates/nvidia.ko
-rw-r--r-- 1 root root 1570992 Jul 17 21:11 lib/modules/4.12.14-lp151.27-default/updates/nvidia-modeset.ko
-rw-r--r-- 1 root root 1934696 Jul 17 21:11 lib/modules/4.12.14-lp151.27-default/updates/nvidia-uvm.ko
lrwxrwxrwx 1 root root 55 Jul 17 21:11 lib/modules/4.12.14-lp151.28.52-default/weak-updates/updates/nvidia-drm.ko -> ../../../4.12.14-lp151.27-default/updates/nvidia-drm.ko
lrwxrwxrwx 1 root root 51 Jul 17 21:11 lib/modules/4.12.14-lp151.28.52-default/weak-updates/updates/nvidia.ko -> ../../../4.12.14-lp151.27-default/updates/nvidia.ko
lrwxrwxrwx 1 root root 59 Jul 17 21:11 lib/modules/4.12.14-lp151.28.52-default/weak-updates/updates/nvidia-modeset.ko -> ../../../4.12.14-lp151.27-default/updates/nvidia-modeset.ko
lrwxrwxrwx 1 root root 55 Jul 17 21:11 lib/modules/4.12.14-lp151.28.52-default/weak-updates/updates/nvidia-uvm.ko -> ../../../4.12.14-lp151.27-default/updates/nvidia-uvm.ko
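Listings like the ones above are easier to compare when reduced to the set of module names. The filter sketched here reads `lsinitrd` output on stdin; `initrd_nvidia_modules` and `initrd_has_uvm` are illustrative names, and `grep -o` is assumed to be supported (GNU and BSD grep both provide it).

```shell
#!/bin/sh
# Hypothetical helper: print the unique nvidia*.ko file names found in an
# lsinitrd listing read from stdin.
initrd_nvidia_modules() {
    # LC_ALL=C gives a locale-independent sort order.
    grep -o 'nvidia[a-z-]*\.ko' | LC_ALL=C sort -u
}

# Hypothetical helper: succeed if the listing on stdin contains nvidia-uvm.ko.
initrd_has_uvm() {
    initrd_nvidia_modules | grep -qx 'nvidia-uvm.ko'
}
```

Running `sudo lsinitrd | initrd_nvidia_modules` on both systems would print nothing for the first listing above and four module names, including nvidia-uvm.ko, for the Leap 15.1 one.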
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733#c75
Matthias Bach
(In reply to Stefan Dirsch from comment #67)
2) or make sure they are not added at all
# cat > /etc/dracut.conf.d/50-nvidia-default.conf << EOF
omit_dracutmodules+="nvidia nvidia-drm nvidia-modeset nvidia-uvm"
EOF
Sorry, that was wrong. It should have been
# cat > /etc/dracut.conf.d/50-nvidia-default.conf << EOF
omit_drivers+="nvidia nvidia-drm nvidia-modeset nvidia-uvm"
EOF
Could you please try this as well? I believe this should result in what Cor Blom achieved in comment #56.
I can confirm that this also fixes the issue for me. I like this even better than force-including them into the initrd as this gives a nice reduction in initrd size from 37 MiB to 14 MiB.
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733#c76
--- Comment #76 from Matthias Bach
(In reply to Mister Pend from comment #73)
(In reply to steve edmonds from comment #71)
Sorry to jump in here; please correct me if I am wrong. Am I correct in assuming this bug is causing my applications requiring CUDA (Davinci Resolve, Blender) to fail to run after a DUP from Leap 15.1 to 15.2? I have tried the systemd unit suggested as a workaround, but that failed to work.
It would seem likely. A workaround (not a solution, but a workaround) that worked for me was adding the following to the root crontab:
@reboot nvidia-modprobe -u -c=0
(or running "nvidia-modprobe -u -c=0" as an elevated user once every boot). After this it may start working for you.
Unfortunately that has not worked for me either. I have not tried modifying the initrd, I am not quite sure which action to take.
sudo lsinitrd | grep nvidia | grep -v firmware
gives only
[…]
That initrd looks correct to my non-expert eye. Some other things that might be interesting:
1) Output of `lsmod | grep nvidia`
2) Output of `ls -lh /dev/nvidia*`
3) Is your user a member of the group `video`?
4) Output of `clinfo | head -n 5`
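The four checks requested above can be collected in one pass. The following is only a sketch: the function names `list_contains` and `nvidia_diag` are made up for illustration, and it assumes that `clinfo` may or may not be installed, so that step is skipped if the command is missing.

```shell
#!/bin/sh
# Sketch: gather the four data points asked for in this comment.
# list_contains and nvidia_diag are hypothetical helper names.

# Succeeds if the word $2 appears in the space-separated list $1.
list_contains() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}

nvidia_diag() {
    echo "== loaded nvidia modules =="
    lsmod 2>/dev/null | grep nvidia || echo "(none)"

    echo "== device nodes =="
    ls -lh /dev/nvidia* 2>/dev/null || echo "(none)"

    echo "== video group membership =="
    if list_contains "$(id -nG)" video; then
        echo "member of video"
    else
        echo "NOT a member of video"
    fi

    echo "== OpenCL platforms =="
    if command -v clinfo >/dev/null 2>&1; then
        clinfo | head -n 5
    else
        echo "clinfo not installed"
    fi
}
```

Calling `nvidia_diag` as the affected (non-root) user, before root has touched the GPU, should give output that can be pasted into the bug directly.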
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733#c77
--- Comment #77 from steve edmonds
That initrd looks correct to my non-expert eye. Some other things that might be interesting:
1) Output of `lsmod | grep nvidia`
2) Output of `ls -lh /dev/nvidia*`
3) Is your user a member of the group `video`?
4) Output of `clinfo | head -n 5`
1. > lsmod | grep nvidia
(nothing)
2. > ls -lh /dev/nvidia*
ls: cannot access '/dev/nvidia*': No such file or directory
3. Yes
4. > clinfo | head -n 5
Number of platforms 0
The same video card and G05 driver (450.57) working in Leap 15.1 gives quite different responses.
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733#c78
--- Comment #78 from Stefan Dirsch
Sorry to jump in here; please correct me if I am wrong. Am I correct in assuming this bug is causing my applications requiring CUDA (Davinci Resolve, Blender) to fail to run after a DUP from Leap 15.1 to 15.2?
Yes, this sounds reasonable! (In reply to steve edmonds from comment #74)
Unfortunately that has not worked for me either. I have not tried modifying the initrd, I am not quite sure which action to take.
sudo lsinitrd | grep nvidia | grep -v firmware
gives only
-rw-r--r-- 1 root root 1483 Jul 20 15:04 etc/modprobe.d/50-nvidia-default.conf
-rw-r--r-- 1 root root 18 Jul 17 07:43 etc/modprobe.d/nvidia-default.conf
That does not need to be an issue. I have the same behaviour on my working system.
Whereas on my functioning Leap 15.1 I have
-rw-r--r-- 1 root root 1483 Jul 17 21:10 etc/modprobe.d/50-nvidia-default.conf
-rw-r--r-- 1 root root 18 Jul 17 07:42 etc/modprobe.d/nvidia-default.conf
-rw-r--r-- 1 root root 116160 Jul 17 21:11 lib/modules/4.12.14-lp151.27-default/updates/nvidia-drm.ko
-rw-r--r-- 1 root root 27452392 Jul 17 21:11 lib/modules/4.12.14-lp151.27-default/updates/nvidia.ko
-rw-r--r-- 1 root root 1570992 Jul 17 21:11 lib/modules/4.12.14-lp151.27-default/updates/nvidia-modeset.ko
-rw-r--r-- 1 root root 1934696 Jul 17 21:11 lib/modules/4.12.14-lp151.27-default/updates/nvidia-uvm.ko
lrwxrwxrwx 1 root root 55 Jul 17 21:11 lib/modules/4.12.14-lp151.28.52-default/weak-updates/updates/nvidia-drm.ko -> ../../../4.12.14-lp151.27-default/updates/nvidia-drm.ko
lrwxrwxrwx 1 root root 51 Jul 17 21:11 lib/modules/4.12.14-lp151.28.52-default/weak-updates/updates/nvidia.ko -> ../../../4.12.14-lp151.27-default/updates/nvidia.ko
lrwxrwxrwx 1 root root 59 Jul 17 21:11 lib/modules/4.12.14-lp151.28.52-default/weak-updates/updates/nvidia-modeset.ko -> ../../../4.12.14-lp151.27-default/updates/nvidia-modeset.ko
lrwxrwxrwx 1 root root 55 Jul 17 21:11 lib/modules/4.12.14-lp151.28.52-default/weak-updates/updates/nvidia-uvm.ko -> ../../../4.12.14-lp151.27-default/updates/nvidia-uvm.ko
Yes, this looks consistent. (In reply to steve edmonds from comment #77)
1. > lsmod | grep nvidia
(nothing)
2. > ls -lh /dev/nvidia*
ls: cannot access '/dev/nvidia*': No such file or directory
3. Yes
4. > clinfo | head -n 5
Number of platforms 0
The same video card and G05 driver (450.57) working in Leap 15.1 gives quite different responses.
OMG. I'm wondering whether you really have the nvidia-gfxG05-kmp-default package installed. If yes, what does `modprobe nvidia` trigger? Also check the dmesg output, and please make sure you have the latest G05 packages installed from our Leap 15.2 repos.
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733#c79
Stefan Dirsch
(In reply to Stefan Dirsch from comment #67)
Check which nvidia modules are in your initrd by running
Looks like no nvidia-uvm on a clean installed system:
-rw-r--r-- 1 root root 1484 Jul 17 22:49 etc/modprobe.d/50-nvidia-default.conf
-rw-r--r-- 1 root root 18 Jul 17 05:43 etc/modprobe.d/nvidia-default.conf
-rw-r--r-- 1 root root 119664 Jul 17 22:49 lib/modules/5.3.18-lp152.19-default/updates/nvidia-drm.ko
-rw-r--r-- 1 root root 27465704 Jul 17 22:49 lib/modules/5.3.18-lp152.19-default/updates/nvidia.ko
-rw-r--r-- 1 root root 1574168 Jul 17 22:49 lib/modules/5.3.18-lp152.19-default/updates/nvidia-modeset.ko
lrwxrwxrwx 1 root root 54 Jul 17 22:49 lib/modules/5.3.18-lp152.26-default/weak-updates/updates/nvidia-drm.ko -> ../../../5.3.18-lp152.19-default/updates/nvidia-drm.ko
lrwxrwxrwx 1 root root 50 Jul 17 22:49 lib/modules/5.3.18-lp152.26-default/weak-updates/updates/nvidia.ko -> ../../../5.3.18-lp152.19-default/updates/nvidia.ko
lrwxrwxrwx 1 root root 58 Jul 17 22:49 lib/modules/5.3.18-lp152.26-default/weak-updates/updates/nvidia-modeset.ko -> ../../../5.3.18-lp152.19-default/updates/nvidia-modeset.ko
Yes, exactly the same issue as Matthias Bach had.
2) or make sure they are not added at all
Ran your command in comment #70, followed by a mkinitrd and a reboot. And at this point, blank screen, X doesn't seem to load :(
My fault, the advice was wrong. Please try this instead, as already corrected in my comment #70:
# cat > /etc/dracut.conf.d/50-nvidia-default.conf << EOF
omit_drivers+="nvidia nvidia-drm nvidia-modeset nvidia-uvm"
EOF
# mkinitrd
According to Matthias Bach this should work.
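The corrected workaround can be sketched as a small helper that writes the dracut snippet into a directory passed as an argument, so it can be tried in a scratch directory before touching /etc. The function name `write_omit_conf` is hypothetical; the file name and the `omit_drivers` line are taken from the comment above.

```shell
#!/bin/sh
# Sketch: keep the NVIDIA kernel modules out of the initrd.
# write_omit_conf is a hypothetical helper; $1 is the dracut config directory.
write_omit_conf() {
    conf_dir=$1
    cat > "$conf_dir/50-nvidia-default.conf" << 'EOF'
omit_drivers+="nvidia nvidia-drm nvidia-modeset nvidia-uvm"
EOF
}

# Intended use, as root (not executed here):
#   write_omit_conf /etc/dracut.conf.d
#   mkinitrd
#   lsinitrd | grep nvidia | grep -v firmware   # should list no .ko files
```

After the initrd has been rebuilt and the machine rebooted, the `lsinitrd` filter used earlier in this thread should show only the modprobe.d configuration files.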
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733#c80
--- Comment #80 from steve edmonds
OMG. I'm wondering whether you really have nvidia-gfxG05-kmp-default package installed. If yes, what does
modprobe nvidia
trigger? Check also dmesg output. Also please make sure you have the latest G05 packages installed from our Leap 15.2 repos.
sudo modprobe nvidia
(done via ssh, as I am not in front of the 15.2 PC, but with a screen-locked X11 session running on it)
modprobe: ERROR: could not find module by name='nvidia'
modprobe: ERROR: could not insert 'nvidia': Unknown symbol in module, or unknown parameter (see dmesg)
From Yast:
i │ nvidia-computeG05         │ NVIDIA driver for computing with GPGPU
i │ nvidia-gfxG05-kmp-default │ NVIDIA graphics driver kernel module for GeForce 600 series and newer
i │ nvidia-glG05              │ NVIDIA OpenGL libraries for OpenGL acceleration
i │ x11-video-nvidiaG05       │ NVIDIA graphics driver for GeForce 600 series and newer
Also, my CAD software balks if I do not have the above drivers loaded. The only reference I found in dmesg is
[ 4.599729] audit: type=1400 audit(1595298800.894:4): apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe" pid=482 comm="apparmor_parser"
[ 4.599732] audit: type=1400 audit(1595298800.894:5): apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe//kmod" pid=482 comm="apparmor_parser"
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733#c81
--- Comment #81 from Stefan Dirsch
(In reply to Stefan Dirsch from comment #70)
(In reply to Stefan Dirsch from comment #67)
2) or make sure they are not added at all
# cat > /etc/dracut.conf.d/50-nvidia-default.conf << EOF
omit_dracutmodules+="nvidia nvidia-drm nvidia-modeset nvidia-uvm"
EOF
Sorry, that was wrong. It should have been
# cat > /etc/dracut.conf.d/50-nvidia-default.conf << EOF
omit_drivers+="nvidia nvidia-drm nvidia-modeset nvidia-uvm"
EOF
Could you please try this as well? I believe this should result in what Cor Blom achieved in comment #56.
I can confirm that this also fixes the issue for me. I like this even better than force-including them into the initrd as this gives a nice reduction in initrd size from 37 MiB to 14 MiB.
Thanks for the feedback! I've implemented this now in our packages. Anyone building the packages themselves from obs://X11:Drivers:Video can test this right now. Hello @Cor Blom, glad to know that at least one person is making use of this service! :-)
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733#c82
Stefan Dirsch
sudo modprobe nvidia
(done via ssh, as I am not in front of the 15.2 PC, but with a screen-locked X11 session running on it)
modprobe: ERROR: could not find module by name='nvidia'
modprobe: ERROR: could not insert 'nvidia': Unknown symbol in module, or unknown parameter (see dmesg)
From Yast:
i │ nvidia-computeG05         │ NVIDIA driver for computing with GPGPU
i │ nvidia-gfxG05-kmp-default │ NVIDIA graphics driver kernel module for GeForce 600 series and newer
i │ nvidia-glG05              │ NVIDIA OpenGL libraries for OpenGL acceleration
i │ x11-video-nvidiaG05       │ NVIDIA graphics driver for GeForce 600 series and newer
Also, my CAD software balks if I do not have the above drivers loaded.
The only reference I found in dmesg is
[ 4.599729] audit: type=1400 audit(1595298800.894:4): apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe" pid=482 comm="apparmor_parser"
[ 4.599732] audit: type=1400 audit(1595298800.894:5): apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe//kmod" pid=482 comm="apparmor_parser"
Seems you're failing on a completely different level. No nvidia modules are installed at all. I suggest reinstalling the nvidia-gfxG05-kmp-default package. If this doesn't help, open a separate bug. It's really unrelated to this one ...
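Whether any NVIDIA kernel module files are present for a given kernel can be checked directly before reinstalling. This is a rough sketch only: `list_nvidia_kos` is a hypothetical helper name, and it assumes the modules live somewhere below /lib/modules/<kernel version>, as in the initrd listings quoted earlier in this thread.

```shell
#!/bin/sh
# Sketch: list NVIDIA kernel module files (and weak-updates symlinks)
# below a module tree. Defaults to the tree of the running kernel.
# list_nvidia_kos is a hypothetical helper name.
list_nvidia_kos() {
    find "${1:-/lib/modules/$(uname -r)}" -name 'nvidia*.ko*' 2>/dev/null
}

# Intended use (not executed here):
#   list_nvidia_kos                        # empty output: nothing installed
#   rpm -q nvidia-gfxG05-kmp-default       # is the KMP package installed?
#   modinfo -n nvidia                      # path modprobe would load, if any
```

Empty output from `list_nvidia_kos` on the running kernel would match the "could not find module by name" error reported above.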
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733#c83
--- Comment #83 from Cor Blom
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733#c84
--- Comment #84 from Stefan Dirsch
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733#c85
--- Comment #85 from steve edmonds
Seems you're failing on a completely different level. No nvidia modules are installed at all. I suggest reinstalling the nvidia-gfxG05-kmp-default package. If this doesn't help, open a separate bug. It's really unrelated to this one ...
Do you think it could be related to this release note:
4.1 Secure Boot: Third-Party Drivers Need to Be Properly Signed
openSUSE Leap 15.2 now enables a kernel module signature check for third-party drivers (CONFIG_MODULE_SIG=y). This is an important security measure to avoid untrusted code running in the kernel. This may prevent third-party kernel modules from being loaded if UEFI Secure Boot is enabled. Importantly, this affects NVIDIA......
Although I can't see why my CAD complains of no OpenGL if I remove the installed NVIDIA packages.
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733#c86
--- Comment #86 from Matthias Bach
(In reply to Stefan Dirsch from comment #82)
Seems you're failing on a completely different level. No nvidia modules are installed at all. I suggest reinstalling the nvidia-gfxG05-kmp-default package. If this doesn't help, open a separate bug. It's really unrelated to this one ...
Do you think it could be related to this release note; 4.1 Secure Boot: Third-Party Drivers Need to Be Properly Signed
Yes, if you are using Secure Boot your issues are most likely caused by this. But that too should be solved with the latest package version. You'll have to manually import the package signing key once, though. See bug 1173682 for details.
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733#c87
--- Comment #87 from steve edmonds
(In reply to steve edmonds from comment #85)
(In reply to Stefan Dirsch from comment #82)
Seems you're failing on a completely different level. No nvidia modules are installed at all. I suggest reinstalling the nvidia-gfxG05-kmp-default package. If this doesn't help, open a separate bug. It's really unrelated to this one ...
Do you think it could be related to this release note; 4.1 Secure Boot: Third-Party Drivers Need to Be Properly Signed
Yes, if you are using Secure Boot your issues are most likely caused by this. But that too should be solved with the latest package version. You'll have to manually import the package signing key once, though. See bug 1173682 for details.
Maybe Secure Boot is not my issue: I am booting with GRUB2 without EFI, and "Enable Trusted Boot Support" is off.
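Whether Secure Boot can be involved at all is quick to check from a running system. The sketch below assumes the usual Linux convention that /sys/firmware/efi only exists when the machine was booted via EFI, and that `mokutil` may or may not be installed; the function names `efi_booted` and `secure_boot_report` are hypothetical.

```shell
#!/bin/sh
# Sketch: report whether Secure Boot could be blocking module loading.
# efi_booted and secure_boot_report are hypothetical helper names.

# Succeeds if the system was booted via EFI. $1 overrides the sysfs root
# (defaults to /sys) so the check can be tried against a scratch directory.
efi_booted() {
    [ -d "${1:-/sys}/firmware/efi" ]
}

secure_boot_report() {
    if ! efi_booted; then
        echo "legacy BIOS boot: Secure Boot cannot be active"
        return 0
    fi
    if command -v mokutil >/dev/null 2>&1; then
        mokutil --sb-state
    else
        echo "EFI boot detected, but mokutil is not installed"
    fi
}
```

On a machine booted through GRUB2 without EFI, as described above, `secure_boot_report` would take the first branch, which points away from the module signature check as the cause.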
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733#c88
--- Comment #88 from steve edmonds
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733#c89
--- Comment #89 from steve edmonds
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733#c90
--- Comment #90 from Bernhard Wiedemann
Interestingly though, when running mkinitrd I get the output below. Is the first part relating to the kernel 4.4.76-1 supposed to be here, or is it a hangover from Leap 15.1?
Creating initrd: /boot/initrd-4.4.76-1-default
This is expected. We always leave some older kernels around so if the latest one is bad, you can boot into an older kernel.
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733#c91
--- Comment #91 from steve edmonds
ls /etc/dracut.conf.d
99-debug.conf
I do have
/etc/modprobe.d/50-nvidia-default.conf
/etc/modprobe.d/nvidia-default.conf
I suspect the
@reboot nvidia-modprobe -u -c=0
solution may be my option
lsmod | grep nvidia
nvidia_drm 61440 16
nvidia_modeset 1187840 34 nvidia_drm
nvidia 19726336 1586 nvidia_modeset
drm_kms_helper 229376 1 nvidia_drm
drm 544768 19 drm_kms_helper,nvidia_drm
clinfo | head -n 5
Number of platforms 0
steve@linux-qw83:~> lsmod | grep nvidia
nvidia_drm 61440 17
nvidia_modeset 1187840 36 nvidia_drm
nvidia 19726336 1671 nvidia_modeset
drm_kms_helper 229376 1 nvidia_drm
drm 544768 20 drm_kms_helper,nvidia_drm
sudo clinfo | head -n 5
[sudo] password for root:
Number of platforms 1
Platform Name NVIDIA CUDA
Platform Vendor NVIDIA Corporation
Platform Version OpenCL 1.2 CUDA 11.0.210
Platform Profile FULL_PROFILE
lsmod | grep nvidia
nvidia_uvm 1110016 0
nvidia_drm 61440 17
nvidia_modeset 1187840 36 nvidia_drm
nvidia 19726336 1672 nvidia_uvm,nvidia_modeset
drm_kms_helper 229376 1 nvidia_drm
drm 544768 20 drm_kms_helper,nvidia_drm
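The sequence above shows nvidia_uvm appearing only after root has run clinfo. A minimal sketch for checking that state without running an OpenCL program follows; the helper names `uvm_loaded` and `uvm_status` are hypothetical, and the sketch assumes the /proc/modules layout in which each line starts with the module name followed by a space. The two device paths are the ones named in the original bug report.

```shell
#!/bin/sh
# Sketch: is the nvidia_uvm module loaded and are its device nodes present?
# uvm_loaded and uvm_status are hypothetical helper names.

# Succeeds if nvidia_uvm is listed in the given module list
# (defaults to /proc/modules, the file lsmod reads).
uvm_loaded() {
    grep -q '^nvidia_uvm ' "${1:-/proc/modules}"
}

uvm_status() {
    if uvm_loaded; then
        echo "nvidia_uvm loaded"
    else
        echo "nvidia_uvm NOT loaded"
    fi
    for node in /dev/nvidia-uvm /dev/nvidia-uvm-tools; do
        if [ -c "$node" ]; then
            echo "$node present"
        else
            echo "$node missing"
        fi
    done
}
```

Running `uvm_status` as a normal user right after boot, and again after the `nvidia-modprobe -u -c=0` workaround, should show whether the workaround took effect.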
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733#c92
--- Comment #92 from Stefan Dirsch
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733#c93
--- Comment #93 from steve edmonds
Removed /etc/dracut.conf.d/50-nvidia-default.conf could be explained by having suse-prime package installed (Optimus systems with Intel/NVIDIA GPU combo), but then you would have a /etc/dracut.conf.d/90-nvidia-dracut-G05.conf or /usr/lib/dracut/dracut.conf.d/90-nvidia-dracut-G05.conf installed with the same content and you still shouldn't have any nvidia modules in your initrd.
I don't have an explanation for this right now.
The files were there under Leap 15.1. I am assuming nvidia-computeG05 provides /etc/modprobe.d/50-nvidia-default.conf, but I have no idea what process leads to the presence of /etc/dracut.conf.d/50-nvidia-default.conf
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733#c94
--- Comment #94 from Stefan Dirsch
The files were there under Leap 15.1, I am assuming nvidia-computeG05 provides /etc/modprobe.d/50-nvidia-default.conf
No, that's part of the nvidia-gfxG05-kmp-default package.
but I have no idea what process leads to the presence of /etc/dracut.conf.d/50-nvidia-default.conf
That was the temporary workaround. This won't be needed with the next driver package update. With that, nvidia-gfxG05-kmp-default will include /etc/dracut.conf.d/60-nvidia-default.conf with the same content, i.e. nvidia modules will no longer be added to the initrd.
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733
http://bugzilla.opensuse.org/show_bug.cgi?id=1173733#c95
--- Comment #95 from steve edmonds