Problem compiling kernel module for nvidia after dup of Tumbleweed

Dear Community,

I'm running OpenSUSE Tumbleweed and I have added the repository for Cuda from the Nvidia Website <https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&Distribution=OpenSUSE&target_version=15&target_type=rpm_network>. I also added the "official" nvidia-repository for OpenSUSE <https://download.nvidia.com/opensuse/tumbleweed>, which however is for Leap, not for Tumbleweed.

I installed Cuda, which pulled the nvidia-drivers "nvidia-computeG05", "nvidia-gfxG05-kmp-default" and "nvidia-glG05" of the 530 family. Everything was working fine. I recently did a zypper dup. After rebooting, the window manager doesn't start and I'm sent to a console login.

The X log tells me, among other things:

[ 65.945] (EE) NVIDIA: Failed to initialize the NVIDIA kernel module. Please see the
[ 65.945] (EE) NVIDIA: system's kernel log for additional error messages and
[ 65.945] (EE) NVIDIA: consult the NVIDIA README for details.
[ 65.945] (EE) No devices detected.
[ 65.945] (EE) Fatal server error:
[ 65.945] (EE) no screens found(EE)
[ 65.945] (EE)

dmesg and journalctl have no trace of anything related to nvidia.

I tried to look for why the nvidia-driver (graphics card Nvidia GTX 750 Ti) doesn't load, I found that there is no "nvidia*.ko" in the directory /lib/modules/6.3.1-1-default/updates/. I tried to reinstall the drivers in order to see if it would recompile the kernel module against the new kernel, but I got the first error message (please let me know if attaching text files to this list is not appropriate and if you would like it pasted online somewhere instead).

I thought that may be because the cuda-repository for Leap is no longer at the same kernel version as Tumbleweed and disabled it. Trying to reinstall the above mentioned nvidia-drivers found only the family 470 and offered me, among other options, to downgrade "x11-video-nvidiaG05-530.30.02-0.x86_64" to "x11-video-nvidiaG05-470.182.03-53.1.x86_64", which I accepted. During the recompilation of the kernel module I got the errors in Error2.txt, which are very similar to the errors I got the first time.

I'm out of clue as to what to try next and would appreciate any help.

Thank you
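
The manual check described above (looking for "nvidia*.ko" under a kernel's module directory) can be done for every installed kernel at once. The following is only a minimal sketch and not something posted in the thread: the function name find_nvidia_modules is hypothetical, and the pattern "nvidia*.ko*" assumes that modules may also be stored compressed (for example with a .zst or .xz suffix).

```python
from pathlib import Path


def find_nvidia_modules(modules_root="/lib/modules"):
    """Map each kernel directory under modules_root to the nvidia*.ko* files in it.

    Hypothetical helper for illustration; paths are relative to the kernel directory.
    """
    found = {}
    root = Path(modules_root)
    if not root.is_dir():
        return found
    for kernel_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        # rglob searches the whole tree, so modules under updates/ are included.
        hits = sorted(
            str(f.relative_to(kernel_dir)) for f in kernel_dir.rglob("nvidia*.ko*")
        )
        found[kernel_dir.name] = hits
    return found


if __name__ == "__main__":
    for kernel, modules in find_nvidia_modules().items():
        print(kernel, "->", modules if modules else "no nvidia modules")
```

A kernel that is listed with no modules is one for which the driver build did not succeed or was never attempted.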

Hello Andrei,

thank you for the reply. The bug discussion you linked says to boot in an old kernel for now, but in the initial grub screen there is only the option 6.3 and 6.2, which is also affected.

How do I tell grub to offer me kernel 4.12, which is also installed in my system?

On 08.05.23 07:41, Andrei Borzenkov wrote:
> On 08.05.2023 08:34, Andrea Croci wrote:
>> Dear Community,
>> I'm running OpenSUSE Tumbleweed and I have added the repository for Cuda from the Nvidia Website <https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&Distribution=OpenSUSE&target_version=15&target_type=rpm_network>.
>> I also added the "official" nvidia-repository for OpenSUSE <https://download.nvidia.com/opensuse/tumbleweed>, which however is for Leap, not for Tumbleweed.
>> I installed Cuda, which pulled the nvidia-drivers "nvidia-computeG05", "nvidia-gfxG05-kmp-default" and "nvidia-glG05" of the 530 family. Everything was working fine. I recently did a zypper dup. After rebooting, the window manager doesn't start and I'm sent to a console login.
>> The X log tells me, among other things:
>> [ 65.945] (EE) NVIDIA: Failed to initialize the NVIDIA kernel module. Please see the
>> [ 65.945] (EE) NVIDIA: system's kernel log for additional error messages and
>> [ 65.945] (EE) NVIDIA: consult the NVIDIA README for details.
>> [ 65.945] (EE) No devices detected.
>> [ 65.945] (EE) Fatal server error:
>> [ 65.945] (EE) no screens found(EE)
>> [ 65.945] (EE)
>> dmesg and journalctl have no trace of anything related to nvidia.
>> I tried to look for why the nvidia-driver (graphics card Nvidia GTX 750 Ti) doesn't load, I found that there is no "nvidia*.ko" in the directory /lib/modules/6.3.1-1-default/updates/. I tried to reinstall the drivers in order to see if it would recompile the kernel module against the new kernel, but I got the first error message (please let me know if attaching text files to this list is not appropriate and if you would like it pasted online somewhere instead).
>> I thought that may be because the cuda-repository for Leap is no longer at the same kernel version as Tumbleweed and disabled it. Trying to reinstall the above mentioned nvidia-drivers found only the family 470 and offered me, among other options, to downgrade "x11-video-nvidiaG05-530.30.02-0.x86_64" to "x11-video-nvidiaG05-470.182.03-53.1.x86_64", which I accepted. During the recompilation of the kernel module I got the errors in Error2.txt, which are very similar to the errors I got the first time.
>> I'm out of clue as to what to try next and would appreciate any help.
>> Thank you

On 08.05.2023 09:04, Andrea Croci wrote:
> Hello Andrei,
> thank you for the reply. The bug discussion you linked says to boot in an old kernel for now, but in the initial grub screen there is only the option 6.3 and 6.2, which is also affected.

NVIDIA driver fails to compile every second kernel version. If you cannot handle it, you really should not be using NVIDIA with Tumbleweed.

Current NVIDIA drivers from SUSE should compile under kernel 6.2. At least changelog suggests it. Show log of failed compile for kernel 6.2.

> How do I tell grub to offer me kernel 4.12, which is also installed in my system?

Sorry? *Tumbleweed* with kernel *4.12*? Where does it come from and how long ago did you update the last time?

Anyway, as long as kernel is in /boot you do not need to tell grub anything because it automatically adds menu entries for kernels in /boot. If it does not happen you need to provide more information - how this kernel was installed, where it is located, what is the content of grub.cfg etc.

On 08.05.23 08:42, Andrei Borzenkov wrote:
> On 08.05.2023 09:04, Andrea Croci wrote:
>> Hello Andrei,
>> thank you for the reply. The bug discussion you linked says to boot in an old kernel for now, but in the initial grub screen there is only the option 6.3 and 6.2, which is also affected.
> NVIDIA driver fails to compile every second kernel version. If you cannot handle it, you really should not be using NVIDIA with Tumbleweed.

I should probably do away with Tumbleweed altogether. I gave it a try because I liked the idea of a rolling release after a release upgrade of Ubuntu failed for the n-th time (I have never been able to make one work, always having to reinstall the whole thing from scratch and got fed up with that), but Tumbleweed does seem to give me more problems than joy.

> Current NVIDIA drivers from SUSE should compile under kernel 6.2. At least changelog suggests it. Show log of failed compile for kernel 6.2.

I just tried it again, booting into 6.2.12 and reinstalling the drivers. I got the same errors as for 6.3.1 (see attachment, if no one tells me I shouldn't attach anything here). The interesting thing is that I'm now logged into 6.2.12 and it still enter the directory for 6.3.1 (maybe normal?) where the whole trouble is. Then at the very end it enters the directory for 6.2.12 and only cleans things there. Anyway, the problem persists under 6.2, with it not being able to initialize the kernel module.

>> How do I tell grub to offer me kernel 4.12, which is also installed in my system?
> Sorry? *Tumbleweed* with kernel *4.12*? Where does it come from and how long ago did you update the last time?

I first installed it on the 30th of March and do the updates every 2 or 3 days since.

> Anyway, as long as kernel is in /boot you do not need to tell grub anything because it automatically adds menu entries for kernels in /boot. If it does not happen you need to provide more information - how this kernel was installed, where it is located, what is the content of grub.cfg etc.

You are right, of course. I assumed I had 4.12 because there is a directory "4.12.14-lp150.12.82-default" under /usr/lib/modules/, which is, by the way, the only one that has any "nvidia*.ko" in it. But there is no trace of a 4.12 under /boot. How that directory ended up there, I have no clue.
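
The mismatch described here (a module directory with no kernel image under /boot) can be spotted mechanically. The following is a rough sketch, not something posted in the thread. It assumes kernel images in /boot are named "vmlinuz-<version>" and that the module directory name equals that version string; both function names are hypothetical.

```python
from pathlib import Path


def kernels_in_boot(boot_dir="/boot"):
    """Return the versions of all vmlinuz-<version> images found in boot_dir."""
    boot = Path(boot_dir)
    if not boot.is_dir():
        return []
    prefix = "vmlinuz-"
    return sorted(p.name[len(prefix):] for p in boot.glob(prefix + "*"))


def module_dirs_without_kernel(boot_dir="/boot", modules_root="/usr/lib/modules"):
    """Return module directories that have no matching kernel image in boot_dir."""
    bootable = set(kernels_in_boot(boot_dir))
    root = Path(modules_root)
    if not root.is_dir():
        return []
    return sorted(
        p.name for p in root.iterdir() if p.is_dir() and p.name not in bootable
    )


if __name__ == "__main__":
    print("Kernels in /boot:", kernels_in_boot())
    print("Module directories without a kernel:", module_dirs_without_kernel())
```

A directory reported by the second function, such as the 4.12 one mentioned above, holds modules only. There is no kernel for grub to list.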

On 08.05.2023 11:08, Andrea Croci wrote:
> On 08.05.23 08:42, Andrei Borzenkov wrote:
>> On 08.05.2023 09:04, Andrea Croci wrote:
>>> Hello Andrei,
>>> thank you for the reply. The bug discussion you linked says to boot in an old kernel for now, but in the initial grub screen there is only the option 6.3 and 6.2, which is also affected.
>> NVIDIA driver fails to compile every second kernel version. If you cannot handle it, you really should not be using NVIDIA with Tumbleweed.
> I should probably do away with Tumbleweed altogether. I gave it a try because I liked the idea of a rolling release after a release upgrade of Ubuntu failed for the n-th time (I have never been able to make one work, always having to reinstall the whole thing from scratch and got fed up with that), but Tumbleweed does seem to give me more problems than joy.
>> Current NVIDIA drivers from SUSE should compile under kernel 6.2. At least changelog suggests it. Show log of failed compile for kernel 6.2.
> I just tried it again, booting into 6.2.12 and reinstalling the drivers. I got the same errors as for 6.3.1 (see attachment, if no one tells me I shouldn't attach anything here). The interesting thing is that I'm now logged into 6.2.12 and it still enter the directory for 6.3.1 (maybe normal?) where the whole trouble is. Then at the very end it enters the directory for 6.2.12 and only cleans things there. Anyway, the problem persists under 6.2, with it not being able to initialize the kernel module.

You are still compiling against kernel 6.3

make[1]: Entering directory '/usr/src/linux-6.3.1-1'

NVIDIA driver is compiled during installation on the user system; you need to have kernel-default-devel package (and its dependencies) matching your running binary kernel. Unless you had installed it earlier, only the latest version is available from standard Tumbleweed repositories. You may look at https://download.opensuse.org/history/ if previous versions are still present there.

>>> How do I tell grub to offer me kernel 4.12, which is also installed in my system?
>> Sorry? *Tumbleweed* with kernel *4.12*? Where does it come from and how long ago did you update the last time?
> I first installed it on the 30th of March and do the updates every 2 or 3 days since.

There is no way you got openSUSE kernel 4.12 with Tumbleweed install on the 30th of March.

>> Anyway, as long as kernel is in /boot you do not need to tell grub anything because it automatically adds menu entries for kernels in /boot. If it does not happen you need to provide more information - how this kernel was installed, where it is located, what is the content of grub.cfg etc.
> You are right, of course. I assumed I had 4.12 because there is a directory "4.12.14-lp150.12.82-default" under /usr/lib/modules/, which is, by the way, the only one that has any "nvidia*.ko" in it. But there is no trace of a 4.12 under /boot. How that directory ended up there, I have no clue.

It sounds like you attempted to install NVIDIA modules for Leap 15.0.
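
The point about kernel-default-devel having to match the running kernel can be checked before reinstalling the driver. The sketch below is not from the thread and rests on an assumption: that an installed, matching devel package shows up as a resolvable "build" entry under /lib/modules/<running release>/. That is the usual convention for out-of-tree module builds, but it should be verified on the system in question. The function names are hypothetical.

```python
import platform
from pathlib import Path


def build_tree_for(release=None, modules_root="/lib/modules"):
    """Return the conventional path of the kernel build tree for a release.

    With release=None the release of the running kernel is used.
    """
    release = release or platform.release()
    return Path(modules_root) / release / "build"


def can_build_for(release=None, modules_root="/lib/modules"):
    """True if the build tree exists; a dangling symlink counts as missing."""
    return build_tree_for(release, modules_root).exists()


if __name__ == "__main__":
    running = platform.release()
    if can_build_for():
        print("Build tree found for running kernel", running)
    else:
        print("No build tree for running kernel", running)
        print("A module build would have to target some other kernel version.")
```

If the check fails for the running kernel (6.2.12 in this thread) but succeeds for 6.3.1, that would be consistent with the build log entering the 6.3.1 source directory.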
participants (2)
- Andrea Croci
- Andrei Borzenkov