troubleshooting nvidia
Hello list,

I am struggling to work out why the NVIDIA proprietary driver no longer works (after some earlier kernel updates). The G06 driver, including CUDA, worked fine some time ago. I reverted to nouveau back then, but now I need the proprietary driver working again with 3D acceleration and so on. I followed the instructions in the SDB.

GFX card:

03:00.0 VGA compatible controller: NVIDIA Corporation GP102 [GeForce GTX 1080 Ti] (rev a1)
        Subsystem: eVga.com. Corp. Device 6593
        Kernel modules: nouveau

When starting the display manager service, after a long wait I get a very reduced GUI with only one of three monitors working. Neither the nvidia module nor nouveau gets loaded (at least the blacklisting of nouveau seems to work). But no Xorg logs are created in /var/log/, and those were my main resource for troubleshooting in the past.

There is nothing about the nvidia modules in dmesg or lsmod.

How can I continue troubleshooting?

Finally:

# rpm -qa | grep -i nvidia | sort
kernel-firmware-nvidia-20230724-150500.3.9.1.noarch
kernel-firmware-nvidia-gspx-G06-545.29.02-150400.9.15.1.x86_64
kernel-firmware-nvidia-gspx-G06-550.90.07-150500.11.29.1.x86_64
libva-nvidia-driver-0.0.12-lp155.2.1.x86_64
nvidia-compute-G06-32bit-550.90.07-lp155.23.1.x86_64
nvidia-compute-G06-550.90.07-lp155.23.1.x86_64
nvidia-driver-G06-kmp-default-550.90.07_k5.14.21_150500.53-lp155.23.1.x86_64
nvidia-gl-G06-32bit-550.90.07-lp155.23.1.x86_64
nvidia-gl-G06-550.90.07-lp155.23.1.x86_64
nvidia-video-G06-32bit-550.90.07-lp155.23.1.x86_64
nvidia-video-G06-550.90.07-lp155.23.1.x86_64
openSUSE-repos-Leap-NVIDIA-20230804.41e41a9-lp155.2.6.1.x86_64
pcp-pmda-nvidia-gpu-5.2.5-150400.5.6.3.x86_64

Thank you and regards,
Paul Neuwirth
On Thursday, 20 June 2024, 10:44:14 CEST, Paul Neuwirth via openSUSE Users wrote:
[...]
Have you enabled Secure Boot in UEFI?
If yes, have you added the new NVIDIA key to the MOK?
Stephan
On Thursday 2024-06-20 11:11, Stephan Hemeier via openSUSE Users wrote:
[...]
No, no UEFI boot at all.

I just found out that the upgrade somehow enabled Xwayland, thus no Xorg.0.log. I uncommented the line in /etc/gdm/custom.conf to disable it.

Xorg.0.log now reads:

[ 65778.364] X.Org X Server 1.21.1.12
             X Protocol Version 11, Revision 0
[ 65778.364] Current Operating System: Linux lambda 6.8.8-lp155.9-default #1 SMP PREEMPT_DYNAMIC TKG Mon Apr 29 05:24:46 UTC 2024 (5cd3298 x86_64
[ 65778.364] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-6.8.8-lp155.9-default root=UUID=406791e3-4103-4da5-8235-aa35ed5be74c nosplash resume=/dev/disk/by-id/scsi-36c81f660cfd00400232bc7c603abcd39-part4 splash=nosplash debug showopts elevator=cfq vga=792 mitigations=auto
[ 65778.364]
[ 65778.364] Current version of pixman: 0.40.0
[ 65778.364] Before reporting problems, check http://wiki.x.org to make sure that you have the latest version.
[ 65778.364] Markers: (--) probed, (**) from config file, (==) default setting, (++) from command line, (!!) notice, (II) informational, (WW) warning, (EE) error, (NI) not implemented, (??) unknown.
[ 65778.364] (==) Log file: "/var/log/Xorg.0.log", Time: Thu Jun 20 10:51:38 2024
[ 65778.365] (==) Using config file: "/etc/X11/xorg.conf"
[ 65778.365] (==) Using config directory: "/etc/X11/xorg.conf.d"
[ 65778.365] (==) Using system config directory "/usr/share/X11/xorg.conf.d"
[ 65778.365] (==) ServerLayout "Layout0"
[ 65778.365] (**) |-->Screen "Screen0" (0)
[ 65778.365] (**) | |-->Monitor "Monitor0"
[ 65778.366] (**) | |-->Device "Device0"
[ 65778.366] (**) |-->Input Device "Keyboard0"
[ 65778.366] (**) |-->Input Device "Mouse0"
[ 65778.366] (**) Allowing byte-swapped clients
[ 65778.366] (==) Automatically adding devices
[ 65778.366] (==) Automatically enabling devices
[ 65778.366] (==) Automatically adding GPU devices
[ 65778.366] (==) Automatically binding GPU devices
[ 65778.366] (==) Max clients allowed: 512, resource mask: 0xfffff
[ 65778.366] (==) FontPath set to: /usr/share/fonts/misc:unscaled, /usr/share/fonts/Type1/, /usr/share/fonts/100dpi:unscaled, /usr/share/fonts/75dpi:unscaled, /usr/share/fonts/ghostscript/, /usr/share/fonts/cyrillic:unscaled, /usr/share/fonts/misc/sgi:unscaled, /usr/share/fonts/truetype/, built-ins
[ 65778.366] (==) ModulePath set to "/usr/lib64/xorg/modules"
[ 65778.366] (WW) Ignoring unrecognized extension "XFree86-DGA"
[ 65778.366] (WW) Hotplugging is on, devices using drivers 'kbd', 'mouse' or 'vmmouse' will be disabled.
[ 65778.366] (WW) Disabling Keyboard0
[ 65778.366] (WW) Disabling Mouse0
[ 65778.366] (II) Loader magic: 0x557590578e00
[ 65778.366] (II) Module ABI versions:
[ 65778.366]     X.Org ANSI C Emulation: 0.4
[ 65778.366]     X.Org Video Driver: 25.2
[ 65778.366]     X.Org XInput driver : 24.4
[ 65778.366]     X.Org Server Extension : 10.0
[ 65778.378] (--) using VT number 3
[ 65778.378] (II) systemd-logind: logind integration requires -keeptty and -keeptty was not provided, disabling logind integration
[ 65778.379] (II) xfree86: Adding drm device (/dev/dri/card0)
[ 65778.379] (II) Platform probe for /sys/devices/platform/simple-framebuffer.0/drm/card0
[ 65778.404] (--) PCI:*(3@0:0:0) 10de:1b06:3842:6593 rev 161, Mem @ 0x9e000000/16777216, 0x80000000/268435456, 0x90000000/33554432, I/O @ 0x00007000/128, BIOS @ 0x????????/131072
[ 65778.404] (II) LoadModule: "glx"
[ 65778.405] (II) Loading /usr/lib64/xorg/modules/extensions/libglx.so
[ 65778.406] (II) Module glx: vendor="X.Org Foundation"
[ 65778.406]     compiled for 1.21.1.12, module version = 1.0.0
[ 65778.406]     ABI class: X.Org Server Extension, version 10.0
[ 65778.406] (II) LoadModule: "nvidia"
[ 65778.406] (II) Loading /usr/lib64/xorg/modules/drivers/nvidia_drv.so
[ 65778.407] (II) Module nvidia: vendor="NVIDIA Corporation"
[ 65778.407]     compiled for 1.6.99.901, module version = 1.0.0
[ 65778.407]     Module class: X.Org Video Driver
[ 65778.407] (II) NVIDIA dlloader X Driver 550.90.07 Fri May 31 09:34:34 UTC 2024
[ 65778.407] (II) NVIDIA Unified Driver for all Supported NVIDIA GPUs
[ 65778.410] (II) Loading sub module "fb"
[ 65778.410] (II) LoadModule: "fb"
[ 65778.410] (II) Module "fb" already built-in
[ 65778.410] (II) Loading sub module "wfb"
[ 65778.410] (II) LoadModule: "wfb"
[ 65778.410] (II) Loading /usr/lib64/xorg/modules/libwfb.so
[ 65778.410] (II) Module wfb: vendor="X.Org Foundation"
[ 65778.410]     compiled for 1.21.1.12, module version = 1.0.0
[ 65778.410]     ABI class: X.Org ANSI C Emulation, version 0.4
[ 65778.503] (EE) NVIDIA: Failed to initialize the NVIDIA kernel module. Please see the
[ 65778.503] (EE) NVIDIA: system's kernel log for additional error messages and
[ 65778.503] (EE) NVIDIA: consult the NVIDIA README for details.
[ 65778.503] (EE) No devices detected.
[ 65778.503] (EE) Fatal server error:
[ 65778.503] (EE) no screens found(EE)
[ 65778.503] (EE) Please consult the The X.Org Foundation support at http://wiki.x.org for help.
[ 65778.503] (EE) Please also check the log file at "/var/log/Xorg.0.log" for additional information.
[ 65778.503] (EE)
[ 65778.515] (EE) Server terminated with error (1). Closing log file.

But nvidia-xconfig delivers a plausible configuration, and there is still nothing in dmesg.

Paul
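A note on reading a log like the one above: almost all of it is informational, and the useful part is the handful of (EE) and (WW) lines near the end. The following is only a minimal sketch for filtering them out. The function name `xorg_problems` is made up for illustration, and the log path in the usage comment is the one the log itself reports.

```shell
# Print only the warning (WW) and error (EE) lines of an Xorg log read
# from stdin. The "Markers:" legend line matches as well, since it
# mentions both tags.
# xorg_problems is a hypothetical helper name, not an existing tool.
xorg_problems() {
    grep -E '\((EE|WW)\)'
}

# Usage:
#   xorg_problems < /var/log/Xorg.0.log
```

In this thread the filtered output would end with the "Failed to initialize the NVIDIA kernel module" lines, which point at the kernel side (module missing or not loadable) rather than at the X configuration.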
On Thursday 2024-06-20 11:23, Paul Neuwirth via openSUSE Users wrote:
[...]
Some hints, maybe. I reinstalled the nvidia drivers (`zypper rm [any nvidia packages]`, then `zypper inr --repo NVIDIA:repo-non-free`) and noticed some suspicious-looking lines while building the modules and during dracut:

# depmod: ERROR: fstatat(5, nvidia-drm.ko): No such file or directory
# depmod: ERROR: fstatat(5, nvidia-modeset.ko): No such file or directory
# depmod: ERROR: fstatat(5, nvidia-uvm.ko): No such file or directory
# depmod: ERROR: fstatat(5, nvidia.ko): No such file or directory
# depmod: ERROR: fstatat(5, nvidia-drm.ko): No such file or directory
# depmod: ERROR: fstatat(5, nvidia-modeset.ko): No such file or directory
# depmod: ERROR: fstatat(5, nvidia-uvm.ko): No such file or directory
# depmod: ERROR: fstatat(5, nvidia.ko): No such file or directory
# depmod: WARNING: could not open modules.order at /lib/modules/5.14.21-150500.53-default: No such file or directory
# depmod: WARNING: could not open modules.builtin at /lib/modules/5.14.21-150500.53-default: No such file or directory
# dracut-install: Failed to find module 'nvidia_drm'
# dracut: FAILED: /usr/lib/dracut/dracut-install -D /var/tmp/dracut.T96nUl/initramfs -N i2o_scsi --kerneldir /lib/modules/6.8.8-lp155.9-default/ -m nvidia nvidia_drm nvidia-modeset nvidia-uvm

I also noticed that `modprobe nvidia` doesn't find the module (I don't know whether that's normal behaviour):

modprobe: ERROR: could not find module by name='nvidia'
modprobe: ERROR: could not insert 'nvidia': Unknown symbol in module, or unknown parameter (see dmesg)

By the way, nouveau worked fine while the nvidia drivers were not present.

Thank you,
Paul
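In the output above, depmod complains about /lib/modules/5.14.21-150500.53-default while dracut builds the initramfs for /lib/modules/6.8.8-lp155.9-default. One way to see whether any NVIDIA module exists for a given kernel is to list the module files under that kernel's directory. This is a rough sketch, with the helper name `nvidia_modules_in` invented for illustration; it assumes the modules are named `nvidia*.ko`, optionally with a compression suffix.

```shell
# List NVIDIA kernel module files below one kernel's module directory.
# $1: module root, e.g. "/lib/modules/$(uname -r)"
# nvidia_modules_in is a hypothetical helper name, not an existing tool.
nvidia_modules_in() {
    # A missing directory simply yields no output.
    find "$1" -name 'nvidia*.ko*' 2>/dev/null | sort
}

# Usage: compare the running kernel with the one depmod mentioned.
#   nvidia_modules_in "/lib/modules/$(uname -r)"
#   nvidia_modules_in /lib/modules/5.14.21-150500.53-default
```

If the first call prints nothing, `modprobe nvidia` cannot find a module for the running kernel, which would be consistent with the modprobe error shown above.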
Hello,

Sorry, by mistake, I sent a direct mail.

In the Message;

Subject    : Re: troubleshooting nvidia
Message-ID : <alpine.LSU.2.21.2406201147400.3177@alpha.swabian.net>
Date & Time: Thu, 20 Jun 2024 11:52:44 +0200 (CEST)

[PN] == Paul Neuwirth via openSUSE Users <users@lists.opensuse.org> has written:

[...]

I think this might be of interest to you.

https://forums.opensuse.org/t/nvidia-driver-550-90-broken-plus-no-boot-optio...

Best Regards.

---
Masaru Nomiya  mail-to: m.nomiya+suse @ gmail.com
On Thursday 2024-06-20 12:57, Masaru Nomiya wrote:
[...]
Indeed, there are similarities. Meanwhile I discovered that the nvidia modules are not built for the current kernel 6.8.8-lp155.9-default. The build keeps using some files of an older kernel 5.14.21...., and the RPMs indeed require that kernel-dev package. I tried to uninstall these old kernel packages (and ran `zypper -n purge-kernels`), and zypper suggested a downgrade of the nvidia drivers to another repository (obs://build.opensuse.org/home:regataos). The build still seems to fail, though, as dracut still cannot find nvidia_drm.

Paul Neuwirth
On Thursday, 20 June 2024, 13:25:38 CEST, Paul Neuwirth via openSUSE Users wrote:
[...]
The nvidia packages for Leap are built against the first kernel of each Leap release; for Leap 15.5 that is kernel 5.14.21-150500.53.2. So they work with all newer kernels from the SLE Update repo.

When you install kernel 6.8.8 from somewhere else, the nvidia driver will no longer work, because the driver package for Leap 15.5 is not built against that kernel.

For kernel:stable:backports you need to uninstall the nvidia packages and install the driver by hand.

Stephan
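The kernel a KMP package was built against is visible in its name: in `nvidia-driver-G06-kmp-default-550.90.07_k5.14.21_150500.53-lp155.23.1.x86_64` from the package list at the start of the thread, the part after `_k` is the kernel version, with `_` standing in for `-` because RPM version strings cannot contain hyphens. The sketch below pulls that version out and compares its upstream part with a kernel release string. Both function names are invented for illustration, and comparing only the part before the first hyphen is a simplifying assumption, not an official compatibility rule.

```shell
# Extract the kernel version a KMP package was built against.
# $1: full package name as printed by "rpm -qa"
# kmp_kernel_version is a hypothetical helper name.
kmp_kernel_version() {
    printf '%s\n' "$1" |
        sed -n 's/.*_k\([0-9][0-9.]*\)_\([0-9][0-9.]*\)-.*/\1-\2/p'
}

# Succeed if two kernel version strings share the same upstream version,
# i.e. the part before the first hyphen (a heuristic, see above).
# same_base_kernel is a hypothetical helper name.
same_base_kernel() {
    [ "${1%%-*}" = "${2%%-*}" ]
}

# Usage with the values from this thread:
#   built=$(kmp_kernel_version nvidia-driver-G06-kmp-default-550.90.07_k5.14.21_150500.53-lp155.23.1.x86_64)
#   same_base_kernel "$(uname -r)" "$built" || echo "KMP was built for $built"
```

With the running kernel 6.8.8-lp155.9-default reported in the Xorg log, the comparison against 5.14.21-150500.53 fails, which matches the explanation above.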
On Thursday 2024-06-20 13:39, Stephan Hemeier via openSUSE Users wrote:
Date: Thu, 20 Jun 2024 13:39:19 From: Stephan Hemeier via openSUSE Users <users@lists.opensuse.org> Reply-To: Stephan Hemeier <Sauerlandlinux@gmx.de> To: users@lists.opensuse.org Subject: Re: troubleshooting nvidia
Am Donnerstag, 20. Juni 2024, 13:25:38 CEST schrieb Paul Neuwirth via openSUSE Users:
On Thursday 2024-06-20 12:57, Masaru Nomiya wrote:
Date: Thu, 20 Jun 2024 12:57:32 From: Masaru Nomiya <nomiya@lake.dti.ne.jp> Reply-To: m.nomiya+suse@gmail.com To: users@lists.opensuse.org Subject: Re: troubleshooting nvidia
Hello,
Sorry, by mistake, I sent a direct mail.
In the Message;
Subject : Re: troubleshooting nvidia Message-ID : <alpine.LSU.2.21.2406201147400.3177@alpha.swabian.net> Date & Time: Thu, 20 Jun 2024 11:52:44 +0200 (CEST)
[PN] == Paul Neuwirth via openSUSE Users <users@lists.opensuse.org> has written:
PN> On Thursday 2024-06-20 11:23, Paul Neuwirth via openSUSE Users wrote:
PN> > Date: Thu, 20 Jun 2024 11:23:18 PN> > From: Paul Neuwirth via openSUSE Users <users@lists.opensuse.org> PN> > Reply-To: Paul Neuwirth <mail@paul-neuwirth.nl> PN> > To: Stephan Hemeier <Sauerlandlinux@gmx.de> PN> > Cc: users@lists.opensuse.org PN> > Subject: Re: troubleshooting nvidia [...] PN> some hints maybe. Reinstalled the nvidia drivers (`zypper rm [any nvidia PN> packages]`, then `zypper inr --repo NVIDIA:repo-non-free`) and noticed some PN> suspicious looking lines while building the modules and during dracut: PN> # depmod: ERROR: fstatat(5, nvidia-drm.ko): No such file or directory PN> # depmod: ERROR: fstatat(5, nvidia-modeset.ko): No such file or directory PN> # depmod: ERROR: fstatat(5, nvidia-uvm.ko): No such file or directory PN> # depmod: ERROR: fstatat(5, nvidia.ko): No such file or directory PN> # depmod: ERROR: fstatat(5, nvidia-drm.ko): No such file or directory PN> # depmod: ERROR: fstatat(5, nvidia-modeset.ko): No such file or directory PN> # depmod: ERROR: fstatat(5, nvidia-uvm.ko): No such file or directory PN> # depmod: ERROR: fstatat(5, nvidia.ko): No such file or directory PN> # depmod: WARNING: could not open modules.order at /lib/modules/5.14.21-150500.53-default: No such file or directory PN> # depmod: WARNING: could not open modules.builtin at /lib/modules/5.14.21-150500.53-default: No such file or directory PN> # dracut-install: Failed to find module 'nvidia_drm' PN> # dracut: FAILED: /usr/lib/dracut/dracut-install -D /var/tmp/dracut.T96nUl/initramfs -N i2o_scsi --kerneldir /lib/modules/6.8.8-lp155.9-default/ -m nvidia nvidia_drm nvidia-modeset nvidia-uvm
PN> also noticed, that `modprobe nvidia` doesn't find the module (don't know if
PN> that's normal behaviour?):
PN> modprobe: ERROR: could not find module by name='nvidia'
PN> modprobe: ERROR: could not insert 'nvidia': Unknown symbol in module, or unknown parameter (see dmesg)
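The depmod and modprobe errors quoted above suggest that no NVIDIA module files exist for the kernel being booted. A minimal sketch of how one could check this, assuming the modules live somewhere below /lib/modules/<release> (the function names are made up for illustration and are not part of any package):

```shell
#!/bin/sh
# List NVIDIA kernel module files installed for a given kernel release.
# Usage: list_nvidia_modules [release] [modules-root]
# Defaults: the running kernel and /lib/modules.
list_nvidia_modules() {
    release="${1:-$(uname -r)}"
    root="${2:-/lib/modules}"
    # Matches nvidia.ko, nvidia-drm.ko, ... including compressed variants.
    find "$root/$release" -name 'nvidia*.ko*' 2>/dev/null
}

# Succeed only if at least one NVIDIA module exists for that kernel.
has_nvidia_modules() {
    [ -n "$(list_nvidia_modules "$@")" ]
}

# Example on a live system:
#   has_nvidia_modules || echo "no nvidia modules for $(uname -r)"
```

If the check fails for the running kernel but succeeds for another release, the driver was built against a different kernel than the one in use.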
I think this might be of interest to you.
https://forums.opensuse.org/t/nvidia-driver-550-90-broken-plus-no-boot-optio...
Best Regards.
Indeed, there are similarities. Meanwhile I discovered that the nvidia modules are not built for the current kernel 6.8.8-lp155.9-default; the build keeps using some files of an older kernel 5.14.21.... and the rpms do indeed require that kernel-dev package. I tried to uninstall these old kernel packages (and ran zypper -n purge-kernels), and zypper suggested downgrading the nvidia drivers to another repository (obs://build.opensuse.org/home:regataos), but the build still seems to fail, as dracut still cannot find nvidia_drm.
Paul Neuwirth
The nvidia packages for Leap are built against the first kernel of Leap; for Leap 15.5 that is kernel 5.14.21-150500.53.2. So they work with all newer kernels from the SLE Update Repo.
When you install kernel 6.8.8 from somewhere else, the nvidia driver will no longer work, because the driver for Leap 15.5 does not use that kernel when building.
For kernel:stable:backports you need to uninstall the nvidia packages and install the driver by hand.
Stephan
ok this was misleading then. but (current) kernel is still from official repo.
nevertheless tried to reinstall the nvidia drivers from that regataos repo, and build fails in another way:
# make[4]: *** [/usr/src/linux-6.8.8-lp155.9/scripts/Makefile.build:244: /usr/src/kernel-modules/nvidia-550.78-default/nvidia/nv-dmabuf.o] Error 1
# cc: error: unrecognized command line option ‘-mharden-sls=all’; did you mean ‘-mhard-float’?
# make[4]: *** [/usr/src/linux-6.8.8-lp155.9/scripts/Makefile.build:244: /usr/src/kernel-modules/nvidia-550.78-default/nvidia/nv-nano-timer.o] Error 1
maybe I should try a distribution upgrade to 15.6 - would have to do that anyway sometime later.
Paul
On 2024-06-20 14:12, Paul Neuwirth via openSUSE Users wrote:
ok this was misleading then. but (current) kernel is still from official repo. nevertheless tried to reinstall the nvidia drivers from that regataos repo, and build fails in another way: # make[4]: *** [/usr/src/linux-6.8.8-lp155.9/scripts/Makefile.build:244:
You are not using the official kernel for Leap 15.5. Leap 15.5 is using kernels from the 5.14 series, not 6.8.

Telcontar:~ # uname -a
Linux Telcontar 5.14.21-150500.55.65-default #1 SMP PREEMPT_DYNAMIC Thu May 23 04:57:11 UTC 2024 (a46829d) x86_64 x86_64 x86_64 GNU/Linux
Telcontar:~ # cat /etc/os-release
NAME="openSUSE Leap"
VERSION="15.5"
ID="opensuse-leap"
ID_LIKE="suse opensuse"
VERSION_ID="15.5"
PRETTY_NAME="openSUSE Leap 15.5"
ANSI_COLOR="0;32"
CPE_NAME="cpe:/o:opensuse:leap:15.5"
BUG_REPORT_URL="https://bugs.opensuse.org"
HOME_URL="https://www.opensuse.org/"
DOCUMENTATION_URL="https://en.opensuse.org/Portal:Leap"
LOGO="distributor-logo-Leap"
Telcontar:~ #

--
Cheers / Saludos,
Carlos E. R.
(from 15.5 x86_64 at Telcontar)
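The comparison made above (a running 6.8 kernel versus the 5.14 series shipped with Leap 15.5) can be sketched as a small helper. This is only an illustration; the function names are hypothetical, and the expected series has to be supplied by the caller:

```shell
#!/bin/sh
# Print the major.minor series of a kernel release string,
# e.g. "5.14.21-150500.55.65-default" becomes "5.14".
kernel_series() {
    printf '%s\n' "$1" | cut -d. -f1,2
}

# Succeed only if the release belongs to the expected series.
# Usage: is_expected_series <release> <series>
is_expected_series() {
    [ "$(kernel_series "$1")" = "$2" ]
}

# Example on a live Leap 15.5 system:
#   is_expected_series "$(uname -r)" 5.14 || echo "non-distribution kernel running"
```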
On Thursday 2024-06-20 14:36, Carlos E. R. wrote:
You are not using the official kernel for Leap 15.5.
Leap 15.5 is using kernels from the 5.14 series, not 6.8
oh. how could I miss this. I thought that since the zypper dup I was using the official kernel. But I did the zypper dup with all repositories (at least those available for the new distribution) enabled....

I tried to clean some things up (installing glibc from the official repo) and the install of libzstd just failed (first thing yast sw_single tried to reinstall). I now have a completely broken system. nearly every command fails with "/lib64/libc.so.6: version `GLIBC_2.34' not found (required by /usr/lib64/libzstd.so.1)".

i'll never do such a zypper dup with all repos again...

what do you suggest? boot to a snapshot? use the installer using netboot?

Paul Neuwirth
On 2024-06-20 15:02, Paul Neuwirth via openSUSE Users wrote:
what do you suggest? boot to a snapshot? use the installer using netboot?
The big problem is not using dup, but what repos you have enabled.

You can revert the situation with a zypper dup, but first disabling the repos that caused the problem.

Boot to a snapshot, well, yes, if you can find a correct snapshot.

Boot the 15.5 DVD and forcing an upgrade to it, might work, too.

--
Cheers / Saludos,
Carlos E. R.
(from 15.5 x86_64 at Telcontar)
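The suggestion above (disable the offending repos, then run zypper dup again) could look roughly like the following. This is a sketch that only prints the commands, so they can be reviewed first; the function name is made up, and the repo aliases passed to it are whatever `zypper lr` shows on the affected system:

```shell
#!/bin/sh
# Print (do not run) the zypper commands that disable the given
# repositories and then re-sync the system against the remaining ones.
# Usage: plan_repo_cleanup <alias>...
plan_repo_cleanup() {
    # Review the repository list first.
    echo "zypper lr --details"
    for alias in "$@"; do
        echo "zypper mr --disable $alias"
    done
    # With only the distribution repos left enabled, dup moves packages
    # back to the versions those repos provide.
    echo "zypper dup"
}

# Example with hypothetical aliases:
#   plan_repo_cleanup kernel-backports home-regataos
```

Printing instead of executing is a deliberate choice here: on a system in this state it seems safer to read each command before running it as root.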
On Thursday 2024-06-20 15:19, Carlos E. R. wrote:
The big problem is not using dup, but what repos you have enabled.
You can revert the situation with a zypper dup, but first disabling the repos that caused the problem.
Boot to a snapshot, well, yes, if you can find a correct snapshot.
Boot the 15.5 DVD and forcing an upgrade to it, might work, too.
can I fix the system from a readonly snapshot boot?

otherwise:
I tend to try to install 15.6 - but cannot use a physical medium.
I last installed 15.1 using that method for complete remote headless installations.
downloaded the initrd and linux files from download.opensuse.org for 15.6 and edited /srv/tftp/tftpboot/pxelinux.cfg/default on my main server
the relevant part looks like this:
---
serial 0 115200
default install64
prompt 1
timeout 30

#Install x86_64 Linux
label install64
menu label Install x86_64 Linux
kernel setup_linux
append initrd=setup_initrd nosplash textonly showopts install=https://download.opensuse.org/distribution/leap/15.6/repo/oss/ loghost=172.18.0.1 proxy=http://172.18.0.1:3128 startshell=1 WaitReboot=1 nomodeset
---
I currently want to avoid any mistakes. This is currently the only working PC with a monitor for me (and gladly I had a ssh session on another server when this dilemma occurred.)
If boot fails I need to use ssh on my cell phone...
Thank you for cross checking
On 2024-06-20 15:32, Paul Neuwirth via openSUSE Users wrote:
can I fix the system from a readonly snapshot boot?
I have no experience with snapshots, sorry. I think that all snapshots are readonly till you decide to stabilize on one and make that one final. But I am not sure.
otherwise:
I tend to try to Install 15.6 - but cannot use a physical medium.
I last installed 15.1 using that method for complete remote headless installations.
downloaded the initrd and linux files from download.opensuse.org for 15.6 and edited /srv/tftp/tftpboot/pxelinux.cfg/default on my main server
the relevant part looks like this:
---
serial 0 115200
default install64
prompt 1
timeout 30

#Install x86_64 Linux
label install64
menu label Install x86_64 Linux
kernel setup_linux
append initrd=setup_initrd nosplash textonly showopts install=https://download.opensuse.org/distribution/leap/15.6/repo/oss/ loghost=172.18.0.1 proxy=http://172.18.0.1:3128 startshell=1 WaitReboot=1 nomodeset
---
I currently want to avoid any mistakes. This is currently the only working PC with a monitor for me (and gladly I had a ssh session on another server when this dilemma occured.)
If boot fails I need to use ssh on my cell phone... Thank you for cross checking
Sorry, I don't know. -- Cheers / Saludos, Carlos E. R. (from 15.5 x86_64 at Telcontar)
On Thursday 2024-06-20 20:58, Carlos E. R. wrote:
can I fix the system from a readonly snapshot boot?
I have no experience with snapshots, sorry.
thank you again. had a night of sleep and it sprang to mind. the fix was as easy as booting to the snapshot and running "snapper rollback". great thing, btrfs and snapshots...

I'll clean up my repositories and do a zypper dup to Leap 15.6 and try again with nvidia.

Regards, Paul
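For readers in the same situation, the fix described above can be sketched as follows. It assumes the system has already been booted into a known-good read-only snapshot from the boot menu and that the commands run as root. The function name and the RUN variable are made up for this example; setting RUN=echo prints the commands instead of executing them:

```shell
#!/bin/sh
# Roll the system back to the snapshot it is currently booted from.
# Set RUN=echo for a dry run that only prints the commands.
rollback_current_snapshot() {
    # Show the available snapshots for reference.
    ${RUN:-} snapper list || return 1
    # Without arguments, snapper makes a writable copy of the snapshot
    # the system is booted from and sets it as the new default.
    ${RUN:-} snapper rollback || return 1
    echo "Reboot to start from the rolled-back system."
}

# Dry run:
#   RUN=echo rollback_current_snapshot
```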
On 2024-06-21 05:09, Paul Neuwirth via openSUSE Users wrote:
thank you again. had a night of sleep and it sprung into my mind. fix was as easy as boot to the snapshot and "snapper rollback". great thing btrfs and snapshots...
I'll clean up my repositories and do a zypper dup to Leap 15.6 and try again with nvidia.
That should work. -- Cheers / Saludos, Carlos E. R. (from 15.5 x86_64 at Telcontar)
back on course and working now. I upgraded to Leap 15.6 with only the essential repos activated. A 6.8 kernel was still installed afterwards, so I removed it, leaving only the 6.4 kernel from the repos. The Nvidia drivers were built and are loaded at boot, but I got blank (black) screens and couldn't switch between consoles when the nvidia drivers were loaded in the initrd. I finally got it to work by booting with "nomodeset": I then had 3 working screens (nvidia modules loaded) and started the display-manager.service. The initialization, login and reinitialization took ages (more than 5 minutes). But it works.

Thank you all again for your help. Most importantly, I learned not to do a distribution upgrade with all repos activated. I had thought that "unmaintained" packages, or those installed with a higher version than in the repos, would be uninstalled or downgraded respectively - but that doesn't seem to happen :-/

Regards
Paul
Hello,

In the Message;

Subject     : Re: troubleshooting nvidia
Message-ID  : <13327312.EHj4zDvE9Q@linux64>
Date & Time : Thu, 20 Jun 2024 13:39:19 +0200

[SH] == Stephan Hemeier via openSUSE Users <users@lists.opensuse.org> has written:

SH> Am Donnerstag, 20. Juni 2024, 13:25:38 CEST schrieb Paul Neuwirth via openSUSE Users:
[...]
SH> The nvidia packages for Leap are build against the first kernel
SH> of Leap, for Leap 15.5 that is kernel 5.14.21-150500.53.2.
SH> So they are working with all new kernels from the SLE Update Repo.
SH> When you install kernel 6.8.8 from somewhere, the nvidia driver
SH> will not work anymore because the driver for Leap 15.5 is not
SH> using that kernel when building.
SH> For kernel:stable:backports you need to deinstall the nvidia
SH> Packages and install the driver by Hand.

This clears up my misunderstanding. Thanks, Stephan.

To build the proprietary driver:

1. For kernel 6.8.8, install kernel-devel, kernel-source, kernel-default, kernel-default-devel, kernel-syms, kernel-macros.
   ※ It does not matter if you have other versions of the kernel installed.
2. shutdown -r now, then boot up with kernel 6.8.8.
3. At the login screen, press Ctrl+Alt+F1.
4. Log in as root.
5. Type # init 3
6. Press Ctrl+Alt+F2.
7. Log in as root again.
8. Set gcc to gcc13, then
9. # sh ./NVIDIA-Linux-x86_64-550.90.07.run -aq

That's all.

Best Regards & Good Night.

---
┏━━┓彡 Masaru Nomiya mail-to: m.nomiya+suse @ gmail.com
┃\/彡
┗━━┛
"The question of who holds the platform and whether the person or organisation holding it is trustworthy has serious and profound implications in these volatile times. Once trust is broken, it is extremely difficult to restore. It is necessary to diversify in advance."
-- Financial Times --
On 20.06.2024 14:39, Stephan Hemeier via openSUSE Users wrote:
The nvidia packages for Leap are built against the first kernel of Leap, for Leap 15.5 that is kernel 5.14.21-150500.53.2.
Except there is no such requirement in the package itself.

bor@bor-Latitude-E5450:~/tmp$ rpm -q --requires -p nvidia-driver-G06-kmp-default-550.78_k6.8.7_1-22.1.x86_64.rpm | grep kernel-default
warning: nvidia-driver-G06-kmp-default-550.78_k6.8.7_1-22.1.x86_64.rpm: Header V4 DSA/SHA512 Signature, key ID c66b6eae: NOKEY
kernel-default
kernel-default-devel
bor@bor-Latitude-E5450:~/tmp$

The trigger on kernel-default-devel, which actually builds the module, is unversioned as well.

triggerin scriptlet (using /bin/bash) -- kernel-default-devel

There is no exact version requirement. I know it was supposed to be this way (it was in response to my bug report :) ), but apparently it was lost. So it should pick whatever kernel-default-devel was last installed (which is not the same as the highest version) and build for this kernel.
Hello,

In the Message;

Subject     : Re: troubleshooting nvidia
Message-ID  : <alpine.LSU.2.21.2406201309210.3177@alpha.swabian.net>
Date & Time : Thu, 20 Jun 2024 13:25:38 +0200 (CEST)

[PN] == Paul Neuwirth via openSUSE Users <users@lists.opensuse.org> has written:

PN> On Thursday 2024-06-20 12:57, Masaru Nomiya wrote:
[...]
MN> > I think this might be of interest to you.
MN> > https://forums.opensuse.org/t/nvidia-driver-550-90-broken-plus-no-boot-optio...

PN> Indeed, there are similarities. I meanwhile discovered, that the
PN> nvidia modules are not built for the current kernel
PN> 6.8.8-lp155.9-default - and it keeps using some files of an older
PN> kernel 5.14.21.... and the rpms indeed require that kernel-dev
PN> package. I tried to uninstall these old kernel packages (and ran
PN> zypper -n purge-kernels), and it suggested downgrade of the
PN> nvidia drivers to another repository
PN> (obs://build.opensuse.org/home:regataos) - but still the build
PN> seems to fail as dracut still cannot find nvidia_drm.

Which version of gcc are you using?

Best Regards.

---
┏━━┓彡 Masaru Nomiya mail-to: m.nomiya+suse @ gmail.com
┃\/彡
┗━━┛
"No Windows, no gains!" ... "Why, I am wrong?"
-- Bill --
On Thursday 2024-06-20 14:01, Masaru Nomiya wrote:
Which version of gcc are you using?
interesting:
"gcc -v" shows
gcc version 7.5.0 (SUSE Linux)

"zypper if gcc" shows:
Information for package gcc:
----------------------------
Repository     : openSUSE-15.5-0
Name           : gcc
Version        : 7-3.9.1
Arch           : x86_64
Vendor         : SUSE LLC <https://www.suse.com/>
Installed Size : 0 B
Installed      : Yes
Status         : up-to-date
Source package : gcc-7-3.9.1.src
Upstream URL   : http://gcc.gnu.org/
Summary        : The system GNU C Compiler
Description    :
    The system GNU C Compiler.

/usr/bin/gcc is a symbolic link to /usr/bin/gcc-7 which is part of package gcc7, this seems to be a remnant of a previous os release:
Information for package gcc7:
-----------------------------
Repository     : @System
Name           : gcc7
Version        : 7.5.0+r278197-150000.4.41.1
Arch           : x86_64
Vendor         : SUSE LLC <https://www.suse.com/>
Installed Size : 72.6 MiB
Installed      : Yes
Status         : up-to-date
Source package : gcc7-7.5.0+r278197-150000.4.41.1.src
Upstream URL   : https://gcc.gnu.org/
Summary        : The GNU C Compiler and Support Files
Description    :
    Core package for the GNU Compiler Collection, including the C language
    frontend. Language frontends other than C are split to different
    sub-packages, namely gcc-ada, gcc-c++, gcc-fortran, gcc-obj,
    gcc-obj-c++ and gcc-go.

!?

now installed gcc7 from the oss repository (downgrade). uninstalled nvidia drivers, reinstalled them from official repo.
result still seems to be the same. nvidia_drm not found by dracut. :-(

Thank you
Paul
On 2024-06-20 14:28, Paul Neuwirth via openSUSE Users wrote:
On Thursday 2024-06-20 14:01, Masaru Nomiya wrote:
now installed gcc7 from the oss repository (downgrade). uninstalled nvidia drivers, reinstalled them from official repo.
result still seems to be the same. nvidia_drm not found by dracut. :-(
Please post your repository list. Do:

zypper lr --details > somefile.txt

and attach that "somefile.txt". Do not paste the contents, because line wrap will make it impossible to read the long lines.

--
Cheers / Saludos,
Carlos E. R.
(from 15.5 x86_64 at Telcontar)
Hello,

I had almost turned off my PC. (^^)

In the Message;

Subject     : Re: troubleshooting nvidia
Message-ID  : <alpine.LSU.2.21.2406201412120.3177@alpha.swabian.net>
Date & Time : Thu, 20 Jun 2024 14:28:56 +0200 (CEST)

[PN] == Paul Neuwirth via openSUSE Users <users@lists.opensuse.org> has written:

PN> On Thursday 2024-06-20 14:01, Masaru Nomiya wrote:
[...]
MN> > Which version of gcc are using?

PN> interesting:
PN> "gcc -v" shows
PN> gcc version 7.5.0 (SUSE Linux)

PN> "zypper if gcc" shows:
PN> Information for package gcc:
PN> ----------------------------
[....]

To build a proprietary driver, you must use the version of gcc that was used to build the kernel, or the build will fail.

Kernel 6.8.8 is built with gcc13, so you must use gcc13.

That is, you must start by installing gcc13.

Best Regards.

---
┏━━┓彡 Masaru Nomiya mail-to: m.nomiya+suse @ gmail.com
┃\/彡
┗━━┛
"To hire for skills, firms will need to implement robust and intentional changes in their hiring practices ― and change is hard."
-- Employers don’t practice what they preach on skills-based hiring --
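One way to check the compiler match described above is to compare the GCC version recorded in the kernel banner with the local gcc. A minimal sketch, assuming the banner in /proc/version names the compiler as "gcc ... <major>.<minor>"; the function name is made up for this example:

```shell
#!/bin/sh
# Extract the major GCC version from a /proc/version-style banner
# or from a "gcc version X.Y.Z" line.
# Prints nothing if no "gcc ... N." pattern is found.
gcc_major_from_banner() {
    printf '%s\n' "$1" | sed -n 's/.*gcc[^0-9]*\([0-9][0-9]*\)\..*/\1/p'
}

# Example on a live system:
#   kernel_gcc=$(gcc_major_from_banner "$(cat /proc/version)")
#   local_gcc=$(gcc -dumpversion | cut -d. -f1)
#   [ "$kernel_gcc" = "$local_gcc" ] || \
#       echo "kernel built with gcc $kernel_gcc, local gcc is $local_gcc"
```

A mismatch such as 13 versus 7 would be consistent with the `-mharden-sls=all` error earlier in the thread, since an older compiler does not know that option.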
participants (5): Andrei Borzenkov, Carlos E. R., Masaru Nomiya, Paul Neuwirth, Stephan Hemeier