On Thursday 2024-06-20 11:23, Paul Neuwirth via openSUSE Users wrote:
Date: Thu, 20 Jun 2024 11:23:18
From: Paul Neuwirth via openSUSE Users <users@lists.opensuse.org>
Reply-To: Paul Neuwirth <mail@paul-neuwirth.nl>
To: Stephan Hemeier <Sauerlandlinux@gmx.de>
Cc: users@lists.opensuse.org
Subject: Re: troubleshooting nvidia
On Thursday 2024-06-20 11:11, Stephan Hemeier via openSUSE Users wrote:
Date: Thu, 20 Jun 2024 11:11:07
From: Stephan Hemeier via openSUSE Users <users@lists.opensuse.org>
Reply-To: Stephan Hemeier <Sauerlandlinux@gmx.de>
To: users@lists.opensuse.org
Subject: Re: troubleshooting nvidia
On Thursday, 20 June 2024, 10:44:14 CEST, Paul Neuwirth via openSUSE Users wrote:
Hello list,
I am struggling to work out why the NVIDIA proprietary driver no longer works (it broke after some earlier kernel updates). The G06 driver worked fine some time ago, including CUDA. I then reverted to nouveau, but now I need the proprietary driver working again with 3D acceleration and so on. I followed the instructions in the SDB.
GFX card:
03:00.0 VGA compatible controller: NVIDIA Corporation GP102 [GeForce GTX 1080 Ti] (rev a1)
        Subsystem: eVga.com. Corp. Device 6593
        Kernel modules: nouveau
When I start the display manager service, after a long wait I get a very reduced GUI with only one of three monitors working. Neither the nvidia module nor nouveau gets loaded (at least the blacklisting of nouveau seems to work...). But no Xorg logs are created in /var/log/, which were my main resource for troubleshooting in the past.
There is nothing about the nvidia modules in dmesg or lsmod.
How should I continue troubleshooting?
Finally:
# rpm -qa | grep -i nvidia | sort
kernel-firmware-nvidia-20230724-150500.3.9.1.noarch
kernel-firmware-nvidia-gspx-G06-545.29.02-150400.9.15.1.x86_64
kernel-firmware-nvidia-gspx-G06-550.90.07-150500.11.29.1.x86_64
libva-nvidia-driver-0.0.12-lp155.2.1.x86_64
nvidia-compute-G06-32bit-550.90.07-lp155.23.1.x86_64
nvidia-compute-G06-550.90.07-lp155.23.1.x86_64
nvidia-driver-G06-kmp-default-550.90.07_k5.14.21_150500.53-lp155.23.1.x86_64
nvidia-gl-G06-32bit-550.90.07-lp155.23.1.x86_64
nvidia-gl-G06-550.90.07-lp155.23.1.x86_64
nvidia-video-G06-32bit-550.90.07-lp155.23.1.x86_64
nvidia-video-G06-550.90.07-lp155.23.1.x86_64
openSUSE-repos-Leap-NVIDIA-20230804.41e41a9-lp155.2.6.1.x86_64
pcp-pmda-nvidia-gpu-5.2.5-150400.5.6.3.x86_64
thank you and regards,
Paul Neuwirth
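As a starting point for the "nothing in lsmod" observation above, here is a minimal sketch of listing which GPU-related kernel modules are currently loaded. It assumes the usual `/proc/modules` layout (module name in the first field). The function name `gpu_modules_loaded` is hypothetical and not part of any tool.

```shell
#!/bin/sh
# List loaded GPU-related modules (nvidia*, nouveau*) from a
# /proc/modules-style file. gpu_modules_loaded is a hypothetical helper;
# the optional argument only exists so that a saved copy can be inspected.
gpu_modules_loaded() {
    modules_file="${1:-/proc/modules}"
    # The first whitespace-separated field of each line is the module name.
    awk '$1 ~ /^(nvidia|nouveau)/ { print $1 }' "$modules_file" | sort
}

# Usage: prints nothing when neither driver is loaded.
# gpu_modules_loaded
```

Empty output would match the situation described above. Messages from the proprietary kernel module are normally prefixed with `NVRM`, so `dmesg | grep -iE 'nvrm|nvidia|nouveau'` may be a useful companion check.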
Have you enabled secure boot in UEFI?
If yes, have you added the new NVIDIA key to the MOK list?
Stephan
No, no UEFI boot at all.
I just found out that the upgrade somehow enabled Xwayland, hence no Xorg.0.log. I uncommented the line in /etc/gdm/custom.conf to disable it.
Xorg.0.log now reads:
[ 65778.364] X.Org X Server 1.21.1.12
X Protocol Version 11, Revision 0
[ 65778.364] Current Operating System: Linux lambda 6.8.8-lp155.9-default #1 SMP PREEMPT_DYNAMIC TKG Mon Apr 29 05:24:46 UTC 2024 (5cd3298 x86_64
[ 65778.364] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-6.8.8-lp155.9-default root=UUID=406791e3-4103-4da5-8235-aa35ed5be74c nosplash resume=/dev/disk/by-id/scsi-36c81f660cfd00400232bc7c603abcd39-part4 splash=nosplash debug showopts elevator=cfq vga=792 mitigations=auto
[ 65778.364]
[ 65778.364] Current version of pixman: 0.40.0
[ 65778.364] 	Before reporting problems, check http://wiki.x.org
	to make sure that you have the latest version.
[ 65778.364] Markers: (--) probed, (**) from config file, (==) default setting,
	(++) from command line, (!!) notice, (II) informational,
	(WW) warning, (EE) error, (NI) not implemented, (??) unknown.
[ 65778.364] (==) Log file: "/var/log/Xorg.0.log", Time: Thu Jun 20 10:51:38 2024
[ 65778.365] (==) Using config file: "/etc/X11/xorg.conf"
[ 65778.365] (==) Using config directory: "/etc/X11/xorg.conf.d"
[ 65778.365] (==) Using system config directory "/usr/share/X11/xorg.conf.d"
[ 65778.365] (==) ServerLayout "Layout0"
[ 65778.365] (**) |-->Screen "Screen0" (0)
[ 65778.365] (**) |   |-->Monitor "Monitor0"
[ 65778.366] (**) |   |-->Device "Device0"
[ 65778.366] (**) |-->Input Device "Keyboard0"
[ 65778.366] (**) |-->Input Device "Mouse0"
[ 65778.366] (**) Allowing byte-swapped clients
[ 65778.366] (==) Automatically adding devices
[ 65778.366] (==) Automatically enabling devices
[ 65778.366] (==) Automatically adding GPU devices
[ 65778.366] (==) Automatically binding GPU devices
[ 65778.366] (==) Max clients allowed: 512, resource mask: 0xfffff
[ 65778.366] (==) FontPath set to:
	/usr/share/fonts/misc:unscaled,
	/usr/share/fonts/Type1/,
	/usr/share/fonts/100dpi:unscaled,
	/usr/share/fonts/75dpi:unscaled,
	/usr/share/fonts/ghostscript/,
	/usr/share/fonts/cyrillic:unscaled,
	/usr/share/fonts/misc/sgi:unscaled,
	/usr/share/fonts/truetype/,
	built-ins
[ 65778.366] (==) ModulePath set to "/usr/lib64/xorg/modules"
[ 65778.366] (WW) Ignoring unrecognized extension "XFree86-DGA"
[ 65778.366] (WW) Hotplugging is on, devices using drivers 'kbd', 'mouse' or 'vmmouse' will be disabled.
[ 65778.366] (WW) Disabling Keyboard0
[ 65778.366] (WW) Disabling Mouse0
[ 65778.366] (II) Loader magic: 0x557590578e00
[ 65778.366] (II) Module ABI versions:
[ 65778.366] 	X.Org ANSI C Emulation: 0.4
[ 65778.366] 	X.Org Video Driver: 25.2
[ 65778.366] 	X.Org XInput driver : 24.4
[ 65778.366] 	X.Org Server Extension : 10.0
[ 65778.378] (--) using VT number 3
[ 65778.378] (II) systemd-logind: logind integration requires -keeptty and -keeptty was not provided, disabling logind integration
[ 65778.379] (II) xfree86: Adding drm device (/dev/dri/card0)
[ 65778.379] (II) Platform probe for /sys/devices/platform/simple-framebuffer.0/drm/card0
[ 65778.404] (--) PCI:*(3@0:0:0) 10de:1b06:3842:6593 rev 161, Mem @ 0x9e000000/16777216, 0x80000000/268435456, 0x90000000/33554432, I/O @ 0x00007000/128, BIOS @ 0x????????/131072
[ 65778.404] (II) LoadModule: "glx"
[ 65778.405] (II) Loading /usr/lib64/xorg/modules/extensions/libglx.so
[ 65778.406] (II) Module glx: vendor="X.Org Foundation"
[ 65778.406] 	compiled for 1.21.1.12, module version = 1.0.0
[ 65778.406] 	ABI class: X.Org Server Extension, version 10.0
[ 65778.406] (II) LoadModule: "nvidia"
[ 65778.406] (II) Loading /usr/lib64/xorg/modules/drivers/nvidia_drv.so
[ 65778.407] (II) Module nvidia: vendor="NVIDIA Corporation"
[ 65778.407] 	compiled for 1.6.99.901, module version = 1.0.0
[ 65778.407] 	Module class: X.Org Video Driver
[ 65778.407] (II) NVIDIA dlloader X Driver 550.90.07 Fri May 31 09:34:34 UTC 2024
[ 65778.407] (II) NVIDIA Unified Driver for all Supported NVIDIA GPUs
[ 65778.410] (II) Loading sub module "fb"
[ 65778.410] (II) LoadModule: "fb"
[ 65778.410] (II) Module "fb" already built-in
[ 65778.410] (II) Loading sub module "wfb"
[ 65778.410] (II) LoadModule: "wfb"
[ 65778.410] (II) Loading /usr/lib64/xorg/modules/libwfb.so
[ 65778.410] (II) Module wfb: vendor="X.Org Foundation"
[ 65778.410] 	compiled for 1.21.1.12, module version = 1.0.0
[ 65778.410] 	ABI class: X.Org ANSI C Emulation, version 0.4
[ 65778.503] (EE) NVIDIA: Failed to initialize the NVIDIA kernel module. Please see the
[ 65778.503] (EE) NVIDIA: system's kernel log for additional error messages and
[ 65778.503] (EE) NVIDIA: consult the NVIDIA README for details.
[ 65778.503] (EE) No devices detected.
[ 65778.503] (EE) Fatal server error:
[ 65778.503] (EE) no screens found(EE)
[ 65778.503] (EE) Please consult the The X.Org Foundation support at http://wiki.x.org for help.
[ 65778.503] (EE) Please also check the log file at "/var/log/Xorg.0.log" for additional information.
[ 65778.503] (EE)
[ 65778.515] (EE) Server terminated with error (1). Closing log file.
but nvidia-xconfig delivers a plausible configuration... and still nothing in dmesg.
Paul
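The GDM change described above can be double-checked with a small sketch like the following. It assumes the setting in question is the `WaylandEnable=false` line that GDM ships commented out in `custom.conf`. The helper name `wayland_disabled` is hypothetical, and for simplicity it does not verify that the line sits under the `[daemon]` section.

```shell
#!/bin/sh
# Return 0 if the given GDM config contains an active (uncommented)
# WaylandEnable=false line, 1 otherwise.
# wayland_disabled is a hypothetical helper name; it deliberately ignores
# which section the line is in.
wayland_disabled() {
    conf="${1:-/etc/gdm/custom.conf}"
    # A leading '#' makes the line a comment, so anchor the match at the
    # start of the line and allow only whitespace before the key.
    grep -Eq '^[[:space:]]*WaylandEnable[[:space:]]*=[[:space:]]*false[[:space:]]*$' "$conf"
}

# Usage:
# wayland_disabled && echo "GDM is configured for Xorg only"
```

Inside a running graphical session, `echo "$XDG_SESSION_TYPE"` should then report `x11` rather than `wayland`.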
some hints maybe. Reinstalled the nvidia drivers (`zypper rm [any nvidia packages]`, then `zypper inr --repo NVIDIA:repo-non-free`) and noticed some suspicious looking lines while building the modules and during dracut:

# depmod: ERROR: fstatat(5, nvidia-drm.ko): No such file or directory
# depmod: ERROR: fstatat(5, nvidia-modeset.ko): No such file or directory
# depmod: ERROR: fstatat(5, nvidia-uvm.ko): No such file or directory
# depmod: ERROR: fstatat(5, nvidia.ko): No such file or directory
# depmod: ERROR: fstatat(5, nvidia-drm.ko): No such file or directory
# depmod: ERROR: fstatat(5, nvidia-modeset.ko): No such file or directory
# depmod: ERROR: fstatat(5, nvidia-uvm.ko): No such file or directory
# depmod: ERROR: fstatat(5, nvidia.ko): No such file or directory
# depmod: WARNING: could not open modules.order at /lib/modules/5.14.21-150500.53-default: No such file or directory
# depmod: WARNING: could not open modules.builtin at /lib/modules/5.14.21-150500.53-default: No such file or directory
# dracut-install: Failed to find module 'nvidia_drm'
# dracut: FAILED: /usr/lib/dracut/dracut-install -D /var/tmp/dracut.T96nUl/initramfs -N i2o_scsi --kerneldir /lib/modules/6.8.8-lp155.9-default/ -m nvidia nvidia_drm nvidia-modeset nvidia-uvm

also noticed, that `modprobe nvidia` doesn't find the module (don't know if that's normal behaviour?):

modprobe: ERROR: could not find module by name='nvidia'
modprobe: ERROR: could not insert 'nvidia': Unknown symbol in module, or unknown parameter (see dmesg)

btw. nouveau worked fine while nvidia drivers were not present...

Thank you,
Paul
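One thing that stands out in the output above: depmod refers to /lib/modules/5.14.21-150500.53-default, while dracut runs against /lib/modules/6.8.8-lp155.9-default, the kernel shown in Xorg.0.log. The following is a rough sketch for comparing the kernel version embedded in the KMP package name with the running kernel. It assumes the `_k<version>_<release>-` naming seen in the package list earlier in the thread. Both function names are hypothetical. A mismatch is only a hint, not a diagnosis, since modules may still be built or linked for other kernels.

```shell
#!/bin/sh
# Extract the kernel version a KMP package name refers to, assuming the
# "_k<version>_<release>-" pattern from the package list in this thread.
# kmp_kernel_version is a hypothetical helper name.
kmp_kernel_version() {
    # e.g. ..._k5.14.21_150500.53-lp155.23.1.x86_64  ->  5.14.21-150500.53
    printf '%s\n' "$1" |
        sed -n 's/.*_k\([0-9][0-9.]*\)_\([0-9][0-9.]*\)-.*/\1-\2/p'
}

# Return 0 if the kernel release (default: uname -r) begins with the
# version embedded in the KMP package name, 1 otherwise.
kmp_matches_kernel() {
    kmp_ver=$(kmp_kernel_version "$1")
    running="${2:-$(uname -r)}"
    [ -n "$kmp_ver" ] || return 1
    case "$running" in
        "$kmp_ver"|"$kmp_ver"-*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage:
# pkg=$(rpm -qa | grep -i 'nvidia-driver-.*-kmp' | head -n 1)
# kmp_matches_kernel "$pkg" || echo "KMP version differs from $(uname -r)"
```

To see whether any nvidia modules exist for the running kernel at all, `find "/lib/modules/$(uname -r)" -name 'nvidia*.ko*'` is a quick check.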