[Bug 1226055] New: NVIDIA driver 550.90 broken, plus no boot option for kernel 6.9.3
https://bugzilla.suse.com/show_bug.cgi?id=1226055

            Bug ID: 1226055
           Summary: NVIDIA driver 550.90 broken, plus no boot option for kernel 6.9.3
    Classification: openSUSE
           Product: openSUSE Tumbleweed
           Version: Current
          Hardware: x86-64
                OS: openSUSE Tumbleweed
            Status: NEW
          Severity: Normal
          Priority: P5 - None
         Component: Other
          Assignee: screening-team-bugs@suse.de
          Reporter: gerald_chen@foxmail.com
        QA Contact: qa-bugs@suse.de
  Target Milestone: ---
          Found By: ---
           Blocker: ---

Hi. NVIDIA driver 550.90 came out in the Tumbleweed repo, so I upgraded (system on snapshot 20240531) and things broke. After rebooting, the NVIDIA drivers could not be loaded. `nvidia-smi` said `NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.`, though the NVIDIA drivers were indeed installed. There was no `nvidia_drm` in `lsmod`.

I tried `dracut --force --regenerate-all` with no improvement. Re-installing the drivers and re-enrolling the public key did not help either. Disabling secure boot sadly made no difference.

I went back to the pre snapshot from before the update and found that although the read-only snapshot selection menu in grub reported the kernel version as 6.9.3, `uname -r` after booting into the snapshot still said 6.9.1. In fact, that's the case in several snapshots. Also, 6.9.3 could not be found in `YaST Boot Loader` > `Bootloader Options` > `Default Boot Section`. I tried `update-bootloader --reinit` and `update-bootloader --refresh` followed by a reboot, with no improvement.

I’ve updated to the 20240605 snapshot and it's still broken.

--
You are receiving this mail because:
You are on the CC list for the bug.
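One check that may help with a report like the one above is whether an `nvidia` kernel module exists on disk for the kernel that is actually running (6.9.1 versus 6.9.3 here). The following is a minimal sketch, not something taken from the report: the function names `kernels_with_nvidia` and `kernel_has_nvidia` and the `nvidia*.ko*` file pattern are assumptions for illustration, and the modules root defaults to `/lib/modules`.

```shell
#!/bin/sh
# Sketch only: list the kernel releases under a modules root that contain
# at least one nvidia kernel module file. The function names and the
# 'nvidia*.ko*' pattern are assumptions, not taken from the bug report.
kernels_with_nvidia() {
    root="${1:-/lib/modules}"
    for dir in "$root"/*/; do
        [ -d "$dir" ] || continue
        # Matches nvidia.ko, nvidia-drm.ko.zst, etc.; symlinked modules
        # match by name as well.
        if find "$dir" -name 'nvidia*.ko*' 2>/dev/null | grep -q .; then
            basename "$dir"
        fi
    done
}

# Exit status 0 if the given kernel release ($1) has an nvidia module
# on disk under the modules root ($2, default /lib/modules).
kernel_has_nvidia() {
    kernels_with_nvidia "${2:-/lib/modules}" | grep -Fxq -- "$1"
}
```

A possible use is `kernel_has_nvidia "$(uname -r)" || echo "no nvidia module found for $(uname -r)"`. A module being present on disk does not mean it loads; it only rules out one cause.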
https://bugzilla.suse.com/show_bug.cgi?id=1226055

Andreas Stieger <Andreas.Stieger@gmx.de> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         QA Contact|qa-bugs@suse.de             |sndirsch@suse.com
           Assignee|screening-team-bugs@suse.de |gfx-bugs@suse.de
          Component|Other                       |X11 3rd Party Driver
https://bugzilla.suse.com/show_bug.cgi?id=1226055

Andreas Stieger <Andreas.Stieger@gmx.de> changed:

           What    |Removed                             |Added
----------------------------------------------------------------------------
              Flags|needinfo?(gerald_chen@foxmail.com)  |
https://bugzilla.suse.com/show_bug.cgi?id=1226055#c4

Scott Bradnick <scott.bradnick@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |scott.bradnick@suse.com

--- Comment #4 from Scott Bradnick <scott.bradnick@suse.com> ---
I'm not sure this is helpful, but I'll add it in case it is.

I use vfio on a Dell with a T1000 and pass that discrete card to either a TW Qemu VM or a Win11 Qemu VM, depending on which one I want to use. I didn't have too much trouble w/ nvidia <= 550.67 and kernel <= 6.9.1, but the combo of 550.90 & 6.9.3 was a much more painful experience (for this machine). I won't claim to have any idea why, but 9 times out of 10 the Dell wouldn't boot to X and w/in 3 minutes would lock up w/ some type of vfio "cold" lockup, and I'd have to hard-reset it.

No manner of trying to blacklist vfio would stop it from showing up in lsmod output; neither would commenting out vfio-related items in /etc/modprobe.d and /etc/modules-load.d - it always showed back up. I removed nvidia-driver-G06-kmp-default and tried to reinstall it - it locked up again w/ vfio before the install seemed to complete, but I was prompted w/ a MOK enroll after the hard-reset.

The only success I had was after running "# modprobe --remove <each vfio module>"; then using `rpm -evh` to remove G06 and reinstalling G06 completed successfully, and it seems the system is ACTUALLY not loading vfio, as I'd expect considering the entries are still commented out. The system hasn't locked up and seems happier. I'll check another day whether GPU passthrough works again once I re-enable vfio.

Oddly, I have another up-to-date TW system w/ an AMD CPU and a 3070ti that didn't have any of these problems. This Dell is nothing but trouble.
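The "each vfio module" step in the comment above could be approached by first listing which vfio modules are loaded. This is a rough sketch only: the function name `loaded_vfio_modules` is hypothetical, and it assumes the usual `/proc/modules` layout in which the module name is the first field on each line.

```shell
#!/bin/sh
# Sketch only: print the names of loaded modules whose name starts with
# "vfio". Reads /proc/modules by default (module name is the first field);
# a different file can be passed as $1. The function name is hypothetical.
loaded_vfio_modules() {
    awk '$1 ~ /^vfio/ { print $1 }' "${1:-/proc/modules}"
}
```

Each printed name could then be passed, as root, to `modprobe --remove` as described in the comment. Removal can still fail for a module that is in use.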
https://bugzilla.suse.com/show_bug.cgi?id=1226055#c8

--- Comment #8 from Scott Bradnick <scott.bradnick@suse.com> ---
Just a little update from me, not that anyone asked :P

(I'm not looking to take this bug over, but it's currently the only 6.9.3 bug out there, and if anyone else is having problems, hopefully they'd see this and decide if they need to open their own. I apologize, Gerald, if this is worthless chatter in your report.)

I don't think there's an issue w/ 550.90. I think there's some oddness w/ 6.9.3, but it appears to be more of an issue w/ prime laptops than desktops w/ discrete cards.

I have a desktop passing a 3070ti to a TW Qemu VM. It was on 6.9.1 w/ 550.78 and all was fine pre-`zypper dup`. After the dup, the worst I could get to happen was that 6.9.1 tried to use 550.78 and reported:

    Failed to initialize NVML: Driver/library version mismatch
    NVML library version: 550.90

But it's fine w/ 6.9.3, and I'd assume I could have 6.9.1 rebuild w/ 550.90 if for some reason that was a desired setup, which it isn't presently.

The Dell is a different story (of course). It's still hit-or-miss whether 6.9.1 or 6.9.3 boots into X and/or throws the "cold" issue w/ the T1000. Right now, I've got it booted in 6.9.3 running a TW VM (on 6.9.3 + 550.90.07, reporting the T1000 via `nvidia-smi -L`) and all seems fine; it's been up ~20 minutes.

Besides the Dell's inability to consistently boot w/o problem(s), and even though it was working fine with a VM using the T1000, running `inxi --graphics` either takes >= 30s to run or hangs and basically causes the hard-lock. I'm about to the point where I just don't turn the thing on ...
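The "Driver/library version mismatch" in the comment above comes down to comparing two version strings: the one for the kernel module in use and the one for the userspace library. The sketch below is a hedged illustration of that comparison. The function names `first_version`, `major_minor` and `versions_match` are hypothetical, and because the quoted message shows only `550.90`, the comparison looks at the first two components only, which is coarser than an exact comparison.

```shell
#!/bin/sh
# Sketch only: print the first dotted version number (e.g. 550.90.07)
# found in text read from stdin. Function names here are hypothetical.
first_version() {
    grep -oE '[0-9]+\.[0-9]+(\.[0-9]+)?' | head -n 1
}

# Reduce a version such as 550.90.07 to its first two components (550.90).
major_minor() {
    printf '%s\n' "$1" | cut -d. -f1,2
}

# Exit status 0 if both versions share the same first two components.
# Assumption: this is deliberately looser than an exact string match.
versions_match() {
    [ "$(major_minor "$1")" = "$(major_minor "$2")" ]
}
```

For example, `versions_match 550.78 550.90` returns a non-zero status, which corresponds to the 6.9.1 + 550.78 situation described above. The module version could be read with something like `first_version < /proc/driver/nvidia/version`, assuming that file is present while the driver is loaded.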