[Bug 1215981] New: Black Screen during boot on both internal and external screen in kernel 6.5.4-1 on Thinkpad P16 (Discrete Graphics mode)
https://bugzilla.suse.com/show_bug.cgi?id=1215981

            Bug ID: 1215981
           Summary: Black Screen during boot on both internal and external screen in kernel 6.5.4-1 on Thinkpad P16 (Discrete Graphics mode)
    Classification: openSUSE
           Product: openSUSE Tumbleweed
           Version: Current
          Hardware: Other
                OS: Other
            Status: NEW
          Severity: Normal
          Priority: P5 - None
         Component: Kernel
          Assignee: kernel-bugs@opensuse.org
          Reporter: petr.vorel@suse.com
        QA Contact: qa-bugs@suse.de
  Target Milestone: ---
          Found By: ---
           Blocker: ---

I have a problem similar to #1213693, but on the newer kernel 6.5.4-1, which should already contain the fix. #1213693 was caused by commit ca62297b2085 ("drm/edid: Fix csync detailed mode parsing") in v6.4-rc1 and was fixed by reverting it in 50b6f2c82977 ("Revert "drm/edid: Fix csync detailed mode parsing"") in v6.5-rc7.

In my case I can't see anything after the kernel is loaded. I have Tumbleweed kernels 6.5.4-1 and 6.5.2-1.

The problem is on a Thinkpad P16 with 2 GPUs:

00:02.0 VGA compatible controller: Intel Corporation Alder Lake-HX GT1 [UHD Graphics 770] (rev 0c)
01:00.0 VGA compatible controller: NVIDIA Corporation GA107GLM [RTX A1000 Laptop GPU] (rev a1)

The problem occurs in "Discrete Graphics" (Nvidia only) mode. "Hybrid Graphics" (Intel + Nvidia) works, but I need "Discrete Graphics" for external screens, as it's the only way to get them working (the external outputs are wired only to the Nvidia GPU), i.e. in Discrete Graphics mode only the Intel card is being used:

$ drm_info |grep -i node: -A1
Node: /dev/dri/card0
Driver: i915 (Intel Graphics) version 1.6.0 (20201103)

I tested with the internal screen only and with the internal screen + 2 external screens. I tried disabling plymouth with the rd.plymouth=0 plymouth.enable=0 plymouth=0 cmdline args, also tried fbcon=map:1, and also booted to runlevel 1 and 3 instead of the default. None of it helped.
$ rpm -qa |grep -i -e nouveau -e intel -e ^kernel
kernel-firmware-nvidia-gsp-G06-525.116.04-2.1.x86_64
kernel-firmware-nvidia-gspx-G06-535.113.01-1.1.x86_64
kernel-firmware-serial-20230829-1.1.noarch
libdrm_nouveau2-2.4.116-2.1.x86_64
intel-vaapi-driver-2.4.1-5.11.x86_64
kernel-firmware-mwifiex-20230829-1.1.noarch
xf86-video-intel-2.99.917.916_g31486f40-3.6.x86_64
kernel-firmware-platform-20230829-1.1.noarch
kernel-firmware-intel-20230829-1.1.noarch
kernel-firmware-iwlwifi-20230829-1.1.noarch
kernel-firmware-all-20230829-1.1.noarch
intel-media-driver-23.3.3-1.1.x86_64
ucode-intel-20230808-1.1.x86_64
kernel-firmware-nvidia-gsp-G06-535.54.03-1.1.x86_64
kernel-firmware-amdgpu-20230829-1.1.noarch
kernel-firmware-usb-network-20230829-1.1.noarch
kernel-firmware-i915-20230829-1.1.noarch
kernel-macros-6.5.4-1.1.noarch
kernel-firmware-qcom-20230829-1.1.noarch
libvulkan_intel-23.2.0-1699.360.pm.1.x86_64
intel-gpu-tools-1.27.1-2.3.x86_64
kernel-firmware-sound-20230829-1.1.noarch
kernel-firmware-ath10k-20230829-1.1.noarch
libvdpau_nouveau-23.2.0-1699.360.pm.1.x86_64
kernel-firmware-bnx2-20230829-1.1.noarch
Mesa-dri-nouveau-23.2.0-1699.360.pm.1.x86_64
kernel-firmware-dpaa2-20230829-1.1.noarch
kernel-firmware-atheros-20230829-1.1.noarch
kernel-firmware-radeon-20230829-1.1.noarch
kernel-firmware-ueagle-20230829-1.1.noarch
kernel-firmware-brcm-20230829-1.1.noarch
kernel-firmware-chelsio-20230829-1.1.noarch
kernel-firmware-nvidia-20230829-1.1.noarch
kernel-firmware-ti-20230829-1.1.noarch
kernel-firmware-media-20230829-1.1.noarch
kernel-firmware-realtek-20230829-1.1.noarch
kernel-firmware-mellanox-20230829-1.1.noarch
libdrm_intel1-2.4.116-2.1.x86_64
kernel-firmware-network-20230829-1.1.noarch
kernel-firmware-ath11k-20230829-1.1.noarch
kernel-firmware-mediatek-20230829-1.1.noarch
kernel-firmware-bluetooth-20230829-1.1.noarch
kernel-firmware-prestera-20230829-1.1.noarch
kernel-firmware-liquidio-20230829-1.1.noarch
kernel-firmware-marvell-20230829-1.1.noarch
kernel-default-6.5.2-1.1.x86_64
kernel-firmware-nfp-20230829-1.1.noarch
kernel-default-devel-6.5.2-1.1.x86_64
kernel-devel-6.5.4-1.1.noarch
kernel-firmware-qlogic-20230829-1.1.noarch
kernel-default-devel-6.5.4-1.1.x86_64
kernel-default-6.5.4-1.1.x86_64
kernel-devel-6.5.2-1.1.noarch

$ lsmod |grep -i -e i915 -e nvidia -e nouveau
nvidia_drm             94208  0
nvidia_modeset       1794048  1 nvidia_drm
nvidia_uvm           3608576  0
i915                 4087808  5
drm_buddy              20480  1 i915
i2c_algo_bit           20480  1 i915
drm_display_helper    237568  1 i915
ttm                   102400  1 i915
cec                    90112  2 drm_display_helper,i915
nvidia               8843264  2 nvidia_uvm,nvidia_modeset
video                  77824  3 thinkpad_acpi,i915,nvidia_modeset

$ modinfo nvidia |grep -i version
version:        535.113.01
srcversion:     81566B70A70B0B19F40FD1A
vermagic:       6.5.4-1-default SMP preempt mod_unload modversions

$ cat /proc/cmdline # but I tested with others, see above
BOOT_IMAGE=/boot/vmlinuz-6.5.4-1-default root=/dev/mapper/system-root splash=silent resume=/dev/system/swap mitigations=auto quiet security=apparmor modprobe.blacklist=i915 nosimplefb=1

I use these non-factory repos:
https://download.opensuse.org/repositories/X11:/Drivers:/Video:/Redesign/ope...
https://download.opensuse.org/repositories/X11:/XOrg/openSUSE_Tumbleweed/
https://download.nvidia.com/opensuse/tumbleweed

-- 
You are receiving this mail because:
You are on the CC list for the bug.
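To compare which kernel driver ends up bound to each DRM card in Discrete vs. Hybrid Graphics mode (the drm_info output above only shows card0 with i915), a quick walk over sysfs can help. This is a minimal sketch, assuming the usual /sys/class/drm layout in which each cardN links to its PCI device and that device links to its bound driver; the function name list_drm_drivers is hypothetical:

```sh
#!/bin/sh
# list_drm_drivers [DRM_SYSFS_DIR]
# Hypothetical helper: print "cardN driver" for every DRM card node.
# Defaults to /sys/class/drm; the argument only exists so it can be tested.
list_drm_drivers() {
    drm_dir=${1:-/sys/class/drm}
    for card in "$drm_dir"/card[0-9]*; do
        name=${card##*/}
        # Skip connector entries such as card0-eDP-1
        case $name in
            *-*) continue ;;
        esac
        # Skip the unexpanded glob and cards without a bound driver
        [ -e "$card/device/driver" ] || continue
        driver=$(basename "$(readlink -f "$card/device/driver")")
        printf '%s %s\n' "$name" "$driver"
    done
}
```

Running `list_drm_drivers` once in each firmware mode shows at a glance whether any cardN is backed by the nvidia driver; if none is, no DRM device was registered for the Nvidia GPU in that boot.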
Petr Vorel <petr.vorel@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |petr.vorel@suse.com,
                   |                            |tzimmermann@suse.com
Petr Vorel <petr.vorel@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           See Also|                            |https://bugzilla.suse.com/show_bug.cgi?id=1213693
                 CC|                            |tiwai@suse.com
Petr Vorel <petr.vorel@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |patrik.jakobsson@suse.com
https://bugzilla.suse.com/show_bug.cgi?id=1215981#c1

--- Comment #1 from Patrik Jakobsson <patrik.jakobsson@suse.com> ---
Can you access the system remotely? If so, please provide dmesg and hwinfo output.
Patrik Jakobsson <patrik.jakobsson@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |sndirsch@suse.com
https://bugzilla.suse.com/show_bug.cgi?id=1215981#c2

--- Comment #2 from Petr Vorel <petr.vorel@suse.com> ---
(In reply to Patrik Jakobsson from comment #1)
> Can you access the system remotely? If so, please provide dmesg and hwinfo output.
Unfortunately the system does not reply to ping. I'm able to get to a working system if I switch in BIOS to "Discrete Graphics". I'm not sure whether the system crashes or the network requires nm-applet to start. I'll try to set up the network over a LAN cable and set up SSH so that I can get some logs.
https://bugzilla.suse.com/show_bug.cgi?id=1215981#c3

--- Comment #3 from Petr Vorel <petr.vorel@suse.com> ---
Created attachment 869967 --> https://bugzilla.suse.com/attachment.cgi?id=869967&action=edit
dmesg of the affected system
https://bugzilla.suse.com/show_bug.cgi?id=1215981#c4

--- Comment #4 from Petr Vorel <petr.vorel@suse.com> ---
Created attachment 869968 --> https://bugzilla.suse.com/attachment.cgi?id=869968&action=edit
hwinfo of the affected system
https://bugzilla.suse.com/show_bug.cgi?id=1215981#c5

Petr Vorel <petr.vorel@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #869967|0                           |1
        is obsolete|                            |

--- Comment #5 from Petr Vorel <petr.vorel@suse.com> ---
Created attachment 869969 --> https://bugzilla.suse.com/attachment.cgi?id=869969&action=edit
dmesg of the affected system (cmdline cleanup)

I removed modprobe.blacklist=i915 nosimplefb=1 from the cmdline. Obviously it did not solve the problem; it was just to use the default cmdline.

There are some errors, I'm not sure whether they are relevant:

[ 1.464073] BERT: [Hardware Error]: Skipped 1 error records
...
[ 2.052280] pcieport 0000:00:01.0: AER: Multiple Uncorrected (Non-Fatal) error received: 0000:01:00.0
[ 2.052299] pci 0000:01:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID)
[ 2.052345] pci 0000:01:00.0: device [10de:25b9] error status/mask=00100000/00000000
...
[ 9.027482] sof-audio-pci-intel-tgl 0000:00:1f.3: init of i915 and HDMI codec failed
...
[ 12.628660] [drm:nv_drm_load [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate NvKmsKapiDevice
[ 12.629139] [drm:nv_drm_probe_devices [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to register device

The Nvidia card is visible:

$ lspci |grep -i nvidia
01:00.0 VGA compatible controller: NVIDIA Corporation GA107GLM [RTX A1000 Laptop GPU] (rev a1)
01:00.1 Audio device: NVIDIA Corporation Device 2291 (rev a1)
https://bugzilla.suse.com/show_bug.cgi?id=1215981#c6

Petr Vorel <petr.vorel@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #869968|0                           |1
        is obsolete|                            |

--- Comment #6 from Petr Vorel <petr.vorel@suse.com> ---
Created attachment 869970 --> https://bugzilla.suse.com/attachment.cgi?id=869970&action=edit
hwinfo of the affected system (cmdline cleanup)

The main difference is that modprobe.blacklist=i915 nosimplefb=1 (previous log file) forced efi-framebuffer instead of the default simple-framebuffer and listed a "Generic Monitor". But the output is the same: none.
https://bugzilla.suse.com/show_bug.cgi?id=1215981#c7

--- Comment #7 from Petr Vorel <petr.vorel@suse.com> ---
Created attachment 869971 --> https://bugzilla.suse.com/attachment.cgi?id=869971&action=edit
dmesg on Hybrid Graphics mode (where GUI works, just for a reference)
https://bugzilla.suse.com/show_bug.cgi?id=1215981#c8

--- Comment #8 from Petr Vorel <petr.vorel@suse.com> ---
Created attachment 869972 --> https://bugzilla.suse.com/attachment.cgi?id=869972&action=edit
hwinfo on Hybrid Graphics mode (where GUI works, just for a reference)
https://bugzilla.suse.com/show_bug.cgi?id=1215981#c9

Stefan Dirsch <sndirsch@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|kernel-bugs@opensuse.org    |gfx-bugs@suse.de
         QA Contact|qa-bugs@suse.de             |sndirsch@suse.com
             Status|NEW                         |IN_PROGRESS
              Flags|                            |needinfo?(petr.vorel@suse.com)
           Priority|P5 - None                   |P3 - Medium
          Component|Kernel                      |X11 3rd Party Driver

--- Comment #9 from Stefan Dirsch <sndirsch@suse.com> ---
[ 12.368440] NVRM: Open nvidia.ko is only ready for use on Data Center GPUs.
[ 12.368442] NVRM: To force use of Open nvidia.ko on other GPUs, see the
[ 12.368442] NVRM: 'OpenRmEnableUnsupportedGpus' kernel module parameter described
[ 12.368443] NVRM: in the README.

So have you set this in modprobe.d/50-nvidia-default.conf ?
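Because the affected boot leaves both screens black, the NVRM message quoted above has to be read from a log rather than from the console. A minimal sketch that checks a kernel log given on stdin for exactly that message; feeding it from `journalctl -k -b -1` (the previous boot) assumes a persistent journal, otherwise `dmesg` from an SSH session works as it did in this bug. The function name is hypothetical:

```sh
#!/bin/sh
# nvrm_refused_gpu: read a kernel log on stdin and succeed (exit 0) if the
# open nvidia.ko refused a non-Data-Center GPU, i.e. the NVRM message from
# comment #9 is present. Matching lines are printed for context.
nvrm_refused_gpu() {
    grep -F 'NVRM: Open nvidia.ko is only ready for use on Data Center GPUs'
}

# Usage on a real system (not run here):
#   journalctl -k -b -1 | nvrm_refused_gpu && echo "open module refused the GPU (see OpenRmEnableUnsupportedGpus)"
```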
Stefan Dirsch <sndirsch@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|gfx-bugs@suse.de            |sndirsch@suse.com
Petr Vorel <petr.vorel@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           See Also|                            |https://bugzilla.suse.com/show_bug.cgi?id=1211950
https://bugzilla.suse.com/show_bug.cgi?id=1215981#c10

--- Comment #10 from Petr Vorel <petr.vorel@suse.com> ---
(In reply to Stefan Dirsch from comment #9)
> [ 12.368440] NVRM: Open nvidia.ko is only ready for use on Data Center GPUs.
> [ 12.368442] NVRM: To force use of Open nvidia.ko on other GPUs, see the
> [ 12.368442] NVRM: 'OpenRmEnableUnsupportedGpus' kernel module parameter described
> [ 12.368443] NVRM: in the README.
>
> So have you set this in modprobe.d/50-nvidia-default.conf ?
Yes, I remember setting OpenRmEnableUnsupportedGpus=1 in /usr/lib/modprobe.d/50-nvidia-default.conf before (it was in the SUSE internal docs for the laptop), but now I see it's not set. I suspect it was overwritten by an rpm update, so I re-enabled it. And setting it is really required:

* Neither Discrete Graphics nor Hybrid Graphics mode can use external screens when OpenRmEnableUnsupportedGpus=1 is not set.
* Discrete Graphics mode now starts normally; I can use X11-based window managers and also Wayland-based compositors (tested on sway, which is picky about the nvidia proprietary drivers).

I guess we can close this bug. Maybe we should consider documenting OpenRmEnableUnsupportedGpus=1 somewhere in the openSUSE wiki as well. Or ask Nvidia, which IMHO maintains /usr/lib/modprobe.d/50-nvidia-default.conf, to document which GPUs need this option.
https://bugzilla.suse.com/show_bug.cgi?id=1215981#c11

--- Comment #11 from Petr Vorel <petr.vorel@suse.com> ---
(In reply to Petr Vorel from comment #10)
> I guess we can close this bug.
Actually, losing the whole output without OpenRmEnableUnsupportedGpus=1 is a new *feature*; maybe the Nvidia driver is broken on the 6.5 kernel (it should still be usable, although only with the internal screen).
> Maybe we should consider to document using OpenRmEnableUnsupportedGpus=1 also somewhere in openSUSE wiki. Or ask Nvidia, which IMHO maintains /usr/lib/modprobe.d/50-nvidia-default.conf, to somehow document which GPU need this option.
To correct myself: nvidia-open-driver-G06-signed-kmp-default-535.113.01_k6.5.4_1-43.4.x86_64, which contains /usr/lib/modprobe.d/50-nvidia-default.conf, is from obs://build.opensuse.org/X11:Drivers:Video. Shouldn't the config file be in /etc? Or am I supposed to put it into /etc?
https://bugzilla.suse.com/show_bug.cgi?id=1215981#c12

--- Comment #12 from Stefan Dirsch <sndirsch@suse.com> ---
Hmm. In theory, a file marked as %config in RPM that you edited before should not be overwritten during an update.

https://www.cl.cam.ac.uk/~jw35/docs/rpm_config.html

I don't think it has changed in the package itself. But you could check if there is a .rpmsave with a timestamp of the update.

/usr/lib/modprobe.d is the new location for packaged config files. But you can override things permanently on your system in /etc/modprobe.d using the same filename (IIRC).

Usage of the opengpu driver is documented:

--> https://en.opensuse.org/SDB:NVIDIA_drivers

Open GPU kernel modules versus Proprietary drivers
The following article is about installing NVIDIA's Proprietary drivers. For more information about the Open GPU kernel modules, that NVIDIA released in May 2022, read this [openSUSE Blog article][https://sndirsch.github.io/nvidia/2022/06/07/nvidia-opengpu.html].
[...]

I doubt the nvidia opengpu driver ever worked without that option. Without it, it only works on compute cards without graphical output.
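Following the suggestion above, a same-named file in /etc/modprobe.d should take precedence over the packaged one in /usr/lib/modprobe.d, so package updates no longer touch the local setting. A minimal sketch, assuming the 535.x open driver discussed at this point (later comments report the option is no longer needed from 545.29.02 and even breaks booting there); the helper name is hypothetical and the paths are only defaults:

```sh
#!/bin/sh
# Option line needed by the 535.x open driver for non-Data-Center GPUs.
OPTION_LINE='options nvidia NVreg_OpenRmEnableUnsupportedGpus=1'

# make_modprobe_override [SRC] [DST]
# Hypothetical helper: copy the packaged config to /etc (same file name, so
# it masks the /usr/lib copy) and append the option line if it is missing.
make_modprobe_override() {
    src=${1:-/usr/lib/modprobe.d/50-nvidia-default.conf}
    dst=${2:-/etc/modprobe.d/50-nvidia-default.conf}
    # Start from the packaged file only if no override exists yet
    if [ ! -e "$dst" ]; then
        cp "$src" "$dst" || return 1
    fi
    # -x -F: match the whole line literally, so a commented-out copy
    # ("# options nvidia ...") does not count as enabled
    if ! grep -qxF "$OPTION_LINE" "$dst"; then
        # Make sure the file ends with a newline before appending
        if [ -s "$dst" ] && [ -n "$(tail -c 1 "$dst")" ]; then
            printf '\n' >> "$dst"
        fi
        printf '%s\n' "$OPTION_LINE" >> "$dst" || return 1
    fi
}
```

It has to run as root, and the initrd needs rebuilding afterwards (for example with `dracut --kver $(uname -r) -f`, as discussed later in this bug) so that the early-loaded module sees the option. Running the helper twice still leaves a single option line.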
https://bugzilla.suse.com/show_bug.cgi?id=1215981#c13

--- Comment #13 from Stefan Dirsch <sndirsch@suse.com> ---
> I don't think it has changed in the package itself. But you could check if there is a .rpmsave with a timestamp of the update.
Therefore keeping NEEDINFO open ...
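The check requested above can be done in two steps: `rpm -Vf` reports whether an installed file differs from what its package shipped, and a search for `.rpmsave`/`.rpmnew` leftovers shows whether an update moved an edited copy aside. A minimal sketch; the function names are hypothetical, only `rpm -Vf` and `find` are real tools:

```sh
#!/bin/sh
# list_rpm_leftovers DIR...
# Print .rpmsave/.rpmnew files in the given directories. .rpmsave means rpm
# saved an edited copy aside and installed the packaged version; .rpmnew
# means rpm kept the edited copy and put the packaged version beside it.
list_rpm_leftovers() {
    for dir in "$@"; do
        [ -d "$dir" ] || continue
        find "$dir" -maxdepth 1 -type f \( -name '*.rpmsave' -o -name '*.rpmnew' \) -print
    done
}

# verify_config FILE
# Verify the package owning FILE and print only the line about FILE
# ("5" = digest changed, "c" marks a %config file). No output means rpm
# considers the file unchanged.
verify_config() {
    rpm -Vf "$1" | grep -F -- "$1"
}

# Usage on a real system (not run here):
#   list_rpm_leftovers /usr/lib/modprobe.d /etc/modprobe.d | xargs -r ls -l
#   verify_config /usr/lib/modprobe.d/50-nvidia-default.conf
```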
https://bugzilla.suse.com/show_bug.cgi?id=1215981#c14

--- Comment #14 from Petr Vorel <petr.vorel@suse.com> ---
(In reply to Stefan Dirsch from comment #12)
> Hmm. In theory during an update a file marked as %config in RPM and edited by yourself before should not be overwritten.
>
> https://www.cl.cam.ac.uk/~jw35/docs/rpm_config.html
>
> I don't think it has changed in the package itself. But you could check if there is a .rpmsave with a timestamp of the update.
Yes, there is a 50-nvidia-default.conf.rpmsave dated 29th September, which is *without* the "options nvidia NVreg_OpenRmEnableUnsupportedGpus=1" line (not even commented out). That is also what raised my suspicion that the file was overwritten. Also, before I edited the file, this line was commented out in it (that was the state after installation, before I modified it to get the GPU working).
> /usr/lib/modprobe.d is the new location for packaged config files. But you can overwrite things permanently on your system in /etc/modprobe.d using the same filename (IIRC).
It's OK if I'm supposed to make this copy (I'll do it). I just wanted to point out the whole problem in case there is a problem or bug in the package itself.
> Usage of the opengpu driver is documented:
>
> --> https://en.opensuse.org/SDB:NVIDIA_drivers
>
> Open GPU kernel modules versus Proprietary drivers
> The following article is about installing NVIDIA's Proprietary drivers. For more information about the Open GPU kernel modules, that NVIDIA released in May 2022, read this [openSUSE Blog article][https://sndirsch.github.io/nvidia/2022/06/07/nvidia-opengpu.html].
> [...]
Yes, I've noticed both of them before. The blog documents using this variable, and I found it via the official docs. But neither of them suggests moving the content of /usr/lib/modprobe.d to /etc/modprobe.d (probably a general approach which I should have known, but in this case not doing it leads to a broken system). The blog also mentions pci_ids-unsupported [1] in our packaging. I wonder if there could be automation which checks this list when the package is configured and enables or disables the variable accordingly.
> I doubt nvidia opengpu driver ever worked without that option. It does only on computing cards without graphical output.
Interesting. This could be mentioned in the blog post.

[1] https://build.opensuse.org/package/view_file/X11:Drivers:Video:Redesign/nvid...
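The automation idea above could look roughly like this: read the Nvidia GPU's PCI device ID from sysfs and look it up in a list file. This is only a sketch: the real format of pci_ids-unsupported is not shown in this bug, so the assumed format (one `0xNNNN` device ID per line, optionally followed by a name, `#` for comments) and both function names are hypothetical. The RTX A1000 in this laptop reports device ID 0x25b9 (see `[10de:25b9]` in the dmesg excerpt in comment #5).

```sh
#!/bin/sh
# gpu_in_list DEVICE_ID LIST_FILE
# Succeed if DEVICE_ID (e.g. 0x25b9) is the first field of a line in
# LIST_FILE. The list format is an assumption, not the packaged format.
gpu_in_list() {
    id=$(printf '%s' "$1" | tr 'A-F' 'a-f')
    awk -v id="$id" '
        /^[[:space:]]*#/ { next }
        tolower($1) == id { found = 1 }
        END { exit (found ? 0 : 1) }
    ' "$2"
}

# nvidia_display_ids [PCI_SYSFS_DIR]
# Print the device IDs of NVIDIA (vendor 0x10de) display controllers
# (PCI class 0x03xxxx); the HDMI audio function is skipped.
nvidia_display_ids() {
    pci_dir=${1:-/sys/bus/pci/devices}
    for dev in "$pci_dir"/*; do
        [ -r "$dev/vendor" ] || continue
        [ "$(cat "$dev/vendor")" = 0x10de ] || continue
        case $(cat "$dev/class" 2>/dev/null) in
            0x03*) cat "$dev/device" ;;
        esac
    done
}

# Usage (the list path is hypothetical):
#   for id in $(nvidia_display_ids); do
#       gpu_in_list "$id" /path/to/pci_ids-unsupported && echo "$id needs the option"
#   done
```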
https://bugzilla.suse.com/show_bug.cgi?id=1215981#c15

Stefan Dirsch <sndirsch@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
              Flags|needinfo?(petr.vorel@suse.com)|

--- Comment #15 from Stefan Dirsch <sndirsch@suse.com> ---
Hmm. So that would mean the packaged file has changed (not sure why though; I'm not aware of any changes I made) and the .rpmsave is the edited one. So apparently you had removed the line yourself before!?! But you needed to have set it. Hmm ...

I'm not happy with the situation with this option. I had the idea to make a subpackage containing just this option, i.e. a single file. Install or uninstall this package to enable the driver or not. I'm afraid I can't enable this option by default as long as nVidia calls it alpha quality for cards with a display engine.

In my blog post I mention which GPUs are supported by default and which need this option. Pretty obvious, I believe.
https://bugzilla.suse.com/show_bug.cgi?id=1215981#c16

--- Comment #16 from Stefan Dirsch <sndirsch@suse.com> ---
I just checked: 50-nvidia-default.conf is identical in 535.104.05 and 535.113.01. So this does not explain why such a .rpmsave file was created.
https://bugzilla.suse.com/show_bug.cgi?id=1215981#c17

--- Comment #17 from Petr Vorel <petr.vorel@suse.com> ---
(In reply to Stefan Dirsch from comment #16)
> I just checked that 50-nvidia-default.conf of 535.104.05 and 535.113.01 is identical. So this does not explain, which such a .rpmsave file has been created.
Thanks for all the info. I remember only adding this option. But maybe I really did remove it; it would have to have been some time ago, not recently. Let's assume it was my fault; I'll watch the next update of the driver. Also, although I thought I had booted at least once before with the Nvidia driver without NVreg_OpenRmEnableUnsupportedGpus=1, I'm not sure. Now I think it's unlikely there is a regression in the driver or kernel. Maybe we should close this bug for now; it can be reopened if the problem comes back.
https://bugzilla.suse.com/show_bug.cgi?id=1215981#c18

Stefan Dirsch <sndirsch@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|IN_PROGRESS                 |RESOLVED
         Resolution|---                         |FIXED

--- Comment #18 from Stefan Dirsch <sndirsch@suse.com> ---
Yes, I would definitely appreciate it if you could watch what happens with the next update! And of course this ticket can be reopened if you run into the same situation again with the next update!
https://bugzilla.suse.com/show_bug.cgi?id=1215981#c21

--- Comment #21 from Petr Vorel <petr.vorel@suse.com> ---
(In reply to Stefan Dirsch from comment #20)
> Hmm ...
>
> NVreg_OpenRmEnableUnsupportedGpus option is no longer needed. The support for Workstation cards is now considered beta and officially supported.
Does this apply to the Open GPU kernel modules or to NVIDIA's Proprietary drivers? Your comment #12 suggests it's needed for the Open GPU kernel modules, which I'm trying to use. Although I need to double-check whether I installed only the Open GPU kernel modules (the open ones) and not NVIDIA's Proprietary drivers.
> fbdev option is new and eventually enables a Linux console with the nvidia driver (and no longer breaks simpledrm on newer 6.x.y kernels).
>
> Do things work again when you remove the fbdev option?
OK, I'll test "options nvidia-drm modeset=1" (with "fbdev=1" removed from that line and "options nvidia NVreg_OpenRmEnableUnsupportedGpus=1" removed). But I remember that last time "options nvidia-drm modeset=1" alone didn't work (NVreg_OpenRmEnableUnsupportedGpus=1 was required on kernel 6.5 with kernel-firmware-nvidia-gspx-G06-535.113.01).
> I think you need to regenerate the initrd by running 'dracut' to make the changes effective.
OK, tomorrow I'll try something like:

dracut --kver $(uname -r) -f
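To answer the open vs. proprietary question from comment #21 on a running system, `modinfo -n` gives the path of the nvidia module that would be loaded, `rpm -qf` names the package owning that file, and the module's license field should differ between the two flavours. The license strings assumed here ("Dual MIT/GPL" for the open modules, "NVIDIA" for the proprietary ones) are an assumption, not something confirmed in this bug; the helper name is hypothetical:

```sh
#!/bin/sh
# classify_nvidia_license LICENSE
# Hypothetical helper mapping a module license string to a driver flavour.
# The two strings are assumptions about what each flavour declares.
classify_nvidia_license() {
    case $1 in
        'Dual MIT/GPL') echo open ;;
        'NVIDIA') echo proprietary ;;
        *) echo unknown ;;
    esac
}

# Usage on a real system (not run here):
#   classify_nvidia_license "$(modinfo -F license nvidia)"
#   rpm -qf "$(modinfo -n nvidia)"   # which package ships the module file
```

The `rpm -qf` line is the more reliable of the two, since it names the package (for example nvidia-open-driver-G06-signed-kmp-default) directly.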
https://bugzilla.suse.com/show_bug.cgi?id=1215981#c22

--- Comment #22 from Stefan Dirsch <sndirsch@suse.com> ---
(In reply to Petr Vorel from comment #21)
> (In reply to Stefan Dirsch from comment #20)
>> Hmm ...
>>
>> NVreg_OpenRmEnableUnsupportedGpus option is no longer needed. The support for Workstation cards is now considered beta and officially supported.
>
> Does this apply to Open GPU kernel modules or to NVIDIA's Proprietary drivers? Your comment #12 suggests it's needed for Open GPU kernel modules which I'm trying to use. Although I need to double check if I installed only Open GPU kernel modules (the open ones) and not NVIDIA's Proprietary drivers.
This applies to Open GPU kernel modules. Setting this option is no longer needed for Desktop GPUs since version 545.29.02.
>> fbdev option is new and eventually enables a Linux console with the nvidia driver (and no longer breaks simpledrm on newer 6.x.y kernels).
>>
>> Do things work again when you remove the fbdev option?
>
> OK, I'll test "options nvidia-drm modeset=1" (with removed "fbdev=1" from that line and removed "options nvidia NVreg_OpenRmEnableUnsupportedGpus=1"). But I remember last time "options nvidia-drm modeset=1" only didn't work (NVreg_OpenRmEnableUnsupportedGpus=1 was required on kernel 6.5 and kernel-firmware-nvidia-gspx-G06-535.113.01).
See above.
>> I think you need to regenerate the initrd by running 'dracut' to make the changes effective.
>
> OK, I'll try tomorrow something like: dracut --kver $(uname -r) -f
Yes. I think this should do the job.
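After editing modprobe.d and running dracut, two checks can confirm the change actually took effect: `modprobe -c` prints the merged configuration (after /etc vs. /usr/lib masking), and `lsinitrd` from dracut lists the files inside the initrd. A minimal sketch; the filter function is hypothetical, and it treats '-' and '_' as equivalent because kmod normalizes module names to underscores, so nvidia-drm may show up as nvidia_drm in `modprobe -c` output:

```sh
#!/bin/sh
# filter_module_options MODULE
# Read modprobe configuration text on stdin and print the "options" lines
# for MODULE, treating '-' and '_' in module names as equivalent.
filter_module_options() {
    awk -v m="$1" '
        BEGIN { gsub(/-/, "_", m) }
        $1 == "options" {
            mod = $2
            gsub(/-/, "_", mod)
            if (mod == m) print
        }
    '
}

# Usage on a real system (not run here):
#   modprobe -c | filter_module_options nvidia
#   modprobe -c | filter_module_options nvidia-drm
#   lsinitrd | grep modprobe.d    # is the edited file inside the initrd?
```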
https://bugzilla.suse.com/show_bug.cgi?id=1215981#c23

--- Comment #23 from Petr Vorel <petr.vorel@suse.com> ---
TL;DR: Probably a problem in my setup; we can probably close this. The rest is a description in case you find something I'm obviously doing wrong, or something that can be improved.

I wonder how it can happen that 2 driver versions coexist (kernel-firmware-nvidia-gsp-G06-525.116 vs. kernel-firmware-nvidia-gspx-G06-535, and nvidia-open-driver-G06-signed-kmp-default-535 vs. nvidia-open-driver-G06-signed-kmp-default-545):

$ rpm -qa |grep -i nvidia | sort
kernel-firmware-nvidia-20231107-1.1.noarch
kernel-firmware-nvidia-gsp-G06-525.116.04-2.1.x86_64
kernel-firmware-nvidia-gsp-G06-535.54.03-1.1.x86_64
kernel-firmware-nvidia-gspx-G06-535.113.01-1.1.x86_64
kernel-firmware-nvidia-gspx-G06-535.129.03-1.1.x86_64
kernel-firmware-nvidia-gspx-G06-535.129.03-11.1.x86_64
kernel-firmware-nvidia-gspx-G06-535.129.03-12.1.x86_64
kernel-firmware-nvidia-gspx-G06-545.29.02-13.1.x86_64
libnvidia-egl-wayland1-1.1.12-1.2.x86_64
libva-nvidia-driver-0.0.10-1.1.x86_64
nvidia-compute-G06-32bit-535.129.03-15.1.x86_64
nvidia-compute-G06-535.129.03-15.1.x86_64
nvidia-gl-G06-32bit-535.129.03-15.1.x86_64
nvidia-gl-G06-535.129.03-15.1.x86_64
nvidia-open-driver-G06-signed-kmp-default-535.129.03_k6.6.1_1-1.2.x86_64
nvidia-open-driver-G06-signed-kmp-default-545.29.02_k6.5.9_1-57.1.x86_64
nvidia-video-G06-32bit-535.129.03-15.1.x86_64
nvidia-video-G06-535.129.03-15.1.x86_64

$ rpm -qi kernel-firmware-nvidia-gspx-G06-545.29.02-13.1.x86_64
Name        : kernel-firmware-nvidia-gspx-G06
Version     : 545.29.02
Release     : 13.1
Architecture: x86_64
Install Date: Tue 14 Nov 2023, 09:27:44
Group       : System/Kernel
Size        : 64294720
License     : GPL-2.0-only AND SUSE-Firmware AND GPL-2.0-or-later AND MIT
Signature   : RSA/SHA256, Mon 13 Nov 2023, 16:53:44, Key ID 590401a1e38fb563
Source RPM  : kernel-firmware-nvidia-gspx-G06-545.29.02-13.1.nosrc.rpm
Build Date  : Mon 13 Nov 2023, 16:53:25
Build Host  : i04-ch2a
Vendor      : obs://build.opensuse.org/X11:Drivers:Video
URL         : https://www.nvidia.com/en-us/drivers/unix/
Summary     : Kernel firmware file for open NVIDIA kernel module driver G06
Description : This package contains the versioned kernel firmware file "gsp.bin" for the OpenSource NVIDIA kernel module driver G06.
Distribution: X11:Drivers:Video:Redesign / openSUSE_Tumbleweed

$ rpm -qi kernel-firmware-nvidia-gspx-G06-535.129.03-1.1.x86_64
Name        : kernel-firmware-nvidia-gspx-G06
Version     : 535.129.03
Release     : 1.1
Architecture: x86_64
Install Date: Fri 10 Nov 2023, 07:23:53
Group       : System/Kernel
Size        : 61824832
License     : GPL-2.0-only AND SUSE-Firmware AND GPL-2.0-or-later AND MIT
Signature   : RSA/SHA512, Thu 2 Nov 2023, 20:48:50, Key ID 35a2f86e29b700a4
Source RPM  : kernel-firmware-nvidia-gspx-G06-535.129.03-1.1.nosrc.rpm
Build Date  : Thu 2 Nov 2023, 20:48:26
Build Host  : i04-ch1b
Packager    : https://bugs.opensuse.org
Vendor      : openSUSE
URL         : https://www.nvidia.com/en-us/drivers/unix/
Summary     : Kernel firmware file for open NVIDIA kernel module driver G06
Description : This package contains the versioned kernel firmware file "gsp.bin" for the OpenSource NVIDIA kernel module driver G06.
Distribution: openSUSE Tumbleweed

I suppose this is due to multiversion = provides:multiversion(kernel), right? Because I see that both the nvidia-open-driver devel [1] and factory [2] packages have the same newer version; the same applies to kernel-firmware-nvidia-gspx-G06 [3] [4].

I removed obs://build.opensuse.org/X11:Drivers:Video, removed the packages and installed only the latest version. After this, the default configuration ("options nvidia-drm modeset=1 fbdev=1" with NVreg_OpenRmEnableUnsupportedGpus=1 *not* set) worked for xorg.
After installation it was still not working even though I had run dracut; I needed to ssh to the system, rerun dracut and reboot to get it working. Let's assume I did something wrong, which is why I needed to rerun dracut via ssh.

But sway did not work. Removing "fbdev=1" made no difference (working xorg, broken sway). Adding NVreg_OpenRmEnableUnsupportedGpus=1 is what breaks booting.

For sway, nvidia-video-G06 (otherwise sway startup freezes) and nvidia-gl-G06 (otherwise sway startup fails) from the proprietary NVIDIA repository are also needed. I.e. both the open kernel driver nvidia-open-driver-G06-signed-kmp-default-545.29.02_k6.6.1_1-1.1.x86_64 and the proprietary NVIDIA video and OpenGL libraries are needed for sway (while this might be obvious from the blog post [5], it was new to me, because sway says "don't use nvidia proprietary").

[1] https://build.opensuse.org/package/view_file/X11:Drivers:Video:Redesign/nvid...
[2] https://build.opensuse.org/package/view_file/openSUSE:Factory/nvidia-open-dr...
[3] https://build.opensuse.org/package/view_file/X11:Drivers:Video:Redesign/kern...
[4] https://build.opensuse.org/package/view_file/openSUSE:Factory/kernel-firmwar...
[5] https://sndirsch.github.io/nvidia/2022/06/07/nvidia-opengpu.html
https://bugzilla.suse.com/show_bug.cgi?id=1215981#c24

Stefan Dirsch <sndirsch@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
              Flags|needinfo?(petr.vorel@suse.com)|

--- Comment #24 from Stefan Dirsch <sndirsch@suse.com> ---
(In reply to Petr Vorel from comment #23)
> TL;DR: Probably problem in my setup, we can probably close this. The rest is a description if you find something which I do obviously wrong or if there is something what can be improved.
Thanks for the detailed report. Very much appreciated!
> I wonder how can happen that 2 driver versions can coexist together? (kernel-firmware-nvidia-gsp-G06-525.116 vs. kernel-firmware-nvidia-gspx-G06-535 and nvidia-open-driver-G06-signed-kmp-default-535 and nvidia-open-driver-G06-signed-kmp-default-545):
$ rpm -qa |grep -i nvidia | sort kernel-firmware-nvidia-20231107-1.1.noarch kernel-firmware-nvidia-gsp-G06-525.116.04-2.1.x86_64 kernel-firmware-nvidia-gsp-G06-535.54.03-1.1.x86_64 kernel-firmware-nvidia-gspx-G06-535.113.01-1.1.x86_64 kernel-firmware-nvidia-gspx-G06-535.129.03-1.1.x86_64 kernel-firmware-nvidia-gspx-G06-535.129.03-11.1.x86_64 kernel-firmware-nvidia-gspx-G06-535.129.03-12.1.x86_64 kernel-firmware-nvidia-gspx-G06-545.29.02-13.1.x86_64 libnvidia-egl-wayland1-1.1.12-1.2.x86_64 libva-nvidia-driver-0.0.10-1.1.x86_64 nvidia-compute-G06-32bit-535.129.03-15.1.x86_64 nvidia-compute-G06-535.129.03-15.1.x86_64 nvidia-gl-G06-32bit-535.129.03-15.1.x86_64 nvidia-gl-G06-535.129.03-15.1.x86_64 nvidia-open-driver-G06-signed-kmp-default-535.129.03_k6.6.1_1-1.2.x86_64 nvidia-open-driver-G06-signed-kmp-default-545.29.02_k6.5.9_1-57.1.x86_64 nvidia-video-G06-32bit-535.129.03-15.1.x86_64 nvidia-video-G06-535.129.03-15.1.x86_64
$ rpm -qi kernel-firmware-nvidia-gspx-G06-545.29.02-13.1.x86_64
Name        : kernel-firmware-nvidia-gspx-G06
Version     : 545.29.02
Release     : 13.1
Architecture: x86_64
Install Date: Tue 14 November 2023, 09:27:44
Group       : System/Kernel
Size        : 64294720
License     : GPL-2.0-only AND SUSE-Firmware AND GPL-2.0-or-later AND MIT
Signature   : RSA/SHA256, Mon 13 November 2023, 16:53:44, Key ID 590401a1e38fb563
Source RPM  : kernel-firmware-nvidia-gspx-G06-545.29.02-13.1.nosrc.rpm
Build Date  : Mon 13 November 2023, 16:53:25
Build Host  : i04-ch2a
Vendor      : obs://build.opensuse.org/X11:Drivers:Video
URL         : https://www.nvidia.com/en-us/drivers/unix/
Summary     : Kernel firmware file for open NVIDIA kernel module driver G06
Description :
This package contains the versioned kernel firmware file "gsp.bin" for the OpenSource NVIDIA kernel module driver G06.
Distribution: X11:Drivers:Video:Redesign / openSUSE_Tumbleweed
$ rpm -qi kernel-firmware-nvidia-gspx-G06-535.129.03-1.1.x86_64
Name        : kernel-firmware-nvidia-gspx-G06
Version     : 535.129.03
Release     : 1.1
Architecture: x86_64
Install Date: Fri 10 November 2023, 07:23:53
Group       : System/Kernel
Size        : 61824832
License     : GPL-2.0-only AND SUSE-Firmware AND GPL-2.0-or-later AND MIT
Signature   : RSA/SHA512, Thu 2 November 2023, 20:48:50, Key ID 35a2f86e29b700a4
Source RPM  : kernel-firmware-nvidia-gspx-G06-535.129.03-1.1.nosrc.rpm
Build Date  : Thu 2 November 2023, 20:48:26
Build Host  : i04-ch1b
Packager    : https://bugs.opensuse.org
Vendor      : openSUSE
URL         : https://www.nvidia.com/en-us/drivers/unix/
Summary     : Kernel firmware file for open NVIDIA kernel module driver G06
Description :
This package contains the versioned kernel firmware file "gsp.bin" for the OpenSource NVIDIA kernel module driver G06.
Distribution: openSUSE Tumbleweed
I suppose this is due to multiversion = provides:multiversion(kernel), right?
Yes, this is exactly the reason.
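The multiversion mechanism explains why several builds of the same firmware and kmp packages show up side by side in the listing above. On openSUSE, zypper keeps several versions of any package that provides multiversion(kernel) installed in parallel, as configured by the multiversion line in /etc/zypp/zypp.conf; rpm -q --whatprovides 'multiversion(kernel)' lists those packages. A minimal sketch for spotting package names installed in more than one version, assuming the NAME VERSION-RELEASE output of rpm's --qf query format and GNU sort; the function name find_multiversion is hypothetical:

```shell
#!/bin/sh
# Sketch: list package names that are installed in more than one version.
# Expects "NAME VERSION-RELEASE" lines on stdin, e.g. from:
#   rpm -qa --qf '%{NAME} %{VERSION}-%{RELEASE}\n'
find_multiversion() {
    # Sort by name, then by version (GNU sort per-key "V" option), collect
    # all versions per name and print only names seen more than once.
    sort -k1,1 -k2,2V | awk '
        {
            if ($1 in vers) vers[$1] = vers[$1] " " $2
            else vers[$1] = $2
            n[$1]++
        }
        END { for (p in n) if (n[p] > 1) print p ": " vers[p] }
    ' | sort
}
```

For the rpm -qa output above, this should list kernel-firmware-nvidia-gsp-G06, kernel-firmware-nvidia-gspx-G06 and nvidia-open-driver-G06-signed-kmp-default, each with its versions in ascending order.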
Because I saw that both the nvidia-open-driver devel [1] and Factory [2] packages have the same newer version (and the same applies to kernel-firmware-nvidia-gspx-G06 [3] [4]), I removed the obs://build.opensuse.org/X11:Drivers:Video repository, removed the packages and installed only the latest version.
Yes, you no longer need the devel projects, since the driver and firmware are now included in our products. So better remove them.
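The cleanup described here (drop the devel repository, then keep only the newest build) can be reviewed before anything is removed. zypper lr -u lists the configured repositories with their URIs, and zypper rr <alias> removes one. For the packages, a minimal sketch that only prints rpm -e commands for every installed version except the newest, assuming the NAME VERSION-RELEASE ARCH query format shown in the comment and GNU sort -V; old_versions_cmds is a hypothetical helper, and its output should be checked by hand before running it:

```shell
#!/bin/sh
# Sketch: print, but do not run, "rpm -e" commands that would keep only the
# newest installed version of one package. Expects lines of the form
#   NAME VERSION-RELEASE ARCH
# on stdin, e.g. from: rpm -qa --qf '%{NAME} %{VERSION}-%{RELEASE} %{ARCH}\n'
old_versions_cmds() {
    local pkg="$1"
    # Select the requested package, order its builds by version, drop the
    # last (newest) line, and emit a removal command for each remaining build.
    awk -v p="$pkg" '$1 == p { print $2, $3 }' |
        sort -V |
        sed '$d' |
        while read -r ver arch; do
            printf 'rpm -e %s-%s.%s\n' "$pkg" "$ver" "$arch"
        done
}
```

Printing instead of executing is deliberate: with multiversion packages it is easy to remove the build the running kernel still needs, so a dry run is the safer default.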
After this, the default configuration ("options nvidia-drm modeset=1 fbdev=1", with NVreg_OpenRMEnableSupporteGpus=1 *not* set) was working for xorg.
Thanks for confirmation.
After installation it still was not working even though I ran dracut; I needed to ssh to the system, rerun dracut and reboot to get it working. Let's assume I did something wrong, which is why I needed to rerun dracut via ssh. But sway did not work.
Yes. You now need to reboot after changing the kernel module configuration. You can no longer easily unload the driver when the option "fbdev=1" is set, since that option adds a Linux console running on this driver.
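Because modprobe.d options only take effect when the module is loaded (and the nvidia modules are also put into the initrd, which is presumably why dracut -f plus a reboot was needed), one way to confirm that a change actually reached the running driver is to compare the configured values with what the module reports under /sys/module/<name>/parameters. A minimal sketch, assuming the one-line "options nvidia-drm modeset=1 fbdev=1" syntax quoted above; check_module_options, norm_bool and the SYSFS_ROOT override are hypothetical, and reading the nvidia_drm parameters may require root:

```shell
#!/bin/sh
# Sketch: compare "options <module> key=value ..." lines from a modprobe.d
# file with the values the *loaded* module reports in sysfs.
# SYSFS_ROOT is a hypothetical override for testing; it defaults to /sys.

# Normalise booleans: sysfs prints bool parameters as Y/N.
norm_bool() {
    case $1 in
        1|Y|y) echo Y ;;
        0|N|n) echo N ;;
        *) echo "$1" ;;
    esac
}

check_module_options() {
    root=${SYSFS_ROOT:-/sys}
    grep -E '^[[:space:]]*options[[:space:]]' "$1" |
    while read -r kw mod rest; do
        # sysfs uses underscores in module names (nvidia-drm -> nvidia_drm).
        smod=$(printf '%s' "$mod" | tr '-' '_')
        for kv in $rest; do
            key=${kv%%=*}
            want=${kv#*=}
            f=$root/module/$smod/parameters/$key
            if [ -r "$f" ]; then
                have=$(cat "$f")
            else
                have=unreadable   # module not loaded, unknown key, or needs root
            fi
            if [ "$(norm_bool "$want")" = "$(norm_bool "$have")" ]; then
                res=OK
            else
                res=DIFF
            fi
            echo "$smod.$key configured=$want live=$have $res"
        done
    done
}
```

Usage, as root after rebooting: check_module_options /usr/lib/modprobe.d/50-nvidia-default.conf. Lines other than "options" (such as the install line) are ignored.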
Removing "fbdev=1" made no difference (working xorg, broken sway).
Adding NVreg_OpenRMEnableSupporteGpus=1 is the option which breaks booting.
Interesting that having this option still set breaks things. I think it should be removed from the driver.
Sway also needs nvidia-video-G06 (otherwise sway startup freezes) and nvidia-gl-G06 (otherwise sway startup fails) from the proprietary NVIDIA repository.
i.e. both the open kernel driver nvidia-open-driver-G06-signed-kmp-default-545.29.02_k6.6.1_1-1.1.x86_64 and the proprietary NVIDIA GPU and OpenGL libraries are needed for sway (while this might be obvious from the blog post [5], it was new for me, because sway claims "don't use nvidia proprietary").
Ok. Good to know this. Maybe sway just doesn't work with Mesa's software fallback driver, no matter which KMS driver is in use.
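To check that a system has the packages this thread found necessary for sway on the NVIDIA driver (the open kernel module package plus nvidia-video-G06 and nvidia-gl-G06), a small sketch that reports which of the named packages are missing from an installed-package list. missing_packages is a hypothetical helper, and the package names come from the comments above, not from an official dependency list; rpm -q with the same names gives a similar answer directly.

```shell
#!/bin/sh
# Sketch: print every package name given as an argument that does not
# appear in the installed-package list read from stdin (one name per line).
# Usage:
#   rpm -qa --qf '%{NAME}\n' | missing_packages \
#       nvidia-open-driver-G06-signed-kmp-default nvidia-video-G06 nvidia-gl-G06
missing_packages() {
    installed=$(cat)
    for pkg in "$@"; do
        # -F: fixed string, -x: whole line, so "nvidia-gl-G06" does not
        # match "nvidia-gl-G06-32bit".
        printf '%s\n' "$installed" | grep -Fqx -- "$pkg" || echo "$pkg"
    done
}
```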
https://bugzilla.suse.com/show_bug.cgi?id=1215981#c25

Stefan Dirsch <sndirsch@suse.com> changed:

What       | Removed     | Added
----------------------------------------------------------------------------
Resolution | ---         | FIXED
Status     | IN_PROGRESS | RESOLVED

--- Comment #25 from Stefan Dirsch <sndirsch@suse.com> ---
So I'm closing this for now. Of course you can report what happens with the next update. ;-)
https://bugzilla.suse.com/show_bug.cgi?id=1215981#c26

--- Comment #26 from Stefan Dirsch <sndirsch@suse.com> ---
(In reply to Petr Vorel from comment #23)
Adding NVreg_OpenRMEnableSupporteGpus=1 is the option which breaks booting.
I cannot reproduce that issue. Driver 545.29.02 simply ignores this setting:
[ 4.993601] nvidia: unknown parameter 'NVreg_OpenRMEnableSupporteGpus' ignored
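The "unknown parameter ... ignored" line is how the kernel reports an option that the module does not accept; modinfo -p nvidia lists the parameters the installed module does accept. A minimal sketch that extracts such rejected module/parameter pairs from kernel log text (dmesg or journalctl -k output), so a stale option like this one is easy to spot; unknown_module_params is a hypothetical helper:

```shell
#!/bin/sh
# Sketch: print "<module> <parameter>" for every parameter the kernel
# rejected, reading kernel log text on stdin, e.g. from `dmesg` or
# `journalctl -k -b`. Matches lines like:
#   [    4.993601] nvidia: unknown parameter 'NVreg_Foo' ignored
unknown_module_params() {
    sed -n "s/.*[] ]\([A-Za-z0-9_-]*\): unknown parameter '\([^']*\)' ignored.*/\1 \2/p"
}
```

Usage: dmesg | unknown_module_params. An empty result means no module on this boot rejected any of its configured options.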
https://bugzilla.suse.com/show_bug.cgi?id=1215981#c31

--- Comment #31 from Maintenance Automation <maint-coord+maintenance-robot@suse.de> ---
SUSE-RU-2023:4642-1: An update that has two fixes can now be installed.

Category: recommended (moderate)
Bug References: 1215981, 1217370
Sources used:
openSUSE Leap 15.5 (src): nvidia-open-driver-G06-signed-545.29.02-150500.3.18.1
SUSE Linux Enterprise Micro 5.5 (src): nvidia-open-driver-G06-signed-545.29.02-150500.3.18.1
Basesystem Module 15-SP5 (src): nvidia-open-driver-G06-signed-545.29.02-150500.3.18.1
Public Cloud Module 15-SP5 (src): nvidia-open-driver-G06-signed-545.29.02-150500.3.18.1

NOTE: This line indicates an update has been released for the listed product(s). At times this might be only a partial fix. If you have questions please reach out to maintenance coordination.
https://bugzilla.suse.com/show_bug.cgi?id=1215981#c32

--- Comment #32 from Maintenance Automation <maint-coord+maintenance-robot@suse.de> ---
SUSE-RU-2023:4641-1: An update that has two fixes can now be installed.

Category: recommended (moderate)
Bug References: 1215981, 1217370
Sources used:
openSUSE Leap 15.4 (src): nvidia-open-driver-G06-signed-545.29.02-150400.9.32.1
SUSE Linux Enterprise Micro for Rancher 5.3 (src): nvidia-open-driver-G06-signed-545.29.02-150400.9.32.1
SUSE Linux Enterprise Micro 5.3 (src): nvidia-open-driver-G06-signed-545.29.02-150400.9.32.1
SUSE Linux Enterprise Micro for Rancher 5.4 (src): nvidia-open-driver-G06-signed-545.29.02-150400.9.32.1
SUSE Linux Enterprise Micro 5.4 (src): nvidia-open-driver-G06-signed-545.29.02-150400.9.32.1
Basesystem Module 15-SP4 (src): nvidia-open-driver-G06-signed-545.29.02-150400.9.32.1
Public Cloud Module 15-SP4 (src): nvidia-open-driver-G06-signed-545.29.02-150400.9.32.1

NOTE: This line indicates an update has been released for the listed product(s). At times this might be only a partial fix. If you have questions please reach out to maintenance coordination.
https://bugzilla.suse.com/show_bug.cgi?id=1215981#c35

Petr Vorel <petr.vorel@suse.com> changed:

What       | Removed  | Added
----------------------------------------------------------------------------
Status     | RESOLVED | REOPENED
Resolution | FIXED    | ---

--- Comment #35 from Petr Vorel <petr.vorel@suse.com> ---
I still experience a black screen very often (~50% of boots or resumes from suspend). I guess that what I reported as a configuration issue in /usr/lib/modprobe.d/50-nvidia-default.conf (there probably was at least one problem with it) or as a broken "systemctl suspend" is actually something else. It happens even when I don't make any update or configuration change. On the other hand, I did some updates, so it has also happened with different kernel and NVIDIA driver versions.

When there is a black screen, the log is full of these repeating messages:

[ 23.262590] snd_hda_intel 0000:01:00.1: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID)
[ 23.262597] snd_hda_intel 0000:01:00.1: device [10de:2291] error status/mask=00100000/00000000
[ 23.262602] snd_hda_intel 0000:01:00.1: [20] UnsupReq (First)
[ 23.262606] snd_hda_intel 0000:01:00.1: AER: TLP Header: 60000008 000000ff 00000040 00840000
[ 23.262613] pci 0000:01:00.0: AER: can't recover (no error_detected callback)
[ 23.262615] snd_hda_intel 0000:01:00.1: AER: can't recover (no error_detected callback)
[ 23.262646] pcieport 0000:00:01.0: AER: device recovery failed
[ 23.349965] pcieport 0000:00:01.0: AER: Multiple Uncorrected (Non-Fatal) error received: 0000:01:00.1

I already reported it in comment #5, but in the dmesg in comment #7 it appeared only once. Later it became permanent (i.e. the dmesg ring buffer contains only these messages). Is that a hardware error?

Documenting the current state of the config files (IMHO they are correct):
$ rpm -qa |grep -i -e kernel-default -e nvidia | sort
kernel-default-devel-6.6.2-1.1.x86_64
kernel-default-devel-6.6.3-1.1.x86_64
kernel-default-6.6.2-1.1.x86_64
kernel-default-6.6.3-1.1.x86_64
kernel-firmware-nvidia-gspx-G06-545.29.06-1.1.x86_64
kernel-firmware-nvidia-20231128-1.1.noarch
libnvidia-egl-wayland1-1.1.13-1.1.x86_64
libva-nvidia-driver-0.0.11-1.1.x86_64
nvidia-compute-G06-32bit-545.29.06-18.1.x86_64
nvidia-compute-G06-545.29.06-18.1.x86_64
nvidia-driver-G06-kmp-default-545.29.06_k6.6.2_1-18.1.x86_64
nvidia-gl-G06-32bit-545.29.06-18.1.x86_64
nvidia-gl-G06-545.29.06-18.1.x86_64
nvidia-video-G06-32bit-545.29.06-18.1.x86_64
nvidia-video-G06-545.29.06-18.1.x86_64

$ uname -a
Linux p16 6.6.3-1-default #1 SMP PREEMPT_DYNAMIC Wed Nov 29 05:06:07 UTC 2023 (d766c57) x86_64 x86_64 x86_64 GNU/Linux

$ cat /usr/lib/modprobe.d/50-nvidia-default.conf |grep -v ^#
options nvidia NVreg_DeviceFileUID=0 NVreg_DeviceFileGID=485 NVreg_DeviceFileMode=0660 NVreg_PreserveVideoMemoryAllocations=1
options nvidia-drm modeset=1 fbdev=1
install nvidia PATH=$PATH:/bin:/usr/bin; if /sbin/modprobe --ignore-install nvidia; then if /sbin/modprobe nvidia_uvm; then if [ ! -c /dev/nvidia-uvm ]; then mknod -m 660 /dev/nvidia-uvm c $(cat /proc/devices | while read major device; do if [ "$device" = "nvidia-uvm" ]; then echo $major; break; fi ; done) 0; chown :video /dev/nvidia-uvm; fi; if [ ! -c /dev/nvidia-uvm-tools ]; then mknod -m 660 /dev/nvidia-uvm-tools c $(cat /proc/devices | while read major device; do if [ "$device" = "nvidia-uvm" ]; then echo $major; break; fi ; done) 1; chown :video /dev/nvidia-uvm-tools; fi; fi; if [ ! -c /dev/nvidiactl ]; then mknod -m 660 /dev/nvidiactl c 195 255; chown :video /dev/nvidiactl; fi; devid=-1; for dev in $(ls -d /sys/bus/pci/devices/*); do vendorid=$(cat $dev/vendor); if [ "$vendorid" = "0x10de" ]; then class=$(cat $dev/class); classid=${class%%00}; if [ "$classid" = "0x0300" -o "$classid" = "0x0302" ]; then devid=$((devid+1)); if [ ! -c /dev/nvidia${devid} ]; then mknod -m 660 /dev/nvidia${devid} c 195 ${devid}; chown :video /dev/nvidia${devid}; fi; fi; fi; done; /sbin/modprobe nvidia_drm; if [ ! -c /dev/nvidia-modeset ]; then mknod -m 660 /dev/nvidia-modeset c 195 254; chown :video /dev/nvidia-modeset; fi; fi

$ cat /usr/lib/tmpfiles.d/nvidia-logind-acl-trick-G06.conf
L /run/udev/static_node-tags/uaccess/nvidiactl - - - - /dev/nvidiactl
L /run/udev/static_node-tags/uaccess/nvidia-uvm - - - - /dev/nvidia-uvm
L /run/udev/static_node-tags/uaccess/nvidia-uvm-tools - - - - /dev/nvidia-uvm-tools
L /run/udev/static_node-tags/uaccess/nvidia-modeset - - - - /dev/nvidia-modeset
L /run/udev/static_node-tags/uaccess/nvidia0 - - - - /dev/nvidia0

$ cat /usr/lib/modprobe.d/nvidia-default.conf
blacklist nouveau

$ cat /usr/lib/dracut/dracut.conf.d/60-nvidia-default.conf
add_drivers+=" nvidia nvidia-drm nvidia-modeset nvidia-uvm "

$ cat /usr/src/kernel-modules/nvidia-545.29.06-default/dkms.conf |grep -v ^#
PACKAGE_NAME="nvidia"
PACKAGE_VERSION="__VERSION_STRING"
AUTOINSTALL="yes"
MAKE[0]="'make' -j__JOBS NV_EXCLUDE_BUILD_MODULES='__EXCLUDE_MODULES' KERNEL_UNAME=${kernelver} modules"
__DKMS_MODULES
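The "install nvidia" line above is hard to follow on one line. Its device-node part walks /sys/bus/pci/devices, picks devices with NVIDIA's vendor ID 0x10de and a display class (0x0300 VGA or 0x0302 3D controller, after stripping the trailing "00" of the class code), and creates /dev/nvidiaN with major 195 and minor N. A print-only sketch of just that loop, for reading rather than as a replacement for the packaged line; list_nvidia_nodes and the PCI_ROOT override are hypothetical:

```shell
#!/bin/sh
# Sketch: the device-node loop of the "install nvidia" line, unrolled.
# Prints "<node> c 195 <minor> <pci-address>" for each NVIDIA display-class
# device instead of calling mknod.
# PCI_ROOT is a hypothetical override for testing (default: /sys/bus/pci/devices).
list_nvidia_nodes() {
    root=${PCI_ROOT:-/sys/bus/pci/devices}
    devid=-1
    for dev in "$root"/*; do
        [ -r "$dev/vendor" ] || continue
        [ "$(cat "$dev/vendor")" = "0x10de" ] || continue
        class=$(cat "$dev/class")
        classid=${class%%00}          # e.g. 0x030000 -> 0x0300
        case $classid in
            0x0300|0x0302)
                devid=$((devid + 1))
                echo "/dev/nvidia$devid c 195 $devid ${dev##*/}"
                ;;
        esac
    done
}
```

One consequence worth noting for this bug: the HDMI audio function 0000:01:00.1 that shows up in the AER messages has an audio class (typically 0x0403), so it gets no /dev/nvidiaN node; only the GPU at 0000:01:00.0 does.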
https://bugzilla.suse.com/show_bug.cgi?id=1215981#c36

Stefan Dirsch <sndirsch@suse.com> changed:

What   | Removed  | Added
----------------------------------------------------------------------------
Status | REOPENED | IN_PROGRESS

--- Comment #36 from Stefan Dirsch <sndirsch@suse.com> ---
Hmm, snd_hda_intel sounds like the driver for the internal Intel sound chip.
[ 23.262646] pcieport 0000:00:01.0: AER: device recovery failed
[ 23.349965] pcieport 0000:00:01.0: AER: Multiple Uncorrected (Non-Fatal) error received: 0000:01:00.1
No idea. Google points me to
https://www.videogames.ai/dmesg-aer-error#:~:text=You%20can%20fix%20the%20problem,and%20disabling%20memory%20mapping%20support.&text=Just%20need%20to%20reboot%20and%20the%20error%20should%20disapear.
Maybe it's worth a try.
https://bugzilla.suse.com/show_bug.cgi?id=1215981#c37

--- Comment #37 from Takashi Iwai <tiwai@suse.com> ---
An AER report is usually harmless, but if it happens even with a newer kernel, it's a regression and should be addressed. (And yes, it's worth testing the boot options to see whether they suppress it or not.)
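To follow up on whether the AER reports persist with newer kernels, it helps to count them per device for each boot. A minimal sketch, assuming kernel log text on stdin, e.g. journalctl -k -b -1 for the previous boot (which needs a persistent journal) or dmesg; count_aer_by_device is a hypothetical helper that attributes each "AER:" line to the PCI address printed just before it:

```shell
#!/bin/sh
# Sketch: count "AER:" kernel messages per PCI device, most frequent first.
# Reads kernel log text on stdin, e.g. `journalctl -k -b -1` or `dmesg`.
count_aer_by_device() {
    # Keep only AER lines, extract the "dddd:bb:dd.f" address that is
    # followed by ":" (the reporting device), then count per address.
    grep 'AER:' |
        sed -n 's/.* \([0-9a-f]\{4\}:[0-9a-f]\{2\}:[0-9a-f]\{2\}\.[0-9a-f]\):.*/\1/p' |
        sort | uniq -c | sort -rn
}
```

For the excerpt in comment #35 this attributes the TLP-header and "can't recover" lines to 0000:01:00.1 and 0000:01:00.0, and the "device recovery failed" and "Multiple Uncorrected" lines to the PCIe port 0000:00:01.0; comparing these counts across kernels shows whether the reports still occur.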
https://bugzilla.suse.com/show_bug.cgi?id=1215981#c38

--- Comment #38 from Maintenance Automation <maint-coord+maintenance-robot@suse.de> ---
SUSE-RU-2024:0143-1: An update that has one fix can now be installed.

Category: recommended (moderate)
Bug References: 1215981
Sources used:
openSUSE Leap 15.5 (src): nvidia-open-driver-G06-signed-545.29.06-150500.3.21.5
SUSE Linux Enterprise Micro 5.5 (src): nvidia-open-driver-G06-signed-545.29.06-150500.3.21.5
Basesystem Module 15-SP5 (src): nvidia-open-driver-G06-signed-545.29.06-150500.3.21.5
Public Cloud Module 15-SP5 (src): nvidia-open-driver-G06-signed-545.29.06-150500.3.21.5

NOTE: This line indicates an update has been released for the listed product(s). At times this might be only a partial fix. If you have questions please reach out to maintenance coordination.
https://bugzilla.suse.com/show_bug.cgi?id=1215981#c39

--- Comment #39 from Maintenance Automation <maint-coord+maintenance-robot@suse.de> ---
SUSE-RU-2024:0169-1: An update that has one fix can now be installed.

Category: recommended (moderate)
Bug References: 1215981
Sources used:
SUSE Manager Retail Branch Server 4.3 (src): nvidia-open-driver-G06-signed-545.29.06-150400.9.35.2
SUSE Manager Server 4.3 (src): nvidia-open-driver-G06-signed-545.29.06-150400.9.35.2
openSUSE Leap 15.4 (src): nvidia-open-driver-G06-signed-545.29.06-150400.9.35.2
SUSE Linux Enterprise Micro for Rancher 5.3 (src): nvidia-open-driver-G06-signed-545.29.06-150400.9.35.2
SUSE Linux Enterprise Micro 5.3 (src): nvidia-open-driver-G06-signed-545.29.06-150400.9.35.2
SUSE Linux Enterprise Micro for Rancher 5.4 (src): nvidia-open-driver-G06-signed-545.29.06-150400.9.35.2
SUSE Linux Enterprise Micro 5.4 (src): nvidia-open-driver-G06-signed-545.29.06-150400.9.35.2
Public Cloud Module 15-SP4 (src): nvidia-open-driver-G06-signed-545.29.06-150400.9.35.2
SUSE Linux Enterprise High Performance Computing ESPOS 15 SP4 (src): nvidia-open-driver-G06-signed-545.29.06-150400.9.35.2
SUSE Linux Enterprise High Performance Computing LTSS 15 SP4 (src): nvidia-open-driver-G06-signed-545.29.06-150400.9.35.2
SUSE Linux Enterprise Desktop 15 SP4 LTSS 15-SP4 (src): nvidia-open-driver-G06-signed-545.29.06-150400.9.35.2
SUSE Linux Enterprise Server 15 SP4 LTSS 15-SP4 (src): nvidia-open-driver-G06-signed-545.29.06-150400.9.35.2
SUSE Linux Enterprise Server for SAP Applications 15 SP4 (src): nvidia-open-driver-G06-signed-545.29.06-150400.9.35.2
SUSE Manager Proxy 4.3 (src): nvidia-open-driver-G06-signed-545.29.06-150400.9.35.2

NOTE: This line indicates an update has been released for the listed product(s). At times this might be only a partial fix. If you have questions please reach out to maintenance coordination.
https://bugzilla.suse.com/show_bug.cgi?id=1215981#c44

Stefan Dirsch <sndirsch@suse.com> changed:

What  | Removed | Added
----------------------------------------------------------------------------
Flags |         | needinfo?(petr.vorel@suse.com)

--- Comment #44 from Stefan Dirsch <sndirsch@suse.com> ---
(In reply to Takashi Iwai from comment #37)
An AER report is usually harmless, but if it happens even with a newer kernel, it's a regression and should be addressed. (And yes, it's worth testing the boot options to see whether they suppress it or not.)
So have you tried this meanwhile? The instructions are in the link posted in comment #36.
https://bugzilla.suse.com/show_bug.cgi?id=1215981#c46

--- Comment #46 from Stefan Dirsch <sndirsch@suse.com> ---
(In reply to Stefan Dirsch from comment #44)
(In reply to Takashi Iwai from comment #37)
An AER report is usually harmless, but if it happens even with a newer kernel, it's a regression and should be addressed. (And yes, it's worth testing the boot options to see whether they suppress it or not.)
So have you tried this meanwhile? The instructions are in the link posted in comment #36.
Any news on this one?
https://bugzilla.suse.com/show_bug.cgi?id=1215981#c47

--- Comment #47 from Stefan Dirsch <sndirsch@suse.com> ---
@Petr ping ...
https://bugzilla.suse.com/show_bug.cgi?id=1215981#c48

--- Comment #48 from Petr Vorel <petr.vorel@suse.com> ---
I'm sorry, in the meantime I switched back to nouveau, but I'll reinstall the NVIDIA driver and check it.
https://bugzilla.suse.com/show_bug.cgi?id=1215981#c49

--- Comment #49 from Petr Vorel <petr.vorel@suse.com> ---
(In reply to Stefan Dirsch from comment #46)
(In reply to Stefan Dirsch from comment #44)
(In reply to Takashi Iwai from comment #37)
An AER report is usually harmless, but if it happens even with a newer kernel, it's a regression and should be addressed. (And yes, it's worth testing the boot options to see whether they suppress it or not.)
So have you tried this meanwhile? The instructions are in the link posted in comment #36.
Any news on this one?
Yes, the pci=nommconf kernel command-line parameter suppresses the AER error messages in dmesg.
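To make pci=nommconf persistent, assuming the system boots with GRUB 2, the parameter goes into GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, after which the configuration is regenerated with grub2-mkconfig -o /boot/grub2/grub.cfg (or the YaST boot loader module is used); cat /proc/cmdline after a reboot shows whether it took effect. Per the comment above, this only suppresses the AER messages; whether it changes the black screen is not established here. A minimal sketch of the idempotent edit, assuming the value is a single double-quoted line; add_kernel_param is hypothetical, uses GNU sed -i, and is best tried on a copy of the file first:

```shell
#!/bin/sh
# Sketch: append a kernel parameter to GRUB_CMDLINE_LINUX_DEFAULT in a
# /etc/default/grub-style file, unless it is already present as a whole word.
# Assumes: GRUB_CMDLINE_LINUX_DEFAULT="..." on one line; edits in place.
add_kernel_param() {
    file=$1
    param=$2
    # Already there (at start, middle or end of the quoted value)? Do nothing.
    if grep -Eq "^GRUB_CMDLINE_LINUX_DEFAULT=\"([^\"]* )?$param( [^\"]*)?\"" "$file"; then
        return 0
    fi
    # Insert the parameter just before the closing double quote.
    sed -i "s/^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"/\1 $param\"/" "$file"
}
```

Usage: add_kernel_param /etc/default/grub pci=nommconf, then regenerate grub.cfg as described above. Running it twice leaves the line unchanged.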
https://bugzilla.suse.com/show_bug.cgi?id=1215981#c50

--- Comment #50 from Petr Vorel <petr.vorel@suse.com> ---
Just for the record, the nouveau kernel driver does not have the problem (I'm going to retest the nvidia kernel drivers).
https://bugzilla.suse.com/show_bug.cgi?id=1215981#c51

Stefan Dirsch <sndirsch@suse.com> changed:

What  | Removed                        | Added
----------------------------------------------------------------------------
Flags | needinfo?(petr.vorel@suse.com) |

--- Comment #51 from Stefan Dirsch <sndirsch@suse.com> ---
(In reply to Petr Vorel from comment #49)
(In reply to Stefan Dirsch from comment #46)
(In reply to Stefan Dirsch from comment #44)
(In reply to Takashi Iwai from comment #37)
An AER report is usually harmless, but if it happens even with a newer kernel, it's a regression and should be addressed. (And yes, it's worth testing the boot options to see whether they suppress it or not.)
So have you tried this meanwhile? The instructions are in the link posted in comment #36.
Any news on this one?
Yes, the pci=nommconf kernel command-line parameter suppresses the AER error messages in dmesg.
Thanks for verifying that!
https://bugzilla.suse.com/show_bug.cgi?id=1215981#c52

Stefan Dirsch <sndirsch@suse.com> changed:

What       | Removed     | Added
----------------------------------------------------------------------------
Resolution | ---         | WONTFIX
Status     | IN_PROGRESS | RESOLVED

--- Comment #52 from Stefan Dirsch <sndirsch@suse.com> ---
(In reply to Petr Vorel from comment #50)
Just for the record, the nouveau kernel driver does not have the problem (I'm going to retest the nvidia kernel drivers).
I think with that we should close this bug. I understand that it's a hassle to test a driver again and again when you have already found another solution. And since nobody else seems to be affected ...