Petr Vorel changed bug 1215981
What        Removed   Added
Status      RESOLVED  REOPENED
Resolution  FIXED     ---

Comment # 35 on bug 1215981 from Petr Vorel
I still experience a black screen very often (roughly 50% of boots or resumes
from suspend). I guess what I reported as a configuration issue in
/usr/lib/modprobe.d/50-nvidia-default.conf (there probably was at least one
problem with it) or as broken "systemctl suspend" is something else. It
happens even when I don't do any update or configuration change. OTOH I did
some updates, so it has also happened on different kernels and nvidia driver
versions.

When there is a black screen, the log is full of repeating messages:

[   23.262590] snd_hda_intel 0000:01:00.1: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID)
[   23.262597] snd_hda_intel 0000:01:00.1:   device [10de:2291] error status/mask=00100000/00000000
[   23.262602] snd_hda_intel 0000:01:00.1:    [20] UnsupReq               (First)
[   23.262606] snd_hda_intel 0000:01:00.1: AER:   TLP Header: 60000008 000000ff 00000040 00840000
[   23.262613] pci 0000:01:00.0: AER: can't recover (no error_detected callback)
[   23.262615] snd_hda_intel 0000:01:00.1: AER: can't recover (no error_detected callback)
[   23.262646] pcieport 0000:00:01.0: AER: device recovery failed
[   23.349965] pcieport 0000:00:01.0: AER: Multiple Uncorrected (Non-Fatal) error received: 0000:01:00.1

I already reported it in comment #5, but in dmesg #7 it appeared only once.
Later it became permanent (i.e. the dmesg ring buffer contains only these
messages). Is that a hardware error?
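
The status value in the log (00100000) can be cross-checked against the AER
Uncorrectable Error Status register. Below is a minimal sketch, assuming the
bit positions from the AER capability in the PCIe specification; the function
name decode_aer_uncor is made up for illustration. It only decodes the value
and says nothing about whether the root cause is hardware.

```shell
# Decode a PCIe AER Uncorrectable Error Status value (hex) into bit names.
# Assumption: bit layout as defined for the AER capability in the PCIe spec.
decode_aer_uncor() (
    # Accept the value with or without a leading 0x.
    status=$(( 0x${1#0x} ))
    for bit in 4 5 12 13 14 15 16 17 18 19 20 21; do
        # Skip bits that are not set in the status value.
        [ $(( ($status >> $bit) & 1 )) -eq 1 ] || continue
        case $bit in
            4)  name="Data Link Protocol Error" ;;
            5)  name="Surprise Down Error" ;;
            12) name="Poisoned TLP" ;;
            13) name="Flow Control Protocol Error" ;;
            14) name="Completion Timeout" ;;
            15) name="Completer Abort" ;;
            16) name="Unexpected Completion" ;;
            17) name="Receiver Overflow" ;;
            18) name="Malformed TLP" ;;
            19) name="ECRC Error" ;;
            20) name="Unsupported Request" ;;
            21) name="ACS Violation" ;;
        esac
        echo "bit $bit: $name"
    done
)

# Status value taken from the "error status/mask=" line above.
decode_aer_uncor 00100000
```

For the value above this reports only bit 20, which matches the "[20] UnsupReq"
line the kernel printed.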

Documenting the current state of the config files (IMHO they are correct).

$ rpm -qa |grep -i -e kernel-default -e nvidia | sort 
kernel-default-devel-6.6.2-1.1.x86_64
kernel-default-devel-6.6.3-1.1.x86_64
kernel-default-6.6.2-1.1.x86_64
kernel-default-6.6.3-1.1.x86_64
kernel-firmware-nvidia-gspx-G06-545.29.06-1.1.x86_64
kernel-firmware-nvidia-20231128-1.1.noarch
libnvidia-egl-wayland1-1.1.13-1.1.x86_64
libva-nvidia-driver-0.0.11-1.1.x86_64
nvidia-compute-G06-32bit-545.29.06-18.1.x86_64
nvidia-compute-G06-545.29.06-18.1.x86_64
nvidia-driver-G06-kmp-default-545.29.06_k6.6.2_1-18.1.x86_64
nvidia-gl-G06-32bit-545.29.06-18.1.x86_64
nvidia-gl-G06-545.29.06-18.1.x86_64
nvidia-video-G06-32bit-545.29.06-18.1.x86_64
nvidia-video-G06-545.29.06-18.1.x86_64

$ uname -a
Linux p16 6.6.3-1-default #1 SMP PREEMPT_DYNAMIC Wed Nov 29 05:06:07 UTC 2023 (d766c57) x86_64 x86_64 x86_64 GNU/Linux

$ cat /usr/lib/modprobe.d/50-nvidia-default.conf |grep -v ^#
options nvidia NVreg_DeviceFileUID=0 NVreg_DeviceFileGID=485 NVreg_DeviceFileMode=0660 NVreg_PreserveVideoMemoryAllocations=1
options nvidia-drm modeset=1 fbdev=1
install nvidia PATH=$PATH:/bin:/usr/bin; if /sbin/modprobe --ignore-install
nvidia; then   if /sbin/modprobe nvidia_uvm; then     if [ ! -c /dev/nvidia-uvm
]; then       mknod -m 660 /dev/nvidia-uvm c $(cat /proc/devices | while read
major device; do if [ "$device" = "nvidia-uvm" ]; then echo $major; break; fi ;
done) 0;        chown :video /dev/nvidia-uvm;     fi;     if [ ! -c
/dev/nvidia-uvm-tools ]; then       mknod -m 660 /dev/nvidia-uvm-tools c $(cat
/proc/devices | while read major device; do if [ "$device" = "nvidia-uvm" ];
then echo $major; break; fi ; done) 1;       chown :video
/dev/nvidia-uvm-tools;     fi;   fi;   if [ ! -c /dev/nvidiactl ]; then    
mknod -m 660 /dev/nvidiactl c 195 255;     chown :video /dev/nvidiactl;   fi;  
devid=-1;   for dev in $(ls -d /sys/bus/pci/devices/*); do      vendorid=$(cat
$dev/vendor);     if [ "$vendorid" = "0x10de" ]; then       class=$(cat
$dev/class);       classid=${class%%00};       if [ "$classid" = "0x0300" -o
"$classid" = "0x0302" ]; then          devid=$((devid+1));         if [ ! -c
/dev/nvidia${devid} ]; then            mknod -m 660 /dev/nvidia${devid} c 195
${devid};            chown :video /dev/nvidia${devid};         fi;       fi;   
 fi;   done;   /sbin/modprobe nvidia_drm;   if [ ! -c /dev/nvidia-modeset ];
then     mknod -m 660 /dev/nvidia-modeset c 195 254;     chown :video
/dev/nvidia-modeset;   fi; fi
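
The class check buried in the install line above is easier to read in
isolation. Below is a sketch of just that test, with a hypothetical helper name
(is_display_class), assuming class values in the form sysfs reports them
(e.g. 0x030000). It is not a replacement for the packaged config.

```shell
# Return success if a PCI class value, as read from
# /sys/bus/pci/devices/*/class, is a VGA (0x0300) or 3D (0x0302) controller.
# Same test as in the install line: strip the trailing "00" (programming
# interface byte), then compare the class/subclass part.
is_display_class() (
    class=$1
    classid=${class%%00}
    [ "$classid" = "0x0300" ] || [ "$classid" = "0x0302" ]
)

# Example values: VGA controller, 3D controller, HD audio function.
for c in 0x030000 0x030200 0x040300; do
    if is_display_class "$c"; then
        echo "$c: gets a /dev/nvidiaN node"
    else
        echo "$c: skipped"
    fi
done
```

Only functions that pass this check are counted for /dev/nvidiaN, so an audio
function with a class such as 0x040300 is skipped by the loop.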

$ cat /usr/lib/tmpfiles.d/nvidia-logind-acl-trick-G06.conf 
L /run/udev/static_node-tags/uaccess/nvidiactl - - - - /dev/nvidiactl
L /run/udev/static_node-tags/uaccess/nvidia-uvm - - - - /dev/nvidia-uvm
L /run/udev/static_node-tags/uaccess/nvidia-uvm-tools - - - - /dev/nvidia-uvm-tools
L /run/udev/static_node-tags/uaccess/nvidia-modeset - - - - /dev/nvidia-modeset
L /run/udev/static_node-tags/uaccess/nvidia0 - - - - /dev/nvidia0

$ cat /usr/lib/modprobe.d/nvidia-default.conf 
blacklist nouveau

$ cat /usr/lib/dracut/dracut.conf.d/60-nvidia-default.conf 
add_drivers+=" nvidia nvidia-drm nvidia-modeset nvidia-uvm "

$ cat /usr/src/kernel-modules/nvidia-545.29.06-default/dkms.conf |grep -v ^#
PACKAGE_NAME="nvidia"
PACKAGE_VERSION="__VERSION_STRING"
AUTOINSTALL="yes"

MAKE[0]="'make' -j__JOBS NV_EXCLUDE_BUILD_MODULES='__EXCLUDE_MODULES' KERNEL_UNAME=${kernelver} modules"

__DKMS_MODULES

