Bug ID 1216871
Summary Thinkpad P16 with Open GPU kernel modules does not resume after sleep/hibernate (BUG: kernel NULL pointer dereference, address: 0000000000000008)
Classification openSUSE
Product openSUSE Tumbleweed
Version Current
Hardware Other
OS Other
Status NEW
Severity Normal
Priority P5 - None
Component Kernel
Assignee kernel-bugs@opensuse.org
Reporter petr.vorel@suse.com
QA Contact qa-bugs@suse.de
Target Milestone ---
Found By ---
Blocker ---

This is similar to #1211950, but instead of nouveau I use the following
packages, as suggested in [1]:
nvidia-open-driver-G06-signed-kmp-default
kernel-firmware-nvidia-gsp-G06
nvidia-open-driver-G06-signed-kmp

I tried both of the following commands (see [2]). In both cases the system
went to sleep but did not resume:
sudo systemctl suspend
sudo systemctl hibernate

(I found [2] while searching for the errors reported in dmesg after trying to
suspend with echo mem > /sys/power/state.)

Suspend is OK:
[    0.000000] Linux version 6.5.9-1-default (geeko@buildhost) (gcc (SUSE
Linux) 13.2.1 20230912 [revision b96e66fd4ef3e36983969fb8cdd1956f551a074b], GNU
ld (GNU Binutils; openSUSE Tumbleweed) 2.40.0.20230412-5) #1 SMP
PREEMPT_DYNAMIC Wed Oct 25 10:31:37 UTC 2023 (29edc7c)
[    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-6.5.9-1-default
root=/dev/mapper/system-root splash=silent resume=/dev/system/swap
mitigations=auto quiet security=apparmor nosimplefb=1
...
[  247.802155] NVRM nvAssertFailedNoLog: Assertion failed: 0 @ mem_list.c:293
[  247.802161] NVRM nvAssertOkFailedNoLog: Assertion failed: Call not supported
[NV_ERR_NOT_SUPPORTED] (0x00000056) returned from pRmApi->Alloc(pRmApi,
pMemoryManager->hClient, pMemoryManager->hSubdevice, pHandle, hClass,
&listAllocParams, sizeof(listAllocParams)) @ mem_desc.c:4790
[  247.802163] NVRM serverFreeResourceTree: hObject 0xcaf00003 not found for
client 0xc1e00003
[  247.802164] NVRM nvAssertOkFailedNoLog: Assertion failed: Call not supported
[NV_ERR_NOT_SUPPORTED] (0x00000056) returned from memdescSendMemDescToGSP(pGpu,
pFbsr->pSysMemDesc, &hSysMem) @ fbsr_gm107.c:113
[  247.802165] NVRM nvAssertOkFailedNoLog: Assertion failed: Call not supported
[NV_ERR_NOT_SUPPORTED] (0x00000056) returned from _fbsrInitGsp(pGpu, pFbsr) @
fbsr_gm107.c:548
[  248.155408] PM: suspend entry (s2idle)
[  248.167460] Filesystems sync: 0.012 seconds
[  248.386725] Freezing user space processes
[  248.387547] Freezing user space processes completed (elapsed 0.000 seconds)
[  248.387549] OOM killer disabled.
[  248.387550] Freezing remaining freezable tasks
[  248.388805] Freezing remaining freezable tasks completed (elapsed 0.001
seconds)
[  248.388808] printk: Suspending console(s) (use no_console_suspend to debug)
[  248.801471] ACPI: EC: interrupt blocked
[  262.602984] ACPI: EC: interrupt unblocked
[  263.174146] iwlwifi 0000:00:14.3: WRT: Invalid buffer destination
[  263.252329] nvme nvme0: 24/0/0 default/read/poll queues
...
[  263.524598] ucsi_acpi USBC000:00: UCSI_GET_PDOS failed (-95)
[  263.719374] OOM killer enabled.
[  263.719375] Restarting tasks ... 
[  263.719441] usb 1-3: USB disconnect, device number 3
[  263.719734] done.
[  263.719738] random: crng reseeded on system resumption
[  263.788084] PM: suspend exit
...
[  264.672684] NVRM: GPU at PCI:0000:01:00:
GPU-80c21799-19c6-1198-2255-31aa55463b1e
[  264.672688] NVRM: Xid (PCI:0000:01:00): 45, pid=1781, name=modprobe, Ch
00000000
[  264.673536] NVRM: Xid (PCI:0000:01:00): 45, pid=1781, name=modprobe, Ch
00000001
[  264.674314] NVRM: Xid (PCI:0000:01:00): 45, pid=2788, name=Xorg.bin, Ch
00000002
[  264.675135] NVRM: Xid (PCI:0000:01:00): 45, pid=2788, name=Xorg.bin, Ch
00000003
[  264.676290] NVRM kbusVerifyBar2_GM107: MMUTest BAR0 window offset 0x70d000
returned garbage 0x0
[  264.676296] NVRM nvAssertOkFailedNoLog: Assertion failed: Generic memory
error [NV_ERR_MEMORY_ERROR] (0x00000072) returned from kbusVerifyBar2_HAL(pGpu,
pKernelBus, NULL, NULL, 0, 0) @ kern_bus_gm107.c:457
[  264.676299] NVRM nvAssertOkFailedNoLog: Assertion failed: Generic memory
error [NV_ERR_MEMORY_ERROR] (0x00000072) returned from gpuStateLoad(pGpu,
IS_GPU_GC6_STATE_EXITING(pGpu) ? GPU_STATE_FLAGS_PRESERVING |
GPU_STATE_FLAGS_PM_TRANSITION | GPU_STATE_FLAGS_GC6_TRANSITION :
GPU_STATE_FLAGS_PRESERVING | GPU_STATE_FLAGS_PM_TRANSITION) @ gpu_suspend.c:247
[  264.678312] NVRM: Xid (PCI:0000:01:00): 45, pid=1781, name=modprobe, Ch
00000000
[  264.679493] NVRM: Xid (PCI:0000:01:00): 45, pid=1781, name=modprobe, Ch
00000001
[  264.680580] NVRM: Xid (PCI:0000:01:00): 45, pid=2788, name=Xorg.bin, Ch
00000002
[  264.681653] NVRM: Xid (PCI:0000:01:00): 45, pid=2788, name=Xorg.bin, Ch
00000003
[  264.690483] NVRM nvCheckOkFailedNoLog: Check failed: Failure: Generic Error
[NV_ERR_GENERIC] (0x0000FFFF) returned from pRmApi->Control(pRmApi,
nv->rmapi.hClient, nv->rmapi.hSubDevice,
NV2080_CTRL_CMD_INTERNAL_DISPLAY_UNIX_CONSOLE, &unixConsoleParams,
sizeof(unixConsoleParams)) @ unix_console.c:105
[  264.690775] NVRM rpcRmApiAlloc_GSP: GspRmAlloc failed: hClient=0xc1d00001;
hParent=0x00010001; hObject=0x00010011; hClass=0x0000c670;
paramsSize=0x00000000; paramsStatus=0x00000062; status=0x00000062
[  264.690781] nvidia-modeset: ERROR: GPU:0: Failed to initialize display
engine: 0x62 (Reset required [NV_ERR_RESET_REQUIRED])
[  264.690799] NVRM serverFreeResourceTree: hObject 0x10011 not found for
client 0xc1d00001
[  264.691195] NVRM nvCheckOkFailedNoLog: Check failed: Failure: Generic Error
[NV_ERR_GENERIC] (0x0000FFFF) returned from pRmApi->Control(pRmApi,
nv->rmapi.hClient, nv->rmapi.hSubDevice,
NV2080_CTRL_CMD_INTERNAL_DISPLAY_UNIX_CONSOLE, &unixConsoleParams,
sizeof(unixConsoleParams)) @ unix_console.c:105
[  264.691372] NVRM rpcRmApiAlloc_GSP: GspRmAlloc failed: hClient=0xc1d00001;
hParent=0x00010001; hObject=0x00010011; hClass=0x0000c670;
paramsSize=0x00000000; paramsStatus=0x00000062; status=0x00000062
[  264.691375] nvidia-modeset: ERROR: GPU:0: Failed to initialize display
engine: 0x62 (Reset required [NV_ERR_RESET_REQUIRED])
[  264.691386] NVRM serverFreeResourceTree: hObject 0x10011 not found for
client 0xc1d00001
[  264.691889] NVRM unixCallVideoBIOS: int10h(4f02, 0000) vesa call failed!
(4f02, 0000)
[  264.692482] NVRM nvCheckOkFailedNoLog: Check failed: Failure: Generic Error
[NV_ERR_GENERIC] (0x0000FFFF) returned from pRmApi->Control(pRmApi,
nv->rmapi.hClient, nv->rmapi.hSubDevice,
NV2080_CTRL_CMD_INTERNAL_DISPLAY_POST_RESTORE, &restoreParams,
sizeof(restoreParams)) @ unix_console.c:197
[  264.721179] NVRM rpcRmApiAlloc_GSP: GspRmAlloc failed: hClient=0xc1d0000b;
hParent=0x01000001; hObject=0x01000012; hClass=0x0000c56f;
paramsSize=0x00000168; paramsStatus=0x00000062; status=0x00000062
[  264.721182] NVRM nvAssertFailedNoLog: Assertion failed: status == NV_OK @
kernel_channel.c:2588
[  264.721185] NVRM nvAssertOkFailedNoLog: Assertion failed: Reset required
[NV_ERR_RESET_REQUIRED] (0x00000062) returned from
_kchannelSendChannelAllocRpc(pKernelChannel, pChannelGpfifoParams,
pKernelChannelGroup, bFullSriov) @ kernel_channel.c:863

The problem occurs on resume:
...
[  268.636544] wlp0s20f3: authenticated
[  268.654389] wlp0s20f3: associated
[  276.745060] BUG: kernel NULL pointer dereference, address: 0000000000000008
[  276.745071] #PF: supervisor read access in kernel mode
[  276.745074] #PF: error_code(0x0000) - not-present page
[  276.745077] PGD 0 P4D 0 
[  276.745081] Oops: 0000 [#1] PREEMPT SMP NOPTI
[  276.745086] CPU: 6 PID: 2788 Comm: Xorg.bin Tainted: G           OE     
6.5.9-1-default #1 openSUSE Tumbleweed eb5faaeb0a34bed614de16eec67e50ac769ec453
[  276.745092] Hardware name: LENOVO 21D7S22N08/21D7S22N08, BIOS N3FET36W (1.21
) 05/31/2023
[  276.745095] RIP: 0010:EvoIsChannelMethodPendingC3+0x22/0xc0 [nvidia_modeset]
[  276.745174] Code: 00 00 00 00 00 00 00 00 f3 0f 1e fa 41 55 89 d0 49 89 cd
41 b8 14 00 00 00 41 54 49 89 c4 55 48 89 fd 53 48 89 f3 48 83 ec 28 <8b> 56 08
48 c7 44 24 14 00 00 00 00 48 8d 4c 24 08 48 c1 e2 20 48
[  276.745177] RSP: 0018:ffffa72880d57b30 EFLAGS: 00010286
[  276.745182] RAX: 0000000000000000 RBX: 0000000000000000 RCX:
ffffa72880d57b8f
[  276.745184] RDX: 0000000000000000 RSI: 0000000000000000 RDI:
ffff933cb01c2008
[  276.745185] RBP: ffff933cb01c2008 R08: 0000000000000014 R09:
ffffa72882ccd008
[  276.745187] R10: ffffa72881921008 R11: 000000000003f0e0 R12:
0000000000000000
[  276.745189] R13: ffffa72880d57b8f R14: ffffa72880d57d30 R15:
0000000000000001
[  276.745191] FS:  00007f743ce03980(0000) GS:ffff93434f700000(0000)
knlGS:0000000000000000
[  276.745193] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  276.745195] CR2: 0000000000000008 CR3: 0000000107780000 CR4:
0000000000f50ee0
[  276.745197] PKRU: 55555554
[  276.745199] Call Trace:
[  276.745202]  <TASK>
[  276.745205]  ? __die+0x23/0x70
[  276.745212]  ? page_fault_oops+0x14d/0x490
[  276.745217]  ? EvoIsModePossibleC3+0xe1/0x5b0 [nvidia_modeset
2547d6f3000deb268fd07ecb612b5ef73687c832]
[  276.745269]  ? exc_page_fault+0x71/0x160
[  276.745275]  ? asm_exc_page_fault+0x26/0x30
[  276.745279]  ? EvoIsChannelMethodPendingC3+0x22/0xc0 [nvidia_modeset
2547d6f3000deb268fd07ecb612b5ef73687c832]
[  276.745326]  nvRMIdleBaseChannel+0x6b/0xf0 [nvidia_modeset
2547d6f3000deb268fd07ecb612b5ef73687c832]
[  276.745385]  nvSetDispModeEvo+0x12c9/0x42f0 [nvidia_modeset
2547d6f3000deb268fd07ecb612b5ef73687c832]
[  276.745442]  ? Flip+0xf0/0xf0 [nvidia_modeset
2547d6f3000deb268fd07ecb612b5ef73687c832]
[  276.745501]  nvKmsIoctl+0xdc/0x220 [nvidia_modeset
2547d6f3000deb268fd07ecb612b5ef73687c832]
[  276.745557]  nvkms_ioctl+0x109/0x170 [nvidia_modeset
2547d6f3000deb268fd07ecb612b5ef73687c832]
[  276.745587]  nvidia_frontend_unlocked_ioctl+0x3c/0x60 [nvidia
ce71fbe41fb2be9720a1b7ffb01074e41d182b8e]
[  276.745757]  __x64_sys_ioctl+0x94/0xd0
[  276.745763]  do_syscall_64+0x5d/0x90
[  276.745768]  ? do_user_addr_fault+0x179/0x640
[  276.745772]  ? exit_to_user_mode_prepare+0x133/0x1f0
[  276.745778]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[  276.745781] RIP: 0033:0x7f743cd139cf
[  276.745839] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00
00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d
00 f0 ff ff 77 18 48 8b 44 24 18 64 48 2b 04 25 28 00 00
[  276.745842] RSP: 002b:00007ffc7c213670 EFLAGS: 00000246 ORIG_RAX:
0000000000000010
[  276.745845] RAX: ffffffffffffffda RBX: 00000000c0106d00 RCX:
00007f743cd139cf
[  276.745846] RDX: 00007ffc7c2136d0 RSI: 00000000c0106d00 RDI:
0000000000000014
[  276.745848] RBP: 00007ffc7c2136d0 R08: 0000000000000000 R09:
0000555aa6b25490
[  276.745849] R10: 00007ffc7c22aa40 R11: 0000000000000246 R12:
0000000000000014
[  276.745851] R13: 00007f743c41cbc8 R14: 00007ffc7c215fd8 R15:
0000000000000003
[  276.745854]  </TASK>
[  276.745855] Modules linked in: ccm cmac algif_hash algif_skcipher af_alg
af_packet joydev nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet
nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct bnep nft_chain_nat nf_tables
ebtable_nat ebtable_broute ip6table_nat ip6table_mangle ip6table_raw
ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4
iptable_mangle iptable_raw iptable_security btusb btrtl btbcm btintel btmtk
bluetooth nfnetlink uvcvideo videobuf2_vmalloc uvc videobuf2_memops
videobuf2_v4l2 ebtable_filter ebtables videodev ip6table_filter
videobuf2_common ip6_tables ecdh_generic iptable_filter bpfilter qrtr
nvidia_drm(OE) nvidia_modeset(OE) nvidia_uvm(OE) binfmt_misc snd_ctl_led
snd_soc_skl_hda_dsp snd_soc_intel_hda_dsp_common snd_soc_hdac_hdmi
snd_sof_probes snd_usb_audio snd_usbmidi_lib snd_ump snd_rawmidi snd_seq_device
mc snd_hda_codec_realtek snd_hda_codec_generic snd_soc_dmic xfs
snd_sof_pci_intel_tgl snd_sof_intel_hda_common soundwire_intel
snd_sof_intel_hda_mlink
[  276.745909]  soundwire_cadence snd_sof_intel_hda nls_iso8859_1 snd_sof_pci
nls_cp437 snd_sof_xtensa_dsp vfat snd_sof fat snd_sof_utils snd_soc_hdac_hda
snd_hda_ext_core snd_soc_acpi_intel_match snd_soc_acpi
soundwire_generic_allocation soundwire_bus iwlmvm snd_soc_core
intel_uncore_frequency intel_uncore_frequency_common snd_compress
intel_tcc_cooling snd_pcm_dmaengine mac80211 libarc4 x86_pkg_temp_thermal
intel_powerclamp snd_hda_intel coretemp snd_intel_dspcfg snd_intel_sdw_acpi
kvm_intel snd_hda_codec nvidia(OE) iwlwifi spi_nor snd_hda_core iTCO_wdt
think_lmi mei_wdt pmt_telemetry processor_thermal_device_pci intel_pmc_bxt kvm
mei_hdcp mei_pxp mtd iTCO_vendor_support processor_thermal_device snd_hwdep
intel_rapl_msr pmt_class igc irqbypass pcspkr thunderbolt cfg80211
firmware_attributes_class wmi_bmof thinkpad_acpi processor_thermal_rfim
i2c_i801 snd_pcm mei_me processor_thermal_mbox spi_intel_pci ledtrig_audio
processor_thermal_rapl spi_intel i2c_smbus platform_profile snd_timer mei
intel_rapl_common rfkill
[  276.745959]  thermal intel_vsec fan snd int3403_thermal soundcore ac
int340x_thermal_zone intel_hid int3400_thermal intel_pmc_core acpi_pad
sparse_keymap acpi_thermal_rel acpi_tad tiny_power_button fuse efi_pstore
configfs dmi_sysfs ip_tables x_tables dm_crypt essiv authenc trusted
asn1_encoder tee hid_logitech_hidpp hid_logitech_dj hid_generic usbhid
crct10dif_pclmul crc32_pclmul polyval_clmulni polyval_generic gf128mul
ghash_clmulni_intel xhci_pci rtsx_pci_sdmmc sha512_ssse3 xhci_pci_renesas
mmc_core xhci_hcd aesni_intel ucsi_acpi nvme typec_ucsi crypto_simd cryptd
usbcore nvme_core roles rtsx_pci typec button video battery wmi
pinctrl_alderlake serio_raw br_netfilter btrfs bridge stp llc dm_multipath
scsi_dh_rdac scsi_dh_emc scsi_dh_alua sd_mod t10_pi sg scsi_mod blake2b_generic
libcrc32c scsi_common crc32c_intel xor msr raid6_pq dm_mirror dm_region_hash
dm_log dm_mod bbswitch(O) efivarfs
[  276.746018] CR2: 0000000000000008
[  276.746021] ---[ end trace 0000000000000000 ]---
[  276.746023] RIP: 0010:EvoIsChannelMethodPendingC3+0x22/0xc0 [nvidia_modeset]
[  276.746071] Code: 00 00 00 00 00 00 00 00 f3 0f 1e fa 41 55 89 d0 49 89 cd
41 b8 14 00 00 00 41 54 49 89 c4 55 48 89 fd 53 48 89 f3 48 83 ec 28 <8b> 56 08
48 c7 44 24 14 00 00 00 00 48 8d 4c 24 08 48 c1 e2 20 48
[  276.746073] RSP: 0018:ffffa72880d57b30 EFLAGS: 00010286
[  276.746076] RAX: 0000000000000000 RBX: 0000000000000000 RCX:
ffffa72880d57b8f
[  276.746077] RDX: 0000000000000000 RSI: 0000000000000000 RDI:
ffff933cb01c2008
[  276.746079] RBP: ffff933cb01c2008 R08: 0000000000000014 R09:
ffffa72882ccd008
[  276.746080] R10: ffffa72881921008 R11: 000000000003f0e0 R12:
0000000000000000
[  276.746082] R13: ffffa72880d57b8f R14: ffffa72880d57d30 R15:
0000000000000001
[  276.746083] FS:  00007f743ce03980(0000) GS:ffff93434f700000(0000)
knlGS:0000000000000000
[  276.746085] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  276.746087] CR2: 0000000000000008 CR3: 0000000107780000 CR4:
0000000000f50ee0
[  276.746089] PKRU: 55555554
[  276.746090] note: Xorg.bin[2788] exited with irqs disabled
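
The oops above can be cross-checked mechanically. The following is a minimal Python sketch, not part of the original report, that extracts the fault address, the RIP symbol, the registers and the faulting instruction bytes (the byte marked `<..>` plus the next two) from an oops dump. The function name `parse_oops` and the regular expressions are illustrative assumptions, and the sample is a shortened excerpt of the dump above. The bytes `8b 56 08` decode to `mov 0x8(%rsi),%edx`, so with RSI = 0 the read hits address 8, which matches CR2.

```python
import re

TS_RE = re.compile(r"^\[\s*\d+\.\d+\]\s*")  # "[  276.745060] " prefix


def parse_oops(text):
    """Return (fault address, RIP symbol, registers, faulting bytes)."""
    # Strip timestamps and rejoin lines that the mail client wrapped.
    flat = " ".join(TS_RE.sub("", ln).strip() for ln in text.splitlines())
    fault = int(
        re.search(r"NULL pointer dereference, address: ([0-9a-f]+)", flat).group(1),
        16,
    )
    rip = re.search(r"RIP: 0010:(\S+)", flat).group(1)
    # General-purpose registers are printed as 16 hex digits.
    regs = {
        name: int(val, 16)
        for name, val in re.findall(r"\b(R[A-Z0-9]{2}): ([0-9a-f]{16})\b", flat)
    }
    # The faulting instruction starts at the byte marked <..> in "Code:".
    m = re.search(r"<([0-9a-f]{2})>((?: [0-9a-f]{2}){2})", flat)
    insn = [m.group(1)] + m.group(2).split()
    return fault, rip, regs, insn


# Shortened excerpt of the oops above (the Code line is truncated here).
sample = """\
[  276.745060] BUG: kernel NULL pointer dereference, address: 0000000000000008
[  276.745095] RIP: 0010:EvoIsChannelMethodPendingC3+0x22/0xc0 [nvidia_modeset]
[  276.745174] Code: 48 89 f3 48 83 ec 28 <8b> 56 08
48 c7 44 24 14 00 00 00 00
[  276.745184] RDX: 0000000000000000 RSI: 0000000000000000 RDI:
ffff933cb01c2008
"""

fault, rip, regs, insn = parse_oops(sample)
print(f"fault address: {fault:#x}")
print(f"RIP:           {rip}")
print(f"RSI:           {regs['RSI']:#x}")
print(f"insn bytes:    {' '.join(insn)}")
# 8b 56 08 is mov 0x8(%rsi),%edx, so the fault address should be RSI + 8.
print(f"RSI + 8 == fault address: {regs['RSI'] + 8 == fault}")
```

The sketch does not disassemble anything; the decoding of `8b 56 08` is stated in the comment only.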

$ rpm -qa |grep -i -e nvidia
kernel-firmware-nvidia-gsp-G06-525.116.04-2.1.x86_64
kernel-firmware-nvidia-gspx-G06-535.113.01-1.1.x86_64
nvidia-compute-G06-32bit-535.129.03-15.1.x86_64
libva-nvidia-driver-0.0.9-1.10.x86_64
libnvidia-egl-wayland1-1.1.12-1.2.x86_64
nvidia-compute-G06-535.129.03-15.1.x86_64
kernel-firmware-nvidia-gsp-G06-535.54.03-1.1.x86_64
nvidia-gl-G06-535.129.03-15.1.x86_64
nvidia-gl-G06-32bit-535.129.03-15.1.x86_64
nvidia-open-driver-G06-signed-kmp-default-535.129.03_k6.5.9_1-55.2.x86_64
nvidia-video-G06-32bit-535.129.03-15.1.x86_64
nvidia-open-driver-G06-signed-kmp-default-535.113.01_k6.5.6_1-51.3.x86_64
kernel-firmware-nvidia-gspx-G06-535.129.03-11.1.x86_64
nvidia-video-G06-535.129.03-15.1.x86_64
kernel-firmware-nvidia-20231019-1.1.noarch
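
The list above contains firmware and KMP packages from several driver versions. The following is a minimal Python sketch, not part of the original report, that groups such package names by upstream driver version to make leftovers from older versions easier to spot. The helper names `split_rpm` and `nvidia_versions` are illustrative assumptions, and only the firmware and KMP packages from the list are used as input. Whether the extra versions are related to the resume failure is not established here.

```python
from collections import defaultdict


def split_rpm(nevra):
    """Split 'name-version-release.arch' into (name, version)."""
    nvr, _arch = nevra.rsplit(".", 1)
    name, version, _release = nvr.rsplit("-", 2)
    return name, version


def nvidia_versions(packages):
    """Map upstream driver version -> sorted list of package names."""
    by_version = defaultdict(list)
    for pkg in packages:
        name, version = split_rpm(pkg)
        # KMP versions look like '535.129.03_k6.5.9_1'; keep the driver part.
        by_version[version.split("_k")[0]].append(name)
    return {v: sorted(names) for v, names in by_version.items()}


# Firmware and KMP packages copied from the rpm -qa output above.
packages = [
    "kernel-firmware-nvidia-gsp-G06-525.116.04-2.1.x86_64",
    "kernel-firmware-nvidia-gspx-G06-535.113.01-1.1.x86_64",
    "kernel-firmware-nvidia-gsp-G06-535.54.03-1.1.x86_64",
    "nvidia-open-driver-G06-signed-kmp-default-535.129.03_k6.5.9_1-55.2.x86_64",
    "nvidia-open-driver-G06-signed-kmp-default-535.113.01_k6.5.6_1-51.3.x86_64",
    "kernel-firmware-nvidia-gspx-G06-535.129.03-11.1.x86_64",
]

versions = nvidia_versions(packages)
for version in sorted(versions):
    print(version, ", ".join(versions[version]))
```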

$ lspci |grep -i vga
01:00.0 VGA compatible controller: NVIDIA Corporation GA107GLM [RTX A1000 Laptop GPU] (rev a1)

$ lsgpu
card0                    10de:25b9                         drm:/dev/dri/card0
└─renderD128                                               drm:/dev/dri/renderD128

[1] https://sndirsch.github.io/nvidia/2022/06/07/nvidia-opengpu.html
[2] https://download.nvidia.com/XFree86/Linux-x86_64/535.129.03/README/powermanagement.html

