https://bugzilla.suse.com/show_bug.cgi?id=1216871

Bug ID: 1216871
Summary: Thinkpad P16 with Open GPU kernel modules does not resume after sleep/hibernate (BUG: kernel NULL pointer dereference, address: 0000000000000008)
Classification: openSUSE
Product: openSUSE Tumbleweed
Version: Current
Hardware: Other
OS: Other
Status: NEW
Severity: Normal
Priority: P5 - None
Component: Kernel
Assignee: kernel-bugs@opensuse.org
Reporter: petr.vorel@suse.com
QA Contact: qa-bugs@suse.de
Target Milestone: ---
Found By: ---
Blocker: ---

This is similar to #1211950, but instead of nouveau I use these packages, suggested in [1]:

nvidia-open-driver-G06-signed-kmp-default
kernel-firmware-nvidia-gsp-G06
nvidia-open-driver-G06-signed-kmp

I tried both of the following commands [2]. The machine went to sleep, but it did not resume:

sudo systemctl suspend
sudo systemctl hibernate

(I found [2] while searching for the errors reported in dmesg after trying to suspend with echo mem > /sys/power/state.)

Suspend is OK:

[ 0.000000] Linux version 6.5.9-1-default (geeko@buildhost) (gcc (SUSE Linux) 13.2.1 20230912 [revision b96e66fd4ef3e36983969fb8cdd1956f551a074b], GNU ld (GNU Binutils; openSUSE Tumbleweed) 2.40.0.20230412-5) #1 SMP PREEMPT_DYNAMIC Wed Oct 25 10:31:37 UTC 2023 (29edc7c)
[ 0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-6.5.9-1-default root=/dev/mapper/system-root splash=silent resume=/dev/system/swap mitigations=auto quiet security=apparmor nosimplefb=1
...
[ 247.802155] NVRM nvAssertFailedNoLog: Assertion failed: 0 @ mem_list.c:293
[ 247.802161] NVRM nvAssertOkFailedNoLog: Assertion failed: Call not supported [NV_ERR_NOT_SUPPORTED] (0x00000056) returned from pRmApi->Alloc(pRmApi, pMemoryManager->hClient, pMemoryManager->hSubdevice, pHandle, hClass, &listAllocParams, sizeof(listAllocParams)) @ mem_desc.c:4790
[ 247.802163] NVRM serverFreeResourceTree: hObject 0xcaf00003 not found for client 0xc1e00003
[ 247.802164] NVRM nvAssertOkFailedNoLog: Assertion failed: Call not supported [NV_ERR_NOT_SUPPORTED] (0x00000056) returned from memdescSendMemDescToGSP(pGpu, pFbsr->pSysMemDesc, &hSysMem) @ fbsr_gm107.c:113
[ 247.802165] NVRM nvAssertOkFailedNoLog: Assertion failed: Call not supported [NV_ERR_NOT_SUPPORTED] (0x00000056) returned from _fbsrInitGsp(pGpu, pFbsr) @ fbsr_gm107.c:548
[ 248.155408] PM: suspend entry (s2idle)
[ 248.167460] Filesystems sync: 0.012 seconds
[ 248.386725] Freezing user space processes
[ 248.387547] Freezing user space processes completed (elapsed 0.000 seconds)
[ 248.387549] OOM killer disabled.
[ 248.387550] Freezing remaining freezable tasks
[ 248.388805] Freezing remaining freezable tasks completed (elapsed 0.001 seconds)
[ 248.388808] printk: Suspending console(s) (use no_console_suspend to debug)
[ 248.801471] ACPI: EC: interrupt blocked
[ 262.602984] ACPI: EC: interrupt unblocked
[ 263.174146] iwlwifi 0000:00:14.3: WRT: Invalid buffer destination
[ 263.252329] nvme nvme0: 24/0/0 default/read/poll queues
...
[ 263.524598] ucsi_acpi USBC000:00: UCSI_GET_PDOS failed (-95)
[ 263.719374] OOM killer enabled.
[ 263.719375] Restarting tasks ...
[ 263.719441] usb 1-3: USB disconnect, device number 3
[ 263.719734] done.
[ 263.719738] random: crng reseeded on system resumption
[ 263.788084] PM: suspend exit
...
[ 264.672684] NVRM: GPU at PCI:0000:01:00: GPU-80c21799-19c6-1198-2255-31aa55463b1e
[ 264.672688] NVRM: Xid (PCI:0000:01:00): 45, pid=1781, name=modprobe, Ch 00000000
[ 264.673536] NVRM: Xid (PCI:0000:01:00): 45, pid=1781, name=modprobe, Ch 00000001
[ 264.674314] NVRM: Xid (PCI:0000:01:00): 45, pid=2788, name=Xorg.bin, Ch 00000002
[ 264.675135] NVRM: Xid (PCI:0000:01:00): 45, pid=2788, name=Xorg.bin, Ch 00000003
[ 264.676290] NVRM kbusVerifyBar2_GM107: MMUTest BAR0 window offset 0x70d000 returned garbage 0x0
[ 264.676296] NVRM nvAssertOkFailedNoLog: Assertion failed: Generic memory error [NV_ERR_MEMORY_ERROR] (0x00000072) returned from kbusVerifyBar2_HAL(pGpu, pKernelBus, NULL, NULL, 0, 0) @ kern_bus_gm107.c:457
[ 264.676299] NVRM nvAssertOkFailedNoLog: Assertion failed: Generic memory error [NV_ERR_MEMORY_ERROR] (0x00000072) returned from gpuStateLoad(pGpu, IS_GPU_GC6_STATE_EXITING(pGpu) ? GPU_STATE_FLAGS_PRESERVING | GPU_STATE_FLAGS_PM_TRANSITION | GPU_STATE_FLAGS_GC6_TRANSITION : GPU_STATE_FLAGS_PRESERVING | GPU_STATE_FLAGS_PM_TRANSITION) @ gpu_suspend.c:247
[ 264.678312] NVRM: Xid (PCI:0000:01:00): 45, pid=1781, name=modprobe, Ch 00000000
[ 264.679493] NVRM: Xid (PCI:0000:01:00): 45, pid=1781, name=modprobe, Ch 00000001
[ 264.680580] NVRM: Xid (PCI:0000:01:00): 45, pid=2788, name=Xorg.bin, Ch 00000002
[ 264.681653] NVRM: Xid (PCI:0000:01:00): 45, pid=2788, name=Xorg.bin, Ch 00000003
[ 264.690483] NVRM nvCheckOkFailedNoLog: Check failed: Failure: Generic Error [NV_ERR_GENERIC] (0x0000FFFF) returned from pRmApi->Control(pRmApi, nv->rmapi.hClient, nv->rmapi.hSubDevice, NV2080_CTRL_CMD_INTERNAL_DISPLAY_UNIX_CONSOLE, &unixConsoleParams, sizeof(unixConsoleParams)) @ unix_console.c:105
[ 264.690775] NVRM rpcRmApiAlloc_GSP: GspRmAlloc failed: hClient=0xc1d00001; hParent=0x00010001; hObject=0x00010011; hClass=0x0000c670; paramsSize=0x00000000; paramsStatus=0x00000062; status=0x00000062
[ 264.690781] nvidia-modeset: ERROR: GPU:0: Failed to initialize display engine: 0x62 (Reset required [NV_ERR_RESET_REQUIRED])
[ 264.690799] NVRM serverFreeResourceTree: hObject 0x10011 not found for client 0xc1d00001
[ 264.691195] NVRM nvCheckOkFailedNoLog: Check failed: Failure: Generic Error [NV_ERR_GENERIC] (0x0000FFFF) returned from pRmApi->Control(pRmApi, nv->rmapi.hClient, nv->rmapi.hSubDevice, NV2080_CTRL_CMD_INTERNAL_DISPLAY_UNIX_CONSOLE, &unixConsoleParams, sizeof(unixConsoleParams)) @ unix_console.c:105
[ 264.691372] NVRM rpcRmApiAlloc_GSP: GspRmAlloc failed: hClient=0xc1d00001; hParent=0x00010001; hObject=0x00010011; hClass=0x0000c670; paramsSize=0x00000000; paramsStatus=0x00000062; status=0x00000062
[ 264.691375] nvidia-modeset: ERROR: GPU:0: Failed to initialize display engine: 0x62 (Reset required [NV_ERR_RESET_REQUIRED])
[ 264.691386] NVRM serverFreeResourceTree: hObject 0x10011 not found for client 0xc1d00001
[ 264.691889] NVRM unixCallVideoBIOS: int10h(4f02, 0000) vesa call failed! (4f02, 0000)
[ 264.692482] NVRM nvCheckOkFailedNoLog: Check failed: Failure: Generic Error [NV_ERR_GENERIC] (0x0000FFFF) returned from pRmApi->Control(pRmApi, nv->rmapi.hClient, nv->rmapi.hSubDevice, NV2080_CTRL_CMD_INTERNAL_DISPLAY_POST_RESTORE, &restoreParams, sizeof(restoreParams)) @ unix_console.c:197
[ 264.721179] NVRM rpcRmApiAlloc_GSP: GspRmAlloc failed: hClient=0xc1d0000b; hParent=0x01000001; hObject=0x01000012; hClass=0x0000c56f; paramsSize=0x00000168; paramsStatus=0x00000062; status=0x00000062
[ 264.721182] NVRM nvAssertFailedNoLog: Assertion failed: status == NV_OK @ kernel_channel.c:2588
[ 264.721185] NVRM nvAssertOkFailedNoLog: Assertion failed: Reset required [NV_ERR_RESET_REQUIRED] (0x00000062) returned from _kchannelSendChannelAllocRpc(pKernelChannel, pChannelGpfifoParams, pKernelChannelGroup, bFullSriov) @ kernel_channel.c:863

But the problem is with resume:

...
[ 268.636544] wlp0s20f3: authenticated
[ 268.654389] wlp0s20f3: associated
[ 276.745060] BUG: kernel NULL pointer dereference, address: 0000000000000008
[ 276.745071] #PF: supervisor read access in kernel mode
[ 276.745074] #PF: error_code(0x0000) - not-present page
[ 276.745077] PGD 0 P4D 0
[ 276.745081] Oops: 0000 [#1] PREEMPT SMP NOPTI
[ 276.745086] CPU: 6 PID: 2788 Comm: Xorg.bin Tainted: G OE 6.5.9-1-default #1 openSUSE Tumbleweed eb5faaeb0a34bed614de16eec67e50ac769ec453
[ 276.745092] Hardware name: LENOVO 21D7S22N08/21D7S22N08, BIOS N3FET36W (1.21 ) 05/31/2023
[ 276.745095] RIP: 0010:EvoIsChannelMethodPendingC3+0x22/0xc0 [nvidia_modeset]
[ 276.745174] Code: 00 00 00 00 00 00 00 00 f3 0f 1e fa 41 55 89 d0 49 89 cd 41 b8 14 00 00 00 41 54 49 89 c4 55 48 89 fd 53 48 89 f3 48 83 ec 28 <8b> 56 08 48 c7 44 24 14 00 00 00 00 48 8d 4c 24 08 48 c1 e2 20 48
[ 276.745177] RSP: 0018:ffffa72880d57b30 EFLAGS: 00010286
[ 276.745182] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffa72880d57b8f
[ 276.745184] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff933cb01c2008
[ 276.745185] RBP: ffff933cb01c2008 R08: 0000000000000014 R09: ffffa72882ccd008
[ 276.745187] R10: ffffa72881921008 R11: 000000000003f0e0 R12: 0000000000000000
[ 276.745189] R13: ffffa72880d57b8f R14: ffffa72880d57d30 R15: 0000000000000001
[ 276.745191] FS: 00007f743ce03980(0000) GS:ffff93434f700000(0000) knlGS:0000000000000000
[ 276.745193] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 276.745195] CR2: 0000000000000008 CR3: 0000000107780000 CR4: 0000000000f50ee0
[ 276.745197] PKRU: 55555554
[ 276.745199] Call Trace:
[ 276.745202] <TASK>
[ 276.745205] ? __die+0x23/0x70
[ 276.745212] ? page_fault_oops+0x14d/0x490
[ 276.745217] ? EvoIsModePossibleC3+0xe1/0x5b0 [nvidia_modeset 2547d6f3000deb268fd07ecb612b5ef73687c832]
[ 276.745269] ? exc_page_fault+0x71/0x160
[ 276.745275] ? asm_exc_page_fault+0x26/0x30
[ 276.745279] ? EvoIsChannelMethodPendingC3+0x22/0xc0 [nvidia_modeset 2547d6f3000deb268fd07ecb612b5ef73687c832]
[ 276.745326] nvRMIdleBaseChannel+0x6b/0xf0 [nvidia_modeset 2547d6f3000deb268fd07ecb612b5ef73687c832]
[ 276.745385] nvSetDispModeEvo+0x12c9/0x42f0 [nvidia_modeset 2547d6f3000deb268fd07ecb612b5ef73687c832]
[ 276.745442] ? Flip+0xf0/0xf0 [nvidia_modeset 2547d6f3000deb268fd07ecb612b5ef73687c832]
[ 276.745501] nvKmsIoctl+0xdc/0x220 [nvidia_modeset 2547d6f3000deb268fd07ecb612b5ef73687c832]
[ 276.745557] nvkms_ioctl+0x109/0x170 [nvidia_modeset 2547d6f3000deb268fd07ecb612b5ef73687c832]
[ 276.745587] nvidia_frontend_unlocked_ioctl+0x3c/0x60 [nvidia ce71fbe41fb2be9720a1b7ffb01074e41d182b8e]
[ 276.745757] __x64_sys_ioctl+0x94/0xd0
[ 276.745763] do_syscall_64+0x5d/0x90
[ 276.745768] ? do_user_addr_fault+0x179/0x640
[ 276.745772] ? exit_to_user_mode_prepare+0x133/0x1f0
[ 276.745778] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 276.745781] RIP: 0033:0x7f743cd139cf
[ 276.745839] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 18 48 8b 44 24 18 64 48 2b 04 25 28 00 00
[ 276.745842] RSP: 002b:00007ffc7c213670 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 276.745845] RAX: ffffffffffffffda RBX: 00000000c0106d00 RCX: 00007f743cd139cf
[ 276.745846] RDX: 00007ffc7c2136d0 RSI: 00000000c0106d00 RDI: 0000000000000014
[ 276.745848] RBP: 00007ffc7c2136d0 R08: 0000000000000000 R09: 0000555aa6b25490
[ 276.745849] R10: 00007ffc7c22aa40 R11: 0000000000000246 R12: 0000000000000014
[ 276.745851] R13: 00007f743c41cbc8 R14: 00007ffc7c215fd8 R15: 0000000000000003
[ 276.745854] </TASK>
[ 276.745855] Modules linked in: ccm cmac algif_hash algif_skcipher af_alg af_packet joydev nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct bnep nft_chain_nat nf_tables ebtable_nat ebtable_broute ip6table_nat ip6table_mangle ip6table_raw ip6table_security
iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw iptable_security btusb btrtl btbcm btintel btmtk bluetooth nfnetlink uvcvideo videobuf2_vmalloc uvc videobuf2_memops videobuf2_v4l2 ebtable_filter ebtables videodev ip6table_filter videobuf2_common ip6_tables ecdh_generic iptable_filter bpfilter qrtr nvidia_drm(OE) nvidia_modeset(OE) nvidia_uvm(OE) binfmt_misc snd_ctl_led snd_soc_skl_hda_dsp snd_soc_intel_hda_dsp_common snd_soc_hdac_hdmi snd_sof_probes snd_usb_audio snd_usbmidi_lib snd_ump snd_rawmidi snd_seq_device mc snd_hda_codec_realtek snd_hda_codec_generic snd_soc_dmic xfs snd_sof_pci_intel_tgl snd_sof_intel_hda_common soundwire_intel snd_sof_intel_hda_mlink
[ 276.745909] soundwire_cadence snd_sof_intel_hda nls_iso8859_1 snd_sof_pci nls_cp437 snd_sof_xtensa_dsp vfat snd_sof fat snd_sof_utils snd_soc_hdac_hda snd_hda_ext_core snd_soc_acpi_intel_match snd_soc_acpi soundwire_generic_allocation soundwire_bus iwlmvm snd_soc_core intel_uncore_frequency intel_uncore_frequency_common snd_compress intel_tcc_cooling snd_pcm_dmaengine mac80211 libarc4 x86_pkg_temp_thermal intel_powerclamp snd_hda_intel coretemp snd_intel_dspcfg snd_intel_sdw_acpi kvm_intel snd_hda_codec nvidia(OE) iwlwifi spi_nor snd_hda_core iTCO_wdt think_lmi mei_wdt pmt_telemetry processor_thermal_device_pci intel_pmc_bxt kvm mei_hdcp mei_pxp mtd iTCO_vendor_support processor_thermal_device snd_hwdep intel_rapl_msr pmt_class igc irqbypass pcspkr thunderbolt cfg80211 firmware_attributes_class wmi_bmof thinkpad_acpi processor_thermal_rfim i2c_i801 snd_pcm mei_me processor_thermal_mbox spi_intel_pci ledtrig_audio processor_thermal_rapl spi_intel i2c_smbus platform_profile snd_timer mei intel_rapl_common rfkill
[ 276.745959] thermal intel_vsec fan snd int3403_thermal soundcore ac int340x_thermal_zone intel_hid int3400_thermal intel_pmc_core acpi_pad sparse_keymap acpi_thermal_rel acpi_tad tiny_power_button fuse efi_pstore configfs dmi_sysfs ip_tables x_tables dm_crypt essiv authenc trusted asn1_encoder tee hid_logitech_hidpp hid_logitech_dj hid_generic usbhid crct10dif_pclmul crc32_pclmul polyval_clmulni polyval_generic gf128mul ghash_clmulni_intel xhci_pci rtsx_pci_sdmmc sha512_ssse3 xhci_pci_renesas mmc_core xhci_hcd aesni_intel ucsi_acpi nvme typec_ucsi crypto_simd cryptd usbcore nvme_core roles rtsx_pci typec button video battery wmi pinctrl_alderlake serio_raw br_netfilter btrfs bridge stp llc dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua sd_mod t10_pi sg scsi_mod blake2b_generic libcrc32c scsi_common crc32c_intel xor msr raid6_pq dm_mirror dm_region_hash dm_log dm_mod bbswitch(O) efivarfs
[ 276.746018] CR2: 0000000000000008
[ 276.746021] ---[ end trace 0000000000000000 ]---
[ 276.746023] RIP: 0010:EvoIsChannelMethodPendingC3+0x22/0xc0 [nvidia_modeset]
[ 276.746071] Code: 00 00 00 00 00 00 00 00 f3 0f 1e fa 41 55 89 d0 49 89 cd 41 b8 14 00 00 00 41 54 49 89 c4 55 48 89 fd 53 48 89 f3 48 83 ec 28 <8b> 56 08 48 c7 44 24 14 00 00 00 00 48 8d 4c 24 08 48 c1 e2 20 48
[ 276.746073] RSP: 0018:ffffa72880d57b30 EFLAGS: 00010286
[ 276.746076] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffa72880d57b8f
[ 276.746077] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff933cb01c2008
[ 276.746079] RBP: ffff933cb01c2008 R08: 0000000000000014 R09: ffffa72882ccd008
[ 276.746080] R10: ffffa72881921008 R11: 000000000003f0e0 R12: 0000000000000000
[ 276.746082] R13: ffffa72880d57b8f R14: ffffa72880d57d30 R15: 0000000000000001
[ 276.746083] FS: 00007f743ce03980(0000) GS:ffff93434f700000(0000) knlGS:0000000000000000
[ 276.746085] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 276.746087] CR2: 0000000000000008 CR3: 0000000107780000 CR4: 0000000000f50ee0
[ 276.746089] PKRU: 55555554
[ 276.746090] note: Xorg.bin[2788] exited with irqs disabled

$ rpm -qa |grep -i -e nvidia
kernel-firmware-nvidia-gsp-G06-525.116.04-2.1.x86_64
kernel-firmware-nvidia-gspx-G06-535.113.01-1.1.x86_64
nvidia-compute-G06-32bit-535.129.03-15.1.x86_64
libva-nvidia-driver-0.0.9-1.10.x86_64
libnvidia-egl-wayland1-1.1.12-1.2.x86_64
nvidia-compute-G06-535.129.03-15.1.x86_64
kernel-firmware-nvidia-gsp-G06-535.54.03-1.1.x86_64
nvidia-gl-G06-535.129.03-15.1.x86_64
nvidia-gl-G06-32bit-535.129.03-15.1.x86_64
nvidia-open-driver-G06-signed-kmp-default-535.129.03_k6.5.9_1-55.2.x86_64
nvidia-video-G06-32bit-535.129.03-15.1.x86_64
nvidia-open-driver-G06-signed-kmp-default-535.113.01_k6.5.6_1-51.3.x86_64
kernel-firmware-nvidia-gspx-G06-535.129.03-11.1.x86_64
nvidia-video-G06-535.129.03-15.1.x86_64
kernel-firmware-nvidia-20231019-1.1.noarch

$ lspci |grep -i vga
01:00.0 VGA compatible controller: NVIDIA Corporation GA107GLM [RTX A1000 Laptop GPU] (rev a1)

$ lsgpu
card0 10de:25b9 drm:/dev/dri/card0
└─renderD128 drm:/dev/dri/renderD128

[1] https://sndirsch.github.io/nvidia/2022/06/07/nvidia-opengpu.html
[2] https://download.nvidia.com/XFree86/Linux-x86_64/535.129.03/README/powermana...
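
For anyone triaging similar logs: the entries that matter in the dmesg excerpts of this report (the PM suspend entry/exit markers, the NVRM Xid events and the NULL pointer BUG line) can be pulled out of a saved dmesg dump with a small filter. The following is only a rough sketch, not part of the original report. The function name `summarize` and the regular expressions are my own assumptions, written against the line format seen in this report, and may need adjusting for other kernels or driver versions.

```python
import re

# Assumption: every log entry looks like "[ 264.672688] message",
# as in the dmesg excerpts of this report.
ENTRY = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")

# Assumption: Xid lines look like
# "NVRM: Xid (PCI:0000:01:00): 45, pid=..., name=..., Ch ..."
XID = re.compile(r"NVRM: Xid \((PCI:[0-9a-fA-F:.]+)\): (\d+),")

# Message prefixes treated as milestones of the suspend/resume cycle.
MARKERS = (
    "PM: suspend entry",
    "PM: suspend exit",
    "BUG: kernel NULL pointer dereference",
)


def summarize(dmesg_text):
    """Return (timestamp, kind, detail) tuples for the interesting entries.

    kind is "xid" (detail is the Xid number as a string) or
    "marker" (detail is the full message).
    """
    events = []
    for line in dmesg_text.splitlines():
        match = ENTRY.match(line.strip())
        if not match:
            continue  # not a timestamped kernel log line
        timestamp = float(match.group(1))
        message = match.group(2)
        xid = XID.search(message)
        if xid:
            events.append((timestamp, "xid", xid.group(2)))
        elif message.startswith(MARKERS):
            events.append((timestamp, "marker", message))
    return events
```

Feeding it the text of a file saved with `dmesg > dmesg.txt` yields a short timeline, which makes it easier to see whether the Xid events and the oops occur before or after "PM: suspend exit".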