[Bug 1215523] New: kernel 5.14.21.150400.24.84.1 amdgpu critical error
https://bugzilla.suse.com/show_bug.cgi?id=1215523

Bug ID: 1215523
Summary: kernel 5.14.21.150400.24.84.1 amdgpu critical error
Classification: openSUSE
Product: openSUSE Distribution
Version: Leap 15.4
Hardware: x86-64
OS: openSUSE Leap 15.4
Status: NEW
Severity: Critical
Priority: P5 - None
Component: Kernel
Assignee: kernel-bugs@opensuse.org
Reporter: bugs@clearingstelle-eeg-kwkg.de
QA Contact: qa-bugs@suse.de
Target Milestone: ---
Found By: ---
Blocker: ---

After today's kernel update to 5.14.21.150400.24.84.1 the system boots, but a constantly repeated amdgpu error makes the files /var/log/messages and /var/log/warn grow to gigabytes within a few minutes, until the root file system is exhausted and no space is left on the device.

Switching back to the previous kernel 5.14.21.150400.24.81.1 "solves" the issue.

GIT Branch: SLE15-SP4_EMBARGO
Distribution: SUSE Linux Enterprise 15
Name : kernel-default
Version : 5.14.21
Release : 150400.24.84.1
Architecture: x86_64

/var/log/messages is full of errors like this:

------------[ cut here ]------------
WARNING: CPU: 10 PID: 8062 at ../include/linux/dma-fence.h:478 amdgpu_sync_keep_later+0xab/0xc0 [amdgpu]
2023-09-20T15:06:06.968445+02:00 fermium kernel: [ 219.037070][ T8062] Modules linked in: rfcomm nf_nat_sip nft_objref nf_conntrack_sip nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib af_packet nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject cmac algif_hash algif_skcipher af_alg nft_ct bnep nft_chain_nat nf_tables btusb btrtl btbcm btintel ebtable_nat ebtable_broute bluetooth ip6table_nat uvcvideo snd_usb_audio videobuf2_vmalloc ip6table_mangle videobuf2_memops videobuf2_v4l2 ip6table_raw videobuf2_common ip6table_security snd_usbmidi_lib videodev snd_rawmidi ecdh_generic snd_seq_device mc iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c iptable_mangle iptable_raw iptable_security ip_set nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter bpfilter vboxnetadp(OEN) vboxnetflt(OEN) vboxdrv(OEN) dmi_sysfs iwlmvm joydev sunrpc mac80211 libarc4 snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi intel_rapl_msr intel_rapl_common edac_mce_amd snd_hda_intel
2023-09-20T15:06:06.968538+02:00 fermium kernel: [ 219.037204][ T8062] snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec kvm_amd nls_iso8859_1 snd_hda_core iwlwifi hid_multitouch nls_cp437 snd_hwdep vfat kvm snd_pcm r8169 fat snd_timer cfg80211 acer_wmi irqbypass realtek sparse_keymap snd mdio_devres rfkill pcspkr ucsi_acpi snd_pci_acp5x wmi_bmof efi_pstore(N) i2c_piix4 libphy snd_rn_pci_acp3x typec_ucsi soundcore k10temp snd_pci_acp3x thermal typec roles acer_wireless(N) button acpi_cpufreq i2c_designware_platform i2c_designware_core amd_pmc ac fuse configfs ip_tables x_tables ext4 crc16 mbcache jbd2 dm_crypt essiv authenc hid_generic usbhid amdgpu crc32_pclmul crc32c_intel ghash_clmulni_intel drm_ttm_helper ttm iommu_v2 gpu_sched i2c_algo_bit rtsx_pci_sdmmc drm_kms_helper mmc_core syscopyarea sysfillrect sysimgblt fb_sys_fops aesni_intel cec rc_core xhci_pci crypto_simd xhci_pci_renesas xhci_hcd drm cryptd nvme usbcore serio_raw nvme_core ccp rtsx_pci sp5100_tco(N) nvme_common t10_pi mfd_core battery video wmi i2c_hid_acpi
2023-09-20T15:06:06.968543+02:00 fermium kernel: [ 219.037378][ T8062] i2c_hid dm_mirror dm_region_hash dm_log sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua scsi_mod msr efivarfs
2023-09-20T15:06:06.968547+02:00 fermium kernel: [ 219.037403][ T8062] Supported: No, Unsupported modules are loaded
2023-09-20T15:06:06.968551+02:00 fermium kernel: [ 219.037406][ T8062] CPU: 10 PID: 8062 Comm: kwin_x11:cs0 Tainted: G W OE N 5.14.21-150400.24.84-default #1 SLE15-SP4 2d2aae51046e63e9f8c5f181ee9c884ea3512c4e
2023-09-20T15:06:06.968554+02:00 fermium kernel: [ 219.037414][ T8062] Hardware name: Acer TravelMate P215-41/Bassdrum_RC, BIOS V1.02 02/18/2021
2023-09-20T15:06:06.968557+02:00 fermium kernel: [ 219.037417][ T8062] RIP: 0010:amdgpu_sync_keep_later+0xab/0xc0 [amdgpu]
2023-09-20T15:06:06.968561+02:00 fermium kernel: [ 219.037602][ T8062] Code: d1 0f 92 c2 eb b4 e8 04 ae bb ed 48 85 db 75 ca eb e1 be 01 00 00 00 e8 83 3a 97 ed eb d5 be 03 00 00 00 e8 77 3a 97 ed eb ab <0f> 0b eb 90 be 02 00 00 00 e8 67 3a 97 ed eb b9 0f 1f 44 00 00 0f
2023-09-20T15:06:06.968564+02:00 fermium kernel: [ 219.037607][ T8062] RSP: 0018:ffffa76502227ac8 EFLAGS: 00010206
2023-09-20T15:06:06.968567+02:00 fermium kernel: [ 219.037612][ T8062] RAX: ffffffffb0afffc0 RBX: ffff9c1263211940 RCX: 0000000000000000
2023-09-20T15:06:06.968570+02:00 fermium kernel: [ 219.037616][ T8062] RDX: ffff9c1244b00e18 RSI: ffff9c1263211940 RDI: ffff9c1327f51cf8
2023-09-20T15:06:06.968573+02:00 fermium kernel: [ 219.037619][ T8062] RBP: ffff9c1327f51cf8 R08: 0000000000000020 R09: 0000000000000000
2023-09-20T15:06:06.968575+02:00 fermium kernel: [ 219.037622][ T8062] R10: ffff9c1244b00e38 R11: 0000000000000003 R12: 0000000000000000
2023-09-20T15:06:06.968578+02:00 fermium kernel: [ 219.037625][ T8062] R13: 0000000000000000 R14: 0000000000000000 R15: ffff9c12556e9948
2023-09-20T15:06:06.968582+02:00 fermium kernel: [ 219.037629][ T8062] FS: 00007fbac6fa8700(0000) GS:ffff9c153f880000(0000) knlGS:0000000000000000
2023-09-20T15:06:06.968584+02:00 fermium kernel: [ 219.037634][ T8062] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
2023-09-20T15:06:06.968587+02:00 fermium kernel: [ 219.037637][ T8062] CR2: 00007f773801e0b0 CR3: 0000000178bfc000 CR4: 0000000000350ee0
2023-09-20T15:06:06.968590+02:00 fermium kernel: [ 219.037641][ T8062] Call Trace:
2023-09-20T15:06:06.968594+02:00 fermium kernel: [ 219.037645][ T8062] <TASK>
2023-09-20T15:06:06.968596+02:00 fermium kernel: [ 219.037649][ T8062] amdgpu_sync_vm_fence+0x1e/0x40 [amdgpu 28d5a9707e1f6057a70f2fd1d73854000d464e2e]
2023-09-20T15:06:06.968599+02:00 fermium kernel: [ 219.037834][ T8062] amdgpu_cs_ioctl+0x1574/0x1ca0 [amdgpu 28d5a9707e1f6057a70f2fd1d73854000d464e2e]
2023-09-20T15:06:06.968603+02:00 fermium kernel: [ 219.038037][ T8062] ? amdgpu_cs_find_mapping+0x110/0x110 [amdgpu 28d5a9707e1f6057a70f2fd1d73854000d464e2e]
2023-09-20T15:06:06.968606+02:00 fermium kernel: [ 219.038249][ T8062] drm_ioctl_kernel+0xb6/0x100 [drm b72f874ec848c4be9b8c39c6d4b6cda18e282a1a]
2023-09-20T15:06:06.968609+02:00 fermium kernel: [ 219.038285][ T8062] drm_ioctl+0x35a/0x400 [drm b72f874ec848c4be9b8c39c6d4b6cda18e282a1a]
2023-09-20T15:06:06.968612+02:00 fermium kernel: [ 219.038318][ T8062] ? amdgpu_cs_find_mapping+0x110/0x110 [amdgpu 28d5a9707e1f6057a70f2fd1d73854000d464e2e]
2023-09-20T15:06:06.968615+02:00 fermium kernel: [ 219.038500][ T8062] ? srso_return_thunk+0x5/0x10
2023-09-20T15:06:06.968618+02:00 fermium kernel: [ 219.038505][ T8062] ? try_to_wake_up+0x177/0x550
2023-09-20T15:06:06.968740+02:00 fermium kernel: [ 219.038519][ T8062] amdgpu_drm_ioctl+0x49/0x80 [amdgpu 28d5a9707e1f6057a70f2fd1d73854000d464e2e]
2023-09-20T15:06:06.968743+02:00 fermium kernel: [ 219.038694][ T8062] __x64_sys_ioctl+0x92/0xd0
2023-09-20T15:06:06.968746+02:00 fermium kernel: [ 219.038702][ T8062] do_syscall_64+0x5b/0x80
2023-09-20T15:06:06.968750+02:00 fermium kernel: [ 219.038708][ T8062] ? srso_return_thunk+0x5/0x10
2023-09-20T15:06:06.968753+02:00 fermium kernel: [ 219.038711][ T8062] ? srso_return_thunk+0x5/0x10
2023-09-20T15:06:06.968756+02:00 fermium kernel: [ 219.038715][ T8062] ? __x64_sys_futex+0x5e/0x1d0
2023-09-20T15:06:06.968760+02:00 fermium kernel: [ 219.038721][ T8062] ? srso_return_thunk+0x5/0x10
2023-09-20T15:06:06.968762+02:00 fermium kernel: [ 219.038725][ T8062] ? syscall_exit_to_user_mode+0x28/0x40
2023-09-20T15:06:06.968765+02:00 fermium kernel: [ 219.038731][ T8062] ? srso_return_thunk+0x5/0x10
2023-09-20T15:06:06.968768+02:00 fermium kernel: [ 219.038735][ T8062] ? syscall_exit_to_user_mode+0x28/0x40
2023-09-20T15:06:06.968771+02:00 fermium kernel: [ 219.038740][ T8062] ? srso_return_thunk+0x5/0x10
2023-09-20T15:06:06.968775+02:00 fermium kernel: [ 219.038743][ T8062] ? do_syscall_64+0x67/0x80
2023-09-20T15:06:06.968780+02:00 fermium kernel: [ 219.038747][ T8062] ? syscall_exit_to_user_mode+0x28/0x40
2023-09-20T15:06:06.968783+02:00 fermium kernel: [ 219.038752][ T8062] ? srso_return_thunk+0x5/0x10
2023-09-20T15:06:06.968786+02:00 fermium kernel: [ 219.038756][ T8062] ? do_syscall_64+0x67/0x80
2023-09-20T15:06:06.968789+02:00 fermium kernel: [ 219.038760][ T8062] ? do_syscall_64+0x67/0x80
2023-09-20T15:06:06.968792+02:00 fermium kernel: [ 219.038764][ T8062] ? do_syscall_64+0x67/0x80
2023-09-20T15:06:06.968795+02:00 fermium kernel: [ 219.038768][ T8062] ? srso_return_thunk+0x5/0x10
2023-09-20T15:06:06.968798+02:00 fermium kernel: [ 219.038772][ T8062] entry_SYSCALL_64_after_hwframe+0x61/0xcb
2023-09-20T15:06:06.968801+02:00 fermium kernel: [ 219.038778][ T8062] RIP: 0033:0x7fbae5d02437
2023-09-20T15:06:06.968804+02:00 fermium kernel: [ 219.038782][ T8062] Code: 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 66 90 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 29 da 0d 00 f7 d8 64 89 01 48
2023-09-20T15:06:06.968807+02:00 fermium kernel: [ 219.038786][ T8062] RSP: 002b:00007fbac6fa7888 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
2023-09-20T15:06:06.968811+02:00 fermium kernel: [ 219.038791][ T8062] RAX: ffffffffffffffda RBX: 00007fbac6fa79e8 RCX: 00007fbae5d02437
2023-09-20T15:06:06.968813+02:00 fermium kernel: [ 219.038794][ T8062] RDX: 00007fbac6fa7900 RSI: 00000000c0186444 RDI: 0000000000000009
2023-09-20T15:06:06.968817+02:00 fermium kernel: [ 219.038797][ T8062] RBP: 00007fbac6fa7900 R08: 00007fbac6fa7a40 R09: 0000000000000020
2023-09-20T15:06:06.968822+02:00 fermium kernel: [ 219.038800][ T8062] R10: 00007fbac6fa7a40 R11: 0000000000000246 R12: 00000000c0186444
2023-09-20T15:06:06.968826+02:00 fermium kernel: [ 219.038802][ T8062] R13: 0000000000000009 R14: 00005636906067b0 R15: 0000000000000020
2023-09-20T15:06:06.968830+02:00 fermium kernel: [ 219.038813][ T8062] </TASK>
2023-09-20T15:06:06.968833+02:00 fermium kernel: [ 219.038815][ T8062] ---[ end trace 3d22bee0ff27a2c8 ]---

--
You are receiving this mail because:
You are the assignee for the bug.
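To gauge how often the warning above repeats in a log, something along these lines could be used. This is only a rough sketch: the function name `count_amdgpu_warnings` is hypothetical, and the pattern is an assumption derived from the single WARNING line quoted in the report, so it may need adjusting for other kernels or log formats.

```python
import re
from typing import Iterable, Tuple

# Assumption: every occurrence starts with a header shaped like the one in
# the report ("WARNING: CPU: N PID: N at <file:line> amdgpu_sync_keep_later+...").
WARNING_RE = re.compile(
    r"WARNING: CPU: \d+ PID: \d+ at \S+ amdgpu_sync_keep_later\+"
)


def count_amdgpu_warnings(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (matching WARNING headers, total lines read)."""
    hits = 0
    total = 0
    for line in lines:
        total += 1
        if WARNING_RE.search(line):
            hits += 1
    return hits, total
```

Because it accepts any iterable of lines, it can be fed an open file handle (for example one opened on /var/log/messages with `errors="replace"`), which reads the log line by line instead of loading a multi-gigabyte file into memory.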
https://bugzilla.suse.com/show_bug.cgi?id=1215523
https://bugzilla.suse.com/show_bug.cgi?id=1215523#c1

--- Comment #1 from RELAW GmbH <bugs@clearingstelle-eeg-kwkg.de> ---
The processor is an AMD Ryzen 5 PRO 4650U in an Acer TravelMate P215.
https://bugzilla.suse.com/show_bug.cgi?id=1215523

RELAW GmbH <bugs@clearingstelle-eeg-kwkg.de> changed:

What     |Removed   |Added
----------------------------------------------------------------------------
Priority |P5 - None |P1 - Urgent
https://bugzilla.suse.com/show_bug.cgi?id=1215523
https://bugzilla.suse.com/show_bug.cgi?id=1215523#c18

Barry Foxbat <barryfoxbat@btinternet.com> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC   |        |barryfoxbat@btinternet.com

--- Comment #18 from Barry Foxbat <barryfoxbat@btinternet.com> ---
This bug has caused me a big headache this last week or so. The laptop has been running super hot, and I didn't know why. I even opened up the machine to look for possible dust build-up. The machine could easily have cratered because of it all.

This morning I found this thread and reverted the kernel from the installed 84.1 to the available 81.1. That didn't immediately stop the machine heating up at the slightest bit of load. However, I looked in /var/log and found absolutely humungous log files (up to 45 GB each), two for each day since 22 Sept 2023. No wonder the machine's been screaming at me!

I can't believe this bug hasn't been highlighted to a wider audience!
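For anyone wanting to check whether their own /var/log has ballooned in the same way, a small helper like the following may be enough. It is a minimal sketch, not a recommended tool: the name `large_files` and the choice of threshold are assumptions made for illustration.

```python
import os
import stat
from typing import List, Tuple


def large_files(root: str, min_bytes: int) -> List[Tuple[int, str]]:
    """Return (size, path) for regular files under root that are at least
    min_bytes in size, largest first."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.lstat(path)  # lstat: do not follow symlinks
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if stat.S_ISREG(info.st_mode) and info.st_size >= min_bytes:
                found.append((info.st_size, path))
    return sorted(found, reverse=True)
```

Calling `large_files("/var/log", 1024**3)` would list files of 1 GiB or more; reading some log directories may require root privileges, in which case unreadable subdirectories are silently skipped by `os.walk`.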
https://bugzilla.suse.com/show_bug.cgi?id=1215523
https://bugzilla.suse.com/show_bug.cgi?id=1215523#c25

--- Comment #25 from Andreas Stieger <Andreas.Stieger@gmx.de> ---
*** Bug 1215990 has been marked as a duplicate of this bug. ***