[Bug 1205774] New: Kernel 6.0.8 + nvidia 525.53 crash on BUG: kernel NULL pointer dereference, #PF: supervisor instruction fetch in kernel mode
http://bugzilla.opensuse.org/show_bug.cgi?id=1205774 Bug ID: 1205774 Summary: Kernel 6.0.8 + nvidia 525.53 crash on BUG: kernel NULL pointer dereference, #PF: supervisor instruction fetch in kernel mode Classification: openSUSE Product: openSUSE Tumbleweed Version: Current Hardware: x86-64 OS: Other Status: NEW Severity: Major Priority: P5 - None Component: Kernel Assignee: kernel-bugs@opensuse.org Reporter: bruno@ioda-net.ch QA Contact: qa-bugs@suse.de Found By: --- Blocker: --- Created attachment 863125 --> http://bugzilla.opensuse.org/attachment.cgi?id=863125&action=edit Last kernel 6.0.8 boot and crash Hello I need help to find the root cause of crashes (complete freeze of the workstation). Since the introduction of kernel 6.0.8 and latest nvidia 525.53 I constantly face complete machine freeze, when I'm using my external monitors. nov 25 16:05:34 qt-kt kernel: BUG: kernel NULL pointer dereference, address: 0000000000000000 nov 25 16:05:34 qt-kt kernel: #PF: supervisor instruction fetch in kernel mode nov 25 16:05:34 qt-kt kernel: #PF: error_code(0x0010) - not-present page nov 25 16:05:34 qt-kt kernel: PGD 0 P4D 0 nov 25 16:05:34 qt-kt kernel: Oops: 0010 [#1] PREEMPT SMP PTI nov 25 16:05:34 qt-kt kernel: CPU: 3 PID: 1004 Comm: nvidia-modeset/ Tainted: P OE 6.0.8-1-default #1 openSUSE Tumbleweed 9d20364b934f5aab0a9bdf84e8f45cfdfae39dab nov 25 16:05:34 qt-kt kernel: Hardware name: Dell Inc. Precision 7510/0YH43H, BIOS 1.29.3 09/18/2022 nov 25 16:05:34 qt-kt kernel: RIP: 0010:0x0 nov 25 16:05:34 qt-kt kernel: Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6. 
nov 25 16:05:34 qt-kt kernel: RSP: 0018:ffffadc300703c90 EFLAGS: 00010207 nov 25 16:05:34 qt-kt kernel: RAX: 0000000000000000 RBX: ffff9b2996c75378 RCX: 0000000000000004 nov 25 16:05:34 qt-kt kernel: RDX: 0000000000000010 RSI: ffff9b2996c75378 RDI: ffffadc300703a10 nov 25 16:05:34 qt-kt kernel: RBP: ffff9b292ef8b560 R08: 0000000000000000 R09: 0000000000000040 nov 25 16:05:34 qt-kt kernel: R10: ffff9b2886b28008 R11: ffffadc300703c00 R12: 0000000000000000 nov 25 16:05:34 qt-kt kernel: R13: 0000000000000000 R14: ffffffffc4fbee70 R15: ffffffffc4eb7580 nov 25 16:05:34 qt-kt kernel: FS: 0000000000000000(0000) GS:ffff9b37c44c0000(0000) knlGS:0000000000000000 nov 25 16:05:34 qt-kt kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 nov 25 16:05:34 qt-kt kernel: CR2: ffffffffffffffd6 CR3: 000000075be10001 CR4: 00000000003706e0 nov 25 16:05:34 qt-kt kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 nov 25 16:05:34 qt-kt kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 nov 25 16:05:34 qt-kt kernel: Call Trace: nov 25 16:05:34 qt-kt kernel: <TASK> nov 25 16:05:34 qt-kt kernel: _nv001281kms+0x194/0x1a0 [nvidia_modeset 4cdb9c16932eaf3d76d37eaaf2544a31bf3316d8] nov 25 16:05:34 qt-kt kernel: ? _nv001274kms+0x9a/0xd0 [nvidia_modeset 4cdb9c16932eaf3d76d37eaaf2544a31bf3316d8] nov 25 16:05:34 qt-kt kernel: ? _nv001466kms+0x172/0x180 [nvidia_modeset 4cdb9c16932eaf3d76d37eaaf2544a31bf3316d8] nov 25 16:05:34 qt-kt kernel: ? _nv001149kms+0xf3/0xa50 [nvidia_modeset 4cdb9c16932eaf3d76d37eaaf2544a31bf3316d8] nov 25 16:05:34 qt-kt kernel: ? schedule+0x5a/0xd0 nov 25 16:05:34 qt-kt kernel: ? schedule_timeout+0x10e/0x150 nov 25 16:05:34 qt-kt kernel: ? nvkms_sema_up+0x10/0x10 [nvidia_modeset 4cdb9c16932eaf3d76d37eaaf2544a31bf3316d8] nov 25 16:05:34 qt-kt kernel: ? __down_common+0x10b/0x1e0 nov 25 16:05:34 qt-kt kernel: ? nvkms_sema_up+0x10/0x10 [nvidia_modeset 4cdb9c16932eaf3d76d37eaaf2544a31bf3316d8] nov 25 16:05:34 qt-kt kernel: ? 
_nv002457kms+0x2d/0x70 [nvidia_modeset 4cdb9c16932eaf3d76d37eaaf2544a31bf3316d8] nov 25 16:05:34 qt-kt kernel: ? nvkms_kthread_q_callback+0x88/0xf0 [nvidia_modeset 4cdb9c16932eaf3d76d37eaaf2544a31bf3316d8] nov 25 16:05:34 qt-kt kernel: ? _main_loop+0x77/0x130 [nvidia_modeset 4cdb9c16932eaf3d76d37eaaf2544a31bf3316d8] nov 25 16:05:34 qt-kt kernel: ? kthread+0xd7/0x100 nov 25 16:05:34 qt-kt kernel: ? kthread_complete_and_exit+0x20/0x20 nov 25 16:05:34 qt-kt kernel: ? ret_from_fork+0x1f/0x30 nov 25 16:05:34 qt-kt kernel: </TASK> nov 25 16:05:34 qt-kt kernel: Modules linked in: rfcomm snd_seq_dummy snd_hrtimer snd_seq nf_nat_sip nft_objref af_packet nf_conntrack_sip nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct n> nov 25 16:05:34 qt-kt kernel: processor_thermal_device_pci_legacy irqbypass iwlwifi processor_thermal_device pcspkr ledtrig_audio processor_thermal_rfim snd_usbmidi_lib efi_pstore snd_intel_sdw_acpi sparse_keymap processor_thermal_mbox cdc_wdm videob> nov 25 16:05:34 qt-kt kernel: fuse configfs dmi_sysfs ip_tables x_tables cmac algif_hash algif_skcipher af_alg hid_logitech_hidpp hid_logitech_dj hid_generic dm_crypt usbhid essiv authenc trusted asn1_encoder tee bnep btusb btrtl btbcm btintel btmtk > nov 25 16:05:34 qt-kt kernel: CR2: 0000000000000000 nov 25 16:05:34 qt-kt kernel: ---[ end trace 0000000000000000 ]--- Most of the time this happens after the screens resume: the primary one comes on, but the second always stays in suspend mode. I have to open either the display settings in Plasma or nvidia-settings to turn on the second monitor (DisplayPort slave link) after switching the monitor off and on. -- You are receiving this mail because: You are on the CC list for the bug.
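For readers decoding the oops above, here is a minimal sketch of how the `#PF: error_code(0x0010)` value maps to the two "#PF:" lines printed next to it. It assumes the standard x86 page-fault error code bit layout; the helper name `decode_pf_error_code` is hypothetical and is not part of any kernel tool.

```python
# x86 page-fault error code bits (architectural definitions).
PF_PRESENT = 1 << 0  # 0: page not present, 1: protection violation
PF_WRITE = 1 << 1    # 0: read access, 1: write access
PF_USER = 1 << 2     # 0: supervisor (kernel) mode, 1: user mode
PF_RSVD = 1 << 3     # reserved bit set in a paging-structure entry
PF_INSTR = 1 << 4    # fault caused by an instruction fetch


def decode_pf_error_code(code: int) -> dict:
    """Return a readable breakdown of an x86 #PF error code (hypothetical helper)."""
    if code & PF_INSTR:
        access = "instruction fetch"
    elif code & PF_WRITE:
        access = "write"
    else:
        access = "read"
    return {
        "access": access,
        "mode": "user" if code & PF_USER else "supervisor",
        "cause": "protection violation" if code & PF_PRESENT else "not-present page",
        "reserved_bit": bool(code & PF_RSVD),
    }


if __name__ == "__main__":
    # Value taken from the "#PF: error_code(0x0010)" line of the oops.
    print(decode_pf_error_code(0x0010))
```

Only bit 4 is set, so the fault is a supervisor-mode instruction fetch from a not-present page. Together with `RIP: 0010:0x0`, that is what one would typically expect from a call through a NULL function pointer, although the trace alone does not prove it.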
http://bugzilla.opensuse.org/show_bug.cgi?id=1205774 http://bugzilla.opensuse.org/show_bug.cgi?id=1205774#c1 --- Comment #1 from Bruno Friedmann <bruno@ioda-net.ch> --- installed kernel S | Name | Type | Version | Arch | Repository ---+-----------------------------+---------+------------------------+--------+------------------ i+ | kernel-default | package | 6.0.7-1.1 | x86_64 | (System Packages) i+ | kernel-default | package | 6.0.6-1.1 | x86_64 | (System Packages) i+ | kernel-default | package | 6.0.8-1.1 | x86_64 | oss i+ | kernel-default-devel | package | 6.0.7-1.1 | x86_64 | (System Packages) i+ | kernel-default-devel | package | 6.0.6-1.1 | x86_64 | (System Packages) i+ | kernel-default-devel | package | 6.0.8-1.1 | x86_64 | oss i+ | kernel-devel | package | 6.0.7-1.1 | noarch | (System Packages) i+ | kernel-devel | package | 6.0.6-1.1 | noarch | (System Packages) i+ | kernel-devel | package | 6.0.8-1.1 | noarch | oss i+ | kernel-firmware-all | package | 20221109-1.1 | noarch | oss i+ | kernel-macros | package | 6.0.8-1.1 | noarch | oss i+ | kernel-obs-build | package | 6.0.8-1.1 | x86_64 | oss i+ | kernel-syms | package | 6.0.7-1.1 | x86_64 | (System Packages) i+ | kernel-syms | package | 6.0.6-1.1 | x86_64 | (System Packages) i+ | kernel-syms | package | 6.0.8-1.1 | x86_64 | oss installed nvidia (official repo) S | Name | Type | Version | Arch | Repository ---+---------------------------+---------+----------------------+--------+----------- i+ | kernel-firmware-nvidia | package | 20221109-1.1 | noarch | oss i+ | libnvidia-egl-wayland1 | package | 1.1.11-1.1 | x86_64 | oss i+ | nvidia-computeG06 | package | 525.53-16.1 | x86_64 | nvidia i+ | nvidia-computeG06-32bit | package | 525.53-16.1 | x86_64 | nvidia i+ | nvidia-gfxG06-kmp-default | package | 525.53_k6.0.8_1-16.3 | x86_64 | nvidia i+ | nvidia-glG06 | package | 525.53-16.1 | x86_64 | nvidia i+ | nvidia-glG06-32bit | package | 525.53-16.1 | x86_64 | nvidia i+ | nvidia-texture-tools | package | 2.1.2-2.8 
| x86_64 | oss i+ | x11-video-nvidiaG06 | package | 525.53-16.1 | x86_64 | nvidia i+ | x11-video-nvidiaG06-32bit | package | 525.53-16.1 | x86_64 | nvidia NAME="openSUSE Tumbleweed" # VERSION="20221123" -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.opensuse.org/show_bug.cgi?id=1205774 http://bugzilla.opensuse.org/show_bug.cgi?id=1205774#c2 --- Comment #2 from Bruno Friedmann <bruno@ioda-net.ch> --- Created attachment 863126 --> http://bugzilla.opensuse.org/attachment.cgi?id=863126&action=edit lshw full The hardware is 6 years old now, and it always worked until recently. I have another computer running the same TW version, a workstation rather than a laptop, with an AMD CPU and an nvidia card, and that one doesn't seem to have these crashes.
http://bugzilla.opensuse.org/show_bug.cgi?id=1205774 Takashi Iwai <tiwai@suse.com> changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |sndirsch@suse.com, | |tiwai@suse.com
http://bugzilla.opensuse.org/show_bug.cgi?id=1205774 http://bugzilla.opensuse.org/show_bug.cgi?id=1205774#c3 Stefan Dirsch <sndirsch@suse.com> changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |ddadap@nvidia.com --- Comment #3 from Stefan Dirsch <sndirsch@suse.com> --- Hmm. No idea. Adding my contact at NVIDIA.
http://bugzilla.opensuse.org/show_bug.cgi?id=1205774 http://bugzilla.opensuse.org/show_bug.cgi?id=1205774#c4 --- Comment #4 from Bruno Friedmann <bruno@ioda-net.ch> --- With kernel 6.0.10-1.1 and nvidia 525.60.11 I'm now seeing a lot of [drm:nv_drm_fence_context_create_ioctl [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate fence signaling event [drm:nv_drm_fence_context_create_ioctl [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate fence signaling event I will change the DP cable next week in case it makes a difference.
http://bugzilla.opensuse.org/show_bug.cgi?id=1205774 http://bugzilla.opensuse.org/show_bug.cgi?id=1205774#c5 --- Comment #5 from Bruno Friedmann <bruno@ioda-net.ch> --- Under Wayland I got more crashes of this kind: [170893.490622] plasmashell[3405]: segfault at 8 ip 00007f27a65bd546 sp 00007ffe8c057ac0 error 4 in libQt5Gui.so.5.15.7[7f27a6511000+4ec000] [170893.490633] Code: 00 00 00 90 48 8b 36 48 03 76 10 e9 d4 1b f6 ff 0f 1f 40 00 53 48 89 fb 48 83 ec 10 64 48 8b 04 25 28 00 00 00 48 89 44 24 08 <48> 8b 46 08 48 8b b0 88 00 00 00 48 85 f6 74 22 48 8b 06 ff 50 18 [170904.203109] [drm:nv_drm_fence_context_create_ioctl [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate fence signaling event
http://bugzilla.opensuse.org/show_bug.cgi?id=1205774 http://bugzilla.opensuse.org/show_bug.cgi?id=1205774#c6 --- Comment #6 from Bruno Friedmann <bruno@ioda-net.ch> --- To Stefan: really strange. Since yesterday I've removed the cmdline parameter simplefb=1 and I simply haven't experienced any crash, kernel freeze, or Firefox code error, and both external screens work. Maybe that parameter is no longer needed?
http://bugzilla.opensuse.org/show_bug.cgi?id=1205774 http://bugzilla.opensuse.org/show_bug.cgi?id=1205774#c7 Stefan Dirsch <sndirsch@suse.com> changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |bruno@ioda-net.ch Flags| |needinfo?(bruno@ioda-net.ch | |) --- Comment #7 from Stefan Dirsch <sndirsch@suse.com> --- (In reply to Bruno Friedmann from comment #6)
> To Stefan: really strange. Since yesterday I've removed the cmdline parameter simplefb=1
simplefb=1? It should be nosimplefb=1!
> and I simply haven't experienced any crash, kernel freeze, or Firefox code error, and both external screens work.
> Maybe that parameter is no longer needed?
Could you double-check which option you had set?
http://bugzilla.opensuse.org/show_bug.cgi?id=1205774 http://bugzilla.opensuse.org/show_bug.cgi?id=1205774#c8 Bruno Friedmann <bruno@ioda-net.ch> changed: What |Removed |Added ---------------------------------------------------------------------------- Flags|needinfo?(bruno@ioda-net.ch | |) | --- Comment #8 from Bruno Friedmann <bruno@ioda-net.ch> --- Yes sorry, non-ECC memory, nosimplefb=1 is added when nvidia kmp are installed. I've now removed this one and rebuilt the initrd (mkinitrd).
http://bugzilla.opensuse.org/show_bug.cgi?id=1205774 http://bugzilla.opensuse.org/show_bug.cgi?id=1205774#c9 Stefan Dirsch <sndirsch@suse.com> changed: What |Removed |Added ---------------------------------------------------------------------------- Flags| |needinfo?(bruno@ioda-net.ch | |) --- Comment #9 from Stefan Dirsch <sndirsch@suse.com> --- (In reply to Bruno Friedmann from comment #8)
> Yes sorry, non-ECC memory,
?
> nosimplefb=1 is added when nvidia kmp are installed.
Yes, of course, but I don't know whether you added a simplefb=1 option ...
> I've now removed this one and rebuilt the initrd (mkinitrd).
So you're testing again with nosimplefb=1?
http://bugzilla.opensuse.org/show_bug.cgi?id=1205774 http://bugzilla.opensuse.org/show_bug.cgi?id=1205774#c10 Bruno Friedmann <bruno@ioda-net.ch> changed: What |Removed |Added ---------------------------------------------------------------------------- Flags|needinfo?(bruno@ioda-net.ch | |) | --- Comment #10 from Bruno Friedmann <bruno@ioda-net.ch> --- Currently, with TW 20221205, I'm using these packages: kernel-default 6.0.10-1.1, kernel-firmware-nvidia 20221130-1.1, nvidia 525.60.11-15.1. I've removed the nosimplefb=1 parameter, and since then I haven't seen any crash, hang, or spurious message about Firefox or other programs segfaulting. My two external screens are working again without freezing the computer. About nosimplefb: my laptop has the internal Intel GPU deactivated and only the NVIDIA Quadro GPU is in use. I'm using EFI, and the console already starts at 4K resolution (at the GRUB prompt, for example). I don't know if that makes a difference. From now on I will have to watch every nvidia install, remove the parameter, and recreate the initrd afterwards.
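Since the comment above mentions having to re-check for `nosimplefb=1` after every nvidia install, here is a minimal sketch of such a check. The function names and the sample command line are hypothetical, invented for illustration; on a live Linux system the string would come from `/proc/cmdline`. The parser is deliberately simple and does not handle quoted values that contain spaces.

```python
def cmdline_params(cmdline: str) -> dict:
    """Split a kernel command line into {name: value}.

    Flags without '=' map to None. Quoted values containing spaces
    are not handled by this simple sketch.
    """
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def has_nosimplefb(cmdline: str) -> bool:
    """True if the command line carries nosimplefb=1."""
    return cmdline_params(cmdline).get("nosimplefb") == "1"


if __name__ == "__main__":
    # Hypothetical sample; on a real system use:
    #   open("/proc/cmdline").read()
    sample = "BOOT_IMAGE=/boot/vmlinuz root=/dev/sda2 quiet nosimplefb=1"
    print("nosimplefb=1 present:", has_nosimplefb(sample))
```

Note that `/proc/cmdline` only reflects the currently running boot; a parameter removed from the bootloader configuration will still show up there until the next reboot.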
http://bugzilla.opensuse.org/show_bug.cgi?id=1205774 http://bugzilla.opensuse.org/show_bug.cgi?id=1205774#c11 Stefan Dirsch <sndirsch@suse.com> changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |IN_PROGRESS Component|Kernel |X11 3rd Party Driver Assignee|kernel-bugs@opensuse.org |gfx-bugs@suse.de Summary|Kernel 6.0.8 + nvidia |Kernel 6.0.8 + nvidia >= |525.53 crash on BUG: kernel |525.53 + nosimplefb=1: |NULL pointer dereference, |crash on BUG: kernel NULL |#PF: supervisor instruction |pointer dereference, #PF: |fetch in kernel mode |supervisor instruction | |fetch in kernel mode QA Contact|qa-bugs@suse.de |sndirsch@suse.com --- Comment #11 from Stefan Dirsch <sndirsch@suse.com> --- First time I hear about issues with nvidia driver, which go away by re-enabling simplefb/simpledrm. Tracking now. -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.opensuse.org/show_bug.cgi?id=1205774 Stefan Dirsch <sndirsch@suse.com> changed: What |Removed |Added ---------------------------------------------------------------------------- Priority|P5 - None |P3 - Medium Assignee|gfx-bugs@suse.de |sndirsch@suse.com
http://bugzilla.opensuse.org/show_bug.cgi?id=1205774 http://bugzilla.opensuse.org/show_bug.cgi?id=1205774#c12 --- Comment #12 from Bruno Friedmann <bruno@ioda-net.ch> --- Created attachment 863420 --> http://bugzilla.opensuse.org/attachment.cgi?id=863420&action=edit kernel 6.10.1 / nvidia 525.60.11 crash after screen resume Here is my latest attempt, which proves that having nosimplefb present or not doesn't really matter. Just after the crash I ran the full hardware diagnostic, which returned no errors... Good news for my wallet, but it doesn't help to understand why this is crashing. I've now completely removed the content of the ~/.local/share/kscreen directory and started again from scratch. My two external screens are daisy-chained: Laptop -> DP cable -> Dell P2715Q (MST set to primary) -> DP cable -> HP ZR2240w. Again, that setup has worked since 2016. I'm puzzled, because for the last two days it was working without any crash. Just tell me if you want me to run any tests. Thanks for your attention.
http://bugzilla.opensuse.org/show_bug.cgi?id=1205774 http://bugzilla.opensuse.org/show_bug.cgi?id=1205774#c13 --- Comment #13 from Bruno Friedmann <bruno@ioda-net.ch> --- To ease readiness last trace was kernel 6.0.10, nvidia 525.60.11 TW 20221205 Dec 08 15:09:52 qt-kt kernel: BUG: kernel NULL pointer dereference, address: 0000000000000000 Dec 08 15:09:52 qt-kt kernel: #PF: supervisor instruction fetch in kernel mode Dec 08 15:09:52 qt-kt kernel: #PF: error_code(0x0010) - not-present page Dec 08 15:09:52 qt-kt kernel: PGD 0 P4D 0 Dec 08 15:09:52 qt-kt kernel: Oops: 0010 [#1] PREEMPT SMP PTI Dec 08 15:09:52 qt-kt kernel: CPU: 7 PID: 1004 Comm: nvidia-modeset/ Tainted: P OE 6.0.10-1-default #1 openSUSE Tumbleweed 4a16f579cfdae0483b65cebdea4b5e462535ee0d Dec 08 15:09:52 qt-kt kernel: Hardware name: Dell Inc. Precision 7510/0YH43H, BIOS 1.29.3 09/18/2022 Dec 08 15:09:52 qt-kt kernel: RIP: 0010:0x0 Dec 08 15:09:52 qt-kt kernel: Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6. Dec 08 15:09:52 qt-kt kernel: RSP: 0018:ffffba0f030efc90 EFLAGS: 00010203 Dec 08 15:09:52 qt-kt kernel: RAX: 0000000000000000 RBX: ffff984eca530378 RCX: 0000000000000004 Dec 08 15:09:52 qt-kt kernel: RDX: 0000000000000010 RSI: ffff984eca530378 RDI: ffffba0f030efa10 Dec 08 15:09:52 qt-kt kernel: RBP: ffff9848c7dafce0 R08: 0000000000000000 R09: 0000000000000040 Dec 08 15:09:52 qt-kt kernel: R10: ffff9845243c8008 R11: ffffba0f030efc00 R12: 0000000000000000 Dec 08 15:09:52 qt-kt kernel: R13: 0000000000000000 R14: ffffffffc523feb0 R15: ffffffffc51376e0 Dec 08 15:09:52 qt-kt kernel: FS: 0000000000000000(0000) GS:ffff9854045c0000(0000) knlGS:0000000000000000 Dec 08 15:09:52 qt-kt kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Dec 08 15:09:52 qt-kt kernel: CR2: ffffffffffffffd6 CR3: 00000007d0410005 CR4: 00000000003706e0 Dec 08 15:09:52 qt-kt kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 Dec 08 15:09:52 qt-kt kernel: DR3: 0000000000000000 DR6: 
00000000fffe0ff0 DR7: 0000000000000400 Dec 08 15:09:52 qt-kt kernel: Call Trace: Dec 08 15:09:52 qt-kt kernel: <TASK> Dec 08 15:09:52 qt-kt kernel: _nv001282kms+0x194/0x1a0 [nvidia_modeset 013906695b05c0c02abd73cae7e9a032761f6948] Dec 08 15:09:52 qt-kt kernel: ? _nv001275kms+0x9a/0xd0 [nvidia_modeset 013906695b05c0c02abd73cae7e9a032761f6948] Dec 08 15:09:52 qt-kt kernel: ? _nv001467kms+0x172/0x180 [nvidia_modeset 013906695b05c0c02abd73cae7e9a032761f6948] Dec 08 15:09:52 qt-kt kernel: ? _nv001150kms+0xf3/0xa50 [nvidia_modeset 013906695b05c0c02abd73cae7e9a032761f6948] Dec 08 15:09:52 qt-kt kernel: ? schedule+0x5a/0xd0 Dec 08 15:09:52 qt-kt kernel: ? schedule_timeout+0x10e/0x150 Dec 08 15:09:52 qt-kt kernel: ? __down_common+0x10b/0x1e0 Dec 08 15:09:52 qt-kt kernel: ? nvkms_sema_up+0x10/0x10 [nvidia_modeset 013906695b05c0c02abd73cae7e9a032761f6948] Dec 08 15:09:52 qt-kt kernel: ? _nv002458kms+0x2d/0x70 [nvidia_modeset 013906695b05c0c02abd73cae7e9a032761f6948] Dec 08 15:09:52 qt-kt kernel: ? nvkms_kthread_q_callback+0xab/0x130 [nvidia_modeset 013906695b05c0c02abd73cae7e9a032761f6948] Dec 08 15:09:52 qt-kt kernel: ? _main_loop+0x77/0x130 [nvidia_modeset 013906695b05c0c02abd73cae7e9a032761f6948] Dec 08 15:09:52 qt-kt kernel: ? kthread+0xd7/0x100 Dec 08 15:09:52 qt-kt kernel: ? kthread_complete_and_exit+0x20/0x20 Dec 08 15:09:52 qt-kt kernel: ? 
ret_from_fork+0x1f/0x30 Dec 08 15:09:52 qt-kt kernel: </TASK> Dec 08 15:09:52 qt-kt kernel: Modules linked in: isofs cdrom tun overlay rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache netfs rfcomm snd_seq_dummy snd_hrtimer snd_seq nf_nat_sip nft_objref af_packet nf_conntrack_sip nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_tables ebtable_nat ebtable_broute ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw iptable_security ip_set nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter bpfilter nvidia_drm(POE) nvidia_modeset(POE) intel_tcc_cooling x86_pkg_temp_thermal iwlmvm snd_ctl_led intel_powerclamp nvidia_uvm(POE) coretemp iTCO_wdt intel_pmc_bxt snd_hda_codec_realtek dell_laptop snd_hda_codec_generic iTCO_vendor_support kvm_intel snd_hda_codec_hdmi mei_pxp mei_wdt mei_hdcp uvcvideo ee1004 mac80211 snd_usb_audio ppdev snd_hda_intel intel_rapl_msr Dec 08 15:09:52 qt-kt kernel: videobuf2_vmalloc cdc_mbim cdc_wdm cdc_ncm videobuf2_memops libarc4 snd_usbmidi_lib dell_wmi processor_thermal_device_pci_legacy snd_intel_dspcfg videobuf2_v4l2 snd_intel_sdw_acpi dell_smm_hwmon kvm processor_thermal_device snd_rawmidi cdc_ether videobuf2_common iwlwifi dell_smbios usbnet snd_seq_device i2c_i801 snd_hda_codec irqbypass dcdbas pcspkr ledtrig_audio qcserial videodev sparse_keymap firmware_attributes_class intel_wmi_thunderbolt nvidia(POE) usb_wwan dell_wmi_descriptor efi_pstore mii e1000e mxm_wmi wmi_bmof i2c_smbus snd_hda_core usbserial cfg80211 mc processor_thermal_rfim snd_hwdep snd_pcm processor_thermal_mbox processor_thermal_rapl snd_timer mei_me snd intel_rapl_common nls_iso8859_1 intel_pch_thermal mei thermal intel_soc_dts_iosf ie31200_edac soundcore nls_cp437 vfat fat tiny_power_button parport_pc int3403_thermal parport dell_smo8800 int3402_thermal int3400_thermal 
acpi_thermal_rel int340x_thermal_zone intel_pmc_core dell_rbtn button Dec 08 15:09:52 qt-kt kernel: acpi_pad ac joydev nfsd auth_rpcgss nfs_acl lockd grace fuse configfs sunrpc dmi_sysfs ip_tables x_tables cmac algif_hash algif_skcipher af_alg hid_logitech_hidpp hid_logitech_dj hid_generic usbhid dm_crypt essiv authenc trusted asn1_encoder tee bnep btusb btrtl btbcm btintel btmtk bluetooth ecdh_generic rfkill crct10dif_pclmul crc32_pclmul polyval_clmulni polyval_generic gf128mul ghash_clmulni_intel xhci_pci rtsx_pci_sdmmc xhci_pci_renesas mmc_core xhci_hcd aesni_intel nvme crypto_simd cryptd nvme_core usbcore rtsx_pci battery wmi video serio_raw btrfs blake2b_generic libcrc32c crc32c_intel xor raid6_pq dm_mirror dm_region_hash dm_log l2tp_ppp l2tp_netlink l2tp_core ip6_udp_tunnel udp_tunnel pppox ppp_generic slhc sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua br_netfilter bridge stp llc msr efivarfs Dec 08 15:09:52 qt-kt kernel: CR2: 0000000000000000 Dec 08 15:09:52 qt-kt kernel: ---[ end trace 0000000000000000 ]--- Dec 08 15:09:52 qt-kt kernel: RIP: 0010:0x0 Dec 08 15:09:52 qt-kt kernel: Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6. 
Dec 08 15:09:52 qt-kt kernel: RSP: 0018:ffffba0f030efc90 EFLAGS: 00010203 Dec 08 15:09:52 qt-kt kernel: RAX: 0000000000000000 RBX: ffff984eca530378 RCX: 0000000000000004 Dec 08 15:09:52 qt-kt kernel: RDX: 0000000000000010 RSI: ffff984eca530378 RDI: ffffba0f030efa10 Dec 08 15:09:52 qt-kt kernel: RBP: ffff9848c7dafce0 R08: 0000000000000000 R09: 0000000000000040 Dec 08 15:09:52 qt-kt kernel: R10: ffff9845243c8008 R11: ffffba0f030efc00 R12: 0000000000000000 Dec 08 15:09:52 qt-kt kernel: R13: 0000000000000000 R14: ffffffffc523feb0 R15: ffffffffc51376e0 Dec 08 15:09:52 qt-kt kernel: FS: 0000000000000000(0000) GS:ffff9854045c0000(0000) knlGS:0000000000000000 Dec 08 15:09:52 qt-kt kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Dec 08 15:09:52 qt-kt kernel: CR2: ffffffffffffffd6 CR3: 00000007d0410005 CR4: 00000000003706e0 Dec 08 15:09:52 qt-kt kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 Dec 08 15:09:52 qt-kt kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 b -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.opensuse.org/show_bug.cgi?id=1205774 http://bugzilla.opensuse.org/show_bug.cgi?id=1205774#c14 --- Comment #14 from Bruno Friedmann <bruno@ioda-net.ch> --- With kernel 6.0.12 + nvidia 525.60.11 the issue is still the same. I'm now able to reproduce the crash 100% of the time, which leads me to believe it has something to do with DisplayPort Multi-Stream Transport. The setup here is: laptop docked to a Dell E-Plus dock, main DP goes to a Dell 27" 4K P2715Q monitor. The second DP port on that monitor is linked to the second external monitor. There's no issue if the first monitor is used alone. If you activate MST Primary on this monitor to allow chaining to the second one, and you power off the second monitor, the crash occurs. I've seen that this driver has been touched since August: https://lore.kernel.org/dri-devel/20220826092019.23151-1-stanislav.lisovskiy... Could that be related too?
http://bugzilla.opensuse.org/show_bug.cgi?id=1205774 Bruno Friedmann <bruno@ioda-net.ch> changed: What |Removed |Added ---------------------------------------------------------------------------- Summary|Kernel 6.0.8 + nvidia >= |Kernel 6.0.8 + nvidia >= |525.53 + nosimplefb=1: |525.53 (displaylink MST): |crash on BUG: kernel NULL |crash on BUG: kernel NULL |pointer dereference, #PF: |pointer dereference, #PF: |supervisor instruction |supervisor instruction |fetch in kernel mode |fetch in kernel mode -- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.opensuse.org/show_bug.cgi?id=1205774 http://bugzilla.opensuse.org/show_bug.cgi?id=1205774#c15 Bruno Friedmann <bruno@ioda-net.ch> changed: What |Removed |Added ---------------------------------------------------------------------------- Status|IN_PROGRESS |RESOLVED Resolution|--- |FIXED --- Comment #15 from Bruno Friedmann <bruno@ioda-net.ch> --- On openSUSE Tumbleweed 20230112 (Linux kernel 6.1.4-1-default x86_64 GNU/Linux, nvidia 525.78.01, Qt 5.15.8, KDE Frameworks 5.101.0, Plasma 5.26.5) no more crashes are seen, and moreover the second screen is back and can be used. I guess the DisplayPort refactoring has matured by now. I'm closing this now.
http://bugzilla.opensuse.org/show_bug.cgi?id=1205774 http://bugzilla.opensuse.org/show_bug.cgi?id=1205774#c16 --- Comment #16 from Stefan Dirsch <sndirsch@suse.com> --- Thanks for the update!
http://bugzilla.opensuse.org/show_bug.cgi?id=1205774 http://bugzilla.opensuse.org/show_bug.cgi?id=1205774#c17 Bruno Friedmann <bruno@ioda-net.ch> changed: What |Removed |Added ---------------------------------------------------------------------------- Status|RESOLVED |REOPENED Resolution|FIXED |--- --- Comment #17 from Bruno Friedmann <bruno@ioda-net.ch> --- It happened again :-( As the second monitor didn't come back to life after resuming the main one (both being in suspend mode), I just powered the secondary off and on, and the computer froze completely. dbus-daemon[1102]: [system] Activating service name='org.kde.powerdevil.backlighthelper' requested by ':1.95' (uid=1502 pid=10149 comm="/usr/libexec/org_kde_powerdevil") (using servicehelper) Jan 19 10:42:14 dbus-daemon[1102]: [system] Successfully activated service 'org.kde.powerdevil.backlighthelper' Jan 19 10:42:28 kernel: BUG: kernel NULL pointer dereference, address: 0000000000000000 Jan 19 10:42:28 kernel: #PF: supervisor instruction fetch in kernel mode Jan 19 10:42:28 kernel: #PF: error_code(0x0010) - not-present page Jan 19 10:42:28 kernel: PGD 0 P4D 0 Jan 19 10:42:28 kernel: Oops: 0010 [#1] PREEMPT SMP PTI Jan 19 10:42:28 kernel: CPU: 2 PID: 1002 Comm: nvidia-modeset/ Tainted: P OE 6.1.6-1-default #1 openSUSE Tumbleweed 959c5df2a923a5b5173b76f0e1bbaa419fcea273 Jan 19 10:42:28 kernel: Hardware name: Dell Inc. Precision 7510/0YH43H, BIOS 1.30.3 11/17/2022 Jan 19 10:42:28 kernel: RIP: 0010:0x0 Jan 19 10:42:28 kernel: Code: Unable to access opcode bytes at 0xffffffffffffffd6. 
Jan 19 10:42:28 kernel: RSP: 0018:ffffbeb201107c90 EFLAGS: 00010207 Jan 19 10:42:28 kernel: RAX: 0000000000000000 RBX: ffff9ad7742bd378 RCX: 0000000000000004 Jan 19 10:42:28 kernel: RDX: 0000000000000010 RSI: ffff9ad7742bd378 RDI: ffffbeb201107a10 Jan 19 10:42:28 kernel: RBP: ffff9ad2dbd23860 R08: 0000000000000000 R09: 0000000080150012 Jan 19 10:42:28 kernel: R10: ffff9ad476738008 R11: ffffbeb201107c00 R12: 0000000000000000 Jan 19 10:42:28 kernel: R13: 0000000000000000 R14: ffffffffc48c5eb0 R15: ffffffffc47bc6d0 Jan 19 10:42:28 kernel: FS: 0000000000000000(0000) GS:ffff9ae304480000(0000) knlGS:0000000000000000 Jan 19 10:42:28 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Jan 19 10:42:28 kernel: CR2: ffffffffffffffd6 CR3: 0000000ce8e10006 CR4: 00000000003706e0 Jan 19 10:42:28 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 Jan 19 10:42:28 kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Jan 19 10:42:28 kernel: Call Trace: Jan 19 10:42:28 kernel: <TASK> Jan 19 10:42:28 kernel: _nv001283kms+0x194/0x1a0 [nvidia_modeset aa464975ebb097967d1397948e3262a7f7d03928] Jan 19 10:42:28 kernel: ? _nv001276kms+0x9a/0xd0 [nvidia_modeset aa464975ebb097967d1397948e3262a7f7d03928] Jan 19 10:42:28 kernel: ? _nv001472kms+0x172/0x180 [nvidia_modeset aa464975ebb097967d1397948e3262a7f7d03928] Jan 19 10:42:28 kernel: ? _nv001151kms+0xf3/0xa50 [nvidia_modeset aa464975ebb097967d1397948e3262a7f7d03928] Jan 19 10:42:28 kernel: ? schedule+0x5a/0xd0 Jan 19 10:42:28 kernel: ? schedule_timeout+0x10e/0x150 Jan 19 10:42:28 kernel: ? __down_common+0x10b/0x1e0 Jan 19 10:42:28 kernel: ? nvkms_sema_up+0x10/0x10 [nvidia_modeset aa464975ebb097967d1397948e3262a7f7d03928] Jan 19 10:42:28 kernel: ? _nv002470kms+0x2d/0x70 [nvidia_modeset aa464975ebb097967d1397948e3262a7f7d03928] Jan 19 10:42:28 kernel: ? nvkms_kthread_q_callback+0xa3/0x120 [nvidia_modeset aa464975ebb097967d1397948e3262a7f7d03928] Jan 19 10:42:28 kernel: ? 
_main_loop+0x77/0x130 [nvidia_modeset aa464975ebb097967d1397948e3262a7f7d03928] Jan 19 10:42:28 kernel: ? kthread+0xd7/0x100 Jan 19 10:42:28 kernel: ? kthread_complete_and_exit+0x20/0x20 Jan 19 10:42:28 kernel: ? ret_from_fork+0x1f/0x30 Jan 19 10:42:28 kernel: </TASK> Jan 19 10:42:28 kernel: Modules linked in: rfcomm snd_seq_dummy snd_hrtimer snd_seq rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache netfs nf_nat_sip nft_objref af_packet nf_conntrack_sip nft_fib_inet nft_fib_ipv> Jan 19 10:42:28 kernel: cdc_ncm snd_hda_intel snd_intel_dspcfg irqbypass libarc4 pcspkr dell_wmi efi_pstore snd_usb_audio snd_intel_sdw_acpi cdc_ether processor_thermal_device_pci_legacy snd_usbmidi_lib uvcvid> Jan 19 10:42:28 kernel: acpi_pad button joydev nfsd auth_rpcgss nfs_acl lockd grace sunrpc fuse configfs dmi_sysfs ip_tables x_tables hid_logitech_hidpp hid_logitech_dj hid_generic usbhid dm_crypt essiv authen> Jan 19 10:42:28 kernel: CR2: 0000000000000000 Jan 19 10:42:28 kernel: ---[ end trace 0000000000000000 ]--- Jan 19 10:42:28 kernel: RIP: 0010:0x0 Jan 19 10:42:28 kernel: Code: Unable to access opcode bytes at 0xffffffffffffffd6. 
Jan 19 10:42:28 kernel: RSP: 0018:ffffbeb201107c90 EFLAGS: 00010207 Jan 19 10:42:28 kernel: RAX: 0000000000000000 RBX: ffff9ad7742bd378 RCX: 0000000000000004 Jan 19 10:42:28 kernel: RDX: 0000000000000010 RSI: ffff9ad7742bd378 RDI: ffffbeb201107a10 Jan 19 10:42:28 kernel: RBP: ffff9ad2dbd23860 R08: 0000000000000000 R09: 0000000080150012 Jan 19 10:42:28 kernel: R10: ffff9ad476738008 R11: ffffbeb201107c00 R12: 0000000000000000 Jan 19 10:42:28 kernel: R13: 0000000000000000 R14: ffffffffc48c5eb0 R15: ffffffffc47bc6d0 Jan 19 10:42:28 kernel: FS: 0000000000000000(0000) GS:ffff9ae304480000(0000) knlGS:0000000000000000 Jan 19 10:42:28 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Jan 19 10:42:28 kernel: CR2: ffffffffffffffd6 CR3: 0000000ce8e10006 CR4: 00000000003706e0 Jan 19 10:42:28 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 Jan 19 10:42:28 kernel 6.1.6-1.1 nvidia 525.78.01-16.1 -- You are receiving this mail because: You are on the CC list for the bug.
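The three traces in this thread name the crashing function differently (`_nv001281kms`, `_nv001282kms`, `_nv001283kms`), yet the offset and size are identical each time. The sketch below, assuming the usual `symbol+offset/size [module build-id]` call-trace format, compares the first call-trace entry of each report while ignoring the symbol name. The helper names are hypothetical, and matching offsets only suggest, not prove, that the same driver function is involved.

```python
import re

# Matches entries such as "_nv001281kms+0x194/0x1a0 [nvidia_modeset <build id>]".
FRAME_RE = re.compile(
    r"(?P<symbol>[A-Za-z_][\w.]*)\+(?P<offset>0x[0-9a-f]+)/(?P<size>0x[0-9a-f]+)"
    r"(?: \[(?P<module>\w+)[^\]]*\])?"
)


def frames(log: str) -> list:
    """Return (symbol, offset, size, module) tuples in order of appearance."""
    return [
        (m["symbol"], m["offset"], m["size"], m["module"])
        for m in FRAME_RE.finditer(log)
    ]


def shape(frame: tuple) -> tuple:
    """Drop the symbol name, which changes between driver builds."""
    _symbol, offset, size, module = frame
    return (offset, size, module)


# First call-trace entry from each of the three reports in this thread,
# keyed by "kernel / nvidia driver" version.
TOP_FRAMES = {
    "6.0.8 / 525.53": "_nv001281kms+0x194/0x1a0 [nvidia_modeset 4cdb9c16932eaf3d76d37eaaf2544a31bf3316d8]",
    "6.0.10 / 525.60.11": "_nv001282kms+0x194/0x1a0 [nvidia_modeset 013906695b05c0c02abd73cae7e9a032761f6948]",
    "6.1.6 / 525.78.01": "_nv001283kms+0x194/0x1a0 [nvidia_modeset aa464975ebb097967d1397948e3262a7f7d03928]",
}

if __name__ == "__main__":
    shapes = {label: shape(frames(line)[0]) for label, line in TOP_FRAMES.items()}
    for label, s in shapes.items():
        print(label, s)
    print("same shape in all reports:", len(set(shapes.values())) == 1)
```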
http://bugzilla.opensuse.org/show_bug.cgi?id=1205774 http://bugzilla.opensuse.org/show_bug.cgi?id=1205774#c18 Stefan Dirsch <sndirsch@suse.com> changed: What |Removed |Added ---------------------------------------------------------------------------- Status|REOPENED |IN_PROGRESS --- Comment #18 from Stefan Dirsch <sndirsch@suse.com> --- Oh well. :-(
http://bugzilla.opensuse.org/show_bug.cgi?id=1205774 http://bugzilla.opensuse.org/show_bug.cgi?id=1205774#c19 Bruno Friedmann <bruno@ioda-net.ch> changed: What |Removed |Added ---------------------------------------------------------------------------- Status|IN_PROGRESS |RESOLVED Resolution|--- |WONTFIX --- Comment #19 from Bruno Friedmann <bruno@ioda-net.ch> --- After having so much trouble with DisplayPort MST, and since my hardware is perhaps getting old (2015), I decided to stop using it: I removed the docking station and now use only one external monitor. So I can no longer reproduce this configuration and will close the ticket as WONTFIX. Feel free to update if you prefer another status.
http://bugzilla.opensuse.org/show_bug.cgi?id=1205774 http://bugzilla.opensuse.org/show_bug.cgi?id=1205774#c20 --- Comment #20 from Stefan Dirsch <sndirsch@suse.com> --- (In reply to Bruno Friedmann from comment #19)
> After having so much trouble with DisplayPort MST, and since my hardware is perhaps getting old (2015), I decided to stop using it: I removed the docking station and now use only one external monitor.
> So I can no longer reproduce this configuration and will close the ticket as WONTFIX. Feel free to update if you prefer another status.
Thanks for updating this ticket!