[Bug 1159304] New: Soft lockup in flush_tlb_mm_range
http://bugzilla.opensuse.org/show_bug.cgi?id=1159304

Bug ID: 1159304
Summary: Soft lockup in flush_tlb_mm_range
Classification: openSUSE
Product: openSUSE Tumbleweed
Version: Current
Hardware: x86-64
OS: Other
Status: NEW
Severity: Critical
Priority: P5 - None
Component: Kernel
Assignee: kernel-maintainers@forge.provo.novell.com
Reporter: davidalro@gmail.com
QA Contact: qa-bugs@suse.de
Found By: ---
Blocker: ---

User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.79 Safari/537.36
Build Identifier:

Hi. I have openSUSE Tumbleweed (20191214) with the NVIDIA drivers for my GTX1060. My CPU is an AMD Ryzen 1700X. I have been experiencing random soft lockups where everything freezes. I enabled the systemd journal and captured the following trace:

Dec 16 18:29:46 david-tumble kernel: NVRM: GPU at PCI:0000:07:00: GPU-<snip>
Dec 16 18:29:46 david-tumble kernel: NVRM: GPU Board Serial Number:
Dec 16 18:29:46 david-tumble kernel: NVRM: Xid (PCI:0000:07:00): 16, pid=1670, Head 00000000 Count 00007d7f
Dec 16 18:29:46 david-tumble kernel: NVRM: Xid (PCI:0000:07:00): 16, pid=1670, Head 00000001 Count 00007d7a
Dec 16 18:29:47 david-tumble kernel: NVRM: Xid (PCI:0000:07:00): 8, pid=1670, Channel 00000033
Dec 16 18:29:56 david-tumble kernel: NVRM: Xid (PCI:0000:07:00): 16, pid=0, Head 00000000 Count 00007da0
Dec 16 18:29:56 david-tumble kernel: NVRM: Xid (PCI:0000:07:00): 16, pid=0, Head 00000001 Count 00007d9b
Dec 16 18:30:04 david-tumble kernel: NVRM: Xid (PCI:0000:07:00): 16, pid=0, Head 00000000 Count 00007da1
Dec 16 18:30:04 david-tumble kernel: NVRM: Xid (PCI:0000:07:00): 16, pid=0, Head 00000001 Count 00007d9c
Dec 16 18:30:10 david-tumble kernel: NVRM: Xid (PCI:0000:07:00): 8, pid=0, Channel 00000010
Dec 16 18:30:14 david-tumble kernel: watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [QSGRenderThread:6352]
Dec 16 18:30:14 david-tumble kernel: Modules linked in: af_packet fuse xt_tcpudp ip6t_REJECT nf_reject_ipv6 ip6t_rpfilter ipt_REJECT nf_reject_ipv4 xt_conntrack ebtable_nat ebtable_broute ip6table_nat ip6table_mangle ip6table_raw ip6tab>
Dec 16 18:30:14 david-tumble kernel: gpio_amdpt gpio_generic acpi_cpufreq nls_iso8859_1 nls_cp437 vfat fat nvidia_drm(POE) nvidia_modeset(POE) nvidia_uvm(OE) nvidia(POE) ipmi_msghandler drm_kms_helper syscopyarea sysfillrect sysimgblt >
Dec 16 18:30:14 david-tumble kernel: CPU: 0 PID: 6352 Comm: QSGRenderThread Tainted: P OE 5.3.12-1-default #1 openSUSE Tumbleweed (unreleased)
Dec 16 18:30:14 david-tumble kernel: Hardware name: System manufacturer System Product Name/PRIME B350-PLUS, BIOS 4011 04/19/2018
Dec 16 18:30:14 david-tumble kernel: RIP: 0010:smp_call_function_many+0x21a/0x280
Dec 16 18:30:14 david-tumble kernel: Code: e8 8b 5b 7e 00 3b 05 19 57 22 01 89 c7 0f 83 7d fe ff ff 48 63 c7 48 8b 0b 48 03 0c c5 80 a9 f8 8e 8b 41 18 a8 01 74 0a f3 90 <8b> 51 18 83 e2 01 75 f6 eb c9 48 c7 c2 40 9e 16 8f 4c 89 fe 89 df
Dec 16 18:30:14 david-tumble kernel: RSP: 0018:ffffb7408355fd10 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff13
Dec 16 18:30:14 david-tumble kernel: RAX: 0000000000000003 RBX: ffff9c868ea2db00 RCX: ffff9c868ec72260
Dec 16 18:30:14 david-tumble kernel: RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000009
Dec 16 18:30:14 david-tumble kernel: RBP: ffffffff8de86bf0 R08: ffff9c868ea2db08 R09: ffff9c868ea2db48
Dec 16 18:30:14 david-tumble kernel: R10: ffff9c868ea2db08 R11: 0000000000000008 R12: ffff9c868ea2c4c0
Dec 16 18:30:14 david-tumble kernel: R13: ffff9c868ea2db08 R14: 0000000000000001 R15: 0000000000000200
Dec 16 18:30:14 david-tumble kernel: FS: 00007f5681951700(0000) GS:ffff9c868ea00000(0000) knlGS:0000000000000000
Dec 16 18:30:14 david-tumble kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec 16 18:30:14 david-tumble kernel: CR2: 00002a85909f4000 CR3: 000000032990a000 CR4: 00000000003406f0
Dec 16 18:30:14 david-tumble kernel: Call Trace:
Dec 16 18:30:14 david-tumble kernel: flush_tlb_mm_range+0xb3/0xf0
Dec 16 18:30:14 david-tumble kernel: tlb_flush_mmu+0xa4/0x160
Dec 16 18:30:14 david-tumble kernel: tlb_finish_mmu+0x3d/0x70
Dec 16 18:30:14 david-tumble kernel: unmap_region+0xdc/0x110
Dec 16 18:30:14 david-tumble kernel: ? __vma_rb_erase+0x132/0x270
Dec 16 18:30:14 david-tumble kernel: __do_munmap+0x279/0x490
Dec 16 18:30:14 david-tumble kernel: __vm_munmap+0x67/0xc0
Dec 16 18:30:14 david-tumble kernel: __x64_sys_munmap+0x28/0x30
Dec 16 18:30:14 david-tumble kernel: do_syscall_64+0x6e/0x200
Dec 16 18:30:14 david-tumble kernel: entry_SYSCALL_64_after_hwframe+0x49/0xbe
Dec 16 18:30:14 david-tumble kernel: RIP: 0033:0x7f569ca1b297
Dec 16 18:30:14 david-tumble kernel: Code: 38 eb 85 48 8b 15 e9 5b 0c 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb 85 66 2e 0f 1f 84 00 00 00 00 00 90 b8 0b 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d b9 5b 0c 00 f7 d8 64 89 01 48
Dec 16 18:30:14 david-tumble kernel: RSP: 002b:00007f56819502d8 EFLAGS: 00000206 ORIG_RAX: 000000000000000b
Dec 16 18:30:14 david-tumble kernel: RAX: ffffffffffffffda RBX: 00007f5670377fe8 RCX: 00007f569ca1b297
Dec 16 18:30:14 david-tumble kernel: RDX: 000000000000000f RSI: 00000000007ea000 RDI: 00007f5637015000
Dec 16 18:30:14 david-tumble kernel: RBP: 00007f5670377fd0 R08: 00007f5637015000 R09: 0000000000000000
Dec 16 18:30:14 david-tumble kernel: R10: 00000000c1d00011 R11: 0000000000000206 R12: 00007f5637015010
Dec 16 18:30:14 david-tumble kernel: R13: 00007f5670377fe8 R14: 0000000000000000 R15: 0000000000000001

Notice the suspicious Xid 16 (Driver hung) nvidia message at the start. I cannot say whether this is because the kernel hung first or the driver did. After that I get some soft lockups in flush_tlb_mm_range and I have to reboot.

I have not found a way to reproduce this reliably; it does not seem that I have to do anything special to trigger it.
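
One way to look at the ordering question (driver first or kernel first) is to put the Xid messages and the estimated start of the stall on one timeline. The following is only a rough sketch: it assumes the journal lines have the "Mon DD HH:MM:SS host kernel: message" shape shown in the trace, it assumes the year is 2019, and it treats "stuck for Ns" as an approximate stall duration. The name timeline and the event labels are made up for illustration.

```python
import re
from datetime import datetime, timedelta

# Assumed line shape: "Dec 16 18:29:46 host kernel: message"
LINE_RE = re.compile(r"^(\w{3} +\d+ \d{2}:\d{2}:\d{2}) \S+ kernel: (.*)$")
XID_RE = re.compile(r"NVRM: Xid \(([^)]+)\): (\d+),")
LOCKUP_RE = re.compile(r"soft lockup - CPU#(\d+) stuck for (\d+)s")


def timeline(lines, year=2019):
    """Return sorted (timestamp, kind, value) tuples for Xid and lockup lines.

    kind is "xid" (value = Xid number) or "cpu_stuck_since" (value = CPU
    number, timestamp = report time minus the reported stuck duration).
    The journal's short timestamps carry no year, so one is assumed.
    """
    events = []
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # not a kernel journal line in the assumed format
        ts = datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")
        msg = m.group(2)
        xid = XID_RE.search(msg)
        if xid:
            events.append((ts, "xid", int(xid.group(2))))
            continue
        lockup = LOCKUP_RE.search(msg)
        if lockup:
            stuck = timedelta(seconds=int(lockup.group(2)))
            events.append((ts - stuck, "cpu_stuck_since", int(lockup.group(1))))
    return sorted(events)
```

Feeding it a saved excerpt (for example the output of journalctl -k written to a file and read with readlines()) gives a list that can be printed in order. The stall start is only an estimate, so events that fall within a few seconds of each other should not be read as a firm ordering.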

Reproducible: Couldn't Reproduce

Steps to Reproduce:
Normal use.

Actual Results:
Soft lockup, frozen X.

Expected Results:
Not crash :)

I am attaching my xorg.conf in case it is relevant. Please tell me if there is anything I can do to provide more info.

--
You are receiving this mail because:
You are on the CC list for the bug.

http://bugzilla.opensuse.org/show_bug.cgi?id=1159304
http://bugzilla.opensuse.org/show_bug.cgi?id=1159304#c1

--- Comment #1 from David Álvarez <davidalro@gmail.com> ---
Created attachment 826222
  --> http://bugzilla.opensuse.org/attachment.cgi?id=826222&action=edit
X.org Config