Bug ID:         1159304
Summary:        Soft lockup in flush_tlb_mm_range
Classification: openSUSE
Product:        openSUSE Tumbleweed
Version:        Current
Hardware:       x86-64
OS:             Other
Status:         NEW
Severity:       Critical
Priority:       P5 - None
Component:      Kernel
Assignee:       kernel-maintainers@forge.provo.novell.com
Reporter:       davidalro@gmail.com
QA Contact:     qa-bugs@suse.de
Found By:       ---
Blocker:        ---

User-Agent:       Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML,
like Gecko) Chrome/79.0.3945.79 Safari/537.36
Build Identifier: 

Hi.

I am running openSUSE Tumbleweed (20191214) with the NVIDIA drivers for my
GTX 1060. My CPU is an AMD Ryzen 1700X.

I have been experiencing random soft lockups where everything freezes. I
enabled the systemd journal and captured the following trace:

Dec 16 18:29:46 david-tumble kernel: NVRM: GPU at PCI:0000:07:00: GPU-<snip>
Dec 16 18:29:46 david-tumble kernel: NVRM: GPU Board Serial Number: 
Dec 16 18:29:46 david-tumble kernel: NVRM: Xid (PCI:0000:07:00): 16, pid=1670,
Head 00000000 Count 00007d7f
Dec 16 18:29:46 david-tumble kernel: NVRM: Xid (PCI:0000:07:00): 16, pid=1670,
Head 00000001 Count 00007d7a
Dec 16 18:29:47 david-tumble kernel: NVRM: Xid (PCI:0000:07:00): 8, pid=1670,
Channel 00000033
Dec 16 18:29:56 david-tumble kernel: NVRM: Xid (PCI:0000:07:00): 16, pid=0,
Head 00000000 Count 00007da0
Dec 16 18:29:56 david-tumble kernel: NVRM: Xid (PCI:0000:07:00): 16, pid=0,
Head 00000001 Count 00007d9b
Dec 16 18:30:04 david-tumble kernel: NVRM: Xid (PCI:0000:07:00): 16, pid=0,
Head 00000000 Count 00007da1
Dec 16 18:30:04 david-tumble kernel: NVRM: Xid (PCI:0000:07:00): 16, pid=0,
Head 00000001 Count 00007d9c
Dec 16 18:30:10 david-tumble kernel: NVRM: Xid (PCI:0000:07:00): 8, pid=0,
Channel 00000010
Dec 16 18:30:14 david-tumble kernel: watchdog: BUG: soft lockup - CPU#0 stuck
for 22s! [QSGRenderThread:6352]
Dec 16 18:30:14 david-tumble kernel: Modules linked in: af_packet fuse
xt_tcpudp ip6t_REJECT nf_reject_ipv6 ip6t_rpfilter ipt_REJECT nf_reject_ipv4
xt_conntrack ebtable_nat ebtable_broute ip6table_nat ip6table_mangle
ip6table_raw ip6tab>
Dec 16 18:30:14 david-tumble kernel:  gpio_amdpt gpio_generic acpi_cpufreq
nls_iso8859_1 nls_cp437 vfat fat nvidia_drm(POE) nvidia_modeset(POE)
nvidia_uvm(OE) nvidia(POE) ipmi_msghandler drm_kms_helper syscopyarea
sysfillrect sysimgblt >
Dec 16 18:30:14 david-tumble kernel: CPU: 0 PID: 6352 Comm: QSGRenderThread
Tainted: P           OE     5.3.12-1-default #1 openSUSE Tumbleweed
(unreleased)
Dec 16 18:30:14 david-tumble kernel: Hardware name: System manufacturer System
Product Name/PRIME B350-PLUS, BIOS 4011 04/19/2018
Dec 16 18:30:14 david-tumble kernel: RIP:
0010:smp_call_function_many+0x21a/0x280
Dec 16 18:30:14 david-tumble kernel: Code: e8 8b 5b 7e 00 3b 05 19 57 22 01 89
c7 0f 83 7d fe ff ff 48 63 c7 48 8b 0b 48 03 0c c5 80 a9 f8 8e 8b 41 18 a8 01
74 0a f3 90 <8b> 51 18 83 e2 01 75 f6 eb c9 48 c7 c2 40 9e 16 8f 4c 89 fe 89 df
Dec 16 18:30:14 david-tumble kernel: RSP: 0018:ffffb7408355fd10 EFLAGS:
00000202 ORIG_RAX: ffffffffffffff13
Dec 16 18:30:14 david-tumble kernel: RAX: 0000000000000003 RBX:
ffff9c868ea2db00 RCX: ffff9c868ec72260
Dec 16 18:30:14 david-tumble kernel: RDX: 0000000000000001 RSI:
0000000000000000 RDI: 0000000000000009
Dec 16 18:30:14 david-tumble kernel: RBP: ffffffff8de86bf0 R08:
ffff9c868ea2db08 R09: ffff9c868ea2db48
Dec 16 18:30:14 david-tumble kernel: R10: ffff9c868ea2db08 R11:
0000000000000008 R12: ffff9c868ea2c4c0
Dec 16 18:30:14 david-tumble kernel: R13: ffff9c868ea2db08 R14:
0000000000000001 R15: 0000000000000200
Dec 16 18:30:14 david-tumble kernel: FS:  00007f5681951700(0000)
GS:ffff9c868ea00000(0000) knlGS:0000000000000000
Dec 16 18:30:14 david-tumble kernel: CS:  0010 DS: 0000 ES: 0000 CR0:
0000000080050033
Dec 16 18:30:14 david-tumble kernel: CR2: 00002a85909f4000 CR3:
000000032990a000 CR4: 00000000003406f0
Dec 16 18:30:14 david-tumble kernel: Call Trace:
Dec 16 18:30:14 david-tumble kernel:  flush_tlb_mm_range+0xb3/0xf0
Dec 16 18:30:14 david-tumble kernel:  tlb_flush_mmu+0xa4/0x160
Dec 16 18:30:14 david-tumble kernel:  tlb_finish_mmu+0x3d/0x70
Dec 16 18:30:14 david-tumble kernel:  unmap_region+0xdc/0x110
Dec 16 18:30:14 david-tumble kernel:  ? __vma_rb_erase+0x132/0x270
Dec 16 18:30:14 david-tumble kernel:  __do_munmap+0x279/0x490
Dec 16 18:30:14 david-tumble kernel:  __vm_munmap+0x67/0xc0
Dec 16 18:30:14 david-tumble kernel:  __x64_sys_munmap+0x28/0x30
Dec 16 18:30:14 david-tumble kernel:  do_syscall_64+0x6e/0x200
Dec 16 18:30:14 david-tumble kernel:  entry_SYSCALL_64_after_hwframe+0x49/0xbe
Dec 16 18:30:14 david-tumble kernel: RIP: 0033:0x7f569ca1b297
Dec 16 18:30:14 david-tumble kernel: Code: 38 eb 85 48 8b 15 e9 5b 0c 00 f7 d8
64 89 02 48 c7 c0 ff ff ff ff eb 85 66 2e 0f 1f 84 00 00 00 00 00 90 b8 0b 00
00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d b9 5b 0c 00 f7 d8 64 89 01 48
Dec 16 18:30:14 david-tumble kernel: RSP: 002b:00007f56819502d8 EFLAGS:
00000206 ORIG_RAX: 000000000000000b
Dec 16 18:30:14 david-tumble kernel: RAX: ffffffffffffffda RBX:
00007f5670377fe8 RCX: 00007f569ca1b297
Dec 16 18:30:14 david-tumble kernel: RDX: 000000000000000f RSI:
00000000007ea000 RDI: 00007f5637015000
Dec 16 18:30:14 david-tumble kernel: RBP: 00007f5670377fd0 R08:
00007f5637015000 R09: 0000000000000000
Dec 16 18:30:14 david-tumble kernel: R10: 00000000c1d00011 R11:
0000000000000206 R12: 00007f5637015010
Dec 16 18:30:14 david-tumble kernel: R13: 00007f5670377fe8 R14:
0000000000000000 R15: 0000000000000001

Notice the suspicious NVIDIA Xid 16 (display engine hung) and Xid 8 messages
at the start. I cannot say whether the kernel hung first or the driver did.
After that I get soft lockups in flush_tlb_mm_range and I have to reboot.
I have not found a way to reproduce this reliably, and it does not look to me
like I have to do anything special to trigger it.
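
To help answer the ordering question, here is a rough sketch that pulls the
Xid and soft lockup events out of the journal text and estimates when the CPU
actually stalled. It is only an illustration, not a diagnosis: the regexes are
fitted to the lines in this trace, the year 2019 is assumed because the journal
timestamps do not carry one, the sample lines are the wrapped lines above
rejoined, and the helper names (parse_events, stall_start) are made up for
this example.

```python
import re
from datetime import datetime, timedelta

# Three lines copied from the trace above, with the wrapped halves rejoined.
LOG = """\
Dec 16 18:29:46 david-tumble kernel: NVRM: Xid (PCI:0000:07:00): 16, pid=1670, Head 00000000 Count 00007d7f
Dec 16 18:29:47 david-tumble kernel: NVRM: Xid (PCI:0000:07:00): 8, pid=1670, Channel 00000033
Dec 16 18:30:14 david-tumble kernel: watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [QSGRenderThread:6352]
"""

STAMP = r"^(\w+ +\d+ [\d:]+) \S+ kernel: "
XID_RE = re.compile(STAMP + r"NVRM: Xid \([^)]+\): (\d+),")
LOCKUP_RE = re.compile(STAMP + r"watchdog: BUG: soft lockup - CPU#\d+ stuck for (\d+)s!")

ASSUMED_YEAR = "2019"  # assumption: the journal lines carry no year


def parse_stamp(text):
    """Turn 'Dec 16 18:29:46' into a datetime, using the assumed year."""
    return datetime.strptime(ASSUMED_YEAR + " " + text, "%Y %b %d %H:%M:%S")


def parse_events(log_text):
    """Return (time, kind, value) tuples: the Xid code, or seconds stuck."""
    events = []
    for line in log_text.splitlines():
        m = XID_RE.match(line)
        if m:
            events.append((parse_stamp(m.group(1)), "xid", int(m.group(2))))
            continue
        m = LOCKUP_RE.match(line)
        if m:
            events.append((parse_stamp(m.group(1)), "soft_lockup", int(m.group(2))))
    return events


def stall_start(events):
    """Estimate when the CPU stalled: report time minus the 'stuck for' value."""
    for when, kind, value in events:
        if kind == "soft_lockup":
            return when - timedelta(seconds=value)
    return None


events = parse_events(LOG)
first_xid = min(when for when, kind, _ in events if kind == "xid")
print("first Xid:       ", first_xid.time())
print("estimated stall: ", stall_start(events).time())
print("Xid logged before stall:", first_xid < stall_start(events))
```

On these three lines the first Xid is logged a few seconds before the
estimated start of the stall, which hints that the driver reported trouble
first. The watchdog's "stuck for" figure is approximate, so this does not
prove the driver is the cause.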

Reproducible: Couldn't Reproduce

Steps to Reproduce:
Normal use.
Actual Results:  
Soft lockup, frozen X.

Expected Results:  
No crash :)

I am attaching my xorg.conf in case it is relevant.
Please tell me if there is anything I can do to provide more info.

