Bug ID 1185516
Summary Radeon driver from xf86-video-ati-19.1.0-3.1.x86_64 crashes, black screen, video issues
Classification openSUSE
Product openSUSE Tumbleweed
Version Current
Hardware x86-64
OS openSUSE Tumbleweed
Status NEW
Severity Major
Priority P5 - None
Component X.Org
Assignee gfx-bugs@suse.de
Reporter bob@muhlenberg.edu
QA Contact gfx-bugs@suse.de
Found By ---
Blocker ---

Created attachment 848930 [details]
dmesg showing exception in radeon driver code

Hardware is an HP ProLiant DL380p Gen8 with two Radeon HD 6450 boards. This
setup has been working with Tumbleweed for a couple of years, with no issues
from the periodic "zypper dup" updates. After today's update the machine boots
into GRUB and shows the usual Tumbleweed "spinner" on both monitors.

Then, when the display manager starts, one card presents no video (no HDMI
signal); the other, which is the primary display, presents a black screen with
no cursor / pointer.

At this point the console is unresponsive. Pressing Ctrl-Backspace twice does
not restart the GUI, nor does it generate the system bell "Beep!". On sshing
into the system there is a defunct Xorg.bin process, which can usually be hard
killed to get a crippled TTY on the primary video display. In this state text
is displayed, but there is no local echo of characters being typed; e.g. you
can type "ls" and a directory listing without carriage returns is displayed.
Using "clear" to reset the terminal does clear the screen, but the non-echo
issue remains.

Sometimes the process is not in a defunct state, e.g. 

/usr/bin/Xorg.bin :1 vt1 -keeptty -auth /root/.serverauth.3742 -nolisten tcp
-nolisten tcp

which is not killable. A TTY appears, but it has no cursor and is not
responsive. To regain a TTY on the console a "reboot --force" is necessary, as
a normal reboot / halt / shutdown never completes. Power cycling the system is
now necessary.

On setting the default runlevel to 3 and rebooting, the system presents a
normal TTY console. On logging in and using startx, bypassing the GUI login,
the same issue is observed: one screen loses its HDMI signal, the other
presents a black screen.

Anyway, that's the narrative version.  On to the logs:

dmesg often reports a crash in the radeon driver as follows:

[   61.341857] radeon 0000:04:00.0: vgaarb: changed VGA decodes:
olddecodes=io+mem,decodes=none:owns=io+mem
[   61.341862] radeon 0000:21:00.0: vgaarb: changed VGA decodes:
olddecodes=io+mem,decodes=none:owns=none
[   61.995557] BUG: kernel NULL pointer dereference, address: 0000000000000000
[   61.995569] #PF: supervisor read access in kernel mode
[   61.995572] #PF: error_code(0x0000) - not-present page
[   61.995574] PGD 0 P4D 0 
[   61.995578] Oops: 0000 [#1] SMP PTI
[   61.995582] CPU: 19 PID: 2777 Comm: Xorg.bin Tainted: G S        I      
5.12.0-1-default #1 openSUSE Tumbleweed
[   61.995587] Hardware name: HP ProLiant DL380p Gen8, BIOS P70 01/22/2018
[   61.995590] RIP: 0010:radeon_gart_bind+0x3c/0xf0 [radeon]
[   61.995639] Code: 08 80 bf 98 04 00 00 00 0f 84 b3 00 00 00 c1 ee 0c 48 89
fd 45 89 ce 49 89 cf 8d 04 32 89 f3 4d 89 c5 89 44 24 04 85 d2 7e 66 <49> 8b 17
48 8b 85 88 04 00 00 41 89 dc 44 89 f6 4a 89 14 e0 48 8b
[   61.995643] RSP: 0018:ffffaad9495cfa40 EFLAGS: 00010202
[   61.995646] RAX: 0000000000000a8d RBX: 00000000000002a4 RCX:
0000000000000000
[   61.995648] RDX: 00000000000007e9 RSI: 00000000000002a4 RDI:
ffff96e6c9cdc000
[   61.995650] RBP: ffff96e6c9cdc000 R08: ffff96df4f358000 R09:
000000000000000f
[   61.995652] R10: 0000000000000000 R11: fffff053843cd708 R12:
ffffaad9495cfb50
[   61.995654] R13: ffff96df4f358000 R14: 000000000000000f R15:
0000000000000000
[   61.995656] FS:  00007f8bbc53a940(0000) GS:ffff96e69f8c0000(0000)
knlGS:0000000000000000
[   61.995659] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   61.995661] CR2: 0000000000000000 CR3: 0000000201bfc003 CR4:
00000000001706e0
[   61.995664] Call Trace:
[   61.995668]  radeon_bo_move+0x374/0x6a0 [radeon]
[   61.995704]  ttm_bo_handle_move_mem+0x90/0x170 [ttm]
[   61.995711]  ttm_bo_validate+0x14d/0x180 [ttm]
[   61.995717]  ttm_bo_init_reserved+0x18e/0x310 [ttm]
[   61.995722]  ttm_bo_init+0x64/0xd0 [ttm]
[   61.995726]  ? radeon_update_memory_usage.isra.0+0x40/0x40 [radeon]
[   61.995748]  radeon_bo_create+0x184/0x200 [radeon]
[   61.995770]  ? radeon_update_memory_usage.isra.0+0x40/0x40 [radeon]
[   61.995791]  radeon_gem_prime_import_sg_table+0x5e/0xf0 [radeon]
[   61.995824]  drm_gem_prime_import_dev.part.0+0x63/0xc0 [drm]
[   61.995863]  drm_gem_prime_fd_to_handle+0x196/0x1d0 [drm]
[   61.995883]  ? drm_prime_destroy_file_private+0x20/0x20 [drm]
[   61.995902]  drm_ioctl_kernel+0xaa/0xf0 [drm]
[   61.995920]  drm_ioctl+0x202/0x3b0 [drm]
[   61.995937]  ? drm_prime_destroy_file_private+0x20/0x20 [drm]
[   61.995957]  ? new_sync_write+0x11c/0x1b0
[   61.995963]  radeon_drm_ioctl+0x49/0x80 [radeon]
[   61.995986]  __x64_sys_ioctl+0x83/0xb0
[   61.995992]  do_syscall_64+0x33/0x80
[   61.995998]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[   61.996004] RIP: 0033:0x7f8bbca520bb
[   61.996008] Code: ff ff ff 85 c0 79 8b 49 c7 c4 ff ff ff ff 5b 5d 4c 89 e0
41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01
f0 ff ff 73 01 c3 48 8b 0d 85 bd 0c 00 f7 d8 64 89 01 48
[   61.996012] RSP: 002b:00007ffefa013de8 EFLAGS: 00000246 ORIG_RAX:
0000000000000010
[   61.996016] RAX: ffffffffffffffda RBX: 00007ffefa013e2c RCX:
00007f8bbca520bb
[   61.996019] RDX: 00007ffefa013e2c RSI: 00000000c00c642e RDI:
0000000000000015
[   61.996022] RBP: 00000000c00c642e R08: 00007ffefa013ed0 R09:
00007f8bbcb1ea60
[   61.996025] R10: 00007f8bbb0012a0 R11: 0000000000000246 R12:
00005640a0d77fc0
[   61.996028] R13: 0000000000000015 R14: 0000000000100000 R15:
00007ffefa0145e0
[   61.996032] Modules linked in: af_packet nft_objref nf_conntrack_netbios_ns
nf_conntrack_broadcast nft_masq nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib
nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nf_log_ipv6
nf_log_ipv4 nf_log_common nft_log nft_ct nft_chain_nat nf_tables ebtable_nat
ebtable_broute ip6table_nat ip6table_mangle ip6table_raw ip6table_security
iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c
iptable_mangle iptable_raw iptable_security bridge stp llc iscsi_ibft
iscsi_boot_sysfs ip_set nfnetlink ebtable_filter ebtables rfkill
ip6table_filter ip6_tables iptable_filter ip_tables x_tables bpfilter dmi_sysfs
ocrdma ib_uverbs ib_core intel_rapl_msr iTCO_wdt intel_pmc_bxt
iTCO_vendor_support ipmi_ssif intel_rapl_common sb_edac x86_pkg_temp_thermal
intel_powerclamp coretemp kvm_intel kvm irqbypass pcspkr joydev be2net hpwdt
hpilo lpc_ich snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg
snd_intel_sdw_acpi snd_hda_codec snd_hda_core snd_hwdep
[   61.996073]  snd_pcm ioatdma snd_timer tg3 dca snd acpi_ipmi libphy
soundcore ipmi_si thermal ipmi_devintf ipmi_msghandler tiny_power_button button
fuse configfs hid_logitech_hidpp hid_logitech_dj hid_generic usbhid ata_generic
radeon i2c_algo_bit drm_ttm_helper ttm drm_kms_helper syscopyarea sysfillrect
sysimgblt fb_sys_fops cec rc_core uhci_hcd ehci_pci crct10dif_pclmul
crc32_pclmul ehci_hcd crc32c_intel ghash_clmulni_intel drm aesni_intel usbcore
crypto_simd cryptd hpsa ata_piix serio_raw scsi_transport_sas sg dm_multipath
dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua msr
[   61.996122] CR2: 0000000000000000
[   61.996125] ---[ end trace 65e096e6c12aea74 ]---
[   62.006624] RIP: 0010:radeon_gart_bind+0x3c/0xf0 [radeon]
[   62.006681] Code: 08 80 bf 98 04 00 00 00 0f 84 b3 00 00 00 c1 ee 0c 48 89
fd 45 89 ce 49 89 cf 8d 04 32 89 f3 4d 89 c5 89 44 24 04 85 d2 7e 66 <49> 8b 17
48 8b 85 88 04 00 00 41 89 dc 44 89 f6 4a 89 14 e0 48 8b
[   62.006686] RSP: 0018:ffffaad9495cfa40 EFLAGS: 00010202
[   62.006689] RAX: 0000000000000a8d RBX: 00000000000002a4 RCX:
0000000000000000
[   62.006691] RDX: 00000000000007e9 RSI: 00000000000002a4 RDI:
ffff96e6c9cdc000
[   62.006694] RBP: ffff96e6c9cdc000 R08: ffff96df4f358000 R09:
000000000000000f
[   62.006697] R10: 0000000000000000 R11: fffff053843cd708 R12:
ffffaad9495cfb50
[   62.006699] R13: ffff96df4f358000 R14: 000000000000000f R15:
0000000000000000
[   62.006702] FS:  00007f8bbc53a940(0000) GS:ffff96e69f8c0000(0000)
knlGS:0000000000000000
[   62.006705] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   62.006708] CR2: 0000000000000000 CR3: 0000000201bfc003 CR4:
00000000001706e0
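
As a cross-check on the trace above: in the user-space register dump, ORIG_RAX
is 0x10 (the ioctl syscall on x86-64) and RSI holds the ioctl request number,
0xc00c642e. Below is a minimal Python sketch that splits that number into its
fields, assuming the generic Linux _IOC bit layout; the function name
decode_ioctl is only illustrative. It comes out as type 'd' (DRM), number
0x2e, with a 12-byte read/write argument, which as far as I can tell is
DRM_IOCTL_PRIME_FD_TO_HANDLE and agrees with drm_gem_prime_fd_to_handle in the
call trace.

```python
# Sketch: split a Linux ioctl request number into its _IOC fields.
# Assumes the generic asm-generic/ioctl.h layout used on x86-64:
#   bits 0-7 nr, bits 8-15 type, bits 16-29 size, bits 30-31 direction.

IOC_DIRECTIONS = {0: "none", 1: "write", 2: "read", 3: "read|write"}


def decode_ioctl(request: int) -> dict:
    """Return the nr/type/size/direction fields of an ioctl request."""
    return {
        "nr": request & 0xFF,
        "type": chr((request >> 8) & 0xFF),
        "size": (request >> 16) & 0x3FFF,
        "direction": IOC_DIRECTIONS[(request >> 30) & 0x3],
    }


if __name__ == "__main__":
    # RSI value from the user-space register dump in the oops.
    print(decode_ioctl(0xC00C642E))
```

So the oops is hit while Xorg imports a PRIME buffer into the radeon driver,
which is the path shown by radeon_gem_prime_import_sg_table in the trace.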

( Will add full log files )

/var/log/Xorg.N.log seems unremarkable and does not show an error. However,
it's not clear whether it actually gets to log anything after the driver croaks.

During one troubleshooting attempt, all X drivers except radeon were disabled
and the same issue occurred, so it does not appear to be an issue with X
probing for drivers.

We also tried removing one of the two cards. The issue persisted with one card
present. We also tried swapping the cards. The issue persisted with any
combination of cards, or using them individually. The cards themselves are in
risers; we swapped the risers too.

We have an identical server without radeon cards and X has no issue with the
latest code from zypper dup.

( Gotta go, will upload X logs soon or anything else someone wants. )

