Bug ID 1199355
Summary aarch64 oopses on raspberry pi4 KVM host (OBS worker)
Classification openSUSE
Product openSUSE Tumbleweed
Version Current
Hardware aarch64
OS Other
Status NEW
Severity Normal
Priority P5 - None
Component Kernel
Assignee kernel-bugs@opensuse.org
Reporter seife@novell.slipkontur.de
QA Contact qa-bugs@suse.de
CC guillaume.gardet@arm.com
Found By ---
Blocker ---

I have two Raspberry Pi 4 OBS workers running for the Packman build service,
both with 8GB RAM.
One of them (configured to run one build worker with 4 jobs / 7.5GB RAM)
recently started crashing.
First I guessed it was related to overclocking, but reducing the overclocking
did not change anything.

It always crashes towards the end of a rather long, big build job (kodi), with
the following Oops message:

[ 8891.760241] Unable to handle kernel access to user memory outside uaccess
routines at virtual address 0000000000000134
[ 8891.771989] Mem abort info:
[ 8891.775574]   ESR = 0x96000004
[ 8891.779609]   EC = 0x25: DABT (current EL), IL = 32 bits
[ 8891.785850]   SET = 0, FnV = 0
[ 8891.789772]   EA = 0, S1PTW = 0
[ 8891.793819]   FSC = 0x04: level 0 translation fault
[ 8891.799591] Data abort info:
[ 8891.803330]   ISV = 0, ISS = 0x00000004
[ 8891.808174]   CM = 0, WnR = 0
[ 8891.811900] user pgtable: 4k pages, 48-bit VAs, pgdp=000000015e6f2000
[ 8891.819194] [0000000000000134] pgd=0000000000000000, p4d=0000000000000000
[ 8891.826833] Internal error: Oops: 96000004 [#1] SMP
[ 8891.832496] Modules linked in: loop tun af_packet iscsi_ibft
iscsi_boot_sysfs nls_iso8859_1 xfs nls_cp437 vfat fat btsdio bluetooth
libcrc32c ecdh_generic cpufreq_dt brcmfmac broadcom brcmutil bcm_phy_lib
cfg80211 rfkill bcm2711_thermal raspberrypi_cpufreq iproc_rng200 genet
mdio_bcm_unimac leds_gpio uio_pdrv_genirq uio nvmem_rmem efi_pstore drm fuse
ip_tables x_tables ext4 mbcache jbd2 uas usb_storage xhci_pci xhci_pci_renesas
xhci_hcd usbcore usb_common raspberrypi_hwmon crct10dif_ce gpio_raspberrypi_exp
clk_raspberrypi bcm2835_wdt bcm2835_dma virt_dma sdhci_iproc sdhci_pltfm sdhci
mmc_core gpio_regulator pcie_brcmstb phy_generic fixed sg efivarfs
[ 8891.891666] CPU: 0 PID: 6193 Comm: qemu-system-aar Not tainted
5.17.4-1-default #1 openSUSE Tumbleweed
3c702964721983dd61af41a493a0755b03b09b96
[ 8891.905435] Hardware name: Unknown Unknown Product/Unknown Product, BIOS
2022.04 04/01/2022
[ 8891.914612] pstate: 00400005 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 8891.922381] pc : blk_mq_submit_bio+0x178/0x680
[ 8891.927612] lr : blk_mq_submit_bio+0x15c/0x680
[ 8891.932830] sp : ffff800009d62d90
[ 8891.936894] x29: ffff800009d62d90 x28: fffffcd369257208 x27:
fffffcd369257200
[ 8891.944844] x26: ffff800009d63128 x25: 0000000000000001 x24:
ffffa8dc6edaaae0
[ 8891.952793] x23: 0000000000000000 x22: ffff34da90f7c8e0 x21:
ffff34da8680c1a0
[ 8891.960740] x20: ffff34da90fbf300 x19: ffff34da86849000 x18:
0000000000000003
[ 8891.968686] x17: 0000000000000001 x16: ffffa8dc6edd8840 x15:
0000000000000003
[ 8891.976631] x14: 0000000000000000 x13: 0000000000000038 x12:
0000000000000000
[ 8891.984578] x11: 0000000000000040 x10: 0000000000001b60 x9 :
ffffa8dc6f0228c0
[ 8891.992526] x8 : ffff34da80917b00 x7 : 0000000000000000 x6 :
0000000000000001
[ 8892.000472] x5 : 00000000410fd080 x4 : 0000000000000000 x3 :
ffff8bff0e894000
[ 8892.008419] x2 : 0000000000000000 x1 : 0000000000000000 x0 :
0000000004000001
[ 8892.016366] Call trace:
[ 8892.019549]  blk_mq_submit_bio+0x178/0x680
[ 8892.024417]  __submit_bio+0x118/0x180
[ 8892.028843]  submit_bio_noacct+0x1ec/0x240
[ 8892.033707]  submit_bio+0x64/0x140
[ 8892.037864]  __swap_writepage+0x1b4/0x4a0
[ 8892.042641]  swap_writepage+0x50/0x114
[ 8892.047150]  pageout+0x104/0x340
[ 8892.051133]  shrink_page_list+0x6b0/0xe84
[ 8892.055909]  shrink_lruvec+0x584/0xb40
[ 8892.060419]  shrink_node+0x3e8/0x774
[ 8892.064756]  do_try_to_free_pages+0xec/0x580
[ 8892.069796]  try_to_free_pages+0x118/0x210
[ 8892.074660]  __alloc_pages+0x4a8/0xd04
[ 8892.079172]  alloc_pages+0xb8/0x16c
[ 8892.083425]  folio_alloc+0x28/0x64
[ 8892.087583]  filemap_alloc_folio+0xd8/0xf0
[ 8892.092447]  page_cache_ra_unbounded+0xac/0x220
[ 8892.097751]  ondemand_readahead+0x12c/0x2b0
[ 8892.102701]  page_cache_async_ra+0xc4/0xe0
[ 8892.107565]  filemap_get_pages+0x4b8/0x684
[ 8892.112430]  filemap_read+0xc4/0x30c
[ 8892.116763]  generic_file_read_iter+0x114/0x1b0
[ 8892.122068]  xfs_file_buffered_read+0xb4/0xe0 [xfs
cbfa202397a50e39d50211b94f5332ed2c6954f4]
[ 8892.131638]  xfs_file_read_iter+0xa8/0x124 [xfs
cbfa202397a50e39d50211b94f5332ed2c6954f4]
[ 8892.140936]  io_read+0xe4/0x3dc
[ 8892.144834]  io_issue_sqe+0x218/0x1aa0
[ 8892.149346]  io_submit_sqes+0x28c/0x15f4
[ 8892.154033]  __arm64_sys_io_uring_enter+0x4f0/0x76c
[ 8892.159689]  invoke_syscall+0x78/0x100
[ 8892.164202]  el0_svc_common.constprop.0+0x180/0x184
[ 8892.169859]  do_el0_svc+0x34/0x9c
[ 8892.173931]  el0_svc+0x30/0x100
[ 8892.177830]  el0t_64_sync_handler+0xa4/0x130
[ 8892.182871]  el0t_64_sync+0x1a4/0x1a8
[ 8892.187300] Code: 37c00060 72001c1f 1a9f17e1 f9400a62 (79426842) 
[ 8892.194187] ---[ end trace 0000000000000000 ]---
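
As a reading aid for the oops above, here is a small Python sketch that decodes
the two raw values from the log: the ESR (0x96000004) and the faulting
instruction word shown in parentheses on the "Code:" line (79426842). The
function names are made up for illustration; the bit positions follow my
reading of the ARMv8-A ESR_ELx data abort layout and the LDRH (immediate,
unsigned offset) encoding. It only restates what the log already contains and
is not a diagnosis.

```python
def decode_esr_data_abort(esr: int) -> dict:
    """Split an ESR_ELx value for a data abort into its fields.

    Hypothetical helper; the field positions are assumed from the
    ARMv8-A ESR_ELx layout (EC in bits 31:26, IL in bit 25, ISS in 24:0).
    """
    iss = esr & 0x1FFFFFF
    return {
        "EC": (esr >> 26) & 0x3F,    # exception class, 0x25 = DABT (current EL)
        "IL": (esr >> 25) & 1,       # 1 = 32-bit instruction
        "ISS": iss,
        "ISV": (iss >> 24) & 1,
        "SET": (iss >> 11) & 0x3,
        "FnV": (iss >> 10) & 1,
        "EA": (iss >> 9) & 1,
        "CM": (iss >> 8) & 1,
        "S1PTW": (iss >> 7) & 1,
        "WnR": (iss >> 6) & 1,       # 0 = the faulting access was a read
        "FSC": iss & 0x3F,           # 0x04 = level 0 translation fault
    }


def decode_ldrh_uimm(insn: int) -> dict:
    """Decode an A64 LDRH (immediate, unsigned offset) instruction word.

    Hypothetical helper; raises ValueError for any other encoding.
    """
    if (insn & 0xFFC00000) != 0x79400000:
        raise ValueError("not an LDRH (immediate, unsigned offset) encoding")
    imm12 = (insn >> 10) & 0xFFF
    return {
        "rt": insn & 0x1F,           # destination register number
        "rn": (insn >> 5) & 0x1F,    # base register number
        "offset": imm12 * 2,         # imm12 is scaled by the halfword size
    }


if __name__ == "__main__":
    esr = decode_esr_data_abort(0x96000004)
    print({name: hex(value) for name, value in esr.items()})

    insn = decode_ldrh_uimm(0x79426842)
    print(f"ldrh w{insn['rt']}, [x{insn['rn']}, #{insn['offset']:#x}]")
```

If that decoding is right, the faulting instruction is a halfword load from
x2 + 0x134, and x2 is 0 in the register dump, which matches the reported
virtual address 0000000000000134. That would be consistent with a NULL pointer
being dereferenced at a field offset of 0x134 inside blk_mq_submit_bio.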

After that, I can usually still log in to the box via ssh, examine dmesg etc.,
but writing to storage is no longer possible (sync, ... all hang).
Usually, I then reboot the box via "echo b > /proc/sysrq-trigger".

The Raspberry Pi 4 is running off a USB3-connected SATA SSD. It has a good
passive-cooling case (CPU temperature always below 70°C) and the original
Raspberry Pi USB-C power supply.

It is running up-to-date Tumbleweed/ARM.

Another Raspberry Pi 4 8GB, set up to run two workers with 2 jobs / 3.7GB each,
runs fine without crashes or issues.

