[Bug 1199355] New: aarch64 oopses on raspberry pi4 KVM host (OBS worker)
http://bugzilla.opensuse.org/show_bug.cgi?id=1199355

Bug ID: 1199355
Summary: aarch64 oopses on raspberry pi4 KVM host (OBS worker)
Classification: openSUSE
Product: openSUSE Tumbleweed
Version: Current
Hardware: aarch64
OS: Other
Status: NEW
Severity: Normal
Priority: P5 - None
Component: Kernel
Assignee: kernel-bugs@opensuse.org
Reporter: seife@novell.slipkontur.de
QA Contact: qa-bugs@suse.de
CC: guillaume.gardet@arm.com
Found By: ---
Blocker: ---

I have two raspberry pi 4 OBS workers for packman build service running, both with 8GB RAM.

One of them (configured to run one build worker with 4 jobs/7.5GB RAM) recently started crashing. At first I guessed it was related to overclocking, but reducing the overclocking did not change anything.

It always crashes towards the end of a rather long, big build job (kodi), with the following Oops message:

[ 8891.760241] Unable to handle kernel access to user memory outside uaccess routines at virtual address 0000000000000134
[ 8891.771989] Mem abort info:
[ 8891.775574] ESR = 0x96000004
[ 8891.779609] EC = 0x25: DABT (current EL), IL = 32 bits
[ 8891.785850] SET = 0, FnV = 0
[ 8891.789772] EA = 0, S1PTW = 0
[ 8891.793819] FSC = 0x04: level 0 translation fault
[ 8891.799591] Data abort info:
[ 8891.803330] ISV = 0, ISS = 0x00000004
[ 8891.808174] CM = 0, WnR = 0
[ 8891.811900] user pgtable: 4k pages, 48-bit VAs, pgdp=000000015e6f2000
[ 8891.819194] [0000000000000134] pgd=0000000000000000, p4d=0000000000000000
[ 8891.826833] Internal error: Oops: 96000004 [#1] SMP
[ 8891.832496] Modules linked in: loop tun af_packet iscsi_ibft iscsi_boot_sysfs nls_iso8859_1 xfs nls_cp437 vfat fat btsdio bluetooth libcrc32c ecdh_generic cpufreq_dt brcmfmac broadcom brcmutil bcm_phy_lib cfg80211 rfkill bcm2711_thermal raspberrypi_cpufreq iproc_rng200 genet mdio_bcm_unimac leds_gpio uio_pdrv_genirq uio nvmem_rmem efi_pstore drm fuse ip_tables x_tables ext4 mbcache jbd2 uas usb_storage xhci_pci xhci_pci_renesas xhci_hcd usbcore usb_common raspberrypi_hwmon crct10dif_ce gpio_raspberrypi_exp clk_raspberrypi bcm2835_wdt bcm2835_dma virt_dma sdhci_iproc sdhci_pltfm sdhci mmc_core gpio_regulator pcie_brcmstb phy_generic fixed sg efivarfs
[ 8891.891666] CPU: 0 PID: 6193 Comm: qemu-system-aar Not tainted 5.17.4-1-default #1 openSUSE Tumbleweed 3c702964721983dd61af41a493a0755b03b09b96
[ 8891.905435] Hardware name: Unknown Unknown Product/Unknown Product, BIOS 2022.04 04/01/2022
[ 8891.914612] pstate: 00400005 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 8891.922381] pc : blk_mq_submit_bio+0x178/0x680
[ 8891.927612] lr : blk_mq_submit_bio+0x15c/0x680
[ 8891.932830] sp : ffff800009d62d90
[ 8891.936894] x29: ffff800009d62d90 x28: fffffcd369257208 x27: fffffcd369257200
[ 8891.944844] x26: ffff800009d63128 x25: 0000000000000001 x24: ffffa8dc6edaaae0
[ 8891.952793] x23: 0000000000000000 x22: ffff34da90f7c8e0 x21: ffff34da8680c1a0
[ 8891.960740] x20: ffff34da90fbf300 x19: ffff34da86849000 x18: 0000000000000003
[ 8891.968686] x17: 0000000000000001 x16: ffffa8dc6edd8840 x15: 0000000000000003
[ 8891.976631] x14: 0000000000000000 x13: 0000000000000038 x12: 0000000000000000
[ 8891.984578] x11: 0000000000000040 x10: 0000000000001b60 x9 : ffffa8dc6f0228c0
[ 8891.992526] x8 : ffff34da80917b00 x7 : 0000000000000000 x6 : 0000000000000001
[ 8892.000472] x5 : 00000000410fd080 x4 : 0000000000000000 x3 : ffff8bff0e894000
[ 8892.008419] x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000004000001
[ 8892.016366] Call trace:
[ 8892.019549] blk_mq_submit_bio+0x178/0x680
[ 8892.024417] __submit_bio+0x118/0x180
[ 8892.028843] submit_bio_noacct+0x1ec/0x240
[ 8892.033707] submit_bio+0x64/0x140
[ 8892.037864] __swap_writepage+0x1b4/0x4a0
[ 8892.042641] swap_writepage+0x50/0x114
[ 8892.047150] pageout+0x104/0x340
[ 8892.051133] shrink_page_list+0x6b0/0xe84
[ 8892.055909] shrink_lruvec+0x584/0xb40
[ 8892.060419] shrink_node+0x3e8/0x774
[ 8892.064756] do_try_to_free_pages+0xec/0x580
[ 8892.069796] try_to_free_pages+0x118/0x210
[ 8892.074660] __alloc_pages+0x4a8/0xd04
[ 8892.079172] alloc_pages+0xb8/0x16c
[ 8892.083425] folio_alloc+0x28/0x64
[ 8892.087583] filemap_alloc_folio+0xd8/0xf0
[ 8892.092447] page_cache_ra_unbounded+0xac/0x220
[ 8892.097751] ondemand_readahead+0x12c/0x2b0
[ 8892.102701] page_cache_async_ra+0xc4/0xe0
[ 8892.107565] filemap_get_pages+0x4b8/0x684
[ 8892.112430] filemap_read+0xc4/0x30c
[ 8892.116763] generic_file_read_iter+0x114/0x1b0
[ 8892.122068] xfs_file_buffered_read+0xb4/0xe0 [xfs cbfa202397a50e39d50211b94f5332ed2c6954f4]
[ 8892.131638] xfs_file_read_iter+0xa8/0x124 [xfs cbfa202397a50e39d50211b94f5332ed2c6954f4]
[ 8892.140936] io_read+0xe4/0x3dc
[ 8892.144834] io_issue_sqe+0x218/0x1aa0
[ 8892.149346] io_submit_sqes+0x28c/0x15f4
[ 8892.154033] __arm64_sys_io_uring_enter+0x4f0/0x76c
[ 8892.159689] invoke_syscall+0x78/0x100
[ 8892.164202] el0_svc_common.constprop.0+0x180/0x184
[ 8892.169859] do_el0_svc+0x34/0x9c
[ 8892.173931] el0_svc+0x30/0x100
[ 8892.177830] el0t_64_sync_handler+0xa4/0x130
[ 8892.182871] el0t_64_sync+0x1a4/0x1a8
[ 8892.187300] Code: 37c00060 72001c1f 1a9f17e1 f9400a62 (79426842)
[ 8892.194187] ---[ end trace 0000000000000000 ]---

After that, most of the time I can still log in to the box via ssh, examine dmesg etc., but no writing to storage is possible anymore (sync, ... all hang). Usually, I then reboot the box via "echo b > /proc/sysrq-trigger".

The raspberry pi 4 is running off a USB3-connected SATA SSD; it has a good passive-cooling case (CPU temperature always below 70°C) and the original raspberry pi USB-C power supply. It is running up-to-date Tumbleweed/ARM.

Another raspberry pi 4 8GB, set up to run two workers with 2 jobs / 3.7GB each, runs fine without crashes or issues.

-- 
You are receiving this mail because:
You are on the CC list for the bug.
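As a reading aid for the oops above, here is a minimal sketch (not part of the original report) that splits the ESR value into the fields the kernel prints and decodes the faulting instruction word shown in parentheses on the "Code:" line. The function names are hypothetical; the bit positions are assumed from the Arm architecture's ESR_ELx data-abort layout and the A64 LDRH (immediate, unsigned offset) encoding.

```python
def decode_esr(esr: int) -> dict:
    """Split an ESR_ELx value for a data abort into named fields."""
    iss = esr & 0x1FFFFFF              # ISS: bits [24:0]
    return {
        "EC": (esr >> 26) & 0x3F,      # exception class, bits [31:26]
        "IL": (esr >> 25) & 0x1,       # 1 means a 32-bit instruction
        "ISS": iss,
        "ISV": (iss >> 24) & 0x1,
        "SET": (iss >> 11) & 0x3,
        "FnV": (iss >> 10) & 0x1,
        "EA": (iss >> 9) & 0x1,
        "CM": (iss >> 8) & 0x1,
        "S1PTW": (iss >> 7) & 0x1,
        "WnR": (iss >> 6) & 0x1,       # 0 means the access was a read
        "FSC": iss & 0x3F,             # fault status code
    }


def decode_ldrh_uimm(insn: int) -> tuple:
    """Decode an A64 LDRH (immediate, unsigned offset) word.

    Returns (Rt, Rn, byte_offset). Raises ValueError for other encodings.
    """
    if (insn >> 22) != 0b0111100101:
        raise ValueError("not LDRH (immediate, unsigned offset)")
    imm12 = (insn >> 10) & 0xFFF
    rn = (insn >> 5) & 0x1F
    rt = insn & 0x1F
    return rt, rn, imm12 * 2           # imm12 is scaled by the 2-byte access size


if __name__ == "__main__":
    fields = decode_esr(0x96000004)    # ESR value from the oops
    print({name: hex(value) for name, value in fields.items()})

    rt, rn, offset = decode_ldrh_uimm(0x79426842)  # word in parentheses on the Code: line
    print(f"ldrh w{rt}, [x{rn}, #{offset:#x}]")
```

Under these assumptions the instruction decodes to a halfword load from x2 plus 0x134. With x2 being zero in the register dump, that matches the reported fault address 0000000000000134, which suggests a NULL pointer dereference at that structure offset.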
http://bugzilla.opensuse.org/show_bug.cgi?id=1199355#c2
--- Comment #2 from Stefan Seyfried
http://bugzilla.opensuse.org/show_bug.cgi?id=1199355#c3
--- Comment #3 from Stefan Seyfried
http://bugzilla.opensuse.org/show_bug.cgi?id=1199355#c4
--- Comment #4 from Stefan Seyfried
http://bugzilla.opensuse.org/show_bug.cgi?id=1199355#c5
--- Comment #5 from Stefan Seyfried
http://bugzilla.opensuse.org/show_bug.cgi?id=1199355#c7
--- Comment #7 from Stefan Seyfried
http://bugzilla.opensuse.org/show_bug.cgi?id=1199355#c8
--- Comment #8 from Stefan Seyfried
http://bugzilla.opensuse.org/show_bug.cgi?id=1199355#c9
--- Comment #9 from Stefan Seyfried
http://bugzilla.opensuse.org/show_bug.cgi?id=1199355#c10
--- Comment #10 from Stefan Seyfried
http://bugzilla.opensuse.org/show_bug.cgi?id=1199355#c11
--- Comment #11 from Stefan Seyfried
http://bugzilla.opensuse.org/show_bug.cgi?id=1199355#c13
--- Comment #13 from Stefan Seyfried
http://bugzilla.opensuse.org/show_bug.cgi?id=1199355#c14
--- Comment #14 from Stefan Seyfried
http://bugzilla.opensuse.org/show_bug.cgi?id=1199355#c16
--- Comment #16 from Stefan Seyfried
http://bugzilla.opensuse.org/show_bug.cgi?id=1199355#c18
--- Comment #18 from Stefan Seyfried
http://bugzilla.opensuse.org/show_bug.cgi?id=1199355#c19
--- Comment #19 from Stefan Seyfried
http://bugzilla.opensuse.org/show_bug.cgi?id=1199355#c21
--- Comment #21 from Stefan Seyfried
http://bugzilla.opensuse.org/show_bug.cgi?id=1199355#c23
--- Comment #23 from Stefan Seyfried
http://bugzilla.opensuse.org/show_bug.cgi?id=1199355#c30
--- Comment #30 from Stefan Seyfried
http://bugzilla.opensuse.org/show_bug.cgi?id=1199355#c33
--- Comment #33 from Stefan Seyfried
WARN_ON(READ_ONCE(rq->state) == MQ_RQ_IDLE);

it's line 2772

raspi4c:/var/crash/2022-05-25-07:59 # grep -c block/blk-mq.c:2772 dmesg.txt
69

so it triggered 69 times before finally crashing.

I don't think I can easily switch to another qemu invocation, as the worker fetches the build script from the OBS server, and the OBS server (PMBS) is not under my control.

Shall I try with the patch from linux-next applied?
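For anyone repeating this check on their own crash dump, a minimal sketch of the same count in Python follows. Unlike grep -c, which counts matching lines in the whole file, it stops at the first "Internal error: Oops" line, so it only reports warnings logged before the crash. The function name and the sample log lines are hypothetical, made up for illustration, and not taken from the reporter's dmesg.txt.

```python
def count_warnings_before_oops(dmesg_text: str, location: str) -> int:
    """Count lines mentioning `location` that precede the first oops line."""
    count = 0
    for line in dmesg_text.splitlines():
        if "Internal error: Oops" in line:
            break                      # ignore anything logged after the crash
        if location in line:
            count += 1
    return count


# Hypothetical sample input; a real run would read the saved dmesg file instead.
SAMPLE = """\
[  100.000000] WARNING: CPU: 0 PID: 1 at block/blk-mq.c:2772 (sample line)
[  200.000000] WARNING: CPU: 1 PID: 2 at block/blk-mq.c:2772 (sample line)
[  300.000000] Internal error: Oops: 96000004 [#1] SMP
[  400.000000] WARNING: CPU: 0 PID: 3 at block/blk-mq.c:2772 (sample line)
"""

if __name__ == "__main__":
    print(count_warnings_before_oops(SAMPLE, "block/blk-mq.c:2772"))
```

To use it on a real dump, pass the contents of the file, for example open("dmesg.txt").read(), in place of SAMPLE.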
http://bugzilla.opensuse.org/show_bug.cgi?id=1199355#c34
--- Comment #34 from Stefan Seyfried
http://bugzilla.opensuse.org/show_bug.cgi?id=1199355#c35
--- Comment #35 from Stefan Seyfried
http://bugzilla.opensuse.org/show_bug.cgi?id=1199355#c36
Hannes Reinecke
http://bugzilla.opensuse.org/show_bug.cgi?id=1199355#c37
--- Comment #37 from Stefan Seyfried
http://bugzilla.opensuse.org/show_bug.cgi?id=1199355#c39
--- Comment #39 from Stefan Seyfried
http://bugzilla.opensuse.org/show_bug.cgi?id=1199355#c40
Stefan Seyfried
http://bugzilla.opensuse.org/show_bug.cgi?id=1199355#c41
--- Comment #41 from Stefan Seyfried
http://bugzilla.opensuse.org/show_bug.cgi?id=1199355#c42
--- Comment #42 from Stefan Seyfried
http://bugzilla.opensuse.org/show_bug.cgi?id=1199355#c43
--- Comment #43 from Stefan Seyfried
http://bugzilla.opensuse.org/show_bug.cgi?id=1199355#c44
--- Comment #44 from Stefan Seyfried
http://bugzilla.opensuse.org/show_bug.cgi?id=1199355#c45
--- Comment #45 from Stefan Seyfried