Comment # 6 on bug 1177800 from
I had a fresh look into this today and managed to find the cause of the
problem!

In summary, the Layerscape PCIe controller generates a synchronous external
abort when PCI config data for the PCIe switch/bridge is read.

This read does not happen in normal operation; it is triggered by getsysinfo
archiving/enumerating the /sys tree, where each PCI device's config space can
be read out as a file.
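
For example, the first four bytes of each device's config file are its vendor
and device IDs, stored little-endian. A minimal sketch (the helper name pci_ids
is hypothetical) that reads just those bytes with od, independent of host byte
order:

```shell
#!/bin/sh
# pci_ids FILE: print "vendor:device" from the first 4 bytes of a PCI config
# space file. Config space is little-endian: bytes 0-1 hold the vendor ID and
# bytes 2-3 the device ID.
pci_ids() {
    # od -tx1 prints one byte per field, so the result does not depend on the
    # host's endianness; -N4 stops after the first four bytes.
    set -- $(od -An -tx1 -N4 "$1")
    [ $# -eq 4 ] || return 1    # file shorter than 4 bytes
    printf '%s%s:%s%s\n' "$2" "$1" "$4" "$3"
}
```

Usage (untested on the Ten64): pci_ids /sys/bus/pci/devices/0000:00:00.0/config
should print 1957:80c0, going by the lspci -nn output in this comment. These
are the same header bytes lspci reads without trouble, so this is not expected
to hit the abort.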

The synchronous abort also occurs with mainline kernels and on non-SUSE
systems.

The Ten64 retail board (1064-0201C) has a Diodes/Pericom PI7C9X2G304SV PCIe
switch that splits one PCIe lane into two PCIe 2.0 links for the miniPCIe slots:

lspci -nn
0000:00:00.0 PCI bridge [0604]: Freescale Semiconductor Inc Device [1957:80c0] (rev 10)
0001:00:00.0 PCI bridge [0604]: Freescale Semiconductor Inc Device [1957:80c0] (rev 10)
0001:01:00.0 PCI bridge [0604]: Pericom Semiconductor Device [12d8:b304] (rev 01)
0001:02:01.0 PCI bridge [0604]: Pericom Semiconductor Device [12d8:b304] (rev 01)
0001:02:02.0 PCI bridge [0604]: Pericom Semiconductor Device [12d8:b304] (rev 01)
0001:03:00.0 Unclassified device [0002]: MEDIATEK Corp. MT7915E 802.11ax PCI Express Wireless Network Adapter [14c3:7915]
0001:04:00.0 Network controller [0280]: Qualcomm Atheros QCA986x/988x 802.11ac Wireless Network Adapter [168c:003c]
0002:00:00.0 PCI bridge [0604]: Freescale Semiconductor Inc Device [1957:80c0] (rev 10)
0002:01:00.0 Non-Volatile memory controller [0108]: Samsung Electronics Co Ltd NVMe SSD Controller PM9A1/PM9A3/980PRO [144d:a80a]
root@recovery000afa24295d:/tmp# lspci -tnn
-+-[0002:00]---00.0-[01-ff]----00.0
 +-[0001:00]---00.0-[01-ff]----00.0-[02-04]--+-01.0-[03]----00.0
 |                                           \-02.0-[04]----00.0
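
Since the problem only appears on boards where the Pericom switch is present,
one way to check a board is to look for the switch's vendor:device ID 12d8:b304
(lspci -d 12d8:b304 does this directly). A sysfs-only sketch, using a
hypothetical helper name find_pci_ids: it compares each device's vendor and
device attributes, which as far as I know report the IDs cached at enumeration
time rather than doing a fresh config read.

```shell
#!/bin/sh
# find_pci_ids BASE VENDOR DEVICE: print the address of every PCI function under
# BASE (normally /sys/bus/pci/devices) whose vendor/device attributes match.
find_pci_ids() {
    base=$1 want_vendor=$2 want_device=$3
    for dev in "$base"/*; do
        # Skip anything that is not a PCI function directory (or an unmatched glob).
        [ -r "$dev/vendor" ] && [ -r "$dev/device" ] || continue
        if [ "$(cat "$dev/vendor")" = "$want_vendor" ] &&
           [ "$(cat "$dev/device")" = "$want_device" ]; then
            basename "$dev"
        fi
    done
}
```

On the retail board, find_pci_ids /sys/bus/pci/devices 0x12d8 0xb304 would be
expected to list 0001:01:00.0, 0001:02:01.0 and 0001:02:02.0.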

If the PCIe switch is hidden (by disabling its upstream PCIe controller in the
FDT blob) or missing (it has been removed from some Ten64 board variants), the
problem does not occur and getsysinfo does not cause a panic.

FreeBSD had a similar issue, and the cause described there sounds very close to
what is happening here:

"pci: Don't try to read cfg registers of non-existing devices
Instead of returning 0xffs some controllers, such as Layerscape generate
an external exception when someone attempts to read any register
of config space of a non-existing device other than PCIR_VENDOR.
This causes a kernel panic.
Fix it by bailing during device enumeration if a device vendor register
returns invalid value. (0xffff)
Use this opportunity to replace some hardcoded values with a macro."
From https://cgit.freebsd.org/src/commit/?id=68cbe189fdd3c572476f8af9219a5d335f05b51a

I have been able to isolate it to the 'config' sysfs file. Here is a reduced
test case:
#!/bin/sh
# Read every file under the PCIe controller's sysfs node one at a time; the
# last "Opening" line on the console names the file that triggered the abort.
for i in $(find /sys/devices/platform/soc/3500000.pcie -type f); do
    echo "Opening $i"
    echo "------------------------------------------"
    sleep 1 # allow time for console to flush
    cat "$i"
    echo "------------------------------------------"
done
....
------------------------------------------
Opening /sys/devices/platform/soc/3500000.pcie/pci0001:00/0001:00:00.0/0001:01:00.0/0001:02:02.0/config
------------------------------------------
[  150.192901] Internal error: synchronous external abort: 96000210 [#1] SMP
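
To narrow it down further, to the register that faults, the same idea can be
applied inside a single config file: read it one 32-bit word at a time and
print each offset before reading it, so the last offset on the console before
the abort is the culprit. This is only a sketch: the helper name dump_dwords is
made up, it assumes GNU stat -c %s (which on sysfs reports the attribute size
without reading the file), it has to run as root because unprivileged readers
only see the first 64 bytes of config space, and against 0001:02:02.0 it will
presumably crash the machine just like the loop above.

```shell
#!/bin/sh
# dump_dwords FILE [DELAY]: read FILE one 32-bit word at a time, printing each
# offset *before* the read so that, if the read aborts, the last offset on the
# console identifies the faulting register.
dump_dwords() {
    file=$1
    delay=${2:-0}
    # stat reports the file size without reading the file contents
    size=$(stat -c %s "$file") || return 1
    off=0
    while [ "$off" -lt "$size" ]; do
        printf 'offset 0x%03x:' "$off"
        sleep "$delay"   # optional pause so a serial console can flush
        # bs=4 with skip=N makes dd issue a single 4-byte read at offset 4*N
        dd if="$file" bs=4 skip=$((off / 4)) count=1 2>/dev/null | od -An -tx1
        off=$((off + 4))
    done
}
```

Usage: dump_dwords /sys/bus/pci/devices/0001:02:02.0/config 1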

I have verified that the problem also exists on non-SUSE systems (including
with 5.19.0-rc5), so it is a kernel bug that getsysinfo merely triggers.

Here is the trace from the latest Tumbleweed snapshot:
openSUSE-Tumbleweed-ARM-JeOS-efi.aarch64-2022.07.01-Snapshot20220704.raw.xz

Linux localhost.localdomain 5.18.6-1-default #1 SMP PREEMPT_DYNAMIC Thu Jun 23 05:46:18 UTC 2022 (5aa0763) aarch64 aarch64 aarch64 GNU/Linux
[   36.849750][ T2016] Internal error: synchronous external abort: 96000210 [#1] SMP
[   36.857252][ T2016] Modules linked in: af_packet mt7915e ath10k_pci ath10k_core mt76_connac_lib mt76 ath mac80211 libarc4 fsl_dpaa2_eth pcs_lynx cfg80211 phylink rfkill i2c_mux_pca954x i2c_mux pci_endpoint_test tpm_i2c_atmel qoriq_thermal tee sfp uio_pdrv_genirq mdio_i2c leds_gpio uio qoriq_cpufreq nls_iso8859_1 nls_cp437 vfat fat fuse drm ip_tables x_tables xhci_plat_hcd xhci_hcd caam_jr crypto_engine usbcore dpaa2_caam caamhash_desc caamalg_desc aes_ce_blk aes_ce_cipher crct10dif_ce ghash_ce gf128mul sha2_ce sha256_arm64 sha1_ce sp805_wdt fsl_mc_dpio dpaa2_console authenc libdes caam nvme nvme_core error dwc3 sdhci_of_esdhc sdhci_pltfm sdhci udc_core rtc_fsl_ftm_alarm roles mmc_core ulpi i2c_imx usb_common gpio_keys btrfs blake2b_generic xor xor_neon raid6_pq libcrc32c dm_mirror dm_region_hash dm_log dm_mod sg
[   36.929404][ T2016] CPU: 0 PID: 2016 Comm: cp Not tainted 5.18.6-1-default #1 openSUSE Tumbleweed a3ce01492e87efb4fa7f3baf169c992c0c69c4b7
[   36.941846][ T2016] Hardware name: traverse ten64/ten64, BIOS 2020.07-rc1-ga94e0d21 03/15/2022
[   36.950460][ T2016] pstate: 204000c5 (nzCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[   36.958119][ T2016] pc : pci_generic_config_read+0x44/0xcc
[   36.963613][ T2016] lr : pci_generic_config_read+0x30/0xcc
[   36.969099][ T2016] sp : ffff80000a31b9f0
[   36.973105][ T2016] x29: ffff80000a31b9f0 x28: ffff08be45472400 x27: 0000000000000400
[   36.980941][ T2016] x26: 00000000000003ff x25: ffff08be45472000 x24: 0000000000001000
[   36.988779][ T2016] x23: 0000000000001000 x22: ffff80000a31bae4 x21: ffffbac41ea22fa0
[   36.996616][ T2016] x20: ffff80000a31ba64 x19: 0000000000000004 x18: 0000000000000000
[   37.004453][ T2016] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
[   37.012289][ T2016] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
[   37.020132][ T2016] x11: 0000000000000000 x10: 0000000000000000 x9 : ffffbac41ca785dc
[   37.027975][ T2016] x8 : 0000000000000004 x7 : ffff800008e00000 x6 : ffff800008e00000
[   37.035816][ T2016] x5 : ffff08be41a93c80 x4 : 0000000000000908 x3 : 0000000000000000
[   37.043656][ T2016] x2 : 0000000000000000 x1 : ffff08be4a0de000 x0 : ffff800008202400
[   37.051494][ T2016] Call trace:
[   37.054631][ T2016]  pci_generic_config_read+0x44/0xcc
[   37.059774][ T2016]  dw_pcie_rd_other_conf+0x24/0x7c
[   37.064741][ T2016]  pci_user_read_config_dword+0x84/0x124
[   37.070229][ T2016]  pci_read_config+0xf0/0x2a0
[   37.074760][ T2016]  sysfs_kf_bin_read+0x78/0xa0
[   37.079378][ T2016]  kernfs_fop_read_iter+0xac/0x1d4
[   37.084344][ T2016]  new_sync_read+0xd8/0x160
[   37.088700][ T2016]  vfs_read+0x19c/0x1e4
[   37.092710][ T2016]  ksys_read+0x78/0x10c
[   37.096718][ T2016]  __arm64_sys_read+0x28/0x34
[   37.101248][ T2016]  invoke_syscall+0x78/0x100
[   37.105693][ T2016]  el0_svc_common.constprop.0+0x58/0x190
[   37.111181][ T2016]  do_el0_svc+0x30/0x90
[   37.115191][ T2016]  el0_svc+0x34/0x130
[   37.119029][ T2016]  el0t_64_sync_handler+0x10c/0x140
[   37.124080][ T2016]  el0t_64_sync+0x1a0/0x1a4
[   37.128439][ T2016] Code: 7100067f 540001c0 71000a7f 54000280 (b9400001)
[   37.135228][ T2016] ---[ end trace 0000000000000000 ]---
[   37.140539][ T2016] note: cp[2016] exited with preempt_count 1
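
For reference, the 96000210 in "synchronous external abort: 96000210" is the
ESR (Exception Syndrome Register) value, and decoding it agrees with the
config-read explanation. The sketch below uses a hypothetical helper
decode_esr, with field positions taken from the Arm Architecture Reference
Manual's ESR_ELx description for data aborts. For 0x96000210 it gives EC=0x25
(data abort taken without a change in exception level, i.e. in the kernel
itself), WnR=0 (the access was a read) and DFSC=0x10 (synchronous external
abort, not on a translation table walk). That matches the faulting instruction
in parentheses on the "Code:" line, b9400001, which decodes to the 32-bit load
ldr w1, [x0].

```shell
#!/bin/sh
# decode_esr VALUE: print the fields of an AArch64 ESR_ELx value that matter for
# a data abort (bit positions per the Arm ARM ESR_ELx description).
decode_esr() {
    esr=$(( $1 ))                  # accepts hex such as 0x96000210
    ec=$(( (esr >> 26) & 0x3f ))   # Exception Class, bits [31:26]
    wnr=$(( (esr >> 6) & 1 ))      # Write-not-Read, bit [6]: 0 = read
    dfsc=$(( esr & 0x3f ))         # Data Fault Status Code, bits [5:0]
    printf 'EC=0x%02x WnR=%d DFSC=0x%02x\n' "$ec" "$wnr" "$dfsc"
}
# decode_esr 0x96000210   prints: EC=0x25 WnR=0 DFSC=0x10
```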

And from Leap 15.4:
Linux localhost 5.14.21-150400.22-default #1 SMP PREEMPT_DYNAMIC Wed May 11 06:57:18 UTC 2022 (49db222) aarch64 aarch64 aarch64 GNU/Linux
[  445.922445][ T2950] Call trace:
[  445.925582][ T2950]  pci_generic_config_read+0x40/0x100
[  445.930810][ T2950]  dw_pcie_rd_other_conf+0x20/0x80
[  445.935777][ T2950]  pci_user_read_config_dword+0x88/0x140