[Bug 1218552] New: [ 402.012325] pcieport 0000:00:01.2: AER: Corrected error received: 0000:01:00.0 - Steam Deck
https://bugzilla.suse.com/show_bug.cgi?id=1218552

Bug ID: 1218552
Summary: [ 402.012325] pcieport 0000:00:01.2: AER: Corrected error received: 0000:01:00.0 - Steam Deck
Classification: openSUSE
Product: openSUSE Tumbleweed
Version: Current
Hardware: x86-64
OS: openSUSE Tumbleweed
Status: NEW
Severity: Normal
Priority: P5 - None
Component: Kernel
Assignee: kernel-bugs@opensuse.org
Reporter: monkeyboyted@yahoo.com
QA Contact: qa-bugs@suse.de
Target Milestone: ---
Found By: ---
Blocker: ---

Created attachment 871668
  --> https://bugzilla.suse.com/attachment.cgi?id=871668&action=edit
dmesg-pci-eror

Hi everyone,

Every once in a while, this Steam Deck prints this error and the system drops me to a tty. I do not know why. I did change the SSD to a WD Black, as listed below. I do not know how to reproduce the error.

lsb_release -a
LSB Version:    n/a
Distributor ID: openSUSE
Description:    openSUSE Tumbleweed
Release:        20231228
Codename:       n/a

Handle 0x0000, DMI type 0, 26 bytes
BIOS Information
        Vendor: Valve
        Version: F7A0120
        Release Date: 12/01/2023
        Address: 0xE0000
        Runtime Size: 128 kB
        ROM Size: 16 MB

Information for package kernel-default:
---------------------------------------
Repository     : openSUSE-Tumbleweed-Oss
Name           : kernel-default
Version        : 6.6.7-1.1
Arch           : x86_64
Vendor         : openSUSE
Installed Size : 238.1 MiB
Installed      : Yes
Status         : up-to-date
Source package : kernel-default-6.6.7-1.1.nosrc
Upstream URL   : https://www.kernel.org/
Summary        : The Standard Kernel
Description    :
    The standard kernel for both uniprocessor and multiprocessor systems.

    Source Timestamp: 2023-12-14 17:36:48 +0000
    GIT Revision: 6869d093e8485475463bc171d23d7c4142fb6fa4
    GIT Branch: stable

=== START OF INFORMATION SECTION ===
Model Number:                       WD_BLACK SN770M 1TB
Serial Number:                      233101400993
Firmware Version:                   731100WD
PCI Vendor/Subsystem ID:            0x15b7
IEEE OUI Identifier:                0x001b44
Total NVM Capacity:                 1,000,204,886,016 [1.00 TB]
Unallocated NVM Capacity:           0
Controller ID:                      0
NVMe Version:                       1.4
Number of Namespaces:               1
Namespace 1 Size/Capacity:          1,000,204,886,016 [1.00 TB]
Namespace 1 Formatted LBA Size:     512
Namespace 1 IEEE EUI-64:            001b44 4a48dc08dc
Local Time is:                      Thu Jan 4 17:36:48 2024 PST
Firmware Updates (0x14):            2 Slots, no Reset required
Optional Admin Commands (0x0017):   Security Format Frmw_DL Self_Test
Optional NVM Commands (0x00df):     Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp Verify
Log Page Attributes (0x7e):         Cmd_Eff_Lg Ext_Get_Lg Telmtry_Lg Pers_Ev_Lg Log0_FISE_MI Telmtry_Ar_4

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        36 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    0%
Data Units Read:                    3,267,997 [1.67 TB]
Data Units Written:                 3,844,737 [1.96 TB]
Host Read Commands:                 22,073,106
Host Write Commands:                62,512,660
Controller Busy Time:               79
Power Cycles:                       576
Power On Hours:                     46
Unsafe Shutdowns:                   123

[ 402.012325] pcieport 0000:00:01.2: AER: Corrected error received: 0000:01:00.0
[ 402.012342] nvme 0000:01:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
[ 402.012346] nvme 0000:01:00.0: device [15b7:5042] error status/mask=00000001/0000e000
[ 402.012351] nvme 0000:01:00.0: [ 0] RxErr
[ 421.302005] usb 3-1.1: new full-speed USB device

--
You are receiving this mail because:
You are the assignee for the bug.
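
For readers unfamiliar with the `error status/mask=00000001/0000e000` field in the dmesg excerpt of the report, here is a minimal Python sketch of how it can be decoded. It assumes the bit positions of the PCIe AER Correctable Error Status register and uses the short names the Linux kernel prints (such as `RxErr`). The helper names `decode_bits` and `decode_status_mask` are made up for this illustration and are not part of any tool.

```python
# Bit positions of the PCIe AER Correctable Error Status/Mask registers.
# The short names mirror the strings the Linux kernel prints in dmesg.
AER_CORRECTABLE_BITS = {
    0: "RxErr",         # Receiver Error
    6: "BadTLP",        # Bad TLP
    7: "BadDLLP",       # Bad DLLP
    8: "Rollover",      # REPLAY_NUM Rollover
    12: "Timeout",      # Replay Timer Timeout
    13: "NonFatalErr",  # Advisory Non-Fatal Error
    14: "CorrIntErr",   # Corrected Internal Error
    15: "HeaderOF",     # Header Log Overflow
}


def decode_bits(value):
    """Return the names of all known bits set in a 32-bit register value."""
    return [
        name
        for bit, name in sorted(AER_CORRECTABLE_BITS.items())
        if value & (1 << bit)
    ]


def decode_status_mask(field):
    """Split a 'status/mask' hex pair and decode both halves."""
    status_hex, mask_hex = field.split("/")
    return decode_bits(int(status_hex, 16)), decode_bits(int(mask_hex, 16))


if __name__ == "__main__":
    reported, masked = decode_status_mask("00000001/0000e000")
    print("reported:", reported)
    print("masked:  ", masked)
```

With the values from the report, only bit 0 is set in the status word, which matches the `[ 0] RxErr` line that follows it in the log. The mask covers bits 13 to 15, so those error types would not be reported by this device.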
https://bugzilla.suse.com/show_bug.cgi?id=1218552#c1

--- Comment #1 from ted chang <monkeyboyted@yahoo.com> ---
lspci -vnn

00:00.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] VanGogh Root Complex [1022:1645]
        Subsystem: Valve Software Device [1e44:1776]
        Flags: fast devsel

00:00.2 IOMMU [0806]: Advanced Micro Devices, Inc. [AMD] VanGogh IOMMU [1022:1646]
        Subsystem: Valve Software Device [1e44:1776]
        Flags: bus master, fast devsel, latency 0, IRQ -2147483648
        Capabilities: <access denied>

00:01.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Renoir PCIe Dummy Host Bridge [1022:1632]
        Flags: fast devsel, IOMMU group 0

00:01.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] VanGogh PCIe GPP Bridge [1022:1647] (prog-if 00 [Normal decode])
        Subsystem: Advanced Micro Devices, Inc. [AMD] Device [1022:1453]
        Flags: bus master, fast devsel, latency 0, IRQ 28, IOMMU group 1
        Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
        I/O behind bridge: [disabled] [32-bit]
        Memory behind bridge: 80600000-806fffff [size=1M] [32-bit]
        Prefetchable memory behind bridge: [disabled] [64-bit]
        Capabilities: <access denied>
        Kernel driver in use: pcieport

00:01.3 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] VanGogh PCIe GPP Bridge [1022:1647] (prog-if 00 [Normal decode])
        Subsystem: Advanced Micro Devices, Inc. [AMD] Device [1022:1453]
        Flags: bus master, fast devsel, latency 0, IRQ 29, IOMMU group 2
        Bus: primary=00, secondary=02, subordinate=02, sec-latency=0
        I/O behind bridge: [disabled] [32-bit]
        Memory behind bridge: 80500000-805fffff [size=1M] [32-bit]
        Prefetchable memory behind bridge: [disabled] [64-bit]
        Capabilities: <access denied>
        Kernel driver in use: pcieport

00:01.4 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] VanGogh PCIe GPP Bridge [1022:1647] (prog-if 00 [Normal decode])
        Subsystem: Advanced Micro Devices, Inc. [AMD] Device [1022:1453]
        Flags: bus master, fast devsel, latency 0, IRQ 30, IOMMU group 3
        Bus: primary=00, secondary=03, subordinate=03, sec-latency=0
        I/O behind bridge: 2000-2fff [size=4K] [16-bit]
        Memory behind bridge: 80400000-804fffff [size=1M] [32-bit]
        Prefetchable memory behind bridge: [disabled] [64-bit]
        Capabilities: <access denied>
        Kernel driver in use: pcieport

00:08.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Renoir PCIe Dummy Host Bridge [1022:1632]
        Flags: fast devsel, IOMMU group 4

00:08.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] VanGogh Internal PCIe GPP Bridge to Bus [1022:1648] (prog-if 00 [Normal decode])
        Subsystem: Advanced Micro Devices, Inc. [AMD] VanGogh Internal PCIe GPP Bridge to Bus [1022:1648]
        Flags: bus master, fast devsel, latency 0, IRQ 31, IOMMU group 4
        Bus: primary=00, secondary=04, subordinate=04, sec-latency=0
        I/O behind bridge: 1000-1fff [size=4K] [16-bit]
        Memory behind bridge: 80000000-803fffff [size=4M] [32-bit]
        Prefetchable memory behind bridge: f8e0000000-f8f01fffff [size=258M] [32-bit]
        Capabilities: <access denied>
        Kernel driver in use: pcieport

00:08.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] VanGogh Internal PCIe GPP Bridge to Bus [1022:1648] (prog-if 00 [Normal decode])
        Subsystem: Advanced Micro Devices, Inc. [AMD] VanGogh Internal PCIe GPP Bridge to Bus [1022:1648]
        Flags: bus master, fast devsel, latency 0, IRQ 32, IOMMU group 4
        Bus: primary=00, secondary=05, subordinate=05, sec-latency=0
        I/O behind bridge: [disabled] [32-bit]
        Memory behind bridge: [disabled] [32-bit]
        Prefetchable memory behind bridge: [disabled] [64-bit]
        Capabilities: <access denied>
        Kernel driver in use: pcieport

00:08.3 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] VanGogh Internal PCIe GPP Bridge to Bus [1022:1648] (prog-if 00 [Normal decode])
        Subsystem: Advanced Micro Devices, Inc. [AMD] VanGogh Internal PCIe GPP Bridge to Bus [1022:1648]
        Flags: bus master, fast devsel, latency 0, IRQ 33, IOMMU group 4
        Bus: primary=00, secondary=06, subordinate=06, sec-latency=0
        I/O behind bridge: [disabled] [32-bit]
        Memory behind bridge: [disabled] [32-bit]
        Prefetchable memory behind bridge: [disabled] [64-bit]
        Capabilities: <access denied>
        Kernel driver in use: pcieport

00:14.0 SMBus [0c05]: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller [1022:790b] (rev 71)
        Subsystem: Valve Software Device [1e44:1776]
        Flags: 66MHz, medium devsel, IOMMU group 5
        Kernel driver in use: piix4_smbus
        Kernel modules: i2c_piix4, sp5100_tco

00:14.3 ISA bridge [0601]: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge [1022:790e] (rev 51)
        Subsystem: Valve Software Device [1e44:1776]
        Flags: bus master, 66MHz, medium devsel, latency 0, IOMMU group 5

00:18.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] VanGogh Data Fabric; Function 0 [1022:1660]
        Flags: fast devsel, IOMMU group 6

00:18.1 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] VanGogh Data Fabric; Function 1 [1022:1661]
        Flags: fast devsel, IOMMU group 6

00:18.2 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] VanGogh Data Fabric; Function 2 [1022:1662]
        Flags: fast devsel, IOMMU group 6

00:18.3 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] VanGogh Data Fabric; Function 3 [1022:1663]
        Flags: fast devsel, IOMMU group 6

00:18.4 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] VanGogh Data Fabric; Function 4 [1022:1664]
        Flags: fast devsel, IOMMU group 6

00:18.5 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] VanGogh Data Fabric; Function 5 [1022:1665]
        Flags: fast devsel, IOMMU group 6

00:18.6 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] VanGogh Data Fabric; Function 6 [1022:1666]
        Flags: fast devsel, IOMMU group 6

00:18.7 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] VanGogh Data Fabric; Function 7 [1022:1667]
        Flags: fast devsel, IOMMU group 6

01:00.0 Non-Volatile memory controller [0108]: Sandisk Corp Device [15b7:5042] (rev 01) (prog-if 02 [NVM Express])
        Subsystem: Sandisk Corp Device [15b7:5042]
        Flags: bus master, fast devsel, latency 0, IRQ 49, IOMMU group 7
        Memory at 80600000 (64-bit, non-prefetchable) [size=16K]
        Capabilities: <access denied>
        Kernel driver in use: nvme
        Kernel modules: nvme

02:00.0 SD Host controller [0805]: O2 Micro, Inc. SD/MMC Card Reader Controller [1217:8621] (rev 01) (prog-if 01)
        Subsystem: Valve Software Device [1e44:1776]
        Flags: bus master, fast devsel, latency 0, IRQ 39, IOMMU group 8
        Memory at 80501000 (32-bit, non-prefetchable) [size=4K]
        Memory at 80500000 (32-bit, non-prefetchable) [size=2K]
        Capabilities: <access denied>
        Kernel driver in use: sdhci-pci
        Kernel modules: sdhci_pci

03:00.0 Network controller [0280]: Realtek Semiconductor Co., Ltd. RTL8822CE 802.11ac PCIe Wireless Network Adapter [10ec:c822]
        DeviceName: Broadcom 5762
        Subsystem: AzureWave Device [1a3b:4210]
        Flags: bus master, fast devsel, latency 0, IRQ 73, IOMMU group 9
        I/O ports at 2000 [size=256]
        Memory at 80400000 (64-bit, non-prefetchable) [size=64K]
        Capabilities: <access denied>
        Kernel driver in use: rtw_8822ce
        Kernel modules: rtw88_8822ce

04:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] VanGogh [AMD Custom GPU 0405] [1002:163f] (rev ae) (prog-if 00 [VGA controller])
        Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Device [1002:0123]
        Flags: bus master, fast devsel, latency 0, IRQ 40, IOMMU group 4
        Memory at f8e0000000 (64-bit, prefetchable) [size=256M]
        Memory at f8f0000000 (64-bit, prefetchable) [size=2M]
        I/O ports at 1000 [size=256]
        Memory at 80300000 (32-bit, non-prefetchable) [size=512K]
        Capabilities: <access denied>
        Kernel driver in use: amdgpu
        Kernel modules: amdgpu

04:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Rembrandt Radeon High Definition Audio Controller [1002:1640]
        Subsystem: Valve Software Device [1e44:1776]
        Flags: bus master, fast devsel, latency 0, IRQ 72, IOMMU group 4
        Memory at 803c0000 (32-bit, non-prefetchable) [size=16K]
        Capabilities: <access denied>
        Kernel driver in use: snd_hda_intel
        Kernel modules: snd_hda_intel

04:00.2 Encryption controller [1080]: Advanced Micro Devices, Inc. [AMD] Family 19h PSP/CCP [1022:1649]
        Subsystem: Valve Software Device [1e44:1776]
        Flags: bus master, fast devsel, latency 0, IRQ 35, IOMMU group 4
        Memory at 80200000 (32-bit, non-prefetchable) [size=1M]
        Memory at 803c4000 (32-bit, non-prefetchable) [size=8K]
        Capabilities: <access denied>
        Kernel driver in use: ccp
        Kernel modules: ccp

04:00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] VanGogh USB0 [1022:163a] (prog-if fe [USB Device])
        Subsystem: Valve Software Device [1e44:1776]
        Flags: bus master, fast devsel, latency 0, IRQ 69, IOMMU group 4
        Memory at 80000000 (64-bit, non-prefetchable) [size=1M]
        Capabilities: <access denied>
        Kernel driver in use: dwc3-pci
        Kernel modules: dwc3_pci

04:00.4 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] VanGogh USB1 [1022:163b] (prog-if 30 [XHCI])
        Subsystem: Valve Software Device [1e44:1776]
        Flags: bus master, fast devsel, latency 0, IRQ 40, IOMMU group 4
        Memory at 80100000 (64-bit, non-prefetchable) [size=1M]
        Capabilities: <access denied>
        Kernel driver in use: xhci_hcd
        Kernel modules: xhci_pci

04:00.5 Multimedia controller [0480]: Advanced Micro Devices, Inc. [AMD] ACP/ACP3X/ACP6x Audio Coprocessor [1022:15e2] (rev 50)
        Subsystem: Valve Software Device [1e44:1776]
        Flags: bus master, fast devsel, latency 0, IRQ 70, IOMMU group 4
        Memory at 80380000 (32-bit, non-prefetchable) [size=256K]
        Capabilities: <access denied>
        Kernel driver in use: snd_pci_acp5x
        Kernel modules: snd_pci_acp3x, snd_rn_pci_acp3x, snd_pci_acp5x, snd_pci_acp6x, snd_acp_pci, snd_rpl_pci_acp6x, snd_pci_ps, snd_sof_amd_renoir, snd_sof_amd_rembrandt, snd_sof_amd_vangogh

05:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Zeppelin/Raven/Raven2 PCIe Dummy Function [1022:145a] (rev 61)
        Subsystem: Valve Software Device [1e44:1776]
        Flags: fast devsel, IOMMU group 4
        Capabilities: <access denied>

06:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Zeppelin/Raven/Raven2 PCIe Dummy Function [1022:145a]
        Subsystem: Valve Software Device [1e44:1776]
        Flags: fast devsel, IOMMU group 4
        Capabilities: <access denied>
https://bugzilla.suse.com/show_bug.cgi?id=1218552#c3

--- Comment #3 from ted chang <monkeyboyted@yahoo.com> ---
(In reply to Daniel Wagner from comment #2)
> Random idea, disable the power save modes on the PCI link if they are enabled.
>
> nvme_core.default_ps_max_latency_us=0
>
> Some details on this topic:
>
> https://unix.stackexchange.com/questions/612096/clarifying-nvme-apst-problems-for-linux

Are you looking for something in particular? Are you waiting until I see AER: Corrected error received: 0000:01:00.0 again?
https://bugzilla.suse.com/show_bug.cgi?id=1218552#c6

--- Comment #6 from ted chang <monkeyboyted@yahoo.com> ---
(In reply to Daniel Wagner from comment #5)
> BTW, you could still try to disable the power save modes and see if this makes the ECC go away. Some WDC devices need the NVME_QUIRK_NO_DEEPEST_PS quirk, maybe this device is one of these.

Hmmm. I contacted WD and they told me I am running the newest firmware. I asked them whether they could forward my information to their engineers to fix this SSD. I am a direct consumer after all, and they did advertise that this SSD works on Steam Decks. I might try that quirk in the future.
https://bugzilla.suse.com/show_bug.cgi?id=1218552#c8

--- Comment #8 from ted chang <monkeyboyted@yahoo.com> ---
(In reply to Daniel Wagner from comment #7)
> Unfortunately, some manufacturers are not so keen on updating consumer devices. Don't know if this is the situation here.
>
> Anyway, you can test the quirk by adding
>
> nvme_core.default_ps_max_latency_us=0
>
> to the kernel command line. If this resolves it, I can spin a kernel patch and forward it upstream. In this case I would also need the output of 'nvme id-ctrl /dev/nvme0' please.

OK, I will try. I will have trouble triggering this bug again because the SDMA0 bug seems to trigger more often than this pcieport error. On another note, I was hoping the Steam Deck and related handhelds were an enticing enough market for WD to devote engineers to ensuring decent quality. Thanks. I will add it to the cmdline and take a look.
https://bugzilla.suse.com/show_bug.cgi?id=1218552#c9

--- Comment #9 from ted chang <monkeyboyted@yahoo.com> ---
Created attachment 873477
  --> https://bugzilla.suse.com/attachment.cgi?id=873477&action=edit
kernel-default-6.7.7 with nvme_core.default_ps_max_latency_us=0

(In reply to Daniel Wagner from comment #7)
> Unfortunately, some manufacturers are not so keen on updating consumer devices. Don't know if this is the situation here.
>
> Anyway, you can test the quirk by adding
>
> nvme_core.default_ps_max_latency_us=0
>
> to the kernel command line. If this resolves it, I can spin a kernel patch and forward it upstream. In this case I would also need the output of 'nvme id-ctrl /dev/nvme0' please.

Information for package kernel-default:
---------------------------------------
Repository     : openSUSE-Tumbleweed-Oss
Name           : kernel-default
Version        : 6.7.7-1.1
Arch           : x86_64
Vendor         : openSUSE
Installed Size : 239.6 MiB
Installed      : Yes
Status         : up-to-date
Source package : kernel-default-6.7.7-1.1.nosrc
Upstream URL   : https://www.kernel.org/
Summary        : The Standard Kernel
Description    :
    The standard kernel for both uniprocessor and multiprocessor systems.

    Source Timestamp: 2024-03-01 13:51:21 +0000
    GIT Revision: 1ff84c539098385746e3fa3aaf975296fb8e6791
    GIT Branch: stable

I am going to remove it from my kernel cmdline args:

BOOT_IMAGE=/boot/vmlinuz-6.7.7-1-default root=UUID=85486fcd-23d7-43b7-8be3-ad9a2ff0797a splash=silent mitigations=auto quiet security=apparmor nvme_core.default_ps_max_latency_us=0
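
As a quick way to confirm that the parameter really reached the running kernel, the following Python sketch splits a kernel command line into key/value pairs and checks `nvme_core.default_ps_max_latency_us`. The function names `parse_cmdline` and `apst_disabled` are hypothetical helpers for this illustration. The parser is deliberately naive (whitespace split, no handling of quoted values), which is enough for the command line quoted in this comment.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into {key: value}; bare flags map to None.

    Simplification: quoted values containing spaces are not handled.
    """
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


def apst_disabled(cmdline):
    """True if the command line sets the NVMe max power-state latency to 0."""
    return parse_cmdline(cmdline).get("nvme_core.default_ps_max_latency_us") == "0"


# Command line as posted in comment #9.
CMDLINE = (
    "BOOT_IMAGE=/boot/vmlinuz-6.7.7-1-default "
    "root=UUID=85486fcd-23d7-43b7-8be3-ad9a2ff0797a "
    "splash=silent mitigations=auto quiet security=apparmor "
    "nvme_core.default_ps_max_latency_us=0"
)

if __name__ == "__main__":
    print("APST disabled via cmdline:", apst_disabled(CMDLINE))
```

On a live system the same check can be run against the contents of `/proc/cmdline`. The effective value should also be readable from `/sys/module/nvme_core/parameters/default_ps_max_latency_us`, assuming the `nvme_core` driver is loaded. A value of 0 turns off APST (autonomous power state transitions) for the drive.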
https://bugzilla.suse.com/show_bug.cgi?id=1218552#c10

--- Comment #10 from ted chang <monkeyboyted@yahoo.com> ---
Created attachment 873478
  --> https://bugzilla.suse.com/attachment.cgi?id=873478&action=edit
WD_BLACK SN770M 1TB - info dump