[Bug 1218552] New: [ 402.012325] pcieport 0000:00:01.2: AER: Corrected error received: 0000:01:00.0 - Steam Deck
https://bugzilla.suse.com/show_bug.cgi?id=1218552

Bug ID: 1218552
Summary: [ 402.012325] pcieport 0000:00:01.2: AER: Corrected error received: 0000:01:00.0 - Steam Deck
Classification: openSUSE
Product: openSUSE Tumbleweed
Version: Current
Hardware: x86-64
OS: openSUSE Tumbleweed
Status: NEW
Severity: Normal
Priority: P5 - None
Component: Kernel
Assignee: kernel-bugs@opensuse.org
Reporter: monkeyboyted@yahoo.com
QA Contact: qa-bugs@suse.de
Target Milestone: ---
Found By: ---
Blocker: ---

Created attachment 871668
  --> https://bugzilla.suse.com/attachment.cgi?id=871668&action=edit
dmesg-pci-eror

Hi everyone,

Every once in a while, this Steam Deck prints this error and the system drops me to a TTY. I do not know why. I did change the SSD to a WD Black, as listed below. I do not know how to reproduce the error.

lsb_release -a
LSB Version: n/a
Distributor ID: openSUSE
Description: openSUSE Tumbleweed
Release: 20231228
Codename: n/a

Handle 0x0000, DMI type 0, 26 bytes
BIOS Information
    Vendor: Valve
    Version: F7A0120
    Release Date: 12/01/2023
    Address: 0xE0000
    Runtime Size: 128 kB
    ROM Size: 16 MB

Information for package kernel-default:
---------------------------------------
Repository     : openSUSE-Tumbleweed-Oss
Name           : kernel-default
Version        : 6.6.7-1.1
Arch           : x86_64
Vendor         : openSUSE
Installed Size : 238.1 MiB
Installed      : Yes
Status         : up-to-date
Source package : kernel-default-6.6.7-1.1.nosrc
Upstream URL   : https://www.kernel.org/
Summary        : The Standard Kernel
Description    : The standard kernel for both uniprocessor and multiprocessor systems.

Source Timestamp: 2023-12-14 17:36:48 +0000
GIT Revision: 6869d093e8485475463bc171d23d7c4142fb6fa4
GIT Branch: stable

=== START OF INFORMATION SECTION ===
Model Number: WD_BLACK SN770M 1TB
Serial Number: 233101400993
Firmware Version: 731100WD
PCI Vendor/Subsystem ID: 0x15b7
IEEE OUI Identifier: 0x001b44
Total NVM Capacity: 1,000,204,886,016 [1.00 TB]
Unallocated NVM Capacity: 0
Controller ID: 0
NVMe Version: 1.4
Number of Namespaces: 1
Namespace 1 Size/Capacity: 1,000,204,886,016 [1.00 TB]
Namespace 1 Formatted LBA Size: 512
Namespace 1 IEEE EUI-64: 001b44 4a48dc08dc
Local Time is: Thu Jan 4 17:36:48 2024 PST
Firmware Updates (0x14): 2 Slots, no Reset required
Optional Admin Commands (0x0017): Security Format Frmw_DL Self_Test
Optional NVM Commands (0x00df): Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp Verify
Log Page Attributes (0x7e): Cmd_Eff_Lg Ext_Get_Lg Telmtry_Lg Pers_Ev_Lg Log0_FISE_MI Telmtry_Ar_4

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning: 0x00
Temperature: 36 Celsius
Available Spare: 100%
Available Spare Threshold: 10%
Percentage Used: 0%
Data Units Read: 3,267,997 [1.67 TB]
Data Units Written: 3,844,737 [1.96 TB]
Host Read Commands: 22,073,106
Host Write Commands: 62,512,660
Controller Busy Time: 79
Power Cycles: 576
Power On Hours: 46
Unsafe Shutdowns: 123

[ 402.012325] pcieport 0000:00:01.2: AER: Corrected error received: 0000:01:00.0
[ 402.012342] nvme 0000:01:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
[ 402.012346] nvme 0000:01:00.0: device [15b7:5042] error status/mask=00000001/0000e000
[ 402.012351] nvme 0000:01:00.0: [ 0] RxErr
[ 421.302005] usb 3-1.1: new full-speed USB device

--
You are receiving this mail because:
You are the assignee for the bug.
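For readers who want to interpret the "error status/mask=00000001/0000e000" line in the dmesg excerpt, here is a minimal Python sketch that decodes both fields. The bit positions follow the PCIe Correctable Error Status register layout and the short names mirror the ones the Linux AER driver prints; the helper names (decode_bits, parse_status_mask) are made up for this illustration and are not part of any tool mentioned in the report.

```python
import re

# Correctable Error Status bits (PCIe AER capability); short names as
# printed by the Linux AER driver. Bits not listed are reserved.
CORRECTABLE_BITS = {
    0: "RxErr",         # Receiver Error
    6: "BadTLP",        # Bad TLP
    7: "BadDLLP",       # Bad DLLP
    8: "Rollover",      # REPLAY_NUM Rollover
    12: "Timeout",      # Replay Timer Timeout
    13: "NonFatalErr",  # Advisory Non-Fatal Error
    14: "CorrIntErr",   # Corrected Internal Error
    15: "HeaderOF",     # Header Log Overflow
}


def decode_bits(value):
    """Return the names of all known bits set in a 32-bit register value."""
    return [name for bit, name in sorted(CORRECTABLE_BITS.items())
            if value & (1 << bit)]


def parse_status_mask(line):
    """Extract (status, mask) as integers from an AER 'status/mask=' line."""
    match = re.search(r"error status/mask=([0-9a-fA-F]{8})/([0-9a-fA-F]{8})", line)
    if match is None:
        raise ValueError("no status/mask field found")
    return int(match.group(1), 16), int(match.group(2), 16)


LINE = ("[ 402.012346] nvme 0000:01:00.0: device [15b7:5042] "
        "error status/mask=00000001/0000e000")
status, mask = parse_status_mask(LINE)
print("reported:", decode_bits(status))
print("masked:  ", decode_bits(mask))
```

Read this way, the only status bit set is bit 0 (RxErr), which matches the "[ 0] RxErr" line in the log, and the mask covers bits 13 to 15.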
https://bugzilla.suse.com/show_bug.cgi?id=1218552#c1
--- Comment #1 from ted chang
https://bugzilla.suse.com/show_bug.cgi?id=1218552#c3
--- Comment #3 from ted chang
Random idea: disable the power-saving modes on the PCI link if they are enabled.
nvme_core.default_ps_max_latency_us=0
Some details on this topic:
https://unix.stackexchange.com/questions/612096/clarifying-nvme-apst-problems-for-linux
Are you looking for something in particular? Are you waiting until I see "AER: Corrected error received: 0000:01:00.0" again?
https://bugzilla.suse.com/show_bug.cgi?id=1218552#c6
--- Comment #6 from ted chang
BTW, you could still try disabling the power-saving modes to see if this makes the ECC go away. Some WDC devices need the NVME_QUIRK_NO_DEEPEST_PS quirk; maybe this device is one of them.
Hmmm. I contacted WD and they told me I am running the newest firmware. I asked them whether they could forward my information to their engineers to fix this SSD. I am a direct consumer after all, and they did advertise that this SSD works on Steam Decks. I might try that quirk in the future.
https://bugzilla.suse.com/show_bug.cgi?id=1218552#c8
--- Comment #8 from ted chang
Unfortunately, some manufacturers are not so keen on updating consumer devices. I don't know if that is the situation here.
Anyway, you can test the quirk by adding
nvme_core.default_ps_max_latency_us=0
to the kernel command line. If this resolves it, I can spin a kernel patch and forward it upstream. In that case I would also need the output of 'nvme id-ctrl /dev/nvme0', please.
Ok. I will try. I will have trouble triggering this bug again because the SDMA0 bug seems to trigger more often than this pcieport error. On another note, I was hoping the Steam Deck and related handhelds were an enticing enough market for WD to devote engineers to ensuring decent quality. Thanks. I will boot with the cmdline parameter and take a look.
https://bugzilla.suse.com/show_bug.cgi?id=1218552#c9
--- Comment #9 from ted chang
Unfortunately, some manufacturers are not so keen on updating consumer devices. I don't know if that is the situation here.
Anyway, you can test the quirk by adding
nvme_core.default_ps_max_latency_us=0
to the kernel command line. If this resolves it, I can spin a kernel patch and forward it upstream. In that case I would also need the output of 'nvme id-ctrl /dev/nvme0', please.
Information for package kernel-default:
---------------------------------------
Repository     : openSUSE-Tumbleweed-Oss
Name           : kernel-default
Version        : 6.7.7-1.1
Arch           : x86_64
Vendor         : openSUSE
Installed Size : 239.6 MiB
Installed      : Yes
Status         : up-to-date
Source package : kernel-default-6.7.7-1.1.nosrc
Upstream URL   : https://www.kernel.org/
Summary        : The Standard Kernel
Description    : The standard kernel for both uniprocessor and multiprocessor systems.

Source Timestamp: 2024-03-01 13:51:21 +0000
GIT Revision: 1ff84c539098385746e3fa3aaf975296fb8e6791
GIT Branch: stable

I am going to remove it from my kernel cmdline args

BOOT_IMAGE=/boot/vmlinuz-6.7.7-1-default root=UUID=85486fcd-23d7-43b7-8be3-ad9a2ff0797a splash=silent mitigations=auto quiet security=apparmor nvme_core.default_ps_max_latency_us=0
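When testing a boot parameter like this one, it can help to confirm that it really is on the command line of the running kernel. The following is a small Python sketch, not an official tool: kernel_param is a hypothetical helper name, and SAMPLE is simply the command line quoted in this comment. On a live system the same string would normally come from /proc/cmdline, and the value the nvme_core module actually applied is usually readable at /sys/module/nvme_core/parameters/default_ps_max_latency_us (both paths are assumptions about a typical Linux setup, not something shown in this report).

```python
import shlex

# Kernel command line as quoted in the comment above.
SAMPLE = ("BOOT_IMAGE=/boot/vmlinuz-6.7.7-1-default "
          "root=UUID=85486fcd-23d7-43b7-8be3-ad9a2ff0797a splash=silent "
          "mitigations=auto quiet security=apparmor "
          "nvme_core.default_ps_max_latency_us=0")


def kernel_param(cmdline, name):
    """Look up a parameter on a kernel command line.

    Returns the value for 'name=value', an empty string for a bare flag
    such as 'quiet', or None if the parameter is absent. If the parameter
    appears more than once, the last occurrence is returned.
    """
    found = None
    for token in shlex.split(cmdline):
        # Split on the first '=' only, so 'root=UUID=...' keeps its value intact.
        key, sep, value = token.partition("=")
        if key == name:
            found = value if sep else ""
    return found


print(kernel_param(SAMPLE, "nvme_core.default_ps_max_latency_us"))
```

To check the running system instead of the sample, pass the contents of /proc/cmdline as the first argument.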
https://bugzilla.suse.com/show_bug.cgi?id=1218552#c10
--- Comment #10 from ted chang