Hi Daniel,

I tried the pcie_aspm=off option, and it seems to work! At least I experienced no more freezing while running through the same scenarios that pretty consistently made it freeze previously. I ended up using nvme_core.default_ps_max_latency_us=0 that someone else pointed out. So far the freezes are gone with that boot option as well.

Thank you!

Best,
-Gerhard

On 12/3/20 1:25 AM, Daniel Wagner wrote:
Hi,
On Wed, Dec 02, 2020 at 04:31:14PM -0800, Gerhard Theurich wrote:
pcieport 000:00:1d.4: DPC: unmasked uncorrectable error detected
nvme nvme0: frozen state error detected, reset controller
I am not a PCI expert, but from a quick glance at some documentation I'd say the PCI controller detects an error, which gets the kernel's error recovery strategy going. This results in an NVMe controller reset, and the filesystem gets marked read-only. So this all makes sense.
The obvious question is what kind of error is detected?
Anyway, there is a kernel option to disable PCIe Advanced Error Reporting (pci=noaer). One thing you could also try is to disable Active State Power Management (ASPM), see
https://www.thomas-krenn.com/de/wiki/PCIe_Bus_Error_Status_00001100_beheben
(assuming you understand German :))
HTH,
Daniel
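
For anyone wanting to confirm which of the boot options mentioned in this thread are actually active after a reboot, here is a minimal sketch, not a definitive tool. It reads /proc/cmdline, which holds the parameters the running kernel was booted with. The function names (parse_cmdline, report) and the THREAD_OPTIONS tuple are made up for illustration, and the whitespace split is a simplification that ignores quoted values containing spaces.

```python
from pathlib import Path

# Option names discussed in this thread (hypothetical constant for illustration).
THREAD_OPTIONS = ("pcie_aspm", "nvme_core.default_ps_max_latency_us", "pci")


def parse_cmdline(cmdline: str) -> dict:
    """Split a kernel command line into {name: value}; bare flags map to ''.

    Simplification: splits on whitespace only, so quoted values that
    contain spaces are not handled.
    """
    params = {}
    for token in cmdline.split():
        name, _, value = token.partition("=")
        params[name] = value
    return params


def report(cmdline: str) -> dict:
    """Return the value of each thread option, or None if it is not set."""
    params = parse_cmdline(cmdline)
    return {name: params.get(name) for name in THREAD_OPTIONS}


if __name__ == "__main__":
    path = Path("/proc/cmdline")
    if path.exists():  # only present on Linux
        for name, value in report(path.read_text()).items():
            print(f"{name}: {'not set' if value is None else value}")
```

If an option shows up as "not set" after editing the bootloader configuration, the change most likely was not written out or the machine has not been rebooted yet.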