Bug ID 1165535
Summary System lockup under memory pressure using encrypted swap on SSD (failed command: WRITE FPDMA QUEUED)
Classification openSUSE
Product openSUSE Distribution
Version Leap 15.1
Hardware Other
OS Other
Status NEW
Severity Major
Priority P5 - None
Component Kernel
Assignee kernel-maintainers@forge.provo.novell.com
Reporter Ulrich.Windl@rz.uni-regensburg.de
QA Contact qa-bugs@suse.de
Found By ---
Blocker ---

On a system that uses an SSD for /boot and encrypted swap only (the rest is on
a traditional hard disk), I had the following problem:
I was working normally with the system, when I decided to launch "yast2
online_update".
Unfortunately the system became terribly slow (practically not responding for
minutes), so I could not do anything than press the Reset-button after having
waited about 30 minutes.

After reboot I see "odd" ATA errors related to the SSD. However the SSD is
rather new and in good condition. So I guess the issue has to do with memory
pressure.

Here are the logs when things started to become slow:
Mar 03 10:09:57 pc dbus-daemon[2706]: [session uid=1000 pid=2706] Successfully
activated service 'org.gnome.evince.Daemon'
Mar 03 10:09:57 pc systemd[2686]: Starting Evince document viewer...
Mar 03 10:09:57 pc systemd[2686]: Started Evince document viewer.
Mar 03 10:10:13 pc kernel: ata2: log page 10h reported inactive tag 5
Mar 03 10:10:14 pc kernel: ata2.00: exception Emask 0x1 SAct 0x7fffffdf SErr
0x0 action 0x0
Mar 03 10:10:14 pc kernel: ata2.00: irq_stat 0x40000009
Mar 03 10:10:14 pc kernel: ata2.00: failed command: WRITE FPDMA QUEUED
Mar 03 10:10:14 pc kernel: ata2.00: cmd 61/08:00:e0:b0:7d/00:00:00:00:00/40 tag
0 ncq dma 4096 out
                                           res
40/00:00:e0:b0:7d/00:00:00:00:00/40 Emask 0x1 (device error)
Mar 03 10:10:14 pc kernel: ata2.00: status: { DRDY }
Mar 03 10:10:14 pc kernel: ata2.00: failed command: WRITE FPDMA QUEUED
Mar 03 10:10:14 pc kernel: ata2.00: cmd 61/08:08:e8:b0:7d/00:00:00:00:00/40 tag
1 ncq dma 4096 out
                                           res
40/00:00:e0:b0:7d/00:00:00:00:00/40 Emask 0x1 (device error)
Mar 03 10:10:14 pc kernel: ata2.00: status: { DRDY }
Mar 03 10:10:14 pc kernel: ata2.00: failed command: WRITE FPDMA QUEUED
Mar 03 10:10:14 pc kernel: ata2.00: cmd 61/08:10:f0:b0:7d/00:00:00:00:00/40 tag
2 ncq dma 4096 out
                                           res
40/00:00:e0:b0:7d/00:00:00:00:00/40 Emask 0x1 (device error)
Mar 03 10:10:14 pc kernel: ata2.00: status: { DRDY }
[...]
Mar 03 10:10:14 pc kernel: ata2.00: failed command: WRITE FPDMA QUEUED
Mar 03 10:10:14 pc kernel: ata2.00: cmd 61/08:f0:d8:b0:7d/00:00:00:00:00/40 tag
30 ncq dma 4096 out
                                           res
40/00:00:e0:b0:7d/00:00:00:00:00/40 Emask 0x1 (device error)
Mar 03 10:10:14 pc kernel: ata2.00: status: { DRDY }
Mar 03 10:10:14 pc kernel: ata2.00: supports DRM functions and may not be fully
accessible
Mar 03 10:10:14 pc kernel: ata2.00: supports DRM functions and may not be fully
accessible
Mar 03 10:10:14 pc kernel: ata2.00: configured for UDMA/133
Mar 03 10:10:14 pc kernel: ata2: EH complete
Mar 03 10:10:14 pc kernel: ata2.00: Enabling discard_zeroes_data
Mar 03 10:10:14 pc rsyslogd[1332]:  message repeated 13 times: [-- MARK --]
Mar 03 10:10:14 pc rsyslogd[1332]: action 'action 1' suspended (module
'builtin:ompipe'), retry 0. There should be messages before this one giving the
reason for suspension. [v8.33.1 try http://www.rsyslog.com/e/2007 ]
Mar 03 10:10:14 pc rsyslogd[1332]: action 'action 1' suspended (module
'builtin:ompipe'), next retry is Tue Mar  3 10:10:44 2020, retry nbr 0. There
should be messages before this one giving the reason for suspension. [v8.33.1
try http://www.rsyslog.com/e/2007 ]
[...]
Mar 03 10:15:48 pc /usr/lib/gdm/gdm-x-session[2700]: (II) event2  - Logitech
USB-PS/2 Optical Mouse: SYN_DROPPED event - some input events have been lost.
Mar 03 10:16:17 pc /usr/lib/gdm/gdm-x-session[2700]: (II) event2  - Logitech
USB-PS/2 Optical Mouse: SYN_DROPPED event - some input events have been lost.
[...]
Mar 03 10:25:13 pc systemd-journald[30532]: Journal started
Mar 03 10:25:18 pc systemd-journald[30532]: System journal
(/var/log/journal/89c660865c00403a9bacef32b6828556) is 856.0M, max 819.2M, 0B
free.
[...]
Mar 03 10:25:19 pc kernel: ata2.00: exception Emask 0x50 SAct 0x7f000007 SErr
0x800 action 0x6 frozen
Mar 03 10:25:19 pc kernel: ata2.00: irq_stat 0x08000000, interface fatal error
Mar 03 10:25:19 pc kernel: ata2: SError: { HostInt }
Mar 03 10:25:19 pc kernel: ata2.00: failed command: WRITE FPDMA QUEUED
Mar 03 10:25:19 pc kernel: ata2.00: cmd 61/c8:00:38:06:78/01:00:00:00:00/40 tag
0 ncq dma 233472 out
                                           res
40/00:c0:00:00:72/00:00:00:00:00/40 Emask 0x50 (ATA bus error)
Mar 03 10:25:19 pc kernel: ata2.00: status: { DRDY }
[...]
Mar 03 10:25:19 pc kernel: ata2.00: failed command: WRITE FPDMA QUEUED
Mar 03 10:25:19 pc kernel: ata2.00: cmd 61/40:f0:f8:00:78/05:00:00:00:00/40 tag
30 ncq dma 688128 out
                                           res
40/00:c0:00:00:72/00:00:00:00:00/40 Emask 0x50 (ATA bus error)
Mar 03 10:25:19 pc kernel: ata2.00: status: { DRDY }
Mar 03 10:25:19 pc kernel: ata2: hard resetting link
Mar 03 10:25:19 pc kernel: ata2: SATA link up 3.0 Gbps (SStatus 123 SControl
300)
Mar 03 10:25:19 pc kernel: ata2.00: supports DRM functions and may not be fully
accessible
Mar 03 10:25:19 pc kernel: ata2.00: supports DRM functions and may not be fully
accessible
Mar 03 10:25:19 pc kernel: ata2.00: configured for UDMA/133
Mar 03 10:25:19 pc kernel: ata2: EH complete
Mar 03 10:25:19 pc kernel: ata2.00: Enabling discard_zeroes_data
[...]
Mar 03 10:25:19 pc systemd-coredump[30527]: Process 28448 (systemd-journal) of
user 0 dumped core.
Mar 03 10:25:19 pc systemd-coredump[30527]: Coredump diverted to
/var/lib/systemd/coredump/core.systemd-journal.0.92bd8e3e91344ef998c2c08ade695cd2.28448.1583227313000000.lz4
Mar 03 10:25:19 pc systemd-coredump[30527]: Stack trace of thread 28448:
Mar 03 10:25:19 pc systemd-coredump[30527]: #0  0x00007f52360114f5
journal_file_move_to_object (libsystemd-shared-234.so)
Mar 03 10:25:19 pc systemd-coredump[30527]: #1  0x00007f5236012f00 n/a
(libsystemd-shared-234.so)
Mar 03 10:25:19 pc systemd-coredump[30527]: #2  0x00007f5236013548 n/a
(libsystemd-shared-234.so)
Mar 03 10:25:19 pc systemd-coredump[30527]: #3  0x00007f523601470d
journal_file_append_entry (libsystemd-shared-234.so)
[...]
Mar 03 10:25:54 pc systemd-journald[30532]: Journal stopped
Mar 03 10:26:38 pc systemd-journald[30532]: Received SIGTERM from PID 1
(systemd).
Mar 03 10:26:38 pc systemd-journald[30545]: Journal started
Mar 03 10:26:38 pc systemd-journald[30545]: System journal
(/var/log/journal/89c660865c00403a9bacef32b6828556) is 856.0M, max 819.2M, 0B
free.
[...]

>From this point on I did not see any more ATA errors, but the system was still
unusable, so I hit Reset:

Mar 03 10:55:58 pc kernel: SFW2-INext-DROP-DEFLT IN=eth0 OUT=
MAC=00:22:11:dd:00:11:a0:36:9f:b2:3f:25:08:00 SRC=1.9.154.170 1.9.5.2 LEN=44
TOS=0x00 PREC=0x00 TTL=5
6 ID=52696 PROTO=TCP SPT=47944 DPT=655 WINDOW=1024 RES=0x00 SYN URGP=0 OPT
(020405B4) 
-- Reboot --
Mar 03 11:05:24 linux-abcd kernel: Linux version 4.12.14-lp151.28.36-default
(geeko@buildhost) (gcc version 7.4.1 20190905 [gcc-7-branch revision 275407]
(SUSE Linux) ) #1 SMP Fri Dec 6 13:50:27 UTC 2019 (8f4a495)

Final note: The IP address and related identifyable information in the log has
been anonymized for privacy.


You are receiving this mail because: