Bug ID 1165351
Summary systemd libpam-systemd immediate memory exhaustion triggers oom upon ssh
Classification openSUSE
Product openSUSE Tumbleweed
Version Current
Hardware Other
OS Other
Status NEW
Severity Major
Priority P5 - None
Component Basesystem
Assignee screening-team-bugs@suse.de
Reporter luis.chamberlain@suse.com
QA Contact qa-bugs@suse.de
Found By ---
Blocker ---

I have a fresh install of openSUSE Tumbleweed on two HP ProLiant DL380 Gen9
machines; they are at version 20200222 right now, both with 30 GiB of RAM and
systemd-244-2.1.x86_64. On one of them I noticed that whenever I ssh in as a
regular user, but not as root, the out-of-memory killer is triggered. That is,
30 GiB of RAM gets eaten up.

The system now has the debug option set in /etc/pam.d/common-session, and sudo
journalctl -f reveals:
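For reference, the debug option is enabled by appending it to the pam_systemd
line; on my system the relevant line looks roughly like this (control value and
module arguments may differ depending on how pam-config generated the file):

/etc/pam.d/common-session
session optional pam_systemd.so debug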

Feb 29 16:31:18 chivo sshd[16511]: Accepted publickey for mcgrof from
10.100.208.217 port 44016 ssh2: RSA
SHA256:tOilLVI1jM/Pvg5uJVQbeXGEAU0mma6Jxma9c5eVX0w
Feb 29 16:31:18 chivo sshd[16511]: pam_systemd(sshd:session): pam-systemd
initializing
Feb 29 16:31:18 chivo sshd[16511]: pam_systemd(sshd:session): Asking logind to
create session: uid=1000 pid=16511 service=sshd type=tty class=user desktop=
seat= vtnr=0 tty= display= remote=yes remote_user= remote_host=10.100.208.217
Feb 29 16:31:18 chivo sshd[16511]: pam_systemd(sshd:session): Session limits:
memory_max=n/a tasks_max=n/a cpu_weight=n/a io_weight=n/a runtime_max_sec=n/a
Feb 29 16:31:18 chivo systemd-logind[2184]: New session 62 of user mcgrof.      
Feb 29 16:31:18 chivo systemd[1]: Starting User Manager for UID 1000...         

-- <oom trace> ---

Feb 29 16:31:34 chivo kernel: oom_reaper: reaped process 16513 ((systemd)), now
anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
Feb 29 16:31:34 chivo systemd[1]: user@1000.service: Failed with result
'protocol'.
Feb 29 16:31:34 chivo systemd[1]: Failed to start User Manager for UID 1000.    
Feb 29 16:31:34 chivo systemd[1]: Started Session 62 of user mcgrof.            
Feb 29 16:31:34 chivo sshd[16511]: pam_systemd(sshd:session): Reply from
logind: id=62 object_path=/org/freedesktop/login1/session/_362
runtime_path=/run/user/1000 session_fd=7 seat= vtnr=0 original_uid=1000
Feb 29 16:31:34 chivo sshd[16511]: pam_unix(sshd:session): session opened for
user mcgrof(uid=1000) by (uid=0)
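The full oom trace is elided above; for anyone wanting to pull it out
themselves, kernel messages for the current boot can be filtered with something
like (the grep pattern is just a loose sketch):

sudo journalctl -k -b | grep -iE 'out of memory|oom'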


Here's what a vmstat log looks like, sampled every second: first with the
system idle, then when I create a new ssh session. It takes a while to eat up
all the memory, then the oom killer takes over and kills the pam systemd
service, and I am able to log in. The blank lines below are put in place to
highlight when the ssh session is just stalling and the oom is about to hit:

mcgrof@chivo ~ $ sudo vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0 248704 31981364    136 160708    2  248    35   252   31   63  0  1 99  0  0
 0  0 248704 31981056    136 160856    0    0     0     0  131  101  0  0 100  0  0
 0  0 248704 31981056    136 160856    0    0     0     0   82   92  0  0 100  0  0
 0  0 248704 31980552    136 160824   52    0   384     0  108  157  0  0 100  0  0
 0  0 248704 31980472    136 161196    0    0     0   232  164  280  0  0 100  0  0
 0  0 248704 31980200    136 161344  128    0   228     0  155  240  0  0 100  0  0
 0  0 248704 31980200    136 161412    0    0     0     0  160  106  0  0 100  0  0
 0  0 248704 31979628    136 161484    0    0     0     0   95  108  0  0 100  0  0
 0  0 248704 31979628    136 161484    0    0     0     0  114  146  0  0 100  0  0
 0  0 248704 31979628    136 161484    0    0     0     0   79   84  0  0 100  0  0
 1  0 248704 31932744    136 161484    0    0    16     0  120  133  0  0 100  0  0
 0  0 248704 31929664    136 161504    0    0     0     0  116  104  0  0 100  0  0
 0  0 248704 31929640    136 161508    0    0     0     0  125  162  0  0 100  0  0
 1  0 248704 28031352    136 163092  416    0  1940     0  574  515  3  2 95  0  0
 1  0 248704 22681916    136 163092    0    0    64   340  473  354  3  2 95  0  0
 1  0 248704 17326052    136 163172    0    0     0     0  364  116  3  2 95  0  0
 1  0 248704 11968392    136 163172    0    0     0     0  364  124  3  2 95  0  0
 1  0 248704 6612528    136 163172    0    0     0     0  374  110  3  2 95  0  0
 1  0 248704 1257168    136 163172    0    0     0     0  366  114  3  2 95  0  0
 1  1 565632 215044    136  82292  192 316912  1864 316912 1678 1135  1  2 91  6  0
 2  1 954816 207848    136  76672  264 389708  1564 389708 30002 12569  0  1 89  9  0


 3  2 353004 24513544    136  78252  140 139752  8480 139752 38917 113910  0 16 83  1  0
 0  0 350444 32073372    136 122312 3040   24 42184    68 4774 14383  0  1 98  1  0
 0  0 350444 32073372    136 122344    8    0    24     0  134  159  0  0 100  0  0


Running $(sudo htop --sort-key=PERCENT_MEM -d 1) reveals the same but in a
much more fun way.

FWIW the system is using yp (NIS), and so I have:

/etc/yp.conf
domain suse.de server <some-private-ip>
domain suse.de server <some-other-ip>

However, leaving this file blank, running $(netconfig update -f), and then
ssh'ing in reveals the issue still triggers.

Related public issues:

https://github.com/systemd/systemd/issues/11502
https://bugzilla.redhat.com/show_bug.cgi?id=1666690
https://github.com/systemd/systemd/commit/084eeb865ca63887098e0945fb4e93c852b91b0f

Red Hat's bug reveals it was a bad backport of the fix for CVE-2018-16864,
which itself introduced another CVE, CVE-2019-3815.

I can reproduce this with a shiny new user named test. The root user does not
trigger the issue. What is odd, however, is that this issue is not
reproducible on the other twin system.
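For anyone trying to narrow this down on their own box, the user manager's
memory growth during login can be watched live with something like the
following (the unit name assumes UID 1000, as in the logs above; adjust for
the test user's UID):

sudo systemd-cgtop -m --depth=3      # order by memory, watch user.slice balloon
systemctl status user@1000.service   # shows the 'protocol' failure after the oom kill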

