After looking at the affected system, the issue appears to be related to NSS,
and more specifically to the use of NIS.
Here is the backtrace of the process (a child of PID1) just before it gets
killed by the OOM killer:
> #0 0x00007f9545019217 in pthread_cond_init@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
> #1 0x00007f95439c4f9e in clnt_dg_create () from /lib64/libtirpc.so.3
> #2 0x00007f95439c527a in clnt_tli_create () from /lib64/libtirpc.so.3
> #3 0x00007f95439c5641 in getclnthandle () from /lib64/libtirpc.so.3
> #4 0x00007f95439c640a in __rpcb_findaddr_timed () from /lib64/libtirpc.so.3
> #5 0x00007f95439c7c11 in clnt_tp_create_timed () from /lib64/libtirpc.so.3
> #6 0x00007f95439c7dc9 in clnt_create_timed () from /lib64/libtirpc.so.3
> #7 0x00007f95439e5d8b in yp_bind_client_create_v3 () from /usr/lib64/libnsl.so.2
> #8 0x00007f95439e61db in __yp_bind.part.0 () from /usr/lib64/libnsl.so.2
> #9 0x00007f95439e7160 in yp_all () from /usr/lib64/libnsl.so.2
> #10 0x00007f95439f8121 in _nss_nis_initgroups_dyn () from /lib64/libnss_nis.so.2
> #11 0x00007f9543a0551b in getgrent_next_nss () from /lib64/libnss_compat.so.2
> #12 0x00007f9543a05a45 in _nss_compat_initgroups_dyn () from /lib64/libnss_compat.so.2
> #13 0x00007f9545669876 in internal_getgrouplist () from /lib64/libc.so.6
> #14 0x00007f9545669bbb in initgroups () from /lib64/libc.so.6
> #15 0x000055811e7e4097 in get_supplementary_groups (group=<optimized out>, ngids=<synthetic pointer>, supplementary_gids=<synthetic pointer>, gid=100, user=0x55811fd9c7b0 "mcgrof", c=0x55811fd34ab8)
It seems that something went wrong when the child of PID1 called initgroups(3),
as the call never returned.
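To check whether the NSS lookup itself misbehaves outside of systemd, a small
helper along the following lines might be useful. This is only a sketch: the
function name count_supplementary_groups() is made up for illustration, and it
assumes that getgrouplist(3) exercises the same NSS path as initgroups(3) (in
glibc both end up in internal_getgrouplist(), which appears in frame #13), with
the advantage that it needs no privileges. Whether it actually hangs on the
affected system has not been verified here.

```c
#define _DEFAULT_SOURCE
#include <grp.h>
#include <pwd.h>
#include <stdlib.h>
#include <sys/types.h>

/*
 * Hypothetical helper (not part of systemd or glibc): resolve the
 * supplementary groups of `user` through NSS and return how many were
 * found, or -1 if the user is unknown or memory allocation fails.
 */
int count_supplementary_groups(const char *user)
{
    struct passwd *pw = getpwnam(user);
    if (pw == NULL)
        return -1;

    int capacity = 16;
    for (;;) {
        gid_t *groups = malloc((size_t)capacity * sizeof(*groups));
        if (groups == NULL)
            return -1;

        int ngroups = capacity;
        /* Walks the configured NSS group backends (compat, nis, ...). */
        int rc = getgrouplist(user, pw->pw_gid, groups, &ngroups);
        free(groups);

        if (rc >= 0)
            return ngroups;

        /* Buffer too small: ngroups now holds the required size. */
        capacity = ngroups > capacity ? ngroups : capacity * 2;
    }
}
```

Calling count_supplementary_groups("mcgrof") from a trivial main() while
watching the process memory usage should show whether the lookup returns or
keeps growing, under the assumptions stated above.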
It can be reproduced easily by starting any service that sets the User= option.
For example, starting the following test service leads to the same memory
exhaustion:
# systemctl cat test
# /etc/systemd/system/test.service
[Unit]
Description=test
[Service]
User=1000
Type=oneshot
ExecStart=/usr/bin/sleep infinity