Bug ID: 1202821
Summary: Containers lose permission to /dev/pts after some time
Classification: openSUSE
Product: openSUSE Distribution
Version: Leap 15.4
Hardware: Other
OS: Other
Status: NEW
Severity: Normal
Priority: P5 - None
Component: Containers
Assignee: containers-bugowner@suse.de
Reporter: fvogt@suse.com
QA Contact: qa-bugs@suse.de
Found By: ---
Blocker: ---

On openqaworker1, four (pretty much identical) containers are set up as
services which run openQA tests. After some uptime (roughly a day), xterm
fails to start inside them due to permission issues. Once that happens, the
failure can also be triggered manually by running e.g. su -P in the
container. After restarting a container, the permission issues disappear for
a while.

Sometimes a container shows a different issue regarding cgroups, where the
assigned cgroup appears to have disappeared.

Here's some example output showing the symptoms. Containers 101 and 104 were
recently restarted and work fine; 102 and 103 show the two failure modes.

openqaworker1:~ # podman exec -i openqaworker1_container_101 su -P
openqaworker1_container:/ # exit
openqaworker1:~ # podman exec -i openqaworker1_container_104 su -P
openqaworker1_container:/ # exit
openqaworker1:~ # podman exec -i openqaworker1_container_102 su -P
su: failed to create pseudo-terminal: Operation not permitted
openqaworker1:~ # podman exec -i openqaworker1_container_103 su -P
Error: exec failed: unable to start container process: error adding pid 10265 to cgroups: failed to write 10265: openat2 /sys/fs/cgroup/unified/machine.slice/libpod-6c5af02df66206caa2b364013a6ef4f8a6add7206beed39d7a2d85bfd0bfc1c0.scope/cgroup.procs: no such file or directory: OCI runtime attempted to invoke a command that was not found

I didn't know where to begin looking, so I started at the bottom and used
systemtap to trace where openat gets its error code from. I traced the
permission issue down to the devices cgroup (v1).

Some information about the relevant cgroups:

container 101 (restarted, works):

openqaworker1:~ # podman inspect openqaworker1_container_101 | grep CgroupPath
            "CgroupPath": "/machine.slice/libpod-955b615b984df586c92fdc7177ab4a8338bdbad109c3b9fc151ec90e7f420812.scope"
openqaworker1:~ # cat /sys/fs/cgroup/devices/machine.slice/libpod-955b615b984df586c92fdc7177ab4a8338bdbad109c3b9fc151ec90e7f420812.scope/devices.list
c 10:200 rwm
c 5:2 rwm
c 5:0 rwm
c 1:9 rwm
c 1:8 rwm
c 1:7 rwm
c 1:5 rwm
c 1:3 rwm
b *:* m
c *:* m
c 136:* rwm
openqaworker1:~ # podman inspect openqaworker1_container_101 | grep -i pid
            "Pid": 9426,
            "ConmonPid": 9413,
        "ConmonPidFile": "/var/run/containers/storage/btrfs-containers/955b615b984df586c92fdc7177ab4a8338bdbad109c3b9fc151ec90e7f420812/userdata/conmon.pid",
        "PidFile": "",
            "PidMode": "private",
            "PidsLimit": 2048,
openqaworker1:~ # cat /proc/9426/cgroup 
13:blkio:/machine.slice/libpod-955b615b984df586c92fdc7177ab4a8338bdbad109c3b9fc151ec90e7f420812.scope
12:perf_event:/machine.slice/libpod-955b615b984df586c92fdc7177ab4a8338bdbad109c3b9fc151ec90e7f420812.scope
11:devices:/machine.slice/libpod-955b615b984df586c92fdc7177ab4a8338bdbad109c3b9fc151ec90e7f420812.scope
10:pids:/machine.slice/libpod-955b615b984df586c92fdc7177ab4a8338bdbad109c3b9fc151ec90e7f420812.scope
9:memory:/machine.slice/libpod-955b615b984df586c92fdc7177ab4a8338bdbad109c3b9fc151ec90e7f420812.scope
8:rdma:/machine.slice/libpod-955b615b984df586c92fdc7177ab4a8338bdbad109c3b9fc151ec90e7f420812.scope
7:misc:/
6:cpuset:/machine.slice/libpod-955b615b984df586c92fdc7177ab4a8338bdbad109c3b9fc151ec90e7f420812.scope
5:net_cls,net_prio:/machine.slice/libpod-955b615b984df586c92fdc7177ab4a8338bdbad109c3b9fc151ec90e7f420812.scope
4:freezer:/machine.slice/libpod-955b615b984df586c92fdc7177ab4a8338bdbad109c3b9fc151ec90e7f420812.scope
3:cpu,cpuacct:/machine.slice/libpod-955b615b984df586c92fdc7177ab4a8338bdbad109c3b9fc151ec90e7f420812.scope
2:hugetlb:/machine.slice/libpod-955b615b984df586c92fdc7177ab4a8338bdbad109c3b9fc151ec90e7f420812.scope
1:name=systemd:/machine.slice/libpod-955b615b984df586c92fdc7177ab4a8338bdbad109c3b9fc151ec90e7f420812.scope
0::/machine.slice/libpod-955b615b984df586c92fdc7177ab4a8338bdbad109c3b9fc151ec90e7f420812.scope

container 102 (permission issue, device entries missing):

openqaworker1:~ # podman inspect openqaworker1_container_102 | grep CgroupPath
            "CgroupPath": "/machine.slice/libpod-bb7f7bed785fed4e77244d316cea6ee21ba9e6a26b609f02c894430f72beb3eb.scope"
openqaworker1:~ # cat /sys/fs/cgroup/devices/machine.slice/libpod-bb7f7bed785fed4e77244d316cea6ee21ba9e6a26b609f02c894430f72beb3eb.scope/devices.list
c 1:3 rwm
c 1:5 rwm
c 1:7 rwm
c 1:8 rwm
c 1:9 rwm
c 5:0 rwm
c 5:2 rwm
c 10:200 rwm
openqaworker1:~ # cat /proc/17525/cgroup
13:blkio:/machine.slice/libpod-bb7f7bed785fed4e77244d316cea6ee21ba9e6a26b609f02c894430f72beb3eb.scope
12:perf_event:/machine.slice/libpod-bb7f7bed785fed4e77244d316cea6ee21ba9e6a26b609f02c894430f72beb3eb.scope
11:devices:/machine.slice/libpod-bb7f7bed785fed4e77244d316cea6ee21ba9e6a26b609f02c894430f72beb3eb.scope
10:pids:/machine.slice/libpod-bb7f7bed785fed4e77244d316cea6ee21ba9e6a26b609f02c894430f72beb3eb.scope
9:memory:/machine.slice/libpod-bb7f7bed785fed4e77244d316cea6ee21ba9e6a26b609f02c894430f72beb3eb.scope
8:rdma:/machine.slice/libpod-bb7f7bed785fed4e77244d316cea6ee21ba9e6a26b609f02c894430f72beb3eb.scope
7:misc:/
6:cpuset:/machine.slice/libpod-bb7f7bed785fed4e77244d316cea6ee21ba9e6a26b609f02c894430f72beb3eb.scope
5:net_cls,net_prio:/machine.slice/libpod-bb7f7bed785fed4e77244d316cea6ee21ba9e6a26b609f02c894430f72beb3eb.scope
4:freezer:/machine.slice/libpod-bb7f7bed785fed4e77244d316cea6ee21ba9e6a26b609f02c894430f72beb3eb.scope
3:cpu,cpuacct:/machine.slice/libpod-bb7f7bed785fed4e77244d316cea6ee21ba9e6a26b609f02c894430f72beb3eb.scope
2:hugetlb:/machine.slice/libpod-bb7f7bed785fed4e77244d316cea6ee21ba9e6a26b609f02c894430f72beb3eb.scope
1:name=systemd:/machine.slice/libpod-bb7f7bed785fed4e77244d316cea6ee21ba9e6a26b609f02c894430f72beb3eb.scope
0::/machine.slice/libpod-bb7f7bed785fed4e77244d316cea6ee21ba9e6a26b609f02c894430f72beb3eb.scope

container 103 (cgroup missing?):

openqaworker1:~ # podman inspect openqaworker1_container_103 | grep -i pid
            "Pid": 18167,
            "ConmonPid": 18154,
        "ConmonPidFile": "/var/run/containers/storage/btrfs-containers/6c5af02df66206caa2b364013a6ef4f8a6add7206beed39d7a2d85bfd0bfc1c0/userdata/conmon.pid",
        "PidFile": "",
            "PidMode": "private",
            "PidsLimit": 2048,
openqaworker1:~ # cat /proc/18167/cgroup 
13:blkio:/machine.slice/libpod-6c5af02df66206caa2b364013a6ef4f8a6add7206beed39d7a2d85bfd0bfc1c0.scope
12:perf_event:/machine.slice/libpod-6c5af02df66206caa2b364013a6ef4f8a6add7206beed39d7a2d85bfd0bfc1c0.scope
11:devices:/machine.slice/libpod-6c5af02df66206caa2b364013a6ef4f8a6add7206beed39d7a2d85bfd0bfc1c0.scope
10:pids:/machine.slice/libpod-6c5af02df66206caa2b364013a6ef4f8a6add7206beed39d7a2d85bfd0bfc1c0.scope
9:memory:/machine.slice/libpod-6c5af02df66206caa2b364013a6ef4f8a6add7206beed39d7a2d85bfd0bfc1c0.scope
8:rdma:/machine.slice/libpod-6c5af02df66206caa2b364013a6ef4f8a6add7206beed39d7a2d85bfd0bfc1c0.scope
7:misc:/
6:cpuset:/machine.slice/libpod-6c5af02df66206caa2b364013a6ef4f8a6add7206beed39d7a2d85bfd0bfc1c0.scope
5:net_cls,net_prio:/machine.slice/libpod-6c5af02df66206caa2b364013a6ef4f8a6add7206beed39d7a2d85bfd0bfc1c0.scope
4:freezer:/machine.slice/libpod-6c5af02df66206caa2b364013a6ef4f8a6add7206beed39d7a2d85bfd0bfc1c0.scope
3:cpu,cpuacct:/machine.slice/libpod-6c5af02df66206caa2b364013a6ef4f8a6add7206beed39d7a2d85bfd0bfc1c0.scope
2:hugetlb:/machine.slice/libpod-6c5af02df66206caa2b364013a6ef4f8a6add7206beed39d7a2d85bfd0bfc1c0.scope
1:name=systemd:/system.slice/container-openqaworker1_container_103.service
0::/system.slice/container-openqaworker1_container_103.service

Comparing the device cgroups, container 102 has somehow lost the entries
"b *:* m", "c *:* m" and "c 136:* rwm"; char major 136 is the Unix98 pty
slaves (/dev/pts/*), which matches the failed pseudo-terminal allocation.
For container 103, the name=systemd and unified (v2) hierarchies moved to
system.slice/container-openqaworker1_container_103.service while the v1
controllers still point at machine.slice/libpod-*.scope, which matches the
"no such file or directory" error on the unified cgroup.procs path above.
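For monitoring, the missing allow-list entries can be checked directly from
the host. A minimal sketch; the cgroup path layout follows this machine, and
check_devcg is a made-up helper name:

```shell
#!/bin/sh
# Hedged sketch: verify that a container's devices-cgroup (v1) allow-list
# still contains the entries a freshly started container has.
check_devcg() {
    list="$1"   # e.g. /sys/fs/cgroup/devices/machine.slice/libpod-<id>.scope/devices.list
    ok=yes
    # The three entries below are the ones container 102 lost.
    for entry in 'b *:* m' 'c *:* m' 'c 136:* rwm'; do
        grep -qxF "$entry" "$list" || { echo "missing: $entry"; ok=no; }
    done
    [ "$ok" = yes ] && echo "devices.list intact"
}
```

Since major 136 covers the /dev/pts slave devices, its absence alone explains
the failed pseudo-terminal allocation.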

Bug 1178775 sounds similar. My guess is that systemd is somehow interfering
with the cgroups podman set up, e.g. re-applying its own device policy and
dropping the entries the OCI runtime added.
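The container-103 symptom can also be detected without systemtap, by
comparing the name=systemd path in /proc/&lt;pid&gt;/cgroup against the libpod
scope. A sketch, with the caveat that systemd_cgroup_of and check_container
are made-up helpers and the podman --format field is an assumption based on
the inspect output above, not verified against this podman version:

```shell
#!/bin/sh
# Hedged sketch: flag containers whose name=systemd hierarchy no longer
# points at the libpod scope (the container 103 symptom).
systemd_cgroup_of() {
    # print the name=systemd cgroup path from a /proc/<pid>/cgroup-style file
    awk -F: '$2 == "name=systemd" { print $3 }' "$1"
}

check_container() {
    ctr="$1"
    pid=$(podman inspect --format '{{.State.Pid}}' "$ctr")  # assumed field
    actual=$(systemd_cgroup_of "/proc/$pid/cgroup")
    case "$actual" in
        /machine.slice/libpod-*) echo "$ctr: ok ($actual)" ;;
        *)                       echo "$ctr: MOVED to $actual" ;;
    esac
}
```

Run periodically across the four containers, this would show whether the move
to system.slice coincides with some systemd event (e.g. a daemon-reload).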

