Bug ID | 1202821 |
---|---|
Summary | Containers lose permission to /dev/pts after some time |
Classification | openSUSE |
Product | openSUSE Distribution |
Version | Leap 15.4 |
Hardware | Other |
OS | Other |
Status | NEW |
Severity | Normal |
Priority | P5 - None |
Component | Containers |
Assignee | containers-bugowner@suse.de |
Reporter | fvogt@suse.com |
QA Contact | qa-bugs@suse.de |
Found By | --- |
Blocker | --- |
On openqaworker1, there are four (pretty much identical) containers set up as services which run openQA tests. After some uptime (~1 day?), xterm fails to start due to permission issues. When this happens, the issues can also be triggered manually by executing e.g. su -P. After restarting a container, the permission issues disappear for a while. Sometimes a container has a different issue regarding cgroups, where it looks like the assigned cgroup somehow disappeared.

Here's some example output showing the symptoms. Containers 101 and 104 were recently restarted and work fine:

openqaworker1:~ # podman exec -i openqaworker1_container_101 su -P
openqaworker1_container:/ # exit
openqaworker1:~ # podman exec -i openqaworker1_container_104 su -P
openqaworker1_container:/ # exit
openqaworker1:~ # podman exec -i openqaworker1_container_102 su -P
su: failed to create pseudo-terminal: Operation not permitted
openqaworker1:~ # podman exec -i openqaworker1_container_103 su -P
Error: exec failed: unable to start container process: error adding pid 10265 to cgroups: failed to write 10265: openat2 /sys/fs/cgroup/unified/machine.slice/libpod-6c5af02df66206caa2b364013a6ef4f8a6add7206beed39d7a2d85bfd0bfc1c0.scope/cgroup.procs: no such file or directory: OCI runtime attempted to invoke a command that was not found

I didn't know where to begin looking, so I started at the bottom and used systemtap to trace where openat gets its error code from. I traced the permission issue down to the devices cgroup (v1).
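The "failed to create pseudo-terminal" symptom fits the devices cgroup finding: PTY slaves under /dev/pts use char major 136, so a container whose devices.list has lost its "c 136:* rwm" rule can no longer open them and gets EPERM. Below is a minimal check for that condition, as a hedged sketch in POSIX sh — the helper name is made up, and the two sample lists are copied from the devices.list output quoted further down:

```shell
#!/bin/sh
# pts_allowed: does a devices.list text still grant rwm on /dev/pts/*?
# Char major 136 is the Unix98 PTY slave range; "a *:* rwm" means allow-all.
# (Hypothetical helper, not part of the original report.)
pts_allowed() {
    printf '%s\n' "$1" | grep -Eq '^(a \*:\* rwm|c 136:\* rwm)$'
}

# devices.list of the healthy container 101 (from the report):
good='c 10:200 rwm
c 5:2 rwm
c 5:0 rwm
c 1:9 rwm
c 1:8 rwm
c 1:7 rwm
c 1:5 rwm
c 1:3 rwm
b *:* m
c *:* m
c 136:* rwm'

# devices.list of the broken container 102 (from the report):
bad='c 1:3 rwm
c 1:5 rwm
c 1:7 rwm
c 1:8 rwm
c 1:9 rwm
c 5:0 rwm
c 5:2 rwm
c 10:200 rwm'

pts_allowed "$good" && echo "container 101: pts allowed"
pts_allowed "$bad"  || echo "container 102: pts denied"
```

To check a live container one could feed it the corresponding file, e.g. pts_allowed "$(cat /sys/fs/cgroup/devices/machine.slice/libpod-<id>.scope/devices.list)".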
Some information about the relevant cgroups.

container 101 (restarted, works):

openqaworker1:~ # podman inspect openqaworker1_container_101 | grep CgroupPath
"CgroupPath": "/machine.slice/libpod-955b615b984df586c92fdc7177ab4a8338bdbad109c3b9fc151ec90e7f420812.scope"
openqaworker1:~ # cat /sys/fs/cgroup/devices/machine.slice/libpod-955b615b984df586c92fdc7177ab4a8338bdbad109c3b9fc151ec90e7f420812.scope/devices.list
c 10:200 rwm
c 5:2 rwm
c 5:0 rwm
c 1:9 rwm
c 1:8 rwm
c 1:7 rwm
c 1:5 rwm
c 1:3 rwm
b *:* m
c *:* m
c 136:* rwm
openqaworker1:~ # podman inspect openqaworker1_container_101 | grep -i pid
"Pid": 9426,
"ConmonPid": 9413,
"ConmonPidFile": "/var/run/containers/storage/btrfs-containers/955b615b984df586c92fdc7177ab4a8338bdbad109c3b9fc151ec90e7f420812/userdata/conmon.pid",
"PidFile": "",
"PidMode": "private",
"PidsLimit": 2048,
openqaworker1:~ # cat /proc/9426/cgroup
13:blkio:/machine.slice/libpod-955b615b984df586c92fdc7177ab4a8338bdbad109c3b9fc151ec90e7f420812.scope
12:perf_event:/machine.slice/libpod-955b615b984df586c92fdc7177ab4a8338bdbad109c3b9fc151ec90e7f420812.scope
11:devices:/machine.slice/libpod-955b615b984df586c92fdc7177ab4a8338bdbad109c3b9fc151ec90e7f420812.scope
10:pids:/machine.slice/libpod-955b615b984df586c92fdc7177ab4a8338bdbad109c3b9fc151ec90e7f420812.scope
9:memory:/machine.slice/libpod-955b615b984df586c92fdc7177ab4a8338bdbad109c3b9fc151ec90e7f420812.scope
8:rdma:/machine.slice/libpod-955b615b984df586c92fdc7177ab4a8338bdbad109c3b9fc151ec90e7f420812.scope
7:misc:/
6:cpuset:/machine.slice/libpod-955b615b984df586c92fdc7177ab4a8338bdbad109c3b9fc151ec90e7f420812.scope
5:net_cls,net_prio:/machine.slice/libpod-955b615b984df586c92fdc7177ab4a8338bdbad109c3b9fc151ec90e7f420812.scope
4:freezer:/machine.slice/libpod-955b615b984df586c92fdc7177ab4a8338bdbad109c3b9fc151ec90e7f420812.scope
3:cpu,cpuacct:/machine.slice/libpod-955b615b984df586c92fdc7177ab4a8338bdbad109c3b9fc151ec90e7f420812.scope
2:hugetlb:/machine.slice/libpod-955b615b984df586c92fdc7177ab4a8338bdbad109c3b9fc151ec90e7f420812.scope
1:name=systemd:/machine.slice/libpod-955b615b984df586c92fdc7177ab4a8338bdbad109c3b9fc151ec90e7f420812.scope
0::/machine.slice/libpod-955b615b984df586c92fdc7177ab4a8338bdbad109c3b9fc151ec90e7f420812.scope

container 102 (permission issue, device entries missing):

openqaworker1:~ # podman inspect openqaworker1_container_102 | grep CgroupPath
"CgroupPath": "/machine.slice/libpod-bb7f7bed785fed4e77244d316cea6ee21ba9e6a26b609f02c894430f72beb3eb.scope"
openqaworker1:~ # cat /sys/fs/cgroup/devices/machine.slice/libpod-bb7f7bed785fed4e77244d316cea6ee21ba9e6a26b609f02c894430f72beb3eb.scope/devices.list
c 1:3 rwm
c 1:5 rwm
c 1:7 rwm
c 1:8 rwm
c 1:9 rwm
c 5:0 rwm
c 5:2 rwm
c 10:200 rwm
openqaworker1:~ # cat /proc/17525/cgroup
13:blkio:/machine.slice/libpod-bb7f7bed785fed4e77244d316cea6ee21ba9e6a26b609f02c894430f72beb3eb.scope
12:perf_event:/machine.slice/libpod-bb7f7bed785fed4e77244d316cea6ee21ba9e6a26b609f02c894430f72beb3eb.scope
11:devices:/machine.slice/libpod-bb7f7bed785fed4e77244d316cea6ee21ba9e6a26b609f02c894430f72beb3eb.scope
10:pids:/machine.slice/libpod-bb7f7bed785fed4e77244d316cea6ee21ba9e6a26b609f02c894430f72beb3eb.scope
9:memory:/machine.slice/libpod-bb7f7bed785fed4e77244d316cea6ee21ba9e6a26b609f02c894430f72beb3eb.scope
8:rdma:/machine.slice/libpod-bb7f7bed785fed4e77244d316cea6ee21ba9e6a26b609f02c894430f72beb3eb.scope
7:misc:/
6:cpuset:/machine.slice/libpod-bb7f7bed785fed4e77244d316cea6ee21ba9e6a26b609f02c894430f72beb3eb.scope
5:net_cls,net_prio:/machine.slice/libpod-bb7f7bed785fed4e77244d316cea6ee21ba9e6a26b609f02c894430f72beb3eb.scope
4:freezer:/machine.slice/libpod-bb7f7bed785fed4e77244d316cea6ee21ba9e6a26b609f02c894430f72beb3eb.scope
3:cpu,cpuacct:/machine.slice/libpod-bb7f7bed785fed4e77244d316cea6ee21ba9e6a26b609f02c894430f72beb3eb.scope
2:hugetlb:/machine.slice/libpod-bb7f7bed785fed4e77244d316cea6ee21ba9e6a26b609f02c894430f72beb3eb.scope
1:name=systemd:/machine.slice/libpod-bb7f7bed785fed4e77244d316cea6ee21ba9e6a26b609f02c894430f72beb3eb.scope
0::/machine.slice/libpod-bb7f7bed785fed4e77244d316cea6ee21ba9e6a26b609f02c894430f72beb3eb.scope

container 103 (cgroup missing?):

openqaworker1:~ # podman inspect openqaworker1_container_103 | grep -i pid
"Pid": 18167,
"ConmonPid": 18154,
"ConmonPidFile": "/var/run/containers/storage/btrfs-containers/6c5af02df66206caa2b364013a6ef4f8a6add7206beed39d7a2d85bfd0bfc1c0/userdata/conmon.pid",
"PidFile": "",
"PidMode": "private",
"PidsLimit": 2048,
openqaworker1:~ # cat /proc/18167/cgroup
13:blkio:/machine.slice/libpod-6c5af02df66206caa2b364013a6ef4f8a6add7206beed39d7a2d85bfd0bfc1c0.scope
12:perf_event:/machine.slice/libpod-6c5af02df66206caa2b364013a6ef4f8a6add7206beed39d7a2d85bfd0bfc1c0.scope
11:devices:/machine.slice/libpod-6c5af02df66206caa2b364013a6ef4f8a6add7206beed39d7a2d85bfd0bfc1c0.scope
10:pids:/machine.slice/libpod-6c5af02df66206caa2b364013a6ef4f8a6add7206beed39d7a2d85bfd0bfc1c0.scope
9:memory:/machine.slice/libpod-6c5af02df66206caa2b364013a6ef4f8a6add7206beed39d7a2d85bfd0bfc1c0.scope
8:rdma:/machine.slice/libpod-6c5af02df66206caa2b364013a6ef4f8a6add7206beed39d7a2d85bfd0bfc1c0.scope
7:misc:/
6:cpuset:/machine.slice/libpod-6c5af02df66206caa2b364013a6ef4f8a6add7206beed39d7a2d85bfd0bfc1c0.scope
5:net_cls,net_prio:/machine.slice/libpod-6c5af02df66206caa2b364013a6ef4f8a6add7206beed39d7a2d85bfd0bfc1c0.scope
4:freezer:/machine.slice/libpod-6c5af02df66206caa2b364013a6ef4f8a6add7206beed39d7a2d85bfd0bfc1c0.scope
3:cpu,cpuacct:/machine.slice/libpod-6c5af02df66206caa2b364013a6ef4f8a6add7206beed39d7a2d85bfd0bfc1c0.scope
2:hugetlb:/machine.slice/libpod-6c5af02df66206caa2b364013a6ef4f8a6add7206beed39d7a2d85bfd0bfc1c0.scope
1:name=systemd:/system.slice/container-openqaworker1_container_103.service
0::/system.slice/container-openqaworker1_container_103.service

It's visible that in the case of container 102, the device cgroup entries "b *:* m", "c *:* m" and "c 136:* rwm" (the /dev/pts range) got removed
somehow, and that for container 103 the name=systemd and unified cgroups changed completely (system.slice/container-*.service instead of machine.slice/libpod-*.scope). Bug 1178775 sounds similar. I guess systemd is somehow interfering with podman?
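The container 103 observation — podman's recorded CgroupPath no longer matching where the kernel actually has the process — can be checked mechanically. A hedged sketch in POSIX sh; the helper name is made up, and the sample input is the name=systemd line from the output above:

```shell
#!/bin/sh
# cgroup_drift: compare podman's recorded CgroupPath against the
# name=systemd line of /proc/<pid>/cgroup to spot containers whose
# processes were moved out from under podman. (Hypothetical helper.)
cgroup_drift() {
    expected="$1"   # e.g. /machine.slice/libpod-<id>.scope
    cgroups="$2"    # content of /proc/<pid>/cgroup
    actual=$(printf '%s\n' "$cgroups" | sed -n 's/^1:name=systemd://p')
    if [ "$actual" = "$expected" ]; then
        echo "ok"
    else
        echo "moved to $actual"
    fi
}

# Container 103 from the report: podman expects machine.slice/libpod-*.scope,
# the kernel reports system.slice/container-*.service:
cgroup_drift \
    "/machine.slice/libpod-6c5af02df66206caa2b364013a6ef4f8a6add7206beed39d7a2d85bfd0bfc1c0.scope" \
    "1:name=systemd:/system.slice/container-openqaworker1_container_103.service"
# prints "moved to /system.slice/container-openqaworker1_container_103.service"
```

On a live system one would take the expected path from podman inspect (the "CgroupPath" field shown above) and the actual cgroups from /proc/<Pid>/cgroup.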