Bug ID | 1186256 |
---|---|
Summary | qemu-linux-user: hardcoded binfmt handler doesn't play well with containers |
Classification | openSUSE |
Product | openSUSE Tumbleweed |
Version | Current |
Hardware | Other |
OS | Other |
Status | NEW |
Severity | Normal |
Priority | P5 - None |
Component | KVM |
Assignee | kvm-bugs@suse.de |
Reporter | martin.wilck@suse.com |
QA Contact | qa-bugs@suse.de |
Found By | --- |
Blocker | --- |
Created attachment 849481 [details]
Proposed patch for qemu-binfmt-conf.sh

Since abbc0ce ("qemu-binfmt-conf: use qemu-ARCH-binfmt"), qemu-binfmt-conf.sh under openSUSE automatically replaces the default qemu binfmt wrapper "qemu-$ARCH" with "qemu-$ARCH-binfmt" to ensure that argv[0] is preserved; qemu-$ARCH-binfmt is a link to qemu-binfmt, which is just a simple wrapper that mangles argv to achieve the desired result. This is a SUSE-specific modification which isn't used upstream.

This approach is inconvenient in some situations. In particular, for running foreign-arch containers, it's useful to use the binfmt_misc "F" ("fix binary") flag to pre-load the qemu wrapper in the kernel. That way, foreign-arch containers can be run just like native containers, without having to bind-mount interpreters into the container. But that's impossible with the SUSE binfmt wrapper, which needs to exec() a different (native) executable.

In the openSUSE default mode of qemu-binfmt-conf.sh, the user needs to bind-mount both the -binfmt executable and the actual emulator into the container:

> $ podman run -it --rm \
>     -v /usr/bin/qemu-ppc64le-binfmt:/usr/bin/qemu-ppc64le-binfmt \
>     -v /usr/bin/qemu-ppc64le:/usr/bin/qemu-ppc64le \
>     ppc64le/busybox uname -m
> ppc64le

Otherwise, they get

> $ podman run -t --rm ppc64le/busybox uname -m
> standard_init_linux.go:219: exec user process caused: no such file or directory

If qemu-binfmt-conf.sh is used with the --persistent flag, qemu-ppc64le-binfmt is loaded into the kernel, but qemu-ppc64le must still be bind-mounted, because the wrapper exec()s it by path from inside the container.

If qemu-ppc64le were used directly as the persistent binfmt_misc helper, it would be sufficient to run the container as if it were a native one:

> $ podman run -it --rm ppc64le/busybox uname -m
> ppc64le

I can see why it makes sense to try to preserve argv[0], but for me at least, the "foreign container" use case is more important.
Therefore I'd like to be able to switch the behavior of the qemu binfmt_misc helper back to the upstream default. So far I've worked around the issue by simply using the upstream container "docker.io/multiarch/qemu-user-static", but I'd like to be able to do this easily with the tools shipped in openSUSE. The attached patch allows the user to override the default "-binfmt" suffix by running "qemu-binfmt-conf.sh --qemu-suffix ''". (Note: "qemu-binfmt-conf.sh -F ''" doesn't work; that's a different issue.)
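
To check which interpreter the kernel currently has registered for an architecture, and whether the "F" flag is set, the binfmt_misc entry under /proc can be inspected. The following is only a minimal sketch: `binfmt_entry_info` is a hypothetical helper name (not part of qemu or of the attached patch), and it assumes the entry has the usual "interpreter ..." and "flags: ..." lines and is named qemu-ppc64le, as in the example above.

```sh
#!/bin/sh
# Sketch only: binfmt_entry_info is a hypothetical helper, not part of qemu.
# It summarizes a binfmt_misc entry file, printing the registered
# interpreter and whether the "F" (fix binary) flag is set.
binfmt_entry_info() {
    entry=$1
    if [ ! -r "$entry" ]; then
        echo "no such entry: $entry" >&2
        return 1
    fi
    # Entry files contain lines such as "interpreter /path" and "flags: F".
    interp=$(sed -n 's/^interpreter //p' "$entry")
    flags=$(sed -n 's/^flags: *//p' "$entry")
    echo "interpreter=$interp"
    case $flags in
        *F*) echo "fix-binary=yes" ;;
        *)   echo "fix-binary=no" ;;
    esac
}

# Example call (assumes binfmt_misc is mounted and the entry is registered):
#   binfmt_entry_info /proc/sys/fs/binfmt_misc/qemu-ppc64le
```

If this reports the -binfmt wrapper as interpreter, the real emulator still has to be reachable inside the container even when fix-binary is "yes", which is the situation described above.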