Summary: qemu-linux-user: hardcoded binfmt handler doesn't play well with containers
Priority: P5 - None
Created attachment 849481: Proposed patch for qemu-binfmt-conf.sh

Since abbc0ce ("qemu-binfmt-conf: use qemu-ARCH-binfmt"), qemu-binfmt-conf.sh under openSUSE automatically replaces the default qemu binfmt wrapper "qemu-$ARCH" with "qemu-$ARCH-binfmt" to ensure that argv is preserved. qemu-$ARCH-binfmt is a link to qemu-binfmt, a simple wrapper that mangles argv to achieve the desired result. This is a SUSE-specific modification that isn't used upstream.

This approach is inconvenient in some situations. In particular, for running foreign-arch containers it's useful to set the binfmt_misc "F" ("fix binary") flag so that the qemu wrapper is pre-loaded in the kernel. That way, foreign-arch containers can be run just like native containers, without having to bind-mount interpreters into the container. But that's impossible with the SUSE binfmt wrapper, because it needs to exec() a different (native) executable, which must then exist inside the container.

In the openSUSE default mode of qemu-binfmt-conf.sh, the user needs to bind-mount both the -binfmt executable and the actual emulator into the container:

> $ podman run -it --rm \
>     -v /usr/bin/qemu-ppc64le-binfmt:/usr/bin/qemu-ppc64le-binfmt \
>     -v /usr/bin/qemu-ppc64le:/usr/bin/qemu-ppc64le \
>     ppc64le/busybox uname -m
> ppc64le

Otherwise, the container fails to start:

> $ podman run -t --rm ppc64le/busybox uname -m
> standard_init_linux.go:219: exec user process caused: no such file or directory

If qemu-binfmt-conf.sh is used with the --persistent flag, qemu-ppc64le-binfmt is loaded into the kernel, but qemu-ppc64le must still be bind-mounted. If qemu-ppc64le were used directly as the persistent binfmt_misc helper, it would be sufficient to run the container as if it were a native one:

> $ podman run -it --rm ppc64le/busybox uname -m
> ppc64le

I can see why it makes sense to try to preserve argv, but for me at least, the "foreign container" use case is more important.
Therefore, I'd like to be able to switch the behavior of the qemu binfmt_misc helper back to the upstream default. So far I've worked around the issue by simply using the upstream container "docker.io/multiarch/qemu-user-static", but I'd like to be able to do this easily with openSUSE's on-board tools.

The attached patch allows the user to override the default "-binfmt" suffix by running "qemu-binfmt-conf.sh --qemu-suffix ''". (Note: "qemu-binfmt-conf.sh -F ''" doesn't work; that's a different issue.)
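To make the effect of the suffix concrete, here is a minimal Python sketch (not the attached patch, which changes the shell script) that composes the line a tool would write to /proc/sys/fs/binfmt_misc/register, using the kernel's documented `:name:type:offset:magic:mask:interpreter:flags` format. The helper name `binfmt_register_line`, the entry name "qemu-$ARCH", and the "MAGIC"/"MASK" placeholders are assumptions for illustration only; the real ppc64le ELF magic and mask are deliberately not reproduced here. With the `F` flag the kernel opens the interpreter at registration time, so the interpreter path need not exist inside the container; the SUSE wrapper, however, still exec()s qemu-$ARCH by path at run time.

```python
# Hypothetical sketch: build a binfmt_misc registration line.
# Format (Linux binfmt_misc docs): :name:type:offset:magic:mask:interpreter:flags

VALID_FLAGS = set("POCF")  # P, O, C, F are the flags binfmt_misc documents


def binfmt_register_line(arch: str, magic: str, mask: str,
                         suffix: str = "-binfmt", flags: str = "",
                         bindir: str = "/usr/bin") -> str:
    """Return a register line for qemu-<arch><suffix> (hypothetical helper)."""
    bad = set(flags) - VALID_FLAGS
    if bad:
        raise ValueError(f"unknown binfmt_misc flag(s): {''.join(sorted(bad))}")
    # The interpreter path is where the suffix matters: "-binfmt" selects the
    # argv-mangling SUSE wrapper, "" selects the emulator itself.
    interpreter = f"{bindir}/qemu-{arch}{suffix}"
    # Type "M" = match on magic bytes; offset left empty (defaults to 0).
    return f":qemu-{arch}:M::{magic}:{mask}:{interpreter}:{flags}"


if __name__ == "__main__":
    # SUSE default: the wrapper is pre-loaded, but it must still find
    # /usr/bin/qemu-ppc64le inside the container at run time.
    print(binfmt_register_line("ppc64le", "MAGIC", "MASK", flags="F"))
    # Empty suffix: the emulator itself is pre-loaded, so nothing needs
    # to be bind-mounted into the container.
    print(binfmt_register_line("ppc64le", "MAGIC", "MASK", suffix="", flags="F"))
```

Only the interpreter field differs between the two lines; everything else about the registration stays the same.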