https://bugzilla.suse.com/show_bug.cgi?id=1186256
Bug ID: 1186256
Summary: qemu-linux-user: hardcoded binfmt handler doesn't play well with containers
Classification: openSUSE
Product: openSUSE Tumbleweed
Version: Current
Hardware: Other
OS: Other
Status: NEW
Severity: Normal
Priority: P5 - None
Component: KVM
Assignee: kvm-bugs@suse.de
Reporter: martin.wilck@suse.com
QA Contact: qa-bugs@suse.de
Found By: ---
Blocker: ---
Created attachment 849481
  --> https://bugzilla.suse.com/attachment.cgi?id=849481&action=edit
Proposed patch for qemu-binfmt-conf.sh
Since abbc0ce ("qemu-binfmt-conf: use qemu-ARCH-binfmt"), qemu-binfmt-conf.sh under openSUSE automatically replaces the default qemu binfmt wrapper "qemu-$ARCH" with "qemu-$ARCH-binfmt" in order to ensure that argv[0] is preserved; qemu-$ARCH-binfmt is a link to qemu-binfmt, which is just a simple wrapper that mangles argv to achieve the desired result. This is a SUSE-specific modification which isn't used upstream.
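To illustrate what "mangles argv" means here, the following is a rough sketch and not the actual SUSE wrapper. It assumes the handler is registered with the binfmt_misc "P" flag, so that the kernel invokes the wrapper with the full path of the foreign binary followed by the original argv[0], and that the emulator accepts "-0" to set the guest's argv[0]. The function name mangle_argv is hypothetical; it only prints the command line that would be exec()'d.

```shell
#!/bin/sh
# Sketch only: prints the argv a "-binfmt" style wrapper would hand to the
# real emulator. Assumes the kernel called the wrapper (P flag) as:
#   <wrapper> <full path of binary> <original argv[0]> [args...]

mangle_argv() {
    wrapper=$1      # e.g. /usr/bin/qemu-ppc64le-binfmt
    binary=$2       # full path of the foreign-arch binary
    orig_argv0=$3   # argv[0] as typed by the caller
    shift 3

    # Derive the emulator name by dropping the "-binfmt" suffix.
    emulator=${wrapper%-binfmt}

    # One argument per line: emulator, -0 <argv0>, binary, remaining args.
    printf '%s\n' "$emulator" -0 "$orig_argv0" "$binary" "$@"
}

mangle_argv /usr/bin/qemu-ppc64le-binfmt /bin/busybox uname -m
```

The last step, exec()ing a second native executable by path, is the part that matters for containers: that path has to exist inside the container's filesystem.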
This approach is inconvenient in some situations. In particular for running foreign-arch containers, it's useful to use the binfmt_misc "F" ("fix binary") flag to pre-load the qemu wrapper in the kernel. That way, foreign-arch containers can be run just like native containers, without having to bind-mount interpreters into the container. But that's impossible with the SUSE binfmt wrapper that needs to exec() a different (native) executable.
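For readers unfamiliar with the "F" flag, here is a minimal sketch of how a binfmt_misc registration string with that flag is put together. The helper names hex_to_escapes and binfmt_rule are made up for this example, and the magic and mask values are the ones shown for qemu-s390x later in this report. The script only prints the string; it does not register anything.

```shell
#!/bin/sh
# Sketch only: builds a binfmt_misc registration string, does not register it.

# Turn a plain hex string (as displayed in /proc/sys/fs/binfmt_misc/<name>)
# into the \xNN escape form used in registration strings.
hex_to_escapes() {
    printf '%s' "$1" | sed 's/../\\x&/g'
}

# binfmt_rule NAME MAGIC_HEX MASK_HEX INTERPRETER FLAGS
# Field layout: :name:type:offset:magic:mask:interpreter:flags
binfmt_rule() {
    printf ':%s:M::%s:%s:%s:%s\n' \
        "$1" "$(hex_to_escapes "$2")" "$(hex_to_escapes "$3")" "$4" "$5"
}

# "F" asks the kernel to open the interpreter at registration time, so it
# stays usable in mount namespaces where the path does not exist.
binfmt_rule qemu-s390x \
    7f454c4602020100000000000000000000020016 \
    ffffffffffffff00fffffffffffffffffffeffff \
    /usr/bin/qemu-s390x F
```

Registering such a rule would mean writing the printed line to /proc/sys/fs/binfmt_misc/register as root, which qemu-binfmt-conf.sh normally takes care of.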
In the openSUSE default mode of qemu-binfmt-conf.sh, the user needs to bind-mount both the -binfmt executable and the actual emulator into the container:
$ podman run -it --rm \
    -v /usr/bin/qemu-ppc64le-binfmt:/usr/bin/qemu-ppc64le-binfmt \
    -v /usr/bin/qemu-ppc64le:/usr/bin/qemu-ppc64le \
    ppc64le/busybox uname -m
ppc64le
Otherwise, they get:
$ podman run -t --rm ppc64le/busybox uname -m
standard_init_linux.go:219: exec user process caused: no such file or directory
If qemu-binfmt-conf.sh is used with the --persistent flag, qemu-ppc64le-binfmt is loaded into the kernel, but qemu-ppc64le must still be bind-mounted. If qemu-ppc64le were used directly as the persistent binfmt_misc helper, it would be sufficient to run the container as if it were a native one:
$ podman run -it --rm ppc64le/busybox uname -m
ppc64le
I can see why it makes sense to try to preserve argv[0], but for me at least, the "foreign container" use case is more important. Therefore I'd like to be able to switch the behavior of the qemu binfmt_misc helper back to the upstream default.
So far I've worked around the issue by simply using the upstream container "docker.io/multiarch/qemu-user-static", but I'd like to be able to do this easily with openSUSE on-board tools.
The attached patch allows the user to override the default "-binfmt" suffix by running "qemu-binfmt-conf.sh --qemu-suffix ''".
(Note: "qemu-binfmt-conf.sh -F ''" doesn't work; that's a different issue.)
Martin Wilck martin.wilck@suse.com changed:
What       |Removed                     |Added
----------------------------------------------------------------------------
CC         |                            |dfaggioli@suse.com
Martin Wilck martin.wilck@suse.com changed:
What       |Removed                     |Added
----------------------------------------------------------------------------
CC         |                            |schwab@suse.com
https://bugzilla.suse.com/show_bug.cgi?id=1186256#c1
Martin Wilck martin.wilck@suse.com changed:
What                          |Removed                     |Added
----------------------------------------------------------------------------
Attachment #849481 is obsolete|0                           |1
--- Comment #1 from Martin Wilck martin.wilck@suse.com ---
Created attachment 849483
  --> https://bugzilla.suse.com/attachment.cgi?id=849483&action=edit
Proposed patch for qemu-binfmt-conf.sh
https://bugzilla.suse.com/show_bug.cgi?id=1186256#c2
--- Comment #2 from Martin Wilck martin.wilck@suse.com ---
wrt "-F", I just posted a patch to qemu-devel, subject "qemu-binfmt-conf.sh: fix -F option".
https://bugzilla.suse.com/show_bug.cgi?id=1186256#c3
--- Comment #3 from Martin Wilck martin.wilck@suse.com ---
Note: I tried to create an OBS request with these two patches, but I failed to make update_git.sh work.
José Ricardo Ziviani jose.ziviani@suse.com changed:
What       |Removed                     |Added
----------------------------------------------------------------------------
CC         |                            |jose.ziviani@suse.com
Assignee   |dfaggioli@suse.com          |jose.ziviani@suse.com
https://bugzilla.suse.com/show_bug.cgi?id=1186256#c4
--- Comment #4 from José Ricardo Ziviani jose.ziviani@suse.com ---
Hello Martin,
I just added your patch to our staging repo (https://build.opensuse.org/package/revisions/Virtualization/qemu). I'll send an SR to Factory as soon as they finish the QEMU v6.1 update (https://build.opensuse.org/request/show/914458).
Thank you!
Jose
https://bugzilla.suse.com/show_bug.cgi?id=1186256#c5
--- Comment #5 from Martin Wilck martin.wilck@suse.com ---
Great, thank you!
Matej Cepl mcepl@suse.com changed:
What       |Removed                     |Added
----------------------------------------------------------------------------
CC         |                            |mcepl@suse.com
https://bugzilla.suse.com/show_bug.cgi?id=1186256#c7
--- Comment #7 from Martin Wilck martin.wilck@suse.com ---
José,
we're not there yet because an upstream bot rejected my -F patch (comment 2) because of a style issue which was definitely not my fault. The overlong line was there before my patch already. I never got this reply (spam folder? no idea), so I was also never able to fix this non-issue.
https://lists.gnu.org/archive/html/qemu-devel/2021-05/msg06012.html
I'll re-post the patch and cc you. I'd be glad if you could pull it into opensuse before upstream gets to it.
https://bugzilla.suse.com/show_bug.cgi?id=1186256#c8
--- Comment #8 from José Ricardo Ziviani jose.ziviani@suse.com ---
(In reply to Martin Wilck from comment #7)
> José,
>
> we're not there yet because an upstream bot rejected my -F patch (comment 2) because of a style issue which was definitely not my fault. The overlong line was there before my patch already. I never got this reply (spam folder? no idea), so I was also never able to fix this non-issue.
>
> https://lists.gnu.org/archive/html/qemu-devel/2021-05/msg06012.html
>
> I'll re-post the patch and cc you. I'd be glad if you could pull it into opensuse before upstream gets to it.
Hello Martin,
Sure, I'll add it here.
By the way, your -F patch is in Factory and should be available in the next update.
Thanks
Claudio Fontana claudio.fontana@suse.com changed:
What       |Removed                     |Added
----------------------------------------------------------------------------
CC         |                            |claudio.fontana@suse.com
José Ricardo Ziviani jose.ziviani@suse.com changed:
What       |Removed                     |Added
----------------------------------------------------------------------------
Assignee   |jose.ziviani@suse.com       |kvm-bugs@suse.de
https://bugzilla.suse.com/show_bug.cgi?id=1186256#c13
--- Comment #13 from Martin Wilck martin.wilck@suse.com ---
The upstream v2 submission fell through the cracks again, it seems. Trying once more. Perhaps an Acked-by: from one of you guys might help...
https://bugzilla.suse.com/show_bug.cgi?id=1186256#c14
--- Comment #14 from Martin Wilck martin.wilck@suse.com ---
Laurent has reviewed my -F patch now ...
https://lists.gnu.org/archive/html/qemu-devel/2021-11/msg05530.html
https://bugzilla.suse.com/show_bug.cgi?id=1186256#c15
--- Comment #15 from Martin Wilck martin.wilck@suse.com ---
But FTR, the patch from comment 1 is not yet in Factory (qemu-linux-user-6.1.0-34.1.x86_64).
Dario Faggioli dfaggioli@suse.com changed:
What       |Removed                     |Added
----------------------------------------------------------------------------
Assignee   |kvm-bugs@suse.de            |dfaggioli@suse.com
https://bugzilla.suse.com/show_bug.cgi?id=1186256#c18
Martin Wilck martin.wilck@suse.com changed:
What       |Removed                            |Added
----------------------------------------------------------------------------
Flags      |needinfo?(martin.wilck@suse.com)   |
--- Comment #18 from Martin Wilck martin.wilck@suse.com ---
Thanks, I'm finally starting to understand. I have to say I had only partially understood Laurent's response so far.
If Alex's patch is dropped, my patch from comment 1 almost certainly won't be necessary any more.
I don't care about preserving argv[0]. All I'm interested in is not to have to bind-mount a qemu executable into foreign arch containers. But if you drop Alex' patch, you may have to talk to some of the people who are interested in argv[0] preservation.
https://bugzilla.suse.com/show_bug.cgi?id=1186256#c22
Martin Wilck martin.wilck@suse.com changed:
What       |Removed                            |Added
----------------------------------------------------------------------------
Flags      |needinfo?(martin.wilck@suse.com)   |
--- Comment #22 from Martin Wilck martin.wilck@suse.com ---
What I want to achieve (being able to simply start a foreign-arch container without having to bind-mount anything from the native environment into it) only works with "fix binary" settings, where the statically linked interpreter binary is loaded into the kernel (--persistent flag of qemu-binfmt-conf.sh, "F" flag in the kernel).
So you need to run e.g.
qemu-binfmt-conf.sh --systemd s390x --persistent yes --qemu-suffix ""
to make this work. The result looks like this:
# cat /proc/sys/fs/binfmt_misc/qemu-s390x
enabled
interpreter /usr/bin/qemu-s390x
flags: PF
offset 0
magic 7f454c4602020100000000000000000000020016
mask ffffffffffffff00fffffffffffffffffffeffff
Hope this makes sense.
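As a rough way to check for this state from a script, the sketch below inspects a binfmt_misc status file and succeeds only if the entry is enabled, carries the "F" flag, and does not point at a "-binfmt" wrapper. The function name binfmt_is_self_contained is hypothetical, and the parsing assumes the status layout shown in the output above.

```shell
#!/bin/sh
# Sketch only: checks a binfmt_misc status file such as
# /proc/sys/fs/binfmt_misc/qemu-s390x for the "container friendly" setup.

# binfmt_is_self_contained FILE
# Returns 0 if the entry is enabled, has the F flag, and its interpreter is
# not a "-binfmt" wrapper; 1 if one of these fails; 2 if FILE is unreadable.
binfmt_is_self_contained() {
    entry=$1
    [ -r "$entry" ] || return 2

    # Pull the interesting fields out of the status text.
    interp=$(sed -n 's/^interpreter //p' "$entry")
    flags=$(sed -n 's/^flags: *//p' "$entry")

    # The first line is "enabled" or "disabled".
    head -n 1 "$entry" | grep -qx 'enabled' || return 1

    # Without F, the interpreter path is resolved inside the container.
    case $flags in *F*) ;; *) return 1 ;; esac

    # The wrapper would still need to exec() the real emulator by path.
    case $interp in *-binfmt) return 1 ;; esac

    return 0
}

# Example use (only meaningful on a host with the handler registered):
#   binfmt_is_self_contained /proc/sys/fs/binfmt_misc/qemu-s390x && echo ok
```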
(In reply to Dario Faggioli from comment #21)
> Mmm... I also see this:
>
> virt136:~ # ls /usr/bin/qemu-ppc64le* -l
> -rwxr-xr-x 1 root root 3940664 Dec 6 14:35 /usr/bin/qemu-ppc64le
> lrwxrwxrwx 1 root root 11 Dec 6 14:33 /usr/bin/qemu-ppc64le-binfmt -> qemu-binfmt
This is the normal SUSE setup.
https://bugzilla.suse.com/show_bug.cgi?id=1186256#c24
Dario Faggioli dfaggioli@suse.com changed:
What       |Removed                     |Added
----------------------------------------------------------------------------
Status     |NEW                         |RESOLVED
Resolution |---                         |FIXED
--- Comment #24 from Dario Faggioli dfaggioli@suse.com ---
SR 936373 (https://build.opensuse.org/request/show/936373) is in Factory now, and it contains both patches. According to my tests, things now work as intended, so I'm closing this.
Thanks for the patches and for the help reproducing and debugging this!