Re: Tumbleweed - Move to x86-64-v2 (plus mitigation plan and call for help)
Hi Dominique, sorry for dropping the mailing list from the reply, fixed now again. On Wed, Dec 7, 2022 at 18:10, Dominique Leuenberger / DimStar <dimstar@opensuse.org> wrote:
> Sure - it's a wiki and can only benefit from more input.
I added a pro/cons list from my point of view and expanded a bit on the baselibs idea.
> The proposal I favor so far is the 'possible option' - i.e. allow building everything (or as much as we want) properly as a different architecture and get this into rpm/zypp.
I agree that this sounds like the "cleanest" approach, but it uses a hammer for something that does not necessarily need a hammer, and it therefore also has quite some drawbacks. In addition, this approach has already been rejected by rpm upstream; I added the link to that discussion to your wiki page. It also has a number of downsides: as one of the openSUSE maintainers with experience on armv6l/armv7l, I can say there is a TON of software that breaks when the rpm architecture does not match the kernel architecture (i.e. when $(uname -m) != %arch, which is the case for armv7l <-> armv7hl). Maintaining this and getting those fixes upstream is anything but fun, and in that case it only worked because other distributions (Fedora, Ubuntu/Debian) had the same issue, so everyone pushed the various upstreams to accept those patches. And yet, such an assumption constantly creeps back into the code, requiring a constant battle of fixes. It also breaks base interoperability requirements, especially when targeting -v3 and -v4 and thinking about containerized/cloud deployments, where CPU features can change between restarts or reboots.
> mls has already been lined up and thinks it should be rather easy (for the rpm and obs part)
Yes, that's true for zypper/libzypp. However, in the non-openSUSE context of SUSE distributions, 3rd party software is often used for systems management, and that needs to be adapted for this as well; many of those tools have hardcoded or false assumptions (like $(uname -m) being the rpm architecture).
> ; zypp itself already has code to treat arch x>y but compatible (i586/i686) - so if we get x86_64<x86_64-v3, we're all set there (plus code to detect what machine you have)
With the downsides that it requires patching of about 1000 packages (%ifarch x86_64 -> %ifarch %x86_64), and not providing coinstallability, meaning you can't have single-installation media/machine images, requiring the user to choose which version works across their data centers and clouds (and hope that they do not use anything like Kubernetes that just moves workloads around), plus requiring full builds of the Tumbleweed distribution (while technically we could build smaller ones for the higher-optimized versions, that would mean mix-and-match deployments on the user side and on our installation media, increasing size requirements and complexity). Greetings, Dirk
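As a side note on the "code to detect what machine you have" part mentioned above: on the compiler side this is cheap to do at runtime. A minimal sketch, assuming GCC or Clang on x86-64 (the feature checks below are only a representative subset of what the psABI levels require, not the complete lists):

    #include <stdio.h>

    /* Rough runtime detection of the x86-64 microarchitecture level via CPU
     * feature checks. Newer GCC also accepts the level names directly, e.g.
     * __builtin_cpu_supports("x86-64-v3"); the individual feature names used
     * here work with older compilers as well. */
    static int isa_level(void)
    {
        int level = 1;
        __builtin_cpu_init();
        if (__builtin_cpu_supports("ssse3") && __builtin_cpu_supports("sse4.1") &&
            __builtin_cpu_supports("sse4.2") && __builtin_cpu_supports("popcnt"))
            level = 2;
        if (level == 2 &&
            __builtin_cpu_supports("avx") && __builtin_cpu_supports("avx2") &&
            __builtin_cpu_supports("fma") && __builtin_cpu_supports("bmi") &&
            __builtin_cpu_supports("bmi2"))
            level = 3;
        if (level == 3 &&
            __builtin_cpu_supports("avx512f") && __builtin_cpu_supports("avx512bw") &&
            __builtin_cpu_supports("avx512cd") && __builtin_cpu_supports("avx512dq") &&
            __builtin_cpu_supports("avx512vl"))
            level = 4;
        return level;
    }

    int main(void)
    {
        printf("this machine supports up to x86-64-v%d\n", isa_level());
        return 0;
    }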
Hi, On Wednesday, December 7, 2022 at 22:22:42 CET, Dirk Müller wrote:
> Hi Dominique,
> sorry for dropping the mailing list from the reply, fixed now again.
> On Wed, Dec 7, 2022 at 18:10, Dominique Leuenberger / DimStar <dimstar@opensuse.org> wrote:
> > Sure - it's a wiki and can only benefit from more input.
> I added a pro/cons list from my point of view and expanded a bit on the baselibs idea.
Thanks, that's helpful.
> > The proposal I favor so far is the 'possible option' - i.e. allow building everything (or as much as we want) properly as a different architecture and get this into rpm/zypp.
> I agree that this sounds like the "cleanest" approach, but it uses a hammer for something that does not necessarily need a hammer, and it therefore also has quite some drawbacks.
FWICT this problem looks very much like a nail.
> In addition, this approach has already been rejected by rpm upstream; I added the link to that discussion to your wiki page.
Not for x86_64_vX; that PR is about CPU-family-specific architectures like znver1. With glibc, gcc etc. supporting x86_64_vX through hwcaps, I think they'd accept it.
> It also has a number of downsides: as one of the openSUSE maintainers with experience on armv6l/armv7l, I can say there is a TON of software that breaks when the rpm architecture does not match the kernel architecture (i.e. when $(uname -m) != %arch, which is the case for armv7l <-> armv7hl).
Do you have some examples? I imagine this only hits software which has to interact with RPM as well as the kernel.
> Maintaining this and getting those fixes upstream is anything but fun, and in that case it only worked because other distributions (Fedora, Ubuntu/Debian) had the same issue, so everyone pushed the various upstreams to accept those patches. And yet, such an assumption constantly creeps back into the code, requiring a constant battle of fixes.
> It also breaks base interoperability requirements, especially when targeting -v3 and -v4 and thinking about containerized/cloud deployments, where CPU features can change between restarts or reboots.
Having x86_64_vX arch in RPM doesn't mean that coinstallability is impossible. We could ship the hwcaps libraries as .x86_64_vX.rpms. For container/cloud deployments I would actually expect those images to be pinned to a specific x86_64 level from the beginning to ensure predictable behaviour. It's no fun to hit some issue only randomly after an instance ran on newer hardware.
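To illustrate the coinstallability point: with glibc-hwcaps, a baseline library and an optimized variant can be installed in parallel, and the dynamic loader picks the best one at load time. A minimal sketch of how to observe which copy got loaded (libfoo.so.1 is a hypothetical library name; the paths in the comment assume the usual /usr/lib64 glibc-hwcaps layout):

    #define _GNU_SOURCE
    #include <dlfcn.h>
    #include <link.h>
    #include <stdio.h>

    /* If both /usr/lib64/libfoo.so.1 and
     * /usr/lib64/glibc-hwcaps/x86-64-v3/libfoo.so.1 are installed, this
     * prints which copy the dynamic loader actually resolved on this CPU. */
    int main(void)
    {
        void *handle = dlopen("libfoo.so.1", RTLD_NOW);  /* hypothetical library */
        if (!handle) {
            fprintf(stderr, "dlopen failed: %s\n", dlerror());
            return 1;
        }
        struct link_map *map = NULL;
        if (dlinfo(handle, RTLD_DI_LINKMAP, &map) == 0 && map)
            printf("loaded: %s\n", map->l_name);
        dlclose(handle);
        return 0;
    }

(Build with cc demo.c; add -ldl on older glibc. On a v3-capable machine with both packages installed, the glibc-hwcaps path is the one printed.)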
> > mls has already been lined up and thinks it should be rather easy (for the rpm and obs part)
> Yes, that's true for zypper/libzypp. However, in the non-openSUSE context of SUSE distributions, 3rd party software is often used for systems management, and that needs to be adapted for this as well; many of those tools have hardcoded or false assumptions (like $(uname -m) being the rpm architecture).
Do you have an example?
> > ; zypp itself already has code to treat arch x>y but compatible (i586/i686) - so if we get x86_64<x86_64-v3, we're all set there (plus code to detect what machine you have)
> with the downsides that it requires patching of about 1000 packages (%ifarch x86_64 -> %ifarch %x86_64)
Most of those packages wouldn't work with the hwcaps-only approach at all because they include binaries.
> and not providing coinstallability, meaning you can't have single-installation media/machine images, requiring the user to choose which version works across their data centers and clouds (and hope that they do not use anything like Kubernetes that just moves workloads around)
(see above)
> plus requiring full builds of the Tumbleweed distribution (while technically we could build smaller ones for the higher-optimized versions, that would mean mix-and-match deployments on the user side and on our installation media, increasing size requirements and complexity)
Yeah, the huge flexibility means we have a lot of open options. At that point we already arrive at "implementation details" though, something we can even change later. Cheers, Fabian
> Greetings, Dirk
Hi Fabian, On Thu, Dec 8, 2022 at 08:51, Fabian Vogt <fvogt@suse.de> wrote:
> Not for x86_64_vX; that PR is about CPU-family-specific architectures like znver1. With glibc, gcc etc. supporting x86_64_vX through hwcaps, I think they'd accept it.
We'll see. On top of getting it into rpm upstream, it would be good to find consensus with other distributions on the cross-distro mailing list about exposing microarchitecture levels as architectures.
> Do you have some examples? I imagine this only hits software which has to interact with RPM as well as the kernel.
Yes: that software (zypper, dnf, yum, you name it), as well as anything that has to deduce a CPU architecture for build flags (i.e. configure checks). Also all compilers (rust, golang, ...) as well as anything that does its own selection of which software to pick.
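To make the mismatch concrete: such code keys off the kernel's machine string, which would still report plain x86_64 even if the rpm architecture became x86_64_v3 (just like armv7hl packages run on a kernel that reports armv7l). A minimal sketch of the pattern that goes wrong, assuming Linux:

    #include <stdio.h>
    #include <string.h>
    #include <sys/utsname.h>

    /* Mimics the common "detect the architecture" pattern found in configure
     * scripts and build tooling: it only ever sees the kernel's machine
     * string, never a (hypothetical) x86_64_v3 rpm architecture. */
    int main(void)
    {
        struct utsname un;
        if (uname(&un) != 0) {
            perror("uname");
            return 1;
        }
        printf("uname -m says: %s\n", un.machine);
        if (strcmp(un.machine, "x86_64") == 0)
            printf("tooling keyed on this would assume baseline x86_64 packages\n");
        return 0;
    }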
> > It also breaks base interoperability requirements, especially when targeting -v3 and -v4 and thinking about containerized/cloud deployments, where CPU features can change between restarts or reboots.
> Having x86_64_vX arch in RPM doesn't mean that coinstallability is impossible. We could ship the hwcaps libraries as .x86_64_vX.rpms.
You're right, we can combine both options with little extra effort. The main advantage of the hwcaps proposal is that it does not require new architecture handling in rpm and all the other places. If we pursue that path anyway, we can fold in the hwcaps option and get the benefits of both.
> For container/cloud deployments I would actually expect those images to be pinned to a specific x86_64 level from the beginning to ensure predictable behaviour. It's no fun to hit some issue only randomly after an instance ran on newer hardware.
Yes, and that pinned level has to be fairly low, like -v2 or -v1; and if we only build -v1 and -v3, then it would be -v1, hence no improvement whatsoever. I think that is a strong reason in favor of not going that way. You're right that "weird issues only happening on certain hardware" is what we sign up for with *any* of the options that enable higher architecture levels. And part of the value proposition of a Linux distribution is to catch those before they happen for the user and fix them ;) We already have that today (like software compiling itself differently depending on whether the build worker is Intel or AMD x86_64, and so on). With the "-v3 rebuild" in staging a few months ago we already found very fundamental breakages: e.g. all the code that assumes that "default" compiler flags produce a very compatible result, and that builds its own higher-optimized variants on top, was broken, because the compat plugin and the optimized plugin ended up being the same code (which we only caught because test suites broke). By going the hwcaps route, at least you have a quick way to troubleshoot: does the problem go away by deinstalling the -v3 package? In other cases it might be more difficult.
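The "default flags are maximally compatible" assumption usually enters through preprocessor or configure-time checks that silently follow whatever the build worker happens to support. A minimal sketch of that class of breakage, assuming GCC or Clang:

    #include <stdio.h>

    /* The code path chosen here depends purely on the compiler flags used on
     * the build worker: with -march=x86-64-v3 (or -march=native on an
     * AVX2-capable machine) __AVX2__ is defined, with baseline flags it is
     * not. A package that builds its own "optimized" variant this way can end
     * up with the compat and the optimized build being the same code. */
    int main(void)
    {
    #ifdef __AVX2__
        puts("compiled with the AVX2 code path");
    #else
        puts("compiled with the generic baseline code path");
    #endif
        return 0;
    }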
> > many of those tools have hardcoded or false assumptions (like $(uname -m) being the rpm architecture).
> Do you have an example?
Anything that determines a target triplet in the various spec files. Also, image creation tools often struggled with this or broke in unexpected ways (when handling grub and other things), e.g. bootloader installation/configuration code.
> > with the downsides that it requires patching of about 1000 packages (%ifarch x86_64 -> %ifarch %x86_64)
> Most of those packages wouldn't work with the hwcaps-only approach at all because they include binaries.
They work just fine as long as the bulk of the performance gain is in code in the shared library and not in the binary. I don't know if something like hwcaps works for PIE executables too - that would be kinda neat... Greetings, Dirk
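On the executable side: glibc-hwcaps only affects the shared library search path, not the main binary itself, so the closest equivalent inside a (PIE or non-PIE) executable is compiler-driven runtime dispatch. A minimal sketch using GCC/Clang function multi-versioning via target_clones (assumes an x86-64 toolchain with glibc ifunc support; the summing function is just a placeholder example):

    #include <stdio.h>

    /* The compiler emits a baseline clone and an AVX2 clone of this function
     * plus an ifunc resolver that picks one at program start, based on the
     * CPU the binary actually runs on - roughly what glibc-hwcaps does for
     * shared libraries, but inside the executable itself. */
    __attribute__((target_clones("default", "avx2")))
    long sum(const int *v, long n)
    {
        long s = 0;
        for (long i = 0; i < n; i++)
            s += v[i];
        return s;
    }

    int main(void)
    {
        int v[1024];
        for (int i = 0; i < 1024; i++)
            v[i] = i;
        printf("sum = %ld\n", sum(v, 1024));
        return 0;
    }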