Hi,

On Thursday, 6 May 2021, 16:57:17 CEST, Jimmy Berry wrote:
> On Wednesday, May 5, 2021 3:00:06 PM CDT Thorsten Kukuk wrote:
> > > - Is the bug against TW since it updated a package such that the container would be incompatible with other hosts such as Leap and thus break the fundamental feature of a TW container.
> >
> > No, the change in TW was correct.
>
> Alright, so TW containers are not for serious workloads since incredibly common functionality can be "correctly" broken for months. Understood, thanks.
The "correctness" here is not about nitpicking interpretations of some standard, it's literally this: faccessat2(AT_FDCWD, "/", R_OK, 0) returns -EPERM for everything! That applies to *all* syscalls introduced after a certain point in the past and thus only gets worse in the future. For non-coders: It's like running on a CPU with some pins cut off.
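To make that concrete, here is a minimal sketch of how one could probe for this from inside a container. It assumes Linux, and it assumes 439 as the syscall number for faccessat2 (true on the common architectures as far as I know, but check your headers). The names `classify` and `probe_faccessat2` are made up for illustration. The distinction that matters: as far as I understand it, glibc falls back to the older syscall when the kernel answers ENOSYS, but it has no reason to do so on EPERM, so a runtime answering EPERM for a syscall it does not know about looks like a genuine permission failure.

```python
import ctypes
import errno
import os

# Assumption: faccessat2 has syscall number 439 on common Linux architectures.
SYS_FACCESSAT2 = 439
# AT_FDCWD as defined in <fcntl.h> on Linux.
AT_FDCWD = -100


def classify(ret, err):
    """Map a raw syscall result and errno to a short verdict."""
    if ret == 0:
        return "ok"
    if err == errno.ENOSYS:
        # Kernel does not know the syscall; a caller can fall back safely.
        return "unsupported"
    if err == errno.EPERM:
        # Most likely a filter in the runtime rejected the syscall.
        return "blocked"
    return "other: " + errno.errorcode.get(err, str(err))


def probe_faccessat2(path=b"/"):
    """Call faccessat2 directly, bypassing the libc wrapper (Linux only)."""
    libc = ctypes.CDLL(None, use_errno=True)
    ctypes.set_errno(0)
    ret = libc.syscall(SYS_FACCESSAT2, AT_FDCWD, path, os.R_OK, 0)
    return classify(ret, ctypes.get_errno())


if __name__ == "__main__":
    print(probe_faccessat2())
```

On a host with a new enough kernel and a fixed runtime I would expect "ok", on an old kernel "unsupported", and in an affected container "blocked".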
> TW blocks for all sorts of ring failures caused by upgrades, many times "correct" to one package that exposes a bug in another. The whole point of QA and our promise to users is that openSUSE does not release a borked TW intentionally. Apparently this does not apply to the containers and should be stated somewhere.
It does. openQA currently fully tests Tumbleweed containers only on Tumbleweed, and that always worked. Leap and SLE also perform some tests with the Tumbleweed container, but those didn't actually hit this particular bug as they don't go deep enough. I filed a ticket about testing TW containers on other distros a while ago:
https://progress.opensuse.org/issues/88822

However, this was caught pretty soon after the release, so even if we had been aware of the breakage before publishing the affected snapshot, it would still have required a fix:

a) If caught before publishing, revert the breaking change?
This was caused by a glibc version update, which is rather painful to revert.

b) Apply a workaround to glibc?
The maintainer was *vehemently* opposed to that option, so that was the end of that. AFAICT a temporary workaround at least for x86_64 would've been feasible, especially considering the severity and affected platforms.

c) Block Tumbleweed publishing until the runtime is fixed?
While this is potentially an option for Leap and SLE because we can handle it ourselves (or at least influence it in a meaningful way), this is not really the case for third-party platforms like GitHub.

I would also blame the container runtime and providers there, because the TW container did everything correctly. The nature of the buggy behaviour in the runtime also made it rather tricky to address, because it was actually a whole set of syscalls which were broken in totally unexpected ways.

Unfortunately it took quite some time to get this fixed upstream (but before this landed in TW) and even longer to actually have it deployed in third-party environments. Other distributions using glibc 2.33 versions were affected in the exact same way.

Cheers,
Fabian