RFC: changing default build environment to use only glibc-locale-base

Hi all,
Some time ago I noticed that Fedora builds all of its packages quite successfully with only a very minimal set of locales installed. In Fedora, when the occasional package test suite requires a specific locale, the package just adds "BuildRequires: glibc-langpack-$lang".
openSUSE currently doesn't have this granularity. We have glibc-locale-base, which covers the C and en_US locales, and we have glibc-locale, which covers everything else.
The vast majority of packages build and test fine with the default C.utf8/en_US.utf8 locales. By switching to glibc-locale-base, we save about 227 MB of data to pull and install on every build. That's about 2 TB less data written for a full distro build. If you're behind crappy internet, like a German citizen would be, then you'll love the speedup.
Any comments, concerns? Yes, I'm aware a few packages will fail to build. I've done a full rebuild of :Ring:0-Bootstrap/1-Minimalx and had about 10 build failures, and I have already submitted fixes for all of them.
I submitted this to the rpm maintainer for review, and he's fine with it if the openSUSE Tumbleweed distro team accepts the challenge. I'm happy to help with the fallout. The simplest fix is an additional "BuildRequires: glibc-locale". In some other cases, it is enough to change the exported locale to one that glibc-locale-base provides.
https://build.opensuse.org/request/show/949701
Comments, feedback, objections?
Thanks,
Dirk
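
The two figures in the proposal (about 227 MB saved per build, about 2 TB saved per full distro build) can be related with a quick back-of-the-envelope calculation. This is only a sketch: decimal units are an assumption, and the function name is made up for illustration.

```python
# Figures quoted in the proposal; decimal units (1 TB = 1,000,000 MB) are an assumption.
SAVED_PER_BUILD_MB = 227
SAVED_PER_FULL_REBUILD_TB = 2

MB_PER_TB = 1_000_000


def implied_build_jobs(saved_tb: float, saved_per_build_mb: float) -> float:
    """Number of build jobs that would add up to the quoted total saving."""
    return saved_tb * MB_PER_TB / saved_per_build_mb


if __name__ == "__main__":
    jobs = implied_build_jobs(SAVED_PER_FULL_REBUILD_TB, SAVED_PER_BUILD_MB)
    print(f"about {jobs:,.0f} build jobs")
```

With the quoted numbers this comes out at somewhat under nine thousand build jobs per full rebuild, which is the scale at which the per-build saving adds up to terabytes.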

Hi Dirk,
-----Original Message-----
From: Dirk Müller <dirk@dmllr.de>
Sent: 28 January 2022 15:43
To: factory@lists.opensuse.org
Subject: RFC: changing default build environment to use only glibc-locale-base
> Hi all,
> Some time ago I noticed that Fedora builds all of its packages quite successfully with only a very minimal set of locales installed. In Fedora, when the occasional package test suite requires a specific locale, the package just adds "BuildRequires: glibc-langpack-$lang".
> openSUSE currently doesn't have this granularity. We have glibc-locale-base, which covers the C and en_US locales, and we have glibc-locale, which covers everything else.
> The vast majority of packages build and test fine with the default C.utf8/en_US.utf8 locales. By switching to glibc-locale-base, we save about 227 MB of data to pull and install on every build. That's about 2 TB less data written for a full distro build. If you're behind crappy internet, like a German citizen would be, then you'll love the speedup.
> Any comments, concerns? Yes, I'm aware a few packages will fail to build. I've done a full rebuild of :Ring:0-Bootstrap/1-Minimalx and had about 10 build failures, and I have already submitted fixes for all of them.
> I submitted this to the rpm maintainer for review, and he's fine with it if the openSUSE Tumbleweed distro team accepts the challenge. I'm happy to help with the fallout. The simplest fix is an additional "BuildRequires: glibc-locale". In some other cases, it is enough to change the exported locale to one that glibc-locale-base provides.
> https://build.opensuse.org/request/show/949701
> Comments, feedback, objections?
That sounds good! It will likely help to speed up builds.
Thanks,
Guillaume
> Thanks,
> Dirk

On Fri, Jan 28, 2022 at 11:43 AM Dirk Müller <dirk@dmllr.de> wrote:
> if you're behind crappy internet, like a German citizen would be, then you'll love the speedup.
I am behind fast internet, but osc seems to talk to snail mirrors, so this change actually helps me too.
> Any comments, concerns? Yes, I'm aware a few packages will fail to build. I've done a full rebuild of :Ring:0-Bootstrap/1-Minimalx and had about 10 build failures, and I have already submitted fixes for all of them.
Huh.. shouldn't the locale be C on every build step?

On Fri, Jan 28, 2022 at 4:37 PM Cristian Rodríguez <cristian@rodriguez.im> wrote:
> On Fri, Jan 28, 2022 at 11:43 AM Dirk Müller <dirk@dmllr.de> wrote:
>> if you're behind crappy internet, like a German citizen would be, then you'll love the speedup.
> I am behind fast internet, but osc seems to talk to snail mirrors, so this change actually helps me too.
>> Any comments, concerns? Yes, I'm aware a few packages will fail to build. I've done a full rebuild of :Ring:0-Bootstrap/1-Minimalx and had about 10 build failures, and I have already submitted fixes for all of them.
> Huh.. shouldn't the locale be C on every build step?
OBS forces a weird "POSIX" locale right now, rather than C.UTF-8. That causes a bunch of problems in itself, too. :(
--
真実はいつも一つ!/ Always, there's only one truth!

On Friday 2022-01-28 22:37, Cristian Rodríguez wrote:
> On Fri, Jan 28, 2022 at 11:43 AM Dirk Müller <dirk@dmllr.de> wrote:
>> if you're behind crappy internet, like a German citizen would be, then you'll love the speedup.
> I am behind fast internet, but osc seems to talk to snail mirrors, so this change actually helps me too.
It might be more of a concern for bs_worker-type workers than it is for an osc-type worker, since osc certainly keeps a cache in /var/tmp/osbuild-packagecache.
> Huh.. shouldn't the locale be C on every build step?
Should, yes. But you need only grep for LC_CT across factory and find...
httpie.spec:export LC_CTYPE=en_US.UTF-8
libmodulemd.spec:export LC_CTYPE=C.utf8
log4net.spec:export LC_CTYPE=en_US.UTF-8
procps.spec:unset LC_CTYPE
python-eliot.spec:export LC_CTYPE=en_US.UTF-8
python-Flask-Gravatar.spec:export LC_CTYPE=en_US@UTF-8
python-Flask-Migrate.spec:export LC_CTYPE=en_US.UTF-8
python-nose2.spec:export LC_CTYPE=C.UTF8
readline6.spec: unset LC_CTYPE
readline.spec:unset LC_CTYPE
ruby2.7.spec:export LC_CTYPE="en_US.UTF-8"
ruby3.0.spec:export LC_CTYPE="en_US.UTF-8"
texlive.spec: echo LC_CTYPE=en_US.UTF-8
texlive.spec: echo export LANG LC_CTYPE
words.spec: american*) LC_CTYPE=en_US.UTF-8 ;;
words.spec: british*) LC_CTYPE=en_GB.UTF-8 ;;
words.spec: canadian*) LC_CTYPE=en_CA.UTF-8 ;;
This can only be described by a medium not carryable by plaintext email, https://en.meming.world/wiki/Cat_Standing_in_the_Snow
Yes, especially the "@UTF-8" one.
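
The variety in that grep output can be tallied mechanically. The following is a minimal sketch that works on a handful of lines copied from the list above. The regular expression and the function name are illustrative assumptions, not something used in Factory.

```python
import re
from collections import Counter

# A few lines copied from the grep output quoted in this thread.
GREP_LINES = """\
httpie.spec:export LC_CTYPE=en_US.UTF-8
libmodulemd.spec:export LC_CTYPE=C.utf8
python-Flask-Gravatar.spec:export LC_CTYPE=en_US@UTF-8
python-nose2.spec:export LC_CTYPE=C.UTF8
procps.spec:unset LC_CTYPE
ruby3.0.spec:export LC_CTYPE="en_US.UTF-8"
""".splitlines()

# Matches LC_CTYPE=<value>, with or without surrounding double quotes.
ASSIGNMENT = re.compile(r'LC_CTYPE="?([^\s";]+)')


def spellings(lines):
    """Count each distinct value assigned to LC_CTYPE; 'unset' lines are skipped."""
    counts = Counter()
    for line in lines:
        match = ASSIGNMENT.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    for value, count in spellings(GREP_LINES).most_common():
        print(f"{count:2d}  {value}")
```

Even this small sample shows several different spellings for what is meant to be a UTF-8 locale, plus the "@UTF-8" variant.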

On Fri, Jan 28, 2022 at 6:43 PM Jan Engelhardt <jengelh@inai.de> wrote:
> On Friday 2022-01-28 22:37, Cristian Rodríguez wrote:
>> On Fri, Jan 28, 2022 at 11:43 AM Dirk Müller <dirk@dmllr.de> wrote:
>>> if you're behind crappy internet, like a German citizen would be, then you'll love the speedup.
>> I am behind fast internet, but osc seems to talk to snail mirrors, so this change actually helps me too.
> It might be more of a concern for bs_worker-type workers than it is for an osc-type worker, since osc certainly keeps a cache in /var/tmp/osbuild-packagecache.
>> Huh.. shouldn't the locale be C on every build step?
> Should, yes. But you need only grep for LC_CT across factory and find...
> httpie.spec:export LC_CTYPE=en_US.UTF-8
> libmodulemd.spec:export LC_CTYPE=C.utf8
> log4net.spec:export LC_CTYPE=en_US.UTF-8
> procps.spec:unset LC_CTYPE
> python-eliot.spec:export LC_CTYPE=en_US.UTF-8
> python-Flask-Gravatar.spec:export LC_CTYPE=en_US@UTF-8
> python-Flask-Migrate.spec:export LC_CTYPE=en_US.UTF-8
> python-nose2.spec:export LC_CTYPE=C.UTF8
> readline6.spec: unset LC_CTYPE
> readline.spec:unset LC_CTYPE
> ruby2.7.spec:export LC_CTYPE="en_US.UTF-8"
> ruby3.0.spec:export LC_CTYPE="en_US.UTF-8"
> texlive.spec: echo LC_CTYPE=en_US.UTF-8
> texlive.spec: echo export LANG LC_CTYPE
> words.spec: american*) LC_CTYPE=en_US.UTF-8 ;;
> words.spec: british*) LC_CTYPE=en_GB.UTF-8 ;;
> words.spec: canadian*) LC_CTYPE=en_CA.UTF-8 ;;
> This can only be described by a medium not carryable by plaintext email, https://en.meming.world/wiki/Cat_Standing_in_the_Snow
> Yes, especially the "@UTF-8" one.
It's because OBS does not set the environment to "C.UTF-8", but "POSIX" instead (which is an alias of the "C" locale, and an alias that not every application understands). That breaks things sometimes, so it needs to be explicitly set.
--
真実はいつも一つ!/ Always, there's only one truth!

On Fri, Jan 28, 2022 at 9:09 PM Neal Gompa <ngompa13@gmail.com> wrote:
> It's because OBS does not set the environment to "C.UTF-8", but "POSIX" instead (which is an alias of the "C" locale, and an alias that not every application understands).
So, which applications are these? I ask because I have a pyre to light.. :-) Understanding "C" or "POSIX" is mandatory...
> That breaks things sometimes, so it needs to be explicitly set.
Looks like something that really needs fixing..

On Sat, Jan 29, 2022 at 10:27 AM Cristian Rodríguez <cristian@rodriguez.im> wrote:
> On Fri, Jan 28, 2022 at 9:09 PM Neal Gompa <ngompa13@gmail.com> wrote:
>> It's because OBS does not set the environment to "C.UTF-8", but "POSIX" instead (which is an alias of the "C" locale, and an alias that not every application understands).
> So, which applications are these? I ask because I have a pyre to light.. :-) Understanding "C" or "POSIX" is mandatory...
>> That breaks things sometimes, so it needs to be explicitly set.
> Looks like something that really needs fixing..
No. The C/POSIX locales also disable Unicode support, which can break some applications. That's why Fedora had a patch to force Python to coerce C to C.UTF-8 before it was finally upstreamed in Python 3.7. The C locale is woefully bad and broken for modern applications, but because of POSIX, we can't change the existing C locale to be Unicode enabled, so we now have C.UTF-8.
--
真実はいつも一つ!/ Always, there's only one truth!
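
The kind of breakage described here can be illustrated without changing the process locale. The sketch below assumes that the plain C/POSIX locale implies an ASCII character set (as it does on glibc) and uses Python's "ascii" codec as a stand-in for it. The helper name is made up for illustration.

```python
def can_encode(text: str, encoding: str) -> bool:
    """Return True if text can be represented in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    name = "Dirk Müller"
    # "ascii" stands in for the character set of the plain C/POSIX locale (assumption).
    print("ascii:", can_encode(name, "ascii"))
    # "utf-8" stands in for C.UTF-8 or en_US.UTF-8.
    print("utf-8:", can_encode(name, "utf-8"))
```

Any tool that writes such a string through an ASCII-only output stream hits the same error, which is why test suites that print non-ASCII text tend to export a UTF-8 locale explicitly.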

Hi all,
On Sat, Jan 29, 2022 at 4:44 PM Neal Gompa <ngompa13@gmail.com> wrote:
> No. The C/POSIX locales also disable Unicode support, which can break some applications. That's why Fedora had a patch to force Python to coerce C to C.UTF-8 before it was finally upstreamed in Python 3.7.
This is not relevant to the question of whether glibc-locale-base should be used instead of glibc-locale. I'm not changing any locale choices at this time.
Greetings,
Dirk

On Sat, Jan 29, 2022 at 12:26 PM Dirk Müller <dirk@dmllr.de> wrote:
> Hi all,
> On Sat, Jan 29, 2022 at 4:44 PM Neal Gompa <ngompa13@gmail.com> wrote:
>> No. The C/POSIX locales also disable Unicode support, which can break some applications. That's why Fedora had a patch to force Python to coerce C to C.UTF-8 before it was finally upstreamed in Python 3.7.
> This is not relevant to the question of whether glibc-locale-base should be used instead of glibc-locale. I'm not changing any locale choices at this time.
Well, it is. In Fedora, we were able to move to glibc-minimal-langpack in our buildroot because we did work to coerce the default locale in some stacks to C.UTF-8, which eliminated the need for "regular" locales for most package builds. It started with Python, and the work extended from there.
https://fedoraproject.org/wiki/Changes/python3_c.utf-8_locale
--
真実はいつも一つ!/ Always, there's only one truth!

Hello,
On Saturday, 29 January 2022, 00:42:52 CET, Jan Engelhardt wrote:
> On Friday 2022-01-28 22:37, Cristian Rodríguez wrote:
>> Huh.. shouldn't the locale be C on every build step?
> Should, yes. But you need only grep for LC_CT across factory and find...
> httpie.spec:export LC_CTYPE=en_US.UTF-8
> libmodulemd.spec:export LC_CTYPE=C.utf8
> log4net.spec:export LC_CTYPE=en_US.UTF-8
> [...]
Most of your grep results are en_US.utf8 or C.utf8, and the good thing is that /usr/lib/locale/en_US.utf8 and /usr/lib/locale/C.utf8 are part of glibc-locale-base. So even with glibc-locale-base, these packages shouldn't have any build issues. The only exception from your list is
> words.spec: american*) LC_CTYPE=en_US.UTF-8 ;;
> words.spec: british*) LC_CTYPE=en_GB.UTF-8 ;;
> words.spec: canadian*) LC_CTYPE=en_CA.UTF-8 ;;
and since there are words-british and words-canadian subpackages, I'm not too surprised that it uses en_GB and en_CA. (However, I'm surprised that this is needed in the spec.)
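
The check described above can be sketched in a few lines. Two things here are assumptions: the set of locales shipped by glibc-locale-base is taken only from what this thread states (C/POSIX, C.utf8 and en_US.utf8), and the codeset normalization is a simplified version of what glibc does (lowercase, non-alphanumerics dropped). The function names are illustrative.

```python
import re

# Locales assumed to be available with glibc-locale-base, per this thread.
BASE_LOCALES = {"C", "POSIX", "C.utf8", "en_US.utf8"}


def normalize(name: str) -> str:
    """Normalize the codeset part, e.g. 'en_US.UTF-8' -> 'en_US.utf8' (simplified)."""
    base, dot, codeset = name.partition(".")
    if not dot:
        return base
    return base + "." + re.sub(r"[^a-z0-9]", "", codeset.lower())


def covered_by_base(name: str) -> bool:
    """True if the requested locale is in the assumed glibc-locale-base set."""
    return normalize(name.strip('"')) in BASE_LOCALES


if __name__ == "__main__":
    for value in ["en_US.UTF-8", "C.utf8", "C.UTF8", "en_US@UTF-8",
                  "en_GB.UTF-8", "en_CA.UTF-8"]:
        print(f"{value:14s} {'ok' if covered_by_base(value) else 'needs glibc-locale'}")
```

Under these assumptions only the en_GB and en_CA values (and the malformed "@UTF-8" one) fall outside the base set, which matches the reading of the grep list given here.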
> This can only be described by a medium not carryable by plaintext email, https://en.meming.world/wiki/Cat_Standing_in_the_Snow
I think my (random!) signature can easily beat this picture ;-)
> Yes, especially the "@UTF-8" one.
Just curious - what's the difference between ".UTF-8" and "@UTF-8"? Or is "@UTF-8" just wrong and broken? (Needless to say: There's no "*@UTF-8" in /usr/lib/locale/ on my system, with the full glibc-locale installed.)
Regards,
Christian Boltz
--
AAAAAAAAAHHHHHHHHHHHHHH *inDieTastaturBeiß* ^^^^^^^^^^^^^ Mahlzeit! [> Michael Born und Kara in opensuse-de]
(For those not speaking German: "inDieTastaturBeiß" means "biting the keyboard", and "Mahlzeit" translates to "enjoy your meal".)

On Sat, Jan 29, 2022 at 3:25 PM Christian Boltz <opensuse@cboltz.de> wrote:
> Just curious - what's the difference between ".UTF-8" and "@UTF-8"? Or is "@UTF-8" just wrong and broken? (Needless to say: There's no "*@UTF-8" in /usr/lib/locale/ on my system, with the full glibc-locale installed.)
It is not a valid locale name... See "locale -a | grep @" for the locales where the @ symbol is valid.
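
The difference comes from the conventional locale name layout, language[_territory][.codeset][@modifier]: the "." introduces the codeset, while the "@" introduces a modifier. The parser below is a simplified sketch of that layout. The regular expression is an approximation written for illustration, not glibc's own parsing code.

```python
import re
from typing import Optional

# Simplified pattern for language[_territory][.codeset][@modifier].
LOCALE_NAME = re.compile(
    r"^(?P<language>[A-Za-z]+)"
    r"(?:_(?P<territory>[A-Za-z0-9]+))?"
    r"(?:\.(?P<codeset>[A-Za-z0-9-]+))?"
    r"(?:@(?P<modifier>[A-Za-z0-9-]+))?$"
)


def parse_locale_name(name: str) -> Optional[dict]:
    """Split a locale name into its parts, or return None if it does not fit the layout."""
    match = LOCALE_NAME.match(name)
    return match.groupdict() if match else None


if __name__ == "__main__":
    for name in ["en_US.UTF-8", "en_US@UTF-8", "ca_ES@valencia"]:
        print(name, "->", parse_locale_name(name))
```

Read this way, "en_US@UTF-8" asks for an en_US locale with a modifier called "UTF-8" and no codeset at all, so the "UTF-8" part is not treated as a character set.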

Hi Dirk,
On Fri, 2022-01-28 at 15:43 +0100, Dirk Müller wrote:
> Hi all,
> Some time ago I noticed that Fedora builds all of its packages quite successfully with only a very minimal set of locales installed. In Fedora, when the occasional package test suite requires a specific locale, the package just adds "BuildRequires: glibc-langpack-$lang".
> openSUSE currently doesn't have this granularity. We have glibc-locale-base, which covers the C and en_US locales, and we have glibc-locale, which covers everything else.
> The vast majority of packages build and test fine with the default C.utf8/en_US.utf8 locales. By switching to glibc-locale-base, we save about 227 MB of data to pull and install on every build. That's about 2 TB less data written for a full distro build. If you're behind crappy internet, like a German citizen would be, then you'll love the speedup.
> Any comments, concerns? Yes, I'm aware a few packages will fail to build. I've done a full rebuild of :Ring:0-Bootstrap/1-Minimalx and had about 10 build failures, and I have already submitted fixes for all of them.
Thank you for working on this. A leaner build env is a welcome change in almost every situation. As you're also busy with the eventual fallout, I am not worried at all about getting this settled rather quickly.
Cheers,
Dominique
participants (7)
- Christian Boltz
- Cristian Rodríguez
- Dirk Müller
- Dominique Leuenberger / DimStar
- Guillaume Gardet
- Jan Engelhardt
- Neal Gompa