Re: [opensuse-packaging] RFC: changing RPM's default scriptlet locale

27 Oct 2017

On Friday, 27 October 2017 17:41:23 CEST Jan Engelhardt wrote:
> ...
> On Friday 2017-10-27 16:13, jan matejek wrote:
>> ...
>> More and more packages need their locale to be set to something more
>> sensible than C. This hit me while switching packages over to Python 3.
>> Python gets its encoding from the locale, so by default it won't
>> decode UTF-8 unless the appropriate encoding is set.
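A minimal Python sketch of the failure mode described above, assuming Python 3.6-era behaviour, where open() without an explicit encoding= falls back to locale.getpreferredencoding(False). The helper decode_or_report() and the sample string are made up for illustration. The point is only that a strict ASCII decode stops at the first UTF-8 multibyte sequence, while naming the encoding explicitly works.

```python
import locale

# Codec that open() falls back to when no encoding= is given. Under LANG=C,
# Python 3.6 reports "ANSI_X3.4-1968" (ASCII) here; Python 3.7+ switches to
# UTF-8 under the C locale (PEP 538 locale coercion, PEP 540 UTF-8 mode),
# so the printed value depends on the interpreter version and environment.
print("locale default:", locale.getpreferredencoding(False))


def decode_or_report(data: bytes, encoding: str) -> str:
    """Strictly decode data; return the text, or a description of the failure."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return f"cannot decode as {encoding}: {exc.reason} at byte {exc.start}"


utf8_bytes = "Grüße\n".encode("utf-8")
print(decode_or_report(utf8_bytes, "ascii"))  # what an ASCII locale default hits
print(decode_or_report(utf8_bytes, "utf-8"))  # an explicit encoding succeeds
```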
> I am not convinced that changing the RPM default helps. FWIW, a large
> portion of source files could be ISO-8859-1 (Windows still is a
> thing, and a popular one at that), so a UTF-8 locale on a global scale
> would not help.
> The only real fix is to use an in-file marker such that the file
> becomes self-describing, and there are plenty of examples in history
> of how to pull that off:
>  - Byte Order Mark to determine UTF-8, UTF-{16,32}{BE,LE}
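A small Python sketch of why no single global locale fits a mixed tree. The file names are hypothetical, and the contents are built in memory rather than read from disk.

```python
# Hypothetical source tree: the same text stored once as UTF-8 and once as
# ISO-8859-1 (e.g. a file last edited with a Windows tool).
files = {
    "notes-utf8.txt": "Grüße\n".encode("utf-8"),
    "notes-latin1.txt": "Grüße\n".encode("iso-8859-1"),
}


def decodes_cleanly(data: bytes, encoding: str) -> bool:
    """True if data is valid in the given encoding."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


# Try each candidate "global locale" encoding against every file.
for encoding in ("ascii", "utf-8"):
    ok = [name for name, data in files.items() if decodes_cleanly(data, encoding)]
    print(f"{encoding}: {ok}")
```

With these inputs, ascii accepts neither file and utf-8 accepts only notes-utf8.txt; the Latin-1 copy fails at its ü byte (0xFC), which is not a valid UTF-8 start byte.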
The byte order mark is ambiguous: the UTF-8 BOM is three bytes (EF BB BF),
which are valid characters in e.g. ISO-8859-1. Granted, it is unlikely, but ...
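A sketch of BOM sniffing using the BOM constants from Python's codecs module; sniff_bom() is a hypothetical helper, not an RPM or Python API. It also shows the ambiguity raised here: the three UTF-8 BOM bytes decode without error as ISO-8859-1.

```python
import codecs

# Longest BOMs first: the UTF-32-LE BOM (FF FE 00 00) begins with the
# UTF-16-LE BOM (FF FE), so checking UTF-16 first would misclassify it.
_BOMS = (
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
)


def sniff_bom(data: bytes):
    """Return (encoding, bom_length) if data starts with a BOM, else (None, 0)."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


print(sniff_bom(codecs.BOM_UTF8 + b"hello"))  # ('utf-8', 3)
# The same three bytes are also perfectly ordinary ISO-8859-1 text:
print(codecs.BOM_UTF8.decode("iso-8859-1"))   # ï»¿
```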
> ...
>  - <?xml encoding="..." ?> in XML
>  - <meta> in HTML likewise
>  - "use utf8" in Perl (or something similar to it)
>  - "# -*- coding: utf-8 -*-" in Python (PEP 263)
All of these guarantee that the content up to and including the encoding
declaration is plain ASCII, so this is completely unambiguous and sane.
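For Python sources specifically, the standard library already implements these rules: tokenize.detect_encoding() looks for a UTF-8 BOM and a PEP 263 coding cookie in the first two lines and otherwise reports utf-8, without consulting the locale. A minimal wrapper, assuming the source is already in memory as bytes (declared_encoding() is a hypothetical name):

```python
import io
import tokenize


def declared_encoding(source: bytes) -> str:
    """Return the encoding a Python source file declares for itself.

    tokenize.detect_encoding() checks for a UTF-8 BOM and a PEP 263 coding
    cookie; with neither, it reports "utf-8", the Python 3 source default.
    """
    encoding, _lines_read = tokenize.detect_encoding(io.BytesIO(source).readline)
    return encoding


print(declared_encoding(b"# -*- coding: latin-1 -*-\nname = 'x'\n"))  # iso-8859-1
print(declared_encoding(b"\xef\xbb\xbfname = 'x'\n"))                  # utf-8-sig
print(declared_encoding(b"name = 'x'\n"))                              # utf-8
```

detect_encoding() raises SyntaxError for an unknown encoding name or for a BOM that contradicts the cookie, so a bad declaration fails loudly instead of being guessed around. This only covers Python source; data files read with open() still need an explicit encoding= argument.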
> ...
> So there you have it. If Python falls over on UTF-8 files (I know
> Perl would), then those source files should say they are UTF-8. And
> those that are ISO-8859-1 should say they are ISO-8859-1.
> Keeping the locale at C would at least identify the important
> spots, as Python would stop execution.
Seconded: most UTF-8 documents are also valid when interpreted as ISO-8859-x,
so guessing is a bad idea.
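A Python sketch contrasting the two behaviours discussed in this last exchange, assuming in-memory byte strings. non_ascii_lines() is a hypothetical helper that lists the lines a strict ASCII decode would trip over, i.e. the spots that need an encoding declaration. Decoding the same UTF-8 text as ISO-8859-1, by contrast, succeeds silently and produces mojibake.

```python
from typing import List


def non_ascii_lines(data: bytes) -> List[int]:
    """1-based numbers of lines containing bytes above 0x7F.

    These are the spots where a strict ASCII decode (the C-locale default
    in Python 3.6) stops, i.e. where an explicit encoding is needed.
    """
    return [n for n, line in enumerate(data.splitlines(), start=1)
            if any(byte > 0x7F for byte in line)]


utf8_bytes = "name = 'Grüße'\n".encode("utf-8")
print(non_ascii_lines(b"x = 1\n" + utf8_bytes))  # [2]

# Guessing ISO-8859-1 never fails: it maps all 256 byte values to characters,
# so UTF-8 input turns into mojibake instead of raising an error.
print(repr("Grüße".encode("utf-8").decode("iso-8859-1")))  # 'GrÃ¼Ã\x9fe'
```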

Kind regards,

Stefan


Brüns, Stefan