On 27.10.2017 17:41, Jan Engelhardt wrote:
> I am not convinced that changing the rpm default helps. FWIW, a large portion of source files could be ISO-8859-1 (Windows still is a thing, and a popular one at that) so that LC=UTF-8 on a global scale would not help.
Source files are not the issue here, though. (FWIW, Python 3 compatible source files tend to be modern enough to either a) be UTF-8 or b) have the PEP 263 encoding header.) The issue is that Python 3 needs to know the encoding of *all* external inputs (e.g., documentation files to be parsed by a doc generator) because it stores strings as Unicode internally. So you'd need a BOM on *every* file potentially touched by Python, ever, and you'd also have to somehow mark the encoding of stdin, stdout and such. (So maybe an environment variable? ;) )
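
To illustrate the failure mode, here is a minimal sketch. It assumes the build environment makes Python fall back to ASCII for external input (as a C/POSIX locale does on the Python 3 versions current as of this writing); the sample bytes and the helper name are made up for the example.

```python
# Bytes as they might sit on disk in a UTF-8 encoded documentation file.
# (Hypothetical sample content; "\u00e9" is a non-ASCII character.)
raw = "caf\u00e9".encode("utf-8")


def decode_external(data: bytes, encoding: str) -> str:
    """Decode external input with the encoding Python was told to assume."""
    return data.decode(encoding)


# Assumed encoding is ASCII: Python cannot build a Unicode string and fails.
try:
    decode_external(raw, "ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# Assumed encoding is UTF-8: the text round-trips correctly.
utf8_text = decode_external(raw, "utf-8")

# Assumed encoding is ISO-8859-1: no error, but the result is silently wrong.
latin1_text = decode_external(raw, "iso-8859-1")
```

Note that the third case is arguably the worst one: every byte sequence is valid ISO-8859-1, so a wrong guess there corrupts the text without any error.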
> So there you have it. If Python falls over on UTF-8 files (I know Perl would), then those source files should say they are UTF-8. And those that are ISO-8859-1 should say they are iso-8859-1.
Alternatively, we could say that UTF-8 is the distro default and that only non-default files must be marked, which is what I'm proposing.
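
For what it's worth, Python source files already follow this "UTF-8 unless marked" rule via PEP 263. A small sketch using the standard library's tokenize.detect_encoding() shows it; the two sample inputs and the helper name are invented for the example, and this only covers Python source, not arbitrary input files.

```python
import io
import tokenize


def source_encoding(data: bytes) -> str:
    """Return the encoding Python would use for this source file."""
    # detect_encoding() looks for a BOM or a PEP 263 coding cookie in the
    # first two lines and falls back to UTF-8 when neither is present.
    encoding, _ = tokenize.detect_encoding(io.BytesIO(data).readline)
    return encoding


# Hypothetical inputs: one unmarked file, one explicitly marked as Latin-1.
unmarked = b"print('hello')\n"
marked = b"# -*- coding: iso-8859-1 -*-\nprint('hello')\n"

unmarked_encoding = source_encoding(unmarked)  # the default applies
marked_encoding = source_encoding(marked)      # the marker wins
```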