[opensuse-factory] Unicode support differs between python and python3
Hi all,

I noticed a difference in Unicode support between python and python3. The first is built with UCS4 support, the latter with UCS2 support. That means python3 behaves differently for characters above 65535 (2**16-1): with UCS2 support such a character is stored as two code units, so len() returns 2.

    $ python -c 'print(len(unichr((2**16-1))))'
    1
    $ python -c 'print(len(unichr((2**16+10))))'
    1
    $ python3 -c 'print(len(chr((2**16-1))))'
    1
    $ python3 -c 'print(len(chr((2**16+10))))'
    2

I am not sure if this difference is intentional or not, so I'm asking here instead of bugzilla. Maybe it is intended, because you save 50% of memory for most Python programs, and you need to port your code to python3 anyway, so it's not a big issue I'd say. See PEP 261 [1] for details. If so, this should be mentioned in README.SUSE.

I did a quick review of other distributions:

* Debian has used UCS4 since 2.3, and most probably does in python3 as well [2]
* Fedora uses UCS2 in python [3] and UCS4 (I assume --with-wide-unicode is UCS4) in python3 [4]

What's your opinion here?

[1] http://www.python.org/dev/peps/pep-0261/
[2] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=171062
[3] http://pkgs.fedoraproject.org/gitweb/?p=python.git;a=blob_plain;f=python.spe...
[4] http://pkgs.fedoraproject.org/gitweb/?p=python3.git;a=blob_plain;f=python3.s...

Regards
Michal Vyskocil
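For reference, the length of 2 reported by the UCS2 build follows from how UTF-16 represents code points above 0xFFFF as a surrogate pair. The following is a minimal sketch of that arithmetic, assuming a narrow build stores strings as UTF-16 code units. The helper names utf16_code_units and is_narrow_build are made up for illustration; sys.maxunicode is the standard attribute that distinguishes the two builds.

```python
import sys


def utf16_code_units(code_point):
    """Return the UTF-16 code units for one code point (hypothetical helper)."""
    if code_point < 0x10000:
        # Fits in a single 16-bit unit on both narrow and wide builds.
        return [code_point]
    # Above the Basic Multilingual Plane: split into a surrogate pair.
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits
    return [high, low]


def is_narrow_build():
    """True on a UCS2 (narrow) build, where sys.maxunicode is 0xFFFF.

    On a UCS4 (wide) build it is 0x10FFFF. On Python 3.3 and later the
    narrow/wide distinction no longer exists, so this is always False there.
    """
    return sys.maxunicode == 0xFFFF


print(len(utf16_code_units(2**16 - 1)))   # 1
print(len(utf16_code_units(2**16 + 10)))  # 2
print(is_narrow_build())
```

The two lengths match what the python3 commands above report on a UCS2 build, which is why the same code can give different len() results depending on how the interpreter was configured.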
Hi,

On 5.5.2011 08:55, Michal Vyskocil wrote:
> I am not sure if this difference is intentional or not, so I'm asking here instead of bugzilla. Maybe this is intended, because you save 50% of memory for most of python programs and you need to port your code to python3, so it's not a big issue I'd say - see PEP 261 [1] for details. If so, then this should be mentioned in README.SUSE.
>
> I did a quick review of other distributions:
> * Debian uses UCS4 from 2.3 and most probably in python3 as well - [2]
> * Fedora uses UCS2 in python [3] and UCS4 (I assume --with-wide-unicode is UCS4) in python3 [4]
AFAICT, Fedora did use UCS4 in python 2.x too.

I originally wanted to use UCS2, because of the memory savings and also because it's upstream's default. But I think that cross-distro compatibility is more important, so I'm switching to UCS4.

Anyone who objects, please speak now or forever remain silent ;)

regards
m.
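To put a rough number on the memory savings mentioned in this thread, the sketch below compares the size of the character data alone under 2-byte and 4-byte storage. It is only an approximation: it uses the utf-16-le and utf-32-le codecs as stand-ins for the narrow and wide internal buffers, ignores per-object overhead, and the helper name buffer_sizes is hypothetical.

```python
def buffer_sizes(text):
    """Return (narrow_bytes, wide_bytes) for the character data of text.

    Assumption: utf-16-le models a UCS2 build's buffer and utf-32-le a
    UCS4 build's buffer. Object headers and padding are not counted.
    """
    narrow = len(text.encode("utf-16-le"))
    wide = len(text.encode("utf-32-le"))
    return narrow, wide


print(buffer_sizes("hello"))        # (10, 20)
print(buffer_sizes(chr(0x1000A)))   # (4, 4)
```

For text that stays within the first 65536 code points the narrow form is exactly half the size; the advantage disappears for characters that need a surrogate pair.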
participants (2)
- Jan Matějek
- Michal Vyskocil