I am using Unicode in a C++ program and found that wchar_t uses 4 bytes instead of 2, which would be sufficient. Probably wchar_t is defined as int on 64-bit platforms; I haven't looked it up in the header file yet. Is there any way to make wchar_t a 2-byte type like unsigned short?

Detlef
Except that raw Unicode needs more than 16 bits per character: code points run up to U+10FFFF, which takes 21 bits, so three bytes are needed, and the 4-byte UTF-32 format makes a lot of sense. With the 2-byte UTF-16 format you would have to read every character to find the character count or a character's position in a Unicode string, since some characters (those outside the Basic Multilingual Plane) would still take 4 bytes. We are having the same discussion about Unicode on a number of other lists :)

-- 
Lester Caine
-----------------------------
L.S.Caine Electronic Services
Do you mean surrogate pairs in UTF-16? I have never seen them in practice, and I wonder whether the wide-character functions of the C libraries handle them correctly when wchar_t is 32 bits.
Detlef Grittner
> I am using Unicode in a C++ program and found that wchar_t uses 4 bytes instead of 2, which would be sufficient. Probably wchar_t is defined as int on 64-bit platforms; I haven't looked it up in the header file yet.
glibc defines wchar_t as 4 bytes on all platforms.
> Is there any way to make wchar_t a 2-byte type like unsigned short?
No.

Andreas
-- 
Andreas Jaeger, aj@suse.de, http://www.suse.de/~aj
SUSE Linux Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GPG fingerprint = 93A3 365E CE47 B889 DF7F FED1 389A 563C C272 A126