Xerces XMLCh UTF-16 Linux vs Reality
On the Xerces C++ mailing list I was told the following:
"XMLCh is specifically UTF-16. But wchar_t is not UTF-16 everywhere. It's
non-portable, and Solaris and Linux each use different (including from each
other) encodings for wchar_t.
There's actually a gcc option to make wchar_t the same as on Windows, it was
created for Wine. I'm not brave enough to rely on it."
The Qt documentation tells me:
<quote url=http://doc.trolltech.com/4.1/qstring.html>
The QString class provides a Unicode character string.
QString stores a string of 16-bit QChars, where each QChar stores one Unicode
4.0 character. Unicode is an international standard that supports most of the
writing systems in use today. It is a superset of ASCII and Latin-1 (ISO
8859-1), and all the ASCII/Latin-1 characters are available at the same code
positions.
Behind the scenes, QString uses implicit sharing (copy-on-write) to reduce
memory usage and to avoid the needless copying of data. This also helps
reduce the inherent overhead of storing 16-bit characters instead of 8-bit
characters.
</quote>
So, can I do this with confidence?
const XMLCh* QtoX(const QString& s) {
return reinterpret_cast
On Wednesday 08 March 2006 01:38, Steven T. Hatton wrote:
On the Xerces C++ mailing list I was told the following:
"XMLCh is specifically UTF-16. But wchar_t is not UTF-16 everywhere. It's non-portable, and Solaris and Linux each use different (including from each other) encodings for wchar_t."
Can anybody comment on the truth and/or significance of that statement? My understanding of the C++ Standard is that an implementation is required to support the UCS-2 character set (the fixed-width 16-bit encoding of the Basic Multilingual Plane, which UTF-16 extends with surrogate pairs), but may use a different underlying encoding. The implementation should, however, behave 'as if' it were using UCS-2 internally.

Perhaps I should press for clarification of what was meant by "It's non-portable, and Solaris and Linux each use different (including from each other) encodings for wchar_t."

I have no doubt that the C++ Standard does not offer a clear, easy-to-follow path for using different character encodings within the same program. The support may ultimately be very powerful, but I have found it difficult to use.

Steven