Xerces XMLCh UTF-16 Linux vs Reality
On the Xerces C++ mailing list I was told the following:

"XMLCh is specifically UTF-16. But wchar_t is not UTF-16 everywhere. It's non-portable, and Solaris and Linux each use different (including from each other) encodings for wchar_t. There's actually a gcc option to make wchar_t the same as on Windows, it was created for Wine. I'm not brave enough to rely on it."

The Qt documentation tells me:

<quote url=http://doc.trolltech.com/4.1/qstring.html>
The QString class provides a Unicode character string.

QString stores a string of 16-bit QChars, where each QChar stores one Unicode 4.0 character. Unicode is an international standard that supports most of the writing systems in use today. It is a superset of ASCII and Latin-1 (ISO 8859-1), and all the ASCII/Latin-1 characters are available at the same code positions.

Behind the scenes, QString uses implicit sharing (copy-on-write) to reduce memory usage and to avoid the needless copying of data. This also helps reduce the inherent overhead of storing 16-bit characters instead of 8-bit characters.
</quote>

So, can I do this with confidence?

    const XMLCh* QtoX(const QString& s) {
      return reinterpret_cast<const XMLCh*>(s.constData());
    }

Q:Is this thing loaded?
A:I don't know; pull the trigger and find out.
Q:Where should I point it?

Steven
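For comparison, here is a minimal sketch of a copy-based alternative to the reinterpret_cast above. It uses only the standard library, so the names are stand-ins rather than real Qt or Xerces types: `XmlUnit` is a hypothetical substitute for XMLCh (assumed here to be a 16-bit unsigned integer, which depends on the platform and build), and `std::u16string` plays the role of the 16-bit data held by a QString. The point is only that the size assumption is checked at compile time and the result owns its storage, so it does not dangle if the source string goes away.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for XMLCh; the real typedef depends on the
// platform and on how Xerces was configured.
using XmlUnit = std::uint16_t;

// Copy UTF-16 code units into a zero-terminated buffer of XmlUnit.
// The static_assert documents the layout assumption that a
// reinterpret_cast would otherwise rely on silently.
std::vector<XmlUnit> toXmlUnits(const std::u16string& s) {
    static_assert(sizeof(char16_t) == sizeof(XmlUnit),
                  "code unit types must have the same width");
    std::vector<XmlUnit> out(s.begin(), s.end());
    out.push_back(0);  // terminator, since XML APIs usually expect one
    return out;
}
```

The copy costs an allocation per conversion; whether that matters compared with the cast is a design choice that depends on how often strings cross the boundary.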
On Wednesday 08 March 2006 01:38, Steven T. Hatton wrote:
> On the Xerces C++ mailing list I was told the following:
>
> "XMLCh is specifically UTF-16. But wchar_t is not UTF-16 everywhere. It's non-portable, and Solaris and Linux each use different (including from each other) encodings for wchar_t.
Can anybody comment on the truth and/or significance of that statement? My understanding of the C++ Standard is that an implementation is required to support the UCS-2 (UTF-16) character set, but may use a different underlying encoding. The implementation should, however, behave 'as if' it were using UCS-2 internally.

Perhaps I should press for clarification as to what was intended by "It's non-portable, and Solaris and Linux each use different (including from each other) encodings for wchar_t."

I have no doubt that the C++ Standard does not offer a clear and easy-to-follow path for using different character encodings within the same program. The support may ultimately be very powerful, but I have found it difficult to use.

Steven
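As a rough illustration of why the width of wchar_t matters here, the sketch below converts a wide string to UTF-16 code units. It rests on an assumption that is typical but not guaranteed by the C++ Standard: a 2-byte wchar_t already holds UTF-16 code units (as on Windows), while a wider wchar_t holds one full code point per element (as with glibc on Linux). The function names `encodeUtf16` and `wideToUtf16` are made up for this example. It also shows where UCS-2 and UTF-16 part ways: code points above U+FFFF need a surrogate pair in UTF-16 and cannot be represented in UCS-2 at all.

```cpp
#include <string>

// Encode one Unicode code point as one or two UTF-16 code units.
// Invalid input (surrogates, values above U+10FFFF) becomes U+FFFD.
std::u16string encodeUtf16(char32_t cp) {
    std::u16string out;
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
        cp = 0xFFFD;  // replacement character
    }
    if (cp < 0x10000) {
        out.push_back(static_cast<char16_t>(cp));
    } else {
        cp -= 0x10000;  // 20 bits remain, split 10/10
        out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
        out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
    }
    return out;
}

// Convert a wide string to UTF-16 under the assumption stated above:
// 2-byte wchar_t is copied as-is, wider wchar_t is treated as code points.
std::u16string wideToUtf16(const std::wstring& w) {
    std::u16string out;
    for (wchar_t wc : w) {
        if (sizeof(wchar_t) == 2) {
            out.push_back(static_cast<char16_t>(wc));
        } else {
            out += encodeUtf16(static_cast<char32_t>(wc));
        }
    }
    return out;
}
```

On a platform with a 4-byte wchar_t, a pointer cast from `const wchar_t*` to a 16-bit code unit type would read each character as two units, which is why a real conversion step is needed there.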