New subject: [suse-programming-e] Xerces XMLCh UTF-16 Linux vs Reality

8 Mar 2006

On the Xerces C++ mailing list I was told the following:

"XMLCh is specifically UTF-16. But wchar_t is not UTF-16 everywhere. It's 
non-portable, and Solaris and Linux each use different (including from each 
other) encodings for wchar_t.

There's actually a gcc option to make wchar_t the same as on Windows, it was
created for Wine. I'm not brave enough to rely on it."

The Qt documentation tells me:

<quote url=http://doc.trolltech.com/4.1/qstring.html>
The QString class provides a Unicode character string.

QString stores a string of 16-bit QChars, where each QChar stores one Unicode 
4.0 character. Unicode is an international standard that supports most of the 
writing systems in use today. It is a superset of ASCII and Latin-1 (ISO 
8859-1), and all the ASCII/Latin-1 characters are available at the same code 
positions.

Behind the scenes, QString uses implicit sharing (copy-on-write) to reduce 
memory usage and to avoid the needless copying of data. This also helps 
reduce the inherent overhead of storing 16-bit characters instead of 8-bit 
characters.
</quote>

So, can I do this with confidence?

  // Reinterpret QString's 16-bit QChar buffer as Xerces XMLCh, without copying.
  const XMLCh* QtoX(const QString& s) {
    return reinterpret_cast<const XMLCh*>(s.constData());
  }
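
For comparison, here is a rough sketch of the more cautious route: check the code unit sizes at compile time, then copy instead of casting. It is only a sketch under stated assumptions. XMLChStandIn and QCharStandIn are hypothetical stand-ins so that it compiles without Qt or Xerces, std::u16string stands in for QString, and toXmlCh is a name I made up. In real code the source buffer would presumably come from QString::utf16(), which is documented as null-terminated.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins (assumptions, not the real library types):
// the real XMLCh is defined by the Xerces headers, the real QChar by Qt.
typedef unsigned short XMLChStandIn;  // assumed 16-bit unsigned code unit
typedef char16_t       QCharStandIn;  // assumed 16-bit UTF-16 code unit

// The cast in QtoX only makes sense if both code units have the same width,
// so fail the build instead of finding out at run time.
static_assert(sizeof(XMLChStandIn) == 2, "XMLCh stand-in is not 16 bits wide");
static_assert(sizeof(XMLChStandIn) == sizeof(QCharStandIn),
              "code unit sizes differ");

// Copying conversion: the result owns its buffer and is always
// null-terminated, so it does not depend on the lifetime of the source.
std::vector<XMLChStandIn> toXmlCh(const std::u16string& s) {
    std::vector<XMLChStandIn> out;
    out.reserve(s.size() + 1);
    for (std::size_t i = 0; i < s.size(); ++i) {
        out.push_back(static_cast<XMLChStandIn>(s[i]));
    }
    out.push_back(0);  // terminator expected by C-style string APIs
    return out;
}
```

The copy costs an allocation per call, but the returned buffer stays valid after the source string is modified or destroyed, which the pointer returned by QtoX does not.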

Q:Is this thing loaded?  
A:I don't know; pull the trigger and find out.
Q:Where should I point it?

Steven

Steven T. Hatton