Re: [opensuse] Another slightly OT c question, howto handle extended ascii chars? >127

31 Aug 2009


      David C. Rankin - 17:11 29.08.09 wrote:
...
Listmates,
Hi,
...
I'm parsing output in C that has the degree symbol in it. The character code 
for the symbol is 167, but of course the ASCII character set is limited to 
0-127.
Well, as the degree symbol is non-ASCII, its code depends on the encoding
you are using.
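
To make the encoding dependence concrete: in ISO 8859-1 the degree symbol is the single byte 0xB0 (176), while in UTF-8 the same character (U+00B0) takes two bytes. A minimal sketch of the UTF-8 rule for small code points follows; encode_utf8_small is a hypothetical helper written for this illustration, not a library function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: encode a code point below U+0800 as UTF-8.
   Writes 1 or 2 bytes to out and returns how many were written. */
static size_t encode_utf8_small(unsigned int cp, unsigned char out[2])
{
    assert(cp < 0x800);
    if (cp < 0x80) {
        /* ASCII range: one byte with the same value. */
        out[0] = (unsigned char)cp;
        return 1;
    }
    out[0] = (unsigned char)(0xC0 | (cp >> 6));   /* lead byte 110xxxxx */
    out[1] = (unsigned char)(0x80 | (cp & 0x3F)); /* continuation 10xxxxxx */
    return 2;
}
```

Calling encode_utf8_small(0xB0, buf) stores 0xC2 0xB0 in buf and returns 2, whereas any ASCII character such as '+' stays a single byte.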
...
Believe it or not, using a cut-n-paste into the strtok delimiter set 
works, but that just feels like a kludge. Example:
const char delimiters[] = " +°,;:!-";
  
    token = strtok (NULL, delimiters);
Will break the string on ° correctly. But looking at how C is handling this 
kludge causes concern:
for (i=0;i
Yields:
32
+  43
???  -62
???  -80
,  44 
;  59 
:  58 
!  33 
-  45
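
The two negative values in this output are consistent with the UTF-8 encoding of °: the bytes 0xC2 and 0xB0 are 194 and 176 when read as unsigned char, and -62 and -80 when char is signed. The sketch below shows one way to print byte values without the sign confusion. It assumes the delimiter string is UTF-8 and uses \x escapes so the result does not depend on the source file encoding; dump_bytes is a hypothetical helper, not the loop from the original message.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write every byte of s into out as an unsigned
   decimal value, separated by spaces. Returns the byte length of s. */
static size_t dump_bytes(const char *s, char *out, size_t outsz)
{
    assert(s != NULL && out != NULL && outsz > 0);
    size_t len = strlen(s);
    size_t used = 0;
    out[0] = '\0';
    for (size_t i = 0; i < len; i++) {
        /* Cast through unsigned char so bytes above 127 come out as
           128..255 instead of negative numbers. */
        int n = snprintf(out + used, outsz - used, "%s%u",
                         i ? " " : "", (unsigned)(unsigned char)s[i]);
        if (n < 0 || (size_t)n >= outsz - used)
            break; /* out is full: stop with a truncated result */
        used += (size_t)n;
    }
    return len;
}
```

For the string " +\xC2\xB0," this fills out with "32 43 194 176 44" and returns 5, so the single ° character accounts for two of the five bytes.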
Hmm, any time I see the little ??? character, that's a bad sign. So, is there 
any trick to handling the chars that are outside the normal ASCII range but 
that we seem to run into all the time? Is this the area of the thinly defined 
wchar_t?
You see these ??? because your terminal doesn't know how to display this
character. I would guess that you are using ISO 8859-1 somewhere
(probably in the sources) and UTF-8 somewhere else. You don't have to
worry, as C can handle non-ASCII characters very well: a char holds 8 bits,
so it can represent all 256 values of any single-byte encoding (like
ISO 8859-1). wchar_t is meant for characters that do not fit in a single
byte.
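
One caveat if the data is UTF-8: strtok compares single bytes, so a two-byte ° in the delimiter set acts as two separate delimiters (0xC2 and 0xB0), and the string would also be split inside other characters that share one of those bytes, for example ð (0xC3 0xB0). A rough sketch of an alternative that matches the whole sequence with strstr follows. It assumes UTF-8 input; DEGREE_UTF8 and split_at_degree are hypothetical names made up for this example.

```c
#include <assert.h>
#include <string.h>

/* UTF-8 encoding of U+00B0 DEGREE SIGN, written with escapes so it
   does not depend on the encoding of the source file. */
#define DEGREE_UTF8 "\xC2\xB0"

/* Hypothetical helper: copy the text before the first degree sign in s
   into out (truncating to fit outsz) and return a pointer to the text
   after the sign, or NULL if s contains no degree sign. */
static const char *split_at_degree(const char *s, char *out, size_t outsz)
{
    assert(s != NULL && out != NULL && outsz > 0);
    const char *hit = strstr(s, DEGREE_UTF8);
    if (hit == NULL)
        return NULL;
    size_t n = (size_t)(hit - s);
    if (n >= outsz)
        n = outsz - 1;
    memcpy(out, s, n);
    out[n] = '\0';
    return hit + strlen(DEGREE_UTF8);
}
```

With the input "72" DEGREE_UTF8 "F", the helper copies "72" into out and returns a pointer to "F". For data in a single-byte encoding such as ISO 8859-1, plain strtok with the one-byte delimiter is enough.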

-- 
Michal Hrusecky

Package Maintainer
SUSE LINUX, s.r.o
e-mail: mhrusecky@suse.cz