On Tue, Apr 03, 2012 at 11:33:09AM +0200, Klaus Kaempf wrote:
* Ladislav Slezak [Apr 03. 2012 11:10]:
I used substring() to get one character. So the problematic call is actually:
substring("áa", 1, 1);
which returns "\0xF1" instead of the "a" I expected.
The documentation does not say whether the substring() offset and length are counted in bytes or in characters: http://doc.opensuse.org/projects/YaST/openSUSE11.3/tdg/substring-rest.html
So, any opinions on changing this call? Is the UTF-8 assumption also valid here?
Yes. substring() operates on strings, and strings are defined to be UTF-8 encoded.
Generally I agree that strings in YCP are UTF-8 encoded and
functions should respect this.
But simply fixing the functions might require converting from
UTF-8 to wstring and back in every function, and that sounds very
costly. E.g. the size() function in YCP converts the string to a
wstring. When I noticed that and saw how many times
size(string) == 0 is used, I added an isempty() function to YCP.
Could be that using wstring internally in YCPString is the better
solution.
Regards,
Arvin
--
Arvin Schnell,