On Tue, 3 Apr 2012 11:58:16 +0200 Arvin Schnell <aschnell@suse.de> wrote:
On Tue, Apr 03, 2012 at 11:33:09AM +0200, Klaus Kaempf wrote:
* Ladislav Slezak <lslezak@suse.cz> [Apr 03. 2012 11:10]:
I used substring() to get one character. So the problematic call is actually:
substring("áa", 1, 1);
which returns "\0xF1" instead of "a" as I expected.
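To make the byte-versus-character question concrete, here is a minimal sketch in Python (standing in for YCP; both helper names are made up for illustration and are not YaST functions). In UTF-8, "á" (U+00E1) is encoded as the two bytes 0xC3 0xA1, so a substring taken at byte offset 1 lands in the middle of that character instead of on "a".

```python
def substring_bytes(s: str, start: int, length: int) -> bytes:
    """Hypothetical byte-indexed substring: offsets count UTF-8 bytes."""
    return s.encode("utf-8")[start:start + length]


def substring_chars(s: str, start: int, length: int) -> str:
    """Hypothetical character-indexed substring: offsets count code points."""
    return s[start:start + length]


text = "\u00e1a"  # the string "áa" from the call above, precomposed form

# Byte offset 1 is the second byte of "á"; it is not valid UTF-8 on its own.
byte_result = substring_bytes(text, 1, 1)

# Character offset 1 is the second character of the string.
char_result = substring_chars(text, 1, 1)
```

Only the character-indexed variant returns what the caller expected; the byte-indexed one returns a lone continuation byte.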
The documentation does not say whether the substring() arguments are counted in bytes or in characters: http://doc.opensuse.org/projects/YaST/openSUSE11.3/tdg/substring-rest.html
So any opinions on changing this call? Is the UTF-8 assumption also valid here?
Yes. sub_string_ is operating on strings and strings are defined to be UTF-8 encoded.
Generally I agree that strings in YCP are UTF-8 encoded and functions should respect this.
But simply fixing the functions might require converting from UTF-8 to wstring and back in every function, and that sounds very costly. For example, the size function in YCP converts the string to wstring. When I noticed that and saw how many times size(string) == 0 is used, I added an isempty function to YCP.
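As a rough illustration of how such calls could avoid a full conversion, here is a sketch in Python (not the C++ of the actual implementation). It relies on the fact that UTF-8 continuation bytes always have the bit pattern 10xxxxxx, so characters can be counted and located directly on the encoded bytes. The function names are hypothetical, not the YCP built-ins, and the input is assumed to be valid UTF-8.

```python
def utf8_is_empty(data: bytes) -> bool:
    """A string has zero characters exactly when it has zero bytes."""
    return len(data) == 0


def utf8_size(data: bytes) -> int:
    """Count characters by counting bytes that are not continuation bytes."""
    return sum(1 for b in data if (b & 0xC0) != 0x80)


def utf8_substring(data: bytes, start: int, length: int) -> bytes:
    """Return up to `length` characters starting at character index `start`."""
    # Byte offsets at which each character begins, plus an end sentinel.
    starts = [i for i, b in enumerate(data) if (b & 0xC0) != 0x80]
    starts.append(len(data))
    char_count = len(starts) - 1
    if start < 0 or length < 0 or start >= char_count:
        return b""
    end = min(start + length, char_count)
    return data[starts[start]:starts[end]]
```

The emptiness check needs no scan at all, which is the same saving isempty gives over size(string) == 0. The other two functions still walk the bytes once, but never build a wide-character copy.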
It could be that using wstring internally in YCPString is the better solution.
I absolutely agree. If every string in YCP is a UTF-8 string, then not using wstring doesn't make much sense to me. Of course we need to check what depends on it, but I think it should mainly be the various bindings. Other parts of the code should not care what the internal representation is.
Josef
Regards, Arvin