Re: [yast-devel] Re: YCP substring() Was: YCP String operator [] and UTF-8

3 Apr 2012


On Tue, 3 Apr 2012 11:58:16 +0200
Arvin Schnell <aschnell@suse.de> wrote:
...
On Tue, Apr 03, 2012 at 11:33:09AM +0200, Klaus Kaempf wrote:
...
* Ladislav Slezak <lslezak@suse.cz> [Apr 03. 2012 11:10]:
...
...
I used substring() to get one character. So the problematic call is actually:
substring("áa", 1, 1);
which returns "\0xF1" instead of "a" as I expected.
The documentation does not say whether the substring() arguments are counted in
bytes or characters.
http://doc.opensuse.org/projects/YaST/openSUSE11.3/tdg/substring-rest.html
So any opinions on changing this call? Is the UTF-8 assumption also valid here?
Yes. sub_string_ is operating on strings and strings are defined to be
UTF-8 encoded.
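
To illustrate the byte-versus-character distinction raised above, here is a minimal C++ sketch, not the actual YCP implementation. The names utf8_substring and skip_code_points are hypothetical, and the code assumes the input is valid UTF-8. It counts offsets in code points by skipping continuation bytes, without converting to wstring.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: true if b is a UTF-8 continuation byte (bit pattern 10xxxxxx).
static bool is_continuation(unsigned char b)
{
    return (b & 0xC0) == 0x80;
}

// Hypothetical helper: starting at byte offset i, step over up to n code points
// and return the new byte offset. Assumes s is valid UTF-8.
static std::size_t skip_code_points(const std::string& s, std::size_t i, std::size_t n)
{
    while (i < s.size() && n > 0) {
        ++i;                                   // lead byte of the code point
        while (i < s.size() && is_continuation(s[i]))
            ++i;                               // its continuation bytes
        --n;
    }
    return i;
}

// Hypothetical substring whose start and count are in code points, not bytes.
std::string utf8_substring(const std::string& s, std::size_t start, std::size_t count)
{
    const std::size_t begin = skip_code_points(s, 0, start);
    const std::size_t end = skip_code_points(s, begin, count);
    return s.substr(begin, end - begin);
}
```

With the UTF-8 encoding of "áa" (bytes C3 A1 61), a byte-based std::string::substr(1, 1) yields a lone continuation byte, while utf8_substring(s, 1, 1) yields "a".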
Generally I agree that strings in YCP are UTF-8 encoded and
functions should respect this.
But simply fixing the functions might require converting from
UTF-8 to wstring and back in every function and that sounds very
costly. E.g. the size function in YCP converts the string to
wstring. When I noticed that and saw how many times
size(string) == 0 is used, I added an isempty function in YCP.
Could be that using wstring internally in YCPString is the better
solution.
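
The cost argument above can be sketched in C++ as follows. This is only an illustration under stated assumptions, not the YCP source: decode_utf8, size_via_conversion and is_empty are hypothetical names, std::u32string stands in for wstring, and the input is assumed to be valid UTF-8. Counting characters through a conversion walks and copies the whole string, while an emptiness check never needs to look at the contents.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for a UTF-8 to wide-string conversion.
// Assumes valid UTF-8; allocates a new string and visits every byte.
std::u32string decode_utf8(const std::string& s)
{
    std::u32string out;
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char b = static_cast<unsigned char>(s[i]);
        // Sequence length from the lead byte: 0xxxxxxx, 110xxxxx, 1110xxxx, 11110xxx.
        const std::size_t len = b < 0x80 ? 1 : b < 0xE0 ? 2 : b < 0xF0 ? 3 : 4;
        // Payload bits of the lead byte.
        char32_t cp = (len == 1) ? b : (b & (0xFF >> (len + 1)));
        // Each continuation byte contributes its low six bits.
        for (std::size_t k = 1; k < len && i + k < s.size(); ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(s[i + k]) & 0x3F);
        out.push_back(cp);
        i += len;
    }
    return out;
}

// Character count obtained by converting first: linear time plus an allocation.
std::size_t size_via_conversion(const std::string& s)
{
    return decode_utf8(s).size();
}

// Emptiness check on the stored bytes: constant time, no conversion.
bool is_empty(const std::string& s)
{
    return s.empty();
}
```

For any valid UTF-8 string, size_via_conversion(s) == 0 and is_empty(s) give the same answer, but only the former pays for the conversion.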
I absolutely agree. If every string in YCP is a UTF-8 string, then not using wstring doesn't make much sense to me. Of course we need to check what depends on it, but I think it should be mainly the various bindings. Other parts of the code should not care what the internal representation is.

Josef
...
Regards,
  Arvin

Josef Reidinger