On Tue, Apr 03, 2012 at 10:06:18AM +0200, Ladislav Slezak wrote:
On 3.4.2012 09:37, Johannes Meixner wrote:
Could a YCP expert show the correct way (with example code) to remove a UTF-8 substring from a UTF-8 string? I.e. how to remove "Bar" from "FooBarBaz" in a UTF-8-safe way?
More generally: Is there documentation on how to work with UTF-8 strings in YCP?
Um, it seems that the documentation does not mention the [] behavior for strings (neither http://doc.opensuse.org/projects/YaST/openSUSE11.3/tdg/bracket.html nor http://doc.opensuse.org/projects/YaST/openSUSE11.3/tdg/id_ycp_data_string.ht...).
I fixed that particular bug by using regexpsub() instead of iterating over the string and using the [] operator. So I guess the regexp*() functions are UTF-8 safe.
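As an illustration of why byte-wise iteration breaks and pattern-based removal does not, here is a minimal sketch in Python rather than YCP. It is only an analogy: the helper names (remove_substring, split_bytewise, is_valid_utf8) are hypothetical, and it assumes that the YCP [] operator hands out single bytes, as the discussion suggests. It says nothing about how regexpsub() is implemented.

```python
import re


def remove_substring(text, needle):
    """Remove every occurrence of needle, matching on whole characters."""
    # re.escape() makes the needle a literal pattern, not a regex.
    return re.sub(re.escape(needle), "", text)


def split_bytewise(text):
    """Mimic a byte-oriented [] operator: one element per UTF-8 byte."""
    raw = text.encode("utf-8")
    return [raw[i:i + 1] for i in range(len(raw))]


def is_valid_utf8(chunk):
    """Return True if chunk decodes as UTF-8 on its own."""
    try:
        chunk.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    # Character-level removal keeps the rest of the string intact.
    print(remove_substring("FooBärBaz", "Bär"))
    # "ä" occupies two bytes; neither byte is valid UTF-8 by itself.
    print([is_valid_utf8(p) for p in split_bytewise("Bär")])
```

The second print shows which single-byte pieces are still valid text; the two bytes of "ä" are not, which is the kind of damage a byte-indexed loop can cause.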
I'm not sure what the correct solution is. Maybe the correct way is to fix the [] operator after all... That's why I have opened this discussion, because maybe someone is relying on the current "buggy" behavior... (I don't expect that but I'd like to avoid regressions if possible.)
Unfortunately this can be the case, since most other string functions do not respect UTF-8 either. E.g. splitting a string at a space by using search and substring works correctly, because search is also byte-oriented.

Program:

    string s = "schöner Würfel";
    // search is byte-oriented, so i is a byte offset
    integer i = search(s, " ");
    y2milestone("substring '%1' '%2'", substring(s, 0, i), substring(s, i + 1));
    y2milestone("lsubstring '%1' '%2'", lsubstring(s, 0, i), lsubstring(s, i + 1));

Output:

    test1.ycp:6 substring 'schöner' 'Würfel'
    test1.ycp:7 lsubstring 'schöner ' 'ürfel'

So, if substring is fixed, all other functions must be fixed as well.

Regards,
Arvin

-- 
Arvin Schnell, <aschnell@suse.de>
Senior Software Engineer, Research & Development
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 16746 (AG Nürnberg)
Maxfeldstraße 5
90409 Nürnberg
Germany
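The offset mismatch in the test program can be reproduced outside YCP. The following Python sketch is an analogy only: byte_search, byte_substring and char_substring are hypothetical stand-ins for search, substring and lsubstring, and their semantics are assumed from the output quoted in this mail.

```python
def byte_search(s, needle):
    """Return the byte offset of needle in the UTF-8 encoding of s."""
    return s.encode("utf-8").find(needle.encode("utf-8"))


def byte_substring(s, start, length=None):
    """Slice by byte offsets, then decode the result."""
    raw = s.encode("utf-8")
    end = len(raw) if length is None else start + length
    # errors="replace" keeps the demo from raising on a split character.
    return raw[start:end].decode("utf-8", errors="replace")


def char_substring(s, start, length=None):
    """Slice by character offsets."""
    end = len(s) if length is None else start + length
    return s[start:end]


if __name__ == "__main__":
    s = "schöner Würfel"
    i = byte_search(s, " ")  # byte offset: "ö" counts as two bytes
    # Byte offset with byte slicing: the two halves come out right.
    print(repr(byte_substring(s, 0, i)), repr(byte_substring(s, i + 1)))
    # Byte offset with character slicing: shifted by one character.
    print(repr(char_substring(s, 0, i)), repr(char_substring(s, i + 1)))
    # Character offset with character slicing is consistent again.
    j = s.find(" ")
    print(repr(char_substring(s, 0, j)), repr(char_substring(s, j + 1)))
```

The point is that the offset and the slicing function have to use the same unit; mixing a byte offset with a character-based slice gives the shifted result seen in the lsubstring line.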