Hello,

On Apr 3 18:19 Ladislav Slezak wrote (excerpt):
On 3.4.2012 15:39, Arvin Schnell wrote:
YCP has the function lsubstring:
http://doc.opensuse.org/projects/YaST/openSUSE11.3/tdg/lsubstring-rest.html
... But shouldn't the default behavior be the opposite, i.e. the default substring() should be UTF-8 aware, with an extra function for byte operation?
See what Klaus Kaempf already wrote in this thread:
-------------------------------------------------------------------------
In YCP, all strings are supposed to be UTF-8 encoded.
This is how YaST was designed from the beginning.
-------------------------------------------------------------------------
... Is the UTF-8 assumption also valid here?
Yes. sub_string_ is operating on strings and strings are defined to be UTF-8 encoded.
This is also what is documented at
http://doc.opensuse.org/projects/YaST/openSUSE11.3/tdg/id_ycp_data_string.ht...
-------------------------------------------------------------------------
String constants consist of UNICODE characters encoded in UTF8.
-------------------------------------------------------------------------

Therefore substring() must work on UTF-8 strings, and an extra function for byte operation on strings might exist optionally.

But what is a practical use-case for byte operation on strings? Byte operation on a UTF-8 encoded string means that a multibyte character gets split into several one-byte values, each of which is meaningless on its own, so I cannot imagine a real use-case for it.

To extract ASCII characters from a UTF-8 encoded string there is
toascii("UTF-8 encoded string")

If byte operation on strings were useful functionality: perhaps the [] operator, which is currently not available for strings, could be used to implement it?

By the way: I wonder what
tolist("UTF-8 encoded string")
or respectively
(list)"UTF-8 encoded string"
would result in. Would it be a list of byte values, where a multibyte character gets split into several one-byte list elements? Or a list of integer values, where a multibyte character becomes a single integer list element? Or anything else?

Perhaps
(list<byteblock>)"a UTF-8 encoded string"
results in the string split into byteblocks, where each character becomes a single byteblock list element (i.e. a multibyte character becomes a single byteblock list element)?
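
To make the byte-versus-character distinction concrete, here is a minimal sketch in Python (not YCP). The helper names byte_substring and char_substring are hypothetical stand-ins for illustration only; they are not YCP builtins and say nothing about how substring() or lsubstring() are actually implemented. The three lists at the end only show the possible answers to the tolist() question, not what YCP really returns.

```python
# Illustration only: Python stand-ins, NOT YCP builtins.

def byte_substring(s: str, start: int, length: int) -> bytes:
    """Slice the UTF-8 byte sequence of s (byte semantics)."""
    return s.encode("utf-8")[start:start + length]


def char_substring(s: str, start: int, length: int) -> str:
    """Slice s counted in characters (code points)."""
    return s[start:start + length]


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# "Nuernberg" written with u-umlaut; the umlaut takes two bytes in UTF-8.
text = "N\u00fcrnberg"

chars = char_substring(text, 0, 2)  # first two characters, umlaut intact
raw = byte_substring(text, 0, 2)    # first two bytes, umlaut cut in half

# Three possible readings of "convert a string to a list":
as_bytes = list(text.encode("utf-8"))              # one integer per byte
as_code_points = [ord(c) for c in text]            # one integer per character
as_byteblocks = [c.encode("utf-8") for c in text]  # one byte block per character

print(len(text), len(text.encode("utf-8")))  # character count vs. byte count
print(chars, raw, is_valid_utf8(raw))
```

The byte slice ends in the middle of the two-byte character and is no longer valid UTF-8, which is the "meaningless one-byte values" problem described above.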
My quick grep in SVN trunk found just one (!!) usage of lsubstring() function in all YCP code. That's suspicious, I guess there could be other misused substring() calls...
Because in YCP strings are UTF-8 encoded, there cannot be a misuse of substring("UTF-8 encoded string"), and the only "misuse", i.e. "misunderstanding what strings are in YCP", from my point of view is that lsubstring() exists at all.

Kind Regards
Johannes Meixner
--
SUSE LINUX Products GmbH -- Maxfeldstrasse 5 -- 90409 Nuernberg -- Germany
HRB 16746 (AG Nuernberg) GF: Jeff Hawn, Jennifer Guild, Felix Imendoerffer