Mailinglist Archive: yast-devel (23 mails)

Re: [yast-devel] Re: YCP substring() Was: YCP String operator [] and UTF-8

Hello,

On Apr 3 18:19 Ladislav Slezak wrote (excerpt):
On 3.4.2012 15:39, Arvin Schnell wrote:
YCP has the function lsubstring:
http://doc.opensuse.org/projects/YaST/openSUSE11.3/tdg/lsubstring-rest.html
...
But shouldn't the default behavior be the opposite, i.e. the default substring()
should be UTF-8 aware, with an extra function for byte operations?

See what Klaus Kaempf already wrote in this thread:
-------------------------------------------------------------------------
In YCP, all strings are supposed to be UTF-8 encoded.
This is how YaST was designed from the beginning.
-------------------------------------------------------------------------
... Is the UTF-8 assumption also valid here?
Yes. sub_string_ is operating on strings and strings are defined
to be UTF-8 encoded.
-------------------------------------------------------------------------

This is also what is documented at
http://doc.opensuse.org/projects/YaST/openSUSE11.3/tdg/id_ycp_data_string.html
-------------------------------------------------------------------------
String constants consist of UNICODE characters encoded in UTF8.
-------------------------------------------------------------------------

Therefore substring() must work on UTF-8 strings, and an extra function
for byte operations on strings might exist optionally.
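
To make the difference between the two semantics concrete, here is a
minimal sketch in Python (not YCP). The helper names char_substring and
byte_substring are hypothetical; they only model a character-based and a
byte-based substring on UTF-8 data and are not how YCP implements
substring() or lsubstring().

```python
def char_substring(data: bytes, start: int, length: int) -> bytes:
    """Substring counted in characters (code points) of UTF-8 encoded data."""
    text = data.decode("utf-8")
    return text[start:start + length].encode("utf-8")


def byte_substring(data: bytes, start: int, length: int) -> bytes:
    """Substring counted in raw bytes; may cut through a multibyte character."""
    return data[start:start + length]


# "N\u00fcrnberg" is "Nuernberg" written with u-umlaut, which takes two bytes in UTF-8.
sample = "N\u00fcrnberg".encode("utf-8")

print(char_substring(sample, 0, 2))   # b'N\xc3\xbc' (two complete characters)
print(byte_substring(sample, 0, 2))   # b'N\xc3'     (ends in half a character)
```

The character-based variant always returns valid UTF-8, while the
byte-based variant can return a fragment that is no longer a valid string.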


But what is a practical use-case for byte operation on strings?

A byte operation on a UTF-8 encoded string splits a multibyte character
into several one-byte values, each of which is meaningless on its own.
I therefore cannot imagine a real use-case for byte operations on
UTF-8 encoded strings.
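
That the individual bytes of a multibyte character carry no meaning by
themselves can be checked with a short Python sketch (Python is used here
only for illustration, it is not YCP; decodes_alone is a hypothetical
helper name):

```python
encoded = "\u20ac".encode("utf-8")    # EURO SIGN, three bytes in UTF-8

print(len(encoded))                   # 3


def decodes_alone(byte_value: int) -> bool:
    """Return True if a single byte is valid UTF-8 by itself."""
    try:
        bytes([byte_value]).decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# None of the three bytes of the euro sign is a valid string on its own.
print([decodes_alone(b) for b in encoded])   # [False, False, False]

# A plain ASCII byte, in contrast, is a complete character.
print(decodes_alone(ord("A")))               # True
```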

To extract ASCII characters from a UTF-8 encoded string there is
toascii("UTF-8 encoded string")


If byte operations on strings were a useful functionality:

Perhaps the [] operator which is currently not available for strings
could be used to implement byte operation on strings?

By the way:

I wonder what
tolist("UTF-8 encoded string")
respectively
(list)"UTF-8 encoded string"
would result?

Would it result in a list of byte values where a multibyte character
gets split into several one-byte list elements, or
would it result in a list of integer values where a multibyte character
becomes a single integer list element, or
anything else?

Perhaps
(list<byteblock>)"an UTF-8 encoded string"
results in the string being split into byteblocks where each character becomes
a single byteblock list element (i.e. a multibyte character
becomes a single byteblock list element)?
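
The three possible results discussed above (a list of bytes, a list of
integer code points, a list of per-character byte blocks) can be compared
in a small Python sketch. This only models the alternatives; it says
nothing about what tolist() or a (list) cast actually does in YCP.

```python
# "a", u-umlaut and the euro sign: 1, 2 and 3 bytes in UTF-8.
text = "a\u00fc\u20ac"
encoded = text.encode("utf-8")

# Reading 1: one list element per byte, multibyte characters get split.
as_bytes = list(encoded)

# Reading 2: one integer (Unicode code point) per character.
as_code_points = [ord(ch) for ch in text]

# Reading 3: one byte block per character.
as_blocks = [ch.encode("utf-8") for ch in text]

print(as_bytes)        # [97, 195, 188, 226, 130, 172]
print(as_code_points)  # [97, 252, 8364]
print(as_blocks)       # [b'a', b'\xc3\xbc', b'\xe2\x82\xac']
```

Only the second and third reading keep one list element per character;
the first has six elements for a three-character string.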


My quick grep in SVN trunk found just one (!!) usage of the lsubstring()
function in all YCP code. That's suspicious, I guess there could
be other misused substring() calls...

Because strings in YCP are UTF-8 encoded, there cannot be
a misuse of substring("UTF-8 encoded string"), and the only
"misuse", i.e. "misunderstanding of what strings are in YCP",
from my point of view is that lsubstring() exists at all.


Kind Regards
Johannes Meixner
--
SUSE LINUX Products GmbH -- Maxfeldstrasse 5 -- 90409 Nuernberg -- Germany
HRB 16746 (AG Nuernberg) GF: Jeff Hawn, Jennifer Guild, Felix Imendoerffer