[opensuse] problem with [:graph:] in perl
All,

I'm sure we've got some very knowledgeable Perl people here, so maybe you can sort this one out for me.

I'm deleting all [:graph:] characters from a string using something like this:

$txt =~ s/[[:graph:]]+//g;

With the standard locale, [:graph:] is 0x21-0x7E. With locale = de_DE.iso8859-1 (or similar ones), [:graph:] should be 0x21-0x7E plus 0xA0-0xFF. I'm comparing to how isgraph() works in C, which should be the same.

Well, with LC_CTYPE=de_DE.iso8859-1, the Perl statement does NOT remove the characters 0xA0-0xFF. Why not?

/Per Jessen, Zürich

--
To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org
For additional commands, e-mail: opensuse+help@opensuse.org
Per Jessen wrote:
I'm deleting all [:graph:] from a string using something like this:
$txt =~ s/[[:graph:]]+//g;
With the standard locale, [:graph:] is 0x21-0x7e.
With locale = de_DE.iso8859-1 (or similar ones), [:graph:] should be 0x21-0x7E plus 0xA0-0xFF. I'm comparing to how isgraph() works in C, which should be the same.
Well, with LC_CTYPE=de_DE.iso8859-1, the Perl statement does NOT remove the characters 0xA0-0xFF.
Why not?
Does anyone have suggestions where else I might get an answer to this?

/Per Jessen, Zürich
Hi,

On Sat, Sep 6, 2008 at 10:05 AM, Per Jessen <per@computer.org> wrote:
Does anyone have suggestions where else I might get an answer to this?
I once had an issue comparing strings: it looked like the locale was not actually taken into account. Somebody on the comp.lang.perl.tk group (I was using Perl/Tk) suggested that there might be a problem with the interaction of Unicode and locale.
From perlunicode:
Beginning with version 5.6, Perl uses logically-wide characters to represent strings internally. ...

Usually locale settings and Unicode do not affect each other, but there are a couple of exceptions:

• You can enable automatic UTF-8-ification of your standard file handles, default open() layer, and @ARGV by using either the -C command line switch or the PERL_UNICODE environment variable, see perlrun for the documentation of the -C switch.
• Perl tries really hard to work both with Unicode and the old byte-oriented world. Most often this is nice, but sometimes Perl's straddling of the proverbial fence causes problems.

Not sure it is relevant here. Try the comp.lang.perl group anyway.

--
Mark Goldstein
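One thing that may be worth checking, offered as a guess and not as a confirmed diagnosis of the original script: according to perllocale, Perl consults LC_CTYPE in regular expressions only where the "use locale" pragma is in scope. Without it, a byte string is matched with ASCII rules, which would leave 0xA0-0xFF untouched by [[:graph:]]. The sketch below lists the byte values the class matches with and without the pragma, so the result can be compared with isgraph() in C. The helper names (graph_bytes_default, graph_bytes_locale, as_ranges) are made up for illustration, and the locale name is assumed to be installed under exactly that spelling.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(setlocale LC_CTYPE);

# Byte values that [[:graph:]] matches when the locale is ignored
# (no "use locale" in scope; the string is a plain byte string).
sub graph_bytes_default {
    return grep { chr($_) =~ /[[:graph:]]/ } 0 .. 255;
}

# Same test, but the pattern is compiled under "use locale", so the
# class is looked up in the current LC_CTYPE, much like isgraph() in C.
sub graph_bytes_locale {
    use locale;
    return grep { chr($_) =~ /[[:graph:]]/ } 0 .. 255;
}

# Collapse a sorted list of byte values into "0xNN-0xNN" ranges.
sub as_ranges {
    my @ranges;
    for my $byte (@_) {
        if (@ranges && $ranges[-1][1] == $byte - 1) {
            $ranges[-1][1] = $byte;          # extend the current range
        } else {
            push @ranges, [$byte, $byte];    # start a new range
        }
    }
    return join ', ', map { sprintf '0x%02X-0x%02X', @$_ } @ranges;
}

# Assumption: the locale is installed under this name; adjust as needed.
my $wanted = 'de_DE.iso8859-1';
my $active = setlocale(LC_CTYPE, $wanted);
print defined $active
    ? "LC_CTYPE is now $active\n"
    : "$wanted is not available; LC_CTYPE left unchanged\n";

print 'without use locale: ', as_ranges(graph_bytes_default()), "\n";
print 'with use locale:    ', as_ranges(graph_bytes_locale()), "\n";
```

If the second line of output includes the upper half of Latin-1 while the first stops at 0x7E, then putting "use locale;" in the scope of the s/[[:graph:]]+//g statement is the likely fix. The exact upper range depends on the system's locale tables, so treat the output as something to verify locally.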
participants (2)
- Mark Goldstein
- Per Jessen