http://bugzilla.opensuse.org/show_bug.cgi?id=911622

            Bug ID: 911622
           Summary: locale problem with POSIX character class [[:alpha:]] and UTF-8 (in sed and Perl)
    Classification: openSUSE
           Product: openSUSE Distribution
           Version: 13.2
          Hardware: x86-64
                OS: openSUSE 13.2
            Status: NEW
          Severity: Major
          Priority: P5 - None
         Component: Basesystem
          Assignee: bnc-team-screening@forge.provo.novell.com
          Reporter: Ulrich.Windl@rz.uni-regensburg.de
        QA Contact: qa-bugs@suse.de
          Found By: ---
           Blocker: ---

Using the character class "[[:alpha:]]" in Perl with LC_CTYPE=de_DE.UTF-8, I found that (for example) the character 'ü' does not match. Testing with sed showed a similar problem:

sed -e's/[^[:alpha:]]/|/g'
drüber
drüber
# Expected output would be: "dr|ber" (replace non-letters with '|')
LANG=de sed -e's/[^[:alpha:]]/|/g'
drüber
dr||ber
LANG=C sed -e's/[^[:alpha:]]/|/g'
drüber
drüber

In Perl I have the line:

print "R1: $`|$&|$'\n" if ($word =~ /[^[:alpha:]]+/);

and this result:

R1: dr�|�|ber
# which makes no sense at all

With "LANG=de" I get:

R1: dr|ü|ber
# (This would be the expected result for LANG=C)
# (The expected result should be no output if every character is a letter)

I used my Perl script with every combination of:

use locale;
use feature 'unicode_strings';

The only clue I have is this:

locale -m
locale: Das Verzeichnis »/usr/share/i18n/charmaps« der Zeichensatz-Definitionen kann nicht gelesen werden: Datei oder Verzeichnis nicht gefunden
# (English: "locale: cannot read character map directory '/usr/share/i18n/charmaps': No such file or directory")

rpm -qa |grep locale
glibc-locale-32bit-2.19-16.2.5.x86_64
glibc-locale-2.19-16.2.5.x86_64
libboost_locale1_54_0-1.54.0-10.1.3.x86_64
# RPM verifies both packages OK

ll /usr/share/i18n
ls: Zugriff auf /usr/share/i18n nicht möglich: Datei oder Verzeichnis nicht gefunden
# (English: "ls: cannot access /usr/share/i18n: No such file or directory")
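
One possible reading of the "dr||ber" result from sed, offered as a sketch rather than a diagnosis: in UTF-8 the letter 'ü' is stored as the two bytes 0xC3 0xBC, so a tool that classifies single bytes instead of characters sees two non-letters. The snippet below reproduces that byte-wise behaviour by forcing LC_ALL=C. The function name mask_non_alpha is hypothetical, and whether the same thing happens on the reporter's system is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: replace every non-letter with '|'.
# LC_ALL=C forces sed to classify the input byte by byte.
mask_non_alpha() {
    LC_ALL=C sed -e 's/[^[:alpha:]]/|/g'
}

# "drüber" with 'ü' spelled out as its UTF-8 bytes 0xC3 0xBC (octal 303 274),
# so the script does not depend on the encoding of this file.
word=$(printf 'dr\303\274ber')

# Dump the raw bytes in hex; the seven bytes are 64 72 c3 bc 62 65 72.
printf '%s' "$word" | od -An -tx1

# Neither 0xC3 nor 0xBC is a letter in the C locale, so this prints: dr||ber
printf '%s\n' "$word" | mask_non_alpha
```

With a working de_DE.UTF-8 locale, the same sed expression would be expected to leave "drüber" unchanged, because 'ü' is then a single alphabetic character.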
--
You are receiving this mail because:
You are on the CC list for the bug.