http://bugzilla.novell.com/show_bug.cgi?id=1064519
http://bugzilla.novell.com/show_bug.cgi?id=1064519#c1

Michael Matz <matz@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
                 CC|                            |matz@suse.com
         Resolution|---                         |FIXED

--- Comment #1 from Michael Matz <matz@suse.com> ---
POSIX regexp character ranges are defined to be linguistic, not byte-valued. Hence something like [A-Z] contains all collation elements between the two endpoints A and Z.

The locale "cs_CZ" (and, for instance, pl_PL and lt_LT, but not, for example, en_US or de_DE) defines the collating order such that minuscule (lowercase) letters come in between their capital variants, like so:

    aAbB ... xXyYzZ

Due to further complications in Czech having to do with multi-character sequences, the range A-X doesn't include the small x (even though it normally would, given the above sorting order). But of course A-Y, and hence A-Z, do include the small x.

If you want to see some more funny things, try this:

    % echo chemie | LANG=cs_CZ.UTF-8 grep -E '^[^x]emie'
    chemie

(It doesn't match in most other locales.) The reason: POSIX regexps match against a collating sequence (the input mapped to collating elements), and in Czech 'ch' is a single collating element (so that it sorts between h and i); hence it is matched by the single-character bracket expression [^x]. (This works only with POSIX-compliant regexp engines; others will usually treat 'ch' as two characters.)

In short: (POSIX) character ranges and character classes are internationalized (i18n'ed). They don't do simple ASCII-like byte ranges or mappings. If you need that, you have to use LC_COLLATE=C (and, for the mapping, also LC_CTYPE=C). See, for instance, https://www.regular-expressions.info/posixbrackets.html

Therefore: works as expected, INVALID.
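A minimal shell sketch for checking this behavior on a given system, assuming a POSIX sh and a grep that supports -E and -q. The helper names `matches` and `report` are hypothetical, not part of any tool. The sketch sets LC_ALL rather than LANG because LC_ALL overrides both LANG and the individual LC_* variables, so an LC_ALL already exported in the environment cannot silently mask the locale under test. In the C locale, [A-Z] is a plain byte range and 'ch' is two characters, so neither 'x' nor "chemie" matches. The cs_CZ.UTF-8 rows are printed only if that locale is installed, and their results depend on the C library and its locale data version, so they may not reproduce the behavior described in the comment on every system.

```sh
#!/bin/sh
# Hypothetical helper: succeed if the single line STRING matches the
# extended regexp PATTERN when grep runs with LC_ALL=LOCALE.
# Usage: matches PATTERN LOCALE STRING
matches() {
    printf '%s\n' "$3" | LC_ALL="$2" grep -Eq -- "$1"
}

# Hypothetical helper: print locale, pattern, input and the result.
# Usage: report PATTERN LOCALE STRING
report() {
    if matches "$1" "$2" "$3"; then
        result=match
    else
        result=no-match
    fi
    printf '%-12s %-14s %-8s %s\n' "$2" "$1" "$3" "$result"
}

# The C locale is always available. cs_CZ.UTF-8 is added only if installed
# (on glibc systems, `locale -a` usually lists it as cs_CZ.utf8).
locales=C
if locale -a 2>/dev/null | grep -Eqi '^cs_CZ\.utf-?8$'; then
    locales="$locales cs_CZ.UTF-8"
fi

for loc in $locales; do
    report '^[A-Z]$' "$loc" x             # range between A and Z
    report '^[[:upper:]]$' "$loc" x       # explicit class: never lowercase x
    report '^[^x]emie' "$loc" chemie      # is 'ch' one collating element?
done
```

Each printed line names the locale, the pattern and the input, followed by `match` or `no-match`. Using the [[:upper:]] class instead of the range [A-Z] states the intent ("uppercase letter") directly, so it does not depend on where lowercase letters fall in the locale's collation order.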