Re: [opensuse] Need new source for unix utils -- gnu has broken another.

3 Feb 2018

      On 02/03/2018 04:03 AM, L A Walsh wrote:
...
Bernhard Voelker wrote:
  If data is processed as "binary", wouldn't that
mean processing it with no encoding or decoding -- as a stream
of bytes?
well, it tries to treat each byte as one character - as opposed to
multi-byte encodings where up to 4 bytes are used, e.g. the character
"SMILING FACE WITH SUNGLASSES" is encoded in UTF-8 as 4 bytes:
0xF0 0x9F 0x98 0x8E.
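
To make the byte count concrete, here is a minimal Python sketch using only the standard library. The variable names are illustrative, and the Latin-1 decode is just a stand-in for "one byte, one character" processing; it is not how grep is implemented.

```python
# U+1F60E is SMILING FACE WITH SUNGLASSES.
ch = "\U0001F60E"

# Encoded as UTF-8, the single character becomes four bytes.
encoded = ch.encode("utf-8")
print(len(ch), len(encoded), encoded.hex(" "))  # prints: 1 4 f0 9f 98 8e

# A one-byte-per-character view (here: Latin-1) sees four separate
# characters instead of one.
as_single_bytes = encoded.decode("latin-1")
print(len(as_single_bytes))  # prints: 4
```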

Furthermore, grep is a line-based tool, and binary files can have
extremely long lines, control characters (which grep doesn't care about,
but the terminal it writes to does), and finally the NUL character, which
traditionally marks the end of a string.
...
How do you interpret binary?
...
locale (which is impossible in that case).
Again, Eric showed you the way:
    $ LC_ALL=C grep ...

Wouldn't LC_CTYPE suffice if you went that route?
Depends what you want to match.  LC_COLLATE may also influence
the matching. See 'info grep' or 'man grep'.  LC_ALL does it
all in one go.
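
The reason LC_ALL does it in one go is the usual POSIX precedence: LC_ALL overrides the individual LC_* category variables, which in turn override LANG. A small Python sketch of that lookup follows; `effective_locale` is a hypothetical helper, and falling back to "C" when nothing is set is an assumption (POSIX leaves that default to the implementation).

```python
def effective_locale(category, env):
    """Return the locale name that applies to one category, e.g. LC_CTYPE.

    Precedence: LC_ALL, then the category variable itself, then LANG.
    Empty values are treated the same as unset ones.
    """
    for name in ("LC_ALL", category, "LANG"):
        value = env.get(name, "")
        if value:
            return value
    # Assumption: use "C" when none of the variables is set.
    return "C"


env = {"LANG": "en_US.UTF-8", "LC_CTYPE": "de_DE.UTF-8"}
print(effective_locale("LC_CTYPE", env))    # de_DE.UTF-8
print(effective_locale("LC_COLLATE", env))  # en_US.UTF-8

# Setting LC_ALL overrides every category at once.
env["LC_ALL"] = "C"
print(effective_locale("LC_CTYPE", env))    # C
print(effective_locale("LC_COLLATE", env))  # C
```

Passing `os.environ` as `env` would show what applies in the current shell.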
...
However, since you say it would be impossible
to process the file as some encoding, then instead of
throwing some error, or skipping the file, wouldn't it
be more useful to default to such processing upon
encountering a file that might appear "binary" (as in my
case: "Non-ISO extended-ASCII text, with very long lines")
in the case that "POSIXLY_CORRECT" was not set?
IMO no: your environment tells grep to treat input e.g. as UTF-8,
but the actual input might be in some ISO encoding.  There is simply
no way to match a regular expression reliably against such a mixture.
It's like a vegetarian who wants to get some carrots out of a bag someone
brought back from a butcher; in the bag there might be a duck stuffed
with some vegetables and even a carrot, but the search is bound to
fail.
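
The mismatch can be sketched in a few lines of Python. This only illustrates the encoding problem, not grep's internals, and the sample text is made up: bytes written as ISO-8859-1 are not valid UTF-8, while a plain byte-wise match (the rough equivalent of running under LC_ALL=C) still works.

```python
import re

# Hypothetical sample: "café au lait" stored as ISO-8859-1 (Latin-1).
# The "é" becomes the single byte 0xE9.
data = "café au lait".encode("latin-1")

# Interpreting those bytes as UTF-8 fails, because 0xE9 announces a
# multi-byte sequence that never arrives.
try:
    data.decode("utf-8")
    decodes_as_utf8 = True
except UnicodeDecodeError:
    decodes_as_utf8 = False
print(decodes_as_utf8)  # prints: False

# A bytes pattern matches byte by byte, so "." simply matches 0xE9.
byte_match = re.search(rb"caf. au", data)
print(byte_match is not None)  # prints: True
```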
...
The original example had:
'grep -a string /bin/bash|head -3', but for purposes of
showing tty-corruption, the "head -3 /bin/bash" was
sufficient.
To search for some strings in executables, you're much better off
with "strings /bin/bash | grep string".  Your attempt is not a problem
for grep at all, but the terminal it writes to might interpret certain
control characters ... this is like someone speaking Chinese while an
English listener hears the word "bye" somewhere in the sentence
... and leaves.
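
For illustration, the idea behind "strings file | grep string" can be approximated in Python: keep only runs of printable ASCII, then search those. This is a rough sketch, not the real strings(1); `printable_runs`, the minimum length of 4, and the sample bytes are assumptions made for the example.

```python
import re


def printable_runs(data: bytes, min_len: int = 4):
    """Return runs of at least min_len printable ASCII bytes (0x20-0x7E)."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


# Made-up sample mixing text with NUL, BEL and ESC bytes.
sample = b"\x7fELF\x02\x01\x00GNU bash\x00\x07\x1bversion 4.4\x00ab\x00"

runs = printable_runs(sample)
print(runs)  # prints: ['GNU bash', 'version 4.4']

# The "grep" step: no control characters are left to upset the terminal.
print([s for s in runs if "bash" in s])  # prints: ['GNU bash']
```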

Have a nice day,
Berny
