On 02/03/2018 04:03 AM, L A Walsh wrote:
Bernhard Voelker wrote: If data is processed as "binary", wouldn't that mean processing it with no encoding or decoding -- as a stream of bytes?
Well, it tries to treat each byte as one character, as opposed to multi-byte encodings where up to 4 bytes are used per character; e.g. the character "SMILING FACE WITH SUNGLASSES" is encoded in UTF-8 as 4 bytes: 0xF0 0x9F 0x98 0x8E. Furthermore, grep is a line-based tool, and binary files can have extremely long lines, control characters (which grep doesn't care about, but the terminal it writes to does), and finally the NUL character, which traditionally marks the end of a string.
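To see the byte-versus-character difference for yourself, here is a small sketch. It only assumes a POSIX shell with printf, od, tr and wc; the variable names are made up for illustration.

```shell
# Emit the four UTF-8 bytes of U+1F60E (SMILING FACE WITH SUNGLASSES).
# Octal escapes are used because POSIX printf does not guarantee \x:
# 0xF0 0x9F 0x98 0x8E == \360 \237 \230 \216
emoji_bytes=$(printf '\360\237\230\216' | od -An -tx1 | tr -d ' \n')

# wc -c counts bytes, independent of the locale.
byte_count=$(printf '\360\237\230\216' | wc -c | tr -d ' ')

echo "bytes: $emoji_bytes"
echo "count: $byte_count"
```

A tool working byte by byte sees four separate units here, while a tool decoding UTF-8 sees a single character.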
How do you interpret binary?
locale (which is impossible in that case). Again, Eric showed you the way: $ LC_ALL=C grep ...
Wouldn't LC_CTYPE suffice if you went that route?
Depends what you want to match. LC_COLLATE may also influence the matching. See 'info grep' or 'man grep'. LC_ALL does it all in one go.
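As a rough sketch of the difference (the sample file and its content are invented for illustration, and mktemp is assumed to be available):

```shell
# Hypothetical sample: "café au lait" in ISO-8859-1, where the e-acute is
# the single byte 0xE9 (octal 351). That byte sequence is not valid UTF-8.
sample=$(mktemp)
printf 'caf\351 au lait\nplain tea\n' > "$sample"

# LC_ALL overrides every other LC_* variable and LANG, so this always
# runs grep in the C locale. -c prints the number of matching lines.
all_count=$(LC_ALL=C grep -c 'au lait' "$sample")

# LC_CTYPE=C only takes effect when LC_ALL is not set, so unset it first.
# The unset stays local to the command substitution's subshell.
ctype_count=$(unset LC_ALL; LC_CTYPE=C grep -c 'au lait' "$sample")

echo "LC_ALL=C:   $all_count"
echo "LC_CTYPE=C: $ctype_count"

rm -f "$sample"
```

For a fixed string like this both variants give the same count. The difference can only show up with patterns whose meaning depends on collation, such as range expressions in brackets.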
However, since you say it would be impossible to process the file as some encoding, then instead of throwing some error, or skipping the file, wouldn't it be more useful to default to such processing upon encountering a file that might appear "binary" (as in my case: "Non-ISO extended-ASCII text, with very long lines") in the case that "POSIXLY_CORRECT" was not set?
IMO no: your environment tells grep to treat input e.g. as UTF-8, but the actual input might be in some ISO encoding. There is simply no way to make the regular expression work for such a mixture. It's like a vegetarian who wants to get some carrots out of a bag someone brought back from the butcher: in the bag might be a duck stuffed with some vegetables and even a carrot, but the search is just a fail.
The original example had:
'grep -a string /bin/bash|head -3', but for purposes of showing tty-corruption, the "head -3 /bin/bash" was sufficient.
To search for some strings in executables, you're much better off with "strings /bin/bash | grep string". Your attempt is not a problem for grep at all, but the terminal it writes to might interpret certain control characters ... this is like someone speaking Chinese and an English listener might hear the word "bye" somewhere in the sentence ... and leave.

Have a nice day,
Berny