[opensuse] Need new source for unix utils -- gnu has broken another.
I've used grep to search for strings across all my mailboxes for decades. Found out today that it randomly doesn't work, depending on whether or not the file contains any text that doesn't comply with POSIX.

So if one user has UTF-8 encoding and another ISO-8859-1, and they are in the same mailbox, then according to POSIX that's a binary file.

You have to tell grep to search (and potentially display) binary data -- which can easily throw a terminal into weird modes, making it unreadable (see attached example for results of a random binary being listed). Note that the last line is the prompt, the same text as you can see at the top of the window.

mbox files don't do this when you search for strings, because when I search for strings I'm looking for something in the text of an email. While I want grep to skip things like compressed files and coredumps, I don't want it judging the quality of the "text" that I'm searching through -- but that's what many of the utilities have been modified to do: if it doesn't fit the POSIX definition of text, then some text utils won't process it. Technically, if the last line of the file doesn't end with a newline, it's also binary (though grep still displays it).

Many text utils used to be generally useful -- but now they are having functionality removed so that they only work with POSIX-conforming text.

I suppose no one else really does a quick search through all their email this way any more. Though is this what you'd expect? Sigh.
Linda Walsh composed on 2018-02-02 12:29 (UTC-0800):
I suppose no one else really does a quick search through all their email this way any more.
I do such searching with mc or filecommander, which is largely why I still use POP.
--
"Wisdom is supreme; therefore get wisdom. Whatever else you get, get wisdom." Proverbs 4:7 (New Living Translation)

Team OS/2 ** Reg. Linux User #211409 ** a11y rocks!

Felix Miata *** http://fm.no-ip.com/
--
To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org
To contact the owner, e-mail: opensuse+owner@opensuse.org
On 2018-02-02 21:42, Felix Miata wrote:
Linda Walsh composed on 2018-02-02 12:29 (UTC-0800):
I suppose no one else really does a quick search through all their email this way any more.
I do such searching with mc or filecommander, which is largely why I still use POP.
I also do it with 'mc', no matter whether I used pop3 or imap to retrieve the posts. But in the past I remember using grepmail or mailgrep - I'm unsure of the exact name. It would generate another mbox file with the hits, IIRC.

Right, I just found old notes:

    cd Mail
    grepmail -b -i -M -m -R -u -e "RBL" file/* > busqueda
    grepmail -b -i -M -m -R -u -e "griego" busqueda > busqueda_paso2
    grepmail -b -i -M -m -R -u -e "greek" busqueda >> busqueda_paso2
    grepmail -h -m -R -e "mail.id@host" lists/* > busqueda

Notice that the mbox "busqueda" ("search" in Spanish) can itself fall in the search recursively, with nasty results.

    -b  Asserts that the pattern must match in the body of the email. (Not compatible with -B.)
    -B  Asserts that the pattern must match in the body of the email, but not the signature. The signature consists of everything after a line consisting of "-- ". (Not compatible with -b.)
    -i  Make the search case-insensitive (by analogy to grep -i).
    -M  Causes grepmail to ignore non-text MIME attachments. This removes false positives resulting from binaries encoded as ASCII attachments.
    -m  Append "X-Mailfolder: <folder>" to all email headers, indicating which folder contained the matched email.
    -R  Causes grepmail to recurse any directories encountered.
    -u  Output only unique emails, by analogy to sort -u. Grepmail determines email uniqueness by the Message-ID header.
    -e  Explicitly specify the search pattern. This is useful for specifying patterns that begin with "-", which would otherwise be interpreted as a flag.
    -h  Asserts that the pattern must match in the header of the email.

I don't remember now why I stopped using it. Perhaps because Thunderbird can search in several folders.

--
Cheers / Saludos,

Carlos E. R.
(from 42.3 x86_64 "Malachite" (Minas Tirith))
Carlos E. R. wrote:
It would generate another mbox file with the hits, IIRC. ... I don't remember now why I stopped using it. Perhaps because Thunderbird can search in several folders.
I used grep for a few reasons:
1) (a later reason) it added Perl-compatible REs,
2) speed -- I have about 6.4G, though some of those are compressed,
3) it's recursive,
4) I didn't search in Tbird, as things would get messy trying to keep even archives in IMAP.

I would usually try to find which folders had references. From there I would either look at the file in an editor if it was old or, if the file was in IMAP, search for what I wanted via Tbird+IMAP.

It still takes a while to do text searches through several gigabytes of text. It's mostly a narrowing-down step to find where something is.

I just needed something to search through files for given strings, and grep used to be general enough that it would search through just about anything. Apparently not anymore. I tried to report the problem to the gnu-grep bug list, and was told that grep only works on text files as defined by POSIX ... wonderful...

Now if I can only get all email sources (authors) to follow POSIX standards for their email texts. Hahahaha... like that's gonna happen. Does anyone else think it's more than a bit odd to POSIXify people-interfaces? Programs, sure, but people?

So much for user-friendly...
On 2018-02-03 00:10, L A Walsh wrote:
Carlos E. R. wrote:
It would generate another mbox file with the hits, IIRC. ... I don't remember now why I stopped using it. Perhaps because Thunderbird can search in several folders.
I used grep for a few reasons -- 1) (later reason), it added perl-compat RE's, 2) speed. have about 6.4G though some of those are compressed. 3) recursive 4) didn't search in Tbird as things would get messy trying to keep even archives in Imap.
Well, try grepmail - as long as they are not compressed. -- Cheers / Saludos, Carlos E. R. (from 42.3 x86_64 "Malachite" (Minas Tirith))
On Fri, 02 Feb 2018 15:10:41 -0800 L A Walsh <suse@tlinx.org> wrote:
Carlos E. R. wrote:
It would generate another mbox file with the hits, IIRC. ... I don't remember now why I stopped using it. Perhaps because Thunderbird can search in several folders.
I used grep for a few reasons -- 1) (later reason), it added perl-compat RE's, 2) speed. have about 6.4G though some of those are compressed. 3) recursive 4) didn't search in Tbird as things would get messy trying to keep even archives in Imap.
Would usually try to find which folders had references. From there would either look at the file in an editor if it was old, or if the file was in IMAP, I'd search for what I wanted via Tbird+IMAP.
Still takes a while to do text searches through several gigabytes of text. It's mostly a narrowing down step to find where something is.
Just needed something to search through files for given strings. and grep used to be general case enough that it would search through just about anything. Apparently not anymore. Tried to report problem to gnu-grep bug list, and was told that grep only works on text files as defined by POSIX, ... wonderful...
Now if I can only get all email sources(authors) to follow POSIX standards for their email texts. Hahahaha...like that's gonna happen. Does anyone else think it's more than a bit odd to POSIXify people-interfaces? Programs, sure, but people?
So much for userfriendly...
Get an old version of grep from wherever it is still skulking about? Rename it as 'grep-that-works' or 'grep-jfdi' or somesuch :)
On 02/02/2018 09:29 PM, Linda Walsh wrote:
I've used grep to search for strings across all my mailboxes for decades. Found out today, it randomly doesn't work based on whether or not the file contains any text that doesn't comply with POSIX.
So if one user has UTF-8 encoding and another, ISO-8859-1, and they are in the same mailbox, according to POSIX, that's a binary file.
yes, IMO Eric Blake explained that quite well: https://lists.gnu.org/r/bug-grep/2018-02/msg00001.html
You have to tell grep to search (and potentially display) binary data [...].
I don't think so - but you have to tell it to process the file single-byte-wise instead of trying to conform to a certain single locale (which is impossible in that case). Again, Eric showed you the way: $ LC_ALL=C grep ...
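A minimal sketch of the `LC_ALL=C` approach mentioned above, assuming a POSIX shell with `printf`, `mktemp` and `grep`. The file contents and the pattern `needle` are made up for illustration; they are not from the original report. It builds a small file in which one line is UTF-8 and another is ISO-8859-1, then searches it byte-wise.

```shell
# Hypothetical sample: one UTF-8 line and one ISO-8859-1 line in the same file.
tmp=$(mktemp)
printf 'Subject: caf\303\251 needle\n' >  "$tmp"   # e-acute as UTF-8 (0xC3 0xA9)
printf 'Subject: caf\351 needle\n'     >> "$tmp"   # e-acute as ISO-8859-1 (0xE9)
printf 'Subject: nothing here\n'       >> "$tmp"

# LC_ALL=C asks grep to process the input single-byte-wise instead of
# decoding it according to the locale of the environment.
matches=$(LC_ALL=C grep -c 'needle' "$tmp")
echo "matching lines: $matches"

rm -f "$tmp"
```

Setting the variable only on the command line keeps the change local to that one grep invocation; the rest of the session keeps its usual locale.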
Note the last line is the prompt same text as you can see at top of window.
I don't see what "head -3 /bin/bash" has to do with the output of grep or your $SUBJECT at all (apart from the term "binary"). Apropos standards: "head -NUM" is obsolete and non-portable syntax: https://www.gnu.org/software/coreutils/head Use "head -n NUM" instead. ;-) Have a nice day, Berny
Bernhard Voelker wrote:
On 02/02/2018 09:29 PM, Linda Walsh wrote:
I've used grep to search for strings across all my mailboxes for decades. Found out today, it randomly doesn't work based on whether or not the file contains any text that doesn't comply with POSIX.
So if one user has UTF-8 encoding and another, ISO-8859-1, and they are in the same mailbox, according to POSIX, that's a binary file.
yes, IMO Eric Blake explained that quite well: https://lists.gnu.org/r/bug-grep/2018-02/msg00001.html
You have to tell grep to search (and potentially display) binary data [...].
I don't think so - but you have to tell it to process the file single-byte-wise
If data is processed as "binary", wouldn't that mean processing it with no encoding or decoding -- as a stream of bytes? How do you interpret binary?
instead of trying to conform to a certain single locale (which is impossible in that case). Again, Eric showed you the way: $ LC_ALL=C grep ...
Wouldn't LC_CTYPE suffice if you went that route? However, since you say it would be impossible to process the file as some encoding, then instead of throwing some error, or skipping the file, wouldn't it be more useful to default to such processing upon encountering a file that might appear "binary" (as in my case: "Non-ISO extended-ASCII text, with very long lines") in the case that "POSIXLY_CORRECT" was not set?
Note the last line is the prompt same text as you can see at top of window.
I don't see what "head -3 /bin/bash" has to do with the output of grep or your $SUBJECT at all (apart from the term "binary").
It was showing the output of a real binary file instead of an "mbox" that would contain multiple encodings and why one still doesn't want unrestrained display of binary, even if one wants to process text with multiple or incorrect encodings. The original example had: 'grep -a string /bin/bash|head -3', but for purposes of showing tty-corruption, the "head -3 /bin/bash" was sufficient.
On 02/03/2018 04:03 AM, L A Walsh wrote:
Bernhard Voelker wrote: If data is processed as "binary", wouldn't that mean processing it with no encoding or decoding -- as a stream of bytes?
Well, it tries to treat each byte as one character - as opposed to multi-byte encodings where up to 4 bytes are used; e.g. the character "SMILING FACE WITH SUNGLASSES" is encoded in UTF-8 as 4 bytes: 0xF0 0x9F 0x98 0x8E. Furthermore, grep is a line-based tool, and binary files could have extremely long lines, control characters (which grep doesn't care about, but the terminal it writes to does), and finally the NUL character, which traditionally marks the end of a string.
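To make the byte-versus-character distinction concrete, here is a small sketch, assuming a POSIX shell with `printf`, `wc`, `od` and `tr`. The helper name `emoji_bytes` is hypothetical. It writes the four UTF-8 bytes quoted above and shows that byte-oriented tools see four separate units.

```shell
# U+1F60E SMILING FACE WITH SUNGLASSES, written as its four UTF-8 bytes
# (octal escapes for 0xF0 0x9F 0x98 0x8E). emoji_bytes is a made-up helper.
emoji_bytes() { printf '\360\237\230\216'; }

# Count the bytes and dump them in hex.
nbytes=$(emoji_bytes | wc -c | tr -d ' ')
hex=$(emoji_bytes | od -An -tx1 | tr -d ' \n')
echo "bytes: $nbytes  hex: $hex"
```

A tool that works single-byte-wise treats this as four characters, while a tool decoding UTF-8 treats it as one.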
How do you interpret binary?
instead of trying to conform to a certain single locale (which is impossible in that case). Again, Eric showed you the way: $ LC_ALL=C grep ...
Wouldn't LC_CTYPE suffice if you went that route?
Depends what you want to match. LC_COLLATE may also influence the matching. See 'info grep' or 'man grep'. LC_ALL does it all in one go.
However, since you say it would be impossible to process the file as some encoding, then instead of throwing some error, or skipping the file, wouldn't it be more useful to default to such processing upon encountering a file that might appear "binary" (as in my case: "Non-ISO extended-ASCII text, with very long lines") in the case that "POSIXLY_CORRECT" was not set?
IMO no: your environment tells grep to treat input e.g. as UTF-8, but the actual input might be some ISO encoding. There is simply no way to get the regular expression right for such a mixture. It's like a vegetarian who wants to get some carrots out of a bag someone brought back from a butcher; in the bag there might be a duck stuffed with some vegetables and even a carrot, but the search is just a fail.
The original example had:
'grep -a string /bin/bash|head -3', but for purposes of showing tty-corruption, the "head -3 /bin/bash" was sufficient.
To search for some strings in executables, you're much better off with "strings /bin/bash | grep string". Your attempt is not a problem for grep at all, but the terminal it writes to might interpret certain control characters ... this is like someone speaking Chinese and an English listener might hear the word "bye" somewhere in the sentence ... and leave. Have a nice day, Berny
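A rough sketch of the `strings ... | grep` pipeline suggested above, run against a small made-up file rather than /bin/bash. It assumes a POSIX shell and that `strings` (usually shipped with binutils) is installed; the file contents and the pattern `needle` are hypothetical.

```shell
# Hypothetical "binary": NUL and control bytes around one printable run.
tmp=$(mktemp)
printf '\000\001\002needle-in-binary\000\377\376\n' > "$tmp"

if command -v strings >/dev/null 2>&1; then
    # strings emits only runs of printable characters, so neither grep
    # nor the terminal ever sees the control bytes.
    hits=$(strings "$tmp" | grep -c 'needle')
    echo "hits: $hits"
else
    echo "strings(1) not available; it usually ships with binutils" >&2
fi

rm -f "$tmp"
```

Because the filtering happens before grep, the matched lines can be printed to a terminal without the corruption shown in the original screenshot.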
participants (6)
- Bernhard Voelker
- Carlos E. R.
- Dave Howorth
- Felix Miata
- L A Walsh
- Linda Walsh