On 06/04/2021 03.54, David Haller wrote:
Hello,
On Tue, 06 Apr 2021, Carlos E. R. wrote:
On Tuesday, 2021-04-06 at 00:40 +0200, David Haller wrote:
On Mon, 05 Apr 2021, Carlos E. R. wrote:
grepmail -h ^Message-[Ii][dD].*"$MSGID" ~/Mail/_Lists/_filed/os-en.2007*
What can I do so that the $ inside the $MSGID content is passed to grepmail and not interpreted as a variable start? Do I need to do text substitution first inside $MSGID, replacing '$' with '\$'? Is there some other way?
$ grepmail -h "^Message-[Ii][Dd].*${MSGID//$/\\$}" \
    ~/Mail/_Lists/_filed/os-en.2007*
I don't understand how, but it works :-D
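For anyone else puzzled by that expansion, here is a minimal sketch of what it does. The message-id is made up for illustration. `${MSGID//pattern/replacement}` replaces every match of the pattern in the variable's value. The dollar is written as `\$` here, which should behave the same as the bare `$` in the command above but is easier to read.

```shell
#!/bin/bash
# Hypothetical message-id, for illustration only. Single quotes keep
# the shell from treating '$' as the start of a variable name.
MSGID='abc$def$ghi@example.org'

# ${var//pattern/replacement} replaces every occurrence of pattern.
# Inside double quotes, '\$' is a literal dollar and '\\' is a literal
# backslash, so each '$' in the value becomes '\$'.
escaped="${MSGID//\$/\\\$}"

printf '%s\n' "$escaped"
```

The substitution only protects the `$` from the regex engine. Protecting it from the shell is a separate matter, and is handled by the quotes around the argument.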
[...]
Huh, it fails on these (that mc finds, manually):
Message-ID: <006501c7b50e$77cc3630$6764a290$@com>
Message-ID: <2md$yDk+qZbGFwuq@dev.null.davjam.org>
Message-Id: <8993-Fri29Jun2007142734+0100-jpff@codemist.co.uk>
and a few others that seem similar. :-?
Pine.LNX.4.64.0706071117040.15609@nimrodel.valinor not found
4673A349.5030701@gmx.net not found
[..]
I'll stop here ... ;)
It is the worst kind of string to grep for in bash. The separators are '<' and '>', used for redirection. There are ''', '+', '$' and who knows what more.

The background is that I'm helping with the migration of the openSUSE mail archive. There is one month missing on the server, and I happen to have it in my own archive, so I volunteered. I got a list of the msg-ids that are missing, and the goal is to generate an mbox file with them all.

Out of 2435 msg-ids, my current script finds all except 23. Not bad. And most of those that are not found are due to special characters like $ being used in the msg-id. Bad luck.

There does not seem to be an option in grepmail to disable regex. I could write Pascal code of my own to precede each problematic char with a backslash, for instance, and be done. Pascal I understand, regex I don't. Nor bash complexities.

Anyway, I am now too sleepy to try to understand what you wrote below, but thank you for it. After a tea or two tomorrow I'll have a try. :-)
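The "precede each problematic char with a backslash" idea can also be sketched in bash itself, without Pascal. `quote_regex` is a hypothetical helper name, not part of grepmail. It relies on the Perl regex rule that a backslash followed by a non-alphanumeric character stands for that character. Note that the thread does not confirm grepmail accepts this form for '$'; treat it as an illustration of the idea only.

```shell
#!/bin/bash
# Sketch: put a backslash in front of every character that is not an
# ASCII letter or digit. quote_regex is a hypothetical helper name.
quote_regex() {
    local LC_ALL=C in=$1 out='' c i
    for (( i = 0; i < ${#in}; i++ )); do
        c=${in:i:1}                    # one character at a time
        case $c in
            [a-zA-Z0-9]) out+=$c ;;    # letters and digits pass through
            *)           out+="\\$c" ;; # everything else gets a backslash
        esac
    done
    printf '%s\n' "$out"
}

# Single quotes stop the shell from touching '$' in the argument.
quote_regex '2md$yDk+qZbGFwuq@dev.null.davjam.org'
```

Escaping everything that is not alphanumeric is cruder than escaping only the known metacharacters, but it avoids having to remember which characters are special.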
I knew the + would make trouble as well, but the real tricky part was the dollar, which grepmail seems to handle rather weirdly. I have not looked into the source to see how it handles the regex arguments. The following works, I hope you can pull what you need from this script (-fragment) ...

Weird: pcregrep doesn't handle the '$' correctly either. I thought that $ meant "end-of-line" only at the end of an expression. But seemingly, pcregrep and grepmail work in multiline mode where $ can match any (embedded) newline or some such.
Anyway, escaping to the hexcode works (just adding backslashes failed somehow):
==== grepmail-msgid ====
#!/bin/bash
MBOXEN=( opensuse-bis-20070731 opensuse-bis-2007-12-31 )
# Loop over all message-ids given on the command line.
for MSGID; do
    # Escape '+' with a backslash.
    mid="${MSGID//+/\\+}";
    # Replace '$' with its hex escape (0x24).
    mid="${mid//$/\\x\{24\}}";
    lines=$(grepmail -C .grepmail.cache -Y 'Message-[Ii][Dd]' \
        -h "${mid}" "${MBOXEN[@]}" | wc -l)
    if test "$lines" -gt 0; then
        printf '%s: %i\n' "$MSGID" "$lines"
    else
        printf '%s: not found\n' "$MSGID"
    fi
done
====
$ grepmail-msgid '006501c7b50e$77cc3630$6764a290$@com' \
    '2md$yDk+qZbGFwuq@dev.null.davjam.org' \
    '8993-Fri29Jun2007142734+0100-jpff@codemist.co.uk' \
    'Pine.LNX.4.64.0706071117040.15609@nimrodel.valinor' \
    '4673A349.5030701@gmx.net' HURZ
006501c7b50e$77cc3630$6764a290$@com: 91
2md$yDk+qZbGFwuq@dev.null.davjam.org: 104
8993-Fri29Jun2007142734+0100-jpff@codemist.co.uk: 82
Pine.LNX.4.64.0706071117040.15609@nimrodel.valinor: 120
4673A349.5030701@gmx.net: 97
HURZ: not found
I hope I have not missed more special cases ... And I'm too lazy to add options for either files or message-ids or to read the MIDs from a file or some such.
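One possible way around both worries (missed special cases, and reading the MIDs from a file) is sketched below. It is untested against grepmail itself. `hex_escape` and `escape_file` are hypothetical helper names, and the one-id-per-line file format with optional '<' and '>' around each id is an assumption. The idea is to write every non-alphanumeric character in the `\x{HH}` form that the script above uses for '$', so no metacharacter needs to be listed by hand.

```shell
#!/bin/bash
# hex_escape (hypothetical helper): rewrite every character that is not
# an ASCII letter or digit as \x{HH}. Assumes ASCII-only message-ids.
hex_escape() {
    local LC_ALL=C in=$1 out='' c i
    for (( i = 0; i < ${#in}; i++ )); do
        c=${in:i:1}
        case $c in
            [a-zA-Z0-9]) out+=$c ;;
            *)  # "'$c" makes printf print the character's numeric code
                printf -v c '\\x{%02x}' "'$c"
                out+=$c ;;
        esac
    done
    printf '%s\n' "$out"
}

# escape_file (hypothetical helper): read one message-id per line from
# the file named in $1, strip an optional leading '<' and trailing '>',
# and print the escaped pattern for each id.
escape_file() {
    local mid
    while IFS= read -r mid || [ -n "$mid" ]; do
        [ -n "$mid" ] || continue      # skip empty lines
        mid=${mid#'<'}
        mid=${mid%'>'}
        hex_escape "$mid"
    done < "$1"
}

hex_escape '006501c7b50e$77cc3630$6764a290$@com'
```

Each line printed by `escape_file` could then take the place of `${mid}` in the grepmail call of the script above.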
And yes, I *do* have those mails archived, albeit usually gzipped (saving ~83% in this case) :)
HTH,
-dnh

-- 
Cheers / Saludos,

Carlos E. R.
(from 15.2 x86_64 at Telcontar)