Re: Grepping for emails problem.
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Tuesday, 2021-04-06 at 00:40 +0200, David Haller wrote:
Hello,
On Mon, 05 Apr 2021, Carlos E. R. wrote:
grepmail -h ^Message-[Ii][dD].*"$MSGID" ~/Mail/_Lists/_filed/os-en.2007*
What can I do so that the $ inside the $MSGID content is passed to grepmail and not interpreted as a variable start? Do I need to do text substitution first inside $MSGID, replacing '$' with '\$'? Is there some other way?
$ grepmail -h "^Message-[Ii][Dd].*${MSGID//$/\\$}" \
    ~/Mail/_Lists/_filed/os-en.2007*
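A minimal sketch, assuming bash, of what that substitution does. `${var//pattern/replacement}` replaces every match of the pattern in the variable's value. The shell never re-scans the expanded value for variables, so the backslash is not there for bash; it is there because grepmail treats its argument as a regular expression. The sketch writes the dollar as `\$` so that it does not depend on bash taking a bare `$` literally, and the variable name `escaped` is only illustrative.

```bash
#!/bin/bash
# Sample Message-ID from the thread. Single quotes keep the shell from
# touching the dollar signs when the value is assigned.
MSGID='006501c7b50e$77cc3630$6764a290$@com'

# ${var//pattern/replacement}: replace every match of pattern.
# Pattern \$ is a literal dollar; replacement \\\$ is backslash + dollar.
escaped=${MSGID//\$/\\\$}

# Show the string that would be handed to the regex engine.
printf '%s\n' "$escaped"
```

Whether a backslash-escaped dollar is enough for grepmail is a separate question, which the rest of the thread explores.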
I don't understand how, but it works :-D

[...]

Huh, it fails on these (that mc finds, manually):

Message-ID: <006501c7b50e$77cc3630$6764a290$@com>
Message-ID: <2md$yDk+qZbGFwuq@dev.null.davjam.org>
Message-Id: <8993-Fri29Jun2007142734+0100-jpff@codemist.co.uk>

and a few others that seem similar. :-?

Pine.LNX.4.64.0706071117040.15609@nimrodel.valinor not found
4673A349.5030701@gmx.net not found
4179-Mon04Jun2007071608+0100-jpff@codemist.co.uk not found
7281-Sat16Jun2007102117+0100-jpff@codemist.co.uk not found
'027801c7b544$c5202410$4f606c30$@com' not found
7281-Fri22Jun2007074841+0100-jpff@codemist.co.uk not found
200706012133.01430.thadeurj@terra.com.br not found
4300-Fri29Jun2007145048+0100-jpff@codemist.co.uk not found
'003801c7a6d3$70740370$515c0a50$@com' not found
20070616184124.GF19067@blinkenlights.visv.net not found
00cc01c7ba6b$1ace4060$506ac120$@co.cr not found
7277-Tue26Jun2007205907+0100-jpff@codemist.co.uk not found
Pine.LNX.4.64.0706251153151.14573@nimrodel.valinor not found
6647-Mon11Jun2007214515+0100-jpff@codemist.co.uk not found
000001c7b380$a5f1edf0$f1d5c9d0$@com not found
200706071153.50160.wstephenson@suse.de not found
JM200706162234065.4685828@pop.707.to not found

the script is still running, there may be a few more.

- --
Cheers,
Carlos E. R.
(from openSUSE 15.2 x86_64 at Telcontar)
-----BEGIN PGP SIGNATURE-----
iHoEARECADoWIQQZEb51mJKK1KpcU/W1MxgcbY1H1QUCYGukWhwccm9iaW4ubGlz
dGFzQHRlbGVmb25pY2EubmV0AAoJELUzGBxtjUfVC2wAnj1ENtse+uyCFLVc7Hfe
s8gHjd1UAKCWo+wkdYYNqfHCOFOKogvXhNT/gA==
=7NY+
-----END PGP SIGNATURE-----
Hello, On Tue, 06 Apr 2021, Carlos E. R. wrote:
On Tuesday, 2021-04-06 at 00:40 +0200, David Haller wrote:
On Mon, 05 Apr 2021, Carlos E. R. wrote:
grepmail -h ^Message-[Ii][dD].*"$MSGID" ~/Mail/_Lists/_filed/os-en.2007*
What can I do so that the $ inside the $MSGID content is passed to grepmail and not interpreted as a variable start? Do I need to do text substitution first inside $MSGID, replacing '$' with '\$'? Is there some other way?
$ grepmail -h "^Message-[Ii][Dd].*${MSGID//$/\\$}" \
    ~/Mail/_Lists/_filed/os-en.2007*
I don't understand how, but it works :-D
[...]
Huh, it fails on these (that mc finds, manually):
Message-ID: <006501c7b50e$77cc3630$6764a290$@com>
Message-ID: <2md$yDk+qZbGFwuq@dev.null.davjam.org>
Message-Id: <8993-Fri29Jun2007142734+0100-jpff@codemist.co.uk>
and a few others that seem similar. :-?
Pine.LNX.4.64.0706071117040.15609@nimrodel.valinor not found
4673A349.5030701@gmx.net not found
[..]

I'll stop here ... ;)
I knew the + would make trouble as well, but the real tricky part was the dollar, that grepmail seems to handle rather weirdly. I have not looked into the source, how it handles the regex-arguments. The following works, I hope you can pull what you need from this script (-fragment) ...

Weird: pcregrep doesn't handle the '$' correctly either. I thought that $ meant "end-of-line" only at the end of an expression. But seemingly, pcregrep and grepmail work in multiline mode where $ can match any (embedded) newline or some such.

Anyway, escaping to the hexcode works (just adding backslashes failed somehow):

==== grepmail-msgid ====
#!/bin/bash
MBOXEN=( opensuse-bis-20070731 opensuse-bis-2007-12-31 )
for MSGID; do
    mid="${MSGID//+/\\+}";
    mid="${mid//$/\\x\{24\}}";
    lines=$(grepmail -C .grepmail.cache -Y 'Message-[Ii][Dd]' \
        -h "${mid}" "${MBOXEN[@]}" | wc -l)
    if test $lines -gt 0; then
        printf '%s: %i\n' "$MSGID" "$lines"
    else
        printf '%s: not found\n' "$MSGID"
    fi
done
====

$ grepmail-msgid '006501c7b50e$77cc3630$6764a290$@com' \
    '2md$yDk+qZbGFwuq@dev.null.davjam.org' \
    '8993-Fri29Jun2007142734+0100-jpff@codemist.co.uk' \
    'Pine.LNX.4.64.0706071117040.15609@nimrodel.valinor' \
    '4673A349.5030701@gmx.net' HURZ
006501c7b50e$77cc3630$6764a290$@com: 91
2md$yDk+qZbGFwuq@dev.null.davjam.org: 104
8993-Fri29Jun2007142734+0100-jpff@codemist.co.uk: 82
Pine.LNX.4.64.0706071117040.15609@nimrodel.valinor: 120
4673A349.5030701@gmx.net: 97
HURZ: not found

I hope I have not missed more special cases ... And I'm too lazy to add options for either files or message-ids or to read the MIDs from a file or some such.

And yes, I *do* have those mails archived, albeit usually gzipped (saving ~83% in this case) :)

HTH,
-dnh

--
+-------------------------------------------------------------------+
|-- SELF-ASSEMBLY MOEBIUS-STRIP - SEE OTHER SIDE FOR INSTRUCTIONS --|
+-------------------------------------------------------------------+
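On the worry about missed special cases: one possible generalisation, sketched here under assumptions, is to hex-escape every character that is not obviously harmless instead of listing the troublemakers one by one. The function name `hex_escape_id` is hypothetical and not part of grepmail, and the sketch assumes the IDs are plain ASCII and that the `\x{NN}` form reported as working above is accepted for any character, not just `$`.

```bash
#!/bin/bash
# hex_escape_id: hypothetical helper, not part of grepmail.
# Keeps letters, digits, '@', '_' and '-' as they are and rewrites every
# other character as a \x{NN} hex escape. Assumes ASCII-only input.
hex_escape_id() {
    local id=$1 out='' ch hex i
    for (( i = 0; i < ${#id}; i++ )); do
        ch=${id:i:1}
        case $ch in
            [A-Za-z0-9@_-])
                out+=$ch ;;
            *)
                # An argument of the form 'c makes printf use the numeric
                # code of character c, printed here as two hex digits.
                printf -v hex '\\x{%02X}' "'$ch"
                out+=$hex ;;
        esac
    done
    printf '%s\n' "$out"
}

# Example call with one of the IDs from the thread.
hex_escape_id '2md$yDk+qZbGFwuq@dev.null.davjam.org'
```

Because the dots are escaped as well, they would match only a literal dot, which the unescaped patterns in the thread do not guarantee.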
On 06/04/2021 03.54, David Haller wrote:
Hello,
On Tue, 06 Apr 2021, Carlos E. R. wrote:
On Tuesday, 2021-04-06 at 00:40 +0200, David Haller wrote:
On Mon, 05 Apr 2021, Carlos E. R. wrote:
grepmail -h ^Message-[Ii][dD].*"$MSGID" ~/Mail/_Lists/_filed/os-en.2007*
What can I do so that the $ inside the $MSGID content is passed to grepmail and not interpreted as a variable start? Do I need to do text substitution first inside $MSGID, replacing '$' with '\$'? Is there some other way?
$ grepmail -h "^Message-[Ii][Dd].*${MSGID//$/\\$}" \
    ~/Mail/_Lists/_filed/os-en.2007*
I don't understand how, but it works :-D
[...]
Huh, it fails on these (that mc finds, manually):
Message-ID: <006501c7b50e$77cc3630$6764a290$@com>
Message-ID: <2md$yDk+qZbGFwuq@dev.null.davjam.org>
Message-Id: <8993-Fri29Jun2007142734+0100-jpff@codemist.co.uk>
and a few others that seem similar. :-?
Pine.LNX.4.64.0706071117040.15609@nimrodel.valinor not found
4673A349.5030701@gmx.net not found
[..]

I'll stop here ... ;)
It is the worst kind of string to grep for in bash. The separators are '<' and '>', used for redirection. There are ''', '+', '$' and who knows what more.

The background is that I'm helping with the migration of the openSUSE mail archive. There is one month missing on the server, and I happen to have it in my own archive, so I volunteered. I got a list of the msg-ids that are missing, and the goal is to generate an mbox file with them all.

Out of 2435 msg-ids, my current script finds all except 23. Not bad. And most of those that are not found are due to special characters like $ being used in the msg-id. Bad luck. There does not seem to be an option in grepmail to disable regex.

I could write Pascal code of my own to precede each problematic char with a backslash, for instance, and be done. Pascal I understand, regex I don't. Nor bash complexities.

Anyway, I am now too sleepy to try to understand what you wrote below, but thank you for it. After a tea or two tomorrow I'll have a try. :-)
I knew the + would make trouble as well, but the real tricky part was the dollar, that grepmail seems to handle rather weirdly. I have not looked into the source, how it handles the regex-arguments. The following works, I hope you can pull what you need from this script (-fragment) ... Weird: pcregrep doesn't handle the '$' correctly either. I thought that $ meant "end-of-line" only at the end of an expression. But seemingly, pcregrep and grepmail work in multiline mode where $ can match any (embedded) newline or some such.
Anyway, escaping to the hexcode works (just adding backslashes failed somehow):
==== grepmail-msgid ====
#!/bin/bash
MBOXEN=( opensuse-bis-20070731 opensuse-bis-2007-12-31 )
for MSGID; do
    mid="${MSGID//+/\\+}";
    mid="${mid//$/\\x\{24\}}";
    lines=$(grepmail -C .grepmail.cache -Y 'Message-[Ii][Dd]' \
        -h "${mid}" "${MBOXEN[@]}" | wc -l)
    if test $lines -gt 0; then
        printf '%s: %i\n' "$MSGID" "$lines"
    else
        printf '%s: not found\n' "$MSGID"
    fi
done
====
$ grepmail-msgid '006501c7b50e$77cc3630$6764a290$@com' \
    '2md$yDk+qZbGFwuq@dev.null.davjam.org' \
    '8993-Fri29Jun2007142734+0100-jpff@codemist.co.uk' \
    'Pine.LNX.4.64.0706071117040.15609@nimrodel.valinor' \
    '4673A349.5030701@gmx.net' HURZ
006501c7b50e$77cc3630$6764a290$@com: 91
2md$yDk+qZbGFwuq@dev.null.davjam.org: 104
8993-Fri29Jun2007142734+0100-jpff@codemist.co.uk: 82
Pine.LNX.4.64.0706071117040.15609@nimrodel.valinor: 120
4673A349.5030701@gmx.net: 97
HURZ: not found
I hope I have not missed more special cases ... And I'm too lazy to add options for either files or message-ids or to read the MIDs from a file or some such.
And yes, I *do* have those mails archived, albeit usually gzipped (saving ~83% in this case) :)
HTH, -dnh
-- Cheers / Saludos, Carlos E. R. (from 15.2 x86_64 at Telcontar)
Carlos, et al --

...and then Carlos E. R. said...
%
% On 06/04/2021 03.54, David Haller wrote:
% >Hello,
% >
% >On Tue, 06 Apr 2021, Carlos E. R. wrote:
% >>On Tuesday, 2021-04-06 at 00:40 +0200, David Haller wrote:
% >>>On Mon, 05 Apr 2021, Carlos E. R. wrote:
% >>>>grepmail -h ^Message-[Ii][dD].*"$MSGID" ~/Mail/_Lists/_filed/os-en.2007*
...
% >>>$ grepmail -h "^Message-[Ii][Dd].*${MSGID//$/\\$}" \
% >>>    ~/Mail/_Lists/_filed/os-en.2007*
% >>
% >>I don't understand how, but it works :-D

First, props to dnh for the wicked awesome var munging :-)

...
%
% It is the worst kind of string to grep for in bash. The separators
% are '<' and '>', used for redirection. There are ''', '+', '$' and
% who knows what more.
[snip]

Indeed. Ugh.

But you're working in a pretty limited universe, and you aren't likely to come across a lot of near-duplicates. Why not just change

  davidtg@gezebel:~> for MSGID in \
    '<006501c7b50e$77cc3630$6764a290$@com>' \
    '<2md$yDk+qZbGFwuq@dev.null.davjam.org>' \
    '<8993-Fri29Jun2007142734+0100-jpff@codemist.co.uk>'
  do echo $MSGID | tr '<>$+' '.' ; done
  .006501c7b50e.77cc3630.6764a290.@com.
  .2md.yDk.qZbGFwuq@dev.null.davjam.org.
  .8993-Fri29Jun2007142734.0100-jpff@codemist.co.uk.

all of those ugly chars to . as each single placeholder? Something like

  MSGID_CLEAN=`echo $MSGID | tr '<>$+' '.'`
  grepmail -h "^Message-[Ii][Dd].*$MSGID_CLEAN" ~/Mail/_Lists/_filed/os-en.2007*

or so. [Yes, I know there are sexier ways to munge, but this may be more easily handled by those whose fu is weak -- such as I :-]

HTH & HANN

:-D
--
David T-G
See http://justpickone.org/davidtg/email/
See http://justpickone.org/davidtg/tofu.txt
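A small sketch of the same placeholder idea wrapped in a function, for anyone who wants to reuse it. The name `munge_id` is hypothetical. Compared with the one-liner above it also maps the single quote, passes the ID through `printf` with quoting so the shell cannot glob or split it, and gives `tr` a second set of the same length as the first, which avoids relying on how a particular `tr` pads a short set.

```bash
#!/bin/bash
# munge_id: hypothetical helper that turns each awkward character into
# a '.', which a regex engine then reads as "any single character".
munge_id() {
    # Five characters in the first set, five dots in the second.
    printf '%s\n' "$1" | tr "<>\$+'" '.....'
}

# Example call with one of the IDs from the thread.
munge_id '<2md$yDk+qZbGFwuq@dev.null.davjam.org>'
```

The trade-off is the one already hinted at: a '.' matches any character, so two IDs that differ only in a munged position would both match the same pattern.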
On 06/04/2021 05.18, David T-G wrote:
Carlos, et al --
...and then Carlos E. R. said... % % On 06/04/2021 03.54, David Haller wrote: % >Hello, % > % >On Tue, 06 Apr 2021, Carlos E. R. wrote: % >>On Tuesday, 2021-04-06 at 00:40 +0200, David Haller wrote: % >>>On Mon, 05 Apr 2021, Carlos E. R. wrote: % >>>>grepmail -h ^Message-[Ii][dD].*"$MSGID" ~/Mail/_Lists/_filed/os-en.2007* ... % >>>$ grepmail -h "^Message-[Ii][Dd].*${MSGID//$/\\$}" \ % >>> ~/Mail/_Lists/_filed/os-en.2007* % >> % >>I don't understand how, but it works :-D
First, props to dnh for the wicked awesome var munging :-)
Hats off :-)
... % % It is the worst kind of string to grep for in bash. The separators % are '<' and '>', used for redirection. There are ''', '+', '$' and % who knows what more. [snip]
Indeed. Ugh.
But you're working in a pretty limited universe, and you aren't likely to come across a lot of near-duplicates. Why not just change
  davidtg@gezebel:~> for MSGID in \
    '<006501c7b50e$77cc3630$6764a290$@com>' \
    '<2md$yDk+qZbGFwuq@dev.null.davjam.org>' \
    '<8993-Fri29Jun2007142734+0100-jpff@codemist.co.uk>'
  do echo $MSGID | tr '<>$+' '.' ; done
  .006501c7b50e.77cc3630.6764a290.@com.
  .2md.yDk.qZbGFwuq@dev.null.davjam.org.
  .8993-Fri29Jun2007142734.0100-jpff@codemist.co.uk.
all of those ugly chars to . as each single placeholder?
Now that's a nice out of the box thinking!! :-D :-O
Something like
  MSGID_CLEAN=`echo $MSGID | tr '<>$+' '.'`
  grepmail -h "^Message-[Ii][Dd].*$MSGID_CLEAN" ~/Mail/_Lists/_filed/os-en.2007*
or so. [Yes, I know there are sexier ways to munge, but this may be more easily handled by those whose fu is weak -- such as I :-]
Indeed! As the IDs come on a file, one line per ID, I can simply do a search and replace on that file of the nasty chars, and run the file again through my script as is. I could use... I have forgotten the tool. First sort the file, then find duplicate lines, just in case there are any. Ah, 'uniq'. -- Cheers / Saludos, Carlos E. R. (from 15.2 x86_64 at Telcontar)
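A minimal sketch of that duplicate check, assuming a file with one ID per line. `uniq -d` prints one copy of each repeated line, but it only compares adjacent lines, which is why the file is sorted first. The function name `find_dup_ids` is hypothetical.

```bash
#!/bin/bash
# find_dup_ids: hypothetical helper. Prints each line that occurs more
# than once in the given file; prints nothing if every line is unique.
find_dup_ids() {
    sort -- "$1" | uniq -d
}

# Usage (file name as used later in the thread):
#   find_dup_ids opensuse-2007-06-munged.m-id
```

Empty output means no two munged IDs collapsed into the same pattern.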
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Tuesday, 2021-04-06 at 10:09 +0200, Carlos E. R. wrote:
On 06/04/2021 05.18, David T-G wrote:
Carlos, et al --
...
% >> I don't understand how, but it works :-D
First, props to dnh for the wicked awesome var munging :-)
Hats off :-)
...
all of those ugly chars to . as each single placeholder?
Now that's a nice out of the box thinking!! :-D :-O
Something like
  MSGID_CLEAN=`echo $MSGID | tr '<>$+' '.'`
  grepmail -h "^Message-[Ii][Dd].*$MSGID_CLEAN" ~/Mail/_Lists/_filed/os-en.2007*
or so. [Yes, I know there are sexier ways to munge, but this may be more easily handled by those whose fu is weak -- such as I :-]
Indeed! As the IDs come on a file, one line per ID, I can simply do a search and replace on that file of the nasty chars, and run the file again through my script as is.
(one reason to edit the file is that the single quote ''' is used on two of the IDs).
I could use... I have forgotten the tool. First sort the file, then find duplicate lines, just in case there are any. Ah, 'uniq'.
Done :-)

cer@Telcontar:~/tmp/mailarchive> mcedit opensuse-2007-06-munged.m-id
cer@Telcontar:~/tmp/mailarchive> sort opensuse-2007-06-munged.m-id | uniq -d
cer@Telcontar:~/tmp/mailarchive> mcedit hacer
cer@Telcontar:~/tmp/mailarchive> time ./hacer
Pine.LNX.4.64.0706071117040.15609@nimrodel.valinor not found
4673A349.5030701@gmx.net not found
200706012133.01430.thadeurj@terra.com.br not found
20070616184124.GF19067@blinkenlights.visv.net not found
Pine.LNX.4.64.0706251153151.14573@nimrodel.valinor not found
200706071153.50160.wstephenson@suse.de not found
JM200706162234065.4685828@pop.707.to not found
BAY136-F6F3946075A76AD14C88DED2260@phx.gbl not found
Done 2435 (8 failed)

real    47m27,468s
user    44m52,623s
sys     2m18,341s
cer@Telcontar:~/tmp/mailarchive>

The actual IDs (the above can be munged, the '.' may not be dots) are:

Pine.LNX.4.64.0706071117040.15609@nimrodel.valinor
4673A349.5030701@gmx.net
200706012133.01430.thadeurj@terra.com.br
20070616184124.GF19067@blinkenlights.visv.net
Pine.LNX.4.64.0706251153151.14573@nimrodel.valinor
200706071153.50160.wstephenson@suse.de
JM200706162234065.4685828@pop.707.to
BAY136-F6F3946075A76AD14C88DED2260@phx.gbl

There are two of mine that I found in the sent folders. The others... maybe I can find them in the gmail archive, but I'll have to use the gmail web site to search manually (my local copy was deleted long ago).

- --
Cheers,
Carlos E. R.
(from openSUSE 15.2 x86_64 at Telcontar)
-----BEGIN PGP SIGNATURE-----
iHIEARECADIWIQQZEb51mJKK1KpcU/W1MxgcbY1H1QUCYGwzVxQccm9iaW4ubGlz
dGFzQGdteC5lcwAKCRC1MxgcbY1H1a1RAJ9E4BPfCIXUr4V7GDRGxJoNht8BeACe
LoQWncJ9wfhGX21LAx22VwaEbnA=
=zU1Z
-----END PGP SIGNATURE-----
Hello, On Tue, 06 Apr 2021, Carlos E. R. wrote:
On Tuesday, 2021-04-06 at 10:09 +0200, Carlos E. R. wrote:
On 06/04/2021 05.18, David T-G wrote:
% >> I don't understand how, but it works :-D
First, props to dnh for the wicked awesome var munging :-)
Hats off :-)
Thanks :)
all of those ugly chars to . as each single placeholder?
Now that's a nice out of the box thinking!! :-D :-O
Or sed it to (hex-)escapes ... [..]
(one reason to edit the file is that the single quote ''' is used on two of the IDs). [..]> cer@Telcontar:~/tmp/mailarchive> mcedit opensuse-2007-06-munged.m-id [..]
Something like:

$ sed -e "s/'/\\\x{27}/g;" -e 's/\+/\\x{2B}/g' -e 's/\$/\\x{24}/g' \
    opensuse-2007-06.m-id > opensuse-2007-06-munged.m-id
$ grepmail -Y 'Message-[Ii][Dd]' -h -f opensuse-2007-06-munged.m-id

Add more replacements as necessary. Or put the sed-exprs in a file like this:

==== munge-ids.sed ===
#!/bin/sed -f
s/'/\\x{27}/g
s/\+/\\x{2B}/g
s/\$/\\x{24}/g
====

and run

$ sed -f munge-ids.sed opensuse-2007-06.m-id > opensuse-2007-06-munged.m-id
$ grepmail -Y 'Message-[Ii][Dd]' -h -f opensuse-2007-06-munged.m-id
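A hedged variant of the same sed munging, written as a filter so it can be tried on a few IDs before touching the real list. The function name `munge_ids_hex` is hypothetical. It matches `+` and `$` through bracket expressions, which sidesteps any question of how a given sed treats `\+` in a basic regular expression, and it produces the same `\x{NN}` escapes as the commands above.

```bash
#!/bin/bash
# munge_ids_hex: hypothetical filter, reads IDs on stdin (one per line)
# and writes them with ', + and $ replaced by \x{27}, \x{2B}, \x{24}.
munge_ids_hex() {
    # In the double-quoted expression the shell turns \\\\ into \\,
    # which sed then emits as a single literal backslash.
    sed -e "s/'/\\\\x{27}/g" \
        -e 's/[+]/\\x{2B}/g' \
        -e 's/[$]/\\x{24}/g'
}

# Try it on two IDs from the thread before running it on the whole file.
printf '%s\n' '2md$yDk+qZbGFwuq@dev.null.davjam.org' \
              "'027801c7b544\$c5202410\$4f606c30\$@com'" | munge_ids_hex
```

For the full list it would be run as `munge_ids_hex < opensuse-2007-06.m-id > opensuse-2007-06-munged.m-id`, using the file names from the message above.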
There are two of mine that I found in the sent folders. The others... maybe I can find them in the gmail archive, but I'll have to use the gmail web site to search manually (my local copy was deleted long ago).
I have a complete archive of opensuse(-en) of that month/year and can extract the missing messages, prune "Received" and other headers past the suse servers and send them as compressed mbox. Just send me the list of missing message-ids (and the rough timeframe). HTH, -dnh -- printk(KERN_ERR "happy meal: Eieee, rx config register gets greasy fries.\n"); linux-2.6.19/drivers/net/sunhme.c
On 06/04/2021 17.01, David Haller wrote:
Hello,
On Tue, 06 Apr 2021, Carlos E. R. wrote:
On Tuesday, 2021-04-06 at 10:09 +0200, Carlos E. R. wrote:
On 06/04/2021 05.18, David T-G wrote:
% >> I don't understand how, but it works :-D
First, props to dnh for the wicked awesome var munging :-)
Hats off :-)
Thanks :)
all of those ugly chars to . as each single placeholder?
Now that's a nice out of the box thinking!! :-D :-O
Or sed it to (hex-)escapes ...
[..]
(one reason to edit the file is that the single quote ''' is used on two of the IDs). [..]> cer@Telcontar:~/tmp/mailarchive> mcedit opensuse-2007-06-munged.m-id [..]
Something like:
$ sed -e "s/'/\\\x{27}/g;" -e 's/\+/\\x{2B}/g' -e 's/\$/\\x{24}/g' \
    opensuse-2007-06.m-id > opensuse-2007-06-munged.m-id
$ grepmail -Y 'Message-[Ii][Dd]' -h -f opensuse-2007-06-munged.m-id
Add more replacements as necessary. Or put the sed-exprs in a file like this:

==== munge-ids.sed ===
#!/bin/sed -f
s/'/\\x{27}/g
s/\+/\\x{2B}/g
s/\$/\\x{24}/g
====
Oh, being just a one-time job, I simply used the search and replace feature of
the editor, in this case mcedit:
+ --> .
' --> .
$ --> .
that was all that was needed. Then my script did the job, took 47
minutes to search:
#!/bin/bash
KEY=.2007tremis
# Using munged source id file.
#Done 2435 (8 failed)
#
#real 47m27,468s
#user 44m52,623s
#sys 2m18,341s
if [ -f output$KEY.mbox ]; then
rm output$KEY.mbox
fi
if [ -f notfound$KEY.ids ]; then
rm notfound$KEY.ids
fi
COUNT=0
FAILED=0
while read MSGID ; do
let "COUNT = $COUNT + 1"
grepmail -h "^Message-[Ii][Dd].*${MSGID}" \
~/Mail/_Lists/_filed/os-en.2007* > tmpoutput.mbox
# $? returns 0 always, can't check error code.
if [ -s tmpoutput.mbox ]; then
cat tmpoutput.mbox >> output$KEY.mbox
#echo "Found one ($COUNT)"
else
echo $MSGID >> notfound$KEY.ids
echo $COUNT " --- " $MSGID not found
let "FAILED = $FAILED + 1"
fi
done < opensuse-2007-06-munged.m-id
echo "Done $COUNT ($FAILED failed)"
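A hedged sketch of the same loop with a few defensive changes, in case the hex-escaped ID file is ever fed to it. Plain `read` treats a backslash as an escape character and drops it, so an entry such as `\x{24}` would lose its backslash; `read -r` keeps the line as it is. The names `collect_ids` and `search_one` are hypothetical, and the grepmail call is copied from the script above rather than verified independently.

```bash
#!/bin/bash
# Mailboxes to search, as in the script above.
MBOXES=( ~/Mail/_Lists/_filed/os-en.2007* )

# search_one: hypothetical wrapper around the grepmail call from the
# thread; prints the matching messages for one (already munged) ID.
search_one() {
    grepmail -h "^Message-[Ii][Dd].*$1" "${MBOXES[@]}"
}

# collect_ids IDFILE OUTMBOX MISSINGFILE
# Appends every found message to OUTMBOX and every ID without a match
# to MISSINGFILE, then prints a summary line.
collect_ids() {
    local idfile=$1 out=$2 missing=$3 count=0 failed=0 msgid tmp
    tmp=$(mktemp) || return 1
    : > "$out"
    : > "$missing"
    # IFS= and -r keep leading blanks and backslashes in each ID intact.
    while IFS= read -r msgid; do
        count=$((count + 1))
        search_one "$msgid" > "$tmp"
        if [ -s "$tmp" ]; then
            cat "$tmp" >> "$out"
        else
            printf '%s\n' "$msgid" >> "$missing"
            printf '%s --- %s not found\n' "$count" "$msgid" >&2
            failed=$((failed + 1))
        fi
    done < "$idfile"
    rm -f "$tmp"
    echo "Done $count ($failed failed)"
}

# Usage (file names follow the script above):
#   collect_ids opensuse-2007-06-munged.m-id output.2007tremis.mbox notfound.2007tremis.ids
```

Like the original, it decides by checking whether the temporary output is empty rather than by exit status.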
That's it, it found all except these 8 (no "bad" chars):
Pine.LNX.4.64.0706071117040.15609@nimrodel.valinor
4673A349.5030701@gmx.net
200706012133.01430.thadeurj@terra.com.br
20070616184124.GF19067@blinkenlights.visv.net
Pine.LNX.4.64.0706251153151.14573@nimrodel.valinor
200706071153.50160.wstephenson@suse.de
JM200706162234065.4685828@pop.707.to
BAY136-F6F3946075A76AD14C88DED2260@phx.gbl
For the two from @nimrodel.valinor I have the "Sent" folder copy, so I found
them. The other 6 are missing, but I may find them at my gmail account
[...] No, the web tool doesn't find any of them, despite my being
subscribed in those days.
In fact, as I know the text of the two of mine, I can find those two,
and I can see the matching Message ID:
and run
$ sed -f munge-ids.sed opensuse-2007-06.m-id > opensuse-2007-06-munged.m-id
$ grepmail -Y 'Message-[Ii][Dd]' -h -f opensuse-2007-06-munged.m-id
There are two of mine that I found in the sent folders. The others... maybe I can find them in the gmail archive, but I'll have to use the gmail web site to search manually (my local copy was deleted long ago).
I have a complete archive of opensuse(-en) of that month/year and can extract the missing messages, prune "Received" and other headers past the suse servers and send them as compressed mbox. Just send me the list of missing message-ids (and the rough timeframe).
Well, it is:

tickets #77701: post-mortem - mailing list migration
https://progress.opensuse.org/issues/77701#note-46

+++······················
https://lists.opensuse.org/opensuse/opensuse-2007-06.mbox.gz - Missing
https://lists.opensuse.org/limal-devel/limal-devel-2007-09.mbox.gz - Missing
https://lists.opensuse.org/opensuse-commit/opensuse-commit-2019-08.mbox.gz - Corrupted
https://lists.opensuse.org/opensuse-pt/opensuse-pt-2013-06.mbox.gz - Corrupted
······················++-

opensuse-2007-06.mbox I have already, except those 6 messages. Of the others mentioned in the ticket I have nothing.

Another problem would be removing at least some "received" headers.

--
Cheers / Saludos,
Carlos E. R.
(from 15.2 x86_64 at Telcontar)
Carlos, et al --

...and then Carlos E. R. said...
%
% On 06/04/2021 05.18, David T-G wrote:
% >
...
% >come across a lot of near-duplicates. Why not just change
% >
% >  davidtg@gezebel:~> for MSGID in \
% >    '<006501c7b50e$77cc3630$6764a290$@com>' \
% >    '<2md$yDk+qZbGFwuq@dev.null.davjam.org>' \
% >    '<8993-Fri29Jun2007142734+0100-jpff@codemist.co.uk>'
% >  do echo $MSGID | tr '<>$+' '.' ; done
% >  .006501c7b50e.77cc3630.6764a290.@com.
% >  .2md.yDk.qZbGFwuq@dev.null.davjam.org.
% >  .8993-Fri29Jun2007142734.0100-jpff@codemist.co.uk.
% >
% >all of those ugly chars to . as each single placeholder?
%
% Now that's a nice out of the box thinking!! :-D :-O

We do try :-) Sometimes the brute-force simple way beats the elegant approach! TMTOWTDI.

%
% >
...
% >easily handled by those whose fu is weak -- such as I :-]
%
% Indeed! As the IDs come on a file, one line per ID, I can simply do
% a search and replace on that file of the nasty chars, and run the
% file again through my script as is.

Woo hoo! Glad to hear it.

%
% I could use... I have forgotten the tool. First sort the file, then
% find duplicate lines, just in case there are any. Ah, 'uniq'.

I'm a big sort -u fan unless I'm planning on uniq -c :-)

%
% --
% Cheers / Saludos,
%
% Carlos E. R.

:-D
--
David T-G
See http://justpickone.org/davidtg/email/
See http://justpickone.org/davidtg/tofu.txt