[opensuse] thunderbird: How to get all email addresses from one folder?
Hello,

there once was a Thunderbird add-on, "email address crawler", that did exactly what I wanted:

- extract all names and email addresses from all mails in /one/ folder in Thunderbird and add them to a new address book or list

I can't find anything similar (this one doesn't work anymore and will not be updated because the author doesn't like the current add-on policy).

Do you know an add-on or any other tool that can extract that data from all mails in a Thunderbird folder? I don't care if it adds to a Thunderbird address book or simply writes a text file.

Thanks for hints!

Daniel

--
Daniel Bauer photographer Basel Málaga
https://www.patreon.com/danielbauer
https://www.daniel-bauer.com

--
To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org
To contact the owner, e-mail: opensuse+owner@opensuse.org
On 13/06/2020 11.57, Daniel Bauer wrote:
> Do you know an add-on or any other tool that can extract that data from all mails in a Thunderbird folder? I don't care if it adds to a Thunderbird address book or simply writes a text file.
I might concoct something with CLI tools if the folder is local and in mbox format (which is the default). I would try formail, and maybe procmail. I have not done this before, so I would have to read the man page again ;-)

--
Cheers / Saludos,
Carlos E. R. (from 15.1 x86_64 at Telcontar)
On 06/13/2020 06:01 AM, Carlos E. R. wrote:
> I might concoct something with CLI tools if the folder is local and in mbox format (which is the default).
> I would try formail, and maybe procmail. I have not done this before, so I would have to read the man page again ;-)
awk is another capable Swiss Army knife for text. Getting the To: and From: names and addresses out of your mail files is relatively trivial (implementing a full regex that matches every possible e-mail address per http://www.ietf.org/rfc/rfc5322.txt is a bit of a challenge) ...

but ... and there is always a "but" ... doing anything with the Thunderbird address book (abook.mab) is not (reasonably) possible, unless you just happened to be one of the Mork (from Ork) developers of the undocumented Mork database format in the 80's... See: https://wiki.mozilla.org/Address_Book (clusterfsck...)

Trivially, you can dump the From: and To: lines from an mbox file with:

    awk '/^From:/ || /^To:/' mboxfile

--
David C. Rankin, J.D.,P.E.
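A minimal Python sketch of the address-parsing step, assuming the raw From:/To: header values have already been pulled out of the messages. Instead of a hand-written RFC 5322 regex it leans on the standard library's `email.utils.getaddresses`. The function name `split_addresses` and the sample addresses are made up for illustration.

```python
from email.utils import getaddresses


def split_addresses(header_values):
    """Return (display_name, address) pairs from raw From:/To: header values."""
    pairs = []
    for name, addr in getaddresses(header_values):
        if "@" in addr:  # drop empty or unparsable entries
            pairs.append((name, addr))
    return pairs


# Hypothetical sample values; note the comma inside the quoted display name.
sample = [
    '"Bauer, Daniel" <daniel@example.com>, per@example.org',
    "Carlos E. R. <carlos@example.net>",
]
for name, addr in split_addresses(sample):
    print(f"{name!r:20} {addr}")
```

The quoted display name with a comma in it is the kind of case that trips up a simple split on commas, which is why the parsing is left to the library here.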
David C. Rankin wrote:
> Trivially, you can dump the To:/From: From:/To: address from an mbox file with:
> awk '/^From:/ || /^To:/' mboxfile
awk is overkill :-)

    egrep '^(From|To):' mboxfile

--
Per Jessen, Zürich (15.9°C)
http://www.dns24.ch/ - free dynamic DNS, made in Switzerland.
On Sat, 13 Jun 2020 20:40:32 +0200 Per Jessen <per@computer.org> wrote:
> awk is overkill :-)
> egrep '^(From|To):' mboxfile
Now who's being minimalist :)
Dave Howorth wrote:
> Now who's being minimalist :)
Touché! Thanks, I love a clever comeback. :-)

--
Per Jessen, Zürich (16.1°C)
http://www.cloudsuisse.com/ - your owncloud, hosted in Switzerland.
Per Jessen wrote:
> awk is overkill :-)
> egrep '^(From|To):' mboxfile
It's not so easy doing any grep/awking on the To:/From: lines - the format is quite varied. On my TB with IMAP accounts only, I tried this:

    find ~/.thunderbird/ImapMail -type f ! -name \*msf | \
      xargs egrep -h '^(From|To):' | \
      sed -e 's/^[^<]\+<\([^,]\+\)>.*$/\1/'

which got me the addresses that are enclosed in <>.

--
Per Jessen, Zürich (15.8°C)
http://www.cloudsuisse.com/ - your owncloud, hosted in Switzerland.
Per Jessen wrote:
> It's not so easy doing any grep/awking on the To:/From: lines - the format is quite varied.
> On my TB with IMAP accounts only, I tried this:
>
>     find ~/.thunderbird/ImapMail -type f ! -name \*msf | \
>       xargs egrep -h '^(From|To):' | \
>       sed -e 's/^[^<]\+<\([^,]\+\)>.*$/\1/'
>
> which got me the addresses that are enclosed in <>.
and loads of other sh.. stuff -

    find ~/.thunderbird/ImapMail -type f ! -name \*msf -print0 | \
      xargs -0 egrep -h '^(From|To):' | \
      sed -e 's/^[^<]\+<\([^,]\+\)>.*$/\1/' | \
      grep -v : | sort -fu

(takes a while, mine is still running).

--
Per Jessen, Zürich (16.4°C)
http://www.hostsuisse.com/ - dedicated server rental in Switzerland.
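For anyone who would rather not chain find, xargs and sort, the two bookkeeping steps of that pipeline (walk the profile directory while skipping the *.msf index files, then de-duplicate case-insensitively like `sort -fu`) might be sketched in Python as follows. The function names `mail_files` and `unique_casefold` are hypothetical, and the sketch assumes every remaining file under the directory is a mail file.

```python
import os


def mail_files(root):
    """Yield every file below root except Thunderbird's *.msf index files."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.endswith(".msf"):
                yield os.path.join(dirpath, name)


def unique_casefold(addresses):
    """De-duplicate ignoring case, keeping the first spelling seen."""
    seen = {}
    for addr in addresses:
        seen.setdefault(addr.casefold(), addr)
    return sorted(seen.values(), key=str.casefold)
```

A call such as `mail_files(os.path.expanduser("~/.thunderbird/ImapMail"))` would correspond to the find step; the address extraction itself still has to happen in between.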
Per Jessen wrote:
>>     find ~/.thunderbird/ImapMail -type f ! -name \*msf | \
>>       xargs egrep -h '^(From|To):' | \
>>       sed -e 's/^[^<]\+<\([^,]\+\)>.*$/\1/'
>>
>> which got me the addresses that are enclosed in <>.
> and loads of other sh.. stuff -
After finishing, still more sh.... It'll take some more fine tuning.

--
Per Jessen, Zürich (16.0°C)
http://www.dns24.ch/ - your free DNS host, made in Switzerland.
On 06/13/2020 01:40 PM, Per Jessen wrote:
> awk is overkill :-)
> egrep '^(From|To):' mboxfile
That was meant to be a starting point, for follow-on in the general vein of:

    awk '
    BEGIN { email = "^[a-zA-Z0-9]{1,78}[@][a-zA-Z0-9]{1,78}[.][a-zA-Z0-9]{1,3}$" }
    /^From:/ {
        if ($2 ~ email)
            from = $2
        else
            from = ""    # awk has no "unset" statement; clear the variable instead
    }
    /^To:/ && $2 ~ email {
        if (!from || from","$2 in a)
            next
        a[from","$2]++
        from = ""
    }
    END { for (i in a) print i }
    ' mboxfile

(not finished)

--
David C. Rankin, J.D.,P.E.
On 13/06/2020 14:40, Per Jessen wrote:
> awk is overkill :-)
Indeed. Awk, Perl, Python, Ruby or any other such tool that can handle REs is only of use if you are also going to process the addresses you find.
> egrep '^(From|To):' mboxfile
At the very least I'd add the following space :-) Adding the RE to recognise an address isn't exactly difficult either. You might want, eventually, to refine it to restrict the domains it can handle, and filter through fgrep using a file of spam addresses.

The only use I can think of for using 'formail' is in the syndromic case where there is a line in the body of the message that meets the RE.

--
A: Yes.
> Q: Are you sure?
>> A: Because it reverses the logical flow of conversation.
>>> Q: Why is top posting frowned upon?
Anton Aylward wrote:
> The only use I can think of for using 'formail' is in the syndromic case where there is a line in the body of the message that meets the RE.
Using formail would probably be really useful to fold continued lines into one (-c).

--
Per Jessen, Zürich (15.8°C)
http://www.cloudsuisse.com/ - your owncloud, hosted in Switzerland.
On 14/06/2020 19.40, Per Jessen wrote:
> Using formail would probably be really useful to fold continued lines into one (-c).
I have forgotten how to use formail, but I managed to create something. I run:

    formail -s procmail ./.procmailrc-filter_log < in_rst_4

where "in_rst_4" is the folder to scan, in mbox format, and ".procmailrc-filter_log" is a procmail recipe. "formail -s" outputs one email at a time and feeds it to the program after the -s.

    cat .procmailrc-filter_log
    VERBOSE=on
    LOGFILE=procmail-test.log
    LOG="+++----> LOG START "

    :0f
    * ^From:.*
    | formail -X From:

    :0 a
    throwaway

The resulting "throwaway" file has this format:

    From: suse-announce-uk-help@suse.com
    From: suse-linux-e-help@suse.com
    From: suse-programming-e-help@suse.com
    From: suse-announce-es-help@suse.com
    From: suse-security-help@suse.com
    From: suse-linux-s-help@suse.com
    From: suse-security-announce-help@suse.com

But something is wrong, because it wrote 77 mails to my system inbox. In the log, one normal entry followed by a failed one:

    +++----> EMPEZANDO REGISTRO
    procmail: Match on "^From:.*"
    procmail: Executing "formail,-X,From:"
    procmail: Assigning "LASTFOLDER=throwaway"
    procmail: Opening "throwaway"
    procmail: Acquiring kernel-lock
    Folder: throwaway 54
    procmail: Assigning "LOG=+++----> EMPEZANDO REGISTRO "
    +++----> EMPEZANDO REGISTRO
    procmail: Match on "^From:.*"
    procmail: Executing "formail,-X,From:"
    procmail: Error while writing to "formail"
    procmail: Rescue of unfiltered data succeeded
    procmail: Bypassed locking "/var/mail/cer.lock"
    procmail: Assigning "LASTFOLDER=/var/mail/cer"
    procmail: Opening "/var/mail/cer"
    procmail: Acquiring kernel-lock
    From ....@.... Fri Jul 21 19:59:53 2006
    Subject: Re: [SLE] How to start a computer remotely - generic Q
    Folder: /var/mail/cer 365403
    procmail: Assigning "LOG=+++----> EMPEZANDO REGISTRO "

--
Cheers / Saludos,
Carlos E. R. (from 15.1 x86_64 at Telcontar)
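What "formail -s" does here (hand over one message at a time from an mbox folder) could also be sketched with Python's standard `mailbox` module. This is only an illustration of the same idea, not a diagnosis of the procmail error in the log. The function name `addresses_in_mbox` and its `fields` parameter are hypothetical, and the folder is assumed to be a plain mbox file.

```python
import mailbox
from email.utils import getaddresses


def addresses_in_mbox(path, fields=("From", "To", "Cc")):
    """Collect unique (name, address) pairs from the chosen headers of every message."""
    seen = {}
    box = mailbox.mbox(path, create=False)  # create=False: never create a missing file
    try:
        for msg in box:
            values = []
            for field in fields:
                values.extend(str(v) for v in msg.get_all(field, []))
            for name, addr in getaddresses(values):
                if "@" in addr:
                    # Key on the lower-cased address so each one is kept only once.
                    seen.setdefault(addr.lower(), (name, addr))
    finally:
        box.close()
    return sorted(seen.values(), key=lambda pair: pair[1].lower())
```

Only header fields are consulted, so a "From:" line that happens to sit in a message body is not picked up. Writing the result to a text file is then a matter of looping over the returned pairs.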
David & Carlos & Daniel, et al --

...and then David C. Rankin said...
%
% On 06/13/2020 06:01 AM, Carlos E. R. wrote:
% > On 13/06/2020 11.57, Daniel Bauer wrote:
...
% >>
% >> - extract all names and email addresses from all mails in /one/ folder in
% >> thunderbird and add them to a new addressbook or list
...
% >
% > I would try formail, and maybe procmail. I have not done this before, so I
% > would have to read the man page again ;-)
%
% awk is another capable swiss-army-knife for text. Getting the To: From: names
...
%
% Trivially, you can dump the To:/From: From:/To: address from an mbox file with:
%
% awk '/^From:/ || /^To:/' mboxfile

And don't forget that these lines can wrap ...

    From: me <me@example.com>
    To: you <you@example.com>, him <him@example.com>,
            her <her@example.com>,
            ReallyStupidLongNameSalutation <ridiculouslongaddress@example.com>,
            another <another@example.com>
    Cc: thistoo <thistoo@example.com>

You'd need to continue based on a trailing comma and a leading tab.

And how many is "all"? Which fields? Only what's in each email's current envelope, or anything that looks like an email address? And in what MTA's format(s)?

Have fun :-)

HANW :-D

--
David T-G
See http://justpickone.org/davidtg/email/
See http://justpickone.org/davidtg/tofu.txt
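The wrapped To: header above can be handled without any hand-rolled continuation logic if the message goes through a real header parser. A small sketch with Python's standard `email` package, using a shortened copy of the example (the helper name `recipients` is made up):

```python
from email import message_from_string
from email.utils import getaddresses

# Shortened copy of the wrapped example; continuation lines start with a tab.
raw = (
    "From: me <me@example.com>\n"
    "To: you <you@example.com>, him <him@example.com>,\n"
    "\ther <her@example.com>,\n"
    "\tanother <another@example.com>\n"
    "Cc: thistoo <thistoo@example.com>\n"
    "\n"
    "body text\n"
)


def recipients(text):
    """Return every To:/Cc: address of one message, including wrapped lines."""
    msg = message_from_string(text)
    values = msg.get_all("To", []) + msg.get_all("Cc", [])
    return [addr for _name, addr in getaddresses(values)]
```

The parser treats any line that begins with a space or tab as part of the previous header field, so the trailing comma does not need to be checked separately.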
On 13/06/2020 07:01, Carlos E. R. wrote:
> I might concoct something with CLI tools if the folder is local and in mbox format (which is the default).
Yes.
> I would try formail, and maybe procmail. I have not done this before, so I would have to read the man page again ;-)
That's massive overkill. Surely you can use grep and a nice regular expression to extract addresses (those in the header, not the body) to a text file for later processing? If you really want an all-in-one-gulp, then the RE capabilities of Perl, Python or Ruby can easily do that, or if you lean towards the more sophisticated or have read the White Book, use AWK.

I say this as someone who has (a) used both formail and procmail to build apps in the past and (b) extracted email addresses from an mbox.

The issue is really not extracting the addresses; as I say, that's a simple regex matter. The issue is stuffing them into the address book.
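On the address-book side, one way around the Mork file might be to write a CSV file and load it through Thunderbird's own address book import rather than touching abook.mab. The sketch below assumes that import accepts CSV and that the column titles "Display Name" and "Primary Email" are recognised; both are assumptions to verify against your Thunderbird version. `write_contacts_csv` is a hypothetical helper name.

```python
import csv
import io


def write_contacts_csv(pairs, out):
    """Write (name, address) pairs as CSV rows to an open text file object."""
    writer = csv.writer(out)
    # Column titles are an assumption about what the import dialog expects.
    writer.writerow(["Display Name", "Primary Email"])
    for name, addr in pairs:
        writer.writerow([name or addr, addr])  # fall back to the address as name


buffer = io.StringIO()
write_contacts_csv([("Bauer, Daniel", "daniel@example.com"), ("", "per@example.org")], buffer)
print(buffer.getvalue())
```

When writing to a real file, opening it with `newline=""` is what the `csv` module documentation recommends.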
On 14/06/2020 19.20, Anton Aylward wrote:
> That's massive overkill. Surely you can use grep and a nice regular expression to extract addresses (those in the header, not the body) to a text file for later processing?
Why, when formail is designed already to extract and manipulate headers? I see it as the right tool for the purpose.
> If you really want an all-in-one-gulp, then the RE capabilities of Perl, Python or Ruby can easily do that, or if you lean towards the more sophisticated or have read the White Book, use AWK.
I don't talk any of those languages, so for me that would be a massive enterprise.
> I say this as someone who has (a) used both formail and procmail to build apps in the past and (b) extracted email addresses from an mbox.
> The issue is really not extracting the addresses; as I say, that's a simple regex matter. The issue is stuffing them into the address book.
Not that simple, as it can catch body text that has the "From:" format.

--
Cheers / Saludos,
Carlos E. R. (from 15.1 x86_64 at Telcontar)
On 14/06/2020 14:53, Carlos E. R. wrote:
> Not that simple, as it can catch body text that has the "From:" format.
Did I ever mention adding a space after the colon?
On 14/06/2020 21.38, Anton Aylward wrote:
> Did I ever mention adding a space after the colon?
So what? I may and do have posts that in the body contain

    From: somebody
    To: Somebody else
    Subject: Helo
    Date: in the past

    text

        . <-- space

--
Cheers / Saludos,
Carlos E. R. (from 15.1 x86_64 at Telcontar)
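The distinction being argued over (header lines versus look-alike lines in a body) comes down to tracking where each message's header block ends, which a plain grep cannot do. A rough line-based sketch in Python, assuming a well-formed mbox in which every message starts with a "From " separator line and body lines beginning with "From " have been escaped as ">From " by the mail software; `header_lines` is a made-up name:

```python
def header_lines(lines):
    """Yield unfolded header lines only, skipping message bodies in mbox text."""
    in_headers = False
    current = None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("From "):  # mbox separator: a new message begins
            if current is not None:
                yield current
            in_headers, current = True, None
            continue
        if not in_headers:
            continue  # body text, even if it looks like "From: somebody"
        if line == "":  # first blank line ends the header block
            if current is not None:
                yield current
            in_headers, current = False, None
        elif line[0] in " \t" and current is not None:
            current += " " + line.strip()  # continuation of the previous header
        else:
            if current is not None:
                yield current
            current = line
    if current is not None:
        yield current
```

Filtering the result with something like `line.startswith(("From:", "To:"))` then gives only genuine header lines, however many "From:" lines the bodies contain.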
On Sun, 14 Jun 2020 20:53:21 +0200 "Carlos E. R." <robin.listas@telefonica.net> wrote:
> I don't talk any of those languages, so for me that would be a massive enterprise.
Ouch! That's a major admission IMHO, Carlos. I'd really recommend being familiar with at least one of those. For myself, I started with awk (when perl was perl 4) then learned to love perl (after it became 5) and now can write python to some extent.
On 14/06/2020 23.41, Dave Howorth wrote:
> Ouch! That's a major admission IMHO, Carlos. I'd really recommend being familiar with at least one of those. For myself, I started with awk (when perl was perl 4) then learned to love perl (after it became 5) and now can write python to some extent.
I've never hidden that fact :-)

I can do bash, but my preferred language is Pascal, in the Borland variant. I also talk C, Assembler (68000), Basic... And LabVIEW G (1). But I'm forgetting them.

(1): <https://en.wikipedia.org/wiki/LabVIEW#/media/File:LabVIEW_Block_diagram.JPG>
Yes, that is code, not a block diagram.

--
Cheers / Saludos,
Carlos E. R. (from 15.1 x86_64 at Telcontar)
On 13/06/2020 05:57, Daniel Bauer wrote:
> I can't find anything similar (this one doesn't work anymore and will not be updated because the author doesn't like the current add-on policy)
YMMV. Depending on the nature of the "doesn't work any more", you might be able to download and hack the old version. For one old add-on, I found that all I needed to do was change the field that said which versions it worked with :-)

Of course, some programmers' work has more clarity than others; some document the whys and wherefores better than others. And some of the people who need to hack old code might not have the level of skill needed for that particular language.
On 14.06.20 at 19:11, Anton Aylward wrote:
> YMMV. Depending on the nature of the "doesn't work any more", you might be able to download and hack the old version. For one old add-on, I found that all I needed to do was change the field that said which versions it worked with :-)
Thunderbird and Firefox changed "something" basic in the way their add-ons work. That's what I read....
> Of course, some programmers' work has more clarity than others; some document the whys and wherefores better than others.
> And some of the people who need to hack old code might not have the level of skill needed for that particular language.
One of those people is sitting right here at the keyboard... I don't even understand regular expressions; all I can do is copy/paste them, and often I don't, because I can't understand what they do... I don't know Python, Ruby etc., and until now I thought "awk" was to express something awkward...

--
Daniel Bauer photographer Basel Málaga
https://www.patreon.com/danielbauer
https://www.daniel-bauer.com