Hi all,

I have a lot of text files like the one you can see at the end of my mail (starting with Message-ID). I need to import parts of those messages into MySQL: Message-ID, Subject, From, Newsgroups, and the text, which is in HTML. I want to first create text files containing only the information I want to import, and then import them into MySQL.

Using "awk '$1~/Message-ID/ || $1~/Subject/ || $1~/From/ || $1~/Newsgroups/ ' 4060bb67.txt > txt1.awk" I'm able to extract the header fields, and using "awk ' /<body>/,/<\/body>/ ' 4060bb67.txt > text2.awk" the text in HTML, but when I try one unique command it just doesn't work.

Any idea? Is there perhaps a more elegant way of doing the same thing?

Thanks
Gaël

Message-ID: <417ee59f@mail-ha1>
Subject:
From: renu [renu@solicomm.net]
X-Authenticated-User: renu
Xref: mail-ha1 Practice:76
Seminar: Main
Path: mail-ha1!not-for-mail
Action:
NNTP-Posting-Host: mail.fntc.ac.fj
Date: 27 Oct 2004 02:02:39 +0200
Followup-To:
Lines: 10
X-Trace: mail-ha1 1098835359 202.62.126.2 (27 Oct 2004 02:02:39 +0200)
State: 0
Content-Type: text/html
Newsgroups: Practice

<html>
<head>
</head>
<body>
<p>
</p>
</body>
</html>
Thu, 17 Feb 2005, by g.lams@itcilo.org:
Using "awk '$1~/Message-ID/ || $1~/Subject/ || $1~/From/ || $1~/Newsgroups/ ' 4060bb67.txt > txt1.awk". I'm able to take the first fields and using "awk ' /<body>/,/<\/body>/ ' 4060bb67.txt > text2.awk" the text in html but when I try one unique command it just doesn't work.
What does that mean: "one unique command", and please define "doesn't work" Theo -- Theo v. Werkhoven Registered Linux user# 99872 http://counter.li.org ICBM 52 13 26N , 4 29 47E. + ICQ: 277217131 SUSE 9.2 + Jabber: muadib@jabber.xs4all.nl Kernel 2.6.8 + MSN: twe-msn@ferrets4me.xs4all.nl See headers for PGP/GPG info. +
Hi

Thanks for the reply.

I wanted to use one unique command to keep in the new file only the Message-ID, the Subject, the From, and the "Body". For the time being I run those two commands sequentially. Actually it's not a blocking problem. The fact is that I want to import those files into MySQL and, from what I know, I need to have a file like this: field1:field2;field3;field4

I can do this for the first three fields, but the fourth one is like this:

<body>
<p> xghdgdfgfgdff </p>
<p> xgdsfssdfsdf fsdfsfsdfhdgdfgfgdff </p>
</body>

What I would need is to have everything on one row, i.e. removing the carriage returns. Is there a way to do it?

Regards
Gaël

"Theo v. Werkhoven" <twe-suse.e@ferrets4me.xs4all.nl> wrote on 18/02/2005 00.12.27:
Thu, 17 Feb 2005, by g.lams@itcilo.org:
Using "awk '$1~/Message-ID/ || $1~/Subject/ || $1~/From/ || $1~/Newsgroups/ ' 4060bb67.txt > txt1.awk". I'm able to take the first fields and using "awk ' /<body>/,/<\/body>/ ' 4060bb67.txt > text2.awk" the text in html but when I try one unique command it just doesn't work.
What does that mean: "one unique command", and please define "doesn't work"
Theo --
Tue, 01 Mar 2005, by g.lams@itcilo.org:
Hi
Thanks for the reply.
Do not toppost again please, it's a pita to keep threads orderly that way.
I wanted to use one unique command to keep in the new file only the Message-ID, the Subject, the From, and the "Body". For the time being I run those two commands sequentially. Actually it's not a blocking problem. The fact is that I want to import those files into MySQL and, from what I know, I need to have a file like this: field1:field2;field3;field4
I can do this for the first three fields, but the fourth one is like this: <body> <p> xghdgdfgfgdff </p> <p> xgdsfssdfsdf fsdfsfsdfhdgdfgfgdff </p> </body>
What I would need is to have everything on one row, i.e. removing the carriage returns. Is there a way to do it?
#v+
awk '
# Header lines: print the whole line ($1 alone would print only the field name)
/^(Message-ID|Subject|From|Newsgroups):/ { print > "headers.txt" }
# Body: join every line from <body> to </body> into a single row
/<body>/,/<\/body>/ { printf "%s ", $0 > "body.txt" }
' <somefile.txt
#v-
Theo -- Theo v. Werkhoven Registered Linux user# 99872 http://counter.li.org ICBM 52 13 26N , 4 29 47E. + ICQ: 277217131 SUSE 9.2 + Jabber: muadib@jabber.xs4all.nl Kernel 2.6.8 + MSN: twe-msn@ferrets4me.xs4all.nl See headers for PGP/GPG info. +
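For the "everything on one row" request, a single awk pass can also collect the four header values and the joined body and print them as one delimited record. What follows is only a sketch under assumptions that are not stated in the thread: one message per file, ';' as the field separator (and not occurring inside the fields), and a hypothetical helper name msg_to_row.

```shell
#!/bin/sh
# Sketch (assumptions: one message per file, ';' never occurs in a field).
# Prints: Message-ID;Subject;From;Newsgroups;<body> ... </body>
msg_to_row() {
    awk '
        # Drop a trailing carriage return, if the file has DOS line endings.
        { sub(/\r$/, "") }

        # Header lines outside the body: store the value after "Name:".
        !inbody && /^(Message-ID|Subject|From|Newsgroups):/ {
            name = $1
            sub(/:$/, "", name)
            value = $0
            sub(/^[^:]*:[ \t]*/, "", value)
            hdr[name] = value
            next
        }

        # Body: collect every non-empty line from <body> to </body>.
        /<body>/ { inbody = 1 }
        inbody {
            line = $0
            gsub(/^[ \t]+|[ \t]+$/, "", line)
            if (line != "")
                body = (body == "" ? line : body " " line)
        }
        /<\/body>/ { inbody = 0 }

        END {
            printf "%s;%s;%s;%s;%s\n", hdr["Message-ID"], hdr["Subject"],
                   hdr["From"], hdr["Newsgroups"], body
        }
    ' "$1"
}
```

It could be run over a directory with something like: for f in *.txt; do msg_to_row "$f"; done > import.txt (the output file name is again just an example). Whether ';' is a safe separator depends on the actual message contents.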
Theo, On Tuesday 01 March 2005 14:13, Theo v. Werkhoven wrote:
Tue, 01 Mar 2005, by g.lams@itcilo.org:
Hi
Thanks for the reply.
Do not toppost again please, it's a pita to keep threads orderly that way.
Why not leave it to your mail software to keep threads orderly? This dogmatic insistence that top-posting is always wrong is just mindlessness. Sometimes top-posting is the best choice. Randall Schulz
Tue, 01 Mar 2005, by rschulz@sonic.net:
Theo,
On Tuesday 01 March 2005 14:13, Theo v. Werkhoven wrote:
Tue, 01 Mar 2005, by g.lams@itcilo.org:
Hi
Thanks for the reply.
Do not toppost again please, it's a pita to keep threads orderly that way.
Why not leave it to your mail software to keep threads orderly?
I don't know about yours, but my MUA isn't able to repair topposts, and I do not call an inconsiderate and lazy toppost orderly.
This dogmatic insistence that top-posting is always wrong is just mindlessness. Sometimes top-posting is the best choice.
No, it is not. It might seem so for the poster because it saves a few seconds of *his* "precious" time, but the next replier might have to repair the broken post to make sense of what *he* has to say, thereby being forced to invest much more time in that post than strictly necessary. This is a mailing *list*, meaning that (generally speaking) more than 2 people are involved in a discussion. To keep any discussion sane it is vital that everybody uses the same set of rules, and unless I'm mistaken, replying under a quote and trimming excess material is still the preferred SOP here. Theo -- Theo v. Werkhoven Registered Linux user# 99872 http://counter.li.org ICBM 52 13 26N , 4 29 47E. + ICQ: 277217131 SUSE 9.2 + Jabber: muadib@jabber.xs4all.nl Kernel 2.6.8 + MSN: twe-msn@ferrets4me.xs4all.nl See headers for PGP/GPG info. +
* Theo v. Werkhoven <twe-suse.e@ferrets4me.xs4all.nl> [03-01-05 17:49]:
This is a mailing *list*, meaning that (generally speaking) more than 2 people are involved in a discussion. To keep any discussion sane it is vital that everybody uses the same set of rules, and unless I'm mistaken, replying under a quote and trimming excess material is still the preferred SOP here.
Yes -- Patrick Shanahan Registered Linux User #207535 http://wahoo.no-ip.org @ http://counter.li.org HOG # US1244711 Photo Album: http://wahoo.no-ip.org/gallery
Theo, May I demand that you cut down on all that irrelevant, pointless clutter you use as a signature? It's a pain in the ass to have to trim it out of every one of your posts to which I reply. [You'll note that since this content does not directly address your most recent post, it does not need to be interleaved with that text and is, in fact, _easier_ to deal with if put at the top where it is most easily seen and read.] Randall Schulz On Tuesday 01 March 2005 14:48, Theo v. Werkhoven wrote:
Tue, 01 Mar 2005, by rschulz@sonic.net:
Theo,
On Tuesday 01 March 2005 14:13, Theo v. Werkhoven wrote:
Tue, 01 Mar 2005, by g.lams@itcilo.org:
Hi
Thanks for the reply.
Do not toppost again please, it's a pita to keep threads orderly that way.
Why not leave it to your mail software to keep threads orderly?
I don't know about yours, but my MUA isn't able to repair topposts, and I do not call an inconsiderate and lazy toppost orderly.
This dogmatic insistence that top-posting is always wrong is just mindlessness. Sometimes top-posting is the best choice.
No, it is not. It might seem so for the poster because it saves a few seconds of *his* "precious" time, but the next replier might have to repair the broken post to make sense of what *he* has to say, thereby being forced to invest much more time in that post than strictly necessary. This is a mailing *list*, meaning that (generally speaking) more than 2 people are involved in a discussion. To keep any discussion sane it is vital that everybody uses the same set of rules, and unless I'm mistaken, replying under a quote and trimming excess material is still the preferred SOP here.
Theo
Tue, 01 Mar 2005, by rschulz@sonic.net:
Theo,
May I demand that you cut down on all that irrelevant, pointless clutter you use as a signature? It's a pain in the ass to have to trim it out of every one of your posts to which I reply.
It's not my fault you use a broken piece of software. The MUA I'm using is very capable of trimming signatures by itself. Furthermore, if you have such a problem with all that clutter, I suggest you pass by my posts from now on.
[You'll note that since this content does not directly address your most recent post, it does not need to be interleaved with that text and is, in fact, _easier_ to deal with if put at the top where it is most easily seen and read.]
I only notice you insist on making a PITA of yourself, which is your given right, just as it is my right to.ignore you for the time being. Theo -- Theo v. Werkhoven Registered Linux user# 99872 http://counter.li.org ICBM 52 13 26N , 4 29 47E. + ICQ: 277217131 SUSE 9.2 + Jabber: muadib@jabber.xs4all.nl Kernel 2.6.8 + MSN: twe-msn@ferrets4me.xs4all.nl See headers for PGP/GPG info. +
Theo, On Tuesday 01 March 2005 15:15, Theo v. Werkhoven wrote:
Tue, 01 Mar 2005, by rschulz@sonic.net:
Theo,
May I demand that you cut down on all that irrelevant, pointless clutter you use as a signature? It's a pain in the ass to have to trim it out of every one of your posts to which I reply.
It's not my fault you use a broken piece of software. The MUA I'm using is very capable of trimming signatures by itself. Furthermore, if you have such a problem with all that clutter, I suggest you pass by my posts from now on.
Other people know to use two hyphens _followed by a space_ to delineate their signature. When that convention is followed, KMail does properly excise the signature when adding quoted material to its replies.
[You'll note that since this content does not directly address your most recent post, it does not need to be interleaved with that text and is, in fact, _easier_ to deal with if put at the top where it is most easily seen and read.]
I only notice you insist on making a PITA of yourself, which is your given right, just as it is my right to.ignore you for the time being.
Then why on Earth aren't you ignoring me? I only posted to defend someone you chastised for not hewing to your mindless, dogmatic insistence that there never be any top-posted text whatsoever.
Theo
Randall Schulz
Tue, 01 Mar 2005, by rschulz@sonic.net:
Theo,
On Tuesday 01 March 2005 15:15, Theo v. Werkhoven wrote:
Tue, 01 Mar 2005, by rschulz@sonic.net:
Theo,
May I demand that you cut down on all that irrelevant, pointless clutter you use as a signature? It's a pain in the ass to have to trim it out of every one of your posts to which I reply.
It's not my fault you use a broken piece of software. The MUA I'm using is very capable of trimming signatures by itself. Furthermore, if you have such a problem with all that clutter, I suggest you pass by my posts from now on.
Other people know to use two hyphens _followed by a space_ to delineate their signature. When that convention is followed, KMail does properly excise the signature when adding quoted material to its replies.
|xxd|less
0000c20: 662c 2077 6869 6368 2069 7320 796f 7572  f, which is your
0000c30: 0a67 6976 656e 2072 6967 6874 2c20 6a75  .given right, ju
0000c40: 7374 2061 7320 6974 2069 7320 6d79 2072  st as it is my r
0000c50: 6967 6874 2074 6f2e 6967 6e6f 7265 2079  ight to.ignore y
0000c60: 6f75 2066 6f72 2074 6865 2074 696d 650a  ou for the time.
0000c70: 6265 696e 672e 0a0a 5468 656f 0a2d 2d20  being...Theo.-- 
0000c80: 0a54 6865 6f20 762e 2057 6572 6b68 6f76  .Theo v. Werkhov
0000c90: 656e 2020 2020 5265 6769 7374 6572 6564  en    Registered
0000ca0: 204c 696e 7578 2075 7365 7223 2039 3938   Linux user# 998
0000cb0: 3732 2068 7474 703a 2f2f 636f 756e 7465  72 http://counte
0000cc0: 722e 6c69 2e6f 7267 0a49 4342 4d20 3532  r.li.org.ICBM 52
0000cd0: 2031 3320 3236 4e20 2c20 3420 3239 2034   13 26N , 4 29 4
0000ce0: 3745 2e20 2020 2020 2b20 2020 2020 2049  7E.     +      I
0000cf0: 4351 3a20 3237 3732 3137 3133 310a 5355  CQ: 277217131.SU
0000d00: 5345 2039 2e32 2020 2020 2020 2020 2020  SE 9.2
Look in line 0000c70 at the end: "0a2d 2d20", and in line 0000c80 which begins with "0a". Nothing wrong with my .sig.
[You'll note that since this content does not directly address your most recent post, it does not need to be interleaved with that text and is, in fact, _easier_ to deal with if put at the top where it is most easily seen and read.]
I only notice you insist on making a PITA of yourself, which is your given right, just as it is my right to.ignore you for the time being.
Then why on Earth aren't you ignoring me?
I'm an impossible optimist.
I only posted to defend someone you chastised for not hewing to your mindless, dogmatic insistence that there never be any top-posted text whatsoever.
I asked OP politely. Show my chastising please. And it's my _using_ my mind that makes me decide *not* to toppost. Theo -- Theo v. Werkhoven Registered Linux user# 99872 http://counter.li.org ICBM 52 13 26N , 4 29 47E. + ICQ: 277217131 SUSE 9.2 + Jabber: muadib@jabber.xs4all.nl Kernel 2.6.8 + See headers for PGP/GPG info.
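The hex dump in this message checks by eye for a line consisting of exactly two hyphens followed by a space ("0a2d 2d20" followed by "0a"). The same check can be sketched with grep on a saved copy of a message. The function name find_sig_delimiter is made up for illustration, and the sketch assumes Unix line endings.

```shell
#!/bin/sh
# Sketch: list the lines that are exactly "-- " (dash, dash, space),
# the signature delimiter discussed in this thread.
# Output is "<line number>:-- " per match; exit status is 0 if any line matches.
find_sig_delimiter() {
    # -x: the pattern must match the whole line; -e: protects the leading dash
    grep -n -x -e '-- ' "$1"
}
```

A line holding only "--" without the trailing space is not reported, which is the distinction the two posters are arguing about.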
Theo, On Tuesday 01 March 2005 15:42, Theo v. Werkhoven wrote:
Tue, 01 Mar 2005, by rschulz@sonic.net:
Theo,
On Tuesday 01 March 2005 15:15, Theo v. Werkhoven wrote:
Tue, 01 Mar 2005, by rschulz@sonic.net:
Theo,
May I demand that you cut down on all that irrelevant, pointless clutter you use as a signature? It's a pain in the ass to have to trim it out of every one of your posts to which I reply.
It's not my fault you use a broken piece of software. The MUA I'm using is very capable of trimming signatures by itself. Furthermore, if you have such a problem with all that clutter, I suggest you pass by my posts from now on.
Other people know to use two hyphens _followed by a space_ to delineate their signature. When that convention is followed, KMail does properly excise the signature when adding quoted material to its replies.
|xxd|less
0000c20: 662c 2077 6869 6368 2069 7320 796f 7572  f, which is your
0000c30: 0a67 6976 656e 2072 6967 6874 2c20 6a75  .given right, ju
0000c40: 7374 2061 7320 6974 2069 7320 6d79 2072  st as it is my r
0000c50: 6967 6874 2074 6f2e 6967 6e6f 7265 2079  ight to.ignore y
0000c60: 6f75 2066 6f72 2074 6865 2074 696d 650a  ou for the time.
0000c70: 6265 696e 672e 0a0a 5468 656f 0a2d 2d20  being...Theo.-- 
0000c80: 0a54 6865 6f20 762e 2057 6572 6b68 6f76  .Theo v. Werkhov
0000c90: 656e 2020 2020 5265 6769 7374 6572 6564  en    Registered
0000ca0: 204c 696e 7578 2075 7365 7223 2039 3938   Linux user# 998
0000cb0: 3732 2068 7474 703a 2f2f 636f 756e 7465  72 http://counte
0000cc0: 722e 6c69 2e6f 7267 0a49 4342 4d20 3532  r.li.org.ICBM 52
0000cd0: 2031 3320 3236 4e20 2c20 3420 3239 2034   13 26N , 4 29 4
0000ce0: 3745 2e20 2020 2020 2b20 2020 2020 2049  7E.     +      I
0000cf0: 4351 3a20 3237 3732 3137 3133 310a 5355  CQ: 277217131.SU
0000d00: 5345 2039 2e32 2020 2020 2020 2020 2020  SE 9.2
Look in line 0000c70 at the end: "0a2d 2d20", and in line 0000c80 which begins with "0a". Nothing wrong with my .sig.
Perhaps KMail only removes a single signature block and the list server adds a signature block of its own.
[You'll note that since this content does not directly address your most recent post, it does not need to be interleaved with that text and is, in fact, _easier_ to deal with if put at the top where it is most easily seen and read.]
I only notice you insist on making a PITA of yourself, which is your given right, just as it is my right to.ignore you for the time being.
Then why on Earth aren't you ignoring me?
I'm an impossible optimist.
Uh-huh.
I only posted to defend someone you chastised for not hewing to your mindless, dogmatic insistence that there never be any top-posted text whatsoever.
I asked OP politely. Show my chastising please.
You wrote:
Do not toppost again please, it's a pita to keep threads orderly that way.
In my book, that's a chastisement. And I still don't see what this has to do with keeping threads orderly. That's handled by the In-Reply-To and References headers.
And it's my _using_ my mind that makes me decide *not* to toppost.
Then leave it at that and allow others to do the same, even if they come to a different conclusion, OK?
Theo
Randall Schulz
Tue, 01 Mar 2005, by rschulz@sonic.net:
Theo,
On Tuesday 01 March 2005 15:42, Theo v. Werkhoven wrote:
Tue, 01 Mar 2005, by rschulz@sonic.net:
[..]
I only posted to defend someone you chastised for not hewing to your mindless, dogmatic insistence that there never be any top-posted text whatsoever.
I asked OP politely. Show my chastising please.
You wrote:
Do not toppost again please, it's a pita to keep threads orderly that way.
In my book, that's a chastisement.
We use different books. That's ok.
And I still don't see what this has to do with keeping threads orderly. That's handled by the In-Reply-To and References headers.
An orderly thread is also one where I don't have to ask myself what (part of a quote) topposters are replying to.
And it's my _using_ my mind that makes me decide *not* to toppost.
Then leave it at that and allow others to do the same, even if they come to a different conclusion, OK?
Um, no. I still have a problem with inconsiderate TOFU, no matter what you think of that. When it happens on a thread I'm not involved in: no problem, but when someone I try to help does it, it becomes my problem, and so it also becomes the problem of the someone who tries to seek help. Sorry Randalf, agreeing to disagree is the best we can do I'm afraid. Theo -- Theo v. Werkhoven Registered Linux user# 99872 http://counter.li.org ICBM 52 13 26N , 4 29 47E. + ICQ: 277217131 SUSE 9.2 + Jabber: muadib@jabber.xs4all.nl Kernel 2.6.8 + See headers for PGP/GPG info.
Hi All

Sorry for the problem I seem to have created. Actually I just "replied to all", removing your personal e-mail (on my client, Lotus Notes, there is no "reply to the list" function). I have always done it like this on this list and on other lists.

Actually, I don't care about saving a few seconds and I'm ready to do it "the right way", it's just that I'm not sure I understand what you mean by toppost.

Regards
Gaël

"Theo v. Werkhoven" <twe-suse.e@ferrets4me.xs4all.nl> wrote on 01/03/2005 23.48.34:
Tue, 01 Mar 2005, by rschulz@sonic.net:
Theo,
On Tuesday 01 March 2005 14:13, Theo v. Werkhoven wrote:
Tue, 01 Mar 2005, by g.lams@itcilo.org:
Hi
Thanks for the reply.
Do not toppost again please, it's a pita to keep threads orderly that way.
Why not leave it to your mail software to keep threads orderly?
I don't know about yours, but my MUA isn't able to repair topposts, and I do not call an inconsiderate and lazy toppost orderly.
This dogmatic insistence that top-posting is always wrong is just mindlessness. Sometimes top-posting is the best choice.
No, it is not. It might seem so for the poster because it saves a few seconds of *his* "precious" time, but the next replier might have to repair the broken post to make sense of what *he* has to say, thereby being forced to invest much more time in that post than strictly necessary. This is a mailing *list*, meaning that (generally speaking) more than 2 people are involved in a discussion. To keep any discussion sane it is vital that everybody uses the same set of rules, and unless I'm mistaken, replying under a quote and trimming excess material is still the preferred SOP here.
Theo -- Theo v. Werkhoven Registered Linux user# 99872 http://counter.li.org ICBM 52 13 26N , 4 29 47E. + ICQ: 277217131 SUSE 9.2 + Jabber: muadib@jabber.xs4all.nl Kernel 2.6.8 + MSN: twe-msn@ferrets4me.xs4all.nl See headers for PGP/GPG info. +
On Wed, 2005-03-02 at 02:26, g.lams@itcilo.org wrote:
Hi All
Sorry for the problem I seem to have created, actually I just "replied to all" removing your personal e-mail (on my client, Lotus Notes there is no "reply to the list" function). I always did like this on this list and on other lists.
Actually, I don't care about saving a few seconds and I'm ready to do it "the right way", it's just that I'm not sure I understand what you mean by toppost.
Regards
Gaël
It means putting your reply at the -top- of the email instead of inline or at the bottom. Replying inline or at the bottom makes it easier to follow what is going on. Do you read a book from back to front? -- Ken Schneider UNIX since 1989, linux since 1994, SuSE since 1998 * Only reply to the list please* "The day Microsoft makes something that doesn't suck is probably the day they start making vacuum cleaners." -Ernst Jan Plugge
participants (5): g.lams@itcilo.org, Ken Schneider, Patrick Shanahan, Randall R Schulz, Theo v. Werkhoven