Re: [SLE] [OT sed awk grep regexp] How to grep URLs out of an html page
- From: Adilson Guilherme Vasconcelos Ribeiro <adilson@xxxxxxxxxxx>
- Date: Wed, 25 Apr 2001 20:12:44 -0300 (BRT)
- Message-id: <Pine.LNX.4.10.10104252007050.4436-100000@xxxxxxxxxxxxxxxxxxxxx>
Today, Jonathan Wilson wrote...
> Hey,
Hi,
>
> I want to "grep" out an exact list of URLs from a whole bunch of
> downloaded html pages. I can get as far as this sort of thing:
>
> www.onlinebible.net/notes.html:
> href="http://www.answersingenesis.org/TheWord/Files/Notes//mhcc.exe">Matthew
>
> Anyone know the magic string? :-)
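Maybe this can help (maybe not the best answer, but... :)
grep "your original grep expression" file.html |
sed "s/<[aA] [Hh][Rr][Ee][Ff]=\([^>]*\)>/\1/g; s/\"//g"
The first sed command replaces each
<a href="proto://abc.def.ghi/asd/qwe/asd"> tag with
"proto://abc.def.ghi/asd/qwe/asd", and the second command strips the
quotes. The pattern uses [^>]* rather than .* so that the match stops at
the end of the tag instead of running on to the last ">" on the line.
Text outside the tag (the link text, the closing </a>) is left in the
output.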
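Another way to get only the URLs, one per line, is to let grep print just the matching part of each line and then trim the attribute name and quotes with sed. This is only a sketch: it assumes a grep that supports -o and -h (GNU and BSD grep do), it only handles double-quoted href values, and the function name extract_hrefs is made up for illustration.

```shell
# Hypothetical helper: print every double-quoted href value found in the
# given files, one URL per line.
extract_hrefs() {
  # -o prints only the matched text, -h drops the file name prefix,
  # -i matches href, HREF, Href, ...
  grep -ohi 'href="[^"]*"' "$@" |
    # Remove everything up to and including the opening quote,
    # then remove the closing quote.
    sed 's/^[^"]*"//; s/"$//'
}
```

It would be called as "extract_hrefs file.html" or "extract_hrefs *.html". Unlike the substitution approach, nothing but the href values reaches the output, so the link text and the rest of the line are dropped.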
hope it helps
>
Best Regards,
Adilson Ribeiro