Re: [SLE] [OT sed awk grep regexp] How to grep URLs our of an html page
  • From: Adilson Guilherme Vasconcelos Ribeiro <adilson@xxxxxxxxxxx>
  • Date: Wed, 25 Apr 2001 20:12:44 -0300 (BRT)
  • Message-id: <Pine.LNX.4.10.10104252007050.4436-100000@xxxxxxxxxxxxxxxxxxxxx>
Today, Jonathan Wilson wrote...

> Hey,

Hi,
>
> I want to "grep" out an exact list of URLs from a whole bunch of
> downloaded html pages. I can get as far as this sort of thing:
>
> www.onlinebible.net/notes.html:
> href="http://www.answersingenesis.org/TheWord/Files/Notes//mhcc.exe">Matthew
>
> Anyone know the magic string? :-)

maybe this can help (maybe not the best answer, but... :)

grep "your original grep expression" file.html |
sed 's/<[aA] [Hh][Rr][Ee][Ff]=\([^>]*\)>/\1/g; s/"//g'

The first sed command replaces
<a href="proto://abc.def.ghi/asd/qwe/asd"> with
"proto://abc.def.ghi/asd/qwe/asd", and the second command strips the
quotes.
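
If only the URLs are wanted, with the link text and the rest of the line dropped, something along these lines might also work. It is only a rough sketch: the function name extract_urls is made up for illustration, it assumes a grep that supports the -o option (GNU and BSD grep do, but it is not in POSIX), and it only catches href values written in double quotes.

```shell
# Hypothetical helper: read HTML on stdin and print each
# double-quoted href value on its own line.
extract_urls() {
    # -i: match HREF/href in any case; -o: print only the matched part
    grep -io 'href="[^"]*"' |
    # drop everything up to the opening quote, then the closing quote
    sed 's/^[^"]*"//; s/"$//'
}

# Example using the link from the original question
printf '%s\n' '<a href="http://www.answersingenesis.org/TheWord/Files/Notes//mhcc.exe">Matthew</a>' |
extract_urls
```

For a directory of downloaded pages, it could be used as: cat *.html | extract_urls | sort -u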

hope it helps
Best Regards,
Adilson Ribeiro

