On 2023-04-27 23:46, Brendan McKenna wrote:
Hi,
On 27/04/2023 22:03, Carlos E. R. wrote:
Hi,
....
The problem is here:
cer@Isengard:~> grep -m 1 jpg /tmp/apod.html | sed -e 's/<//' -e 's/>//' -e 's/.*=//' -e 's/"//g' -e 's/^/http:\/\/apod.nasa.gov\/apod\//'
The problem above is a combination of two things -- it's grabbing the first line from the HTML file that contains "jpg", and the sed code doesn't account for the chance that the label text could be on the same line.
Right.
If you replace the sed command with:
sed -e 's/.*="//' -e 's/">.*//' -e 's/^/http:\/\/apod.nasa.gov\/apod\//'
That will work in both cases. The first bit (the first value for the "-e" parameter), s/.*="//, gets rid of everything on the line up to and including the characters =" (of the href or SRC attribute, depending on whether the line holds the <a> or the <IMG> tag). Because .* is greedy, a line with more than one =" is cut at the last occurrence, not the first. The second bit, s/">.*//, gets rid of the first occurrence of "> in the remaining bit of the string, along with everything after it. The last bit adds the http://.../ stuff to the beginning of the URL.
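As a quick check, here is a minimal sketch that runs the replacement sed command on two sample lines, one with and one without text after the closing ">. The sample lines and the apod_url function name are made up for illustration; the real APOD markup may look different.

```shell
# Hypothetical helper wrapping the replacement sed command.
apod_url() {
    sed -e 's/.*="//' -e 's/">.*//' -e 's/^/http:\/\/apod.nasa.gov\/apod\//'
}

# Made-up sample lines (assumptions, not copied from the real page).
plain='<a href="image/2304/example.jpg">'
with_label='<a href="image/2304/example.jpg">some label text'

# Both lines should reduce to the same URL.
printf '%s\n' "$plain" | apod_url
printf '%s\n' "$with_label" | apod_url
```

With a single =" on the line, both inputs come out as http://apod.nasa.gov/apod/image/2304/example.jpg, because the trailing label text is removed together with the ">.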
Thanks :-)

--
Cheers / Saludos,
Carlos E. R.
(from 15.4 x86_64 at Telcontar)