This is not my post. This Serafino Conflitti is subverting mails from other people. Be careful.

--
Cheers / Saludos,

Carlos E. R.
(from 15.5 x86_64 at Telcontar)

On 2024-08-20 16:13, Serafino Conflitti wrote:
On 2023-04-28 09:33, Robert Webb via openSUSE Users wrote:
On Thu, 27 Apr 2023 23:03:05 +0200 (CEST), "Carlos E. R." <robin.listas@telefonica.net> wrote:
I have a script (from somebody else) to download the image of the day from NASA. I have modified the script quite a bit to download a range of days, even years of them.

One key function is failing on today's picture. This is the relevant section of the script:

+++......................
function get_page {
    echo "Downloading page to find image"
    #wget http://apod.nasa.gov/apod/ --quiet -O /tmp/apod.html
    wget https://apod.nasa.gov/apod/ap${SHORT_TODAY}.html --quiet -O /tmp/apod.html
    grep -m 1 jpg /tmp/apod.html | sed -e 's/<//' -e 's/>//' -e 's/.*=//' -e 's/"//g' -e 's/^/http:\/\/apod.nasa.gov\/apod\//' > /tmp/pic_url
}

get_page

# Got the link to the image
PICURL=`/bin/cat /tmp/pic_url`
if [ -z $PICURL ]; then
    echo "Not found picture.jpg for $TODAY. Trying .png"
    try_png
    PICURL=`/bin/cat /tmp/pic_url`
    if [ -z $PICURL ]; then
        echo "Not found picture.png for $TODAY. html file saved"
        cp /tmp/apod.html $PICTURES_DIR/$TODAY.html
        return
    fi
fi

NAME=`basename $PICURL`
PICTURE_NAME=${TODAY}_${NAME}
......................++-

The script sets up variable SHORT_TODAY to 230427, so "get_page" downloads <https://apod.nasa.gov/apod/ap230426.html>. It is coughing up this error:

You mean "<https://apod.nasa.gov/apod/ap230427.html>". ap230426.html is "the day before" you mention later below.
No, I wrote on the 27 ;-)
Do 20230427
Downloading page to find image
/home/cer/bin/NASA Picture-Of-The-Day Wallpaper Script, mine, loop: line 98: [: too many arguments
basename: extra operand ‘Tarantula</a>’
Try 'basename --help' for more information.

The problem is here:

cer@Isengard:~> grep -m 1 jpg /tmp/apod.html | sed -e 's/<//' -e 's/>//' -e 's/.*=//' -e 's/"//g' -e 's/^/http:\/\/apod.nasa.gov\/apod\//'
http://apod.nasa.gov/apod/image/1602/Tarantula-HST-ESO-annotated1800.jpgArou... the Tarantula</a>
cer@Isengard:~>

The source html goes like this:

<a href="image/1602/Tarantula-HST-ESO-annotated1800.jpg">Around the Tarantula</a>
are other star forming regions with young star clusters, filaments, and blown-out
<a href="ap080327.html">bubble-shaped</a> clouds.
As Brendan pointed out, one of the problems is that the grep command gets the first line that contains "jpg".
Right.
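The other problem is visible in the two error messages: "[: too many arguments" and "basename: extra operand" both appear when an unquoted variable that contains spaces is split into several words. A minimal sketch of that effect, assuming a PICURL value modeled on the pipeline output shown earlier in the thread (the variable names `unquoted_status`, `quoted_status` and `NAME_QUOTED` are made up for the illustration):

```shell
# Hypothetical value, modeled on what the grep/sed pipeline produced:
# a URL followed by leftover link text that contains spaces.
PICURL='http://apod.nasa.gov/apod/image/1602/Tarantula-HST-ESO-annotated1800.jpgAround the Tarantula</a>'

# Unquoted: the shell splits the value on whitespace, so "[" sees
# several words after -z and fails with a syntax error status.
[ -z $PICURL ] 2>/dev/null && unquoted_status=0 || unquoted_status=$?

# Quoted: "[" sees exactly one word; the test is simply false (status 1)
# because the string is not empty.
[ -z "$PICURL" ] && quoted_status=0 || quoted_status=$?

# basename has the same issue; quoting passes the value as one operand.
NAME_QUOTED="$(basename "$PICURL")"

echo "unquoted test status: $unquoted_status"
echo "quoted test status:   $quoted_status"
echo "basename (quoted):    $NAME_QUOTED"
```

Quoting only silences the errors. The value is still not a usable URL, so the extraction itself also has to be fixed.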
That isn't even the line with the picture-of-the-day link, on this ap230427.html page, because the A.P.O.D. is a PNG instead. It is from the description. The correct link is from here:
When the jpg is not found, I do a second call asking for png instead; you can see it in the susepaste. Some entries are YouTube videos, and those I do not really want (it is for a screensaver).
2023 April 27
<br>
<a href="image/2304/SuperBIT_tarantula.png">
<IMG SRC="image/2304/SuperBIT_tarantula_1024.png"
alt="See Explanation.  Clicking on the picture will download
the highest resolution version available." style="max-width:100%"></a>

You need to grab not just a link *to* an image, but one that *is* an inline image also. The file suffix doesn't matter.

So, the following can replace the grep and sed line, and the PICURL code.

--------------------
# Given an A.P.O.D. HTML page, this skips the lines down through the
# page title heading.  Then it extracts the HREF contents of the first
# link that contains an IMG tag.
# Requires GNU sed for the 'I' flag (ignore regex case) which is used.

# Line-by-line description of ${HREF_EXTRACT_SCRIPT}:
#
#   On each line, remove leading space.
#   Skip lines through the A.P.O.D. heading.
#   Skip blank lines.
#   If IMG tag, goto :img
#   Save this line to the Hold Space.
#   Next input line.  (and go to the beginning of the script)
#   :img
#   Get (the previous line) from the Hold Space.
#   If not an 'A' tag with HREF, next input line.
#   Extract the HREF contents.
#   Print it.
#   Quit after the first print.

HREF_EXTRACT_SCRIPT='
s/^ *//
1,/^<h1> *Astronomy Picture of the Day/Id
/^$/d
/^<img *src="/Ib img
x
b
:img
x
/^<a href="\([^<>"]\+\)">.*$/I!b
s//\1/
p
q
'

[...]

wget -O /tmp/apod.html ...
PIC_HREF="$(sed -n -e "$HREF_EXTRACT_SCRIPT" /tmp/apod.html)"

if [ -z "$PIC_HREF" ] ;then
    printf '%s' "Not found picture.xxx for $TODAY... "
    cp -T /tmp/apod.html $PICTURES_DIR/$TODAY.html \
        && echo " html file saved"
    return 1
else
    PICURL="http://apod.nasa.gov/apod/${PIC_HREF}"
    # printf '%s\n' "$PICURL" > /tmp/pic_url
fi

NAME="$(basename "$PIC_HREF")"

[...]
--------------------
:-o
Thanks.
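For anyone who wants to try Robert's sed script without downloading anything, here is a rough sketch that runs it against a small hand-made page. The sample HTML is an assumption: it is stitched together from the fragments quoted in this thread and only imitates the layout of a real A.P.O.D. page. The names `SAMPLE` and `OLD_MATCH` are made up for the test. GNU sed is assumed, as the script itself requires.

```shell
# The sed script exactly as posted in the thread (GNU sed required).
HREF_EXTRACT_SCRIPT='
s/^ *//
1,/^<h1> *Astronomy Picture of the Day/Id
/^$/d
/^<img *src="/Ib img
x
b
:img
x
/^<a href="\([^<>"]\+\)">.*$/I!b
s//\1/
p
q
'

# Hypothetical sample page: a heading, the inline PNG link, and a
# description line that links to an unrelated .jpg.
SAMPLE="$(mktemp)"
cat > "$SAMPLE" <<'EOF'
<html>
<body>
<center>
<h1> Astronomy Picture of the Day </h1>
<p>
2023 April 27
<br>
<a href="image/2304/SuperBIT_tarantula.png">
<IMG SRC="image/2304/SuperBIT_tarantula_1024.png"
alt="See Explanation." style="max-width:100%"></a>
</center>
<a href="image/1602/Tarantula-HST-ESO-annotated1800.jpg">Around the Tarantula</a>
are other star forming regions
</body>
</html>
EOF

# Old approach: the first line that mentions "jpg" (the description link).
OLD_MATCH="$(grep -m 1 jpg "$SAMPLE")"

# New approach: the href of the first link that wraps an IMG tag.
PIC_HREF="$(sed -n -e "$HREF_EXTRACT_SCRIPT" "$SAMPLE")"

echo "grep -m 1 jpg : $OLD_MATCH"
echo "sed script    : $PIC_HREF"

rm -f "$SAMPLE"
```

On this sample, grep picks up the description line, while the sed script returns the href of the link wrapped around the inline image, whatever its file suffix.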