Re: [oS-en] sed question
On 2023-04-28 09:33, Robert Webb via openSUSE Users wrote:
On Thu, 27 Apr 2023 23:03:05 +0200 (CEST), "Carlos E. R." <robin.listas@telefonica.net> wrote:
I have a script (from somebody else) to download the image of the day from the NASA. I have modified the script quite a bit to download a range of days, even years of them. One key function is failing on today's picture. This is the relevant section of the script:

+++......................
function get_page {
    echo "Downloading page to find image"
    #wget http://apod.nasa.gov/apod/ --quiet -O /tmp/apod.html
    wget https://apod.nasa.gov/apod/ap${SHORT_TODAY}.html --quiet -O /tmp/apod.html
    grep -m 1 jpg /tmp/apod.html | sed -e 's/<//' -e 's/>//' -e 's/.*=//' -e 's/"//g' -e 's/^/http:\/\/apod.nasa.gov\/apod\//' > /tmp/pic_url
}

get_page

# Got the link to the image
PICURL=`/bin/cat /tmp/pic_url`
if [ -z $PICURL ]; then
    echo "Not found picture.jpg for $TODAY. Trying .png"
    try_png
    PICURL=`/bin/cat /tmp/pic_url`
    if [ -z $PICURL ]; then
        echo "Not found picture.png for $TODAY. html file saved"
        cp /tmp/apod.html $PICTURES_DIR/$TODAY.html
        return
    fi
fi

NAME=`basename $PICURL`
PICTURE_NAME=${TODAY}_${NAME}
......................++-

The script sets up variable SHORT_TODAY to 230427, so "get_page" downloads <https://apod.nasa.gov/apod/ap230426.html>. It is coughing up this error:

You mean "<https://apod.nasa.gov/apod/ap230427.html>". ap230426.html is "the day before" you mention later below.
No, I wrote on the 27 ;-)
Do 20230427
Downloading page to find image
/home/cer/bin/NASA Picture-Of-The-Day Wallpaper Script, mine, loop: line 98: [: too many arguments
basename: extra operand ‘Tarantula</a>’
Try 'basename --help' for more information.

The problem is here:

cer@Isengard:~> grep -m 1 jpg /tmp/apod.html | sed -e 's/<//' -e 's/>//' -e 's/.*=//' -e 's/"//g' -e 's/^/http:\/\/apod.nasa.gov\/apod\//'
http://apod.nasa.gov/apod/image/1602/Tarantula-HST-ESO-annotated1800.jpgArou... the Tarantula</a>
cer@Isengard:~>

The source html goes like this:

<a href="image/1602/Tarantula-HST-ESO-annotated1800.jpg">Around the Tarantula</a>
are other star forming regions with young star clusters, filaments, and blown-out
<a href="ap080327.html">bubble-shaped</a> clouds.
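Both error messages above are consistent with the unquoted $PICURL being split on spaces once the extracted text contains link text. The following is a minimal sketch of that effect, not part of the original script: the PICURL value is a made-up stand-in modelled on the output above, and count_args is a hypothetical helper used only to show how many arguments the shell passes.

```shell
#!/bin/bash
# Hypothetical value, modelled on the broken sed output quoted above.
PICURL='http://apod.nasa.gov/apod/image/1602/example.jpgAround the Tarantula</a>'

# Hypothetical helper (not in the original script): print the number of
# arguments it was called with.
count_args() { echo "$#"; }

# Unquoted: the shell splits the value on whitespace before the command runs,
# which is what "[ -z $PICURL ]" and "basename $PICURL" were receiving.
unquoted_count=$(count_args $PICURL)

# Quoted: the value stays a single argument.
quoted_count=$(count_args "$PICURL")

echo "unquoted: $unquoted_count argument(s), quoted: $quoted_count argument(s)"

# With quotes the test itself is well-formed, even for a bad value.
if [ -z "$PICURL" ]; then
    echo "PICURL is empty"
else
    echo "PICURL is set"
fi
```

Quoting only stops the shell errors; the extracted value is still wrong, so the extraction itself also needs fixing.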
As Brendan pointed out, one of the problems is that the grep command gets the first line that contains "jpg".
Right.
That isn't even the line with the picture-of-the-day link, on this ap230427.html page, because the A.P.O.D. is a PNG instead. It is from the description. The correct link is from here:
When the jpg is not found, I do a second call asking for png instead: you can see it in the susepaste. And some are a youtube video, those I do not really want (it is for a screensaver).
2023 April 27 <br>
<a href="image/2304/SuperBIT_tarantula.png">
<IMG SRC="image/2304/SuperBIT_tarantula_1024.png" alt="See Explanation. Clicking on the picture will download the highest resolution version available." style="max-width:100%"></a>

You need to grab not just a link *to* an image, but one that *is* an inline image also. The file suffix doesn't matter. So, the following can replace the grep and sed line, and the PICURL code.

--------------------
# Given an A.P.O.D. HTML page, this skips the lines down through the
# page title heading. Then it extracts the HREF contents of the first
# link that contains an IMG tag.
# Requires GNU sed for the 'I' flag (ignore regex case) which is used.

# Line-by-line description of ${HREF_EXTRACT_SCRIPT}:
#
#   On each line, remove leading space.
#   Skip lines through the A.P.O.D. heading.
#   Skip blank lines.
#   If IMG tag, goto :img
#   Save this line to the Hold Space.
#   Next input line. (and go to the beginning of the script)
#   :img
#   Get (the previous line) from the Hold Space.
#   If not an 'A' tag with HREF, next input line.
#   Extract the HREF contents.
#   Print it.
#   Quit after the first print.

HREF_EXTRACT_SCRIPT='
s/^ *//
1,/^<h1> *Astronomy Picture of the Day/Id
/^$/d
/^<img *src="/Ib img
x
b
:img
x
/^<a href="\([^<>"]\+\)">.*$/I!b
s//\1/
p
q
'

[...]

wget -O /tmp/apod.html ...

PIC_HREF="$(sed -n -e "$HREF_EXTRACT_SCRIPT" /tmp/apod.html)"

if [ -z "$PIC_HREF" ] ;then
    printf '%s' "Not found picture.xxx for $TODAY... "
    cp -T /tmp/apod.html $PICTURES_DIR/$TODAY.html \
        && echo " html file saved"
    return 1
else
    PICURL="http://apod.nasa.gov/apod/${PIC_HREF}"
    # printf '%s\n' "$PICURL" > /tmp/pic_url
fi

NAME="$(basename "$PIC_HREF")"

[...]
--------------------
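To see the difference between the two approaches, the sed script from the post can be tried against a small test page. This is only a sketch under assumptions: the sample HTML is a cut-down imitation of the page (the real one has far more markup), extract_pic_href is a hypothetical wrapper name, and GNU sed is required for the 'I' flag, as the post says.

```shell
#!/bin/bash
# The sed script as posted (needs GNU sed for the 'I' regex flag).
HREF_EXTRACT_SCRIPT='
s/^ *//
1,/^<h1> *Astronomy Picture of the Day/Id
/^$/d
/^<img *src="/Ib img
x
b
:img
x
/^<a href="\([^<>"]\+\)">.*$/I!b
s//\1/
p
q
'

# Hypothetical wrapper: print the HREF of the first link that wraps an IMG tag.
extract_pic_href() {
    sed -n -e "$HREF_EXTRACT_SCRIPT" "$1"
}

# Cut-down sample page (an assumption, not the real A.P.O.D. markup).
sample=$(mktemp)
cat > "$sample" <<'EOF'
<html>
<body>
<center>
<h1> Astronomy Picture of the Day </h1>
<p>
2023 April 27
<br>
<a href="image/2304/SuperBIT_tarantula.png">
<IMG SRC="image/2304/SuperBIT_tarantula_1024.png"
alt="See Explanation." style="max-width:100%"></a>
</center>
<a href="image/1602/Tarantula-HST-ESO-annotated1800.jpg">Around the Tarantula</a>
are other star forming regions.
</body>
</html>
EOF

# New approach: the link that contains the inline image.
PIC_HREF=$(extract_pic_href "$sample")

# Old approach: the first line mentioning "jpg", here a link in the description.
OLD_MATCH=$(grep -m 1 jpg "$sample")

echo "sed script: $PIC_HREF"
echo "old grep:   $OLD_MATCH"

rm -f "$sample"
```

The script relies on the A tag sitting on the line directly before the IMG tag, which is how the page quoted above is laid out; a page with a different layout would need a different pattern.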
:-o Thanks. -- Cheers / Saludos, Carlos E. R. (from 15.4 x86_64 at Telcontar)
This is not my post. This Serafino Conflitti is subverting mails from other people. Be careful. -- Cheers / Saludos, Carlos E. R. (from 15.5 x86_64 at Telcontar) On 2024-08-20 16:13, Serafino Conflitti wrote:
[...]
On 8/20/24 10:07 AM, Carlos E. R. wrote:
This is not my post.
This Serafino Conflitti is subverting mails from other people. Be careful.
Carlos,

See Masaru's reply to my report of the same issue. "Re: Why are packman package names not consistent?"

Apparently this is legitimate and the result of an error that occurred while trying to sort out list operations on an iPhone???

If Masaru will vouch for Serafino -- I'm good with his explanation.

Masaru doesn't have another alias "Jia Tan" -- does he?

--
David C. Rankin, J.D.,P.E.
On 2024-08-20 23:11, David C. Rankin wrote:
On 8/20/24 10:07 AM, Carlos E. R. wrote:
This is not my post.
This Serafino Conflitti is subverting mails from other people. Be careful.
Carlos,
See Masaru's reply to my report of the same issue. "Re: Why are packman package names not consistent?"
I saw it.
Apparently this is legitimate and the result of an error that occurred while trying to sort out list operations on an iPhone???
No, it is not legitimate. His name means "conflict" in Italian. It is someone trying to stir conflict.
If Masaru will vouch for Serafino -- I'm good with his explanation.
Masaru doesn't have another alias "Jia Tan" -- does he?
Dunno. -- Cheers / Saludos, Carlos E. R. (from 15.5 x86_64 at Telcontar)
On 8/20/24 6:00 PM, Carlos E. R. wrote:
No, it is not legitimate. His name means "conflict" in Italian. It is someone trying to stir conflict.
If Masaru will vouch for Serafino -- I'm good with his explanation.
Masaru doesn't have another alias "Jia Tan" -- does he?
Dunno.
Maybe Masaru knows more than he is letting on. I was certainly a bit unnerved seeing a post purporting to be from me that wasn't.

The post from not-you originated from:

Received: from smtpclient.apple (unknown [72.136.106.121])
    by mail-c-01.b2b2c.net (Postfix) with ESMTPSA id 4WpBJd4y6QzJCwv
    for <users@lists.opensuse.org>; Tue, 20 Aug 2024 10:13:21 -0400 (EDT)

The not-me post originated from:

Received: from smtpclient.apple (unknown [72.136.117.178])
    by mail-c-01.b2b2c.net (Postfix) with ESMTPSA id 4WnqGm2S5jzJCvh
    for <users@lists.opensuse.org>; Mon, 19 Aug 2024 19:55:28 -0400 (EDT)

All from "Rogers Communications Canada Inc." IP block. Canada along with South America seem to be new proxy targets for spammers.

Masaru,

Can you add a bit more detail about what is going on here?

--
David C. Rankin, J.D.,P.E.
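Comparing the originating addresses in headers like the two above can be done mechanically. The following is a rough sketch only: received_ip is a hypothetical helper, the header text is shortened from the lines quoted above, and a shared address prefix is no more than a hint (confirming the owner of a block would need a whois lookup, which is not shown).

```shell
#!/bin/bash
# Hypothetical helper: print the bracketed client IP of every
# "Received: from ..." line read from standard input.
received_ip() {
    sed -n 's/^Received: from [^[]*\[\([0-9.]*\)].*/\1/p'
}

# Shortened copies of the two header lines quoted above.
headers='Received: from smtpclient.apple (unknown [72.136.106.121]) by mail-c-01.b2b2c.net (Postfix) with ESMTPSA id 4WpBJd4y6QzJCwv
Received: from smtpclient.apple (unknown [72.136.117.178]) by mail-c-01.b2b2c.net (Postfix) with ESMTPSA id 4WnqGm2S5jzJCvh'

# One IP per line.
ips=$(printf '%s\n' "$headers" | received_ip)

# First two octets of each address, de-duplicated, as a crude similarity check.
prefixes=$(printf '%s\n' "$ips" | cut -d. -f1,2 | sort -u)

printf 'IPs:\n%s\n' "$ips"
printf 'Distinct two-octet prefixes:\n%s\n' "$prefixes"
```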
Hello,

In the Message;

  Subject    : Re: [oS-en] sed question
  Message-ID : <b6bcdb4a-8ebf-4fda-90db-da2105383c61@gmail.com>
  Date & Time: Tue, 20 Aug 2024 16:11:54 -0500

[DCR] == "David C. Rankin" <drankinatty@gmail.com> has written:

DCR> On 8/20/24 10:07 AM, Carlos E. R. wrote:
DCR> >
DCR> > This is not my post.
DCR> >
DCR> > This Serafino Conflitti is subverting mails from other people. Be careful.

DCR> Carlos,

DCR> See Masaru's reply to my report of the same issue. "Re: Why are packman
DCR> package names not consistent?"

DCR> Apparently this is legitimate and the result of an error that
DCR> occurred while trying to sort out list operations on an iPhone???

DCR> If Masaru will vouch for Serafino -- I'm good with his explanation.

DCR> Masaru doesn't have another alias "Jia Tan" -- does he?

Sorry, I was wrong.

Serafino's PC or iPhone was infected with the virus Emotet, which is sending mail on its own.

Here is the result of the virus check of the outgoing mail.

$ clamscan 2
Loading:     9s, ETA:   0s [========================>]    8.70M/8.70M sigs
Compiling:   2s, ETA:   0s [========================>]       41/41 tasks

----------- SCAN SUMMARY -----------
Known viruses: 8697584
Engine version: 1.4.0
Scanned directories: 0
Scanned files: 1
Infected files: 0
Data scanned: 0.02 MB
Data read: 0.01 MB (ratio 2.50:1)
Time: 11.487 sec (0 m 11 s)
Start Date: 2024:08:21 10:46:11
End Date:   2024:08:21 10:46:23

It is a dangerous virus, so we should block emails from Serafino.

Best Regards.

---
┏━━┓彡 Masaru Nomiya  mail-to: nomiya @ lake.dti.ne.jp
┃\/彡
┗━━┛ "To hire for skills, firms will need to implement robust and intentional changes in their hiring practices ― and change is hard."
      -- Employers don’t practice what they preach on skills-based hiring --
On 8/20/24 8:55 PM, Masaru Nomiya wrote:
Sorry, I was wrong.
Serafino's PC or iPhone was infected with the virus Emotet, which is sending mail on its own.
Okay,

That makes sense. So the virus is, among whatever else, going through his e-mails and randomly spamming out replies (clumsily) to wherever they came from -- or wherever they end up getting sent to.

Masaru -- is Serafino's PC running Linux? This sounds like a Windows issue, but if it's a Linux issue, we need to identify what it is and how it works to make sure we can protect against it.

Any idea what he is running and what virus it is -- and importantly whether openSUSE is vulnerable?

--
David C. Rankin, J.D.,P.E.
Hello,

In the Message;

  Subject    : Re: [oS-en] sed question
  Message-ID : <38c3ea66-56c0-4d9f-a215-6eb934900ab0@gmail.com>
  Date & Time: Tue, 20 Aug 2024 22:31:21 -0500

[DCR] == "David C. Rankin" <drankinatty@gmail.com> has written:

DCR> On 8/20/24 8:55 PM, Masaru Nomiya wrote:
MN> > Sorry, I was wrong.
MN> > Serafino's PC or iPhone was infected with the virus Emotet, which is sending
MN> > mail on its own.

DCR> Okay,

DCR> That makes sense. So the virus is doing, among whatever else,
DCR> going through his e-mails and randomly spamming out replies
DCR> (clumsily) to wherever they came from -- or wherever they end up
DCR> getting sent to.

DCR> Masaru -- is Serafino's PC running Linux? This sounds like a
DCR> Windows issue,

Maybe he opened a Word or Excel file on his iPhone.

DCR> but if it's a Linux issue, we need to identify what it is and
DCR> how it works to make sure we can protect against it.

As you said, it's a Windows issue, and I didn't see any reports of infection on Linux.

DCR> Any idea what he is running and what virus it is -- and
DCR> importantly whether openSUSE is vulnerable?

I don't think openSUSE is vulnerable.

Emotet is raging all over the world and many places are taking measures against it, so why don't we just use anti-virus software to protect ourselves?

Of course, I have asked the administrator of this ML to block emails from Serafino.

I think it is quite difficult to get past the virus checks of the providers and this ML.

I think our precaution is not to open attachments easily.

Best Regards.

---
┏━━┓彡 Masaru Nomiya  mail-to: nomiya @ lake.dti.ne.jp
┃\/彡
┗━━┛ "Japan was the future but it's stuck in the past"
      -- Rupert Wingfield-Hayes (BBC) --
Hello,

I've received;

In the Message;

  Subject    : Re: Request to block emails from Serafino
  Message-ID : <26532299.1r3eYUQgxm@tux.boltz.de.vu>
  Date & Time: Wed, 21 Aug 2024 22:43:03 +0200

[CB] == Christian Boltz <opensuse@cboltz.de> has written:

CB> Hello,

CB> Am Mittwoch, 21. August 2024, 05:09:32 MESZ schrieb Masaru Nomiya:

MN> > As you may have already noticed, Serafino's PC or iPhone is infected
MN> > with the virus Emotet, which has been sending past emails to the
MN> > Mailing List since yesterday.

MN> > As you know, Emotet is a dangerous virus, so please block emails from
MN> > Serafino as soon as possible.

CB> I just enabled moderation for Serafino's mail address.

Thanks, Christian.

Best Regards.

---
┏━━┓彡 Masaru Nomiya  mail-to: nomiya @ lake.dti.ne.jp
┃\/彡
┗━━┛ "Japan was the future but it's stuck in the past"
      -- Rupert Wingfield-Hayes (BBC) --
participants (4)
- Carlos E. R.
- David C. Rankin
- Masaru Nomiya
- Serafino Conflitti