[opensuse] fun bash script to automate creating url lists for file download (especially from repos)
Guys,

I frequently need to pull multiple files from a remote host, and I like to do it with a list of links that I feed to wget with the -i option (and -b) and let it do its thing. The only pain is building the 'getfile' list. I wrote a little script that helps. If you are interested, you can grab it at:

    http://www.3111skyline.com/dl/dev/scr/net/lynxdump.sh

As the name implies, it uses 'lynx -dump' to generate the list and then parses the output to leave just the links in the output file. By default, it writes all links from the URLs given on the command line to a single output file, without any subdirectories included. Two options modify this behavior: [-d|--dirs] causes subdirectories to be included, and [-r|--rpms] returns only links to rpms (very handy for repos).

The main timesaver of the script is the [-b|--base] option, which tells the script to take the baseURL from the next argument and use it for any subsequent directories specified on the command line. Again, this is very handy for working with repos; you never have to enter the full URL twice. For example, if you wanted the 11.2 libusb0 packages for i586, x86_64 and the src rpms, all you would need to do is:

    lynxdump -b http://download.opensuse.org/repositories/hardware:/libusb0/openSUSE_11.2/i5... src x86_64 -r

to grab the binary and src.rpm links and eliminate all the mirror and metadata junk. The default output file is ./lynxdump.txt (or whatever you specify with the -o option). In the case above, here is the output file you would get:

    01:10 alchemy:~> cat lynxdump.txt
    http://download.opensuse.org/repositories/hardware:/libusb0/openSUSE_11.2/i5...
    http://download.opensuse.org/repositories/hardware:/libusb0/openSUSE_11.2/i5...
    http://download.opensuse.org/repositories/hardware:/libusb0/openSUSE_11.2/i5...
    http://download.opensuse.org/repositories/hardware:/libusb0/openSUSE_11.2/sr...
    http://download.opensuse.org/repositories/hardware:/libusb0/openSUSE_11.2/x8...
    http://download.opensuse.org/repositories/hardware:/libusb0/openSUSE_11.2/x8...
    http://download.opensuse.org/repositories/hardware:/libusb0/openSUSE_11.2/x8...
    http://download.opensuse.org/repositories/hardware:/libusb0/openSUSE_11.2/x8...

All that is needed to retrieve the files now is 'wget -i lynxdump.txt' and you're done. The script works equally well on 1 URL or 50 URLs. The only limitation is that you can't specify more than one baseURL (two -b options are not allowed). The script is reasonably commented and can be quickly adapted to handle times when you just want the .tar.gz files, .pdf files, etc.

For those wanting to learn bash scripting a little better, this is a good script to dissect for a look at command line parameter handling with an array, array indexing, bash functions and returns, etc. The order or position of the options on the command line is unimportant. (Obviously, if you specify -b or -o, the next parameter must be a URL or output file, respectively.)

DNH, all, if you see anything I could do better, let me know. I tried to rework the getdump() function to simply build the dumpSTR (see ## Scraps at the bottom of the script), but I couldn't get lynx to accept a variable containing the pipes, etc. I wanted to eliminate the 3 calls to lynx -dump in the function and simply pass the dumpSTR to a single call of lynx -dump at the bottom of the function, but ... "no dice."

Thanks for any feedback you can offer. Enjoy :p

--
David C. Rankin, J.D.,P.E.
Rankin Law Firm, PLLC
510 Ochiltree Street
Nacogdoches, Texas 75961
Telephone: (936) 715-9333
Facsimile: (936) 715-9339
www.rankinlawfirm.com
--
To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org
For additional commands, e-mail: opensuse+help@opensuse.org
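On the dumpSTR question, one possible approach is sketched here. Bash recognizes the `|` characters when it parses the command line, before any variable is expanded, so pipes stored inside a string variable reach the program as literal arguments instead of building a pipeline. A common way around this is to keep a single pipeline and move the conditional logic into a filter function. The sketch below is only an illustration under assumptions: `filter_links` and this version of `getdump` are hypothetical names and are not taken from lynxdump.sh, and it assumes the numbered reference list that `lynx -dump -listonly` prints (lines such as `   3. http://host/path`).

```shell
#!/bin/bash
# Hypothetical helper, not from lynxdump.sh. Reads 'lynx -dump -listonly'
# style output on stdin and prints only the bare URLs.
#   $1 = 1 to keep only links ending in .rpm, 0 to keep every file link
#   $2 = 1 to keep directory links (ending in /), 0 to drop them
filter_links() {
    local rpms_only=$1 keep_dirs=$2
    awk -v rpms="$rpms_only" -v dirs="$keep_dirs" '
        # Reference lines have a number like "3." in field 1 and a URL in field 2.
        $1 ~ /^[0-9]+\.$/ && $2 ~ /^(https?|ftp):\/\// {
            url = $2
            if (rpms == 1 && url !~ /\.rpm$/) next   # rpm mode: rpms only
            if (dirs != 1 && url ~ /\/$/) next       # drop sub-directories
            print url
        }'
}

# Assumed shape of a single-call getdump(): one lynx invocation per URL,
# with the options selecting the filtering instead of the pipeline text.
getdump() {
    lynx -dump -listonly "$1" | filter_links "$2" "$3"
}
```

With something like this, `getdump "$url" 1 0 >> lynxdump.txt` would append only the rpm links for one URL, and the options never have to be spliced into a command string.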
On 06/07/2010 01:28 AM, David C. Rankin wrote:
Guys,
I frequently need to pull multiple files from a remote host and I like to do it with a list of links that I just feed to wget with the -i option (and -b) and let it do its thing. The only pain is building the 'getfile' list. I wrote a little script that helps. If you are interested, you can grab it at:
http://www.3111skyline.com/dl/dev/scr/net/lynxdump.sh
As the name implies, it uses 'lynx -dump' to generate the list and then it parses the return to leave just the links in the output file. Normally, it will just create an output file with all links from the urls given on the command line in a single file 'without' any subdirectories included.
Update: I have included a new --nodebug flag that works with the --rpm flag to exclude all debugsource and debuginfo rpms from the list of rpm URLs returned. The changes are reflected in the updated help for the lynxdump script:

    00:57 nirvana:/srv/http/dl/dev/scr/net> sh lynxdump.sh

    Error: No input URL provided, exiting...

    Usage: lynxdump.sh [-h|--help] [-v|--verbose] [-r|--rpm] [--nodebug] [-d|--dirs] [-b|--base] url-with-links [-o|--outfile outfile]

    lynxdump.sh uses 'lynx -dump' to capture all links from 'url-with-links'
    and parses the output leaving only the direct URLs. The resulting links
    written to 'outfile' (default: ./lynxdump.txt) can be used with
    'wget -i outfile' to retrieve all files from the remote host.

    Options:

      -h | --help      show this help and exit (must be only option given).
      -b | --base      the next URL provides the baseURL information as well
                       as a directory (i.e. -b http://download.lynx.org/docs)
                       All other urls with the same baseURL need only provide
                       the directory name (i.e. download, svn).
      -d | --dirs      include sub-directories in the list of links.
      -o | --outfile   the following command line option provides the output
                       file name.
      -r | --rpm       changes dump file parsing so that only rpm links are
                       saved.
      --nodebug        excludes debuginfo and debugsource files.
                       (use with -r | --rpm)
      -v | --verbose   additional output of script operations.

    Example:

      lynxdump -b http://download.opensuse.org/repositories/X11/i586 src x86_64 --rpm --nodebug

    creates an output file with the links to rpms in ../X11/i586, ../X11/src
    and ../X11/x86_64 directories without the debuginfo or debugsource files
    included.
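For anyone curious how position-independent options and the -b baseURL expansion might be handled with an array, here is a minimal sketch. It is not the code from lynxdump.sh: `parse_args` and the variables `urls`, `outfile`, `rpms`, `nodebug` and `dirs` are hypothetical names, and the rule that the baseURL is the -b URL minus its last path component is an assumption based on the help text above.

```shell
#!/bin/bash
# Hypothetical sketch, not taken from lynxdump.sh. Fills the globals
# urls (array), outfile, rpms, nodebug and dirs from the arguments.
parse_args() {
    urls=() outfile=./lynxdump.txt rpms=0 nodebug=0 dirs=0
    local base="" i=0 arg
    local args=("$@") pending=()
    while (( i < ${#args[@]} )); do
        arg=${args[i]}
        case $arg in
            -b|--base)
                [[ -z $base ]] || { echo "only one -b allowed" >&2; return 1; }
                (( ++i < ${#args[@]} )) || { echo "missing URL after $arg" >&2; return 1; }
                urls+=("${args[i]}")
                base=${args[i]%/}     # drop a trailing slash, if any
                base=${base%/*}       # assumed: baseURL = URL minus last component
                ;;
            -o|--outfile)
                (( ++i < ${#args[@]} )) || { echo "missing file after $arg" >&2; return 1; }
                outfile=${args[i]}
                ;;
            -r|--rpm)   rpms=1 ;;
            --nodebug)  nodebug=1 ;;
            -d|--dirs)  dirs=1 ;;
            http://*|https://*|ftp://*) urls+=("$arg") ;;
            *)          pending+=("$arg") ;;   # bare directory name, expanded later
        esac
        i=$((i + 1))
    done
    # Bare directory names are expanded last, so they may appear before -b.
    for arg in "${pending[@]}"; do
        [[ -n $base ]] || { echo "directory '$arg' given without -b" >&2; return 1; }
        urls+=("$base/$arg")
    done
}
```

Deferring the bare directory names until the loop has finished is what lets `src -b URL x86_64` and `-b URL src x86_64` yield the same set of URLs.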
If you would like to grab a few of your favorite 11.0 rpms before they mysteriously disappear, then you can simply issue the command:

    lynxdump -b http://download.opensuse.org/repositories/your_favorite/openSUSE_11.0/i586 src noarch x86_64 -r --nodebug -o getfile.txt

That will create a list of complete URLs to all rpms contained in all 4 of the repository directories, without including any debuginfo or debugsource rpms. Then, to retrieve the packages to your local system, simply use wget -i URLsFile, here:

    wget -i getfile.txt -b    ## add the -b to wget to background the retrieval.

Have fun. If you find any bugs, let me know :)

--
David C. Rankin, J.D.,P.E.
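If you already have a list built without --nodebug, the same effect can be approximated after the fact. This is only a sketch under an assumption about file naming: it treats any rpm whose file name contains `-debuginfo-` or `-debugsource-` as a debug package. `drop_debug` is a hypothetical name and is not part of lynxdump.sh.

```shell
#!/bin/bash
# Hypothetical filter, not from lynxdump.sh. Reads a URL list on stdin and
# drops rpms whose file name contains -debuginfo- or -debugsource-.
drop_debug() {
    grep -Ev -- '-debug(info|source)-[^/]*\.rpm$'
}
```

For example, `drop_debug < lynxdump.txt > getfile.txt` would write the trimmed list to getfile.txt, ready for 'wget -i getfile.txt'.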