[opensuse] How would you do this?
Hi Folks,

I've got a web site that's being "retired", so to speak, and will be replaced with a shiny new presentation. But I want to keep the old one on-line for historical reference at a different domain name. For example, the domain hosting the new content would be foobar.org, the one I want to keep for history would be foobar.info.

So I need to go through the old html and substitute any references of foobar.org to foobar.info. There are 3705 separate lines in probably 1000 files in a large filesystem hierarchy containing the string that needs swapping. I'd rather not do it by hand!

What would you use? awk? A shell script? Something using grep and sed?

Thanks in advance for any advice,
Lew
--
To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org
To contact the owner, e-mail: opensuse+owner@opensuse.org
Lew Wolfgang wrote:
What would you use? awk? A shell script? Something using grep and sed?
Hi Lew

find and sed:

    find <topdir> -type f -print0 | xargs -0 sed -i -e 's/foobar.org/foobar.info/g'

Not tested, don't try it without having a backup!

--
Per Jessen, Zürich (13.6°C)
http://www.dns24.ch/ - free dynamic DNS, made in Switzerland.
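A hedged variant of the find and sed pipeline, as a minimal sketch only. In a sed regex an unescaped "." matches any character, so the pattern foobar.org would also match a string such as foobarXorg; escaping the dot avoids that. The sketch also asks sed to keep a copy of each file it rewrites. The function name swap_domain and the .bak suffix are assumptions made for illustration; neither comes from the thread.

```shell
#!/bin/sh
# swap_domain DIR: hypothetical helper, not from the thread.
# Rewrites foobar.org to foobar.info in every regular file under DIR.
# sed -i.bak rewrites each file it is given and leaves the previous
# contents next to it as <name>.bak (this roughly doubles disk usage).
swap_domain() {
    # Skip .bak files so a second run does not process the backups.
    find "$1" -type f ! -name '*.bak' -print0 |
        xargs -0 sed -i.bak -e 's/foobar\.org/foobar.info/g'
}

# Usage (on a copy of the site, after a backup):
#   swap_domain /path/to/site-copy
```

Once the result has been checked, the .bak files can be removed with a second find.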
On 08/21/2014 11:35 PM, Per Jessen wrote:
find and sed :
find <topdir> -type f -print0 | xargs -0 sed -i -e 's/foobar.org/foobar.info/g'
Not tested, don't try it without having a backup!
My thanks to Per, Dirk, Rolf, Greg, and Carlos for offering suggestions. I didn't need to capture the site data: I've got control of the source and it's backed up in multiple off-site locations. It's about 6 GB of content.

This phrase will work well, I tested it on an off-line image:

    find . -name \*.html -print0 | xargs -0 sed -i -e 's/foo.org/foo.info/g'

The site is used for reference purposes, and has a lot of non-html content, such as text files, which should preserve the old domain name for historical purposes. I also ran it against .pl and .php files.

In case you're wondering, this is a museum web site that contains lots of historic reference material. It was written in plain html by hand with vi over the course of 20 years. It is admittedly "old fashioned" and management wanted a new modern look. But none of the volunteers who contributed to the old site wanted to participate, so management hired out the work. Alas, they underestimated the magnitude of the task by an order of magnitude or so. The new site looks great, but is very shallow, and will be so for a long time.

So I took the old content and registered a new name under .info and will keep that on-line until/if they ever get the data completely ported. This is also a volunteer effort on my part, it's not my day job.

The old site does run on openSUSE on a dedicated server! They stuck the new site into the cloud somewhere and are tied in closely with Google, which also makes me nervous...

Regards,
Lew
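One way to check a run like this, sketched under assumptions: count the matching lines (the original post mentions 3705 of them) before and after, restricted to the .html, .pl and .php files named above. The function names count_refs and swap_refs are hypothetical, the dot in the pattern is escaped so that it only matches a literal dot, and sed -i without a suffix assumes GNU sed as shipped with openSUSE.

```shell
#!/bin/sh
# Hypothetical helpers, not from the thread. Both look only at
# .html, .pl and .php files under the directory given as $1.

# count_refs DIR: print how many lines still mention foo.org.
count_refs() {
    find "$1" -type f \( -name '*.html' -o -name '*.pl' -o -name '*.php' \) \
        -exec grep -h 'foo\.org' {} + | wc -l
}

# swap_refs DIR: rewrite foo.org to foo.info in place (GNU sed -i).
swap_refs() {
    find "$1" -type f \( -name '*.html' -o -name '*.pl' -o -name '*.php' \) \
        -exec sed -i -e 's/foo\.org/foo.info/g' {} +
}

# Usage, on an off-line image:
#   count_refs .    # before
#   swap_refs .
#   count_refs .    # after; text files are left alone
```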
Lew Wolfgang wrote:
What would you use? awk? A shell script? Something using grep and sed?
try this shellscript:

    #!/bin/bash
    WEBSITEROOT=/srv/web/site/original
    COPY=/srv/web/site/modified
    cp -R $WEBSITEROOT $COPY   # recursive copy of the website directory tree
    cd $COPY                   # go to copy
    # filter each file through sed into a temp file, then move it back
    find . -type f -exec sh -c 'sed -e "s/foober.info/foobar.org/g" "$1" > tmp.$$ && mv tmp.$$ "$1"' sh {} \;

It's safe, because it only modifies the copy. And you can run it over and over.
On Friday, 22 August 2014, Dirk Gently wrote:
try this shellscript:

    #!/bin/bash
    WEBSITEROOT=/srv/web/site/original
    COPY=/srv/web/site/modified
    cp -R $WEBSITEROOT $COPY   # recursive copy of the website directory tree
    cd $COPY                   # go to copy
    # filter each file through sed into a temp file, then move it back
    find . -type f -exec sh -c 'sed -e "s/foober.info/foobar.org/g" "$1" > tmp.$$ && mv tmp.$$ "$1"' sh {} \;
Should it not be the other way round?

    ...sed -e "s/foobar.org/foobar.info/g"...
--
Kind regards!
Rolf Muth
My addresses may not be used for advertising!
S/MIME certificate 0x25F0E92D9AE21AE6
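Putting the copy-then-edit idea together with the corrected direction (foobar.org to foobar.info) could look roughly like the sketch below. It is not the script from the thread: the function name copy_and_swap, the refusal to reuse an existing target, and the restriction to .html files are assumptions made for illustration.

```shell
#!/bin/sh
# copy_and_swap SRC DST: hypothetical helper, not from the thread.
# Copies the tree SRC to DST, then rewrites foobar.org to foobar.info
# in the .html files of the copy only. SRC is never modified.
copy_and_swap() {
    src=$1
    dst=$2
    if [ -e "$dst" ]; then
        echo "refusing to reuse existing $dst" >&2
        return 1
    fi
    cp -R "$src" "$dst" || return 1
    # Each file is filtered into a temporary file; the result is then
    # written back with cat so the file keeps its original permissions.
    find "$dst" -type f -name '*.html' -exec sh -c '
        for f in "$@"; do
            tmp=$f.tmp.$$
            sed -e "s/foobar\.org/foobar.info/g" "$f" > "$tmp" &&
                cat "$tmp" > "$f"
            rm -f "$tmp"
        done
    ' sh {} +
}

# Usage:
#   copy_and_swap /srv/web/site/original /srv/web/site/modified
```

To run it again, remove the modified copy first; the original tree stays untouched either way.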
On Fri, Aug 22, 2014 at 1:34 AM, Lew Wolfgang <wolfgang@sweet-haven.com> wrote:
What would you use? awk? A shell script? Something using grep and sed?
The first thing I would do is see if some magnanimous soul had already done it for me.

Just imagine a company that goes around making backups of all the websites they can and making them available to the website owners for free. Hard to believe, but someone is doing just that:

http://web.archive.org/

In the case of foobar.org you can see all the backups that they have made at:

http://web.archive.org/web/*/foobar.org

So my question is whether you need to bother with your effort at all.

Greg
On Thursday, 2014-08-21 at 22:34 -0700, Lew Wolfgang wrote:
What would you use? awk? A shell script? Something using grep and sed?
wget.

It can do a mirror of a site, sanitizing the old domain for the new, I think. At least when creating a local mirror to a directory. I used this years ago, I don't remember the details.

--
Cheers,
Carlos E. R.
(from 13.1 x86_64 "Bottle" at Telcontar)
On Fri, Aug 22, 2014 at 12:07 PM, Carlos E. R. <carlos.e.r@opensuse.org> wrote:
wget. It can do a mirror of a site, sanitizing the old domain for the new, I think. At least when creating a local mirror to a directory. I used this years ago, I don't remember the details.
I don't know if wget can do that or not.

httrack is in the 13.1 distro and it is designed to make local copies of remote websites. One use case is offline reading of a website.

I admit to not having done enough testing of httrack, but I've pulled a basic website down with it IIRC.

Greg
On 2014-08-22 18:20, Greg Freemyer wrote:
I don't know if wget can do that or not.
    wget --continue --recursive --level=2 --convert-links \
         --page-requisites http://site...

But it converts the links to the local address and directory of the machine on which you run the command. Maybe this can be changed, but I would have to read the documentation in more depth.

You should also look at the "--mirror" option.

--
Cheers / Saludos,
Carlos E. R.
(from 13.1 x86_64 "Bottle" at Telcontar)
On 22/08/14, Lew Wolfgang wrote:
What would you use? awk? A shell script? Something using grep and sed?
Since people already gave you sane solutions, here is a crazy one.. Use apache mod_ext_filter :-)

    ExtFilterDefine fixlinks mode=output intype=text/html \
        cmd="/bin/sed s/foobar.org/foobar.info/g"

    <Location />
        SetOutputFilter fixlinks
    </Location>

in your host configuration. Then you are done.

PS: this is just an example, does not scale well, ain't going to be fast.
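What such a filter does to a page can be tried without a server. With mod_ext_filter, the external command receives the response body on standard input and writes the rewritten body to standard output, so the same sed call can be fed a sample page by hand. The function name fixlinks_cmd and the sample HTML are assumptions for illustration, and the dot is escaped here so that it only matches a literal dot.

```shell
#!/bin/sh
# fixlinks_cmd: hypothetical stand-in for the program named in cmd=.
# Reads a page on standard input, writes the rewritten page to
# standard output, exactly as an external output filter would.
fixlinks_cmd() {
    sed 's/foobar\.org/foobar.info/g'
}

# Try it on a sample page instead of a live server:
#   printf '<a href="http://foobar.org/">home</a>\n' | fixlinks_cmd
```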
On 08/22/2014 11:33 PM, Cristian Rodríguez wrote:
Since people already gave you sane solutions, here is a crazy one.. Use apache mod_ext_filter :-)
    ExtFilterDefine fixlinks mode=output intype=text/html \
        cmd="/bin/sed s/foobar.org/foobar.info/g"

    <Location />
        SetOutputFilter fixlinks
    </Location>
in your host configuration. then you are done.
ps:this is just an example, does not scale well, ain't going to be fast.
Thanks Cristian, that's interesting. But this would check all served content, wouldn't it? It could be bad if the string happened to appear in a binary file!

Regards,
Lew
On 23/08/14, Lew Wolfgang wrote:
Thanks Cristian, that's interesting. But this would check all served content, wouldn't it? It could be bad if the string happened to appear in a binary file!
Nope, only content of type text/html in the virtual host where you use it. It is just an example to show the power of output filters ;-)

PS: I strongly suggest not using this in production.

--
Cristian
"I don't know the key to success, but the key to failure is trying to please everybody."
participants (8)
- Carlos E. R.
- Carlos E. R.
- Cristian Rodríguez
- Dirk Gently
- Greg Freemyer
- Lew Wolfgang
- Per Jessen
- Rolf Muth