Re: [opensuse] OT: Need program to replace text

13 Nov 2007

      On Monday 12 November 2007 22:24, Chris Arnold wrote:
...
Stefan Hundhammer wrote:
...
perl -p -i -e 's/oldtext/newtext/' *.html
/oldtext will be lines of html and /newtext will be lines of php. Will
perl still be able to do it, and if so, do I need to escape some of the
code in oldtext and newtext? Example:
<HTML> the < and >
and
<?php ;?> does any of that need to be escaped?

I did some experimenting, and admittedly there are some caveats with that 
stuff. But here is a skeleton for you:

perl -p -i -0777 \
    -e 's/^.*<!DOCTYPE\s+html.*<body>/PHP-Header\n/si;' \
    -e 's:</body>.*$:<?PHP-Footer?>\n:si;'              \
    -e 's:moreoldstuff:newoldstuff:g;'                  \
    *.html

Note: This is one single line of shell command. I just reformatted it for 
better legibility.

Let's take this apart.

perl -p  : This reads the files specified on the command line as input files 
line by line and prints each single line. If you don't do anything else, this 
is a glorified "copy" command. But with regular search-and-replace, this 
becomes more like a "sed" call. Note there is also "perl -n" which does not 
print. Unlike "sed -n", where you append 'p' ("print") to a replace command, 
perl's replace has no such printing flag; with "perl -n" you have to call the 
regular perl "print" command yourself to write something to the output file.

-i  : This does all changes in-place, i.e. you don't need to supply an input 
and an output file. Otherwise you'd have to write your own loop in the shell 
and do something like

    perl -p -e '<do something>' <infile >outfile

for each file. perl -i does that loop for you: it reads each infile from the 
command line, writes to a new file and renames the new file afterwards to the 
name of the infile. You can also specify a backup file extension: 
"perl -i.bak" will back up all old infiles to "infile.bak".

-e  : This specifies one perl expression. You can use several -e args, but 
then you need to delimit all (except the last one) with a semicolon. A bit 
unlike "sed", unfortunately.

By default, perl reads one single line from the infile, processes it with all 
your -e expressions and (with -p) writes it to the outfile. This is what most 
people need in most cases. You can use that as a "sed" substitute with 
in-place editing - this is what I wrote in my first post to this thread:

    perl -p -i -e 's/oldstuff/newstuff/g'

The '/g' at the end tells perl to do that globally, i.e., for every match in 
the line. Otherwise it would replace only the first match in each line. Just 
like "sed".

In your special case, though, you want to search and replace over multiple 
lines. That's a bit trickier. For one thing, you need to tell perl to read 
more than just one line at a time. For example, the entire file at once. This 
is what

    -0777

does: It changes the input record delimiter from \n (newline) to character 
0777 (octal) which doesn't exist, so perl never finds a delimiter and reads 
the entire file at once as one single record. (See "man perlrun").

Then, you also have to make perl match across line boundaries in a regular 
expression. By default, "." matches any character except a newline; the /s 
modifier makes "." match newlines, too:

    s/oldstuff/newstuff/s 

Note that this is necessary in addition to having the whole file in one single 
string.

I also added /i to match case-insensitive which makes sense for HTML tags.

Quoting regular expressions is another tricky part. You have to watch 
carefully which characters in "oldstuff" have a special meaning in perl's 
(very powerful!) regular expressions. But typically that's not a problem 
because that part is hand-written, not variable stuff coming from a file.

\s is useful: It's a shorthand for "any kind of whitespace character" - blank, 
tab, newline. "\s+" means "at least one whitespace character, but maybe more 
(any number)".

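In the replace text there are far fewer characters with special meaning. $1, 
$2, ... $9 come to mind; they are placeholders for the text matched by an 
expression in parentheses in "oldstuff". Since "$" and "@" start perl 
variables there, escape them with a backslash if you mean them literally.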

If your search regexp contains slashes, it makes sense to use some other 
delimiter character; this is what I did in the other -e expressions:

    s:oldstuff/with/slashes:newstuff:

You could also escape every single slash with a backslash, but that's tedious, 
error-prone and it looks ugly:

    s/oldstuff\/with\/slashes/newstuff/

Did I forget something? Probably. But hey, I don't want to deprive you of all 
intellectual challenges. ;-) I hope I gave you some good starting points, 
though.

More info:

man perlre          (Perl regular expressions)
man perlrun         (Perl command line switches)
man perlop          (Perl operators, including quoting)

CU
-- 
Stefan Hundhammer <sh@suse.de>                Penguin by conviction.
YaST2 Development
SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)
Nürnberg, Germany