Hi,
Does anyone know of a (command line) program in SuSE that allows you to replace all occurrences of a certain string in a text (CSV) file with another?
I have a (huge) file which contains Matlab style 'NaN' for missing data and I need to make it Gauss readable by replacing all the 'NaN' with the symbol '.'. It's too big to be loaded into an editor or Excel etc.
thanks, j.
__________________________________
Do you Yahoo!? SBC Yahoo! DSL - Now only $29.95 per month! http://sbc.yahoo.com
Morning
Try man awk, man sed, man ed
Tage D.
-----Original Message-----
From: James Philp [mailto:linuxjames2003@yahoo.com]
Sent: 21 August 2003 06:03
To: suse-linux-e@suse.com
Subject: [SLE] ?program to replace text in text files
--
Check the headers for your unsubscription address
For additional commands send e-mail to suse-linux-e-help@suse.com
Also check the archives at http://lists.suse.com
Please read the FAQs: suse-linux-e-faq@suse.com
Using everyone's favourite swiss army chainsaw:
perl -pi -e 's/NaN/\./' file.csv
Take a backup first!
--
eatapple core dump
Thanks a lot. Brilliant little trick. However it only
does it for the first row of the data. So I just had
to execute the script over and over again until it
parsed the entire data set.
Btw, where do you learn these things from? I'm not a
computer science major, so are there any quick intros
to Perl?
thanks, james.
On Thu, 21 Aug 2003 08:40:23 -0700 (PDT), James Philp wrote:
Thanks a lot. Brilliant little trick. However it only does it for the first row of the data.
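Put a 'g' modifier on the regex:
perl -pi -e 's/NaN/\./g' file.csv
or add the -l switch, which strips the newline from each line on input and puts it back on output (the 'g' is still needed to replace every match on a line):
perl -lpi -e '$_ =~ s/searchstring/replace string/g' file.txt
--
I'm not really a human, but I play one on earth.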
On Thu, 21 Aug 2003 08:40:23 -0700 (PDT), James Philp wrote:
Thanks a lot. Brilliant little trick. However it only does it for the first row of the data.
Here is a pro search and replace script which lets you select single
files or recurse, and has "prompt on replace" options:
#!/usr/bin/perl
#recursive search and replace
# POD Documentation
=head1 PROGRAM NAME AND AUTHOR
Search and Replace - Version 1.3
Build Date: May 9, 2001
peterbrown@worldcommunity.com
=head1 WHAT IT IS
Search and Replace (ok, it's not an original name...)
It's quite fast. It processed 13,402,165 replaces, (13.4 million)
(a 67 meg text file with 163,441 lines) in 1 minute, 10 seconds
on a Pentium 166 with 64 megs of RAM. (This was the test in v1.0)
On one client's system it processed 13,029 files,
with 7,487 replaces, in 11 seconds.
As an alternate method, I recommend using VEDIT, the fastest
huge file text editor in the world, at 'www.vedit.com'.
HELP: type "sr -h for help"
edit s/r values below if you're not using command line parameters
=head1 COPYRIGHT
Copyright 2001 Peter F. Brown
SR complies with the GNU GENERAL PUBLIC LICENSE
and is released as "Open Source Software".
NO WARRANTY IS OFFERED FOR THE USE OF THIS SOFTWARE
! Just remember. Back up your file first!
(when you mess with huge data sets, please do
save yourself grief and backup the file up :-)
=head1 BUG REPORTS AND SUPPORT
Send bug reports to peterbrown@worldcommunity.com.
Visit the author's web site at 'worldcommunity.com'
to view information about support, customer quotes,
a resume link, and fees for custom Perl/MySQL programming.
=head1 OBTAINING THE LATEST VERSION
==> Get the most recent version of this program at:
http://worldcommunity.com
=head1 REQUIREMENTS
Perl 5
=head1 CHANGELOG
- v1.3 - May 9, 2001
. added output of actual replaces to 'sr.replaces.log'
. changed the formal name to 'Search and Replace'
. changed the file name to 'srep.cgi' (for stability)
- v1.2 - May 6, 2001
. Initial Public Release.
. Changed to line method of parsing.
. Added prompts, recursive directories, logs, essentially a complete rewrite.
- v1.0 - May 20, 1998
. Initial release. Used 'chunk' method of parsing text, instead of lines
. Only operated on one file at a time
=cut
##############################################################################
# setup area
# these are the approved file extensions that the
# program will look for.
@file_extensions = qw[htm html shtml txt cgi pl js ];
# I recommend using an extension for the log file below that
# is NOT included in the array above.
$log_file = './sr.log';
$replaces_log = './sr.replaces.log';
# end of setup area
##############################################################################
use File::Find;
$clear = `clear`;
print $clear;
$version = 'v1.3';
$sr_header = qq~Search and Replace $version - replaces characters in a text file.
by Peter F. Brown; peterbrown\@worldcommunity.com
Copyright 2001 Peter F. Brown. All Rights Reserved Worldwide.
Open Source Software. [http://worldcommunity.com]~;
if ($#ARGV == 0 and $ARGV[0] eq "-u")
{
print "\nUsing values in text file.\n";
# EDIT VALUES HERE IF YOU'RE NOT USING COMMAND LINE PARAMETERS
###################################################################
# you can use regular expressions here, if you're brave.
# NOTE: this version doesn't support $1 parenthesizing
# (perhaps in the next version)
# note for DOS users: be careful of the 8.3 and \ conventions
# when you name your files. Otherwise, sr should work under DOS.
# NOTE: Using single quotes may have a different effect.
$input_file = "";
$search_string = "";
$replace_string = "";
$prompt_replace = "no";
$save_backups = "no";
$outfile = "outfile.sr";
$double_check = "no";
$case_sensitive = "no";
}
################ end of s/r editing ###############################
elsif ($#ARGV == 0 and $ARGV[0] eq "-h")
{
&help_header;
}
elsif ($#ARGV == 2)
{
$input_file = $ARGV[0];
$search_string = $ARGV[1];
$replace_string = $ARGV[2];
$prompt_replace = 'no';
$save_backups = 'no';
$outfile = 'outfile.sr';
$double_check = 'no';
$case_sensitive = 'no';
}
elsif ($#ARGV == 7)
{
$input_file = $ARGV[0];
$search_string = $ARGV[1];
$replace_string = $ARGV[2];
$prompt_replace = $ARGV[3];
$save_backups = $ARGV[4];
$outfile = $ARGV[5];
$double_check = $ARGV[6];
$case_sensitive = $ARGV[7];
}
else
{
&help_header;
}
# convert input vars
$prompt_replace = lc($prompt_replace);
$save_backups = lc($save_backups);
$double_check = lc($double_check);
$case_sensitive = lc($case_sensitive);
if ( $prompt_replace ne 'yes' and $prompt_replace ne 'no' )
{
print "\nPrompt Replace must equal either 'yes' or 'no' or BLANK.\n";
print "If you leave it blank, it will default to 'YES'.\n";
print "Exiting ... \n\n";
exit;
}
if ( $save_backups ne 'yes' and $save_backups ne 'no' )
{
print "\nSave Backups must equal either 'yes' or 'no' or BLANK.\n";
print "If you leave it blank, it will default to 'YES'.\n";
print "Exiting ... \n\n";
exit;
}
if ( $double_check ne 'yes' and $double_check ne 'no' )
{
print "\nDouble Check must equal either 'yes' or 'no' or BLANK.\n";
print "If you leave it blank, it will default to 'NO'.\n";
print "Exiting ... \n\n";
exit;
}
if ( $case_sensitive ne 'yes' and $case_sensitive ne 'no' )
{
print "\nCase Sensitive must equal either 'yes' or 'no' or BLANK.\n";
print "If you leave it blank, it will default to 'NO'.\n";
print "Exiting ... \n\n";
exit;
}
# check user input
#.............................
$| = 1;
# check for Unix or DOS, for console input
if (-e "/dev/tty")
{$console = "/dev/tty";}
else {$console = "con";}
unless ( open(USER_PROMPT, "$console"))
{
print "Can't open console: $!\n";
exit;
}
#..............................
$process = "false";
while ($process eq "false")
{
print qq~
$sr_header
You have specified the following:
Input File: $input_file
Search String: $search_string
Replace String: $replace_string
Prompt Replace: $prompt_replace (prompts at each replace)
Save Backups: $save_backups
Temp File: $outfile
Double Check: $double_check (double checks each replace)
Case Sensitive: $case_sensitive
For a fast UNPROMPTED replace of a directory tree, type:
"srep.cgi CURDIR 'SEARCHSTR' 'REPLACESTR' no no outfile.sr no no"
NOTE: 'case_sensitive' only applies to searching.
The replace value will use the case of the
'replace_string'.
NOTE: If Input File equals 'CURDIR', then
all the TEXT files in the current directory
and all of its subdirectories will be processed.
NOTE: If Save Backups is set to 'yes', then the
input file will be copied to $input_file\.bak
In either case, the input file
($input_file) will be overwritten with the
temp file, for 'in place' editing.
Do you wish to continue (enter only "y" or "n")? ~;
$continue =
James Philp wrote:
I have a (huge) file which contains Matlab style 'NaN' for missing data and I need to make it Gauss readable by replacing all the 'NaN' with the symbol '.'.
Derek Fountain wrote:
Using everyone's favourite swiss army chainsaw:
perl -pi -e 's/NaN/\./' file.csv
Take a backup first!
James Philp wrote:
Thanks a lot. Brilliant little trick. However it only does it for the first row of the data.
Btw, where do you learn these things from?
Another solution:
sed 's/NaN/./g' file.csv > file_fixed.csv
It substitutes every instance of "NaN" on every line and doesn't alter the original file.
My favorite resource for such questions is _Unix Power Tools_ 3rd ed., by Shelley Powers, Jerry Peek, Tim O'Reilly, and Mike Loukides. It has a wealth of (mostly command line) tips and tricks for *nix systems. The coverage of Perl (and its more modern substitute, Python) is brief, but it's a peerless general reference (I bought the 1st edition before I started using Linux).
http://www.oreilly.com/catalog/upt3/
BTW, I reformatted your post into the "standard" form. Much easier to read.
HTH,
-rex
--
If at first you don't succeed, try and try again- then give up, there's no use being a damn fool about it. --W.C.Fields
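A quick way to check the sed approach before trusting it on the full data set is to count the lines that still contain NaN in the copy and in the original. The sketch below assumes a small made-up sample in a scratch directory; the `src`, `dst` and counter names are illustrative:

```shell
# Scratch copy of some hypothetical sample data.
workdir=$(mktemp -d)
src="$workdir/file.csv"
dst="$workdir/file_fixed.csv"
printf '1.5,NaN,NaN\nNaN,2.0,NaN\n' > "$src"

# Stream the file through sed; the original is only read, never written.
sed 's/NaN/./g' "$src" > "$dst"

# grep -c prints the number of matching lines; it exits non-zero when
# that number is 0, hence the "|| true".
remaining=$(grep -c 'NaN' "$dst" || true)
original=$(grep -c 'NaN' "$src" || true)

echo "lines with NaN left in the copy: $remaining"
echo "lines with NaN in the original:  $original"
```

Because the output goes to a second file, this needs enough free disk space for a full copy of the data.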
On Thursday 21 August 2003 23:40, James Philp wrote:
Thanks a lot. Brilliant little trick. However it only does it for the first row of the data. So I just had to execute the script over and over again until it parsed the entire data set.
As others have said, add the 'g' modifier to have the regex applied over all the data in the line, instead of just the first hit.
Btw, where do you learn these things from? I'm not a computer science major, so are there any quick intros to Perl?
The O'Reilly books are pretty much the definitive works on Perl; the camel book in particular is excellent. There are thousands of online resources. Start at http://www.perl.com and see where you end up.
You may have noticed that you got a number of answers to your question, all using different utilities. If you want to learn to do things like your search 'n' replace thing yourself, I'd advise you learn Perl, and don't worry about any of the others. In the maze of Unix text manipulating utilities like Perl, sed, awk and others, Perl is the one and only thing you need.
--
eatapple core dump
Thanks to everyone. The Perl script definitely did the trick. I tried some of the other things too, like sed, awk and vi, but Perl was definitely faster.
participants (5)
- Derek Fountain
- James Philp
- rex
- Tage Danielsen
- zentara