[opensuse] filtering


Vince Oliver

9 Mar 2007

23:37

Hi All,

I need to take all "more*.dat" files from each "DH*" directory and write the first 5 rows of each into a "list" file. So each "DH*" directory should have a "list" file with 5 rows from every "more*.dat" file within it. This command line does the job:

    for dir in DH* ; do
      for file in `find $dir -type f -name "more*data.dat" -print`; do
        awk 'BEGIN{FS=","}{if(NR>1 && NR<7){ f=n=FILENAME;sub(/[^/]+$/,"list",f);sub(/.*\//,"",n);print $6,$7,n>>f}}' $file
      done
    done

But I would NOT like to have file names in "list" that contain the words "t10" or "t9". How can I filter them out in this context?

Thank you
Oliver

--
To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org
For additional commands, e-mail: opensuse+help@opensuse.org


Randall R Schulz

10 Mar 2007

00:07

Vince,

On Friday 09 March 2007 15:37, Vince Oliver wrote:

> But I would NOT like to have file names in "list" that contain the words "t10" or "t9". How can I filter them out in this context?

The grep family of commands can return only the status of the search, so you could add these arguments to your find command before the -print argument:

    -exec egrep -vq '\<(t9|t10)\>' \;

When you say "... contain words ..." I assume you mean you only want to exclude files where t9 and t10 occur as separate words, but not exclude files that contain, say, "last9" or "test10". If that is not what you want, remove the \< and \> word boundary signifiers from the egrep pattern above.

Randall Schulz

Randall R Schulz

00:22

On Friday 09 March 2007 16:07, Randall R Schulz wrote:

> The grep family of commands can return only the status of the search, so you could add these arguments to your find command before the -print argument:
>
>     -exec egrep -vq '\<(t9|t10)\>' \;

Woops! Make that:

    -exec egrep -vq '\<(t9|t10)\>' {} \;

> When you say "... contain words ..." I assume you mean you only want to exclude files where t9 and t10 occur as separate words, but not exclude files that contain, say, "last9" or "test10". If that is not what you want, remove the \< and \> word boundary signifiers from the egrep pattern above.

Randall Schulz
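A minimal sketch of using grep's exit status as a find predicate to filter on file content. The directory layout and sample data are invented for illustration, and GNU grep is assumed for the \< and \> word-boundary operators. One caveat worth noting: -v inverts which lines are selected, so grep -vq succeeds as soon as any single line lacks the pattern. Negating the test with find's ! operator is one way to express "no line matches".

```shell
#!/usr/bin/env bash
# Hypothetical sample tree; only the find/grep combination matters.
work=$(mktemp -d)
mkdir -p "$work/DH1"
printf 'x,y\nrun t9 here\n'  > "$work/DH1/more_a_data.dat"   # contains the word t9
printf 'x,y\nlast9 test10\n' > "$work/DH1/more_b_data.dat"   # t9/t10 only inside longer words

# grep -q prints nothing and exits 0 when some line matches.
# "!" negates that test, so -print runs only for files with
# no whole-word t9 or t10 anywhere in their content.
kept=$(find "$work/DH1" -type f -name 'more*data.dat' \
        ! -exec grep -Eq '\<(t9|t10)\>' {} \; -print)

echo "$kept"
rm -rf "$work"
```

Only more_b_data.dat survives here, because its t9 and t10 occur inside longer words.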

Vince Oliver

08:05

On Fri, 9 Mar 2007, Randall R Schulz wrote:

> When you say "... contain words ..." I assume you mean you only want to exclude files where t9 and t10 occur as separate words, but not exclude files that contain, say, "last9" or "test10".

Thanks for the reply. I have files like

    something_t1_something.dat
    something_t2_something.dat
    something_t3_something.dat
    .....
    something_t10_something.dat

I would not like to have files where t9 or t10 appears in the file name.

Did you mean like this? It does not work, but maybe I did not understand where to include egrep:

    for dir in DH* ; do
      for file in `find $dir -type f -name "less*data.dat" -exec egrep -vq '\' {} \; -print`; do
        awk 'BEGIN{FS=","}{if(NR>1 && NR<7){ f=n=FILENAME;sub(/[^/]+$/,"list",f);sub(/.*\//,"",n);print $6,$7,n>>f}}' $file
      done
    done

Randall R Schulz

14:58

On Saturday 10 March 2007 00:05, Vince Oliver wrote:

> Thanks for the reply. I have files like
>
>     something_t1_something.dat
>     something_t2_something.dat
>     something_t3_something.dat
>     .....
>     something_t10_something.dat
>
> I would not like to have files where t9 or t10 appears in the file name.

Never mind my suggestion. I thought you wanted to exclude files whose _content_ included those words, not those whose name did. Anders' suggestion will do what you want.

> Did you mean like this? It does not work, but maybe I did not understand where to include egrep

It is what I meant, but as I explained above, I misunderstood your requirement (programmers are famous for this...). I don't know why you wrote "\", though.

Randall Schulz
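Since the goal turned out to be excluding files by name rather than by content, the test can also live in find itself. A sketch, assuming names of the form something_tN_something.dat as described in the question. The sample tree is made up, and the underscores in the patterns are an assumption that keeps names such as "last9" from being excluded.

```shell
#!/usr/bin/env bash
# Hypothetical sample tree with four empty placeholder files.
work=$(mktemp -d)
mkdir -p "$work/DH1"
for n in 1 2 9 10; do
  : > "$work/DH1/more_t${n}_data.dat"
done

# Each "! -name PATTERN" drops files whose base name matches the glob,
# so only names without _t9_ or _t10_ reach -print.
names=$(find "$work/DH1" -type f -name 'more*data.dat' \
          ! -name '*_t9_*' ! -name '*_t10_*' -print | sort)

echo "$names"
rm -rf "$work"
```

Because -name looks only at the base name, directory names containing t9 or t10 do not affect the result.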

Anders Johansson

00:32

On Saturday 10 March 2007 00:37, Vince Oliver wrote:

> But I would NOT like to have file names in "list" that contain the words "t10" or "t9". How can I filter them out in this context?

I wouldn't use heavy tools like awk for something as simple as this:

    for dir in DH*; do
      for file in `find $dir -type f -name more\*.dat`; do
        ( [ ${file/t10/} != $file ] || [ ${file/t9/} != $file ] ) || head -5 $file >> $dir/list
      done
    done

I'm not very happy with the string tests, but I couldn't find a bash function that returned true on substring match. If anyone can think of a cleaner way of doing it, I'd love to know it.

Randall R Schulz

07:39

On Friday 09 March 2007 16:32, Anders Johansson wrote:

> I'm not very happy with the string tests, but I couldn't find a bash function that returned true on substring match. If anyone can think of a cleaner way of doing it, I'd love to know it

Check out the [[ value = pattern ]] tests. There are options for both glob and RE interpretation of "pattern."

Randall Schulz
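For reference, a short sketch of the two [[ ]] forms mentioned here. The file name is a made-up example. With == (or =) the right-hand side is treated as a glob; with =~ it is a POSIX extended regular expression, where grouping and alternation are written ( and | without backslashes. Keeping the regex in a variable is one common way to sidestep shell quoting issues.

```shell
#!/usr/bin/env bash
file=DH1/more_box1_t10_data.dat   # hypothetical path

# Glob match: an unquoted * on the right-hand side matches any run of characters.
if [[ $file == *_t9_* || $file == *_t10_* ]]; then glob_hit=yes; else glob_hit=no; fi

# Regex match: extended syntax, with the pattern held in a variable.
re='_t(9|10)_'
if [[ $file =~ $re ]]; then re_hit=yes; else re_hit=no; fi

echo "glob: $glob_hit  regex: $re_hit"
```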

Anders Johansson

10:34

On Saturday 10 March 2007 08:39, Randall R Schulz wrote:

> Check out the [[ value = pattern ]] tests. There are options for both glob and RE interpretation of "pattern."

Cool, that works, thanks. So:

    for dir in DH*; do
      for file in `find $dir -type f -name more\*.dat`; do
        [[ $file =~ t\(10\|9\) ]] || head -5 $file >> $dir/list
      done
    done

Vince Oliver

11:28

On Sat, 10 Mar 2007, Anders Johansson wrote:

> for dir in DH*; do for file in `find $dir -type f -name more\*.dat`; do [[ $file =~ t\(10\|9\) ]] || head -5 $file >> $dir/list; done; done

Thanks. But I do not want the whole content of the files in 'list', just the 6th and 7th columns and the file names (as you may read in the awk command below):

    for dir in DH* ; do
      for file in `find $dir -type f -name "more*data.dat" -exec egrep -vq '\' {} \; -print`; do
        awk 'BEGIN{FS=","}{if(NR>1 && NR<7){ f=n=FILENAME;sub(/[^/]+$/,"list",f);sub(/.*\//,"",n);print $6,$7,n>>f}}' $file
      done
    done

This command works fine, but filtering out the files with egrep does not work.

Anders Johansson

12:04

On Saturday 10 March 2007 12:28, Vince Oliver wrote:

> This command works fine, but filtering out the files with egrep does not work.

No. It wouldn't take much to add my test to your awk, but I still think awk is the wrong tool to use for something so light:

    for dir in DH*; do
      for file in `find $dir -type f -name more\*.dat`; do
        [[ $file =~ t\(10\|9\) ]] || for i in 1 2 3 4 5; do
          IFS=' ' read -a line
          echo ${line[5]} ${line[6]} $(basename $file) >> $dir/list
        done < $file
      done
    done

There are very few things you can't do with bash alone.

I just wish I could think of a way to eliminate the 'find'. That one really annoys me.
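On eliminating find: if the data files sit directly inside each DH* directory (an assumption; find also descends into subdirectories), a plain glob is enough. A sketch with a made-up tree, where nullglob makes an unmatched pattern expand to nothing instead of to itself:

```shell
#!/usr/bin/env bash
# Hypothetical sample tree: DH1 has two data files and one unrelated file, DH2 is empty.
work=$(mktemp -d)
mkdir -p "$work/DH1" "$work/DH2"
: > "$work/DH1/more_t1_data.dat"
: > "$work/DH1/more_t10_data.dat"
: > "$work/DH1/other.dat"

shopt -s nullglob
count=0
for dir in "$work"/DH*; do
  for file in "$dir"/more*.dat; do        # the shell expands the glob; no find needed
    name=${file##*/}                      # base name without the directory part
    [[ $name == *_t9_* || $name == *_t10_* ]] && continue
    count=$((count + 1))
    echo "would process: $name"
  done
done
shopt -u nullglob
rm -rf "$work"
```

For nested directories, bash 4 and later also offer `shopt -s globstar`, which lets `**` match across directory levels.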

Vince Oliver

12:47

On Sat, 10 Mar 2007, Anders Johansson wrote:

> No. It wouldn't take much to add my test to your awk, but I still think awk is the wrong tool to use for something so light

It does not work. The file names are like:

    less_box1_tau1_data.dat
    less_box1_tau2_data.dat
    less_box1_tau3_data.dat
    ...
    less_box1_tau10_data.dat

so I run it like this:

    for dir in DH*; do
      for file in `find $dir -type f -name less\*data.dat`; do
        [[ $file =~ tau\(1\|2\) ]] || for i in 1 2 3 4 5; do
          IFS=' ' read -a line
          echo ${line[5]} ${line[6]} $(basename $file) >> $dir/list
        done < $file
      done
    done

But this command stores all file names in 'list' without filtering, and without line[5] and line[6] (columns in the files are separated by commas).

Anders Johansson

12:56

On Saturday 10 March 2007 13:47, Vince Oliver wrote:

> It does not work. The file names are like:
>
>     less_box1_tau1_data.dat
>     less_box1_tau2_data.dat
>     less_box1_tau3_data.dat
>     ...
>     less_box1_tau10_data.dat
>
> so I run it like this:
>
>     for dir in DH*; do for file in `find $dir -type f -name less\*data.dat`; do [[ $file =~ tau\(1\|2\) ]] || for i in 1 2 3 4 5; do

This should mean that you use all files except those called tau1, tau10 and tau2. If you want to avoid only tau1 and tau2, and include tau10, make it

    tau\(1\|2\)_

instead.

>     IFS=' ' read -a line; echo ${line[5]} ${line[6]} $(basename $file) >> $dir/list; done < $file; done; done
>
> But this command stores all file names in 'list' without filtering, and without line[5] and line[6] (columns in the files are separated by commas).

Then change IFS=' ' to IFS=','
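A small sketch of the IFS change, on an invented sample row. Prefixing read with IFS=',' applies the separator to that one command only, and -a stores the fields in a zero-based array, so the 6th and 7th columns are indices 5 and 6. The -r flag is an addition here; it stops read from interpreting backslashes.

```shell
#!/usr/bin/env bash
line_in='a,b,c,d,e,153.30632,-0.89683,h'   # made-up comma-separated row

# Split the row on commas into the array "line".
IFS=',' read -r -a line <<< "$line_in"

cols="${line[5]} ${line[6]}"               # 6th and 7th columns
echo "$cols"
```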

Vince Oliver

13:40

On Sat, 10 Mar 2007, Anders Johansson wrote:

> Then change IFS=' ' to IFS=','

Well, almost there. With:

    for dir in DH*; do
      for file in `find $dir -type f -name less\*data.dat`; do
        [[ $file =~ tau\(1\|2\) ]] || for i in 1 3 4 5 6; do
          IFS=',' read -a line
          echo ${line[5]} ${line[6]} $(basename $file) >> $dir/list
        done < $file
      done
    done

I have this in the output:

    ragar decgar less_box1_tau1_data.dat
    153.30632 -0.89683 less_box1_tau1_data.dat
    153.95998 -1.18637 less_box1_tau1_data.dat
    164.02272 -0.03873 less_box1_tau1_data.dat
    180.1395 -0.73408 less_box1_tau1_data.dat
    198.55013 1.12816 less_box1_tau1_data.dat
    ragar decgar less_box1_tau2_data.dat
    147.03909 0.52561 less_box1_tau2_data.dat
    148.23259 1.07151 less_box1_tau2_data.dat
    151.31052 0.3381 less_box1_tau2_data.dat
    156.72609 0.62027 less_box1_tau2_data.dat
    157.75874 0.76738 less_box1_tau2_data.dat
    .....

The files are still not filtered out, and I have repeated headers that I don't want ('ragar decgar less_box1_tau2_data.dat').
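Two things likely explain this output. In current bash, =~ uses extended regex syntax, where a backslash before ( or | makes the character literal, so a pattern written as tau\(1\|2\) would never match these names (the behaviour of 2007-era bash is not verified here). And the values listed after "for i in" are never used to skip lines, so the first read always consumes the header row. A hedged sketch that addresses both, using invented sample data and the tau naming from this message; the pattern _tau(1|2)_ is an assumption about which files to skip, and a glob replaces find on the assumption that the files sit directly inside each DH* directory:

```shell
#!/usr/bin/env bash
# Hypothetical sample tree: three files, each with a header row and six data rows.
work=$(mktemp -d)
mkdir -p "$work/DH1"
for n in 1 2 3; do
  {
    echo 'c1,c2,c3,c4,c5,ragar,decgar'
    for r in 1 2 3 4 5 6; do echo "a,b,c,d,e,${n}${r}.5,-${r}.25"; done
  } > "$work/DH1/less_box1_tau${n}_data.dat"
done

skip='_tau(1|2)_'            # extended regex: plain ( | ) for grouping and alternation
for dir in "$work"/DH*; do
  for file in "$dir"/less*data.dat; do
    name=${file##*/}
    [[ $name =~ $skip ]] && continue      # leave out tau1 and tau2, keep tau10 etc.
    {
      read -r _header                     # discard the header row
      for i in 1 2 3 4 5; do              # then take the next five rows
        IFS=',' read -r -a line || break
        echo "${line[5]} ${line[6]} $name"
      done
    } < "$file" >> "$dir/list"
  done
done

result=$(cat "$work/DH1/list")
echo "$result"
rm -rf "$work"
```

With this sample data, the list holds five rows from the tau3 file only, with no header line.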
