Mailinglist Archive: opensuse (1264 mails)

Re: [opensuse] Quick question: how to call a script function from "find"?
On Wed, Jun 14, 2017 at 6:58 AM, Vojtěch Zeisek
<vojtech.zeisek@xxxxxxxxxxxx> wrote:
On Wednesday 14 June 2017 12:07:22 CEST, Carlos E. R. wrote:
On 2017-06-14 08:09, Vojtěch Zeisek wrote:
On Tuesday 13 June 2017 22:34:01 CEST, Carlos E. R. wrote:
On 2017-06-13 22:14, Per Jessen wrote:
Vojtěch Zeisek wrote:
find /home/cer/Fusion/Videos/ -type d | parallel "sudo... && chmod... &&..."
This is my favorite style. :-)
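Since the thread is about calling a script function from "find", here is a minimal bash sketch of one common way to do it: export the function with export -f and let find start a bash that calls it. The names fix_perms and fix_tree and the mode 755 are hypothetical placeholders, because the actual sudo/chmod command above is elided.

```shell
#!/usr/bin/env bash
# Sketch: run a shell function on every directory that find returns.
# fix_perms/fix_tree are hypothetical names; mode 755 is an assumed example
# (the real "sudo ... && chmod ..." command from the thread is elided).

fix_perms() {
    # Operates on the single directory passed as $1.
    chmod 755 -- "$1"
}

# find -exec starts a new bash process, which only sees the function if it is exported.
export -f fix_perms

fix_tree() {
    # "{} +" hands find's results to the inner bash in batches as "$@";
    # the "_" fills $0 so that the first path lands in $1.
    find "$1" -type d -exec bash -c 'for d in "$@"; do fix_perms "$d"; done' _ {} +
}
```

For example, fix_tree /home/cer/Fusion/Videos/ would run fix_perms on that directory and every directory below it. With GNU Parallel installed, the same exported function can be run concurrently, e.g. find /home/cer/Fusion/Videos/ -type d -print0 | parallel -0 fix_perms {} (-0 pairs with find's -print0 so names with spaces survive).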

I didn't know about "parallel" - it has a 3000+ line man page :-)
Thanks for making me aware of it.

There are two packages: one from GNU, and another from a different party.

The manual is way too long... it needs a quick guide to get started. See my
Linux course, slides 99-101. They don't show all the features, of course, but
at least they give you something to start with...

Thanks :-)

I don't claim to be a GNU Parallel expert; there are definitely bigger experts
on this ML. :-) I also haven't read the whole man page thoroughly. ;-)
One more example I used recently:

find "$DIR" -name "*trm_R1*" -print | parallel "echo && echo '{}' && echo && \
  bwa mem '{//}'/$REFB '{//}'/*trm_R1* '{//}'/*trm_R2* | \
  samtools view -bu | samtools sort -l 9 -o '{= s:trm.+$:paired.bam: =}'"
The input files are in subdirectories. The names of files and directories
reflect sample names plus some more information (in a fixed structure). bwa and
samtools process genetic data (I'm not going to describe the details here).

Everything inside "..." is run once for each line returned by 'find'. First it
prints which file is being processed (echo '{}'; '{}' contains the whole input
line returned by 'find'). The 'bwa mem' command takes the genetic reference
stored in each directory ($REFB contains its basename, and '{//}' expands to
the directory part of each line returned by 'find') and two input files (the
same idea; the filename pattern is known and stable). Its SAM output is piped
to 'samtools view -bu', which turns it into uncompressed BAM (some molecular
magic :-), and then to the final 'samtools sort', which sorts it and writes the
result with compression level 9 (even more bio magic :-).

I really love the '{= ... =}' construct. It contains a Perl expression: it
takes the whole input line (the same as '{}') and lets you process it with
Perl, e.g. an almost sed-like search/replace (ehm, ehm ;-). This is really
useful for deriving a regular naming pattern for output files. Here the input
file is named ...trm..., and the substitution replaces everything from 'trm' to
the end of the line with paired.bam. Perfect. :-)

This way I sometimes process several hundred samples, each of which takes up
to several hours. If I distribute the task among 12-16 CPU threads, it finishes
significantly faster than with a for/while loop. :-)
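To check what the replacement strings receive for one input line, here is a small bash sketch that mimics '{}', '{//}' and '{= s:trm.+$:paired.bam: =}' with dirname and sed. It is only an emulation, assuming the pattern needs no Perl-only syntax; GNU Parallel's own --dry-run option, which prints the generated commands instead of running them, is the authoritative check. The function name and the sample path are made up.

```shell
#!/usr/bin/env bash
# Sketch: emulate what GNU Parallel substitutes for one input line in
#   ... '{//}'/$REFB ... -o '{= s:trm.+$:paired.bam: =}'
# This is an approximation with dirname and sed, not GNU Parallel itself.
# expand_line is a hypothetical helper name.

expand_line() {
    local line=$1 dir out
    # {//} : directory part of the input line (same as dirname)
    dir=$(dirname "$line")
    # {= s:trm.+$:paired.bam: =} : Perl substitution on the whole line.
    # sed -E matches the same text here because the pattern uses no Perl-only syntax.
    out=$(printf '%s\n' "$line" | sed -E 's:trm.+$:paired.bam:')
    # Print {}, {//} and the {= ... =} result, one per line.
    printf '%s\n%s\n%s\n' "$line" "$dir" "$out"
}
```

For instance, expand_line run1/sampleA/sampleA_trm_R1.fastq.gz prints the whole line, then run1/sampleA, then run1/sampleA/sampleA_paired.bam. Because the substitution acts on the whole path, a 'trm' in a directory name would be matched first, so the pattern relies on directory names not containing 'trm'.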

Using find with multiple parallel jobs is likely much more valuable now that
NVMe SSDs are out. I think mine (a Samsung 950 Pro) can handle 4 simultaneous
I/O requests.
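If GNU Parallel is not installed, xargs -P gives a similar cap on the number of simultaneous jobs, which is what you would tune to the number of concurrent I/O requests a drive handles well. This is a swapped-in technique, not what anyone in the thread used; run_bounded is a hypothetical helper, and -P is supported by GNU and BSD xargs but is not required by POSIX.

```shell
#!/usr/bin/env bash
# Sketch: run a command on every file under a directory with at most
# max_jobs jobs at a time, using xargs -P instead of GNU Parallel.
# run_bounded is a hypothetical helper name.

run_bounded() {
    local max_jobs=$1 dir=$2
    shift 2   # the rest is the command; each file name is appended as its last argument
    # -print0 / -0 keep file names with spaces or newlines intact;
    # -n 1 passes one file per command invocation;
    # -P limits how many invocations run at the same time.
    find "$dir" -type f -print0 | xargs -0 -n 1 -P "$max_jobs" "$@"
}
```

For example, run_bounded 4 /path/to/dir md5sum would checksum every file with at most 4 md5sum processes at once (the directory is a placeholder); the GNU Parallel equivalent would be find /path/to/dir -type f -print0 | parallel -0 -j 4 md5sum {}.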

The 960 series is out now.
