Mailinglist Archive: opensuse (1264 mails)

Re: [opensuse] Quick question: how to call a script function from "find"?
  • From: Vojtěch Zeisek <vojtech.zeisek@xxxxxxxxxxxx>
  • Date: Wed, 14 Jun 2017 12:58:50 +0200
  • Message-id: <2843731.7PqvaIjhpi@tilia>
On Wednesday, 14 June 2017 12:07:22 CEST, Carlos E. R. wrote:
On 2017-06-14 08:09, Vojtěch Zeisek wrote:
On Tuesday, 13 June 2017 22:34:01 CEST, Carlos E. R. wrote:
On 2017-06-13 22:14, Per Jessen wrote:
Vojtěch Zeisek wrote:
find /home/cer/Fusion/Videos/ -type d | parallel "sudo... && chmod... &&..."
This is my favorite style. :-)

I didn't know "parallel" - it has a 3000+ line man page :-)
Thanks for making me aware.

There are two packages: one from GNU, and another from a different party.

The manual is way too long... it needs a quick guide to get started.

https://soubory.trapa.cz/linuxcourse/linux_bash_metacentrum_course.pdf My
Linux course, slides 99-101. Not showing all features, of course, but at
least something to start with...

Thanks :-)

I'm not claiming to be a GNU Parallel expert; there are definitely bigger
experts on this ML. :-) I also haven't read the whole man page thoroughly. ;-)
One more example I used recently:
find $DIR -name "*trm_R1*" -print | parallel "echo && echo '{}' && echo && \
  bwa mem '{//}'/$REFB '{//}'/*trm_R1* '{//}'/*trm_R2* | \
  samtools view -bu | \
  samtools sort -l 9 -o '{= s:trm.+$:paired.bam: =}'"
The input files are in subdirectories. The names of files and directories
reflect sample names plus some more information (fixed structure). bwa and
samtools process genetic data (I'm not going to describe the details here).

Everything inside "..." is run once for each line returned by 'find'. First it
reports which file is being processed (echo '{}'; '{}' holds the whole input
line returned by 'find'). The 'bwa mem' command takes the genetic reference
stored in each directory ($REFB contains its basename, and '{//}' expands to
the directory part of each line returned by 'find') and two input files
(similar case; the filename pattern is known and stable). Its output is piped
to 'samtools view' (some magic molecular processing :-) and then to the final
'samtools sort' (even more bio magic :-).

I really love the '{= ... =}' construction. It contains a Perl expression: it
takes the whole input line (the same as '{}') and lets Perl process it with
(almost sed-like, ehm, ehm ;-) search/replace. This is really useful for
getting regularly named output files, i.e. the input file is named ...trm...
and I replace everything from "trm" onward with paired.bam. Perfect. :-)

This way I sometimes process several hundred samples, each taking up to
several hours. If I distribute the task among 12-16 CPU threads, I finish
significantly faster than with a for/while loop. :-)
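
To make the two replacement strings less abstract, here is a minimal sketch in
plain shell, so it runs without GNU Parallel, bwa or samtools installed. The
sample path and the two function names are made up for illustration, and
dirname and sed only mimic what '{//}' and '{= s:trm.+$:paired.bam: =}' produce
for one input line; this is not how parallel does it internally.

```shell
#!/bin/sh
# Hypothetical helpers that imitate two GNU Parallel replacement strings
# for a single input line (one path as returned by 'find').

# Imitates '{//}': the directory part of the input line.
dir_part() {
    dirname "$1"
}

# Imitates '{= s:trm.+$:paired.bam: =}': the input line with everything
# from the first "trm" to the end replaced by "paired.bam".
out_name() {
    printf '%s\n' "$1" | sed -E 's:trm.+$:paired.bam:'
}

# Made-up example path following the "*trm_R1*" naming pattern.
line="samples/A01/A01_trm_R1.fq.gz"
echo "input line:     $line"
echo "directory part: $(dir_part "$line")"
echo "output file:    $(out_name "$line")"
```

With GNU Parallel itself, adding --dry-run to the parallel call prints the
commands it would run instead of executing them, which is a cheap way to check
the expansions before starting jobs that take hours.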

--
Vojtěch Zeisek
https://trapa.cz/