Hi,
I'm trying to produce a list of files with 'find', skipping some paths, but I can't find an optimal method. Maybe I just need more coffee.
What I do at the moment is:
    find / -type f | egrep -v "/var/spool/news/" | \
        egrep -v "/var/run/udev/links" | egrep -v "/var/run/user/" > filelist
Also, I have not found a concoction to make egrep filter out several strings in one go, like:
    ... egrep -v "/var/run/udev/links\|/var/spool/news/"
or
    ... egrep -v "/var/run/udev/links" -v "/var/spool/news/"
Oh, and I see I forgot to match only at the start of the line. But that does not matter much.
-- Cheers Carlos E. R. (from 13.1 x86_64 "Bottle" at Telcontar)
-- To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org To contact the owner, e-mail: opensuse+owner@opensuse.org
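On the side question of filtering several strings in one go: in an extended regular expression the alternation operator is a bare `|` (no backslash), and grep takes additional patterns through repeated `-e`, not repeated `-v`. A minimal sketch, assuming a small made-up list in place of the real find output (the sample paths and the `filelist_all` name are illustrative only); the `^` anchors each pattern at the start of the line:

```shell
# Hypothetical sample standing in for the output of: find / -type f
workdir=$(mktemp -d)
cat > "$workdir/filelist_all" <<'EOF'
/etc/passwd
/var/spool/news/group/1
/var/run/udev/links/foo
/var/run/user/1000/x
/home/cer/var/run/user/notes
EOF

# One grep, one extended regex with alternation, anchored at line start.
grep -E -v '^(/var/spool/news|/var/run/udev/links|/var/run/user)/' \
    "$workdir/filelist_all" > "$workdir/filelist"

# Equivalent form: several -e patterns; a line matching any one is dropped.
grep -v -e '^/var/spool/news/' -e '^/var/run/udev/links/' \
        -e '^/var/run/user/' \
    "$workdir/filelist_all" > "$workdir/filelist2"

cat "$workdir/filelist"
```

Because of the anchor, a path such as /home/cer/var/run/user/notes stays in the list, which the unanchored patterns would have dropped.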
Oops, I forgot to change the subject line from the mail template.
On 2014-04-27 15:39, Carlos E. R. wrote:
Hi,
I'm trying to produce a list of files with 'find', skipping some paths, but I can't find an optimal method. Maybe I just need more coffee.
-- Cheers / Saludos, Carlos E. R. (from 13.1 x86_64 "Bottle" at Telcontar)
On 04/27/2014 09:39 AM, Carlos E. R. wrote:
I'm trying to produce a list of files with 'find', skipping some paths, but I can't find an optimal method. Maybe I just need more coffee.
Perhaps a better description of what's "in" and what's "out" would enable some of us who are slightly more conversant with find to help.
You might also try reversing things. The first argument to find is a LIST OF DIRECTORIES. So you might think about what should be in that list and what should not be in it. In fact you might think about
    find $(list generator) -type f
where the embedded script might even be something like
    find / -type d -maxdepth 3 -path "xxxx" -regex "yyyy"
for values of xxxx and yyyy that match your needs, possibly negated.
That way you are making find do the work, and the outer find only deals with the directories you want.
This strikes me as being less of a load than having find list every file on the file system and then grepping some out.
On 2014-04-27 16:28, Anton Aylward wrote:
On 04/27/2014 09:39 AM, Carlos E. R. wrote:
I'm trying to produce a list of files with 'find', skipping some paths, but I can't find an optimal method. Maybe I just need more coffee.
Perhaps a better description of what's "in" and what's "out" would enable some of us who are slightly more conversant with find to help.
I'm simply doing this:
    find / -type f > listoffiles
And I want to keep these paths out of the list:
    /var/spool/news/
    /var/run/udev/links
    /var/run/user/
    /var/run/systemd/
    /var/lib/ntp/proc/
    /proc/
What I'm doing is generating the full list and then pruning it. I was hoping 'find' would have an "--exclude-path", but I can't see such a thing in the large manual.
You might also try reversing things.
The first argument to find is a LIST OF DIRECTORIES.
So you might think about what should be in that list and what should not be in it. In fact you might think about
find $(list generator) -type f
where the embedded script might even be something like
find / -type d -maxdepth 3 -path "xxxx" -regex "yyyy"
for values of xxxx and yyyy that match your needs, possibly negated.
That way you are making find do the work and the outer find only dealing with the directories you want.
This strikes me as being less of a load than having find list every file on the file system and then grepping some out.
Mmm.... interesting... [...] Yep... it works, it seems. I have:
    find "/" -maxdepth 4 -type d > $LISTADO_FIND_DIRS
    cat $LISTADO_FIND_DIRS | egrep -v "/var/spool/news/" \
        | egrep -v "/var/run/udev/links" | egrep -v "/var/run/user/" \
        | egrep -v "/var/run/systemd/" | egrep -v "/var/lib/ntp/proc/" \
        | egrep -v "/proc/" > $LISTADO_FIND_PRUNED
    while read FILES ; do
        find "$FILES" -type f >> $LISTADO_FIND
    done < $LISTADO_FIND_PRUNED
And I get:
    Telcontar:~ # wc -l encontrarscript*
       20659 encontrarscript_listado_dirs
       13038 encontrarscript_listado_dirs_pruned
     3130530 encontrarscript_listado_ficheros
     3164227 total
    Telcontar:~ #
It is still running, and the list of files keeps growing.
Mmm. I think you propose something instead of my 'while' loop above:
    find $LISTADO_FIND_PRUNED -type f
Where $LISTADO_FIND_PRUNED is a file containing the list of paths to search?
-- Cheers / Saludos, Carlos E. R. (from 13.1 x86_64 "Bottle" at Telcontar)
On 2014-04-27 17:12, Carlos E. R. wrote:
On 2014-04-27 16:28, Anton Aylward wrote:
On 04/27/2014 09:39 AM, Carlos E. R. wrote:
You might also try reversing things.
The first argument to find is a LIST OF DIRECTORIES.
Mmm.... interesting...
[...]
Yep... it works, it seems.
I have:
find "/" -maxdepth 4 -type d > $LISTADO_FIND_DIRS
    cat $LISTADO_FIND_DIRS | egrep -v "/var/spool/news/" \
        | egrep -v "/var/run/udev/links" | egrep -v "/var/run/user/" \
        | egrep -v "/var/run/systemd/" | egrep -v "/var/lib/ntp/proc/" \
        | egrep -v "/proc/" > $LISTADO_FIND_PRUNED
    while read FILES ; do
        find "$FILES" -type f >> $LISTADO_FIND
    done < $LISTADO_FIND_PRUNED
Nope, it does not work right. It is searching the wrong paths:
    find: ‘/var/run/user/1000/gvfs’: Permission denied
    find: ‘/home/p2phelper/.gvfs’: Permission denied
The reason is that $LISTADO_FIND_PRUNED contains these entries:
    /var/lock/gkrellm
    /var/lock/lvm
    /var/run
    /var/run/lightdm
    /var/run/lightdm/cer
    /var/run/lightdm/cer2
The '/var/run/user/' entries are missing, but as /var/run is in there, that tree will be searched anyway. I have to revert to my previous procedure.
-- Cheers / Saludos, Carlos E. R. (from 13.1 x86_64 "Bottle" at Telcontar)
On 04/27/2014 11:12 AM, Carlos E. R. wrote:
Mmm. I think you propose something instead of my 'while' loop above:
find $LISTADO_FIND_PRUNED -type f
Where $LISTADO_FIND_PRUNED is a file containing the list of paths to search?
In short, yes, but the $LISTADO_FIND_PRUNED is generated by 'find' as a subshell
    find $(program that generates LISTADO_FIND_PRUNED) -type f
See the shell manual page for how to use embedded subshells. I make use of this facility quite a lot. Coprocessing offers a number of advantages over separate sequential steps, although this isn't the best example of that.
I was hoping 'find' would have an "--exclude-path", but I can't see such a thing in the large manual.
It's there, it just isn't called "--exclude-path". Look: it says
    ! expression
        Negation of a primary; the unary NOT operator.
and
    expression [-a] expression
        Conjunction of primaries; the AND operator is implied by the juxtaposition of two primaries or made explicit by
or you might look at
    -path pattern
        File name matches shell pattern pattern. The metacharacters do not treat `/' or `.' specially; so, for example,
            find . -path "./sr*sc"
        will print an entry for a directory called `./src/misc' (if one exists).
Now here it is:
    To ignore a whole directory tree, use -prune rather than checking every file in the tree.
So you could EITHER -prune or you could say "not this pattern and not this pattern and not this pattern"
RTFM for further details http://www.theunixschool.com/2012/07/find-command-15-examples-to-exclude.htm...
But this seems to be exactly what you want http://www.liamdelahunty.com/tips/linux_find_exclude_multiple_directories.ph...
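The "not this pattern and not this pattern" variant can be sketched as below. This is only an illustration on a throwaway tree (the directory names are made up, not the real system paths). Since -path matches the whole path as find prints it, the patterns end in `/*` to cover the contents of a directory. Unlike -prune, this form still walks the excluded directories; it merely declines to print what matches.

```shell
# Build a small throwaway tree (names are illustrative only).
root=$(mktemp -d)
mkdir -p "$root/keep" "$root/spool/news" "$root/run/user"
touch "$root/keep/a" "$root/spool/news/b" "$root/run/user/c" "$root/run/d"

# Each "! -path" is AND-ed with -type f by juxtaposition.
find "$root" -type f \
    ! -path "$root/spool/news/*" \
    ! -path "$root/run/user/*" \
    | sort > "$root.list"

cat "$root.list"
```

Only keep/a and run/d remain in the list; on a large tree the descent into the excluded directories is the cost that -prune avoids.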
On 2014-04-27 18:30, Anton Aylward wrote:
On 04/27/2014 11:12 AM, Carlos E. R. wrote:
Mmm. I think you propose something instead of my 'while' loop above:
find $LISTADO_FIND_PRUNED -type f
Where $LISTADO_FIND_PRUNED is a file containing the list of paths to search?
In short, yes, but the $LISTADO_FIND_PRUNED is generated by 'find' as a subshell
find $(program that generates LISTADO_FIND_PRUNED) -type f
See the shell manual page for how to use embedded subshells. I make use of this facility quite a lot. Coprocessing offers a number of advantages over separate sequential steps, although this isn't the best example of that.
I see. Interesting. That gave me an idea of another part of the script, but I failed:
    cer@Telcontar:~> grep Bourne-Again "$(head -c 1000 bin/0_script_constructs | file -)"
    grep: /dev/stdin: Bourne-Again shell script, UTF-8 Unicode text executable: No such file or directory
    cer@Telcontar:~>
The intention here was for grep to locate the string "Bourne-Again" on the output text of the subshell. But it does not do that, it interprets it as the file(s) to look inside. This part works:
    cer@Telcontar:~> echo "$(head -c 1000 bin/0_script_constructs | file -)"
    /dev/stdin: Bourne-Again shell script, UTF-8 Unicode text executable
    cer@Telcontar:~>
The intention is simply to trigger an 'if' selector depending on whether it is a script or not (grep produces appropriate exit codes). I have this working instead with another method, which is a modification from an answer from a thread a year ago (Re: Re: [opensuse] Searching for string in many files [Was: 12.3 + ntfs + ext4 + USB3 + different copying speeds])
    while read FILES ; do
        TIPO=`head -c 1000 "$FILES" | file - | awk '/Bourne-Again/{print $2}'`
        if [ "$TIPO" = "Bourne-Again" ]; then
            echo "$FILES" >> $LISTADO_SCRIPTS
        fi
    done < $LISTADO_FIND
which works perfectly, but I was attempting to speed it up a bit by using grep instead of awk. Of course, 'test' would be better, but it only matches on full strings, not substrings. I.e., there is no:
    test string -inside-string string
which, as a bash built-in, would be the fastest one ([ -eq ]).
I was hoping 'find' would have an "--exclude-path", but I can't see such a thing in the large manual.
It's there, it just isn't called "--exclude-path".
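For what it is worth, the shell can test for a substring without any external process: the POSIX `case` statement matches glob patterns, and in bash `[[ $TIPO == *Bourne-Again* ]]` does the same. A minimal sketch; the helper name `contains` is made up for this illustration:

```shell
# contains STRING SUBSTRING: succeeds if SUBSTRING occurs in STRING.
# Quoting "$2" inside the pattern makes it match literally.
contains() {
    case $1 in
        *"$2"*) return 0 ;;
        *)      return 1 ;;
    esac
}

TIPO='/dev/stdin: Bourne-Again shell script, UTF-8 Unicode text executable'
if contains "$TIPO" 'Bourne-Again'; then
    echo "looks like a bash script"
fi
```

Whether this is measurably faster than grep in the loop above is untested here; the process that dominates is likely still `file`.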
Ah. I thought it might be so. ...
So you could EITHER -prune or you could say "not this pattern and not this pattern and not this pattern"
RTFM for further details http://www.theunixschool.com/2012/07/find-command-15-examples-to-exclude.htm...
But this seems to be exactly what you want http://www.liamdelahunty.com/tips/linux_find_exclude_multiple_directories.ph...
It is indeed. I'm using now:
    find "$DONDE" -type f \
        -prune -o -path '/var/spool/news' \
        -prune -o -path '/var/run/udev/links' \
        -prune -o -path '/var/run/user' \
        -prune -o -path '/var/run/systemd' \
        -prune -o -path '/var/lib/ntp/proc' \
        -prune -o -path '/proc' \
        -prune -o -path '/run/udev/links' \
        > $LISTADO_FIND
which is exactly what I wanted :-)
-- Cheers / Saludos, Carlos E. R. (from 13.1 x86_64 "Bottle" at Telcontar)
On 04/27/2014 02:06 PM, Carlos E. R. wrote:
On 2014-04-27 18:30, Anton Aylward wrote:
On 04/27/2014 11:12 AM, Carlos E. R. wrote:
[...]
I see. Interesting.
That gave me an idea of another part of the script, but I failed:
    cer@Telcontar:~> grep Bourne-Again "$(head -c 1000 bin/0_script_constructs | file -)"
    grep: /dev/stdin: Bourne-Again shell script, UTF-8 Unicode text executable: No such file or directory
    cer@Telcontar:~>
The intention here was for grep to locate the string "Bourne-Again" on the output text of the subshell. But it does not do that, it interprets it as the file(s) to look inside.
Of course it fails! Check the man page for grep. In that usage it takes a list of files. What your subshell is producing is a text stream. What you meant was
    head -c 1000 bin/0_script_constructs | file - | grep Bourne-Again
No need for an embedded subshell.
But this seems to be exactly what you want http://www.liamdelahunty.com/tips/linux_find_exclude_multiple_directories.ph...
It is indeed. I'm using now:
    find "$DONDE" -type f \
        -prune -o -path '/var/spool/news' \
        -prune -o -path '/var/run/udev/links' \
        -prune -o -path '/var/run/user' \
        -prune -o -path '/var/run/systemd' \
        -prune -o -path '/var/lib/ntp/proc' \
        -prune -o -path '/proc' \
        -prune -o -path '/run/udev/links' \
        > $LISTADO_FIND
which is exactly what I wanted :-)
On 2014-04-27 20:26, Anton Aylward wrote:
On 04/27/2014 02:06 PM, Carlos E. R. wrote:
The intention here was for grep to locate the string "Bourne-Again" on the output text of the subshell. But it does not do that, it interprets it as the file(s) to look inside.
Of course it fails! Check the man page for grep. In that usage it takes a list of files. What your subshell is producing is a text stream.
I know that, and I said so in my email :-)
What you meant was
    head -c 1000 bin/0_script_constructs | file - | grep Bourne-Again
No need for an embedded subshell.
Ah... I see, yes. And if the exit code I get from that ( $? ) comes from the grep, it would be fantastic. [...] Yep, it works:
    while read FILES ; do
        if [ -f "$FILES" ]; then
            TIPO=`head -c 1000 "$FILES" | file - | grep Bourne-Again`
            if [ $? -eq 0 ]; then
                echo "$FILES" >> $LISTADO_SCRIPTS
            fi
        fi
    done < $LISTADO_FIND
Notes:
a) The test for "regular file" is needed because the find, despite using "find "$DONDE" -type f ...", finds some directories, and thus I get some errors later:
    head: error reading ‘/var/run/vmblock-fuse/dev’: Invalid argument
    head: error reading ‘/var/run/user’: Is a directory
    head: error reading ‘/var/run/systemd’: Is a directory
    head: error reading ‘/var/run/udev/links’: Is a directory
    head: error reading ‘/var/spool/news’: Is a directory
    head: error reading ‘/var/lib/ntp/proc’: Is a directory
    head: error reading ‘/proc’: Is a directory
Some of those directories were supposedly excluded:
    find "$DONDE" -type f \
        -prune -o -path '/var/spool/news' \
        -prune -o -path '/var/run/udev/links' \
        -prune -o -path '/var/run/user' \
        -prune -o -path '/var/run/systemd' \
        -prune -o -path '/var/lib/ntp/proc' \
        -prune -o -path '/proc' \
        -prune -o -path '/run/udev/links' \
        > $LISTADO_FIND
"/proc" should be avoided. The contents are indeed avoided, but not the parent.
b) The use of:
    head -c 1000 "$FILES" | file -
instead of directly:
    file "$FILES"
is because "file" takes an awful amount of CPU time to find out the types of all the files I feed it with, some of them huge (several gigabytes). Finding out whether a file is a bash script needs only the first line of the script (the shebang), or a bit more if it is missing.
-- Cheers / Saludos, Carlos E. R. (from 13.1 x86_64 "Bottle" at Telcontar)
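Since the exit status of a pipeline is that of its last command, the pipeline can also serve as the `if` condition directly, with `grep -q` suppressing the output so that neither `TIPO` nor `$?` is needed. A minimal sketch on two throwaway sample files; it assumes the `file` utility is installed and that `head` accepts `-c`, and it reuses the variable names from the thread:

```shell
# Throwaway sample files; the real list would come from find.
tmp=$(mktemp -d)
printf '#!/bin/bash\necho hello\n' > "$tmp/a_script"
printf 'just some text\n'          > "$tmp/notes.txt"
printf '%s\n' "$tmp/a_script" "$tmp/notes.txt" > "$tmp/listado_find"

LISTADO_FIND=$tmp/listado_find
LISTADO_SCRIPTS=$tmp/listado_scripts
: > "$LISTADO_SCRIPTS"

while IFS= read -r FILES; do
    # The condition's exit status is grep's: 0 only if the text was found.
    if head -c 1000 "$FILES" | file - | grep -q 'Bourne-Again'; then
        printf '%s\n' "$FILES" >> "$LISTADO_SCRIPTS"
    fi
done < "$LISTADO_FIND"

cat "$LISTADO_SCRIPTS"
```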
On 04/27/2014 02:54 PM, Carlos E. R. wrote:
a) The test for "regular file" is needed, because the find, despite using "find "$DONDE" -type f ..." finds some directories, and thus I get some errors later:
Possibly. Possibly not. The thing is that find produces a stream. So if you have a file with the path
    $HOME/Long Directory name/even longer file name dot text
then you will get the following path names
    Long Directory name even longer file name dot text
What you need is to use "-print0" and "xargs -0". See the man page for details.
On 2014-04-27 21:12, Anton Aylward wrote:
On 04/27/2014 02:54 PM, Carlos E. R. wrote:
a) The test for "regular file" is needed, because the find, despite using "find "$DONDE" -type f ..." finds some directories, and thus I get some errors later:
Possibly. Possibly not.
The thing is that find produces a stream
So if you have a file with the path
$HOME/Long Directory name/even longer file name dot text
then you will get the following path names
Long Directory name even longer file name dot text
What you need is to use "-print0" and "xargs -0"
No, that's not a problem. The output of find is saved to a text file, which thus contains one entry per line. For your sample above, I would get this line:
    /home/cer/Long Directory name/even longer file name dot text
Then I use:
    while read FILES ; do
        echo "$FILES" # or whatever.
    done < text_file_containing_list
Whitespace is not a problem this way. That's the reason I'm not using pipes: intermediate files sometimes make life easier, and I can examine the intermediate steps with less, or even edit them, or reprocess a step with a different command without going through the entire thing again.
The original code was this one-liner:
    find / -type f -print0 | xargs -0 file \
        | awk '/Bourne-Again/{print $1}' | tr -d ':' \
        | xargs -r grep -D skip SEARCHSTRING | less -S
The problem is that it is terribly slow: several hours. One reason is that it explores absolutely all paths, like "/proc". That could be avoided with the -prune syntax. But another is that "file" often examines the full file, and some are huge. Even if it were a million files of 1 megabyte each, that means reading one terabyte. Which is why I'm trying the "head -c 1000" trick.
-- Cheers / Saludos, Carlos E. R. (from 13.1 x86_64 "Bottle" at Telcontar)
On 04/27/2014 10:23 PM, Carlos E. R. wrote:
    find / -type f -print0 | xargs -0 file \
        | awk '/Bourne-Again/{print $1}' | tr -d ':' \
        | xargs -r grep -D skip SEARCHSTRING | less -S
The problem is that it is terribly slow, several hours. One reason is that it explores absolutely all paths, like "/proc".
What about passing the mount points you're interested in, and then use the -xdev option to limit the search to these file systems? Have a nice day, Berny
On 2014-04-27 23:03, Bernhard Voelker wrote:
What about passing the mount points you're interested in, and then use the -xdev option to limit the search to these file systems?
Nope! I want to find a script that I don't know in which disk it is... O:-) -- Cheers / Saludos, Carlos E. R. (from 13.1 x86_64 "Bottle" at Telcontar)
On Sun, 27 Apr 2014 22:23:18 +0200, "Carlos E. R." <robin.listas@telefonica.net> wrote:
On 2014-04-27 21:12, Anton Aylward wrote:
On 04/27/2014 02:54 PM, Carlos E. R. wrote:
a) The test for "regular file" is needed, because the find, despite using "find "$DONDE" -type f ..." finds some directories, and thus I get some errors later:
Possibly. Possibly not.
The thing is that find produces a stream
So if you have a file with the path
$HOME/Long Directory name/even longer file name dot text
then you will get the following path names
Long Directory name even longer file name dot text
What you need is to use "-print0" and "xargs -0"
No, that's not a problem. The output of find is saved to a text file, which thus contains one entry per line. For your sample above, I would get this line:
/home/cer/Long Directory name/even longer file name dot text
Which breaks if filename contains newline.
Then I use:
    while read FILES ; do
        echo "$FILES" # or whatever.
    done < text_file_containing_list
Which loses consecutive whitespaces and performs "\" escaping.
Whitespace is not a problem this way.
Sure because you will lose them :)
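What a plain `read` actually changes can be checked with a small experiment. With a single variable name, runs of spaces inside the line survive, but leading and trailing blanks are trimmed and backslashes are interpreted; `IFS= read -r` switches both effects off. A minimal sketch with two made-up names:

```shell
tmp=$(mktemp -d)
# Two awkward names: leading spaces, and a literal backslash.
printf '%s\n' '  two leading spaces' 'back\slash  name' > "$tmp/list"

# Plain read: trims leading/trailing blanks and eats the backslash.
while read line; do
    printf '[%s]\n' "$line"
done < "$tmp/list" > "$tmp/plain"

# IFS= read -r: each line is kept exactly as written.
while IFS= read -r line; do
    printf '[%s]\n' "$line"
done < "$tmp/list" > "$tmp/exact"

cat "$tmp/plain" "$tmp/exact"
```

Neither form copes with a newline inside a file name; that needs a NUL-delimited list.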
On 2014-04-28 at 06:44 +0400, Andrey Borzenkov wrote:
On Sun, 27 Apr 2014 22:23:18 +0200, "Carlos E. R." <> wrote:
/home/cer/Long Directory name/even longer file name dot text
Which breaks if filename contains newline.
Yes, of course, I thought of that, but... I've never seen a file with a newline in its name. I would never dream of creating one. Why would anyone do such a thing? If spaces in names break so many scripts around, a newline would make the rest go berserk.
Then I use:
    while read FILES ; do
        echo "$FILES" # or whatever.
    done < text_file_containing_list
Which loses consecutive whitespaces and performs "\" escaping.
Double spaces, you mean?
Whitespace is not a problem this way.
Sure because you will lose them :)
I have not seen a single "file not found error" in the output. The search has been running for hours, but I had to leave and I hibernated the machine. I will let it run when I get back. And then I will do a test run on a smaller path with names with double spaces, and see what happens. -- Cheers Carlos E. R. (from 13.1 x86_64 "Bottle" (Minas Tirith))
On 04/27/2014 04:23 PM, Carlos E. R. wrote:
Then I use:
    while read FILES ; do
        echo "$FILES" # or whatever.
    done < text_file_containing_list
OUCH! You need to be careful with that. If you have a number of words on an input line, the "rest of the line" is assigned to the last variable on the line. So if the text file contained
    onewordonaline
    twowords onaline
    three words onaline
    four words ona line
    five words on a line
then yes, your version would soak up all the words on the line. But one that began
    while read WORD WORD2 ; do echo "$WORD" .....
would produce
    onewordonaline
    twowords
    three
    four
    five
That is why I use the "-print0" or the -printf or the \{\} options with find and xargs
On 2014-04-27 at 22:54 -0400, Anton Aylward wrote:
On 04/27/2014 04:23 PM, Carlos E. R. wrote:
Then I use:
    while read FILES ; do
        echo "$FILES" # or whatever.
    done < text_file_containing_list
OUCH! You need to be careful with that.
If you have a number of words on an input line, the "rest of the line" is assigned to the last variable on the line.
So if the text file contained
    onewordonaline
    twowords onaline
    three words onaline
    four words ona line
    five words on a line
then yes, your version would soak up all the words on the line. But one that began
    while read WORD WORD2 ; do echo "$WORD" .....
would produce
    onewordonaline
    twowords
    three
    four
    five
Nope, it doesn't. Look for yourself:
    cer@minas-tirith:~/bin> cat test_while
    #!/bin/bash
    while read FILES ; do
        echo "$FILES" # or whatever.
    done < text_file_containing_list
    cer@minas-tirith:~/bin> cat text_file_containing_list
    onewordonaline
    twowords onaline
    three words onaline
    four words ona line
    five words on a line
    line with double and triple spaces
    cer@minas-tirith:~/bin>
    cer@minas-tirith:~/bin> test_while
    onewordonaline
    twowords onaline
    three words onaline
    four words ona line
    five words on a line
    line with double and triple spaces
    cer@minas-tirith:~/bin>
Words are written correctly, and with the correct number of spaces. :-)
That is why I use the "-print0" or the -printf or the \{\} options with find and xargs
Well, see above, it works just fine. I'm intentionally not using WORD1 WORD2 etc. -- Cheers Carlos E. R. (from 13.1 x86_64 "Bottle" (Minas Tirith))
On 04/28/2014 10:43 PM, Carlos E. R. wrote:
Nope, it doesn't. Look for yourself:
Yes it does. I wrote
while read WORD WORD2 ; do echo "$WORD" .....
Note that WORD2. It means that the shell assigns the first token to WORD and the rest of the tokens to WORD2. If you read the man page for BASH/shell it mentions this behaviour :-)
<quote>
    read [-ers] [-a aname] [-d delim] [-i text] [-n nchars] [-N nchars] [-p prompt] [-t timeout] [-u fd] [name ...]
        One line is read from the standard input, or from the file descriptor fd supplied as an argument to the -u option, and the first word is assigned to the first name, the second word to the second name, and so on, with leftover words and their intervening separators assigned to the last name.
</quote>
In your case, with only one 'name', the first and all the rest are assigned to that. In my case the first token is assigned to the first name and all subsequent tokens to the second name. You can treat this a number of ways. One is to use "-print0" and quoting. Another is to use a non-zero value of the second token to tell you that there are spaces. However I prefer the first as it preserves file names with tabs and multiple spaces. You can also use "-printf". However a lot of the time I use "-print0" and pipe the output of find into "xargs -0".
On 2014-04-29 at 10:00 -0400, Anton Aylward wrote:
On 04/28/2014 10:43 PM, Carlos E. R. wrote:
Nope, it doesn't. Look for yourself:
Yes it does.
I wrote
while read WORD WORD2 ; do echo "$WORD" .....
Note that WORD2
Why would I want to use that WORD2? Ok, I know why I would, and I have used that syntax on occasion. But my script is not using it because it would not work!
In my case the first token is assigned to the first name and all subsequent tokens to the second name. You can treat this a number of ways. One is to use "-print0" and quoting. Another is to use a non-zero value of the second token to tell you that there are spaces. However I prefer the first as it preserves file names with tabs and multiple spaces. You can also use "-printf". However a lot of the time I use "-print0" and pipe the output of find into "xargs -0".
I prefer instead to write intermediate steps to text files where I can examine them, and re-execute a secondary step directly from the intermediate file without running the first step. -- Cheers Carlos E. R. (from 13.1 x86_64 "Bottle" (Minas Tirith))
On 04/29/2014 08:59 PM, Carlos E. R. wrote:
On 2014-04-29 at 10:00 -0400, Anton Aylward wrote:
You can treat this a number of ways. One is to use "-print0" and quoting. Another is to use a non-zero value of the second token to tell you that there are spaces. However I prefer the first as it preserves file names with tabs and multiple spaces. You can also use "-printf". However a lot of the time I use "-print0" and pipe the output of find into "xargs -0".
I prefer instead to write intermediate steps on text files where I can examine them, and re-execute a secondary step directly from the intermediary file without running the first step.
I believe someone mentioned that the shell methods treat all whitespace, that is any string of space, tab, newline, as a single separator. So a file with the name
    "This\ is\ \t\ a\ \ \ long\nfile\ name"
would be output with "-print" in such a way that it is read into your WORD as either
    "This is a long file name"
or
    "This is a long"
depending on whether you piped or used an intermediate file. Thus when you tried accessing the file by the name in WORD it would fail. BTDT. That's why I use "-print0" and "xargs -0".
Of course people coming from an MS-DOS world (and a few other OSs, come to that) don't get to deal with long file names with embedded spaces :-)
On 2014-04-29 at 22:03 -0400, Anton Aylward wrote:
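The difference between a line-based and a NUL-delimited list can be seen on a throwaway directory. A minimal sketch; the file names are made up, and -print0 and xargs -0 are extensions (present in GNU and BSD tools) rather than something every POSIX system must provide:

```shell
tmp=$(mktemp -d)
# One ordinary name and one with double spaces plus an embedded newline.
touch "$tmp/plain" "$tmp/odd  name
second line"

# Line-based counting sees three "names" for two files.
lines=$(find "$tmp" -type f -print | wc -l | tr -d ' ')

# NUL-delimited: xargs -0 hands each name over as exactly one argument.
# The trailing "sh" fills $0; the file names arrive as "$@".
args=$(find "$tmp" -type f -print0 | xargs -0 sh -c 'echo "$#"' sh)

echo "lines=$lines args=$args"
```

The catch for the intermediate-file workflow is that a NUL-delimited list is awkward to page through or edit by hand, so it is a trade-off rather than a free win.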
I believe someone mentioned that the shell methods treat all whitespace, that is any string of space, tab, newline, as a single separator.
So a file with the name "This\ is\ \t\ a\ \ \ long\nfile\ name" would be output with the "-print" in such a way that it is read into your WORD as either
"This is a long file name" or "This is a long"
depending on whether you piped or used an intermediate file.
Then I will not use "-print" :-) The way I'm doing it I get the right number of spaces and I can access all those files, it appears. I just get a few unintended entries (the parent dirs). I will do a few more tests to verify, though, so thanks :-)
Thus when you tried accessing the file by the name in WORD it would fail.
BTDT.
That's why I use "-print0" and "xargs -0"
Of course people coming from an MS-DOS world (and a few other OSs, come to that) don't get to deal with long file names with embedded spaces :-)
I do have a lot of long file names with spaces in them, sometimes several :-) -- Cheers Carlos E. R. (from 13.1 x86_64 "Bottle" (Minas Tirith))
On 04/29/2014 10:27 PM, Carlos E. R. wrote:
The way I'm doing it I get the right number of spaces and I can access all those files, it appears.
You probably don't have the kind of file names I described, though :-)
Many people do shell programming as a 'suck it and see' iteration such as this thread has addressed. In the 80% case it works and you sometimes, as this thread has covered, learn along the way. But sometimes there are errors because things don't show up. Some of those pernicious file names are an example of that. I got BTDT'd when doing a backup. A file with a double space "\ \ " didn't get backed up. Of course it was an important file.
You live and learn through making mistakes. Hopefully the mistakes aren't too serious, and it's best if you can learn from the mistakes of others. I learnt early on a few important principles, despite resistance from my peers and managers:
    Context is Everything
    Generalise and parametrise
    Document everything, especially the design decisions
    Solve the right problem
    If it gets too difficult or too complex then you're doing it wrong
    "Form follows Function" - Louis Sullivan, 1896
    Reduce 'ripple effect' by reducing coupling
Hello, On Tue, 29 Apr 2014, Anton Aylward wrote:
That's why I use "-print0" and "xargs -0"
Use:
    find . [...] -exec sh -c 'shellscript' find-sh {} +
That's fast, portable and handles all filenames correctly. If you got a lot / weird stuff in "shellscript", use an actual external (temporary) script. In both cases, you get filenames in "$@", so you can iterate over them in the usual way, i.e.
    for file in "$@"; do ... ; done   ### portable POSIX sh
    for file; do ... done             ### bash (and others?)
so, it could e.g. look like this:
    find . -type f -exec \
        /bin/bash -c 'for f; do echo ">>$f<<"; done; echo =====;' find-sh {} +
Here "find-sh" becomes "$0", to be seen in 'ps' output. With the 'echo =====' you'll see what batches find called the sh-script with.
HTH, -dnh
On 04/27/2014 08:54 PM, Carlos E. R. wrote:
while read FILES ; do
    if [ -f "$FILES" ]; then
        TIPO=`head -c 1000 "$FILES" | file - | grep Bourne-Again`
        if [ $? -eq 0 ]; then
            echo "$FILES" >> $LISTADO_SCRIPTS
        fi
    fi
done < $LISTADO_FIND
Notes:
a) The test for "regular file" is needed, because the find, despite using "find "$DONDE" -type f ..." finds some directories, and thus I get some errors later:
head: error reading ‘/var/run/vmblock-fuse/dev’: Invalid argument
Very unlikely - there's something wrong in $LISTADO_FIND. However it was produced ... it contains directories.
b) The use of:
head -c 1000 "$FILES" | file -
instead of directly:
file "$FILES"
is because "file" takes an awful amount of CPU time to find out the types of all the files I feed it with, some of them huge (several gigabytes).
Maybe:

    find ... -size -100k ...

assuming that shell scripts are usually smaller than 100k (okay, java shell installers are bigger, of course).

Have fun,
Berny
On 2014-04-27 22:16, Bernhard Voelker wrote:
On 04/27/2014 08:54 PM, Carlos E. R. wrote:
a) The test for "regular file" is needed, because the find, despite using "find "$DONDE" -type f ..." finds some directories, and thus I get some errors later:
head: error reading ‘/var/run/vmblock-fuse/dev’: Invalid argument
Very unlikely - there's something wrong in $LISTADO_FIND. However it was produced ... it contains directories.
It does indeed, I know. Very few, but some. But the one above (‘/var/run/vmblock-fuse/dev’) is a regular file, just zero bytes long. I already added a check for that one:

    if [ -f "$FILES" -a -s "$FILES" ]; then

which I have not tested yet because the script has been running for some time and I do not want to stop it again, unless I hit another big error.

The "find" command I'm running now is this:

    find "/" -type f \
        -prune -o -path '/var/spool/news' \
        -prune -o -path '/var/run/udev/links' \
        -prune -o -path '/var/run/user' \
        -prune -o -path '/var/run/systemd' \
        -prune -o -path '/var/lib/ntp/proc' \
        -prune -o -path '/proc' \
        -prune -o -path '/run/udev/links' \
        > $LISTADO_FIND

And the directories it appears to find are:

    ‘/var/run/user’
    ‘/var/run/systemd’
    ‘/var/run/udev/links’
    ‘/var/spool/news’
    ‘/var/lib/ntp/proc’
    ‘/proc’

That is, it finds exactly those I tell it to avoid. Everything inside those paths is skipped, but not the parents.

It is not a big problem; I could just ignore the errors, because that would be faster than testing for them.
b) The use of:
head -c 1000 "$FILES" | file -
instead of directly:
file "$FILES"
is because "file" takes an awful amount of CPU time to find out the types of all the files I feed it with, some of them huge (several gigabytes).
Maybe:
find ... -size -100k ...
assuming that shell scripts are usually smaller than 100k (okay, java shell installers are bigger, of course).
Yes, that would work, too. Interesting... Ah, you mean those big scripts with the compressed data tagged behind. Java installers, typically. No big deal... I'm not interested in them. I could do both things: tell "find" to limit itself to a certain size, and then also use "head". And also explore if I can tell find to skip non-regular files, and empty files too (Self-RTFM-again). -- Cheers / Saludos, Carlos E. R. (from 13.1 x86_64 "Bottle" at Telcontar)
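A rough sketch of combining both ideas: let find drop empty and oversized files, then look only at the first few bytes of what is left. This is an illustration under stated assumptions, not the script from the thread. The scratch files are invented, and to keep the sketch independent of the "file" magic database it checks for a bash shebang directly instead of piping into `file -`. Note that `-empty` is a GNU/BSD extension, not POSIX.

```shell
#!/bin/sh
# Scratch files (names and contents are made up for the demo).
tmp=$(mktemp -d)
printf '#!/bin/bash\necho hi\n' > "$tmp/small.sh"
: > "$tmp/empty.sh"
# A 200 KiB file that merely starts with a shebang.
{ printf '#!/bin/bash\n'; head -c 204800 /dev/zero; } > "$tmp/big.sh"
printf 'just text\n' > "$tmp/notes.txt"

# find does the cheap filtering: regular, non-empty, under 100k.
# The inline script then reads at most 64 bytes of each candidate.
scripts=$(find "$tmp" -type f ! -empty -size -100k -exec sh -c '
    for f in "$@"; do
        first=$(head -c 64 "$f" | head -n 1)
        case $first in
            "#!"*bash*) printf "%s\n" "${f##*/}" ;;
        esac
    done
' find-sh {} +)

printf '%s\n' "$scripts"
rm -rf "$tmp"
```

Only small.sh should survive: empty.sh is dropped by `! -empty`, big.sh by `-size -100k`, and notes.txt by the shebang check.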
On Sun, 27 Apr 2014 22:43:02 +0200, "Carlos E. R." <robin.listas@telefonica.net> wrote:
The "find" command I'm running now is this:
find "/" -type f \
    -prune -o -path '/var/spool/news' \
    -prune -o -path '/var/run/udev/links' \
    -prune -o -path '/var/run/user' \
    -prune -o -path '/var/run/systemd' \
    -prune -o -path '/var/lib/ntp/proc' \
    -prune -o -path '/proc' \
    -prune -o -path '/run/udev/links' \
    > $LISTADO_FIND
And the directories it appears to find are:
‘/var/run/user’ ‘/var/run/systemd’ ‘/var/run/udev/links’ ‘/var/spool/news’ ‘/var/lib/ntp/proc’ ‘/proc’
That is, it finds exactly those I tell it to avoid.
No you did not. You misunderstand how -prune works. To prune this list you need

    \( -path /var/spool/news -o -path /var/run/udev/links -o ... \) -prune
    -o \( -type f <whatever you want to do with non-pruned paths> \)
On 2014-04-28 at 06:33 +0400, Andrey Borzenkov wrote:
On Sun, 27 Apr 2014 22:43:02 +0200, "Carlos E. R." <> wrote:
That is, it finds exactly those I tell it to avoid.
No you did not. You misunderstand how -prune works. To prune this list you need
\( -path /var/spool/news -o -path /var/run/udev/links -o ... \) -prune -o \( -type f <whatever you want to do with non-pruned paths> \)
I certainly do not understand how prune works. I simply copied an example from one of the given links. :-)

Notice that it is working, kind of. It does not "find" "/var/spool/news/message.id/" or any of the million files under it. It simply finds the directory entry "/var/spool/news".

-- 
Cheers
Carlos E. R.
(from 13.1 x86_64 "Bottle" (Minas Tirith))
Hello, On Tue, 29 Apr 2014, Carlos E. R. wrote:
On 2014-04-28 at 06:33 +0400, Andrey Borzenkov wrote:
On Sun, 27 Apr 2014 22:43:02 +0200, "Carlos E. R." <> wrote:
No you did not. You misunderstand how -prune works. To prune this list you need
\( -path /var/spool/news -o -path /var/run/udev/links -o ... \) -prune
^^^^^^^^^^^^^^^^^^^^^^^^                                     ^^^^^^^^^
-o \( -type f <whatever you want to do with non-pruned paths> \)

[..]

Notice that it is working, kind of. It does not "find" "/var/spool/news/message.id/" or any of the million files under it.
Because you -pruned that path. The "action" "-prune" applies to the whole expression before it. I.e.:

    \( -path a -o -path b -o -path c -o -path d \) -prune

is the same as

    -path a -prune -o -path b -prune -o -path c -prune -o -path d -prune

HTH
-dnh

-- 
/ "People from East Germany find the West so confusing. It's so much     \
[  easier when you have only one choice." -- Linus Torvalds, explaining  ]
\  why having $BIGNUM Linux distributions is not necessarily a bad thing /
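The equivalence of the two spellings can be checked on a small scratch tree. This is only a sketch: the directory names skip1, skip2 and keep are invented, and the paths are relative so that the `-path` patterns stay simple.

```shell
#!/bin/sh
# Scratch tree (directory and file names are made up for the demo).
tmp=$(mktemp -d)
mkdir -p "$tmp/keep" "$tmp/skip1/deep" "$tmp/skip2"
: > "$tmp/keep/a"
: > "$tmp/skip1/deep/b"
: > "$tmp/skip2/c"
: > "$tmp/top"

# Grouped form: one -prune for the whole parenthesised alternation.
grouped=$(cd "$tmp" && find . \( -path ./skip1 -o -path ./skip2 \) -prune \
    -o -type f -print | LC_ALL=C sort)

# Repeated form: one -prune per path.
repeated=$(cd "$tmp" && find . -path ./skip1 -prune \
    -o -path ./skip2 -prune \
    -o -type f -print | LC_ALL=C sort)

printf 'grouped:\n%s\nrepeated:\n%s\n' "$grouped" "$repeated"
rm -rf "$tmp"
```

Both variables should hold the same two names, ./keep/a and ./top; neither the pruned directories nor anything below them is listed, because the only `-print` sits on the last branch.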
Hello, On Sun, 27 Apr 2014, Carlos E. R. wrote:
Some of those directories were supposedly excluded:
find "$DONDE" -type f \
    -prune -o -path '/var/spool/news' \
    -prune -o -path '/var/run/udev/links' \
    -prune -o -path '/var/run/user' \
    -prune -o -path '/var/run/systemd' \
    -prune -o -path '/var/lib/ntp/proc' \
    -prune -o -path '/proc' \
    -prune -o -path '/run/udev/links' \
    > $LISTADO_FIND
"/proc" should be avoided. The contents are avoided indeed, but not the parent.
You've got '-prune' backwards. It should be:

    find "$DONDE" \
        -path "/var/spool/news" -prune \
        -o -path '/var/run/udev/links' -prune \
        ...
        -o -path "/proc" -prune \
        -o -type f -print

HTH,
-dnh

-- 
Trying to make bits uncopyable is like trying to make water not wet. The sooner people accept this, and build business models that take this into account, the sooner people will start making money again. -- Bruce Schneier
On 2014-04-28 at 03:07 +0200, David Haller wrote:
On Sun, 27 Apr 2014, Carlos E. R. wrote:
Some of those directories were supposedly excluded:
You've got '-prune' backwards. It should be:
find "$DONDE" \
    -path "/var/spool/news" -prune \
    -o -path '/var/run/udev/links' -prune \
    ...
    -o -path "/proc" -prune \
    -o -type f -print
Ah... :-)

Let's try, with a reduced search for testing:

minas-tirith:~ # find /var/run -path /var/run/udev/links -prune
find: ‘/var/run/user/1000/gvfs’: Permission denied
/var/run/udev/links
minas-tirith:~ #

Nope. Ok, let's try again:

minas-tirith:~ # find /var/run -path /var/run/udev/links -prune -o -type f -print
/var/run/dhclient-enp0s29f7u1.pid
/var/run/atd.pid
/var/run/suspend.grubonce.default
/var/run/pm-suspend
/var/run/pm-utils/pm-suspend/storage/state:cpu1_governor
/var/run/pm-utils/pm-suspend/storage/state:cpu0_gover
...

Right, that's it :-)

Why is "-print" needed?

minas-tirith:~ # find /var/run -path /var/run/udev/links -prune -o -type f | wc -l
find: ‘/var/run/user/1000/gvfs’: Permission denied
476
minas-tirith:~ # find /var/run -path /var/run/udev/links -prune -o -type f -print | wc -l
find: ‘/var/run/user/1000/gvfs’: Permission denied
475
minas-tirith:~ #

Just one line less... which one? Heh!

minas-tirith:~ # diff p1 p2
207a208
> /var/run/udev/links
minas-tirith:~ #
Precisely that one! Ok, so I need that "-print", but I don't understand why it has that effect... :-?

-- 
Cheers
Carlos E. R.
(from 13.1 x86_64 "Bottle" (Minas Tirith))
On 04/29/2014 04:27 AM, Carlos E. R. wrote:
Why is "-print" needed?
From 'man find':
If no expression is given, the expression -print is used (but you should probably consider using -print0 instead, anyway).

Have a nice day,
Berny
Hello, On Tue, 29 Apr 2014, Carlos E. R. wrote:
On 2014-04-28 at 03:07 +0200, David Haller wrote:
On Sun, 27 Apr 2014, Carlos E. R. wrote:
Some of those directories were supposedly excluded: You've got '-prune' backwards. It should be:
find "$DONDE" \
    -path "/var/spool/news" -prune \
    -o -path '/var/run/udev/links' -prune \
    ...
    -o -path "/proc" -prune \
    -o -type f -print
Ah... :-)
Let's try, with a reduced search for testing:
minas-tirith:~ # find /var/run -path /var/run/udev/links -prune
find: '/var/run/user/1000/gvfs': Permission denied
/var/run/udev/links
minas-tirith:~ #
Nope. Ok, let's try again:
minas-tirith:~ # find /var/run -path /var/run/udev/links -prune -o -type f -print
/var/run/dhclient-enp0s29f7u1.pid
/var/run/atd.pid
/var/run/suspend.grubonce.default
/var/run/pm-suspend
/var/run/pm-utils/pm-suspend/storage/state:cpu1_governor
/var/run/pm-utils/pm-suspend/storage/state:cpu0_gover
...
Right, that's it :-)
Why is "-print" needed?
I think it is because "If no expression is given, the expression -print is used" (I think that should be "action" in both cases, as 'find . -type f' has an expression but no action, and the default kicks in), and you have an expression (or rather an "action": "-prune"). That you have no "action" for the "-o -type f" is your fault ;) Why /v/r/udev/links is printed in the first example is beyond me. Maybe a bug.
minas-tirith:~ # find /var/run -path /var/run/udev/links -prune -o -type f | wc -l
find: '/var/run/user/1000/gvfs': Permission denied
476
minas-tirith:~ # find /var/run -path /var/run/udev/links -prune -o -type f -print | wc -l
find: '/var/run/user/1000/gvfs': Permission denied
475
minas-tirith:~ #
Just one line less... which one?
Heh!
minas-tirith:~ # diff p1 p2
207a208
> /var/run/udev/links
minas-tirith:~ #
Precisely that one! Ok, so I need that "-print", but I don't understand why it has that effect... :-?
Probably a bug ;) Don't worry about it, just _always_ use -print and give '.' as the search-path, then your stuff will also work with other finds.

HTH,
-dnh

-- 
Carter: Sir, I've been thinking.
O'Neill: I'd be shocked if you ever stopped, Carter.
    -- Stargate SG-1, 5x05 - Red Sky
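For what it's worth, the extra line looks like documented behaviour rather than a bug: the GNU find manual says that when the expression contains no action other than -prune or -print, -print is performed on every file for which the whole expression is true. Since -prune itself evaluates to true, the pruned directory is printed when no explicit action is given. The sketch below reproduces the one-line difference on a scratch tree; the names skip, hidden and file are invented for the demo.

```shell
#!/bin/sh
# Scratch tree (names are made up for the demo).
tmp=$(mktemp -d)
mkdir -p "$tmp/skip"
: > "$tmp/skip/hidden"
: > "$tmp/file"

# No action given: find behaves as if the whole expression were wrapped,
#   \( -path ./skip -prune -o -type f \) -print
# so ./skip is printed too, because -prune evaluates to true.
implicit=$(cd "$tmp" && find . -path ./skip -prune -o -type f | LC_ALL=C sort)

# Explicit -print binds only to the "-type f" branch.
explicit=$(cd "$tmp" && find . -path ./skip -prune -o -type f -print | LC_ALL=C sort)

printf 'implicit:\n%s\nexplicit:\n%s\n' "$implicit" "$explicit"
rm -rf "$tmp"
```

The same reading explains the very first test in the quoted session: with only `-path ... -prune`, the expression is true for the pruned directory alone, so that is the single name printed.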
Carlos E. R. wrote:
I'm trying to produce a list of files with 'find' skipping some paths, but I can't find an optimal method. Maybe I just need more coffee.
What I do at the moment is:
find / -type f | egrep -v "/var/spool/news/" | \
    egrep -v "/var/run/udev/links" | egrep -v "/var/run/user/" > filelist
That's pretty much what I usually do, except I tend to combine the patterns into one regex.
Also, I have not found a concoction to use egrep to filter out some strings on one go, like:
... egrep -v "/var/run/udev/links\|/var/spool/news/"
Don't escape the or:

    ... egrep -v "/var/run/udev/links|/var/spool/news/"

-- 
Per Jessen, Zürich (11.8°C)
http://www.dns24.ch/ - free dynamic DNS, made in Switzerland.
On 2014-04-27 16:53, Per Jessen wrote:
Carlos E. R. wrote:
Also, I have not found a concoction to use egrep to filter out some strings on one go, like:
... egrep -v "/var/run/udev/links\|/var/spool/news/"
Don't escape the or:
... egrep -v "/var/run/udev/links|/var/spool/news/"
Right! That works. Thanks. -- Cheers / Saludos, Carlos E. R. (from 13.1 x86_64 "Bottle" at Telcontar)
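A small sketch of both ways to filter several paths in one go, run on an invented list of file names. It uses `grep -E`, the POSIX spelling of egrep, and anchors each pattern at line start, as mentioned in the original question. The second form, with one `-e` per pattern, is the working equivalent of the `-v "a" -v "b"` attempt.

```shell
#!/bin/sh
# Sample input (paths are made up for the demo).
list='/var/run/udev/links/x
/var/spool/news/y
/etc/passwd
/var/run/user/1000/z'

# Extended regex: a bare | is alternation, no backslash needed.
kept=$(printf '%s\n' "$list" |
    grep -E -v '^/var/run/udev/links|^/var/spool/news/|^/var/run/user/')

# Same filter with one -e option per pattern; a single -v covers them all.
kept2=$(printf '%s\n' "$list" |
    grep -v -e '^/var/run/udev/links' -e '^/var/spool/news/' -e '^/var/run/user/')

printf '%s\n' "$kept"
```

Both variables should end up holding only /etc/passwd.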
participants (7)
- Andrey Borzenkov
- Anton Aylward
- Bernhard Voelker
- Carlos E. R.
- Carlos E. R.
- David Haller
- Per Jessen