Randall wrote regarding 'Re: [SLE] Md5sum file from dir' on Fri, Jan 21 at 10:44:
Danny, Laurent,
On Friday 21 January 2005 07:32, Danny Sauer wrote:
Lars wrote regarding 'Re: [SLE] Md5sum file from dir' on Fri, Jan 21 at 04:29:
On Friday 21 January 2005 11:02, Laurent Renard wrote:
| Hello everyone,
|
| how could I make an md5sum file calculated from files stored in a
| directory (like on ftp's ...)?
Non-recursive:
$ find <directory> -maxdepth 1 -type f | xargs md5sum > checksums.txt
(note: -maxdepth should come before -type f, or GNU find will warn)
If you just want all of the files in a dir, just do md5sum * > checksums.txt It'll skip directories (and warn on the command line for each skipped dir, unless you add a 2>/dev/null). No need to fire up find *and* xargs in that case. :)
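For reference, here is an editor's sketch (not from the thread) of that one-liner in a scratch directory; all paths and filenames are illustrative:

```shell
# Set up a throwaway directory with two files and one subdirectory.
dir=$(mktemp -d)
printf 'hello\n' > "$dir/a.txt"
printf 'world\n' > "$dir/b.txt"
mkdir "$dir/sub"
cd "$dir"

# md5sum warns on stderr for each directory it skips; 2>/dev/null hides
# that. md5sum also exits nonzero because of the skipped directory,
# hence the || true.
md5sum * > checksums.txt 2>/dev/null || true
```

Note that checksums.txt itself is not included in the sums: the shell expands the glob before it performs the redirection that creates the file.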
But there is a limit on how many characters of arguments can be passed in a single command invocation. The whole point of xargs is to deal with that limit.
So if you use the short version of the command suggested by Danny in a sufficiently large directory (where "large" means both the number of entries, including unwanted ones such as directories, and the length of the names), it will fail, and the xargs approach becomes necessary.
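To make the limit concrete, here is an editor's sketch (not from the thread): ARG_MAX is the kernel's cap on the combined size of arguments and environment passed to a single exec(), and the find/xargs pipeline batches file names so no one md5sum invocation exceeds it. Paths below are illustrative; -print0/-0 are GNU extensions.

```shell
# Report the per-exec argument/environment size limit on this system.
getconf ARG_MAX

# Scratch directory with two files and a subdirectory to be skipped.
dir=$(mktemp -d)
touch "$dir/one" "$dir/two"
mkdir "$dir/subdir"

# find streams names to xargs, which packs as many as fit per md5sum run;
# -print0/-0 keep names containing spaces or newlines intact.
sums=$(mktemp)
find "$dir" -maxdepth 1 -type f -print0 | xargs -0 md5sum > "$sums"
```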
True, but in most common uses, the much shorter "md5sum *" is quite a bit easier to type - both in terms of speed and accuracy. In the border cases where there are a whole lot of entries in a directory, xargs will definitely save you.
--Danny, who generally prefers -exec to piping through xargs...
And when you use "-exec", a fork/exec pair happens for _every find hit_! The overhead is considerable and becomes noticeable for even moderately populated directories or directory hierarchies.
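As an editor's aside (not from the thread): find's "-exec ... +" form batches arguments much like xargs does, avoiding one fork/exec per file, so the two styles can be compared directly. Paths below are illustrative.

```shell
# Scratch directory with three small files.
dir=$(mktemp -d)
printf 'x\n' > "$dir/a"
printf 'y\n' > "$dir/b"
printf 'z\n' > "$dir/c"

# "\;" runs one md5sum process per file: a fork/exec pair for every hit.
per_file=$(mktemp)
find "$dir" -type f -exec md5sum {} \; > "$per_file"

# "+" gathers as many names as fit into a single md5sum invocation.
batched=$(mktemp)
find "$dir" -type f -exec md5sum {} + > "$batched"
```

Both produce the same checksums; only the number of processes differs.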
I'm looking at the file xargs.c right now. Line 766 from findutils 4.1.20 - the beginning of "do_exec":
---
while ((child = fork ()) < 0 && errno == EAGAIN && procs_executing)
  wait_for_proc (false);
switch (child)
  {
  case -1:
    error (1, errno, _("cannot fork"));
  case 0:   /* Child. */
    execvp (cmd_argv[0], cmd_argv);
    error (0, errno, "%s", cmd_argv[0]);
    _exit (errno == ENOENT ? 127 : 126);
  }
add_proc (child);
}
---
It's essentially the same code in find (function launch in pred.c). Then why is xargs faster?

Oh, wait, looking closer... It appears that xargs builds a command line up to the system's maximum length, and then runs that. I always thought that xargs ran a new instance of the exec()'d program for each individual argument. Hmph.

I guess that'd explain why it takes 8-10 seconds to do all 2642 files in the samba3 source dir (which was convenient at the time) using -exec, while it only takes about 2 seconds to do the same thing with xargs.

The -exec method is more flexible, though, and will work with programs that only accept one file as an arg, etc. I guess it's just a matter of choosing the right tool for the job, and since -exec always works, I tend to go that way. It's easier to remember one thing than two. :)

--Danny, who learned something today
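Danny's realization - that xargs packs a whole command line rather than running one exec per argument - is easy to see directly. An editor's illustration (not from the thread), using echo as a stand-in for md5sum:

```shell
# Default behavior: xargs packs all three words into a single echo
# invocation, so the output is one line.
one_run=$(printf '%s\n' a b c | xargs echo | wc -l)

# -n 1 forces one invocation per argument, mimicking "-exec ... \;",
# so the output is three lines.
per_arg=$(printf '%s\n' a b c | xargs -n 1 echo | wc -l)
```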