Hello everyone, how could I make an md5sum file calculated from the files stored in a directory (like the ones on FTP servers ...)? Thanks for your help -- Laurent Renard
On Friday 21 January 2005 11:02, Laurent Renard wrote:
| Hello everyone,
|
| how could i make a md5sum file calculated from files stored in a
| directory ( like in ftp's ... ) ?

Non-recursive:

$ find <directory> -type f -maxdepth 1 | xargs md5sum > checksums.txt

Recursive:

$ find <directory> -type f | xargs md5sum > checksums.txt

Regards,
-- Lars Haugseth
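A minimal sketch of the two pipelines above, run against a throwaway tree and assuming GNU find, xargs and md5sum are installed. The scratch directory, the file names and the variable names are invented for illustration. Two details are assumptions of this sketch rather than part of Lars's answer: the checksum file is written outside the tree so it does not get checksummed itself, and -maxdepth is placed before -type, which is the order current GNU find expects without a warning.

```shell
#!/bin/sh
# Hypothetical scratch tree; none of these names come from the thread.
demo=$(mktemp -d)
mkdir "$demo/sub"
printf 'alpha\n' > "$demo/a.txt"
printf 'beta\n'  > "$demo/sub/b.txt"

# Keep the checksum file outside the tree being summed.
sums=$(mktemp)

# Recursive: every regular file below the directory.
(cd "$demo" && find . -type f | xargs md5sum) > "$sums"
recursive_count=$(wc -l < "$sums")

# Non-recursive: only regular files directly inside the directory.
top_count=$(cd "$demo" && find . -maxdepth 1 -type f | xargs md5sum | wc -l)

# Check the files against the recorded checksums, as an FTP user would.
verified=no
(cd "$demo" && md5sum -c "$sums" > /dev/null) && verified=yes
```

Running find from inside the directory keeps the recorded paths relative, so the checksum file can later be verified from wherever the tree is copied to.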
10/10 Thank you very much and have a good day ;) -- Laurent Renard
Lars wrote regarding 'Re: [SLE] Md5sum file from dir' on Fri, Jan 21 at 04:29:
On Friday 21 January 2005 11:02, Laurent Renard wrote: | Hello everyone, | | how could i make a md5sum file calculated from files stored in a | directory ( like in ftp's ... ) ?
Non-recursive: $ find <directory> -type f -maxdepth 1 | xargs md5sum > checksums.txt
If you just want all of the files in a dir, just do

md5sum * > checksums.txt

It'll skip directories (and warn on the command line for each skipped dir, unless you add a 2>/dev/null). No need to fire up find *and* xargs in that case. :)

--Danny, who generally prefers -exec to piping through xargs...
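A small sketch of the glob approach, assuming GNU md5sum; the scratch directory and variable names are hypothetical. It writes the checksums to a file outside the directory, which is a choice of this sketch: with checksums.txt kept inside the directory, a second run would match it with the glob and checksum it too.

```shell
#!/bin/sh
# Hypothetical scratch directory with two files and one subdirectory.
demo=$(mktemp -d)
sums=$(mktemp)
mkdir "$demo/subdir"
printf 'one\n' > "$demo/one.txt"
printf 'two\n' > "$demo/two.txt"

# The glob expands to one.txt, subdir and two.txt (dotfiles are not matched).
# md5sum complains about subdir on stderr, which is discarded here.
glob_status=0
(cd "$demo" && md5sum * > "$sums" 2>/dev/null) || glob_status=$?

# Only the two regular files produce checksum lines.
glob_lines=$(wc -l < "$sums")
```

Note that md5sum reports a non-zero exit status when it had to skip a directory, so a script that checks the status needs to allow for that.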
GREAT : Thanx Danny ;) -- Laurent Renard
Danny, Laurent, On Friday 21 January 2005 07:32, Danny Sauer wrote:
Lars wrote regarding 'Re: [SLE] Md5sum file from dir' on Fri, Jan 21 at 04:29:
On Friday 21 January 2005 11:02, Laurent Renard wrote: | Hello everyone, | | how could i make a md5sum file calculated from files stored in a | directory ( like in ftp's ... ) ?
Non-recursive: $ find <directory> -type f -maxdepth 1 | xargs md5sum > checksums.txt
If you just want all of the files in a dir, just do

md5sum * > checksums.txt

It'll skip directories (and warn on the command line for each skipped dir, unless you add a 2>/dev/null). No need to fire up find *and* xargs in that case. :)
But there is a limit on how many characters of arguments can be passed in a single command invocation. The whole point of xargs is to deal with that limit. So if you use the short version of the command suggested by Danny in a sufficiently large directory (where large is defined both in terms of the number of entries, including unacceptable entries such as directories, and the length of the names), it will fail and the xargs approach becomes necessary.
--Danny, who generally prefers -exec to piping through xargs...
And when you use "-exec" a fork/exec pair happens for _every find hit_! The overhead is considerable and becomes noticeable for even moderately populated directories or directory hierarchies. Randall Schulz
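Both points can be seen on a small scale. The sketch below, which assumes a POSIX shell with find, xargs and getconf available, reads the argument-size limit Randall refers to and then counts how many times a command is started by each approach. The scratch directory and the counting helper are hypothetical; the helper simply prints how many arguments it received, one line per invocation.

```shell
#!/bin/sh
# Upper bound, in bytes, on the arguments plus environment of one exec call.
arg_max=$(getconf ARG_MAX)

# Hypothetical scratch directory holding five small files.
demo=$(mktemp -d)
for i in 1 2 3 4 5; do printf '%s\n' "$i" > "$demo/file$i"; done

# Helper run via sh -c: prints its argument count, one line per invocation.
count_args='echo "$#"'

# -exec ... \; starts the command once per file found: five lines.
exec_runs=$(find "$demo" -type f -exec sh -c "$count_args" sh {} \; | wc -l)

# xargs packs all five names into a single command line: one line.
xargs_runs=$(find "$demo" -type f | xargs sh -c "$count_args" sh | wc -l)
```

With only five short names everything fits in one xargs batch; a directory whose names together exceed the limit would be split across several invocations instead of failing.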
Randall wrote regarding 'Re: [SLE] Md5sum file from dir' on Fri, Jan 21 at 10:44:
Danny, Laurent,
On Friday 21 January 2005 07:32, Danny Sauer wrote:
Lars wrote regarding 'Re: [SLE] Md5sum file from dir' on Fri, Jan 21 at 04:29:
On Friday 21 January 2005 11:02, Laurent Renard wrote: | Hello everyone, | | how could i make a md5sum file calculated from files stored in a | directory ( like in ftp's ... ) ?
Non-recursive: $ find <directory> -type f -maxdepth 1 | xargs md5sum > checksums.txt
If you just want all of the files in a dir, just do

md5sum * > checksums.txt

It'll skip directories (and warn on the command line for each skipped dir, unless you add a 2>/dev/null). No need to fire up find *and* xargs in that case. :)
But there is a limit on how many characters of arguments can be passed in a single command invocation. The whole point of xargs is to deal with that limit.
So if you use the short version of the command suggested by Danny in a sufficiently large directory (where large is defined both in terms of the number of entries, including unacceptable entries such as directories, and the length of the names), it will fail and the xargs approach becomes necessary.
True, but in most common uses, the much shorter "md5sum *" is quite a bit easier to type - both in terms of speed and accuracy. In the border cases where there are a whole lot of entries in a directory, xargs will definitely save you.
--Danny, who generally prefers -exec to piping through xargs...
And when you use "-exec" a fork/exec pair happens for _every find hit_! The overhead is considerable and becomes noticeable for even moderately populated directories or directory hierarchies.
I'm looking at the file xargs.c right now. Line 766 from findutils 4.1.20 - the beginning of "do_exec".

---
while ((child = fork ()) < 0 && errno == EAGAIN && procs_executing)
  wait_for_proc (false);
switch (child)
  {
  case -1:
    error (1, errno, _("cannot fork"));
  case 0: /* Child. */
    execvp (cmd_argv[0], cmd_argv);
    error (0, errno, "%s", cmd_argv[0]);
    _exit (errno == ENOENT ? 127 : 126);
  }
add_proc (child);
}
---

It's essentially the same code in find (function launch in pred.c). Then, why is xargs faster?

Oh, wait, looking closer... It appears that xargs builds a command line up until the maximum length on a system, and then runs that. I always thought that xargs ran a new instance of the exec()'d program for each individual argument. Hmph. I guess that'd explain why it takes 8-10 seconds to do all 2642 files in the samba3 source dir (which was convenient at the time) using -exec, while it only takes about 2 seconds to do the same thing with xargs.

The -exec method is more flexible, though, and will work with programs that only accept one file as an arg, etc. I guess it's just a matter of choosing the right tool for the job, and since -exec always works, I tend to go that way. It's easier to remember one thing than 2. :)

--Danny, who learned something today
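For readers who prefer to stay with -exec, POSIX find also specifies a "+" terminator that batches file names much as xargs does. Nobody in this thread mentions it, and it may not exist in a find as old as the 4.1.20 sources quoted above, so treat the following as a sketch for current systems only. The scratch directory and variable names are hypothetical.

```shell
#!/bin/sh
# Hypothetical scratch directory holding three small files.
demo=$(mktemp -d)
for i in 1 2 3; do printf '%s\n' "$i" > "$demo/file$i"; done
sums=$(mktemp)

# '{} +' appends as many paths as fit to each md5sum invocation.
find "$demo" -type f -exec md5sum {} + > "$sums"
plus_lines=$(wc -l < "$sums")

# Count invocations: the helper prints its argument count once per run.
plus_runs=$(find "$demo" -type f -exec sh -c 'echo "$#"' sh {} + | wc -l)
```

Because find hands the names to the command directly, this form is not affected by spaces or quotes in file names.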
Danny, On Friday 21 January 2005 09:49, Danny Sauer wrote:
...
It's essentially the same code in find (function launch in pred.c). Then, why is xargs faster?
Oh, wait, looking closer... It appears that xargs builds a commandline up until the maximum length on a system, and then runs that. I always thought that xargs ran a new instance of the exec()'d program for each individual argument. Hmph. I guess that'd explain why it takes 8-10 seconds to do all 2642 files in the samba3 source dir (which was convenient at the time) using -exec, while it only takes about 2 seconds to do the same thing with xargs.
The -exec method is more flexible, though, and will work with programs that only accept one file as an arg, etc. I guess it's just a matter of choosing the right tool for the job, and since -exec always works, I tend to go that way. It's easier to remember one thing than 2. :)
True enough. But another thing to keep in mind, at least in general, is that not all systems that support the GNU tools (find and xargs, e.g.) implement copy-on-write fork semantics. On such systems the performance disparity between the -exec and xargs approaches is far greater than what you'd experience on a Linux system. Cygwin, for example, is such a GNU tools platform.
--Danny, who learned something today
So... It's a _good_ day, right? Randall Schulz
Of course, yes ;) -- Laurent Renard
On Friday 21 January 2005 18:49, Danny Sauer wrote:
Oh, wait, looking closer... It appears that xargs builds a commandline up until the maximum length on a system, and then runs that.
That is correct.
I always thought that xargs ran a new instance of the exec()'d program for each individual argument. Hmph. I guess that'd explain why it takes 8-10 seconds to do all 2642 files in the samba3 source dir (which was convenient at the time) using -exec, while it only takes about 2 seconds to do the same thing with xargs.
The -exec method is more flexible, though, and will work with programs that only accept one file as an arg, etc. I guess it's just a matter of choosing the right tool for the job, and since -exec always works, I tend to go that way. It's easier to remember one thing than 2. :)
You can also use 'xargs -n 1' to execute the command with a single argument at a time. -- Lars Haugseth
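A short sketch of -n 1, assuming a POSIX xargs. The inline sh -c helper is a hypothetical stand-in for a program that accepts only one argument, and the input comes from printf rather than find to show that the argument source need not be the file system.

```shell
#!/bin/sh
# -n 1 caps each invocation at one argument, so the helper runs three times.
one_at_a_time=$(printf '%s\n' a b c |
  xargs -n 1 sh -c 'echo "got $# argument: $1"' sh)
```

Raising the number (for example -n 2) trades fewer process launches against a larger argument list per run.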
Lars, Danny, On Friday 21 January 2005 11:11, Lars Haugseth wrote:
On Friday 21 January 2005 18:49, Danny Sauer wrote:
Oh, wait, looking closer... It appears that xargs builds a commandline up until the maximum length on a system, and then runs that.
That is correct.
...
The -exec method is more flexible, though, and will work with programs that only accept one file as an arg, etc. I guess it's just a matter of choosing the right tool for the job, and since -exec always works, I tend to go that way. It's easier to remember one thing than 2. :)
You can also use 'xargs -n 1' to execute the command with a single argument at a time.
Which of course extends to situations where the source of arguments is not the file system and hence "find" is not useful.
-- Lars Haugseth
Randall Schulz
Lars wrote regarding 'Re: [SLE] Md5sum file from dir' on Fri, Jan 21 at 13:05:
On Friday 21 January 2005 18:49, Danny Sauer wrote: [...]
I always thought that xargs ran a new instance of the exec()'d program for each individual argument. Hmph. I guess that'd explain [...] You can also use 'xargs -n 1' to execute the command with a single argument at a time.
I normally just reimplement that kind of behavior in perl - which is my standard solution when the shell won't do what I want. :) I've never actually used xargs, except when I was benchmarking it against find. Though, having read the man page now, I think I may have a use for the process accounting ability - the -P option - where I've wanted to make a few jobs run in parallel, but I haven't gotten around to actually implementing the logic necessary to keep track of fork()'d children...

Darn you people. I clearly said that I didn't want to have to remember more than one thing, but this discussion has forced me to not only read and understand the source for xargs, but also to consider rewriting several maintenance scripts. I already had enough to do!

Randall, does that answer your question as to whether this is a good day or not? :)

--Danny, happy to have some new "interesting" work that'll be good to break up the more mundane.
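As a rough idea of what -P offers, here is a sketch that runs up to two md5sum processes at a time. It assumes an xargs that supports -P (GNU and BSD versions do; it is not in older POSIX specifications), and the scratch directory, file names and variable names are invented for illustration.

```shell
#!/bin/sh
# Hypothetical scratch directory holding four small files.
demo=$(mktemp -d)
for i in 1 2 3 4; do printf '%s\n' "$i" > "$demo/file$i"; done
sums=$(mktemp)

# -n 1: one file per md5sum process; -P 2: at most two processes at once.
# xargs tracks the children itself, so no fork() bookkeeping is needed.
find "$demo" -type f | xargs -n 1 -P 2 md5sum > "$sums"

# Completion order is not guaranteed under -P, so sort by file name.
sort -k 2 "$sums" > "$sums.sorted"
parallel_lines=$(wc -l < "$sums.sorted")
```

Whether this is faster than a single batched md5sum depends on the workload; for many small files the extra process launches can outweigh the parallelism.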
Lars Haugseth wrote:
On Friday 21 January 2005 11:02, Laurent Renard wrote: | Hello everyone, | | how could i make a md5sum file calculated from files stored in a | directory ( like in ftp's ... ) ?
Non-recursive: $ find <directory> -type f -maxdepth 1 | xargs md5sum > checksums.txt
Recursive: $ find <directory> -type f | xargs md5sum > checksums.txt
How do you get these to work with filenames that have spaces in their name?
On Friday 21 January 2005 22:38, user86 wrote:
Lars Haugseth wrote:
On Friday 21 January 2005 11:02, Laurent Renard wrote: | Hello everyone, | | how could i make a md5sum file calculated from files stored in a | directory ( like in ftp's ... ) ?
Non-recursive: $ find <directory> -type f -maxdepth 1 | xargs md5sum > checksums.txt
Recursive: $ find <directory> -type f | xargs md5sum > checksums.txt
How do you get these to work with filenames that have spaces in their name?
$ find <directory> -type f | xargs -i md5sum "{}" > checksums.txt Regards, -- Lars Haugseth
Lars Haugseth wrote:
On Friday 21 January 2005 22:38, user86 wrote:
Lars Haugseth wrote:
On Friday 21 January 2005 11:02, Laurent Renard wrote: | Hello everyone, | | how could i make a md5sum file calculated from files stored in a | directory ( like in ftp's ... ) ?
Non-recursive: $ find <directory> -type f -maxdepth 1 | xargs md5sum > checksums.txt
Recursive: $ find <directory> -type f | xargs md5sum > checksums.txt How do you get these to work with filenames that have spaces in their name?
$ find <directory> -type f | xargs -i md5sum "{}" > checksums.txt
Regards,
Thanks, but that also fails if a filename has an apostrophe in its name. "Bull's Eye.txt" I get an error of "xargs: unmatched single quote".
U, On Friday 21 January 2005 17:32, user86 wrote:
...
How do you get these to work with filenames that have spaces in their name?
$ find <directory> -type f | xargs -i md5sum "{}" > checksums.txt
Regards,
Thanks, but that also fails if a filename has an apostrophe in its name. "Bull's Eye.txt" I get an error of "xargs: unmatched single quote".
Then my NUL byte technique will work (I think) % find ... |tr '\n' '\0' |xargs -0 cmd fixedArgs... Randall Schulz
On Saturday 22 January 2005 02:47, Randall R Schulz wrote:
U,
On Friday 21 January 2005 17:32, user86 wrote:
...
How do you get these to work with filenames that have spaces in their name?
$ find <directory> -type f | xargs -i md5sum "{}" > checksums.txt
Regards,
Thanks, but that also fails if a filename has an apostrophe in its name. "Bull's Eye.txt" I get an error of "xargs: unmatched single quote".
Then my NUL byte technique will work (I think)
% find ... |tr '\n' '\0' |xargs -0 cmd fixedArgs...
No need to pipe through 'tr': $ find <directory> -type f -print0 | xargs -0 md5sum > checksums.txt -- Lars Haugseth
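A minimal sketch of the NUL-separated pipeline, assuming a find with -print0 and an xargs with -0 (both GNU and BSD versions have them). The scratch directory is hypothetical; the file name with the apostrophe is the one from the report above. The plain newline-separated pipeline is run as well, only to record that it fails on the same input.

```shell
#!/bin/sh
# Hypothetical scratch directory containing an awkward file name.
demo=$(mktemp -d)
printf 'x\n' > "$demo/Bull's Eye.txt"
printf 'y\n' > "$demo/plain.txt"
sums=$(mktemp)

# NUL-terminated names: spaces and quotes pass through xargs untouched.
(cd "$demo" && find . -type f -print0 | xargs -0 md5sum) > "$sums"
nul_lines=$(wc -l < "$sums")

# The recorded checksums verify, so both names were handled correctly.
nul_ok=no
(cd "$demo" && md5sum -c "$sums" > /dev/null) && nul_ok=yes

# Without -print0/-0, xargs tries to interpret the apostrophe and gives up.
plain_status=0
(cd "$demo" && find . -type f | xargs md5sum > /dev/null 2>&1) || plain_status=$?
```

The tr '\n' '\0' variant gives the same result for these names, but it would still split a file name that itself contains a newline, which -print0 avoids.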
Lars, On Saturday 22 January 2005 01:09, Lars Haugseth wrote:
On Saturday 22 January 2005 02:47, Randall R Schulz wrote:
U,
...
Then my NUL byte technique will work (I think)
% find ... |tr '\n' '\0' |xargs -0 cmd fixedArgs...
No need to pipe through 'tr':
$ find <directory> -type f -print0 | xargs -0 md5sum > checksums.txt
Ah, yes. Now that you mention it, I dimly remember this option, but didn't think of it when it would be most useful.
Lars Haugseth
Randall Schulz
U, On Friday 21 January 2005 13:38, user86 wrote:
Lars Haugseth wrote:
On Friday 21 January 2005 11:02, Laurent Renard wrote: | Hello everyone, | | how could i make a md5sum file calculated from files stored in a | directory ( like in ftp's ... ) ?
Non-recursive: $ find <directory> -type f -maxdepth 1 | xargs md5sum > checksums.txt
Recursive: $ find <directory> -type f | xargs md5sum > checksums.txt
How do you get these to work with filenames that have spaces in their name?
This is what I do in such circumstances: % find ... |tr '\n' '\0' |xargs -0 cmd fixedArgs... Lars' solution is probably better: simpler, cleaner, fewer processes, etc. Randall Schulz
participants (5): Danny Sauer, Lars Haugseth, Laurent Renard, Randall R Schulz, user86