Randall wrote regarding 'Re: [SLE] slow reiserfs?' on Thu, Sep 09 at 13:15:
Dave, Danny,
On Thursday 09 September 2004 10:46, Danny Sauer wrote:
Dave wrote regarding '[SLE] slow reiserfs?' on Thu, Sep 09 at 08:06:
I'm trying to discover the cause of some long delays in the running of a program.
I'm running a Perl script that is comparing the files in two directories. There are about 150,000 files of a few KB each in each directory, which mostly correspond but they're not all identical. My script is running diff on each pair. It prints a timestamped line for each pair and this scrolls up the screen but sometimes stops for several seconds - ten is the most I've noticed -
Dave, you say you're using "diff" to compare the files. Do you need to know exactly _how_ the files differ, or only _that_ they differ? If it's the latter, then use the "cmp" command and you'll cut down on the CPU time consumed. Of course, this will only make the process more disk-bound than it already is, but that's still going to produce some improvement in overall run-time.
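If only the yes/no answer matters, the same check can also be done without starting an external program for each pair. Below is a minimal sketch in Python (not the Perl that Dave's script is written in), assuming both directories are flat, contain only regular files, and that files correspond by name. The function name `differing_files` is illustrative and not taken from the original script.

```python
import filecmp
import os


def differing_files(dir_a, dir_b):
    """Return (differing, unreadable) names among files present in both directories."""
    # Only names that exist in both directories can be compared pairwise.
    common = sorted(set(os.listdir(dir_a)) & set(os.listdir(dir_b)))
    # shallow=False forces a comparison of file contents instead of
    # trusting the os.stat() signature (type, size, mtime) alone.
    _same, differing, unreadable = filecmp.cmpfiles(
        dir_a, dir_b, common, shallow=False
    )
    return differing, unreadable
```

Names that appear in only one directory are skipped here; whether they should be reported depends on what the comparison is for.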
Either way, it'd be faster to use one of the perl modules that implements the diff algorithm rather than launching the diff program. If it's just "do they differ" then it'd be quicker to just cmp them line by line...

Calculating a checksum will require reading the whole file, and by the time you've read in the file, you could've been comparing it to the other file and be done. A checksum would only be useful if you were using the contents of the file more than once - in which case it'd cut down on memory consumption quite a bit (though you'd still have to compare the files directly if the checksum matched, given the possibility of multiple files generating the same checksum with some algorithms).

Ignore the whole "read them into memory" thing, though. I read the OP as "comparing the files in the directories" instead of "comparing the files in two directories".

--Danny, who should really change his screen font...
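The point about comparing directly instead of checksumming can be sketched as follows. This is a hedged illustration in Python rather than Perl, and it compares fixed-size chunks instead of lines; the function name `files_differ` and the 64 KB chunk size are assumptions made for the example. The comparison stops at the first mismatch, whereas a checksum always has to read both files to the end.

```python
import os


def files_differ(path_a, path_b, chunk_size=64 * 1024):
    """Return True as soon as the two files are known to differ."""
    # Different sizes mean different contents; no data needs to be read.
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return True
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            chunk_a = fa.read(chunk_size)
            chunk_b = fb.read(chunk_size)
            if chunk_a != chunk_b:
                return True  # stop at the first mismatching chunk
            if not chunk_a:
                return False  # both files ended without a mismatch
```

For files of a few KB each, a single read per file covers the whole content, so the remaining cost is mostly the open and stat calls rather than the comparison itself.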