Re: [opensuse] faster way to get total dir size besides du?
Hello,

On Mon, 05 Mar 2012, David C. Rankin wrote:
> Does anyone have a favorite way to get the total directory size
> besides running 'du -hcs'? With directories with many files, du is
> just too slow. Even a quick rough hack that is within 10% would be
> fine for taking a quick look to find which directory is hogging all
> the space.

Simple: du. That operation is I/O bound and depends _a lot_ on the
filesystem. With reiserfs or btrfs it'll be quite fast, I guess; with
ext2 and ext3 it's rather slow. Some facts:

- AFAIK no filesystem stores the total size of a directory's
contents somewhere in the directory's metadata

- therefore one has to walk the directories and look at the
contained files and subdirectories to sum up the sizes (i.e. call
'stat(2)' on each file and subdir).

Here's a simplistic recursive perl version:

==== dirsize.pl ====
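#!/usr/bin/perl
use warnings;
use strict;
use Cwd;

# Print a usage message and bail out if no directory was given.
unless (scalar @ARGV) {
    $0 =~ m{.*?([^/]+$)};
    print STDERR "Usage: $1 dir [dir ...]\n";
    exit 1;
}

# Return the summed apparent size (st_size) of everything below $arg.
sub dirsize {
    my $arg    = shift;
    my $olddir = cwd();
    if ( -d $arg && !-l $arg ) {
        chdir($arg) or die "cannot chdir to '$arg': $!\n";
        opendir(my $dir, ".") or die "cannot read dir '$arg': $!\n";
        my $sum = 0;
        # defined() keeps an entry named "0" from ending the loop early
        while ( defined(my $entry = readdir($dir)) ) {
            next if $entry eq "." || $entry eq "..";
            # size of the entry itself (for a dir: the directory file);
            # stat fails on dangling symlinks, count those as 0
            $sum += (stat($entry))[7] // 0;
            # recurse into real directories only, never through symlinks
            $sum += dirsize($entry) if -d $entry && !-l $entry;
        }
        closedir($dir) or die "$!\n";
        chdir($olddir) or die "cannot chdir back to '$olddir': $!\n";
        return $sum;
    } else {    # safeguard if called with something not a dir
        return (stat($arg))[7] // 0;
    }
}

foreach my $arg (@ARGV) {
    print dirsize($arg), "\t$arg\n";
}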
====

The above is just to showcase what has to be done. It won't descend
into symlinked directories, which at least avoids loops (cf.
/boot/boot ;) And it ignores hardlinks, i.e. it counts a hardlinked
file once per name.
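
For comparison, here is a minimal sketch of how hardlinks could be counted only once, which is what du does by default. It is written in Python rather than Perl and is not a port of the script above: the function name dir_size_unique is made up for illustration, it uses lstat semantics (a symlink counts with its own size, not its target's), and it assumes a POSIX system where st_dev and st_ino identify a file.

```python
import os


def dir_size_unique(root):
    """Sum st_size of everything below root, counting each hardlinked file once."""
    seen = set()      # (st_dev, st_ino) of multiply-linked files already counted
    total = 0
    stack = [root]    # explicit stack instead of recursion plus chdir
    while stack:
        with os.scandir(stack.pop()) as entries:
            for entry in entries:
                # lstat semantics: symlinks are never followed
                st = entry.stat(follow_symlinks=False)
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)   # visit the subdirectory later
                elif st.st_nlink > 1:
                    key = (st.st_dev, st.st_ino)
                    if key in seen:
                        continue               # another name for a counted file
                    seen.add(key)
                total += st.st_size            # for a dir: the directory file itself
    return total


if __name__ == "__main__":
    import sys
    for arg in sys.argv[1:]:
        print(dir_size_unique(arg), arg, sep="\t")
```

Like the Perl script, this does not add the size of the top-level directory itself, so small differences from 'du -bs' are to be expected.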

Now, how fast is the above? In short: it isn't, as perl's stat seems
to do a lot of string ops and memory allocation etc. while doing stat,
which is a lot more than is needed to get the size. After calling the
script/du a couple of times to get the metadata into the FS caches:

$ time ./dirsize.pl .
8468228838 .

real 0m0.842s
user 0m0.152s
sys 0m0.224s
$ time du -bs .
8468242394 .

real 0m0.058s
user 0m0.012s
sys 0m0.048s

BTW: if you look at an 'ltrace -S du -bs .' for a dir with one or two
files, you'll see that du basically does what the Perl script above
does, just more directly ;)

The Perl script is only that slow in directories with lots of
subdirectories; in other cases it's almost as fast as 'du -bs'. No
idea why, the cwd + -d + chdir + chdir per directory shouldn't be that
slow. Probably calling the sub ...

# time du -bs /sbin/
13422137 /sbin/
real 0m0.041s

# time /home/dh/tmp/dirsize.pl /SUSE_R/sbin/
50474451 /SUSE_R/sbin/
real 0m0.079s

/SUSE_R is an rsync'ed copy of /, and neither of the /sbin's nor the
/usr's was in the FS cache at the moment (only parts of /sbin and /usr,
probably). The size difference is because of the hardlinks (in /sbin:
295 names for 284 files).
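
A names-versus-files figure like the one above can be obtained by counting directory entries and distinct inodes. The following is a rough Python sketch of that idea, not the command used for the numbers in this mail. The function name count_names_and_files is hypothetical, and it assumes a POSIX filesystem where (st_dev, st_ino) identifies a file.

```python
import os


def count_names_and_files(root):
    """Return (names, files): non-directory entries below root and
    the number of distinct inodes those names refer to."""
    names = 0
    inodes = set()
    stack = [root]
    while stack:
        with os.scandir(stack.pop()) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)   # real directory: descend, don't count
                    continue
                st = entry.stat(follow_symlinks=False)  # lstat: symlinks count as themselves
                names += 1
                inodes.add((st.st_dev, st.st_ino))
    return names, len(inodes)


if __name__ == "__main__":
    import sys
    for arg in sys.argv[1:]:
        names, files = count_names_and_files(arg)
        print(f"{arg}: {names} names for {files} files")
```

If the two numbers differ, a tool that sums the size once per name will report more than one that sums once per inode.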

Bottom line: there almost can't be a faster tool than 'du'. How fast du
is will basically _only_ depend on how fast your FS is (and the medium
the FS is on).

All(!)[1] other tools will be slower, as they have to do the same
stuff as du does, _and_ put some fluff around (like a GUI as kdirstat
or whotsitsname does ;)

HTH,
-dnh

[1] except maybe a minimized du that does nothing but look at sizes,
much like the Perl script above; du can do a bit more, and I don't
know how efficient it is when it's "just sizes".

--
The purpose of a windowing system is to put some amusing fluff
around your one almighty emacs window. -- Mark on gnu.emacs.help
