Hello,

On Mon, 05 Mar 2012, David C. Rankin wrote:
> Does anyone have a favorite way to get the total directory size
> besides running 'du -hcs'? With directories with many files, du is
> just too slow. Even a quick rough hack that is within 10% would be
> fine for taking a quick look to find which directory is hogging all
> the space.
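For that kind of quick look, the stock du/find idioms usually suffice; a sketch, assuming GNU coreutils/findutils options:

```shell
# Per-subdirectory totals, human-readable, biggest last; -x stays on
# one filesystem (GNU du + GNU sort's -h):
du -xsh -- */ | sort -h

# Apparent-size byte total of one tree (GNU du):
du -bs -- .

# Roughly the same sum via find alone: add up st_size of every
# regular file (hardlinks are counted once per name):
find . -type f -printf '%s\n' | awk '{s += $1} END {print s+0}'
```

None of these will be faster than du itself, though, for the reasons below.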
Simple: du. That operation is I/O bound and depends _a lot_ on the
filesystem. With reiserfs or btrfs it'll be quite fast, I guess; with
ext2 and ext3 it's rather slow.

Some facts:

- AFAIK no filesystem stores the total size of a directory's contents
  anywhere in the directory's metadata
- therefore one has to walk the directories and look at the contained
  files and subdirectories to sum up the sizes (i.e. call stat(2) on
  each file and subdir).

Here's a simplistic recursive perl version:

==== dirsize.pl ====
#!/usr/bin/perl
use warnings;
use strict;
use Cwd;

unless (scalar @ARGV) {
    $0 =~ m{.*?([^/]+$)};
    print STDERR "Usage: $1 dir [dir ...]\n";
    exit 1;    # bail out instead of falling through with no args
}

sub dirsize {
    my $arg = shift;
    my $olddir = cwd();
    if ( -d $arg && ! -l $arg ) {
        chdir($arg) or die "cannot chdir to '$arg': $!\n";
        opendir(my $dir, ".") or die "cannot read dir '$arg': $!\n";
        my $sum = 0;
        while ( defined(my $entry = readdir($dir)) ) {
            next if $entry eq "." || $entry eq "..";
            # size of the entry itself (file, or the dir inode) ...
            $sum += (stat($entry))[7];
            # ... plus, for real subdirectories, their contents
            $sum += dirsize($entry) if -d $entry && ! -l $entry;
        }
        closedir($dir) or die "$!\n";
        chdir($olddir) or die "cannot chdir back to '$olddir': $!\n";
        return $sum;
    } else {
        # safeguard if called with something that is not a dir
        return (stat($arg))[7];
    }
}

foreach my $arg (@ARGV) {
    print dirsize($arg), "\t$arg\n";
}
====

The above is just to showcase what's to be done. It won't follow any
symlinks, but at least it avoids loops (cf. /boot/boot ;) And it
ignores hardlinks, i.e. counts them each time.

Now, how fast is the above? In short: it isn't, as perl's stat seems
to do a lot of string ops and memory allocation etc. while doing
stat, which is a lot more than is needed to just get the size. After
calling the script/du a couple of times to get the metadata into the
FS caches:

$ time ./dirsize.pl .
8468228838      .

real    0m0.842s
user    0m0.152s
sys     0m0.224s

$ time du -bs .
8468242394      .

real    0m0.058s
user    0m0.012s
sys     0m0.048s

BTW: if you look at an 'ltrace -S du -bs .'
for a dir with one or two files, you'll see that du basically does
what the above perlscript does, just more directly ;)

That perlscript is only that slow in directories with lots of
subdirectories; in other cases it's almost as fast as 'du -bs'. No
idea why, that cwd + -d + chdir + chdir shouldn't be that slow.
Probably the overhead of calling the sub ...

# time du -bs /sbin/
13422137        /sbin/

real    0m0.041s

# time /home/dh/tmp/dirsize.pl /SUSE_R/sbin/
50474451        /SUSE_R/sbin/

real    0m0.079s

/SUSE_R is an rsync'ed copy of /, and neither /sbin nor /usr was
fully in the FS cache at the moment (only parts of /sbin and /usr,
probably). The size difference is because of the hardlinks (in /sbin:
295 names for 284 files).

Bottom line: there almost can't be a faster tool than 'du'. How fast
du is will basically _only_ depend on how fast your FS is (and the
medium the FS is on). All(!)[1] other tools will be slower, as they
have to do the same stuff as du does, _and_ put some fluff around it
(like a GUI, as kdirstat or whatsitsname does ;)

HTH,
-dnh

[1] except maybe a minimized du that does nothing but look at sizes,
much like the perlscript above; du can do a bit more, and I don't
know how efficient it is if it's "just sizes".

-- 
The purpose of a windowing system is to put some amusing fluff around
your one almighty emacs window.            -- Mark on gnu.emacs.help
-- 
To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org
To contact the owner, e-mail: opensuse+owner@opensuse.org