[opensuse] faster way to get total dir size besides du?
All,

Does anyone have a favorite way to get the total directory size besides running 'du -hcs'? With directories with many files, du is just too slow. Even a quick rough hack that is within 10% would be fine for taking a quick look to find which directory is hogging all the space.

Any favorites here?

--
David C. Rankin, J.D., P.E.
On Tue, 6 Mar 2012 07:01:31 David C. Rankin wrote:
> Does anyone have a favorite way to get the total directory size besides
> running 'du -hcs'? With directories with many files, du is just too slow.
> Even a quick rough hack that is within 10% would be fine for taking a
> quick look to find which directory is hogging all the space.
>
> Any favorites here?

Filelight?

--
===================================================
Rodney Baker VK5ZTV rodney.baker@iinet.net.au
===================================================
On Tue, 06 Mar 2012 04:13:55 +0530, Rodney Baker wrote:
> On Tue, 6 Mar 2012 07:01:31 David C. Rankin wrote:
>> Does anyone have a favorite way to get the total directory size besides
>> running 'du -hcs'? With directories with many files, du is just too slow.
>
> Filelight?

in my experience filelight takes about as long as "du" to show the total
size of a directory including subdirectories. i think the only way to
speed this up significantly would be to cache the information at certain
intervals when CPU load is low. i'm not aware of any application that
does that.

--
phani
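For what it's worth, the caching idea above is simple to mock up by hand even if no ready-made application does it. Here's a hypothetical Perl sketch (editorial, not from the thread; the cache path and the watched directories are invented): refresh the totals from cron when load is low, then read the cached answer back instantly.

==== cachedu.pl ====
#!/usr/bin/perl
# hypothetical wrapper that caches `du` totals: run with --refresh from
# cron (e.g. nightly, when load is low); run without arguments to read
# back the last cached totals instantly.
use warnings;
use strict;

my $cache = "/var/tmp/du-cache.txt";    # invented location
my @dirs  = qw(/home /srv /var);        # invented watch list

if (@ARGV && $ARGV[0] eq "--refresh") {
    system("du -bs @dirs > $cache") == 0
        or die "du failed: $?\n";
} else {
    open(my $fh, "<", $cache)
        or die "no cache yet; run '$0 --refresh' first: $!\n";
    print while <$fh>;
}
====

The numbers can of course be as stale as the last cron run; that's the trade-off phanisvara describes.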
On 06/03/12 12:26, phanisvara das wrote:
> On Tue, 06 Mar 2012 04:13:55 +0530, Rodney Baker wrote:
>> Filelight?
>
> in my experience filelight takes about as long as "du" to show the total
> size of a directory including subdirectories. i think the only way to
> speed this up significantly would be to cache the information at certain
> intervals when CPU load is low. i'm not aware of any application that
> does that.

Dolphin. Right click on a directory and read off how many bytes it is
using in Properties.

BC

--
The vulgar crowd always is taken by appearances, and the world consists
chiefly of the vulgar.
    Niccolo Machiavelli
On 2012-03-05 21:31, David C. Rankin wrote:
> Any favorites here?

I use baobab, but it is not fast. Calculating dir sizes is slow in Linux,
I don't know why.

--
Cheers / Saludos,
Carlos E. R.
(from 11.4 x86_64 "Celadon" at Telcontar)
Hello,

On Mon, 05 Mar 2012, David C. Rankin wrote:
> Does anyone have a favorite way to get the total directory size besides
> running 'du -hcs'? With directories with many files, du is just too slow.
> Even a quick rough hack that is within 10% would be fine for taking a
> quick look to find which directory is hogging all the space.

Simple: du. That operation is I/O bound and depends _a lot_ on the
filesystem. With reiserfs or btrfs it'll be quite fast, I guess; with
ext2 and ext3 it's rather slow.

Some facts:

- AFAIK no filesystem stores the sizes of a directory's contents
  anywhere in the directory's metadata
- therefore one has to walk the directories and look at the contained
  files and subdirectories to sum up the sizes (i.e. call stat(2) on
  the files and subdirs).

Here's a simplistic recursive perl version:

==== dirsize.pl ====
#!/usr/bin/perl
use warnings;
use strict;
use Cwd;

# print a usage message and bail out if no arguments were given
unless (scalar @ARGV) {
    $0 =~ m{.*?([^/]+$)};
    print STDERR "Usage: $1 dir [dir ...]\n";
    exit 1;
}

# recursively sum up the byte sizes of everything below $arg
sub dirsize {
    my $arg    = shift;
    my $olddir = cwd();
    if ( -d $arg && ! -l $arg ) {
        chdir($arg) or die $!;
        opendir(my $dir, ".") or die "cannot read dir '$arg': $!\n";
        my $sum = 0;
        while ( defined(my $entry = readdir($dir)) ) {
            next if $entry eq "." || $entry eq "..";
            if ( -d $entry && ! -l $entry ) {
                $sum += (stat($entry))[7];  # the directory entry itself
                $sum += dirsize($entry);    # plus everything below it
            } else {
                # stat follows symlinks; broken links count as 0
                $sum += (stat($entry))[7] // 0;
            }
        }
        closedir($dir) or die "$!\n";
        chdir($olddir) or die $!;
        return $sum;
    } else {
        # safeguard if called with something that is not a dir
        return (stat($arg))[7];
    }
}

foreach my $arg (@ARGV) {
    print dirsize($arg), "\t$arg\n";
}
====

Above is just to showcase what's to be done. It won't descend into any
symlinked directories, so at least it avoids loops (c.f. /boot/boot ;)
And it ignores hardlinks, i.e. counts them each time.

Now, how fast is the above? In short: it isn't, as perl's stat seems to
do a lot of string ops and memory allocation etc., which is a lot more
than is needed to get the size. After calling the script/du a couple of
times to get the metadata into the FS caches:

$ time ./dirsize.pl .
8468228838      .

real    0m0.842s
user    0m0.152s
sys     0m0.224s

$ time du -bs .
8468242394      .

real    0m0.058s
user    0m0.012s
sys     0m0.048s

BTW: if you look at an 'ltrace -S du -bs .' for a dir with one or two
files, you'll see that basically du does what the above perlscript does,
just more directly ;)

The perlscript is only that slow in directories with lots of
subdirectories; in other cases it's almost as fast as 'du -bs'. No idea
why; that cwd + -d + chdir + chdir shouldn't be that slow. Probably
calling the sub ...

# time du -bs /sbin/
13422137        /sbin/

real    0m0.041s

# time /home/dh/tmp/dirsize.pl /SUSE_R/sbin/
50474451        /SUSE_R/sbin/

real    0m0.079s

/SUSE_R is a rsync'ed copy of /, and neither /sbin nor /usr was fully in
the FS cache at the time (only parts of them, probably). The size
difference is because of the hardlinks (in /sbin: 295 names for 284
files).

Bottom line: there almost can't be a faster tool than 'du'. How fast du
is will basically _only_ depend on how fast your FS is (and the medium
the FS is on). All(!)[1] other tools will be slower, as they have to do
the same stuff as du does, _and_ put some fluff around it (like a GUI,
as kdirstat or whatsitsname does ;)

HTH,
-dnh

[1] except maybe a minimized du that does nothing but look at sizes,
much like the perlscript above; du can do a bit more, and I don't know
how efficient it is if it's "just sizes".

--
The purpose of a windowing system is to put some amusing fluff around
your one almighty emacs window.  -- Mark on gnu.emacs.help
On 2012-03-06 04:07, David Haller wrote:
> - AFAIK no filesystem stores the sizes of a directory's contents
>   anywhere in the directory's metadata

FAT does, kind of. The directory is a record that contains file names,
attributes, sizes, and the starting record. You just have to load the
directory record and sum the sizes of all files: a single
read-one-disc-record operation (I don't remember how many files fit per
record). If the directory is big, then there have to be more reads. The
operation is very fast. If there are subdirectories, then it is slower
(recursive calls).

--
Cheers / Saludos,
Carlos E. R.
(from 11.4 x86_64 "Celadon" at Telcontar)
On Tue, 06 Mar 2012 04:38:21 +0100, Carlos E. R. wrote:
> On 2012-03-06 04:07, David Haller wrote:
>> - AFAIK no filesystem stores the sizes of a directory's contents
>>   anywhere in the directory's metadata
>
> FAT does, kind of. The directory is a record that contains file names,
> attributes, sizes, and the starting record. You just have to load the
> directory record and sum the sizes of all files [...] If there are
> subdirectories, then it is slower (recursive calls).

That's different than storing the size of the contents in the directory
entry's metadata - the summing of each subordinate file is what takes
the time. What David is trying to say is that no filesystem that he
knows of (nor that I know of, for that matter) stores the total of the
subordinate objects. I.e., for a structure of:

/
/usr
/usr/file1
/usr/file2
/usr/file3

/usr doesn't store the sum of the sizes of file1, file2, and file3. It
has to be calculated on the fly.

Jim

--
Jim Henderson
Please keep on-topic replies on the list so everyone benefits
Hello,

On Tue, 06 Mar 2012, Jim Henderson wrote:
> On Tue, 06 Mar 2012 04:38:21 +0100, Carlos E. R. wrote:
>> On 2012-03-06 04:07, David Haller wrote:
>>> - AFAIK no filesystem stores the sizes of a directory's contents
>>>   anywhere in the directory's metadata
>>
>> FAT does, kind of. [...]
>
> That's different than storing the size of the contents in the directory
> entry's metadata - the summing of each subordinate file is what takes
> the time. What David is trying to say is that no filesystem that he
> knows of (nor that I know of, for that matter) stores the total of the
> subordinate objects.

More specifically: I know of no such FS that any sane person would want
to have a Linux / on. I wouldn't put it beyond the guys in Redmond or
Cupertino to devise and even implement such a thing. Hm. WinFS probably
would have stored such metadata ;) I wonder why M$ cancelled it.

> I.e., for a structure of:
>
> /
> /usr
> /usr/file1
> /usr/file2
> /usr/file3
>
> /usr doesn't store the sum of the sizes of file1, file2, and file3. It
> has to be calculated on the fly.

What are we actually talking about? What would have to be stored and
_updated_ every time a file is created or changed in size somewhere
below a dir? The sum of the sizes of all files in a directory (plus the
size of metadata?) and of all directories "above" the file would have to
be updated. While following symlinks? And mountpoints? To remotely
mounted FSen?

Even with reiserfs/btrfs (which are AFAIK the fastest concerning
metadata operations) that'd be stupid. If you don't think so, you
haven't realized how much stuff is going on in the FS. Just think about
/var/log/messages. That'd cause a continuous flood of updates to
/var/log, /var and /. So, instead of one simple single write(2) to the
syslog -- one syscall and one I/O -- you'd have at least:

- write to /v/l/m
- open /v/l
- write to /v/l
- close /v/l
- open /v
- write to /v
- close /v
- open /
- write to /
- close /

(OK, you could keep the dirs up to /v/l/m open, but you can't do that
with all dirs.) Or do that stuff in-kernel, which would save a huge lot
of time, but still, it's obvious that it'd hurt FS performance
tremendously, as you'd still have at least 4 writes instead of one. And
I/O takes a lot of time, even with SSDs (assuming you don't care about
the write wear on the assumed SSD), and even "buffered". That stuff has
to be written out at some time ...

Not convinced? Install GNU ddrescue[0]. Put a CSS-protected DVD into
your DVD drive. Start a 'tail -f /var/log/messages' in a root xterm.
Start 'ddrescue /dev/sr0 foo.iso' in another xterm and be prepared to
hit Ctrl-C fast! Or think of anything else flooding the logs. Now, do
you still want to update /, /v and /v/l with every write to /v/l/m?

And, seriously, how often does the user/root need the size of a dir?
Once a month? And doesn't the "frequency of the need" correspond to the
size of the dir(s) involved? I have no problem whatsoever with
'du -hs ~/*' running for 4 minutes (with subsequent calls being much
faster; actually, it's just ~3.4s, with tons of stuff symlinked
out[1]), every 2 years or so ;)

So, IMO, keeping dirsizes in dirs is, in a word: Dainbramaged. And
that's probably the reason why that "feature" is "missing". And that's
a good thing.

HTH, even though I was guessing quite a bit,
-dnh

[0] you should IMO always have both GNU ddrescue and Kurt Garloff's
dd_rescue installed!

[1] there's one symlink with a locally guesstimated >= 13 TB lurking
behind it, and when the NFS mounts to the other "fileserver" box are up
there's another ~8 TB symlinked in ;) I keep an index (filenames with
relative path and sizes in bytes and MiB) of that. Updating the local
index with find takes ~5m40s. I don't care. I start the update about
daily and let it run while doing other stuff, like writing this mail ;)

--
I still maintain the point that designing a monolithic kernel in 1991 is
a fundamental error. Be thankful you are not my student. You would not
get a high grade for such a design.
    -- Andrew Tanenbaum to Linus Torvalds
On 05/03/12 17:31, David C. Rankin wrote:
> [...] du is just too slow.

It has to be, unless there is a bug where it is stat()'ing more than
needed. The fastest hard disk and filesystem are the only real solution
most of the time.
On 2012-03-06 20:48, Cristian Rodríguez wrote:
> The fastest hard disk and filesystem are the only real solution most of
> the time.

Storing all inodes in memory. :-P

--
Cheers / Saludos,
Carlos E. R.
(from 11.4 x86_64 "Celadon" at Telcontar)
On 03/05/2012 09:31 PM, David C. Rankin wrote:
> Does anyone have a favorite way to get the total directory size besides
> running 'du -hcs'? With directories with many files, du is just too slow.
>
> Any favorites here?

The following doesn't sum up the directory totals, but to find big
files, you may want to try find, e.g.

  find . -size +20M -ls

Have a nice day,
Berny
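As a small editorial extension of Berny's suggestion (not from the mail): the same hunt done in Perl, with the hits sorted so the biggest files come first. The 20 MiB threshold mirrors his find example:

==== bigfiles.pl ====
#!/usr/bin/perl
# sketch: list files larger than a threshold, biggest first
# (a perl take on `find . -size +20M -ls`)
use warnings;
use strict;
use File::Find;

my $threshold = 20 * 1024 * 1024;    # 20 MiB, same as find's +20M
my @dirs = @ARGV ? @ARGV : (".");
my @big;

find(sub {
    my @st = lstat($_);
    return unless @st && -f _;       # plain files only; reuse the lstat
    push @big, [ $st[7], $File::Find::name ]
        if $st[7] > $threshold;
}, @dirs);

printf "%12d  %s\n", @$_ for sort { $b->[0] <=> $a->[0] } @big;
====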
participants (9)
- Basil Chupin
- Bernhard Voelker
- Carlos E. R.
- Cristian Rodríguez
- David C. Rankin
- David Haller
- Jim Henderson
- phanisvara das
- Rodney Baker