Mailinglist Archive: opensuse (1445 mails)

Re: [opensuse] Re: faster way to get total dir size besides du?
Hello,

On Tue, 06 Mar 2012, Jim Henderson wrote:
> On Tue, 06 Mar 2012 04:38:21 +0100, Carlos E. R. wrote:
> > On 2012-03-06 04:07, David Haller wrote:
> > > - AFAIK no filesystem I know of stores the sizes of a directory's
> > > contents somewhere in the directory's metadata

> > FAT does, kind of. A directory is a table of records containing the
> > file names, attributes, sizes, and starting clusters. You just have to
> > load the directory and sum the sizes of all files: a single read of
> > one disk record (I don't remember how many files fit per record). If
> > the directory is big, more reads are needed. The operation is very
> > fast. If there are subdirectories, then it is slower (recursive
> > calls).
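To make the FAT case concrete, here is a minimal Python sketch of summing one directory's file sizes. It assumes the raw bytes of a single directory have already been read from disk; it follows no cluster chains and does not recurse into subdirectories. The function and helper names are hypothetical. The offsets follow the standard FAT directory-entry layout: 32-byte entries, the attribute byte at offset 11, and the file size as a little-endian 32-bit value at offsets 28-31.

```python
import struct

ENTRY_SIZE = 32        # every FAT directory entry is 32 bytes
ATTR_VOLUME_ID = 0x08
ATTR_DIRECTORY = 0x10
ATTR_LFN = 0x0F        # attribute value marking a long-file-name fragment


def sum_fat_dir_sizes(raw: bytes) -> int:
    """Sum the size fields of the regular-file entries in one directory's raw bytes."""
    total = 0
    for off in range(0, len(raw) - ENTRY_SIZE + 1, ENTRY_SIZE):
        entry = raw[off:off + ENTRY_SIZE]
        if entry[0] == 0x00:       # first byte 0x00: no further entries
            break
        if entry[0] == 0xE5:       # first byte 0xE5: deleted entry
            continue
        attr = entry[11]
        if (attr & 0x3F) == ATTR_LFN:
            continue               # long-name fragment, carries no size
        if attr & (ATTR_DIRECTORY | ATTR_VOLUME_ID):
            continue               # subdirectory or volume label: no file size here
        total += struct.unpack_from("<I", entry, 28)[0]   # bytes 28-31
    return total


def make_entry(name: bytes, attr: int, size: int) -> bytes:
    """Hypothetical test helper: 11-byte 8.3 name, attribute, 16 zero bytes, size."""
    return name.ljust(11, b" ") + bytes([attr]) + bytes(16) + struct.pack("<I", size)


raw = (make_entry(b"FILE1   TXT", 0x20, 1000)
       + make_entry(b"\xe5ILE2   TXT", 0x20, 9999)    # deleted, ignored
       + make_entry(b"FILE3   TXT", 0x20, 2500)
       + make_entry(b"SUBDIR", ATTR_DIRECTORY, 0)
       + bytes(ENTRY_SIZE))                          # end-of-directory marker
print(sum_fat_dir_sizes(raw))  # 3500
```

Note that this is still a sum computed at read time: the directory holds per-file sizes, not their total.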

> That's different from storing the size of the contents in the directory
> entry's metadata - the summing of each subordinate file is what takes the
> time. What David is trying to say is that no filesystem that he knows of
> (nor that I know of, for that matter) stores the total of the subordinate
> objects.

More specifically: I know of no FS that any sane person would want to
have, e.g., a Linux / on. I wouldn't put it past the guys in Redmond
or Cupertino to devise and even implement such a thing.

Hm. WinFS probably would have stored such metadata ;) I wonder why M$
has cancelled it.

> i.e., for a structure of:
>
> /
> /usr
> /usr/file1
> /usr/file2
> /usr/file3
>
> /usr doesn't store the sum of the sizes of file1, file2, and file3. It
> has to be calculated on the fly.
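Calculating it "on the fly" is what du does. A minimal Python sketch of that walk, assuming apparent sizes (st_size) are wanted rather than allocated blocks, so the result is close to, but not the same as, 'du -sb' (it ignores the directories' own sizes, for one). It never descends through symlinks, counts multiply-linked files once, and, unlike 'du -x', does cross mount points. The name tree_size is hypothetical.

```python
import os


def tree_size(path: str) -> int:
    """Sum of regular-file sizes below `path`, walked on the fly."""
    total = 0
    seen = set()                 # (st_dev, st_ino) of multiply-linked files
    stack = [path]
    while stack:
        current = stack.pop()
        try:
            with os.scandir(current) as it:
                for entry in it:
                    if entry.is_dir(follow_symlinks=False):
                        stack.append(entry.path)   # descend, but never through symlinks
                    elif entry.is_file(follow_symlinks=False):
                        st = entry.stat(follow_symlinks=False)
                        if st.st_nlink > 1:        # hard link: count the inode only once
                            key = (st.st_dev, st.st_ino)
                            if key in seen:
                                continue
                            seen.add(key)
                        total += st.st_size
        except OSError:
            pass   # unreadable or vanished directory: skip it and keep going
    return total
```

Usage: tree_size("/usr") returns a byte count. Every call re-reads every directory below the starting point, which is exactly the cost being discussed.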

What are we actually talking about? What would have to be stored and
_updated_ every time a file is created or changes size somewhere
below a dir?

The stored sum of the sizes of all files in the file's directory (plus
the size of metadata?), and the sums in all directories "above" it,
would have to be updated. While following symlinks? And mountpoints?
Into remotely mounted FSen?
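A toy model of the hypothetical design shows what "updated" means: every change in a file's size has to be added to the stored total of every directory above it. This is a Python sketch with made-up names (DirSizeCache, file_grew); it only walks the lexical parent directories and simply ignores the symlink, mountpoint and remote-FS questions.

```python
from collections import defaultdict
from pathlib import PurePosixPath


class DirSizeCache:
    """Hypothetical FS bookkeeping: a recursive byte total stored per directory."""

    def __init__(self) -> None:
        self.dir_total = defaultdict(int)  # directory path -> bytes below it
        self.metadata_writes = 0           # directory updates this design costs

    def file_grew(self, path: str, delta: int) -> None:
        # Every directory "above" the file, up to and including "/", changes.
        for parent in PurePosixPath(path).parents:
            self.dir_total[str(parent)] += delta
            self.metadata_writes += 1


cache = DirSizeCache()
cache.file_grew("/var/log/messages", 120)  # a 120-byte syslog line
cache.file_grew("/var/log/messages", 80)   # and another, 80 bytes
print(cache.dir_total["/var"], cache.metadata_writes)  # 200 6
```

Two appends to /var/log/messages already cost six directory updates (/var/log, /var and / twice each).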

Even with reiserfs/btrfs (which are, AFAIK, the fastest at metadata
operations) that'd be stupid. If you don't think so, you haven't
realized how much stuff is going on in the FS. Just think about
/var/log/messages. That'd cause a continuous flood of updates to
/var/log, /var and /. So, consider a single simple "write(2)" to the
syslog: instead of that one syscall and I/O, you'd have at least:

- write to /v/l/m
- open /v/l
- write to /v/l
- close /v/l
- open /v
- write to /v
- close /v
- open /
- write to /
- close /

(OK, you could keep the dirs leading to /v/l/m open, but you can't do
that with all dirs.) Or do that stuff "in-kernel"; that'd save a huge
amount of time, but it's still obvious that it'd impact FS performance
tremendously, as you'd still have at least 4 I/O ops instead of one.
And I/O takes a lot of time, even with SSDs (even if you don't care
about the extra writes on the SSD), and even when "buffered". That
stuff has to be written at some point ...
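The operation counts above can be reproduced with a small Python sketch. ops_for_write is a hypothetical helper that lists what one write(2) to a file would turn into if every ancestor directory stored a size; in_kernel=True models the "in-kernel" variant where only the directory writes remain. For /var/log/messages that gives the ten steps listed above, or four I/O operations instead of one.

```python
from pathlib import PurePosixPath


def ops_for_write(path: str, in_kernel: bool = False) -> list[str]:
    """List the operations one write(2) to `path` would turn into if every
    ancestor directory stored the total size below it (hypothetical design)."""
    ops = [f"write to {path}"]
    for parent in PurePosixPath(path).parents:
        if in_kernel:
            ops.append(f"write to {parent}")  # kernel updates the dir in place
        else:
            ops += [f"open {parent}", f"write to {parent}", f"close {parent}"]
    return ops


for op in ops_for_write("/var/log/messages"):
    print(op)
print(len(ops_for_write("/var/log/messages", in_kernel=True)))  # 4
```

The count grows with path depth: one extra write per directory level even in-kernel, three per level without it.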

Not convinced? Install GNU ddrescue[0]. Put a CSS-protected DVD into
your DVD drive. Start 'tail -f /var/log/messages' in a root
xterm. Start 'ddrescue /dev/sr0 foo.iso' in another xterm and be
prepared to hit Ctrl-C fast! Think of anything else flooding the
logs. Now, do you still want to update /, /v and /v/l with every write
to /v/l/m?

And, seriously, how often does the user/root need the size of a dir?

Once a month? And doesn't the "frequency of the need" correspond to
the size of the dir(s) involved? I have no problem whatsoever with
'du -hs ~/*' running for 4 minutes (subsequent calls are much faster,
and actually it's just ~3.4s, with tons of stuff symlinked out[1])
every 2 years or so ;)

So, IMO, keeping dirsizes in dirs is, in a word: Dainbramaged.

And that's probably the reason why that "feature" is "missing".

And that's a good thing.

HTH even though I was guessing quite a bit,
-dnh

[0] you should IMO always have both GNU ddrescue and Kurt Garloff's
dd_rescue installed!

[1] there's one symlink with a guesstimated >= 13TB of local data
lurking behind it, and when the NFS mounts from the other
"fileserver" box are up, there's another ~8TB symlinked in ;) I keep
an index (filenames with relative paths and sizes in bytes and MiB)
of that. Updating the local index with find takes ~5m40s. I don't
care. I start the update about daily and let it run while doing
other stuff, like writing this mail. ;)

--
I still maintain the point that designing a monolithic kernel in
1991 is a fundamental error. Be thankful you are not my student.
You would not get a high grade for such a design.
-- Andrew Tanenbaum to Linus Torvalds
