[opensuse] Is there a file size statistics tool?
Rather than hack perl, I wonder if there is already a tool that will give me a profile of the sizes of files on a file system or tree? I'm looking at experimenting with other file systems like btrfs, xfs and the like to see how they perform with various mixes of files: videos, icons, mail boxes, PDFs, development (source and object fragments). According to what I read, late-model file systems are either sensitive to extents or can be configured for them. But presumably this isn't a J. Random file size thing. I did find this: http://www.cs.vu.nl/~ast/publications/osr-jan-2006.pdf -- The great tragedy of Science - the slaying of a beautiful hypothesis by an ugly fact. Thomas H. Huxley -- To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org For additional commands, e-mail: opensuse+help@opensuse.org
On Sunday 11 of July 2010, Anton Aylward wrote:
Rather than hack perl, I wonder if there is already a tool that will give me a profile of the sizes of files on a file system or tree?
Filelight (KDE4) does exactly this. It is available from the KDE:Community repository and is a little unstable, although it should get the work done.
I'm looking at experimenting with other file systems like btrfs, xfs and the like to see how they perform with various mixes of files: videos, icons, mail boxes, PDFs, development (source and object fragments).
Perhaps you could post the results here? I'm certain quite a few would be interested. Regards, Peter
auxsvr@gmail.com said the following on 07/12/2010 06:47 AM:
On Sunday 11 of July 2010, Anton Aylward wrote:
Rather than hack perl, I wonder if there is already a tool that will give me a profile of the sizes of files on a file system or tree?
Filelight (KDE4) does exactly this. It is available from the KDE:Community repository and is a little unstable, although it should get the work done.
Sorry, no. It tells me what space is used in what directory, which is NOT what I want. Have you seen kdirstat? That's more informative but still not what I want. What I want is something that gives me a distribution (a table by range, or a table of files below a given size) of the file sizes, not in a single directory but in a tree. For example, look at /usr/share/icons. Now you'd expect icon files to be small. Well, some are. But some SVGs get to be over 64K. Many are below 4K and lots are below 1K. How does this compare with what's on the rest of /usr/share, or even /usr? I have a "Media" folder: are the movies worth putting on XFS? The pictures? What about raw downloads from my camera? Yes, things like ReiserFS and EXT seem very resilient as "general purpose", but could I optimise the 'static' things like those icons? Only a few hundred meg? Is it worth it? Probably not. But then what about the movies I download from my camera? My photo galleries?
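A minimal sketch of the "table of files below a given size" view described above, written in Python purely as an illustration (it is not a tool mentioned in the thread). It assumes the sizes in bytes have already been collected one per line into a text file, for example with `find DIR -type f -printf '%s\n'`; the thresholds 1K, 4K and 64K just echo the figures in the message, and the name `cumulative_table` is hypothetical.

```python
import sys
from bisect import bisect_right

def cumulative_table(sizes, thresholds=(1024, 4096, 65536)):
    """Return (threshold, count, percent) rows, counting files whose size
    in bytes is at or below each threshold."""
    ordered = sorted(sizes)
    total = len(ordered)
    rows = []
    for limit in thresholds:
        count = bisect_right(ordered, limit)  # number of sizes <= limit
        percent = 100.0 * count / total if total else 0.0
        rows.append((limit, count, percent))
    return rows

if __name__ == "__main__" and len(sys.argv) > 1:
    # One size in bytes per line, e.g. saved from: find DIR -type f -printf '%s\n'
    with open(sys.argv[1]) as f:
        sizes = [int(line) for line in f if line.strip()]
    for limit, count, percent in cumulative_table(sizes):
        print(f"<= {limit:>6} bytes: {count:>8} files ({percent:5.1f}%)")
```

Run once on a list gathered from /usr/share/icons and once on a list from /usr (file names of your choosing), the two tables give the comparison the message asks about.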
I'm looking at experimenting with other file systems like btrfs, xfs and the like to see how they perform with various mixes of files: videos, icons, mail boxes, PDFs, development (source and object fragments).
Perhaps you could post the results here? I'm certain quite a few would be interested.
I'd be happy to, but don't expect the same level of detail as you'll find at www.phoronix.com. If you want solid numbers, look at their test suite <http://www.phoronix-test-suite.com/> and results <http://www.phoronix.com/scan.php?page=article&item=linux2635_btrfs&num=1>. What I was asking about, and what the paper I referred to in my original post was talking about, is the distribution of file sizes. The paper discussed block size and file fragmentation. With modern file systems that can 'stuff' more than one file into a logical extent (block?), fragmentation isn't so much of an issue, but file system coherency, especially after many deletes/edits, is another matter. Still, if I have my "media" FS, should I use XFS? Should I have a different file system for /usr/share/icons, where the SVGs are mainly under 64K (many even under 1K), and optimise it for those smaller files? What would be the best file system type for my backing databases: MySQL, SQLite and the simple Sleepycat Berkeley DB? (So what do YOU use for LDAP, then?) -- Enter any 11-digit prime number to continue.
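To put rough numbers on the block-size question raised above, here is a minimal Python sketch of internal fragmentation ("slack"). It assumes a deliberately naive model in which every non-empty file occupies a whole number of blocks and empty files occupy none; it ignores metadata, inline data and the tail packing ("stuffing") the message alludes to, so real file systems will differ. The block sizes tried (1K, 4K, 64K) and the name `allocation_for` are assumptions for illustration; the input is a file of byte sizes, one per line, such as `find DIR -type f -printf '%s\n'` produces.

```python
import sys

def allocation_for(sizes, block_size):
    """Return (data_bytes, allocated_bytes, slack_bytes) under a naive model:
    each non-empty file takes ceil(size / block_size) whole blocks,
    empty files take none. Metadata, inline data and tail packing are ignored."""
    data = sum(sizes)
    allocated = sum(-(-size // block_size) * block_size for size in sizes)
    return data, allocated, allocated - data

if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as f:
        sizes = [int(line) for line in f if line.strip()]
    for block_size in (1024, 4096, 65536):
        data, allocated, slack = allocation_for(sizes, block_size)
        share = 100.0 * slack / allocated if allocated else 0.0
        print(f"block {block_size:>6}: data {data}, allocated {allocated}, "
              f"slack {slack} ({share:.1f}% of allocated)")
```

Under this model, a tree dominated by sub-1K icons shows slack growing quickly with block size, while a tree of large videos barely changes; that is the trade-off the paper's block-size discussion is about.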
On Monday 12 of July 2010, Anton Aylward wrote:
What I want is something that gives me a distribution (a table by range, or a table of files below a given size) of the file sizes, not in a single directory but in a tree.
In R (package R-base) try the following:
strtoi(system("find DIR -type f -exec du {} \\; | awk '{print $1}'", intern=T)) -> fs_sizes
barplot(table(cut(fs_sizes, breaks=c(0,2^(1:27)) )))
(replace DIR with the directory you want) to display a bar plot with the categories corresponding to the ones in the article you link to. It is trivial to do further statistical analysis in R.
Regards, Peter
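For readers without R, the binning that `cut(fs_sizes, breaks=c(0,2^(1:27)))` performs can be sketched in Python. R's `cut` uses right-closed intervals by default, so the bins are (0, 2], (2, 4], ..., (2^26, 2^27]; a value outside them (a size of 0, or anything above 2^27) becomes NA, and `table()` leaves NA out. When the sizes are bytes rather than `du` blocks, as in the later variants of this command, that means empty files and files above 128 MiB, such as videos, are silently dropped. The function names below are hypothetical, and this sketch counts the dropped values instead of hiding them.

```python
from collections import Counter

def power_of_two_bin(size, max_exp=27):
    """Return k such that size falls in (2**(k-1), 2**k]; the first bin is (0, 2].

    Returns None for sizes outside (0, 2**max_exp], mirroring the NA that
    R's cut() produces with breaks=c(0, 2^(1:max_exp)).
    """
    if size <= 0 or size > 2 ** max_exp:
        return None
    if size <= 2:
        return 1
    return (size - 1).bit_length()

def histogram(sizes, max_exp=27):
    """Count sizes per power-of-two bin; also report how many fell outside."""
    counts = Counter()
    dropped = 0
    for size in sizes:
        k = power_of_two_bin(size, max_exp)
        if k is None:
            dropped += 1
        else:
            counts[k] += 1
    return counts, dropped

def print_bars(counts, width=40):
    """Crude text bar chart, one row per non-empty bin."""
    peak = max(counts.values(), default=0)
    for k in sorted(counts):
        low = 0 if k == 1 else 2 ** (k - 1)
        bar = "#" * round(width * counts[k] / peak)
        print(f"({low}, {2 ** k}]".ljust(24), f"{counts[k]:>8}", bar)
```

A typical call is `counts, dropped = histogram(sizes)` followed by `print_bars(counts)`; a non-zero `dropped` is the cue to raise `max_exp` or report empty files separately.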
On Monday 12 of July 2010, auxsvr@gmail.com wrote:
On Monday 12 of July 2010, Anton Aylward wrote:
What I want is something that gives me a distribution (a table by range, or a table of files below a given size) of the file sizes, not in a single directory but in a tree.
In R (package R-base) try the following:
strtoi(system("find DIR -type f -exec du {} \\; | awk '{print $1}'", intern=T)) -> fs_sizes barplot(table(cut(fs_sizes, breaks=c(0,2^(1:27)) )))
(replace DIR with the directory you want) to display a bar plot with the categories corresponding to the ones in the article you link to. It is trivial to do further statistical analysis in R.
Regards, Peter
I posted too soon. Here's the version that displays byte count instead of block count:
strtoi(system("find DIR -type f -exec du -b {} \\; | awk '{print $1}'", intern=T)) -> fs_sizes
This may overestimate disk usage of sparse files (man du).
Regards, Peter
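The sparse-file caveat comes from the gap between a file's apparent size and the space actually allocated to it. A minimal Python sketch, assuming Linux, where `os.lstat()` reports `st_blocks` in 512-byte units: it creates a file with a 10 MiB hole and prints both figures. The allocated number depends on the file system, so no exact output is claimed here, and the helper name `apparent_and_allocated` is hypothetical.

```python
import os
import tempfile

def apparent_and_allocated(path):
    """Return (apparent_bytes, allocated_bytes) for path, without following symlinks.

    apparent_bytes is st_size, the figure `du -b` and find's %s report.
    allocated_bytes is st_blocks * 512 (st_blocks counts 512-byte units on Linux).
    """
    st = os.lstat(path)
    return st.st_size, st.st_blocks * 512

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sparse.bin")
        with open(path, "wb") as f:
            f.seek(10 * 1024 * 1024)  # skip 10 MiB without writing: a hole
            f.write(b"x")             # one real byte after the hole
        apparent, allocated = apparent_and_allocated(path)
        print(f"apparent: {apparent} bytes, allocated: {allocated} bytes")
```

On a file system that supports holes, the apparent size is 10485761 bytes while the allocated figure is far smaller, which is exactly the overestimate the `du -b` variant can introduce.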
Hello, On Mon, 12 Jul 2010, auxsvr@gmail.com wrote:
strtoi(system("find DIR -type f -exec du {} \\; | awk '{print $1}'",
useless use of du, useless use of awk.
find DIR -type f -printf '%s\n'
HTH, -dnh -- Wash: "[..] this landing is gonna get pretty interesting" Mal: "Define 'interesting.'" Wash: "Oh, God, oh, God, we're all gonna die?" -- Firefly Serenity
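The point about `du` and `awk` can be made concrete: `-exec du {} \;` starts one `du` process per file, whereas find can print the size straight from the stat data it already has. A rough Python equivalent of `find DIR -type f -printf '%s\n'` is sketched below as an illustration; like find's default behaviour, it neither follows nor counts symbolic links and reports only regular files. Error handling is simplified (unreadable entries are skipped silently where find would print a message), and `regular_file_sizes` is a hypothetical name.

```python
import os
import stat
import sys

def regular_file_sizes(top):
    """Yield the size in bytes of every regular file below the directory top.

    Rough equivalent of: find TOP -type f -printf '%s\\n'
    Symlinks are neither followed nor counted (find's default -P behaviour).
    """
    for dirpath, _dirnames, filenames in os.walk(top):  # followlinks=False by default
        for name in filenames:
            try:
                st = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue  # entry vanished or is unreadable; find would report it
            if stat.S_ISREG(st.st_mode):
                yield st.st_size

if __name__ == "__main__" and len(sys.argv) > 1:
    for size in regular_file_sizes(sys.argv[1]):
        print(size)
```

The output has the same one-size-per-line shape as the find command, so either can feed the R snippets in this thread.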
On Monday 12 of July 2010, David Haller wrote:
Hello,
On Mon, 12 Jul 2010, auxsvr@gmail.com wrote:
strtoi(system("find DIR -type f -exec du {} \\; | awk '{print $1}'",
useless use of du, useless use of awk.
find DIR -type f -printf '%s\n'
Update (in R):
as.numeric(system("find DIR -type f -printf '%s\n'", intern=T)) -> fs_sizes
barplot(table(cut(fs_sizes, breaks=c(0,2^(1:27)) )))
HTH, -dnh
Thanks, Peter
On Mon, 2010-07-12 at 08:07 -0400, Anton Aylward wrote:
auxsvr@gmail.com said the following on 07/12/2010 06:47 AM: For example, look at /usr/share/icons. Now you'd expect icon files to be small. Well, some are. But some SVGs get to be over 64K. Many are below 4K and lots are below 1K. How does this compare with what's on the rest of /usr/share, or even /usr? I have a "Media" folder: are the movies worth putting on XFS?
No.
The pictures?
No.
What about raw downloads from my camera?
No.
Yes, things like ReiserFS and EXT seem very resilient as "general purpose", but could I optimise the 'static' things like those icons? Only a few hundred meg? Is it worth it? Probably not.
Correct.
But then what about the movies I download from my camera? My photo galleries?
No.
I'm looking at experimenting with other file systems like btrfs, xfs and the like to see how they perform with various mixes of files: videos, icons, mail boxes, PDFs, development (source and object fragments). Perhaps you could post the results here? I'm certain quite a few would be interested. I'd be happy to, but don't expect the same level of detail as you'll find at www.phoronix.com. If you want solid numbers, look at their test suite <http://www.phoronix-test-suite.com/> and results <http://www.phoronix.com/scan.php?page=article&item=linux2635_btrfs&num=1>. What I was asking about, and what the paper I referred to in my original post was talking about, is the distribution of file sizes. The paper discussed block size and file fragmentation. With modern file systems that can 'stuff' more than one file into a logical extent (block?), fragmentation isn't so much of an issue, but file system coherency, especially after many deletes/edits, is another matter. Still, if I have my "media" FS, should I use XFS? Should I have a different file system for /usr/share/icons, where the SVGs are mainly under 64K (many even under 1K), and optimise it for those smaller files? What would be the best file system type for my backing databases: MySQL, SQLite and the simple Sleepycat Berkeley DB? (So what do YOU use for LDAP, then?)
I use ext3 for OpenLDAP. With modern systems there isn't really any point in using anything else [we used to be a primarily XFS shop]. At least not until btrfs arrives. Any [negligible] gain is easily negated by the added complexity of your setup. -- Adam Tauno Williams <awilliam@whitemice.org> LPIC-1, Novell CLA <http://www.whitemiceconsulting.com> OpenGroupware, Cyrus IMAPd, Postfix, OpenLDAP, Samba
participants (4)
- Adam Tauno Williams
- Anton Aylward
- auxsvr@gmail.com
- David Haller