[opensuse] Best filesystem type for HUGE directories?
All,

I have a Windows based app we run at our office. It sometimes creates directories with literally millions of small files in one directory. Using a local drive with NTFS it is taking hours to do simple things in that directory.

I'm thinking of setting up a dedicated Samba server to serve just the data drive out to this Windows server. If I did that, what would be the best choice of filesystem? ReiserFS? I know it has been optimized for lots of small files, but I'm not sure about the couple-million-in-one-directory scenario.

Thanks
Greg
--
Greg Freemyer
Litigation Triage Solutions Specialist
http://www.linkedin.com/in/gregfreemyer
First 99 Days Litigation White Paper -
http://www.norcrossgroup.com/forms/whitepapers/99%20Days%20whitepaper.pdf
The Norcross Group
The Intersection of Evidence & Technology
http://www.norcrossgroup.com
--
To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org
For additional commands, e-mail: opensuse+help@opensuse.org
The Thursday 2008-01-03 at 13:00 -0500, Greg Freemyer wrote:
I have a Windows based app we run at our office.
It sometimes creates directories with literally millions of small files in one directory. Using a local drive with NTFS it is taking hours to do simple things in that directory.
I'm thinking of setting up a dedicated Samba Server to serve just the data drive out to this windows server.
If I did that, what would be the best choice of filesystem? ReiserFS? I know it has been optimized for lots of small files, but I'm not sure about the couple million in one directory scenario.
Reiserfs will be very happy with millions of files in a single directory. You can try it yourself, creating and deleting such files with a script and timing the operation: I did so myself to verify. Like:

    $DONDE=/Somewhere
    time for X in `seq 1 1000`; do
        for Y in `seq 1 1000`; do
            dd if=/dev/zero of=$DONDE/Zero_$X"_"$Y bs=1k count=1 2> /dev/null
        done
        echo $X thousands
    done

(You can add timings in there)

However, you will be mounting it over samba, and that is something I can't comment on, but I have my doubts. You should do that verification with a batch file in windows creating that million files.

--
Cheers,
Carlos E. R.
If I did that, what would be the best choice of filesystem? ReiserFS? I know it has been optimized for lots of small files, but I'm not sure about the couple million in one directory scenario.
On Thursday 03 January 2008 13:04, Carlos E. R. wrote:

Reiserfs will be very happy with millions of files in a single directory.

Yes, Reiserfs is the best choice. It's organized on a b-tree (very fast) and the leaves take up only as much disk space as the file needs (unlike file systems that allocate predetermined block sizes regardless of file size).

--
Kind regards,
M Harris <><
On Jan 3, 2008 2:04 PM, Carlos E. R. <robin.listas@telefonica.net> wrote:
The Thursday 2008-01-03 at 13:00 -0500, Greg Freemyer wrote:
I have a Windows based app we run at our office.
It sometimes creates directories with literally millions of small files in one directory. Using a local drive with NTFS it is taking hours to do simple things in that directory.
I'm thinking of setting up a dedicated Samba Server to serve just the data drive out to this windows server.
If I did that, what would be the best choice of filesystem? ReiserFS? I know it has been optimized for lots of small files, but I'm not sure about the couple million in one directory scenario.
Reiserfs will be very happy with millions of files in a single directory. You can try it yourself, creating and deleting such files with a script and timing the operation: I did so myself to verify.
Like:
$DONDE=/Somewhere
time for X in `seq 1 1000`; do
    for Y in `seq 1 1000`; do
        dd if=/dev/zero of=$DONDE/Zero_$X"_"$Y bs=1k count=1 2> /dev/null
    done
    echo $X thousands
done
(You can add timings in there)
Testing now on a native reiser. I'll play with samba tomorrow.

FYI: I guess that extra $ was to make sure I knew what I was doing. I copied the above a little too literally the first time and created a few hundred /Zero* files. Easy enough to delete. Glad you gave them a nice easy-to-identify name.

Greg
The Thursday 2008-01-03 at 17:48 -0500, Greg Freemyer wrote:
On Jan 3, 2008 2:04 PM, Carlos E. R. <> wrote:
Reiserfs will be very happy with millions of files in a single directory. You can try it yourself, creating and deleting such files with a script and timing the operation: I did so myself to verify.
Like:
$DONDE=/Somewhere
time for X in `seq 1 1000`; do
    for Y in `seq 1 1000`; do
        dd if=/dev/zero of=$DONDE/Zero_$X"_"$Y bs=1k count=1 2> /dev/null
    done
    echo $X thousands
done
(You can add timings in there)
Testing now on a native reiser. I'll play with samba tomorrow.
FYI: I guess that extra $ was to make sure I knew what I was doing. I copied the above a little too literally the first time and created a few hundred /Zero* files. Easy enough to delete. Glad you gave them a nice easy-to-identify name.
:-) I should have written some comments, I guess O:-)

$DONDE is a variable; the first line defined it to be "/Somewhere". The idea was to change it there to an appropriate path for your system. I also deleted my timing commands, which I thought might be confusing.

Another detail: in linux, the system keeps track of access time (atime), which means something has to be written each time the directory or file is accessed, slowing the throughput. I always disable atime in the mount command; I have no use for it and the disk is faster. For instance:

    LABEL=160_xtr  /xtr  reiserfs  acl,user_xattr,noatime,nodiratime  1 2

--
Cheers,
Carlos E. R.
Greg Freemyer wrote:
If I did that, what would be the best choice of filesystem? ReiserFS? I know it has been optimized for lots of small files, but I'm not sure about the couple million in one directory scenario.
As far as filesystem performance in your given scenario, reiserfs is likely the best choice, as others have mentioned. For maximum performance, mount that filesystem without acls or xattrs, use the "notail" mount option, and, most importantly, use the "noatime" mount option.

Joe
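Putting those options together, a hypothetical /etc/fstab entry for such a data volume might look like the following (the label and mount point are placeholders, not from the thread; notail disables tail packing, and noatime/nodiratime suppress access-time updates):

```
# Hypothetical entry: adjust the label and mount point for your system.
# acl and user_xattr are deliberately omitted, per the advice above.
LABEL=data_vol  /srv/data  reiserfs  notail,noatime,nodiratime  1 2
```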
Hi :)

On Thursday 03 January 2008, Greg Freemyer wrote:
All,
I have a Windows based app we run at our office.
It sometimes creates directories with literally millions of small files in one directory. Using a local drive with NTFS it is taking hours to do simple things in that directory.
I'm thinking of setting up a dedicated Samba Server to serve just the data drive out to this windows server.
If I did that, what would be the best choice of filesystem? ReiserFS? I know it has been optimized for lots of small files, but I'm not sure about the couple million in one directory scenario.
We've got customers with millions (yes, millions) of files in each directory (XFS in these cases). It works like a charm.

But ... I do not recommend directories with over 10 thousand files for Windows. We've seen Windows be very limited when it has to list a directory with over 10 thousand files, no matter what filesystem you are using on the Samba server.

You can try locally and see the same thing happens:

1.- create a directory on your Windows machine
2.- populate it with +10000 files
3.- try to browse it
4.- Good luck ;)

Rafa
--
"We cannot treat computers as Humans. Computers need love."
rgriman@skype.com
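For reference, step 2 of the test above is easy to script on the Linux side as well; a minimal sketch (the directory name is a placeholder) that populates a test directory with 10,000 empty files:

```shell
#!/bin/sh
# Populate a test directory with 10,000 small files (step 2 above).
# ./bigdir_test is a placeholder path; change it to suit your system.
DIR=./bigdir_test
mkdir -p "$DIR"
i=1
while [ "$i" -le 10000 ]; do
    : > "$DIR/file_$i"    # create an empty file
    i=$((i + 1))
done
echo "$(ls "$DIR" | wc -l) files created"
```

Exporting that directory over Samba and browsing it from Windows reproduces the listing slowdown described above.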
On Jan 4, 2008 4:10 AM, Rafa Grimán <rafagriman@gmail.com> wrote:
Hi :)
El Thursday 03 January 2008, Greg Freemyer escribió:
All,
I have a Windows based app we run at our office.
It sometimes creates directories with literally millions of small files in one directory. Using a local drive with NTFS it is taking hours to do simple things in that directory.
I'm thinking of setting up a dedicated Samba Server to serve just the data drive out to this windows server.
If I did that, what would be the best choice of filesystem? ReiserFS? I know it has been optimized for lots of small files, but I'm not sure about the couple million in one directory scenario.
We've got customers with millions (yes, millions) of files in each directory (XFS in these cases). It works like a charm.
But ... I do not recommend directories with over 10 thousand files for Windows. We've seen Windows be very limited when it has to list a directory with over 10 thousand files, no matter what filesystem you are using on the Samba server.
You can try locally and see the same thing happens:

1.- create a directory on your Windows machine
2.- populate it with +10000 files
3.- try to browse it
4.- Good luck ;)
Rafa
I wish I had more control to keep the count down, but I have a Windows 3rd party app that will TIFF a PST file (in total). We need to do that fairly regularly, and a single PST can generate a million+ TIFFs on occasion.

When this happens we see our speeds drop drastically (as you describe) because all those TIFFs are in one big dir. If you work from the CMD prompt you can at least move around the drive. If you're using Explorer you can get stuck for hours at a time just because you clicked in the wrong place.

Greg
Hi :)

On Friday 04 January 2008, Greg Freemyer wrote:
On Jan 4, 2008 4:10 AM, Rafa Grimán <rafagriman@gmail.com> wrote:
Hi :)
On Thursday 03 January 2008, Greg Freemyer wrote:
All,
I have a Windows based app we run at our office.
It sometimes creates directories with literally millions of small files in one directory. Using a local drive with NTFS it is taking hours to do simple things in that directory.
I'm thinking of setting up a dedicated Samba Server to serve just the data drive out to this windows server.
If I did that, what would be the best choice of filesystem? ReiserFS? I know it has been optimized for lots of small files, but I'm not sure about the couple million in one directory scenario.
We've got customers with millions (yes, millions) of files in each directory (XFS in these cases). It works like a charm.
But ... I do not recommend directories with over 10 thousand files for Windows. We've seen Windows be very limited when it has to list a directory with over 10 thousand files, no matter what filesystem you are using on the Samba server.
You can try locally and see the same thing happens:

1.- create a directory on your Windows machine
2.- populate it with +10000 files
3.- try to browse it
4.- Good luck ;)
Rafa
I wish I had more control to keep the count down, but I have a Windows 3rd party app that will TIFF a PST file (in total).
We need to do that fairly regularly, and a single PST can generate a million+ TIFFs on occasion. When this happens we see our speeds drop drastically (as you describe) because all those TIFFs are in one big dir. If you work from the CMD prompt you can at least move around the drive. If you're using Explorer you can get stuck for hours at a time just because you clicked in the wrong place.
We have the same problem you describe with a customer here :(

Can you use scripting? Can you talk to the ISV to modify the app? In our case there's nothing to do, because the ISV doesn't want to modify the app, and we can't use scripting because there's an MS-SQL Server that stores the paths to the files on the SMB/Linux server.

Rafa
It sometimes creates directories with literally millions of small files in one directory. Using a local drive with NTFS it is taking hours to do simple things in that directory.
Hello:

Once I untarred the freedb database (it has zillions of small files in only a few directories; the compressed file's size is about 350-400 MB) onto an ext3 partition - it took more than 24 hours to finish the job. Then I tried the same on ReiserFS and it took only about 5-10 minutes to uncompress the files and create the directory structure. So ReiserFS seems to be a good choice.

IG
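A comparison like the one above can be reproduced without the freedb tarball by timing many-small-file creation directly; a minimal sketch, where ./fs_test is a placeholder to be replaced with a directory on each filesystem under test:

```shell
#!/bin/sh
# Minimal sketch: time the creation of many small files on the
# filesystem under test. Run once per mount point and compare times.
# ./fs_test is a placeholder target directory.
TARGET=./fs_test
mkdir -p "$TARGET"
start=$(date +%s)
i=1
while [ "$i" -le 2000 ]; do
    echo x > "$TARGET/f_$i"    # one tiny file per iteration
    i=$((i + 1))
done
end=$(date +%s)
echo "created 2000 files in $((end - start)) seconds"
```

Increase the file count to stress the directory-lookup behavior that distinguished ext3 from ReiserFS in the report above.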
participants (6)
-
Carlos E. R.
-
Greg Freemyer
-
Istvan Gabor
-
Joe Sloan
-
M Harris
-
Rafa Grimán