[opensuse-factory] btrfs still fails writing more than 15-TB
(it's finished now) Hi Folks,

I finally have a chance to work on my previously reported issue with btrfs, where I can't write more than about 15-TB to a single partition.

Here's the machine:

- Supermicro X9DRH mobo
- 2-ea Xeon E5-2643
- 64-GB RAM
- 1-ea LSI MegaRAID SAS 2208 RAID controller
- 2-ea 120-GB SSD drives configured as a RAID-1 mirror
- 24-ea 2-TB SATA drives configured as two 11-disk RAID-6 arrays with 2 hot spares

So I loaded 13.1 beta1 and pulled a "zypper dup". Here are my initial observations.

1. No more reiserfs!
2. The installer complains about using XFS on the boot partition.
3. The installer complains if the boot partition is less than 12-MB (good!)

But the installer won't let the install proceed past the final summary screen if I have multiple system partitions. It complains that there isn't enough room for the selected packages (default load). There is room, but it complains nevertheless. Here's the error:

"Not enough disk space. Remove some packages in the Single Selection"

The screen said that 3.2-GB was required, but there was more than 50-GB available. I got around the error by having just one swap and one root partition.

I configured the root partition and one of the data arrays with btrfs. The second data array was formatted as XFS. The test uses dd to copy a 4-GB binary file in ~root on the SSD disk to the btrfs array, which contains 17576753152 1K-blocks as reported by df. The copy is repeated 4,000 times. The process is then repeated for the XFS array.

Failure: the btrfs process failed when creating file #3688, where the file length was truncated.
But it went on to create 6 more truncated files after the initial failure, as shown here:

-rw-r--r-- 1 root root 4194304000 Oct 15 23:37 test3686
-rw-r--r-- 1 root root 4194304000 Oct 15 23:37 test3687
-rw-r--r-- 1 root root  938475520 Oct 15 23:38 test3688
-rw-r--r-- 1 root root   41943040 Oct 15 23:38 test3689
-rw-r--r-- 1 root root  114294784 Oct 15 23:38 test3690
-rw-r--r-- 1 root root    2097152 Oct 15 23:38 test3691
-rw-r--r-- 1 root root    2097152 Oct 15 23:38 test3697
-rw-r--r-- 1 root root    2097152 Oct 15 23:38 test3703
-rw-r--r-- 1 root root    8388608 Oct 15 23:39 test3819

df shows this:

/dev/sdb1 17576753152 15140885368 2434798528 87% /export/data0

While dd reported:

dd: failed to open test3818: No space left on device

So this is a rather messy failure. But the XFS-formatted filesystem worked as expected:

/dev/sdc1 17574666240 16384034856 1190631384 94% /export/data1

Times for the two processes are:

btrfs
real 365m4.381s
user 0m6.292s
sys 128m51.396s

XFS
real 282m6.769s
user 0m10.008s
sys 177m24.177s

Thus XFS seems to be faster by a fair margin, even when writing more data.

Regards,
Lew

--
To unsubscribe, e-mail: opensuse-factory+unsubscribe@opensuse.org
To contact the owner, e-mail: opensuse-factory+owner@opensuse.org
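The copy test described in the report above can be sketched as a small shell function. This is a hypothetical reconstruction, not the poster's actual script: the function name, paths, and parameters are assumptions; the original copied a 4-GB file 4,000 times with dd.

```shell
#!/bin/sh
# Hypothetical sketch of the fill test described above.
# fill_test SRC DEST COUNT copies SRC to DEST/test1..testCOUNT with dd,
# stopping and reporting the file number at the first failure.
fill_test() {
    src=$1; dest=$2; count=$3
    i=1
    while [ "$i" -le "$count" ]; do
        if ! dd if="$src" of="$dest/test$i" bs=1M 2>/dev/null; then
            echo "copy failed at file #$i"
            return 1
        fi
        i=$((i + 1))
    done
    echo "copied $count files"
}
```

On the machine described above this would be invoked as something like "time fill_test /root/big.bin /export/data0 4000"; comparing file sizes with "ls -l" afterwards exposes the truncated files the report shows.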
On 17.10.2013 00:04, Lew Wolfgang wrote: [...]
"Not enough disk space. Remove some packages in the Single Selection"
The screen said that 3.2-GB was required, but there was more than 50-GB available.
It looks like a bug in the disk usage estimation code. Please open a bug report and attach the YaST logs.

Ladislav

--
Best Regards

Ladislav Slezák
Yast Developer
------------------------------------------------------------------------
SUSE LINUX, s.r.o.        e-mail: lslezak@suse.cz
Lihovarská 1060/12        tel: +420 284 028 960
190 00 Prague 9           fax: +420 284 028 951
Czech Republic            http://www.suse.cz/
On Thursday 17 October 2013 09.25:05 Ladislav Slezak wrote:
On 17.10.2013 00:04, Lew Wolfgang wrote: [...]
"Not enough disk space. Remove some packages in the Single Selection"
The screen said that 3.2-GB was required, but there was more than 50-GB available.
It looks like a bug in the disk usage estimation code, please open a bug report and attach YaST logs.
Ladislav
Also attach the output of "btrfs filesystem df /your_mount_point". And to go with your timing test, the output of "snapper list" could also help.

--
Bruno Friedmann
Ioda-Net Sàrl www.ioda-net.ch
openSUSE Member
GPG KEY : D5C9B751C4653227
irc: tigerfoot
Hi, On Wed, Oct 16, 2013 at 03:04:52PM -0700, Lew Wolfgang wrote:
I finally have a chance to work on my previously reported issue with btrfs where I can't write more than about 15-TB to a single partition.
Here's the machine:
- Supermicro X9DRH mobo
- 2-ea Xeon E5-2643
- 64-GB RAM
- 1-ea LSI MegaRAID SAS 2208 RAID controller
- 2-ea 120-GB SSD drives configured as a RAID-1 mirror
- 24-ea 2-TB SATA drives configured as two 11-disk RAID-6 arrays with 2 hot spares
So I loaded 13.1 beta1 and pulled a "zypper dup". Here are my initial observations.
1. No more reiserfs!
That's not true; either you're not running kernel-default, or you're talking about the YaST partitioning module.

lmuelle@hip:~> ls -l /lib/modules/$( uname -r)/kernel/fs/reiserfs/reiserfs.ko
-rw-r--r-- 1 root root 394536 4. Okt 11:42 /lib/modules/3.11.3-1-default/kernel/fs/reiserfs/reiserfs.ko

kernel-desktop-3.11.3-1.1.x86_64.rpm has reiserfs too:
/lib/modules/3.11.3-1-desktop/kernel/fs/reiserfs/reiserfs.ko

And as others stated, dumping one huge report with multiple issues to this list increases the risk that things get lost or missed. It's welcome to write a summary like you did, but the separate issues need to be split into individual defect reports. In a mail to the list you can then reference the individual bug IDs.

Cheers,
Lars

--
Lars Müller [ˈlaː(r)z ˈmʏlɐ]
Samba Team + SUSE Labs
SUSE Linux, Maxfeldstraße 5, 90409 Nürnberg, Germany
On 10/18/2013 06:00 AM, Lars Müller wrote:
Hi,
I finally have a chance to work on my previously reported issue with btrfs where I can't write more than about 15-TB to a single partition.
Here's the machine:
- Supermicro X9DRH mobo
- 2-ea Xeon E5-2643
- 64-GB RAM
- 1-ea LSI MegaRAID SAS 2208 RAID controller
- 2-ea 120-GB SSD drives configured as a RAID-1 mirror
- 24-ea 2-TB SATA drives configured as two 11-disk RAID-6 arrays with 2 hot spares
So I loaded 13.1 beta1 and pulled a "zypper dup". Here are my initial observations.
On Wed, Oct 16, 2013 at 03:04:52PM -0700, Lew Wolfgang wrote:

1. No more reiserfs!

That's not true; either you're not running kernel-default, or you're talking about the YaST partitioning module.
lmuelle@hip:~> ls -l /lib/modules/$( uname -r)/kernel/fs/reiserfs/reiserfs.ko -rw-r--r-- 1 root root 394536 4. Okt 11:42 /lib/modules/3.11.3-1-default/kernel/fs/reiserfs/reiserfs.ko
kernel-desktop-3.11.3-1.1.x86_64.rpm has reiserfs too: /lib/modules/3.11.3-1-desktop/kernel/fs/reiserfs/reiserfs.ko
And as others stated, dumping one huge report with multiple issues to this list increases the risk that things get lost or missed.
It's welcome to write a summary like you did, but the separate issues need to be split into individual defect reports. In a mail to the list you can then reference the individual bug IDs.
Hi Lars,

Yes, I agree. This was just a quick thumbnail look to see if the bug I filed in July on 12.3 (Id#828229) had been addressed in 13.1. It apparently hasn't.

The lack of reiserfs was noticed in the partitioning module during the install process; reiserfs was not presented as a choice. I figured that I could always install it later, but what's the use at that point?

Regards,
Lew
On Friday, 2013-10-18 at 06:55 -0700, Lew Wolfgang wrote:
The lack of reiserfs was noticed in the partition module in the install process. Reiserfs was not presented as a choice. I figured that I could always install it later, but what's the use at that point?
I had to add it using gparted.

--
Cheers,
Carlos E. R.
(from 12.3 x86_64 "Dartmouth" at Telcontar)
Lew Wolfgang wrote:
The lack of reiserfs was noticed in the partition module in the install process. Reiserfs was not presented as a choice. I figured that I could always install it later, but what's the use at that point?
If you want to install onto reiserfs, you can switch to a console and format the root filesystem manually, then let YaST mount it and install. That's how I do it with JFS.

--
Per Jessen, Zürich (17.1°C)
http://www.hostsuisse.com/ - dedicated server rental in Switzerland.
On Thursday 2013-10-17 00:04, Lew Wolfgang wrote:
So I loaded 13.1 beta1 and pulled a "zypper dup". Here are my initial observations.
1. No more reiserfs!
The package is still there.
2. The installer complains about using XFS on the boot partition
Which seems correct. XFS cannot be combined with PBR if you chose to use that. In addition, there is a track record of GRUB1 failing to load the stage2 file (possibly due to delayed allocation).
On 10/18/2013 06:13 AM, Jan Engelhardt wrote:
On Thursday 2013-10-17 00:04, Lew Wolfgang wrote:
So I loaded 13.1 beta1 and pulled a "zypper dup". Here are my initial observations.
1. No more reiserfs!

The package is still there.
It probably is, but the choice to use it did not present itself in the install partitioner. If you can't select it there, what are your options if you want your system partitions to be reiserfs?
2. The installer complains about using XFS on the boot partition

Which seems correct. XFS cannot be combined with PBR if you chose to use that. In addition, there is a track record of GRUB1 failing to load the stage2 file (possibly due to delayed allocation).
After being burned by trying to use reiserfs on a bootable partition many years ago, I've taken to creating a small /boot partition formatted with ext2 to remove any potential issues. I wasn't able to do that this time because of the disk-space calculation issue I also reported.

I don't mind writing these things up more thoroughly for Bugzilla, but I want to document them more carefully, and I only have console access to the system during the work week.

Regards,
Lew
On Friday 2013-10-18 16:27, Lew Wolfgang wrote:
On 10/18/2013 06:13 AM, Jan Engelhardt wrote:
On Thursday 2013-10-17 00:04, Lew Wolfgang wrote:
So I loaded 13.1 beta1 and pulled a "zypper dup". Here are my initial observations.
1. No more reiserfs!

The package is still there.
If you can't select it there, what are your options if you want your system partitions to be reiserfs?
Why would *anyone* want to still use reiser3? There are, arguably, better choices available in this decade.
On 10/18/2013 07:39 AM, Jan Engelhardt wrote:
On Friday 2013-10-18 16:27, Lew Wolfgang wrote:
On Thursday 2013-10-17 00:04, Lew Wolfgang wrote:
So I loaded 13.1 beta1 and pulled a "zypper dup". Here are my initial observations.
On 10/18/2013 06:13 AM, Jan Engelhardt wrote:

1. No more reiserfs!

The package is still there.

If you can't select it there, what are your options if you want your system partitions to be reiserfs?

Why would *anyone* want to still use reiser3? There are, arguably, better choices available in this decade.
If something is working well, why change? We've been using reiserfs for system partitions and XFS for large RAID arrays. Works for us. Different projects have different requirements.

We have one system with five RAID controllers running 216 3.5-inch SATA and SAS disks. One of the RAID-6 arrays is 70-TB net with one hot spare. This system is running 12.3 and is working perfectly. We thank everyone here for helping to produce such a remarkable operating system.

Of course, we'll have to move from reiserfs at some point. Maybe this is the time?

Regards,
Lew
On Friday 2013-10-18 17:32, Lew Wolfgang wrote:
Why would *anyone* want to still use reiser3? There are, arguably, better choices available in this decade.
If something is working well, why change?
Oh, I see, you are in the "Denial" phase (cf. Five Stages of Grief). Because certain filesystems work "better" than "well", and the defaults _during installation_ go for the better. Simple as that. A reason why FAT is listed is not because we like it, but because there may be corner cases of interoperability with Windows. With a Linux fs like reiser3, however, the Windows argument is out the window.
Of course, we'll have to move from reiserfs at some point. Maybe this is the time?
(From parallel subthread:)
What is the recommendation? FAT? That was one of the installation choices. Is FAT better than reiserfs?
<sarcasm> Maybe it is? If things work "well" with reiser3, and you see FAT being better than reiser3 because it is listed in the default installation and reiser3 is not, surely you would not mind storing your 70 TB of data on FAT (hypothetically speaking — if FAT could hold that amount of data, which it does not).
On 10/18/2013 09:22 AM, Jan Engelhardt wrote:
On Friday 2013-10-18 17:32, Lew Wolfgang wrote:
Why would *anyone* want to still use reiser3? There are, arguably, better choices available in this decade.

If something is working well, why change?

Oh, I see, you are in the "Denial" phase (cf. Five Stages of Grief).
Because certain filesystems work "better" than "well", and the defaults _during installation_ go for the better. Simple as that.
A reason why FAT is enlisted is not because we like it, but because there may be corner cases of interoperability with Windows. With a Linux fs like reiser3 however, the Windows argument is out the window.
Now that you bring it up, we do use 32-bit FAT and NTFS, but not for Linux system disks. We deliver product on disks, and the customer insists on FAT and, more recently, NTFS. Just out of curiosity, would the installer even let someone try to install system partitions on FAT? I don't think FAT even allows symlinks.
Of course, we'll have to move from reiserfs at some point. Maybe this is the time?
(From parallel subthread:)
What is the recommendation? FAT? That was one of the installation choices. Is FAT better than reiserfs?

<sarcasm> Maybe it is? If things work "well" with reiser3, and you see FAT being better than reiser3 because it is listed in the default installation and reiser3 is not, surely you would not mind storing your 70 TB of data on FAT (hypothetically speaking — if FAT could hold that amount of data, which it does not).
I did forget my <sarcasm> brackets, but I'm not going to rise to your snarky comments, Jan. If you'd rather not hear how a user reacts to and uses openSUSE, you could always "plonk" comments from those not in your echo chamber.

You also didn't answer my question: "why change?" And which of your choices would you recommend we use? btrfs? Why? I don't have a religious faith in reiserfs and am perfectly happy to change; I'd just like to know why.

Regards,
Lew
On Fri, Oct 18, 2013 at 1:55 PM, Lew Wolfgang <wolfgang@sweet-haven.com> wrote:
And which of your choices would you recommend we use? btrfs? Why? I don't have a religious faith in reierfs and am perfectly happy with changing, I'd just like to know why.
Pull the plug on your machine while it's working, and the results will speak for themselves. Try to do it on a test machine.
On Friday, 2013-10-18 at 16:39 +0200, Jan Engelhardt wrote:
On Friday 2013-10-18 16:27, Lew Wolfgang wrote:
1. No more reiserfs! The package is still there.
If you can't select it there, what are your options if you want your system partitions to be reiserfs?
Why would *anyone* want to still use reiser3? There are, arguably, better choices available in this decade.
Really? What filesystem is better for a million small files?

--
Cheers,
Carlos E. R.
(from 12.3 x86_64 "Dartmouth" at Telcontar)
On 19/10/13 21:00, Carlos E. R. wrote:
What filesystem is better for a million small files?
Any other filesystem that is not abandoned by its creators and kept on life support by fewer than a handful of people who may stop doing so at any time (either due to the lack of a business case, or simply wanting to move on).

--
"If debugging is the process of removing bugs, then programming must be the process of putting them in." - Edsger Dijkstra
On 2013-10-20 02:18, Cristian Rodríguez wrote:
On 19/10/13 21:00, Carlos E. R. wrote:
What filesystem is better for a million small files?
Any other filesystem that is not abandoned by its creators and kept on life support by fewer than a handful of people who may stop doing so at any time (either due to the lack of a business case, or simply wanting to move on).
That's a non-answer.

--
Cheers / Saludos,
Carlos E. R.
(from 12.3 x86_64 "Dartmouth" at Telcontar)
On Sunday 2013-10-20 02:00, Carlos E. R. wrote:
On Friday, 2013-10-18 at 16:39 +0200, Jan Engelhardt wrote:
Why would *anyone* want to still use reiser3? There are, arguably, better choices available in this decade.
Really?
What filesystem is better for a million small files?
Better for what use case, though?

- ro
  - space: probably squashfs
  - interoperability: iso9660/udf
- rw
  - speed: xfs (David Chinner does regular talks/videos with graphs)
  - wear-leveling support: one of the oodles of flash filesystems (I do not have enough information on this one)
  - snapshots: btrfs
  - interoperability: vfat (+ posixovl)
  - distributedness: one of the handful of cluster solutions (ceph/gluster/...); still in flux, so no definite answer

For each, you pay a price, though. And usually in the realm of "\forall x \in f, you can't pick more than two at most". For reiser3, I do not see more than one x offered to pick from.
On 20.10.2013 09:45, Jan Engelhardt wrote:
On Sunday 2013-10-20 02:00, Carlos E. R. wrote:
What filesystem is better for a million small files?
Better for what usecase, though?
...
- rw - speed: xfs (David Chinner does regular talks/videos with graphs)
Really? Lots of small files, XFS, and speed are three things that never went together in my world. It might have changed recently, but e.g. removing a kernel tree from XFS always took ages.

(I'm still a big fan of XFS, but I know when it is better to just mkfs and restore the backup without the files you want gone, instead of "rm -rf /bigdir" :-)

--
Stefan Seyfried
"If your lighter runs out of fluid or flint and stops making fire, and you can't be bothered to figure out about lighter fluid or flint, that is not Zippo's fault." -- bkw
On Sunday, 2013-10-20 at 02:00 +0200, Carlos E. R. wrote:
On Friday, 2013-10-18 at 16:39 +0200, Jan Engelhardt wrote:
On Friday 2013-10-18 16:27, Lew Wolfgang wrote:
1. No more reiserfs! The package is still there.
If you can't select it there, what are your options if you want your system partitions to be reiserfs?
Why would *anyone* want to still use reiser3? There are, arguably, better choices available in this decade.
Really?
What filesystem is better for a million small files?
Ok, I have tested this myself, on reiserfs, ext4, btrfs, and xfs. I attempted to create 1 million files on a 2 GiB partition of each of those types. I used this code:

ERROR=0
time for i in `seq -w 1 100`; do
    if [ $ERROR -ne 0 ]; then
        break
    fi
    echo $i
    for j in `seq -w 1 100`; do
        if [ $ERROR -ne 0 ]; then
            break
        fi
        for k in `seq -w 1 100`; do
            cp $SAMPLE $WHERE/TT-$i-$j-$k
            if [ $? -gt 0 ]; then
                echo "Copy error on TT-$i-$j-$k, abort"
                ERROR=1
                break
            fi
        done
    done
done

The 'sample' file is 100 B in size, with random content, and the test is run on a virtual machine (vmplayer) with 13.1 RC1.

These were the results:

            time (real/user/sys)        number      used    free
                                        of files    space   space
reiserfs    62m32s / 11m24s / 35m27s    1000000     286M    1,8G
ext4         8m14s /  1m28s /  4m41s     141736     541M    1,3G  (fails)
xfs         30m19s /  5m25s / 17m7s      488254     2,0G    104K
btrfs       can not test (1)

ext4 aborts the test because inodes are spent too soon. xfs aborts because of lack of space, despite the fact that the files occupy just 100 MB, in theory; the rest is metadata, I assume.

(1) I can not test btrfs because I ran the test once previously, and despite deleting the files, they must still be there somewhere:

Eleanor4:~ # df -h /data/btrfs/
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdb7       2,0G  264K  7,9M   4% /data/btrfs
Eleanor4:~ #

There is only 7.9 MB available. But the directories are empty, and there are no hidden files. yast-snapper says there is no configuration and exits. The partition is thus unusable; besides reformatting it, I don't know what to do with it.

Thus, so far only reiserfs passes the test.
*** raw data ***

--- reiserfs:
100
real 62m32.182s
user 11m24.198s
sys 35m27.802s
4,7G /data/reiserfs/test
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdb6       2,0G  286M  1,8G  14% /data/reiserfs
cer@Eleanor4:~>

--- ext4:
Copy error on TT-014-017-036, abort
real 8m14.710s
user 1m28.186s
sys 4m41.063s
519M /data/ext4/test
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdb9       2,0G  541M  1,3G  30% /data/ext4
cer@Eleanor4:~>

--- xfs:
047
048
cp: cannot create regular file ‘/data/xfs/test/TT-048-082-054’: No space left on device
Copy error on TT-048-082-054, abort
real 30m19.953s
user 5m25.073s
sys 17m7.339s
1,9G /data/xfs/test
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda8       2,0G  2,0G  104K 100% /data/xfs
cer@Eleanor4:~>

--
Cheers,
Carlos E. R.
(from 12.3 x86_64 "Dartmouth" at Telcontar)
On Sunday, 2013-10-20 at 21:30 +0200, Carlos E. R. wrote:
I can not test btrfs because I ran the test once previously, and despite deleting the files, they must be still there somewhere:
Filesystem is broken beyond repair. I used "btrfsck --repair" and now this happens:

Eleanor4:~ # mount /data/btrfs
mount: mount /dev/sdb7 on /data/btrfs failed: Cannot allocate memory
Eleanor4:~ #

I'm asking about that in a new thread: Re: [opensuse-factory] [13.1 RC1] I managed to destroy a btrfs - anyone wants to investigate it?

--
Cheers,
Carlos E. R.
(from 12.3 x86_64 "Dartmouth" at Telcontar)
On Sunday 2013-10-20 21:30, Carlos E. R. wrote:
These were the results:
            time (real/user/sys)        number      used    free
                                        of files    space   space
reiserfs    62m32s / 11m24s / 35m27s    1000000     286M    1,8G
ext4         8m14s /  1m28s /  4m41s     141736     541M    1,3G  (fails)
xfs         30m19s /  5m25s / 17m7s      488254     2,0G    104K
Note that xfs and btrfs have delayed allocations/reservations, which cause the filesystem logic to reserve a large region for the expected journal/data/metadata items (I have observed values on the order of 500 to 4096 MB; `df` will indicate it), and will only release it after it is synced to disk. In other words, a 2 GB disk may not be enough to hold all the temporary things that xfs and btrfs want to use. You can of course mount with -o sync, at a given penalty.

Heck, btrfs considers 1 GB to be really, really small; there's a -M mixed option to mkfs.btrfs just for it (though it is not one that has helped in my experiments with such small volumes).
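For reference, the two workarounds mentioned above would look roughly like this. The device and mount point names are assumptions, not taken from the thread, so the commands are shown commented out rather than as something to run verbatim:

```shell
# Mixed mode stores data and metadata in the same block groups, which is
# intended for very small btrfs volumes (the -M option mentioned above):
#   mkfs.btrfs -M /dev/sdb7
#
# Mounting with -o sync forces writes (and thus delayed allocations) to
# settle immediately, at a considerable throughput penalty:
#   mount -o sync /dev/sdb7 /data/btrfs
```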
On 2013-10-22 09:42, Jan Engelhardt wrote:
On Sunday 2013-10-20 21:30, Carlos E. R. wrote:
These were the results:
            time (real/user/sys)        number      used    free
                                        of files    space   space
reiserfs    62m32s / 11m24s / 35m27s    1000000     286M    1,8G
ext4         8m14s /  1m28s /  4m41s     141736     541M    1,3G  (fails)
xfs         30m19s /  5m25s / 17m7s      488254     2,0G    104K
Note that xfs and btrfs have delayed allocations/reservations or so, which cause the filesystem logic to reserve a large region for the expected journal/data/something items (I have observed values around the order of 500 to 4096MB - `df` will indicate it), and will only release it after it is synced to disk.
In other words, a 2 GB disk may not be enough to hold all the temp things that xfs and btrfs want to use. You can of course mount with -o sync, at a given penalty.
Heck, btrfs considers 1 GB to be really really small; there's a -M mixed option to mkfs.btrfs just for it (though it is not one that has helped in my experiments with such small volumes).
There is something more: the btrfs partition crashes with this test. I cannot empty it. I have tried 3 times, filling it up with small files, then deleting them (there are no snapshots), yet the partition remains full (with no files). I have to reformat it to be able to reuse it. (Bug 846807)

Unknown if the latest update solves this.

--
Cheers / Saludos,
Carlos E. R.
(from 12.3 x86_64 "Dartmouth" at Telcontar)
On Sun, 20 Oct 2013 21:30:36 +0200 (CEST) "Carlos E. R." <robin.listas@telefonica.net> wrote:
These were the results:
            time (real/user/sys)        number      used    free
                                        of files    space   space
reiserfs    62m32s / 11m24s / 35m27s    1000000     286M    1,8G
ext4         8m14s /  1m28s /  4m41s     141736     541M    1,3G  (fails)
xfs         30m19s /  5m25s / 17m7s      488254     2,0G    104K
488254 * 4096 (default XFS block size) = 1999888384

Use `mkfs.xfs -b size=512 ...` and you shouldn't have a problem.

...
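The arithmetic above can be checked directly: with 4096-byte blocks, each 100-byte file still consumes a full data block, so ~488 k files is a genuinely full 2 GB partition. A quick sketch (the second figure is an illustration I've added, not from the thread):

```shell
#!/bin/sh
# 488254 files, one 4096-byte block each (default XFS block size):
used=$((488254 * 4096))
echo "$used"          # 1999888384 bytes, i.e. the 2 GB partition is full
# With 512-byte blocks the same partition has roughly 8x as many data
# blocks, so far more 100-byte files fit before space runs out:
echo $((2 * 1024 * 1024 * 1024 / 512))   # data blocks in 2 GiB at size=512
```

Reformatting with `mkfs.xfs -b size=512`, as suggested above, raises the small-file ceiling accordingly.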
Thus, so far only Reiserfs passes the test.
Your pass criteria leave a lot to be desired. Nevertheless, thanks for the testing and bug report.

Cheers,
David
On Sunday, 2013-10-20 at 21:30 +0200, Carlos E. R. wrote:
The 'sample' file is 100B size, with random content, and the test is run on a virtual machine (vmplayer) with 13.1 RC1
These were the results:
            time (real/user/sys)        number      used    free
                                        of files    space   space
reiserfs    62m32s / 11m24s / 35m27s    1000000     286M    1,8G
ext4         8m14s /  1m28s /  4m41s     141736     541M    1,3G  (fails)
xfs         30m19s /  5m25s / 17m7s      488254     2,0G    104K
btrfs       can not test (1)
Fill test - 10⁶ files, 100 bytes each.

          number   | time (real/user/sys)             | rm files time                | used  free  %
          of files |                                  | (real/user/sys)              | space space
reiserfs  1000000  | 62m18.341s/11m16.432s/35m44.060s | 0m30.594s/0m0.583s/0m27.837s | 259M  1,8G  13%
ext4       131635  |  8m05.024s/01m26.484s/04m37.876s | 0m02.143s/0m0.154s/0m01.824s | 546M  1,3G  30% - failed: spent inodes
ext3       131635  |  8m08.570s/01m27.086s/04m40.344s | 0m01.994s/0m0.159s/0m01.665s | 522M  1,3G  29% - failed: spent inodes
xfs        478941  | 30m23.573s/05m20.020s/17m15.349s | 0m23.925s/0m0.481s/0m20.670s | 2,0G   20K 100% - failed
btrfs     1000000  | 67m1.486s/ 11m44.258s/38m4.694s  | 6m06.872s/0m0.489s/1m14.960s | 1,3G  566M  70%

However, btrfs breaks on this test. After removal of the files, it is still full, without snapshots:

Filesystem      Size  Used Avail Use% Mounted on
/dev/sdb7       2,0G  648K  7,5M   8% /data/btrfs

btrfsck repairs nothing. There is a bugzilla on this.

Both reiserfs and btrfs are the winners in the number of small files they allow, but btrfs is simply not reliable; it breaks. Both ext3 and ext4 fail to allow that big a number of files. xfs also fails, but it is better than ext3/4.

Speed test - done with only 10000 files of 100 bytes each so that all filesystems can cope:

          time (real/user/sys)         |                            | used  free  %
          cp files                     | rm files                   | space space used
reiserfs  6m05.288s/1m4.242s/3m29.046s | 0m1.933s/0m0.104s/0m1.775s |  55M  2,0G   3%
ext3      6m06.564s/1m3.989s/3m30.341s | 0m1.457s/0m0.138s/0m1.219s | 397M  1,5G  22%
ext4      6m09.012s/1m5.431s/3m30.416s | 0m1.518s/0m0.110s/0m1.327s | 422M  1,4G  23%
xfs       6m11.045s/1m4.769s/3m31.739s | 0m3.998s/0m0.127s/0m3.464s | 632M  1,4G  31%
btrfs     6m02.617s/1m4.548s/3m27.906s | 0m2.912s/0m0.113s/0m2.648s | 119M  1,8G   7%

All filesystems are equally fast in virtual hardware. Interestingly, xfs is slow removing many files (rm -rf $WHERE/), and btrfs is slow too, but faster. Possibly the timing results would be different on real hardware.
Speed test - done with 2000 files of 10000 bytes each so that all filesystems can cope:

          time (real/user/sys)          |                             | used  free  %
          cp files                      | rm files                    | space space used
reiserfs  1m12.859s/0m12.199s/0m41.629s | 0m0.500s/0m0.014s/0m0.470s  | 270M  1,8G  14%
ext3      1m12.677s/0m12.330s/0m41.809s | 0m0.295s/0m0.022s/0m0.260s  | 238M  1,6G
ext4      1m12.288s/0m12.652s/0m41.277s | 0m0.318s/0m0.020s/0m0.286s  | 247M  1,6G  14%
xfs       1m12.528s/0m12.653s/0m41.425s | 0m0.765s/0m0.024s/0m0.689s  | 337M  1,7G  17%
btrfs     1m11.970s/0m12.462s/0m41.382s | 0m0.637s/0m0.020s/0m0.585s  | 221M  1,6G  13%

I'll post another set of results done on real hardware. The virtual machine simply does not have enough CPU power, and the virtual disk is also slow.

--
Carlos E. R.
(from 12.3 x86_64 "Dartmouth" at Telcontar)
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Sunday, 2013-10-27 at 11:14 +0100, Carlos E. R. wrote:
I'll post another set of results done on real hardware. The virtual machine simply does not have enough CPU power, and the virtual disk is also slow.
Here I repeat the test on real hardware. I have used a script with variations as I found things out; when the difference was significant, I repeated the test. I'll post the script below. I did this on my same main machine, but on a spare 1 TB HD. I reuse the same test partition, creating a different filesystem in the script prior to testing it. It was done on a 12.3 system; I do not have a 13.1 partition available on this machine yet. Sorry about that.

The first test simply tries to fill the partition to capacity with very small files (100 B), written to the same directory. This is intentional: it is the kind of load reiserfs was designed for, and the goal of these tests is to find a replacement for reiserfs. Well, reiserfs is still the king on this test; there is no replacement, period. All filesystems were formatted with defaults, no adjustments of any kind.

Another finding is that I made the system crash, several times (kernel crash or even panic). The crashes seem related to the btrfs partition. Two theories: the filesystem got corrupted during the test, or the system got confused when the btrfs partition was reformatted as something else and reused. I have not analyzed this.

(In the previous test on virtual hardware with 13.1 RC1, the btrfs partition would get corrupted beyond repair. Despite erasing the files, and there being no snapshots, the metadata would not be freed. Or some other explanation. I had to reformat the partition to recover it. Either it doesn't like the small-file stress test, or it doesn't like being filled to capacity. At least the kernel would not crash. Yes, there is a bugzilla.)

Partition of 8000 MiB (7.81 GiB)

Fill test - up to 10 million files, 100 bytes each.
           number    | time (real/user/sys)              | listing time                  | rm files time                 | used  free  %
           of files  |                                   | (real/user/sys)               | (real/user/sys)               | space space
reiserfs   10001000  | 182m19.821s/9m34.975s/30m20.041s  | 1m24.464s/1m20.187s/0m3.848s  | 7m38.435s/0m8.169s/5m15.013s  | 2.2G  5.7G  29%
ext4         512051  | 28m 4.919s/0m30.328s/20m13.093s   | 0m 0.856s/0m0.618s/0m0.250s   | 0m33.213s/0m0.546s/0m10.289s  | 2.2G  5.2G  30%  fail
ext3         512051  | 28m 1.007s/0m30.820s/20m3.668s    | 0m 0.860s/0m0.613s/0m0.257s   | 0m32.291s/0m0.559s/0m9.764s   | 2.2G  5.2G  29%  fail
xfs         1509755  | 25m 0.937s/1m21.044s/4m17.662s    | 0m 1.657s/0m0.923s/0m0.531s   | 1m31.523s/0m1.413s/0m52.991s  | 6.7G  1.2G  86%  fail
btrfs      +5555076  | 493m43.155s/6m32.774s/24m9.205s   | 2m39.443s/0m3.718s/0m3.267s   | kernel panic!                 | 5.7G  808M  88%  fail

ext3/4 fail soon enough. The limit is apparently lack of inodes, as there is plenty of space available.

xfs apparently converts disk space to metadata on the fly; it allows 3 times as many files as ext3/4, and in less time!

btrfs... Well, btrfs is fast up to a point, then it starts to slow down, and then it crawls. When there were about 810 MB free (about 4 million files), writing would stop for a minute or more about every 50 files, with the CPU almost idle (50 id, 50 wa). The only busy task was btrfs-submit-1. The entire system slowed down: windows would not redraw, the keyboard would lag. It is possible that if the test disk were different from the system disk the results might be different.

I had to force the test to abort by filling the rest of the partition with a big file (the idea was to abort and get the timings), using:

rescate2:~ # time dd if=/dev/zero of=/data/Test/dummy
dd: writing to ‘/data/Test/dummy’: No space left on device
1651458+0 records in
1651457+0 records out
845545984 bytes (846 MB) copied, 1158.74 s, 730 kB/s

real    19m57.715s
user    0m0.371s
sys     0m5.646s

rescate2:~ # rm /data/Test/dummy    # - instant.

Notice the time: I let it run for about half a day! Well, 8 hours, apparently.
Five and a half million files, which is impressive - but unusably slow.

reiserfs... I wrote TEN MILLION small files in a single directory on reiserfs, in about 180 minutes. Well, that's really impressive. And it seemed happy to accept many more, but I had to stop somewhere... :-)

I'll put the results of testing a larger partition with fewer files in another email.

- --
Cheers,
Carlos E. R. (from 12.3 x86_64 "Dartmouth" at Telcontar)
On Sunday, 2013-10-27 at 12:02 +0100, Carlos E. R. wrote:
Here I repeat the test on real hardware. I have used a script with variations as I found things out; when the difference was significant, I repeated the test. I'll post the script below. I did this on my same main machine, but on a spare 1 TB HD. I reuse the same test partition, creating a different filesystem in the script prior to testing it.
...
I'll put the results of testing a larger partition with fewer files in another email.
For this part of the test, I enlarged the partition to 120 GiB. The script was the same, but the goal now was to time the writing of files of different sizes on the different types of filesystem. Same directory (intentional).

On files bigger than 1 MB, the limiting factor is raw disk write speed. Smaller than that, and the limit is CPU and code (i.e., the disk was not at max write speed).

Partition of 120 GiB

10000 files of 10 MB (not 10 MiB). Disk busy at up to 155 MB/s - hardware is the limit.

           time (real/user/sys)           | listing time                | rm files time                | used  free  %
                                          | (real/user/sys)             | (real/user/sys)              | space space
reiserfs   12m39.628s/0m4.775s/3m10.519s  | 0m0.309s/0m0.032s/0m0.005s  | 0m13.438s/0m0.032s/0m0.005s  | 94G  27G  78%
ext4       12m15.974s/0m3.833s/1m43.232s  | 0m0.077s/0m0.043s/0m0.009s  | 0m6.394s/0m0.004s/0m1.490s   | 94G  19G  84%
ext3       15m20.931s/0m3.756s/2m40.206s  | 0m0.156s/0m0.045s/0m0.007s  | 0m9.677s/0m0.005s/0m3.006s   | 94G  19G  84%
xfs        11m11.731s/0m4.737s/1m26.311s  | 0m0.155s/0m0.024s/0m0.005s  | 0m9.480s/0m0.011s/0m1.584s   | 94G  27G  78%
btrfs      10m37.071s/0m4.837s/1m11.542s  | 0m0.376s/0m0.027s/0m0.004s  | 0m9.755s/0m0.004s/0m2.691s   | 92G  27G  78%

10000 files of 1 MB (not 1 MiB).

           time (real/user/sys)          | listing time                | rm files time               | used  free  %
                                         | (real/user/sys)             | (real/user/sys)             | space space
reiserfs   1m 6.632s/0m2.175s/0m20.469s  | 0m0.171s/0m0.026s/0m0.003s  | 0m9.939s/0m0.004s/0m2.638s  | 9.4G  111G  8%
ext4       1m 5.183s/0m1.362s/0m11.854s  | 0m0.200s/0m0.043s/0m0.007s  | 0m7.921s/0m0.010s/0m1.376s  | 9.6G  103G  9%
ext3       1m31.627s/0m2.897s/0m19.668s  | 0m0.132s/0m0.043s/0m0.007s  | 0m7.735s/0m0.006s/0m1.600s  | 9.6G  103G  9%
xfs        0m57.785s/0m2.869s/0m9.561s   | 0m0.150s/0m0.025s/0m0.004s  | 0m6.516s/0m0.009s/0m1.428s  | 9.4G  111G  8%
btrfs      0m56.036s/0m3.342s/0m10.203s  | 0m0.163s/0m0.025s/0m0.006s  | 0m1.913s/0m0.008s/0m1.853s  | 8.1G  110G  7%

10000 files of 100 KB.
           time (real/user/sys)          | listing time                | rm files time               | used   free  %
                                         | (real/user/sys)             | (real/user/sys)             | space  space
reiserfs   0m12.491s/0m3.102s/0m8.400s   | 0m0.031s/0m0.027s/0m0.005s  | 0m0.537s/0m0.006s/0m0.528s  | 1011M  120G  1%   No more than 20 MB/s
ext4       0m11.733s/0m0.786s/0m1.725s   | 0m0.050s/0m0.045s/0m0.005s  | 0m0.361s/0m0.005s/0m0.350s  | 1.2G   111G  2%   No more than 20 MB/s
ext3       0m28.568s/0m3.134s/0m8.338s   | 0m0.051s/0m0.046s/0m0.005s  | 0m0.470s/0m0.004s/0m0.459s  | 1.2G   111G  2%   No more than 50 MB/s
xfs        0m11.061s/0m1.032s/0m2.495s   | 0m0.030s/0m0.026s/0m0.005s  | 0m0.891s/0m0.004s/0m0.441s  | 1.1G   119G  1%   negligible or delayed
btrfs      0m11.006s/0m0.846s/0m1.735s   | 0m0.030s/0m0.025s/0m0.005s  | 0m0.600s/0m0.006s/0m0.583s  | 392M   118G  1%   negligible or delayed

10000 files of 10 KB.

           time (real/user/sys)          | listing time                | rm files time               | used  free  %
                                         | (real/user/sys)             | (real/user/sys)             | space space
reiserfs   0m10.244s/0m0.667s/0m1.273s   | 0m0.029s/0m0.026s/0m0.003s  | 0m0.619s/0m0.008s/0m0.485s  | 151M  120G  1%   negligible + peak of 60
ext4       0m10.137s/0m0.766s/0m1.400s   | 0m0.050s/0m0.044s/0m0.007s  | 0m0.152s/0m0.004s/0m0.144s  | 345M  112G  1%
ext3       0m11.788s/0m0.676s/0m1.361s   | 0m0.049s/0m0.043s/0m0.007s  | 0m0.181s/0m0.001s/0m0.175s  | 306M  112G  1%
xfs        0m10.149s/0m0.559s/0m1.237s   | 0m0.028s/0m0.024s/0m0.005s  | 0m0.261s/0m0.007s/0m0.253s  | 348M  120G  1%
btrfs      0m10.143s/0m0.732s/0m1.217s   | 0m0.030s/0m0.023s/0m0.007s  | 0m0.362s/0m0.010s/0m0.348s  |  68M  118G  1%

10000 files of 1 KB.

           time (real/user/sys)          | listing time                | rm files time               | used  free  %
                                         | (real/user/sys)             | (real/user/sys)             | space space
reiserfs   0m10.112s/0m0.653s/0m1.317s   | 0m0.030s/0m0.026s/0m0.004s  | 0m0.294s/0m0.001s/0m0.290s  |  20M  118G  1%
ext4       0m10.135s/0m0.812s/0m1.471s   | 0m0.049s/0m0.047s/0m0.002s  | 0m0.125s/0m0.003s/0m0.119s  | 266M  112G  1%
ext3       0m10.326s/0m0.769s/0m1.536s   | 0m0.052s/0m0.043s/0m0.009s  | 0m0.154s/0m0.001s/0m0.150s  | 228M  112G  1%
xfs        0m10.209s/0m0.634s/0m1.279s   | 0m0.029s/0m0.027s/0m0.002s  | 0m0.240s/0m0.008s/0m0.231s  | 270M  120G  1%
btrfs      0m10.155s/0m0.761s/0m1.447s   | 0m0.030s/0m0.022s/0m0.008s  | 0m0.289s/0m0.009s/0m0.277s  |  20M  118G  1%

10000 files of 100 B.

           time (real/user/sys)          | listing time                | rm files time               | used  free  %
                                         | (real/user/sys)             | (real/user/sys)             | space space
reiserfs   0m10.171s/0m0.795s/0m1.445s   | 0m0.030s/0m0.027s/0m0.003s  | 0m0.282s/0m0.002s/0m0.277s  | 9.9M  118G  1%
ext4       0m10.087s/0m0.779s/0m1.387s   | 0m0.050s/0m0.045s/0m0.006s  | 0m0.126s/0m0.004s/0m0.121s  | 266M  112G  1%
ext3       0m10.266s/0m0.732s/0m1.415s   | 0m0.050s/0m0.040s/0m0.010s  | 0m0.156s/0m0.005s/0m0.148s  | 228M  112G  1%
xfs        0m10.193s/0m0.680s/0m1.332s   | 0m0.028s/0m0.023s/0m0.005s  | 0m0.235s/0m0.003s/0m0.231s  | 270M  120G  1%
btrfs      0m10.195s/0m0.689s/0m1.394s   | 0m0.030s/0m0.026s/0m0.004s  | 0m0.280s/0m0.005s/0m0.273s  | 9.6M  118G  1%

Here I thought that it would be interesting to add a "sync" to the script and find out the differences. The results were curious: the operation was faster, sometimes significantly so, if we stopped to do a sync. I altered the script to insert a sync every thousand writes, plus a few more (timed). It does a double test: with and without sync. Notice that without sync the times should be about the same as in the previous round... but that was not so. Curious!

10000 files of 10 MB (not 10 MiB).

           time (real/user/sys)
reiserfs   10m51.732s/0m4.660s/1m12.536s   :-? why so fast now?
ext4       12m26.604s/0m3.790s/0m3.790s    One initial test, surprisingly fast. Verify script and repeat test.
           mkfs       | cp with sync every 1000 files   | cp without sync                | extra sync
reiserfs   0m 2.965s  | 13m1.503s/0m4.594s/3m10.574s    | 14m2.930s/0m4.761s/3m35.965s   |
ext4       0m 1.853s  | 12m28.622s/0m3.748s/1m44.049s   | 13m5.449s/0m4.612s/2m10.158s   |
ext3       0m18.785s  | 15m30.909s/0m3.751s/2m41.437s   | 14m44.870s/0m4.696s/2m49.492s  |
xfs        0m 0.533s  | 11m30.054s/0m4.661s/1m27.444s   | 31m47.276s/0m4.671s/1m53.780s  |
btrfs      0m 0.018s  | 10m54.451s/0m4.551s/1m12.906s   | 11m6.555s/0m6.311s/1m37.479s   | 0m9.513s

# Created batch mode script, can run the full night without intervention.

Here I modified the script to its final form. I got tired of starting the script many times, so I decided to make a script that would run the entire remaining tests unattended. It is in this section that I had the crashes. Flow:

  10000 file copy operations, with a sync every 1000. Time that.
  10000 file copy operations, timed.
  One sync operation at the end, timed.
  Repeat for all filesystem types (reformatting the same partition), then repeat for a different file size.

(log_tests.log)

10000 files of 10 MB (not 10 MiB).

           mkfs       | cp with sync every 1000 files   | cp without sync                | extra sync
reiserfs   0m 2.897s  | 13m2.899s/0m4.682s/3m10.229s    | 0m0.104s/0m5.103s/3m35.783s    | 0m9.404s
ext4       0m 1.652s  | 12m32.215s/0m6.298s/1m48.388s   | 13m 8.495s/0m3.760s/2m3.915s   | 0m9.744s
ext3       0m18.623s  | 15m36.742s/0m3.735s/2m40.927s   | 15m15.403s/0m3.535s/2m45.379s  | 0m2.636s
xfs        0m 0.653s  | 11m24.977s/0m4.969s/1m28.766s   | 30m59.347s/0m3.964s/1m53.809s  | 0m11.319s
btrfs      0m 0.076s  | 11m13.964s/0m5.135s/1m15.799s   | 11m11.298s/0m5.208s/1m38.627s  | 0m1.867s

Notice the longer time XFS takes on the final sync. XFS caches metadata internally, I understand. Also notice the much longer time the operation takes _without_ periodic syncs, about 3 times more! The only explanation I can think of is that the cache filled.

(log_tests.2.log)

10000 files of 1 MB (not 1 MiB).
           mkfs       | cp with sync every 1000 files  | cp without sync               | extra sync
reiserfs   0m 3.405s  | 1m37.456s/0m1.862s/0m20.887s   | 1m17.111s/0m3.297s/0m3.297s   | 0m4.956s
ext4       0m 1.457s  | 1m32.267s/0m3.553s/0m15.194s   | 1m0.911s/0m3.537s/0m18.777s   | 0m8.439s
ext3       0m18.491s  | 1m39.369s/0m2.648s/0m19.839s   | 1m36.755s/0m2.471s/0m20.403s  | 0m8.131s
xfs        0m 0.549s  | 1m22.179s/0m1.300s/0m11.512s   | 1m7.481s/0m1.344s/0m13.137s   | 0m12.275s
btrfs      0m 0.062s  | 1m21.574s/0m1.874s/0m11.104s   | 1m7.150s/0m2.163s/0m11.553s   | 0m6.719s

10000 files of 100 KB.

           cp with sync every 1000 files | cp without sync               | extra sync
reiserfs   0m20.739s/0m2.924s/0m8.877s   | 0m13.576s/0m2.891s/0m8.807s   | 0m2.334s
ext4       0m20.953s/0m2.696s/0m7.068s   | 0m14.385s/0m2.986s/0m8.717s   | 1m57.328s *
ext3       0m43.545s/0m3.001s/0m8.815s   | 0m16.560s/0m2.942s/0m8.786s   | 0m13.453s
xfs        0m21.382s/0m2.912s/0m7.603s   | 0m17.859s/0m3.003s/0m8.767s   | 0m4.363s
btrfs      0m21.254s/0m1.532s/0m4.542s   | 0m11.828s/0m2.920s/0m8.103s   | 0m6.200s

Notice here the 2-minute delay that ext4 took to sync after the test cycle without syncs inserted. Possibly because it had cached and delayed a lot of operations. On the whole, with syncs inserted it is slower, which is expected. But not so with XFS, btrfs and reiserfs, which is, well, surprising.

10000 files of 10 KB.

           cp with sync every 1000 files | cp without sync               | extra sync
reiserfs   0m17.250s/0m0.545s/0m1.560s   | 0m11.078s/0m0.571s/0m1.691s   | 0m0.526s
ext4       0m17.346s/0m0.515s/0m1.625s   | 0m10.825s/0m0.634s/0m1.841s   | 1m53.964s *
ext3       0m17.282s/0m0.537s/0m1.688s   | 0m10.546s/0m0.624s/0m1.711s   | 0m1.020s
xfs        0m14.766s/0m0.552s/0m1.635s   | 0m11.943s/0m1.096s/0m3.143s   | 0m0.517s
btrfs      0m15.313s/0m0.582s/0m1.841s   | 0m10.771s/0m0.717s/0m2.027s   | 0m1.658s

A kernel crash happened around here. I think that btrfs could not be unmounted. Corruption happened around here.
The umount of btrfs must have failed, because:

  reiserfs_create: could not open /dev/disk/by-id/wwn-0x5000c500613c92d5-part6: Device or resource busy
  2013-10-25T23:14:22.755216+02:00 mounted ext4

The crash happened after 23:21:24, while running the 100 B btrfs test. There are syslog entries at 23:42:12.756218 about creating btrfs. Probably the next trial round failed. Repeat the test from the 10 KB cycle.

(log_tests.3.log)

10000 files of 10 KB.

           cp with sync every 1000 files | cp without sync               | extra sync
reiserfs   0m14.895s/0m0.705s/0m1.268s   | 0m11.043s/0m1.652s/0m3.153s   | 0m0.319s
ext4       0m16.641s/0m1.357s/0m1.528s   | 0m10.640s/0m1.333s/0m2.775s   | 0m0.116s
ext3       0m14.608s/0m0.580s/0m1.362s   | 0m11.303s/0m0.572s/0m1.373s   | 0m0.554s
xfs        0m12.927s/0m0.584s/0m1.421s   | 0m11.263s/0m0.581s/0m1.622s   | 0m0.670s
btrfs      0m13.590s/0m0.622s/0m1.492s   | 0m10.470s/0m0.617s/0m1.738s   | 0m1.538s

Now, notice the lack of the 2-minute delay on the extra sync for the ext4 test. I have no explanation for this; perhaps, as I had to reboot, the system cache was fully emptied. Maybe, to do these tests properly, the system would have to be rebooted for each test :-?

10000 files of 1 KB.

           cp with sync every 1000 files | cp without sync               | extra sync
reiserfs   0m19.629s/0m0.559s/0m1.420s   | 0m10.434s/0m0.565s/0m1.457s   | 0m0.248s
ext4       0m16.324s/0m0.444s/0m1.455s   | 0m10.400s/0m0.554s/0m1.562s   | 0m0.143s
ext3       0m15.161s/0m0.483s/0m1.379s   | 0m10.079s/0m0.482s/0m1.228s   | 0m0.322s
xfs        0m13.030s/0m0.575s/0m1.535s   | 0m11.279s/0m0.499s/0m1.565s   | 0m0.616s
btrfs      0m13.153s/0m0.587s/0m1.686s   | 0m10.371s/0m0.606s/0m1.578s   | 0m2.363s

10000 files of 100 B.

           cp with sync every 1000 files | cp without sync               | extra sync
reiserfs   0m17.863s/0m0.555s/0m1.460s   | 0m10.433s/0m0.571s/0m1.486s   | 0m1.230s

... and kernel crash in the ext4 test. Repeat.
(log_tests.4.log)

           cp with sync every 1000 files | cp without sync               | extra sync
reiserfs   0m20.510s/0m0.694s/0m1.386s   | 0m10.512s/0m0.951s/0m1.420s   | 0m0.991s
ext4       0m25.058s/0m1.327s/0m1.491s   | 0m10.539s/0m0.909s/0m1.565s   | 0m0.125s
ext3       0m15.586s/0m0.420s/0m1.497s   | 0m10.158s/0m0.440s/0m1.471s   | 0m0.529s
xfs        0m13.545s/0m0.555s/0m1.457s   | 0m11.247s/0m0.512s/0m1.534s   | 0m0.600s
btrfs      0m12.838s/0m0.557s/0m1.463s   | 0m10.240s/0m0.519s/0m1.398s   | 0m0.455s

The only conclusion I draw, for my own consumption, is that I will not use btrfs, neither in 12.3 nor in 13.1. I managed to reliably crash it several times... And if I managed to crash it with a simple, home-grown stress test, what other traps may be lurking in it, yet to be found?

Another conclusion (that makes 2) is that reiserfs is no longer the fastest filesystem; and that despite being more than 10 years old, with only minimal maintenance, it can hold its own against newer or better maintained code. That says a lot in its favour. With more love, it might surpass them all. Pity.

For the curious: <http://en.wikipedia.org/wiki/Reiserfs>

- --
Cheers,
Carlos E. R. (from 12.3 x86_64 "Dartmouth" at Telcontar)

--
To unsubscribe, e-mail: opensuse-factory+unsubscribe@opensuse.org
To contact the owner, e-mail: opensuse-factory+owner@opensuse.org
"Carlos E. R." <robin.listas@telefonica.net> wrote:
Now, notice the lack of the 2-minute delay on the extra sync for the ext4 test. I have no explanation for this; perhaps, as I had to reboot, the system cache was fully emptied. Maybe, to do these tests properly, the system would have to be rebooted for each test :-?
At a minimum you should empty the cache between runs:

  # flush dirty pages to disk
  sync
  # throw clean pages away
  echo 3 > /proc/sys/vm/drop_caches

Drop_caches accepts 1, 2, 3. I don't recall offhand what each value does.

Greg
--
Sent from my Android phone with K-9 Mail. Please excuse my brevity.
On Sun, 27 Oct 2013 13:37, Greg Freemyer <greg.freemyer@...> wrote:
"Carlos E. R." <robin.listas@telefonica.net> wrote:
Now, notice the lack of the 2-minute delay on the extra sync for the ext4 test. I have no explanation for this; perhaps, as I had to reboot, the system cache was fully emptied. Maybe, to do these tests properly, the system would have to be rebooted for each test :-?
At a minimum you should empty the cache between runs:
# flush dirty pages to disk
sync
# throw clean pages away
echo 3 > /proc/sys/vm/drop_caches
Drop_caches accepts 1, 2, 3. I don't recall offhand what each value does.
Excerpts from my freemem script:

[ code ]
# 0 = not valid !! since kernel 3.0
# 1 = free pagecache
# 2 = free dentries and inodes
# 3 = free both (1 + 2)
/usr/bin/sync
/sbin/sysctl -q -w vm.drop_caches=3
[ /code ]

Using /sbin/sysctl is equivalent to the "echo > /proc/sys/..." line above.

I recommend doing this at least once before each test run, else the kernel caches pollute the measurements, as seen.

And yes, as already pointed out: for mkfs.ext[34] the -N <inode-count> option is necessary. Be aware of ((number-of-dirs * 2) + number-of-files) * 1.1 as the minimum recommendation for small files.

- Yamaban.
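That rule of thumb translates to a couple of lines of shell. This is a sketch, not from the original mail; the directory and file counts and the device name are invented for illustration:

```shell
# Size the inode table for a small-file workload, using the rule of thumb
# ((dirs * 2) + files) * 1.1 as the minimum inode count.
dirs=10
files=1000000
inodes=$(( ( (dirs * 2 + files) * 11 ) / 10 ))   # integer form of * 1.1
echo "$inodes"                                    # -> 1100022

# Pass it to mkfs.ext4 with -N (explicit inode count);
# /dev/sdXn is a placeholder device, so the line is commented out.
# mkfs.ext4 -N "$inodes" /dev/sdXn
```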
On Sunday, 2013-10-27 at 14:25 +0100, Yamaban wrote:
On Sun, 27 Oct 2013 13:37, Greg Freemyer <> wrote:
"Carlos E. R." <> wrote:
Now, notice the lack of the 2-minute delay on the extra sync for the ext4 test. I have no explanation for this; perhaps, as I had to reboot, the system cache was fully emptied. Maybe, to do these tests properly, the system would have to be rebooted for each test :-?
At a minimum you should empty the cache between runs:
# flush dirty pages to disk
sync
# throw clean pages away
echo 3 > /proc/sys/vm/drop_caches
Drop_caches accepts 1, 2, 3. I don't recall offhand what each value does.
excerpts from my freemem-script: [ code ]
# 0 = not valid !! since kernel 3.0
# 1 = free pagecache
# 2 = free dentries and inodes
# 3 = free both (1 + 2)
/usr/bin/sync
/sbin/sysctl -q -w vm.drop_caches=3
[ /code ]
Using /sbin/sysctl is equivalent to the "echo > /proc/sys/..." line above.
I recommend doing this at least once before each test run, else the kernel caches pollute the measurements, as seen.
Noted for next time, thanks :-)
And yes, as already pointed out: for mkfs.ext[34] the -N <inode-count> option is necessary. Be aware of ((number-of-dirs * 2) + number-of-files) * 1.1 as the minimum recommendation for small files.
Yes, I know that the inode count and sector size can be adjusted. But I intentionally did not adjust them :-)

I also wanted to test the number of big files allowed by each filesystem, but I did not. Time!

- --
Cheers,
Carlos E. R. (from 12.3 x86_64 "Dartmouth" at Telcontar)
On Sun, 27 Oct 2013 20:09, Carlos E. R. <robin.listas@...> wrote:
On Sunday, 2013-10-27 at 14:25 +0100, Yamaban wrote:
On Sun, 27 Oct 2013 13:37, Greg Freemyer <> wrote:
"Carlos E. R." <> wrote: [snip] And yes, as already pointed out: for mkfs.ext[34] the -N <inode-count> option is necessary. Be aware of ((number-of-dirs * 2) + number-of-files) * 1.1 as the minimum recommendation for small files.
Yes, I know that the inode count and sector size can be adjusted. But I intentionally did not adjust them :-)
I also wanted to test number of big files allowed by each filesystem, but I did not. Time!
The trouble with that is the "assumptions" (Ass-out-of-you-and-me) that are made in the mke2fs code (have a look; use "-n" to see). (For details, see /etc/mke2fs.conf, esp. inode_ratio and blocksize.)

For an average file size above 20 kB that may be OK, but below that? Just no! -- And that has been so since before the 'invention' of ext3. I can vividly remember disaster recovery of a /var/spool/mail partition in 1999, with the change in use from mbox to maildir... Most mails were less than 4 kB, and you can do the math: no more inodes available. -- Lovely night that was. "mke2fs -T news" or "mke2fs -b 1024 -i 2048" are helpful for such cases.

So, it's well presumed that those who have such use-cases know something about the filesystem they use for them.

Reiserfs has no static inode allocation, and thus does not hit this problem.

To call btrfs "finished" or "done" is just false. Also see: http://btrfs.wiki.kernel.org/ The mkfs.btrfs options "-n" and "-s" come into play for this.

Xfs, I had to look it up: man mkfs.xfs or http://xfs.org/docs/xfsdocs-xml-dev/XFS_User_Guide//tmp/en-US/html/xfs-mkfs....
"mkfs.xfs -s size=1024 -b size=1024"

Conclusion: In all mainstay filesystems, the defaults are NOT for lots of small files. Reiserfs can handle that, but would like some help, too.

- Yamaban
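The "-n" dry-run flag mentioned above can be used to preview how many inodes mke2fs would create before committing to a format. A sketch (the loopback image file stands in for a real partition, and the sizes are arbitrary):

```shell
# Create a 1 GiB sparse file to stand in for a partition.
truncate -s 1G /tmp/fake-part.img

# -n = dry run (print what mke2fs WOULD do, write nothing),
# -F = accept a plain file instead of a block device.
# Note the "inodes" figure reported with the default inode_ratio...
mke2fs -n -F /tmp/fake-part.img

# ...and compare against a layout tuned for small files
# (1 KiB blocks, one inode per 2 KiB of space):
mke2fs -n -F -b 1024 -i 2048 /tmp/fake-part.img
```

Neither invocation touches the image, so this is a safe way to check whether the defaults would leave a small-file workload starved of inodes.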
On Sunday, 2013-10-27 at 21:16 +0100, Yamaban wrote:
On Sun, 27 Oct 2013 20:09, Carlos E. R. <robin.listas@...> wrote:
Yes, I know that that the inode count and sector size can be adjusted. But I intentionally did not :-)
I also wanted to test number of big files allowed by each filesystem, but I did not. Time!
The trouble with that is the "assumptions" (Ass-out-of-you-and-me) that are made in the mke2fs code (have a look, use "-n" to see).
(Details, see /etc/mke2fs.conf, esp. inode_ratio and blocksize)
For an average file size above 20 kB that may be OK, but below that? Just no! -- And that has been so since before the 'invention' of ext3.
Right...
I can vividly remember disaster recovery of a /var/spool/mail partition in 1999, with the change in use from mbox to maildir... Most mails were less than 4 kB, and you can do the math: no more inodes available. -- Lovely night that was.
I can imagine. I remember once untarring a certain tar.gz archive. I don't remember where it came from; I think it was the database of mp3 tags that is accessible on the internet somewhere. The file was not big, but when untarred it created many thousands of small files. I was untarring on an ext3 partition, perhaps ext2; it was very slow, hours perhaps, and it exhausted the inodes. My recollection is vague. Knowing what had happened, I untarred it again, this time on a reiserfs partition. Success! It was fast, minutes or seconds, and no problems.
So, it's well presumed that those that have such use-cases know something about the filesystem they use for it.
Yes. For a maildir spool or news spool, I learnt early that I had to do that :-)
Reiserfs has no static inode allocation, and thus does not hit this problem.
To call btrfs "finished" or "done" is just false. Also see: http://btrfs.wiki.kernel.org/
The mkfs.btrfs utility says that it should not be called directly, that it is intended for use by YaST.
mkfs.btrfs options "-n" and "-s" will come to play for this.
I confess I did not look at them, or not much. The man page said little.
Xfs, I had to look it up: man mkfs.xfs or http://xfs.org/docs/xfsdocs-xml-dev/XFS_User_Guide//tmp/en-US/html/xfs-mkfs....
"mkfs.xfs -s size=1024 -b size=1024"
Noted, thanks.
Conclusion: In all main-stay FS, the defaults are NOT for lots of small files.
True. I wanted to try many files of other sizes, but I had to stop. Maybe I can rig up something else for testing.
Reiserfs can handle that, but would like some help, too.
Yes, it would. I would like to test reiser4 sometime...

- --
Cheers,
Carlos E. R. (from 12.3 x86_64 "Dartmouth" at Telcontar)
Hello Carlos and all, On 2013-10-27 T 21:37 +0100 Carlos E. R. wrote:
Reiserfs has no static inode allocation, and thus does not hit this problem.
To call btrfs "finished" or "done" is just false. Also see: http://btrfs.wiki.kernel.org/
The mkfs.btrfs utility says that it should not be called directly, that it is intended for use by YaST.
this note in mkfs.btrfs is intended to prevent people from using immature/insane settings when creating a btrfs filesystem, and obviously it is only true under the assumption that YaST uses sane settings... For your purposes (thanks, btw., for the tests, I like those :-) that comment can be ignored.

so long -
MgE

--
Matthias G. Eckermann
Senior Product Manager SUSE® Linux Enterprise
SUSE LINUX Products GmbH
Maxfeldstraße 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 16746 (AG Nürnberg)
On Monday, 2013-10-28 at 17:35 +0100, Matthias G. Eckermann wrote:
Hello Carlos and all,
The mkfs.btrfs utility says that it should not be called directly, that it is intended for use by YaST.
this note in mkfs.btrfs is intended to prevent people from using immature/insane settings when creating a btrfs filesystem, and obviously it is only true under the assumption that YaST uses sane settings... For your purposes (thanks, btw., for the tests, I like those :-) that comment can be ignored.
Ok... glad to be of use :-)

- --
Cheers,
Carlos E. R. (from 12.3 x86_64 "Dartmouth" at Telcontar)
On Sunday 2013-10-27 21:16, Yamaban wrote:
I can vividly remember disaster recovery of a /var/spool/mail partition in 1999, with the change in use from mbox to maildir... most mails were less than 4 kB
Stats time. For 46 hoarded messages, I have:

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
    105    2263    3564   55700    6858 2299000

In other words: 4 KB blocks is "juuuuust fine". In 2013. :^)
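A distribution like the one above can be gathered from any mail directory with a short pipeline. A sketch, not from the original mail (which looks like R's summary() output); here a tiny demo directory with files of known sizes stands in for a real maildir, and quartiles are omitted:

```shell
# Build a tiny demo "maildir" with three files of known sizes.
mkdir -p /tmp/maildemo
head -c 100  /dev/zero > /tmp/maildemo/a
head -c 300  /dev/zero > /tmp/maildemo/b
head -c 5000 /dev/zero > /tmp/maildemo/c

# Print min, median, mean and max of the file sizes.
find /tmp/maildemo -type f -printf '%s\n' | sort -n | awk '
  { a[NR] = $1; sum += $1 }
  END {
    print "min:",    a[1]
    print "median:", a[int((NR + 1) / 2)]
    print "mean:",   int(sum / NR)
    print "max:",    a[NR]
  }'
# -> min: 100 / median: 300 / mean: 1800 / max: 5000
```

Such a check is a quick way to decide whether a partition's block size and inode_ratio fit the actual workload.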
On Mon, 28 Oct 2013 13:56, Jan Engelhardt <jengelh@...> wrote:
On Sunday 2013-10-27 21:16, Yamaban wrote:
I can vividly remember disaster recovery of a /var/spool/mail partition in 1999, with the change in use from mbox to maildir... most mails were less than 4 kB
Stats time. For 46 hoarded messages, I have
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
    105    2263    3564   55700    6858 2299000
In other words: 4 KB blocks is "juuuuust fine". In 2013. :^)
And what is the inode_ratio? Sadly, blocksize will not help you with the matter of a too-small number of available inodes. In your example, setting inode_ratio=4096 would do the trick. Just hoping that mke2fs will do the "right" thing will lead you astray.

ATM /etc/mke2fs.conf defines inode_ratio=16kB; that is OK for /usr/..., and for a partition with pics, music and video inode_ratio=64kB would be enough. For my $HOME I would need inode_ratio=8kB, because I do programming. I know that, but a newbie user? I think not. For the non-involved, reiserfs, or -within limits- btrfs, are the sane choices. The defaults for the ext[234] filesystems are inviting crashes.

The YaST Partitioner should at least give a warning about the number of inodes, and offer an 'average file-size' option to select the correct inode_ratio for the ext filesystems (pre-selecting the default for the partition size). Other filesystems (XFS, JFS, ZFS, ...) should be made available only in expert mode, or above a certain partition size (see btrfs and 15 TB), with a strong hint about these filesystems not being useful in most SOHO cases. Thus I understand the propagating of btrfs as the filesystem of choice.

That's my 2ct on that.

- Yamaban.

PS: My spell checker had a funny: for "btrfs" it proposed "barfs".
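When a partition hits the scenario described above, the quickest diagnosis is df's inode view. A small sketch (df -h and df -i are standard coreutils; "/" and the commented device are placeholders):

```shell
# Block-space view: can show gigabytes free...
df -h /

# ...while the inode view reveals the real limit: IUse% at 100%
# means "No space left on device" errors despite free blocks.
df -i /

# For ext2/3/4, tune2fs also reports the totals (placeholder device,
# so the line is commented out; needs root):
# tune2fs -l /dev/sdXn | grep -i inode
```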
On Monday 2013-10-28 14:30, Yamaban wrote:
I can vividly remember disaster recovery of a /var/spool/mail partition in 1999, with the changed use from mbox to maildir... most mails where about less then 4 kB
Stats time. For 46 hoarded messages, I have
Min. 1st Qu. Median Mean 3rd Qu. Max. 105 2263 3564 55700 6858 2299000
In other words: 4 KB blocks is "juuuuust fine". In 2013. :^)
And what is the inode_ratio? Sadly, blocksize will not help you with the matter of a too-small number of available inodes. In your example, setting inode_ratio=4096 would do the trick.
14:42 ares07:~ > xfs_info .
meta-data=/dev/md7             isize=256    agcount=32, agsize=22891675 blks
         =                     sectsz=512   attr=2
data     =                     bsize=4096   blocks=732533583, imaxpct=5
         =                     sunit=0      swidth=0 blks
naming   =version 2            bsize=4096   ascii-ci=0
log      =internal             bsize=4096   blocks=357682, version=2
         =                     sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                 extsz=4096   blocks=0, rtextents=0
On Mon, Oct 28, 2013 at 9:30 AM, Yamaban <foerster@lisas.de> wrote:
Other filesystems (XFS, JFS, ZFS, ...) should be made available only in expert mode, or above a certain partition size (see btrfs and 15 TB), with a strong hint about these filesystems not being useful in most SOHO cases.
I argue XFS for sure should stay on the list.

== background

Dave Chinner (xfs devel) would argue that ext2/3/4 is what should be removed from the default list and XFS & btrfs be the only 2 choices.

See this from Jan 2012: http://www.youtube.com/watch?v=FegjLbCnoBw

He says anything with a 3.0 or newer kernel should consider xfs in preference to ext2/3/4 if it has multiple CPUs (and even my phone has multiple CPUs) performing I/O to the filesystem.

The core argument is that recent XFS from the last couple of years has incorporated a lot of the ext3/ext4 speed improvements in journal handling, while maintaining the scalability it always had. Thus it is now usable from the desktop to the datacenter.

I haven't seen an xfs-related lost-data horror story from a routine power outage in years, so I don't think the old claim that xfs only made sense if you had solid power is still true.

Thus Dave Chinner argues that a user wanting a stable FS that scales from desktop to enterprise should focus on XFS. Users that want the features of btrfs should use it on their desktops and XFS on their big systems.

Greg
--
Greg Freemyer
On Mon, 28 Oct 2013 21:16, Greg Freemyer <greg.freemyer@...> wrote:
On Mon, Oct 28, 2013 at 9:30 AM, Yamaban <foerster@lisas.de> wrote:
Other filesystems (XFS, JFS, ZFS, ...) should be made available only in expert mode, or above a certain partition size (see btrfs and 15 TB), with a strong hint that these filesystems are not useful in most SOHO cases.
I argue XFS for sure should stay on the list.
== background
Dave Chinner (xfs devel) would argue that ext2/3/4 is what should be removed from the default list and XFS & btrfs be the only 2 choices.
See this from Jan 2012: http://www.youtube.com/watch?v=FegjLbCnoBw He says anything with a 3.0 or newer kernel should consider xfs in preference to ext2/3/4 if it has multiple CPUs (and even my phone has multiple CPUs) performing I/O to the filesystem.
The core argument for that is that recent XFS from the last couple of years has incorporated a lot of the ext3/ext4 speed improvements in journal handling, but it has maintained the scalability it always had. Thus it is now usable from the desktop to the datacenter.
I haven't seen an xfs-related lost-data horror story from a routine power outage in years, so I don't think the old claim that xfs only made sense if you had solid power still holds.
Thus Dave Chinner argues that a user wanting a stable FS that scales from desktop to enterprise should focus on XFS.
Users that want the features of btrfs should use it on their desktops and XFS on their big systems.
For big partitions (see the 15 TB case), better xfs than btrfs, sure. ATM I'd say btrfs up to ca. 8 TB, XFS as a valid option from ca. 6 TB and up.

What keeps me from giving a full recommendation for XFS is the overhead of creating and removing a file/dir. That sums up; see the tests.

But what I'd like the most would be an easy-to-find page on opensuse.org that shows the pros, cons, and use-cases for each available filesystem. With all the gotchas and no-gos, maybe as a table of FS vs. use-cases. E.g. what minimum kernel version for what feature, and which distro release has what available. And include a link to that page in the mkfs man-pages.

That would help all, newbies and gurus.

- Yamaban.
On Mon, Oct 28, 2013 at 4:37 PM, Yamaban <foerster@lisas.de> wrote:
On Mon, 28 Oct 2013 21:16, Greg Freemyer <greg.freemyer@...> wrote:
On Mon, Oct 28, 2013 at 9:30 AM, Yamaban <foerster@lisas.de> wrote:
Other filesystems (XFS, JFS, ZFS, ...) should be made available only in expert mode, or above a certain partition size (see btrfs and 15 TB), with a strong hint that these filesystems are not useful in most SOHO cases.
I argue XFS for sure should stay on the list.
== background
Dave Chinner (xfs devel) would argue that ext2/3/4 is what should be removed from the default list and XFS & btrfs be the only 2 choices.
See this from Jan 2012: http://www.youtube.com/watch?v=FegjLbCnoBw He says anything with a 3.0 or newer kernel should consider xfs in preference to ext2/3/4 if it has multiple CPUs (and even my phone has multiple CPUs) performing I/O to the filesystem.
The core argument for that is that recent XFS from the last couple of years has incorporated a lot of the ext3/ext4 speed improvements in journal handling, but it has maintained the scalability it always had. Thus it is now usable from the desktop to the datacenter.
I haven't seen an xfs-related lost-data horror story from a routine power outage in years, so I don't think the old claim that xfs only made sense if you had solid power still holds.
Thus Dave Chinner argues that a user wanting a stable FS that scales from desktop to enterprise should focus on XFS.
Users that want the features of btrfs should use it on their desktops and XFS on their big systems.
For big partitions (see the 15 TB case), better xfs than btrfs, sure. ATM I'd say btrfs up to ca. 8 TB, XFS as a valid option from ca. 6 TB and up.
What keeps me from giving a full recommendation for XFS is the overhead of creating and removing a file/dir. That sums up; see the tests.
I don't know what you saw in Carlos's results to make you say that. I think the situation is far more nuanced than you suggest. For a single-threaded situation, all of the speed benchmarks Carlos posted are acceptable as far as I'm concerned.
From Carlos's earlier post in this thread, for both 100-byte files and 10 MB files:
=== 10000 files of 100 B

         time (real/user/sys)          | listing time               | rm files time              | used  free   %
                                       | (real/user/sys)            | (real/user/sys)            | space space
reiserfs 0m10.171s/0m0.795s/0m1.445s   | 0m0.030s/0m0.027s/0m0.003s | 0m0.282s/0m0.002s/0m0.277s | 9.9M  118G  1%
ext4     0m10.087s/0m0.779s/0m1.387s   | 0m0.050s/0m0.045s/0m0.006s | 0m0.126s/0m0.004s/0m0.121s | 266M  112G  1%
ext3     0m10.266s/0m0.732s/0m1.415s   | 0m0.050s/0m0.040s/0m0.010s | 0m0.156s/0m0.005s/0m0.148s | 228M  112G  1%
xfs      0m10.193s/0m0.680s/0m1.332s   | 0m0.028s/0m0.023s/0m0.005s | 0m0.235s/0m0.003s/0m0.231s | 270M  120G  1%
btrfs    0m10.195s/0m0.689s/0m1.394s   | 0m0.030s/0m0.026s/0m0.004s | 0m0.280s/0m0.005s/0m0.273s | 9.6M  118G  1%

=== 10000 files of 10 MB (not 10 MiB). Disk busy at up to 155MB/s - hardware is the limit.

         time (real/user/sys)          | listing time               | rm files time               | used  free   %
                                       | (real/user/sys)            | (real/user/sys)             | space space
reiserfs 12m39.628s/0m4.775s/3m10.519s | 0m0.309s/0m0.032s/0m0.005s | 0m13.438s/0m0.032s/0m0.005s | 94G   27G  78%
ext4     12m15.974s/0m3.833s/1m43.232s | 0m0.077s/0m0.043s/0m0.009s | 0m6.394s/0m0.004s/0m1.490s  | 94G   19G  84%
ext3     15m20.931s/0m3.756s/2m40.206s | 0m0.156s/0m0.045s/0m0.007s | 0m9.677s/0m0.005s/0m3.006s  | 94G   19G  84%
xfs      11m11.731s/0m4.737s/1m26.311s | 0m0.155s/0m0.024s/0m0.005s | 0m9.480s/0m0.011s/0m1.584s  | 94G   27G  78%
btrfs    10m37.071s/0m4.837s/1m11.542s | 0m0.376s/0m0.027s/0m0.004s | 0m9.755s/0m0.004s/0m2.691s  | 92G   27G  78%

===
Create speeds

All 5 create the 10,000 100-byte files at the same speed, +/- 1%.
XFS is second fastest at creating 10 MB files: over 3 minutes faster than ext3 and a minute faster than ext4. In fact, XFS is faster than everything but BtrFS at creating the 10 MB files. If creating 10 MB files is important to you, then it is reiserfs and ext3/ext4 you should eliminate from your options. If you only care about small files, then they all create files very quickly. There is certainly no reason to make a blanket statement that XFS is slow at creating files.
Listing files

For the 100-byte files, all list the files at roughly the same speed.
For 10 MB files, listing is effectively instant for all except BtrFS and ReiserFS, which take about 1/3 of a second each. For an interactive user, 1/3 of a second is noticeable.
Removing files

For 100-byte files, xfs takes 0.235 seconds to delete 10,000 files. Both ReiserFS and BtrFS take longer. For 10 MB files, XFS is only bested by ext4, admittedly by over 3 seconds.
------

I don't know about you, but I don't delete directories with 10,000 files in them very often, and when I do I can stand to wait an extra 3 seconds. The speed tests certainly don't rule XFS out in my mind, even for a single-threaded test like Carlos did.

Then if you watch the video, you see that Carlos tested XFS in its worst mode. Where it shines is if you have multiple threads sending file I/O at it. For up to 8 threads it basically sees a linear speed bump. Neither ext4 nor BtrFS can keep up. (ReiserFS was not part of Dave Chinner's testing.)

Historically XFS fell down because it couldn't handle huge tarball extraction and large tree deletion very well. That issue is resolved, and if you are compiling in parallel like a lot of builds do these days, then XFS may be the best choice even for a developer working with large tarballs.

fyi: I did not read the test script to confirm all the cache flushes are handled right for the above benchmark to be valid, but it probably doesn't matter. For a normal user, they just want to know how long the command prompt takes to return and don't care if the kernel is still processing the removes in the background.

Greg
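The multi-threaded behaviour Greg describes can be probed without Dave Chinner's full rig; a minimal sketch using xargs -P as the "threads" (the file count and worker count are arbitrary choices, and mktemp means this exercises whatever filesystem /tmp lives on -- point TESTDIR at a mount of the filesystem under test instead):

```shell
# Create 1000 small files with 4 concurrent workers; compare wall time
# against -P 1 on the same filesystem to see how it scales with threads.
TESTDIR=$(mktemp -d)

seq 1 1000 | xargs -P 4 -I{} sh -c \
    'head -c 4096 /dev/urandom > "$1/file-{}"' sh "$TESTDIR"

COUNT=$(ls "$TESTDIR" | wc -l)
echo "created $COUNT files"
```

Wrapping the xargs line in `time` and varying -P reproduces something like the single- vs multi-threaded comparison from the talk.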
On Monday 2013-10-28 23:09, Greg Freemyer wrote:
Removing files
For 100 Byte files, xfs takes 0.235 seconds to delete 10,000 files. Both ReiserFS and BtrFS take longer. For 10 MB files, XFS is only bested by ext4, but admittedly by over 3 seconds.
I don't know about you, but I don't delete directories with 10,000 files in them very often and when I do I can stand to wait an extra 3 seconds.
Indeed. So much for real-world application. But - there is a simple test for something that does happen. Try some huge (larger than just 10M) file, like a DVD image, or better yet, a whole disk image. Example.

Cold disk caches:

# sync; time (fallocate -l 500G xfs/xbig; sync); time (fallocate -l 500G ext4/xbig; sync)

real    0m0.466s
user    0m0.000s
sys     0m0.016s

real    0m9.482s
user    0m0.000s
sys     0m0.744s

# sync; time (rm -f xfs/xbig; sync); time (rm -f ext4/xbig; sync);

real    0m0.263s
user    0m0.000s
sys     0m0.012s

real    0m11.374s
user    0m0.000s
sys     0m0.160s

Warm cache:

# sync; time (fallocate -l 500G xfs/xbig; sync); time (fallocate -l 500G ext4/xbig; sync)

real    0m0.255s
user    0m0.000s
sys     0m0.008s

real    0m6.968s
user    0m0.000s
sys     0m0.596s

# sync; time (rm -f xfs/xbig; sync); time (rm -f ext4/xbig; sync);

real    0m0.236s
user    0m0.000s
sys     0m0.008s

real    0m8.887s
user    0m0.000s
sys     0m0.048s

It looks like ext4 still has some bitmaps somewhere - the hard disk LED wants to support my thesis.
fyi: I did not read the test script to confirm all the cache flushes are handled right for the above benchmark to be valid, but it probably doesn't matter. For a normal user, they just want to know how long the command prompt takes to return and don't care if the kernel is still processing the removes in the background.
Why, so put the entire extraction into the background ;-)

  tar -xf linux-3.11.tar.xz &

"Problem solved!1"
On Mon, Oct 28, 2013 at 7:07 PM, Jan Engelhardt <jengelh@inai.de> wrote:
Indeed. So much for real-world application. But - there is a simple test for something that does happen. Try some huge (larger than just 10M) file, like a DVD image, or better yet, a whole disk image. Example.
Cold disk caches: # sync; time (fallocate -l 500G xfs/xbig; sync); time (fallocate -l 500G ext4/xbig; sync)
real 0m0.466s user 0m0.000s sys 0m0.016s
real 0m9.482s user 0m0.000s sys 0m0.744s
# sync; time (rm -f xfs/xbig; sync); time (rm -f ext4/xbig; sync);
real 0m0.263s user 0m0.000s sys 0m0.012s
real 0m11.374s user 0m0.000s sys 0m0.160s
Warm cache:
# sync; time (fallocate -l 500G xfs/xbig; sync); time (fallocate -l 500G ext4/xbig; sync)
real 0m0.255s user 0m0.000s sys 0m0.008s
real 0m6.968s user 0m0.000s sys 0m0.596s
# sync; time (rm -f xfs/xbig; sync); time (rm -f ext4/xbig; sync);
real 0m0.236s user 0m0.000s sys 0m0.008s
real 0m8.887s user 0m0.000s sys 0m0.048s
It looks like ext4 still has some bitmaps somewhere - the hard disk LED wants to support my thesis.
Dave Chinner addressed that in the video. Per him, ext4 still uses a bitmap to track free space. Thus fallocate has to scan through the bitmap to find free space to build the file out of, and rm has to modify all those bitmaps and write them back out to disk (since you are calling sync).

It seems strange that fallocate completes faster than rm?

If you are interested in xfs, then watching the video I posted before is time well spent: <http://www.youtube.com/watch?v=FegjLbCnoBw>

Greg
--
Greg Freemyer
On Tuesday 2013-10-29 14:54, Greg Freemyer wrote:
If you are interested in xfs, then watching the video I posted before is time well spent <http://www.youtube.com/watch?v=FegjLbCnoBw>
You don't have to tell me -- I was the one who brought up (even advocated) XFS, the video and D. Chinner early, and on a regular basis ;-)

{{citation}} follows:

Mentioning the video:
Date: Fri, 6 Sep 2013 18:35:04 (+0200)
Message-ID: <alpine.LSU.2.11.1309061828420.25086@nerf07.vanv.qr>
Subject: Re: [opensuse-factory] Re: [opensuse-kernel] BtrFS as default fs?
Besides Scalability there are other attributes where btrfs exceeds other filesystems. See various comparison tables out there, including the one in my blog from three years ago (yes, it's a bit dated): https://www.suse.com/communities/conversations/data-is-customers-gold/
Since somewhat-dated data seems to be popular, here is another:
http://www.youtube.com/watch?v=FegjLbCnoBw
in this talk, David Chinner showed at LCA 2012 that btrfs was rather non-scaling (for the features it shares with existing filesystems, IOW, just vanilla storing).
Mentioning Chinner:
Date: Sun, 20 Oct 2013 09:45:27 +0200
Subject: Re: [opensuse-factory] btrfs still fails writing more than 15-TB
Message-ID: <alpine.LSU.2.11.1310200924360.4494@nerf07.vanv.qr>
- speed: xfs (David Chinner does regular talks/videos with graphs)
On Tue, Oct 29, 2013 at 11:05 AM, Jan Engelhardt <jengelh@inai.de> wrote:
On Tuesday 2013-10-29 14:54, Greg Freemyer wrote:
If you are interested in xfs, then watching the video I posted before is time well spent <http://www.youtube.com/watch?v=FegjLbCnoBw>
You don't have to tell me -- I was the one who brought up (even advocated) XFS, the video and D.Chinner early, and on a regular basis ;-)
Well, now you know at least one person listened to your advice!

Greg
On Monday, 2013-10-28 at 18:09 -0400, Greg Freemyer wrote:
I don't know about you, but I don't delete directories with 10,000 files in them very often, and when I do I can stand to wait an extra 3 seconds. The speed tests certainly don't rule XFS out in my mind, even for a single-threaded test like Carlos did. Then if you watch the video, you see that Carlos tested XFS in its worst mode. Where it shines is if you have multiple threads sending file I/O at it. For up to 8 threads it basically sees a linear speed bump. Neither ext4 nor BtrFS can keep up. (ReiserFS was not part of Dave Chinner's testing.)
It did not occur to me to use multiple threads. However, for files bigger than 1 MB it does not matter: the CPU was almost idle, and the HD was writing at about the maximum speed the hardware is capable of on my machine. Only on smaller files would using several threads be noticeable, IMO.
fyi: I did not read the test script to confirm all the cache flushes are handled right for the above benchmark to be valid, but it probably doesn't matter. For a normal user, they just want to know how long the command prompt takes to return and don't care if the kernel is still processing the removes in the background.
I only used "sync" at some points.

Of course, when one does some tests, one gets ideas on how to do better tests. But one cannot have the machine testing full time for days on end; one has to use the machine for other things... :-)

To be more accurate, I would have to use one disk for the system and another for the tests. And several services should be stopped, like cron. And things I have not even thought about...

--
Cheers,
Carlos E. R.
(from 12.3 x86_64 "Dartmouth" at Telcontar)
On Monday, 2013-10-28 at 16:16 -0400, Greg Freemyer wrote:
Dave Chinner (xfs devel) would argue that ext2/3/4 is what should be removed from the default list and XFS & btrfs be the only 2 choices.
Wow. Interesting...

I'm using ext3 for root, xfs for home and /usr. The theory being that ext3 is, or was, easier to recover from disasters. But now that many binaries have migrated to /usr, the logic is not that valid...

--
Cheers,
Carlos E. R.
(from 12.3 x86_64 "Dartmouth" at Telcontar)
On Sunday, 2013-10-27 at 08:37 -0400, Greg Freemyer wrote:
"Carlos E. R." <> wrote:
Now, notice the lack of the 2-minute delay on the extra sync for the ext4 test. I have no explanation for this; perhaps as I had to reboot, the system cache was fully emptied. Maybe, to do these tests properly, the system would have to be rebooted for each test :-?
At a minimum you should empty the cache between runs:
# flush dirty pages to disk
sync
That I do. At the end of each test there is a flush.
# throw clean pages away
echo 3 > /proc/sys/vm/drop_caches
That one, no. Next time :-)

--
Cheers,
Carlos E. R.
(from 12.3 x86_64 "Dartmouth" at Telcontar)
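Put together, Greg's two steps make a small helper one could drop into the test script between runs (a sketch, not part of Carlos's script; writing drop_caches needs root, hence the guard):

```shell
# Flush everything between benchmark runs so each filesystem starts
# from a cold cache. drop_caches=3 frees page cache plus dentries/inodes.
flush_caches() {
    sync                                    # flush dirty pages to disk
    if [ -w /proc/sys/vm/drop_caches ]; then
        echo 3 > /proc/sys/vm/drop_caches   # throw clean pages away
    else
        echo "not root: page cache not dropped" >&2
    fi
}

flush_caches
```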
On Sunday, 2013-10-27 at 12:43 +0100, Carlos E. R. wrote:
Here I repeat the test on real hardware. I have used a script with variations as I found things out; when the difference is significant, I repeated the test. I'll post the script below. I did this on my same main machine, but on a spare 1 TB HD. I reuse the same test partition, creating a different filesystem in the script prior to testing it.
I forgot to post the script for reference:

#!/bin/bash

function test_filesystem_extra() {
    DEVICE=/dev/disk/by-id/wwn-0x5000c500613c92d5-part6
    WHERE=/data/Test
    TESTHERE=$WHERE/Here
    SAMPLE=$WHERE/sample

    # Verify entry value
    case $1 in
        ext4)     TIPO=ext4 ;;
        ext3)     TIPO=ext3 ;;
        xfs)      TIPO=xfs ;;
        reiserfs) TIPO=reiserfs ;;
        btrfs)    TIPO=btrfs ;;
        *)        echo "Syntax error" ;;
    esac

    # sometimes, umount does not succeed. Insist.
    sync
    umount $DEVICE
    sleep 3
    umount $DEVICE
    HECHO=`mount | grep /data/Test`
    if [ -z "$HECHO" ]; then
        echo "Previous filesystem unmounted successfully."
    else
        echo "Error, crash danger: could not umount previous filesystem."
        exit
    fi

    # Create filesystem
    case $1 in
        ext4)
            echo
            time mke2fs -L Test_ext4 -t ext4 $DEVICE
            ;;
        ext3)
            echo
            time mke2fs -L Test_ext3 -t ext3 $DEVICE
            ;;
        reiserfs)
            echo
            time mkreiserfs -q --label Test_reiserfs $DEVICE
            ;;
        btrfs)
            echo
            time mkfs.btrfs -L Test_btrfs $DEVICE
            ;;
        xfs)
            echo
            time mkfs.xfs -f -L Test_xfs $DEVICE
            ;;
        *)
            echo "Syntax error"
            exit
            ;;
    esac

    # mount filesystem
    mount $DEVICE $WHERE
    HECHO=`mount | grep /data/Test`
    if [ -z "$HECHO" ]; then
        echo "Mount or filesystem creation failed."
        exit
    else
        echo "Filesystem created."
        df -h /data/Test
        file --dereference -s $DEVICE
    fi

    # preparation of sample to copy
    echo
    echo "Preparing sample"
    mkdir $TESTHERE
    head -c$2 /dev/urandom > $SAMPLE
    sync

    # start
    DATE=`date --rfc-3339=seconds`
    echo "$DATE --- Timing copy test $TIPO with sync ($2B)"

    # Copy N times the same file with different name, syncing every 'k' times.
    ERROR=0
    time for i in `seq -w 0 9`; do
        if [ $ERROR -ne 0 ]; then
            break
        fi
        echo $i
        for k in `seq -w 0 999`; do
            cp $SAMPLE $TESTHERE/TT-$i--$k
            if [ $? -gt 0 ]; then
                echo "Copy error on TT-$i--$k, abort"
                ERROR=1
                break
            fi
        done
        sync
    done
    echo "one more sync."
    time sync

    DATE=`date --rfc-3339=seconds`
    echo "$DATE ---- Timing copy test $TIPO with no sync ($2B)"

    # Copy N times the same file with different name, without sync.
    ERROR=0
    time for i in `seq -w 0 9`; do
        if [ $ERROR -ne 0 ]; then
            break
        fi
        echo $i
        for k in `seq -w 0 999`; do
            cp $SAMPLE $TESTHERE/TT-$i--$k
            if [ $? -gt 0 ]; then
                echo "Copy error on TT-$i--$k, abort"
                ERROR=1
                break
            fi
        done
    done

    # Finalize. Sometimes this sync takes longer than expected. Reason unknown.
    echo "one sync before closing."
    time sync
}

function run_one_test() {
    echo
    echo "===================================== "
    echo "| $1 "
    echo
    test_filesystem_extra $1 $2
}

function run_a_test_series() {
    echo "*****=====================================***** "
    echo "**** Running test suite of $1 bytes"
    echo "*****=====================================***** "
    run_one_test reiserfs $1
    run_one_test ext4 $1
    run_one_test ext3 $1
    run_one_test xfs $1
    run_one_test btrfs $1
}

run_a_test_series 10000000
run_a_test_series 1000000
run_a_test_series 100000
run_a_test_series 10000
run_a_test_series 1000
run_a_test_series 100

--
Cheers,
Carlos E. R.
(from 12.3 x86_64 "Dartmouth" at Telcontar)
Hello,

On Sunday, 27 October 2013, Carlos E. R. wrote:
The first test is simply one to try to fill the partition to capacity with very small files (100 B), written to the same directory. This is intentional: it is a kind of load for which reiserfs was designed, as the goal of these tests is to find a replacement for reiserfs.
Filesystem formatting is all defaults, no adjustments of any kind.
With the goal of replacing reiserfs and handling lots of small files, maybe you should do a bit of "tuning" by using options for mkfs.ext3/ext4 to create more inodes?

The relevant options for mkfs.ext[34] are:

  -i bytes-per-inode
     Specify the bytes/inode ratio. mke2fs creates an inode for every
     bytes-per-inode bytes of space on the disk. The larger the
     bytes-per-inode ratio, the fewer inodes will be created. This value
     generally shouldn't be smaller than the blocksize of the filesystem,
     since in that case more inodes would be made than can ever be used.
     [...]

or

  -N number-of-inodes
     Overrides the default calculation of the number of inodes that
     should be reserved for the filesystem (which is based on the number
     of blocks and the bytes-per-inode ratio). This allows the user to
     specify the number of desired inodes directly.

This way, you can get numbers for ext3/ext4 with _all_ of your small files written to disk.

Regards,

Christian Boltz
--
If I had a cent for everytime someone complained about single RPM installation failing with KPackageKit on 11.4, I'd buy Attachmate ;-)
[Martin Schlander in opensuse-factory]
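Christian's -i option can be tried without touching a real partition, since mke2fs happily formats a plain file; a sketch (the temp-file image and the 1 GiB size are arbitrary, and it assumes e2fsprogs is installed, but needs no root):

```shell
# Format a sparse 1 GiB image twice and compare inode counts:
# default bytes-per-inode (16384) vs. -i 4096.
IMG=$(mktemp)
truncate -s 1G "$IMG"

mkfs.ext4 -q -F "$IMG"
DEFAULT_INODES=$(dumpe2fs -h "$IMG" 2>/dev/null | awk '/^Inode count:/ {print $3}')

mkfs.ext4 -q -F -i 4096 "$IMG"
DENSE_INODES=$(dumpe2fs -h "$IMG" 2>/dev/null | awk '/^Inode count:/ {print $3}')

echo "default: $DEFAULT_INODES inodes, -i 4096: $DENSE_INODES inodes"
rm -f "$IMG"
```

The -i 4096 run should report roughly four times as many inodes, which is what would let Carlos's small-file test write _all_ of its 100-byte files to disk on ext3/ext4.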
On Sunday, 2013-10-27 at 13:33 +0100, Christian Boltz wrote:
On Sunday, 27 October 2013, Carlos E. R. wrote:
Filesystem formatting is all defaults, no adjustments of any kind.
With the goal of replacing reiserfs and handling lots of small files, maybe you should do a bit of "tuning" by using options for mkfs.ext3/ext4 to create more inodes?
I know of that possibility, yes. But if you adjust for a particular load, it runs worse on other loads. So I intentionally did not adjust any filesystem. Reiserfs adjusts automatically, and I guess btrfs does too. XFS does somewhat.

Notice that I tested files up to 10 MB, although I did not test how many files were allowed till they burst. Lack of time.

--
Cheers,
Carlos E. R.
(from 12.3 x86_64 "Dartmouth" at Telcontar)
"Carlos E. R." <robin.listas@telefonica.net> wrote:
On Sunday, 2013-10-27 at 11:14 +0100, Carlos E. R. wrote:
I'll post another set of results done on real hardware. The virtual machine simply does not have enough CPU power, and the virtual disk is also slow.
Here I repeat the test on real hardware. I have used a script with variations as I found things out; when the difference is significant, I repeated the test. I'll post the script below. I did this on my same main machine, but on a spare 1 TB HD. I reuse the same test partition, creating a different filesystem in the script prior to testing it.
It was done on a 12.3 system; I do not have a 13.1 partition available on this machine yet. Sorry about that.
The first test is simply one to try to fill the partition to capacity with very small files (100 B), written to the same directory. This is intentional: it is a kind of load for which reiserfs was designed, as the goal of these tests is to find a replacement for reiserfs.
Well, reiserfs is still the king on this test, there is no replacement, period.
Filesystem formatting is all defaults, no adjustments of any kind.
Xfs has a speculative preallocation feature that is horrible for your workload in this test. If this workload is something you really care about, then you should be willing to tune your mount options.

Xfs has a bunch of them: <https://www.kernel.org/doc/Documentation/filesystems/xfs.txt>

In particular, allocsize=4kb disables preallocation and only allocates disk pages if you have data to put into them.

Note that xfs reclaims unused preallocated space, but it takes time (days/weeks) and your test doesn't allow for that. Thus your benchmark is only useful if it is a workload you actually care about and not just a test.

Greg
--
Sent from my Android phone with K-9 Mail. Please excuse my brevity.
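The preallocation effect Greg describes is about allocated blocks diverging from the apparent file size; a generic illustration of the two quantities, using a sparse file rather than XFS preallocation (mktemp path, works on any filesystem):

```shell
# Apparent size vs. actually allocated blocks. XFS speculative
# preallocation makes 'allocated' larger than the data written so far;
# a sparse file (faked here with truncate) shows the opposite gap.
F=$(mktemp)
truncate -s 1M "$F"             # 1 MiB apparent size, nothing written

APPARENT=$(stat -c %s "$F")     # bytes, what ls -l reports
ALLOCATED=$(stat -c %b "$F")    # 512-byte blocks, what du counts

echo "apparent: $APPARENT bytes, allocated: $ALLOCATED blocks"
```

On an XFS mount without allocsize=4k, comparing `du` against `ls -l` on a freshly written file shows the reverse skew until the preallocated tail is trimmed or reclaimed.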
On Sunday, 2013-10-27 at 08:55 -0400, Greg Freemyer wrote:
"Carlos E. R." <> wrote:
...
Filesystem formatting is all defaults, no adjustments of any kind.
Xfs has a speculative preallocation feature that is horrible for your workload in this test.
If this workload is something you really care about then you should be willing to tune your mount options.
Xfs has a bunch of them: <https://www.kernel.org/doc/Documentation/filesystems/xfs.txt>
In particular allocsize=4kb will disable preallocate and only allocate disk pages if you have data to put into it.
Note that xfs reclaims unused preallocated space, but it takes time (days/weeks) and your test doesn't allow for that. Thus your benchmark is only useful if it is a workload you actually care about and not just a test.
Well, it is something to consider.

I do have a news spool, which is now on reiserfs and will stay that way while possible. If it stops being supported, it will then go to XFS, I guess.

Years ago, I timed kernel builds on different filesystems, and it was way faster on reiserfs. Thus my /usr/src7 is also reiserfs.

IMHO, modern filesystems should adjust automatically to the load, without any adjustments from the user.

--
Cheers,
Carlos E. R.
(from 12.3 x86_64 "Dartmouth" at Telcontar)
On Sunday 2013-10-27 20:21, Carlos E. R. wrote:
Years ago, I timed kernel build on different filesystems, and it was way faster on reiserfs. Thus my /usr/src7 is also reiserfs.
Used to do timings too. Not intentionally though -- it was more of a byproduct of running bs_worker, and it totally fell over if you had like 8 workers concurrently trying to extract tarballs. (Which is easy, because an update in kernel-source triggers multiple rebuilds simultaneously, i586 and x86_64 times all the flavors we have.)
On 18/10/13 11:27, Lew Wolfgang wrote:
It probably is, but the choice to use it did not present itself in the install partitioner.
Well... it is not there because it is not recommended to create new installations with it.

--
"If debugging is the process of removing bugs, then programming must be the process of putting them in." - Edsger Dijkstra
On 10/18/2013 07:43 AM, Cristian Rodríguez wrote:
On 18/10/13 11:27, Lew Wolfgang wrote:
It probably is, but the choice to use it did not present itself in the install partitioner.

Well... it is not there because it is not recommended to create new installations with it.
Why would that be? What is the recommendation? FAT? That was one of the installation choices. Is FAT better than reiserfs?

Regards,
Lew
Lew Wolfgang wrote:
On 10/18/2013 07:43 AM, Cristian Rodríguez wrote:
On 18/10/13 11:27, Lew Wolfgang wrote:
It probably is, but the choice to use it did not present itself in the install partitioner.

Well... it is not there because it is not recommended to create new installations with it.
Why would that be? What is the recommendation? FAT? That was one of the installation choices. Is FAT better than reiserfs?
The recommendation is probably ext4 (the default).

--
Per Jessen, Zürich (17.1°C)
http://www.hostsuisse.com/ - dedicated server rental in Switzerland.
On 2013-10-20 13:26, Per Jessen wrote:
Lew Wolfgang wrote:
Why would that be? What is the recommendation? FAT? That was one of the installation choices. Is FAT better than reiserfs?
The recommendation is probably ext4 (the default).
It doesn't make much sense that you can format during install as FAT but not Reiserfs. -- Cheers / Saludos, Carlos E. R. (from 12.3 x86_64 "Dartmouth" at Telcontar)
Carlos E. R. wrote:
On 2013-10-20 13:26, Per Jessen wrote:
Lew Wolfgang wrote:
Why would that be? What is the recommendation? FAT? That was one of the installation choices. Is FAT better than reiserfs?
The recommendation is probably ext4 (the default).
It doesn't make much sense that you can format during install as FAT but not Reiserfs.
Dunno, JFS was dropped a couple of years ago and that is still actively maintained. -- Per Jessen, Zürich (15.4°C) http://www.hostsuisse.com/ - dedicated server rental in Switzerland.
On Sunday, 2013-10-20 at 16:11 +0200, Per Jessen wrote:
Carlos E. R. wrote:
Why would that be? What is the recommendation? FAT? That was one of the installation choices. Is FAT better than reiserfs?
The recommendation is probably ext4 (the default).
It doesn't make much sense that you can format during install as FAT but not Reiserfs.
Dunno, JFS went out a couple of years ago and that is still actively maintained.
Yes, but my meaning was different. You cannot install a Linux system on FAT, yet it is available as a filesystem during install. Why is FAT available and not reiserfs? It makes no sense to me. Someone could try to install on it... -- Cheers, Carlos E. R. (from 12.3 x86_64 "Dartmouth" at Telcontar)
On Sunday 2013-10-20 16:41, Carlos E. R. wrote:
Yes, but my meaning was different. You cannot install a Linux system on FAT, yet it is available as a filesystem during install. Why is FAT available and not reiserfs? It makes no sense to me. Someone could try to install on it...
To put /boot on FAT. Makes sense? Yes. (The two symlinks in /boot can be practically ignored; they do not seem to be used.)
On 10/20/2013 07:50 AM, Jan Engelhardt wrote:
On Sunday 2013-10-20 16:41, Carlos E. R. wrote:
Yes, but my meaning was different. You cannot install a Linux system on FAT, yet it is available as a filesystem during install. Why is FAT available and not reiserfs? It makes no sense to me. Someone could try to install on it...
To put /boot on FAT. Makes sense? Yes. (The two symlinks in /boot can be practically ignored; they do not seem to be used.)
What would the use case for /boot on FAT be? Would MS Windows ever be interested in looking at /boot? How about installing / on a FAT partition? It would seem that the installer would allow it, since it's one of the suggestions. (I didn't try it!) Regards, Lew
On 20/10/13 12:04, Lew Wolfgang wrote:
What would the usage case for /boot on FAT be? Would MS Windows ever be interested in looking at /boot?
You are a little bit out of date. UEFI bootloaders are only able to read FAT32. -- "If debugging is the process of removing bugs, then programming must be the process of putting them in." - Edsger Dijkstra
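As an aside, on a running UEFI system you can check this directly: the firmware-visible boot partition (the EFI System Partition) must be FAT-formatted. A minimal sketch, assuming the ESP is mounted at /boot/efi (the openSUSE default) and that findmnt from util-linux is available:

```shell
#!/bin/sh
# Sketch: report the filesystem type of the EFI System Partition.
# Assumes /boot/efi is the ESP mount point; on a UEFI install this
# should report "vfat", since the firmware can only read FAT.
esp=/boot/efi
fstype=$(findmnt -no FSTYPE "$esp" 2>/dev/null || echo "not mounted")
echo "ESP filesystem: $fstype"
```

On a BIOS/MBR machine (or inside a container) there is no ESP, so the sketch falls back to "not mounted" rather than failing.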
On Sun 20 Oct 2013 04:41:58 PM CDT, Carlos E. R. wrote:
Yes, but my meaning was different. You can not install on FAT a Linux system, however it is available as filesystem during install. Why is FAT available and not reiserfs? It makes no sense to me. Some one could try to install on it...
Hi. It's needed for UEFI booting. -- Cheers Malcolm °¿° SUSE Knowledge Partner (Linux Counter #276890) SLED 11 SP3 (x86_64) GNOME 2.28.0 Kernel 3.0.93-0.8-default up 11:15, 4 users, load average: 0.59, 0.66, 0.67 CPU Intel® B840@1.9GHz | GPU Intel® Sandybridge Mobile
On 2013-10-20 17:13, Malcolm wrote:
On Sun 20 Oct 2013 04:41:58 PM CDT, Carlos E. R. wrote:
Yes, but my meaning was different. You can not install on FAT a Linux system, however it is available as filesystem during install. Why is FAT available and not reiserfs? It makes no sense to me. Some one could try to install on it...
Hi. It's needed for UEFI booting.
Ah, yes... but only for that. New users could try to install root on it. -- Cheers / Saludos, Carlos E. R. (from 12.3 x86_64 "Dartmouth" at Telcontar)
On Fri, 18 Oct 2013 15:13:28 +0200 (CEST), Jan Engelhardt <jengelh@inai.de> wrote:
2. The installer complains about using XFS on the boot partition
Which seems correct. XFS cannot be combined with PBR if you choose to use that. In addition, there is a track record of GRUB1 failing to load the stage2 file (possibly due to delayed allocation).
grub2 includes an xfs driver. As long as a blocklist install is acceptable (openSUSE does it by default), it may work. I have not tested it myself.
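One way to confirm whether a given grub2 build ships the xfs filesystem module is to look for xfs.mod in its platform module directories. The paths below are assumptions based on common grub2 packaging (openSUSE installs modules under /usr/lib/grub2/<platform>/); adjust for your distribution:

```shell
#!/bin/sh
# Look for grub2's xfs filesystem module in the usual packaging locations.
# The directory paths are assumptions, not guaranteed for every distro.
found=no
for d in /usr/lib/grub2/*/ /usr/lib/grub/*/ /boot/grub2/*/; do
  [ -e "${d}xfs.mod" ] && found=yes
done
echo "xfs.mod present: $found"
```

If the module is present, grub2 can read the kernel and initrd from XFS directly, which is what makes a blocklist-free setup at least plausible.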
participants (18)
- Andrey Borzenkov
- Bruno Friedmann
- Carlos E. R.
- Carlos E. R.
- Christian Boltz
- Claudio Freire
- Cristian Rodríguez
- David Disseldorp
- Greg Freemyer
- Jan Engelhardt
- Ladislav Slezak
- Lars Müller
- Lew Wolfgang
- Malcolm
- Matthias G. Eckermann
- Per Jessen
- Stefan Seyfried
- Yamaban