Anton Aylward wrote:
Linda Walsh said the following on 03/10/2013 02:41 PM:
----- Sorry to bring this up so long after the fact, but I wanted to add some *counter* information to the idea of *decreasing* a reserve factor.
Instead of decreasing it, there are times when you might want to increase it -- it depends on *your priorities*.
If you need space and don't care so much about speed, shrinking is likely the best way to go. But if you want to keep your disks running in their maximum speed range: with 1-1.5TB partitions on a 24TB (12x2TB) RAID, I've noticed that free space should be kept around 20-25%, and that's on a B-tree-based file system (XFS).
Context please!
You were part of the original discussion -- I remember your name. In that discussion, the general consensus was that decreasing it to 3% free space on larger disks was probably fine. I was giving one specific (i.e. context provided) example, yet you go off as if I was saying "always do XXYZ".... ?!?!!? Did you not read the whole note before thinking of your counterpoints?
I can think of specifics:
* a file system devoted to things that don't change (except for updates), such as the hierarchies /usr/lib, /usr/share/man, /usr/share/icons and many others, so there is no churn, and in the limiting case these might be treated as 'read only' file systems. Certainly in a shared environment -- such as thin workstations that PXE boot[1], or that have only a minimal local file system (such as /etc) and NFS mount everything else -- administration would have it that everything except the home directory/roving share is effectively read-only.
---- How is that a file system that has been up to 90% full and back down again?
* a file system that contains structured data, such as database files, which are very large and whose internals change but whose size does not, so the issue of free space and allocation and the 'churn' of the free list does not matter. (Some FS based databases on RAID systems used by OLTP applications can be affected by the 'small write' problem.)
---- Not my scenario.
* a file system that has a very high 'churn' such as one supporting a development project - editing of source, rcs/subversion, compiling and testing. (Which may also run into the 'small write' problem with some RAID set-ups.)
---- Much closer.
In a high-churn situation such as the third case, or users' home directories -- files being created, edited and deleted at a fair clip -- the space allocator is going to try to 'optimise' for something. Some cases might want a complete rewrite of the files (think: using vi), so having ample free space so that a file's blocks can be 'grouped' is going to improve the performance not only of the allocator[2] but also of file access.
Right, and it sounded like a home user was asking this question earlier, no? That's not the advice he went off with.
Of course that makes little sense with small files such as the config files that litter /etc -- under 4K or whatever your allocator block size is set to. Oh, right, some B-tree FSs can pack small files into the 'tails' of other files.
AFAIK, only ReiserFS has that, and development on that FS doesn't seem to be progressing. EXT4 just added packing content into inodes, which XFS has had for 20 years... (The first white paper I read about its design, on replacing the EFS file system, talked about the decisions that were made -- and the features included -- and was dated 1993.)
Now, given all that, there is a big rider. Some disk managers such as LVM mean that while a file system is optimized, the blocks that make up the partition (Logical Volume in LVM parlance) may be subject to a scatter-gather. HO HO HO!
---- ?!? I'm pretty sure you are being vague and confusing. If your disks are allocated contiguously and you aren't trying to use any 'thin provisioning' (which, AFAIK, SuSE doesn't support anyway), the file system itself isn't moved around on disk. But how does that tie into file system optimization -- and what do you mean by that? I don't know of many file systems that support free-space defragmenting. Until the past few years, XFS was the only one that supported any type of defragmenting utility at all (other than Windows)... but it doesn't handle defragging free space.
That's because my max read/write rate is ~1GB/s, so even the smallest delays create a much more noticeable performance hit than they would on one disk -- i.e. the computer can do many computations while the I/O controller is doing a 64K read/write on one disk, but the time it has for calculations on a 12-spindle RAID is a decimal order of magnitude less. So the CPU needs to search through 12x the data in 1/10th the time to maintain max I/O rates. The block allocator starts having problems finding large contiguous free space on file systems over 75% full, and if (like me) you've gone up into the low 90s percent usage and back down, you've got files spread throughout the free area. (I could rebuild the FS from scratch and that would give some more speed, until I let it get too full again.)
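(Aside: the 20-25% headroom rule above is easy to watch for. A minimal Python sketch, assuming a mount point of "/" and a 20% floor purely for illustration -- pick your own numbers:)

```python
import os

# Warn when a filesystem drops below a chosen free-space floor, per the
# observation above that the allocator struggles past ~75-80% full.
# FLOOR and the mount point are illustrative assumptions, not advice.
FLOOR = 0.20

st = os.statvfs("/")
free_frac = st.f_bavail / st.f_blocks   # space a non-root user can still use
if free_frac < FLOOR:
    print(f"warning: only {free_frac:.1%} free; allocator may start fragmenting")
else:
    print(f"ok: {free_frac:.1%} free")
```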
Apart from that, your reasoning is correct, so long as we are considering the 'high churn rate' type of file systems.
If your HD were the speed of a floppy, you wouldn't notice a speed hit even at 99% usage, because the I/O was so slow; but the faster your HD subsystem, the more you'll notice the lag caused by the block allocator and the buffering system. Many people with RAIDs have noticed a 30% or greater speed boost by avoiding the buffer cache. As long as CPU and memory speeds stayed 100-1000 times faster than disk, a lot could be done to optimize I/O in memory, but with CPU speeds going down (to conserve power) and disk speeds going up with the moves to SSDs and RAIDs, those same algorithms won't work as well at the limits...
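(For what it's worth, "avoiding the buffer cache" usually means O_DIRECT. A hedged Python sketch -- the 4K alignment is an assumption, and it falls back to buffered I/O because not every filesystem accepts O_DIRECT:)

```python
import mmap
import os
import tempfile

BLOCK = 4096  # assumed alignment; the real requirement depends on fs/device

def direct_write(path, buf):
    """Write buf bypassing the page cache when O_DIRECT is available."""
    direct = getattr(os, "O_DIRECT", 0)       # 0 where unsupported (e.g. macOS)
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | direct, 0o600)
    try:
        return os.write(fd, buf)              # O_DIRECT needs an aligned buffer
    except OSError:
        os.close(fd)
        fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)  # buffered fallback
        return os.write(fd, buf)
    finally:
        os.close(fd)

aligned = mmap.mmap(-1, BLOCK)                # anonymous maps are page-aligned
tmp = tempfile.NamedTemporaryFile(delete=False)
tmp.close()
n = direct_write(tmp.name, aligned)
os.unlink(tmp.name)
```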
Again, there are a lot of unstated assumptions. There are various types of caching to do with a file system. Oh, right, you can say "it's all data", but that's unhelpful.
---- There are no unstated assumptions. Those are statistical trends. Bring the number of CPUs up to 256 and the GHz down to 1, as on some new Intel machines, then toss in a stack of RAID10 with 30 spindles, and you are easily going to be pushing the limits of the machine's allocators... With single SSDs *boasting* hundreds of MB/s, how do they hold up in RAIDs? Some of the newer enterprise SSDs maintain rated speeds without TRIM (though the rated speeds are not in the highest categories).
Yes, "a lot can be done to optimise I/O in memory", and many of the algorithms are independent of the file system. Whether things like inode caching and name-segment caching are used is one such factor. See http://makarevitch.org/rant/bufchint.html, and note the part on how an application can advise the kernel about its expected FS use.
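(For example, a rough Python sketch of that kind of advice via posix_fadvise(2) -- Linux-specific, so it's guarded; the file and chunk size are just illustrative:)

```python
import os
import tempfile

# An application hinting the kernel about its access pattern, as the
# linked rant describes.  Hypothetical scratch file, sequential scan.
path = tempfile.mktemp()
with open(path, "wb") as f:
    f.write(b"x" * 1_000_000)

fd = os.open(path, os.O_RDONLY)
try:
    if hasattr(os, "posix_fadvise"):
        # We will read sequentially: the kernel can read ahead aggressively.
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_SEQUENTIAL)
    total = 0
    while chunk := os.read(fd, 64 * 1024):    # 64K reads, as discussed above
        total += len(chunk)
    if hasattr(os, "posix_fadvise"):
        # Done with it: let the kernel drop these pages from the cache.
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
finally:
    os.close(fd)
    os.unlink(path)
```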
The day you can get all the apps to cooperate and tell the OS about their projected I/O usage is the day I'll never need to work again (or hell freezes over, something like that... ;-))... App writers are hard pressed to know themselves how their app will behave in the field under customer load -- let alone predict it well enough to tell the OS about it...
I note, also, that you've omitted a lot of IO tuning, such as the disk elevator algorithm, queue depth and other matters.
Um... The point was how much free space to leave... not a full dissertation on all factors affecting disk speed and optimization.
The [home] user was given information that you now say is wrong; i.e. they
had their free space down to 3%, and you said they had plenty of space left...
-------- Original Message --------
Subject: Question re filesystem reserved block percent
Date: Sun, 10 Feb 2013 17:02:44 -0500
From: Anton Aylward
Advice, please . . .
The default filesystem reserved blocks is 5%. IIRC that goes back a long time to when much smaller drives were in use. Is there a formula or rule-of-thumb now for today's large drives/partitions? I have quite a few 100-300GB partitions which I have tuned down to 3%, but it still seems I'm wasting a lot of space. Suggestions?
Run 'df' on the partition(s) that/those filesystem(s) is/are on. Unless you're really s[h]ort of space, then you're not wasting space -- you've still got plenty to spare.
----------------------------------------
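(For the record, the reserved pool that question is about is visible from userland. A small Python sketch, with "/" as a purely illustrative mount point:)

```python
import os

# statvfs() exposes the reserved-blocks pool: f_bfree counts all free
# blocks, f_bavail only those a non-root user may allocate; the gap is
# (roughly) the tune2fs -m percentage of an ext* filesystem.
st = os.statvfs("/")
reserved = st.f_bfree - st.f_bavail
reserved_pct = 100.0 * reserved / st.f_blocks
print(f"reserved for root: {reserved} blocks ({reserved_pct:.1f}%)")
```

On ext2/3/4, lowering the pool is `tune2fs -m 3 /dev/sdXN` (it can be changed while mounted); on filesystems without a reserved pool the gap simply reads as zero.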
I keep saying Context is Everything and there is no 'one size fits all' solution.
Oh, and "time changes everything". Who knows what BtrFS will be delivering a year from now ...
Yeah... who knows? The claims sound a bit too good to be true... and usually when they sound that way, they are, but occasionally there are diamonds in the rough...
--
To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org
To contact the owner, e-mail: opensuse+owner@opensuse.org