On 02/13/2016 07:51 PM, Olav Reinert wrote:
> Because the cache is small, it only serves to speed up random-access reads of the active working set, and to speed up most writes (writeback cache mode).
> The net effect is that, even though all data (including user home directories) is stored on a pair of spinning disks, the cache manages to hide that fact well enough to make the machine feel very snappy in use - it's close enough to a real SSD for my needs.
> Sorry, I can't give any concrete numbers to back up my experiences.
It's worth considering some aspects of the logic of caching.

What might be an extreme case is one I saw in the last century and discussed at the North York UNIX Users Group, back in the days when SCO UNIX was the dominant system for SMB and personal use as an alternative to Microsoft, certainly if one wanted to support a multi-user environment or run a "proper" database system. One SMB manager described the use of a disc accelerator board, which was really a cache with deferred writes: it released the system call immediately and wrote the block out when convenient.

Now historically, one of the changes from classical V6 UNIX to classical V7 UNIX back in the 1970s was that the file system changed the order of writes to preserve structural integrity: data was written to the disk before the metadata and pointers. I won't go into why that ensured that FSCK could work properly; google for that yourself. But it is important that it was done that way. The use of this board could not guarantee that ordering. A CompSci type from UofT warned of this. A couple of months later the SMB guy reported a corrupted and unrecoverable disk. There was an "I told you so" feeling.

Committing to disk, and committing in the right order, is important. A write cache that "accelerates" by releasing the system call while deferring the actual transfer from the cache to storage simply cannot guarantee the integrity of the file system in the event of, for example, a power loss. Certainly an SSD, which will retain content, can ameliorate this, if the supporting software is there to do the job. It gets back to the issue of the window/timing. But there are a few too many "ifs" in this to inspire confidence in an old cynic like me.

Caching reads is also something that should be looked at critically. All in all, the internal algorithms for memory allocation and mapping can probably make better use of memory than if it is externalized, such as the memory on those accelerator boards.
This is true for a number of reasons, but most of all because the OS knows better what it needs to cache, based on process activity. In the case of "virtual memory", more memory means page-out happens less often, and retrieval from the page-out queue is in effect a cache operation. Late-model *NIX uses memory mapping for just about all open files, or can. Dynamically linked shared libraries are memory-mapped files loaded on demand. Such pages may be on the out queue, but they never need to be re-written.

Adding RAM is often the cheapest way to improve performance with a *NIX system that can make use of it. Again, there are a few "ifs" in that. This may not be your bottleneck; profiling is important. You may also come up against hardware limits: many PC motherboards have too few memory slots, or their chipset won't allow the use of larger-capacity memory cards. But before you spend >$50 on an SSD
http://www.amazon.com/SanDisk-6-0GB-2-5-Inch-Height-SDSSDP-064G-G25/dp/B007ZWLRSU/ref=sr_1_3?ie=UTF8&qid=1455464006&sr=8-3&keywords=64g+ssd
consider spending the same on RAM
http://www.amazon.com/HP-500662-B21-SDRAM-Server-Memory/dp/B0029L0P9Y/ref=sr_1_4?s=pc&ie=UTF8&qid=1455464146&sr=1-4&keywords=ddr3+ram
I'm sure you can find cheaper SSDs and more expensive RAM. Speed, capacity, and scale all factor in.

Data pages, mapped files, are a yes-no-maybe situation. A program might need to read config files at start-up, but those are never read again and never written. Caching them is a waste of time, but how do you tell the cache that? The interface to the 'rotating rust' is also a cache in read mode, in that it reads a whole track at once. File system abstractions don't always permit this to be useful. It's all very well to have 4K pages and store files as contiguous blocks, but in reality a file access involves metadata access as well, and we're talking about a multi-user system with asynchronous demands.
Sometimes I think the design of file systems such as ReiserFS, where small files can have their data in the metadata block, makes sense, especially when dealing with small start-up config files.

Anyway, we are back to the situation where only the application knows that this "data" file is a read-once-then-discard start-up config. Even the OS doesn't know there is no need to cache the mapped pages. They don't even need to be put on the out queue; they can be marked 'free' the moment the file is closed. Oh? Perhaps they are? But how does the OS communicate this to the SSD bcache or the disk controller buffer?

The discipline of the hierarchy of organization, standardization of interfaces, and separation of function may have led to many good things, but in the old days of the IBM1040 and the like, where the application had control all the way down the stack to the disk, a lot of optimization was possible that now simply isn't, because the detailed information isn't available from the higher levels all the way down.

--
A: Yes.
> Q: Are you sure?
>> A: Because it reverses the logical flow of conversation.
>>> Q: Why is top posting frowned upon?
--
To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org
To contact the owner, e-mail: opensuse+owner@opensuse.org