On Tue, Jun 11, 2013 at 7:16 AM, Roger Oberholtzer <roger@opq.se> wrote:
Despite being quiet on this, we have not solved the problem. We have:
* Tried other file systems (e.g., ext4).
* Tried faster "server-grade" SATA disks.
* Tried SATA3 interface as well as SATA2.
The same thing happens. Periodically, write calls are blocking for 4-5 seconds instead of the usual 20-30 msecs.
I have seen one unexpected thing: when running xosview during all this, the MEM usage shows the cache use slowly growing. The machine has 32 GB of RAM. The cache use just grows and grows as the file system is written to. Here is the part I don't get:
* If I close all apps that have a file open on the file system, the cache use remains.
* If I run the 'sync(1)' command, the cache use remains. I would have thought that the cache would be freed as there is nothing left to cache. If not immediately, over a decent amount of time. But this is not the case.
* Only when I unmount the file system does the cache get freed. Immediately.
Why would the cache grow and grow? Since the delay, when it happens, grows and grows, I get the feeling that this file system cache in RAM is slowly getting bigger and bigger, and each time it needs to be flushed, it takes longer and longer. If the cache is being emptied at some reasonable point, why would it continue to grow? Remember that for each mounted file system there is one process writing to a single file. The disk usage remains 100% constant in terms of what is sent to be written.
Is there some policy or setting that controls how the file system deals with file system cache in RAM? More specifically, is there any way to limit its size for a file system?
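For reference, the kernel's writeback thresholds are exposed as files under /proc/sys/vm. The sketch below only reads them. Note that these knobs are system-wide, not per file system, so they do not directly answer the per-file-system part of the question. The function name and the choice of Python are assumptions made for illustration.

```python
import os

# Global (not per-file-system) writeback tunables under /proc/sys/vm.
KNOBS = (
    "dirty_background_ratio",
    "dirty_ratio",
    "dirty_background_bytes",
    "dirty_bytes",
    "dirty_expire_centisecs",
    "dirty_writeback_centisecs",
)


def read_writeback_knobs(vm_dir="/proc/sys/vm"):
    """Return {knob name: integer value} for every knob that can be read."""
    values = {}
    for name in KNOBS:
        try:
            with open(os.path.join(vm_dir, name)) as f:
                values[name] = int(f.read().strip())
        except (OSError, ValueError):
            # Knob missing or unreadable on this kernel: skip it.
            continue
    return values


if __name__ == "__main__":
    for name, value in read_writeback_knobs().items():
        print(f"{name} = {value}")
```

Lowering the dirty thresholds makes the kernel start writeback earlier and in smaller batches. Whether that evens out the write latency seen here is untested.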
Is there a way to see how much of the RAM cache for a file system is actually containing data waiting to be flushed?
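One place to look is /proc/meminfo, where the Dirty and Writeback fields report data waiting to be flushed and data currently being written. The sketch below parses those fields. The figures are system-wide rather than per file system, and the function names are hypothetical.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field name: integer value}.

    Most values are in kB; a few fields (e.g. HugePages_Total) are plain counts.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def pending_writeback_kb(fields):
    """Return (Dirty, Writeback) in kB, defaulting to 0 if a field is absent."""
    return fields.get("Dirty", 0), fields.get("Writeback", 0)


if __name__ == "__main__":
    with open("/proc/meminfo") as f:
        info = parse_meminfo(f.read())
    dirty, writeback = pending_writeback_kb(info)
    print(f"Cached:    {info.get('Cached', 0)} kB")
    print(f"Dirty:     {dirty} kB")
    print(f"Writeback: {writeback} kB")
```

If Cached keeps growing while Dirty stays small, most of the cache is clean data that has already been written out and is simply being kept around.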
I have seen some reports that using O_SYNC when opening the file makes the write times more even. I guess I could open() a file with this, and then fdopen() it. fcntl() seems not to support O_SYNC...
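The open()-then-fdopen() idea can be sketched as follows, here in Python rather than C for brevity. The helper name and the file name are hypothetical, and whether O_SYNC actually evens out the write times in this setup is untested.

```python
import os


def open_sync(path, mode=0o644):
    """Open path for writing with O_SYNC and wrap the descriptor in a file object."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | os.O_SYNC
    fd = os.open(path, flags, mode)
    # buffering=0 so each write() call goes straight to the descriptor
    # instead of sitting in a user-space buffer first.
    return os.fdopen(fd, "wb", buffering=0)


if __name__ == "__main__":
    # "capture.dat" is a placeholder output file.
    with open_sync("capture.dat") as out:
        out.write(b"\0" * 65536)
```

With O_SYNC, each write returns only after the data has reached the device, so the cost is paid on every call rather than in occasional large flushes.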
Roger,

O_SYNC does not bypass the cache; it just flushes continuously, which is not the same as drop_caches. You need O_DIRECT to bypass the cache.

If you want a write buffer and not a cache, why don't you just do that? A very basic attempt would be:

- create a named pipe per output file
- dd if=named_pipe of=file oflag=direct bs=64K

In your program, have it create the named pipe, then launch dd as required. Hopefully, when you close the named pipe, dd will see that and write out the last partial block.

When I've actually had to have a dedicated buffer in a real scenario, I used mbuffer instead of dd:

http://www.maier-komor.de/mbuffer.html

BTW, mbuffer is in the openSUSE distribution.

My use was writing to tape, and I wanted to queue up a GB of data before I started sending any of it to the tape, so I was able to have just one invocation of mbuffer, but I think it would work for you as well.

If mbuffer doesn't currently use the O_DIRECT flag in its open call to the destination file, then you should easily be able to add it, or whatever other customizations you need. After all, you have the source!

Greg
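A minimal sketch of the named-pipe approach described above, written in Python for brevity. The function names and file names are hypothetical. The iflag=fullblock option is an addition not in the original command: it is a GNU dd flag that makes dd gather full 64K blocks from the pipe, since short reads could otherwise produce unaligned writes that O_DIRECT may reject.

```python
import os
import subprocess


def dd_command(fifo_path, dest_path, block_size="64K"):
    """Build the dd argument list: read the FIFO, write dest with O_DIRECT."""
    return [
        "dd",
        f"if={fifo_path}",
        f"of={dest_path}",
        "oflag=direct",
        f"bs={block_size}",
        "iflag=fullblock",  # assumption: GNU dd; not part of the original command
    ]


def start_writer(fifo_path, dest_path):
    """Create the FIFO, launch dd on it, and return (dd process, pipe file)."""
    os.mkfifo(fifo_path)
    proc = subprocess.Popen(dd_command(fifo_path, dest_path))
    # Opening a FIFO for writing blocks until dd has opened it for reading.
    pipe = open(fifo_path, "wb", buffering=0)
    return proc, pipe


if __name__ == "__main__":
    # Placeholder names; the destination must be on a file system that supports O_DIRECT.
    proc, pipe = start_writer("capture.fifo", "capture.dat")
    try:
        pipe.write(b"\0" * 65536)
    finally:
        pipe.close()  # EOF on the FIFO lets dd finish and exit
        proc.wait()
        os.unlink("capture.fifo")
```

The FIFO itself only holds a small kernel buffer, so this mainly moves the blocking into dd. A tool like mbuffer adds a large user-space buffer on top of the same idea.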