On Tue, 2013-06-11 at 12:14 -0400, Greg Freemyer wrote:
On Tue, Jun 11, 2013 at 7:16 AM, Roger Oberholtzer <roger@opq.se> wrote:
Despite being quiet on this, we have not solved the problem. We have:
* Tried other file systems (e.g., ext4)
* Tried faster "server-grade" SATA disks
* Tried the SATA3 interface as well as SATA2
The same thing happens. Periodically, write calls are blocking for 4-5 seconds instead of the usual 20-30 msecs.
I have seen one unexpected thing: when running xosview during all this, the MEM display shows the cache use slowly growing. The machine has 32 GB of RAM. The cache use just grows and grows as the file system is written to. Here is the part I don't get:
* If I close all apps that have a file open on the file system, the cache use remains.
* If I run the sync(1) command, the cache use remains. I would have thought that the cache would be freed as there is nothing left to cache, if not immediately then over a decent amount of time. But this is not the case.
* Only when I unmount the file system does the cache get freed. Immediately.
Why would the cache grow and grow? Since the delay, when it happens, grows and grows, I get the feeling that this file system cache in RAM is slowly getting bigger and bigger, and each time it needs to be flushed, it takes longer and longer. If the cache is being emptied at some reasonable point, why would it continue to grow? Remember that for each mounted file system there is one process writing to a single file. The disk usage remains 100% constant in terms of what is sent to be written.
Is there some policy or setting that controls how the file system deals with file system cache in RAM? More specifically, is there any way to limit its size for a file system?
Is there a way to see how much of the RAM cache for a file system actually contains data waiting to be flushed?
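The closest thing I have found so far is /proc/meminfo: the Dirty and Writeback lines show how many kB are waiting to hit the disk. For the policy question, the vm.dirty_* sysctls under /proc/sys/vm (dirty_background_ratio, dirty_ratio, and friends) look like the relevant knobs, though they appear to be global rather than per file system. A minimal sketch of a poller for those two fields (it assumes only the standard field names):

  /* Print the Dirty and Writeback counters from /proc/meminfo once a
   * second. Stop with Ctrl-C. */
  #include <stdio.h>
  #include <string.h>
  #include <unistd.h>

  int main(void)
  {
      char line[128];

      for (;;) {
          FILE *fp = fopen("/proc/meminfo", "r");
          if (!fp) {
              perror("fopen /proc/meminfo");
              return 1;
          }
          while (fgets(line, sizeof line, fp)) {
              if (strncmp(line, "Dirty:", 6) == 0 ||
                  strncmp(line, "Writeback:", 10) == 0)
                  fputs(line, stdout);
          }
          fclose(fp);
          sleep(1);
      }
  }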
I have seen some reports that using O_SYNC when opening the file makes the write times more even. I guess I could open() a file with this flag and then fdopen() it. fcntl() seems not to support O_SYNC...
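Something along these lines, I imagine (an untested sketch):

  /* Open with O_SYNC at the file-descriptor level, then wrap the fd in
   * a stdio stream. Untested sketch; error handling is minimal. */
  #include <fcntl.h>
  #include <stdio.h>
  #include <unistd.h>

  FILE *fopen_sync(const char *path)
  {
      int fd = open(path, O_WRONLY | O_CREAT | O_SYNC, 0644);
      FILE *fp;

      if (fd < 0)
          return NULL;
      fp = fdopen(fd, "w");   /* stream writes are now synchronous */
      if (!fp)
          close(fd);
      return fp;
  }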
Roger,
O_SYNC does not bypass the cache; it just flushes continuously. It is not the same as drop_caches. You need O_DIRECT to bypass the cache.
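Note that O_DIRECT comes with alignment rules: the buffer address and the transfer size generally have to be multiples of the device's logical block size (512 bytes classically, 4096 on newer drives). A rough skeleton, with sizes that are illustrative rather than tuned:

  /* Skeleton of an O_DIRECT writer. Sketch only: assumes 4096-byte
   * alignment suits the underlying device. */
  #define _GNU_SOURCE            /* for O_DIRECT */
  #include <fcntl.h>
  #include <stdlib.h>
  #include <string.h>
  #include <unistd.h>

  #define ALIGN 4096
  #define CHUNK (64 * 1024)      /* must remain a multiple of ALIGN */

  int write_direct(const char *path, long chunks)
  {
      void *buf;
      long i;
      int fd = open(path, O_WRONLY | O_CREAT | O_DIRECT, 0644);

      if (fd < 0)
          return -1;
      if (posix_memalign(&buf, ALIGN, CHUNK)) {  /* aligned buffer required */
          close(fd);
          return -1;
      }
      memset(buf, 0, CHUNK);
      for (i = 0; i < chunks; i++)
          if (write(fd, buf, CHUNK) != CHUNK)
              break;
      free(buf);
      close(fd);
      return i == chunks ? 0 : -1;
  }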
If you want a write buffer and not a cache, why don't you just do that? A very basic attempt would be:
I think everyone is misunderstanding the situation. I am not doing anything with, expecting, or manipulating a cache. The cache I see is a totally private thing being done by the OS. The existence of the cache is not the problem. In fact, if there were no cache I would think something was wrong.

The problem I am seeing is that the cache grows and grows until it eats all my memory. In addition, as the cache grows, the periodic writes to disk take longer and longer. 100% reproducible.

To be clear: I do not ask for, manipulate, or in any other way influence the cache through any direct action in my application. I am only writing a single file from a single process. This file is growing at 25 MB a second (more or less). The file is opened with fopen(), written to, and then closed with fclose(). Files can be big, but never more than 2 GB each.

My initial thought was that the file system was doing something that led to the longer write delays. So I asked about XFS, which is the file system we use for this. As I later reported, the issue seems to exist for all block devices (ext4 as well, but not with /dev/null as the file).

I understand that the cache is there so I can possibly read data that has been recently written. However, I do not see how the kernel can just grow this cache until my memory is gone, especially when the bigger cache also results in significantly and increasingly longer delays in write completions.

The workaround that seems to correct the situation is to run this:

  while [ 1 ]
  do
      echo 1 > /proc/sys/vm/drop_caches
      sleep 60
  done &

Obviously a brute-force approach that is really only possible on my system because it does not seem to mess up general usage. The rate of 60 seconds is arbitrary, but each time the loop runs, the cache has grown to almost 3 GB.

I wrote a small app that simulates the problem (outlined below). I will verify that it really does reproduce it, and then I can post the C source (very tiny) if anyone wants to see what happens on their system.
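In outline it is nothing more than a timed write loop, roughly like this (the chunk size and the 100 ms reporting threshold are arbitrary; link with -lrt on older glibc):

  /* Write fixed-size chunks in a loop and report any call that blocks
   * unusually long. Sketch only; sizes and threshold are arbitrary. */
  #include <stdio.h>
  #include <string.h>
  #include <time.h>

  #define CHUNK (256 * 1024)

  static double ms_between(const struct timespec *t0, const struct timespec *t1)
  {
      return (t1->tv_sec - t0->tv_sec) * 1e3 +
             (t1->tv_nsec - t0->tv_nsec) / 1e6;
  }

  int main(int argc, char **argv)
  {
      static char buf[CHUNK];
      struct timespec t0, t1;
      FILE *fp;
      long i;

      if (argc < 2 || !(fp = fopen(argv[1], "w"))) {
          perror("fopen");
          return 1;
      }
      memset(buf, 'x', sizeof buf);
      for (i = 0; ; i++) {
          clock_gettime(CLOCK_MONOTONIC, &t0);
          if (fwrite(buf, 1, sizeof buf, fp) != sizeof buf)
              break;
          clock_gettime(CLOCK_MONOTONIC, &t1);
          if (ms_between(&t0, &t1) > 100.0)
              fprintf(stderr, "write %ld blocked for %.0f ms\n",
                      i, ms_between(&t0, &t1));
      }
      fclose(fp);
      return 0;
  }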
- create a named pipe per output file
- dd if=named_pipe of=file oflag=direct bs=64K
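The producer side then treats the FIFO like an ordinary file, with dd doing the O_DIRECT writing so the application never deals with the alignment rules itself. Something like this (the paths are made up):

  /* Producer side of the named-pipe trick: create a FIFO and write to
   * it as if it were a plain file while dd drains it, e.g.:
   *   dd if=/tmp/capture.pipe of=/data/capture.bin oflag=direct bs=64K
   * Sketch only; error handling is minimal. */
  #include <errno.h>
  #include <stdio.h>
  #include <sys/stat.h>
  #include <sys/types.h>

  int main(void)
  {
      const char *fifo = "/tmp/capture.pipe";
      FILE *fp;

      if (mkfifo(fifo, 0644) != 0 && errno != EEXIST) {
          perror("mkfifo");
          return 1;
      }
      fp = fopen(fifo, "w");    /* blocks until dd opens the other end */
      if (!fp) {
          perror("fopen");
          return 1;
      }
      fputs("data goes here\n", fp);   /* real code streams ~25 MB/s */
      fclose(fp);
      return 0;
  }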
This is an interesting approach to getting direct I/O. I will have to file this for future reference.

Yours sincerely,

Roger Oberholtzer

Ramböll RST / Systems
Office: Int +46 10-615 60 20
Mobile: Int +46 70-815 1696
roger.oberholtzer@ramboll.se

Ramböll Sverige AB
Krukmakargatan 21
P.O. Box 17009
SE-104 62 Stockholm, Sweden
www.rambollrst.se