Carlos E. R. wrote:
On 2018-05-21 22:27, Linda Walsh wrote:
Carlos E. R. wrote:
Try playing with options such as "oflag=direct",
I'll second this part, but seriously, 4k at a time?? ---- Do you have to write such small amounts?
dd if=/dev/zero of=foo bs=4k count=1K oflag=direct
4194304 bytes (4.2 MB, 4.0 MiB) copied, 0.0689802 s, 60.8 MB/s   <<4k blocksize

dd if=/dev/zero of=foo bs=4M count=1K oflag=direct
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 5.24457 s, 819 MB/s   <<4M blocksize

dd if=/dev/zero of=foo bs=8M count=512 oflag=direct
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 5.04259 s, 852 MB/s   <<8M blocksize

dd if=/dev/zero of=foo bs=16M count=256 oflag=direct
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 4.90653 s, 875 MB/s   <<16M blocksize
16M is the sweet spot on my system. Yours may vary.
Well, with a small block and direct writing to disk, the kernel cache is disabled and speed suffers. Increasing the size of the write block acts like having a cache, but in the application instead of in the kernel.
Not exactly. Increasing the write size decreases *overhead*, just like with sending packets through a network. If you send one 1.5kB packet and wait for it to be transmitted and acknowledged by the other end, you get very slow performance due to the per-packet overhead, vs. if you issue one large write and only need an acknowledgment of the whole thing having been received, you only wait for one reply. Whether you are writing to disk or to a network, the overhead of handling each packet reduces throughput.

It also depends on how fast the user's application generates data. It generates video in real time and can't be paused. If it only needs 2.8MB/s, any of these methods would work, but if it needed 100 times that, then writing 4k blocks makes no sense and wouldn't work even with oflag=nocache. Nocache tells the OS that it *may* throw away the data -- it doesn't force the data to be thrown away. In writing a 428GB file (then my disk filled), all of memory was filled long before the disk was, and the write only averaged 145MB/s overall. Compare that to 875MB/s when no OS caching was used. Using synchronous I/O, which does force the memory to be released at each write, reduced speed to 22MB/s (with 4k blocks).

In all of these cases, there is no reason to use a 4K I/O size, which with a RAID may result in sub-optimal I/O. With RAID5- or RAID6-based RAID, the results could be abysmal. Even over a local network, a 4K I/O size can result in less than 10% of optimal bandwidth usage. My network Samba I/O test shows this -- the test only measures network speed, reading from /dev/zero locally and writing to /dev/null on the far end, so no file-I/O buffering is involved. With a blocksize of 4k:
bs=4096 bin/iotest
Using bs=4.0K, count=524288, iosize=2.0G
R:2147483648 bytes (2.0GB) copied, 93.7515 s, 21.8MB/s
W:2147483648 bytes (2.0GB) copied, 92.1664 s, 22.2MB/s
(vs. its default 16M I/O size):
bin/iotest
Using bs=16.0M, count=128, iosize=2.0G
R:2147483648 bytes (2.0GB) copied, 3.23306 s, 633MB/s
W:2147483648 bytes (2.0GB) copied, 7.37567 s, 278MB/s
The thing that hurts you the most with small block sizes is the per-block overhead.
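Working the numbers from that Samba test (a rough model, assuming each request costs a fixed per-I/O overhead plus transfer time): at 4k and 21.8MB/s, each I/O takes about 4096 / 21,800,000 ≈ 188 microseconds, while the wire time for 4k at the 633MB/s the link can do is only ~6.5 microseconds. So roughly 180 microseconds of every request is fixed overhead, which by itself caps 4k I/O near 22MB/s. A 16M request pays that same ~180 microseconds once per 16MB, so the overhead becomes noise.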
As for your tests:
# time dd if=/dev/zero of=file.txt count=2096576 bs=4096
8587575296 bytes (8.6 GB, 8.0 GiB) copied, 42.8592 s, 200 MB/s

# time dd if=/dev/zero of=file.txt count=1096576 bs=4096
4491575296 bytes (4.5 GB, 4.2 GiB) copied, 2.69905 s, 1.7 GB/s
--- those are no good -- you are writing to RAM, which eventually gets full and has to flush to disk. The 4.5GB run fit in the page cache (hence 1.7GB/s); the 8.6GB run didn't, so it ran at something closer to disk speed. It's the pause while flushing to disk that is killing you.
use direct and it won't buffer into memory first.
Not exactly,
--- Um...yes. EXACTLY. IF you use direct, it won't buffer it into file-buffer memory first.
because direct disables the cache and you see how that impacts small file writing
Ok... it impacts small file writes, but how does that support your disagreeing with the statement that direct I/O turns off buffering? Besides, if you are writing video data to disk as fast as possible -- say standard HD, 1920x1080 at 4 bytes/pixel and 60 frames/s, uncompressed -- it would take about 475MB/s (4K/UHD at 3840 x 2160, ~8.3 megapixels/frame, would be 4x that). There is no way to do that with 4k writes, and it wouldn't make any sense. Writing whatever size is optimal for your disks would make sense. On my 10+ year old setup, that's a 16M I/O size.
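(Checking the arithmetic on that HD figure: 1920 x 1080 pixels x 4 bytes x 60 frames/s = 497,664,000 bytes/s, i.e. about 475MiB/s.)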
Or, instead of "direct", try "nocache", which uses the cache, then empties it, which thus forces writing to disk.
On a uniprocessor machine that would likely not make much difference vs. using 'sync':
dd if=/dev/zero of=foo bs=4k count=1k oflag=sync
4194304 bytes (4.2 MB, 4.0 MiB) copied, 0.172128 s, 24.4 MB/s
But on a multi-cpu machine, nocache allows the cache to be released in the background:
dd if=/dev/zero of=foo bs=4k count=1k oflag=nocache
1024+0 records in
1024+0 records out
4194304 bytes (4.2 MB, 4.0 MiB) copied, 0.0348474 s, 120 MB/s
Or, write a file 10 times bigger than the RAM. There may be a 10% error in the measurement.
The code of the application can directly call a flush of each file when it is closed. I do not know how to emulate the flags that dd can use: direct, dsync, nocache... If I were the developer of that code I would try to find out and experiment. Maybe a flush for every file has a big impact.
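For reference, on Linux those dd flags map to ordinary syscalls, so an application can get the same effects itself: oflag=direct is open(2) with O_DIRECT, dsync/sync are O_DSYNC/O_SYNC, and nocache is roughly an fdatasync(2) followed by posix_fadvise(2) with POSIX_FADV_DONTNEED. A minimal sketch (untested; the file name, block size, and count here are just illustrative):

    #define _GNU_SOURCE              /* for O_DIRECT */
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        size_t bs = 16 * 1024 * 1024;     /* use your disk's sweet spot */
        void *buf;

        /* O_DIRECT requires sector-aligned buffers, so align regardless. */
        if (posix_memalign(&buf, 4096, bs))
            return 1;
        memset(buf, 0, bs);

        /* dd oflag=direct -> add O_DIRECT here (bypasses the page cache);
           dd oflag=dsync  -> add O_DSYNC  (data flushed on every write);
           dd oflag=sync   -> add O_SYNC   (data plus metadata).          */
        int fd = open("foo", O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0) { perror("open"); return 1; }

        for (int i = 0; i < 256; i++)     /* 256 x 16M = 4G, as above */
            if (write(fd, buf, bs) != (ssize_t)bs) { perror("write"); return 1; }

        /* dd oflag=nocache, roughly: flush the dirty pages, then tell the
           kernel it MAY drop them -- a hint, not a guarantee.            */
        fdatasync(fd);
        posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);

        close(fd);
        free(buf);
        return 0;
    }

Note that with O_DIRECT the I/O size must also be a multiple of the device sector size, which is why the buffer is allocated aligned even though this sketch opens the file normally.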
--- Depends on filesize. What's more important is keeping up with your data rate and having HW that can handle it.
From the above, real-time, uncompressed 4K video would take about 1.85GB/s (1898MB/s). That would take a large RAID, maybe with SSDs.
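(Checking that arithmetic too: 3840 x 2160 pixels x 4 bytes x 60 frames/s = 1,990,656,000 bytes/s, which is 1898MiB/s, or about 1.85GiB/s.)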
Small file writes kill performance. Tbird and FF use 4K I/O on everything -- IMAP, transfer to sendmail, local I/O -- and they all get doggy performance on large files. (Those are the 32-bit versions, BTW -- dunno what the 64-bit versions do.)