On Tuesday 2023-05-23 09:18, Andrei Borzenkov wrote:
Date: Tue, 23 May 2023 09:18:19
From: Andrei Borzenkov <arvidjaar@gmail.com>
To: users@lists.opensuse.org
Subject: Re: "background idle" syncing of filesystems
On Tue, May 23, 2023 at 8:53 AM Paul Neuwirth <mail@paul-neuwirth.nl> wrote:
On Tuesday 2023-05-23 06:35, Andrei Borzenkov wrote:
Date: Tue, 23 May 2023 06:35:45
From: Andrei Borzenkov <arvidjaar@gmail.com>
To: users@lists.opensuse.org
Subject: Re: "background idle" syncing of filesystems
On 23.05.2023 07:02, Paul Neuwirth via openSUSE Users wrote:
And I could recreate the issue after 6 days of uptime with high I/O.
from /proc/vmstat: nr_dirty 571064
That is about 2.2 GiB (571064 pages of 4 KiB each).
nr_dirty_threshold 24894434
nr_dirty_background_threshold 12432019
from top:
MiB Mem : 515896.2+total, 6047.250 free, 15834.02+used, 501647.4+buff/cache
and finally:
# date; sync; date
Tue May 23 05:49:06 AM CEST 2023
Tue May 23 05:59:44 AM CEST 2023
-> syncing took > 10 minutes.
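The /proc/vmstat counters quoted above are page counts, not bytes. As a minimal sketch, assuming a 4 KiB page size (common on x86-64; `os.sysconf("SC_PAGE_SIZE")` reports the actual value), the following converts such counters to GiB. The function name `dirty_counters_gib` is made up for this illustration.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages; verify with os.sysconf("SC_PAGE_SIZE")

def dirty_counters_gib(vmstat_text, page_size=PAGE_SIZE):
    """Return the nr_dirty* counters found in /proc/vmstat-style text, in GiB."""
    result = {}
    for line in vmstat_text.splitlines():
        parts = line.split()
        # /proc/vmstat lines have the form "<name> <value>"
        if len(parts) == 2 and parts[0].startswith("nr_dirty"):
            result[parts[0]] = int(parts[1]) * page_size / 2**30
    return result

# Counter values quoted in this thread
sample = """nr_dirty 571064
nr_dirty_threshold 24894434
nr_dirty_background_threshold 12432019"""

for name, gib in dirty_counters_gib(sample).items():
    print(f"{name}: {gib:.2f} GiB")
```

On a live system the same function could be fed `open("/proc/vmstat").read()` instead of the sample text.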
This is approx. 32MiB/s. Assuming small random IO (which is also duplicated due to mirroring) I would not call it something extraordinary with two SATA disks. If this data is on btrfs, you have additional metadata which adds to IO randomness (and the amount of metadata could be quite large as well).
Sorry, I miscalculated. It is 3.5MiB/s. It sounds small, but the usual sustained random IO rate for SATA disks is around 50 - 70 IOPS (IO operations per second), so it depends on IO size.
Maybe this represents my real problem. I triggered a sync and watched iostat. The I/O rate is around 1.2MB/s, between 40 and 46 tps (transfers per second).
Hard disks are SAS 2TB (SPL-3), IBM-XIV, ST2000NM0023
Controller: 02:00.0 RAID bus controller: Broadcom / LSI MegaRAID SAS 2008 [Falcon] (rev 03)
SMART data is fine, except for an elevated temperature on one HDD (60°C).
Is that really what to expect?
Btw., I/O wait increased a lot during the sync (which just finished after > 20 minutes). Some applications became totally unresponsive, X froze for several minutes, and the load average went from ~1 to 35 (with 32 CPUs). Not nice :-/
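For reference, the arithmetic behind these figures can be sketched as follows. It assumes 4 KiB pages, that the sync wrote out exactly the reported nr_dirty pages with nothing dirtied in the meantime, and that iostat's "MB" means 10^6 bytes. The helper names are hypothetical. The result agrees with the corrected 3.5MiB/s figure.

```python
from datetime import datetime

PAGE_SIZE = 4096  # assumption: 4 KiB pages

def sync_rate_mib_s(nr_dirty, start, end, page_size=PAGE_SIZE):
    """Average write-back rate in MiB/s if nr_dirty pages drain between start and end."""
    fmt = "%H:%M:%S"
    seconds = (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).total_seconds()
    return nr_dirty * page_size / 2**20 / seconds

def avg_io_size_kib(bytes_per_s, ops_per_s):
    """Average size of a single IO in KiB for a given throughput and operation rate."""
    return bytes_per_s / ops_per_s / 1024

# nr_dirty and the two `date` timestamps from the thread (638 seconds apart)
rate = sync_rate_mib_s(571064, "05:49:06", "05:59:44")
print(f"sync rate: {rate:.1f} MiB/s")  # 3.5

# iostat observation from the thread: ~1.2 MB/s at 40 to 46 transfers per second
low = avg_io_size_kib(1.2e6, 46)
high = avg_io_size_kib(1.2e6, 40)
print(f"average IO size: {low:.0f} to {high:.0f} KiB")  # 25 to 29
```

At that operation rate the disks are close to the 50 - 70 IOPS range mentioned above, so the low byte rate follows from the small size of each IO.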
So it is quite normal for your workload. The only way to mitigate it is to lower thresholds to force syncing of dirty pages earlier.
What you observe is typical buffer bloat. If you let data accumulate, you have to wait for it to drain.
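To put numbers on the drain time: a rough sketch, assuming 4 KiB pages and that write-back continues at the 3.5MiB/s observed in this thread (the real rate depends on the IO pattern). `drain_time_hours` is a name invented for this example.

```python
def drain_time_hours(pages, rate_mib_s, page_size=4096):
    """Hours needed to write back `pages` dirty pages at a steady rate in MiB/s."""
    mib = pages * page_size / 2**20
    return mib / rate_mib_s / 3600

# Thresholds reported in /proc/vmstat earlier in the thread
print(f"{drain_time_hours(12432019, 3.5):.1f} h")  # background threshold: 3.9
print(f"{drain_time_hours(24894434, 3.5):.1f} h")  # dirty threshold: 7.7
```

With thresholds this large, dirty data can grow to the point where draining it at random-IO speed takes hours.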
Thanks for figuring this out. As I asked in my original post: would lowering /proc/sys/vm/dirty_background_bytes or /proc/sys/vm/dirty_background_ratio affect system performance noticeably?
Define "system performance". Yes, lowering the dirty background threshold may help reduce sync time during shutdown. But you may then find that the drives are kept busier, which impacts normal application performance.
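One way to pick a starting value is to work backwards from an acceptable sync time. This is only a sketch: it assumes the 3.5MiB/s write-back rate seen in this thread, and the 60-second target is an arbitrary example, not a recommendation. Note that the background threshold only determines when write-back starts; it does not cap the amount of dirty data, which is governed by vm.dirty_bytes / vm.dirty_ratio.

```python
def dirty_budget_bytes(target_seconds, rate_mib_s):
    """Amount of dirty data (bytes) that can be written back within target_seconds."""
    return int(target_seconds * rate_mib_s * 2**20)

# Hypothetical target: a sync should finish within about one minute
budget = dirty_budget_bytes(60, 3.5)
print(budget)                    # 220200960
print(f"{budget / 2**20:.0f} MiB")  # 210
```

Such a value could be tried in /proc/sys/vm/dirty_background_bytes. The kernel treats the bytes and ratio forms as alternatives: writing one makes the other read as 0. Whether the drives being kept busier is acceptable has to be measured under the real workload.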
Paul Neuwirth