
Hello,

On Wed 18-01-12 10:32:58, rst@suissimage.ch wrote:
> I've been directed here by the openSUSE forums about a problem we are
> having with our server since we upgraded from openSUSE 11.2 (kernel
> 2.6.31, I believe) to 12.1.
> The problem is that one process can hog all disk I/O and starve others.
> For example, a Progress database restore of a multi-GB DB starves all
> other processes, such as mysqld. We see fsync latencies for mysqld of
> 15s+ with the cfq block I/O scheduler, and still 5s+ with the deadline
> block I/O scheduler and read_expire reduced to 20ms.

OK, I presume you used ext4 in both 11.2 and 12.1, didn't you? Also, what
were the fsync latencies with 11.2? And what is the size of the restored
file (in particular in comparison with the amount of memory)?
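If you need a quick way to measure those latencies, something like this
should do (just a generic sketch - the PID is a placeholder, and strace
slows the traced process a bit, so only run it for a short sample):

  # print each fsync/fdatasync call of mysqld with its duration
  # (-f follows threads, -T appends the time spent in the syscall)
  strace -f -T -e trace=fsync,fdatasync -p <mysqld-pid>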
> We've been unable to reduce the latency for other processes any further.
> Our guess as to the culprit is that the improvement made in 2.6.37 to
> SMP ext4 block I/O throughput (300-400% according to "Linux 2.6.37" on
> Linux Kernel Newbies) has made it possible for one process to be that
> fast, creating this starvation problem.

I don't think that change was the reason (if you mean commit bd2d0210).
The claimed throughput improvement can be observed only with a big number
of threads (in the buffer layer they contend more for locks), but that
does not seem to be your problem. So I'd rather suspect changes in
fsync() handling (we send a disk cache flush more often and force a
transaction commit more often in the 3.1 kernel - the 2.6.31 kernel had
bugs and didn't properly assure all data is on disk after fsync) or maybe
some changes in the writeback code.
Or maybe some kernel bug.
> Anybody have any pointers about how to rein in disk-I/O hogs in 3.1?
> Some info about the server: Dell T710 with two 6-core Xeon processors,
> 48GB of memory, and 6x300GB disks in RAID10 on an H700 RAID controller.

If the server has a UPS, so you are certain power cannot just abruptly
fail, you can mount the filesystem with the nobarrier mount option. That
will probably speed up your I/O.
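For example (a generic sketch - the device and mount point here are
placeholders, adjust them to your layout):

  # remount an ext4 filesystem in place with write barriers disabled
  # (only safe when power loss is ruled out, e.g. by a UPS)
  mount -o remount,nobarrier /var/lib/mysql

  # or make it persistent via /etc/fstab:
  # /dev/sdb1  /var/lib/mysql  ext4  defaults,nobarrier  0  2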
> We didn't mess with many default SUSE kernel values - except swappiness,
> the default block size of the tape driver, and the max semaphore and
> shared memory segment values (/proc/sys/kernel/shmmax, shmmni, shmall).
> And of course the I/O scheduler, as the deadline scheduler makes the
> system less unusable...
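For reference, checking and switching the scheduler at runtime, and
inspecting the shared memory limits you mention, looks like this (the
device name is a placeholder):

  # show the available elevators and the active one (in brackets)
  cat /sys/block/sda/queue/scheduler
  # switch to deadline for that disk
  echo deadline > /sys/block/sda/queue/scheduler

  # inspect the shared memory limits mentioned above
  sysctl kernel.shmmax kernel.shmmni kernel.shmall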
> I'll gladly provide any other info y'all might need to help us improve
> this starvation issue.

If you cannot use nobarrier, or it does not help, you can use 'blktrace'
to record what's going on in the I/O scheduler while fsync is hanging.
I'm not sure how reproducible the big fsync latencies are, but from your
report it seems they are rather common. So just start:
  blktrace -d <device>
and run the DB restore to trigger big latencies. After some long fsync
occurs, stop blktrace, pack the resulting files, and attach them to a
bugzilla you create for this ;) Feel free to assign it to me
(jack@suse.com) so that it does not get missed.
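A full session would look roughly like this (the device and output names
are placeholders):

  # record block-layer events; writes one <name>.blktrace.<cpu> per CPU
  blktrace -d /dev/sda -o fsync-trace

  # ... reproduce the long fsync, then stop blktrace with Ctrl-C ...

  # pack the per-CPU trace files for the bug report
  tar czf fsync-trace.tar.gz fsync-trace.blktrace.*

  # (optional) take a quick local look at the trace with blkparse
  blkparse -i fsync-trace | less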
								Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR