
Hello,

On Wed 18-01-12 10:32:58, rst@suissimage.ch wrote:
> I've been directed here by the openSUSE forums about a problem we are
> having with our server since we upgraded from openSUSE 11.2 (kernel
> 2.6.31, I believe) to 12.1.
> The problem is that one process can hog all disk I/O and starve others.
> For example, a Progress database restore of a multi-GB DB starves all
> other processes, such as mysqld. We see fsync latencies for mysqld of
> 15s+ with the cfq block I/O scheduler, and still 5s+ with the deadline
> block I/O scheduler and read_expire reduced to 20ms.

OK, I presume you used ext4 in both 11.2 and 12.1, didn't you? Also, what
were the fsync latencies with 11.2? And what is the size of the restored
file (in particular in comparison with the amount of memory)?
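If you need a quick way to measure those latencies, something like this
should do (just a generic sketch - the PID is a placeholder, and strace
slows the traced process a bit, so only run it for a short sample):

  # print each fsync/fdatasync call of mysqld with its duration
  # (-f follows threads, -T appends the time spent in the syscall)
  strace -f -T -e trace=fsync,fdatasync -p <mysqld-pid>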
> We've been unable to reduce the latency for other processes any further.
> Our guess as to the culprit is that the improvement made in 2.6.37 to
> SMP ext4 block I/O throughput (300-400% according to "Linux 2.6.37" on
> Linux Kernel Newbies) has made it possible for one process to be that
> fast, creating this starvation problem.

I don't think that change was the reason (if you mean commit bd2d0210).
The claimed throughput improvement can be observed only with a big number
of threads (in the buffer layer they contend more for locks), but that
does not seem to be your problem. So I'd rather suspect changes in
fsync() handling (we send a disk cache flush more often and force a
transaction commit more often in the 3.1 kernel - the 2.6.31 kernel had
bugs and didn't properly assure all data is on disk after fsync) or maybe
some changes in the writeback code.
Or maybe some kernel bug.
> Anybody have any pointers about how to rein in disk-I/O hogs in 3.1?
> Some info about the server: Dell T710 with two 6-core Xeon processors,
> 48GB of memory, and 6x300GB disks in RAID10 on an H700 RAID controller.

If the server has a UPS, so you are certain power cannot just abruptly
fail, you can mount the filesystem with the nobarrier mount option. That
will probably speed up your I/O.
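For example (a generic sketch - the device and mount point here are
placeholders, adjust them to your layout):

  # remount an ext4 filesystem in place with write barriers disabled
  # (only safe when power loss is ruled out, e.g. by a UPS)
  mount -o remount,nobarrier /var/lib/mysql

  # or make it persistent via /etc/fstab:
  # /dev/sdb1  /var/lib/mysql  ext4  defaults,nobarrier  0  2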
> We didn't mess with many default SUSE kernel values - except swappiness,
> the default block size of the tape driver, and the max semaphore and
> shared memory segment values (/proc/sys/kernel/shmmax, shmmni, shmall).
> And of course the I/O scheduler, as the deadline scheduler makes the
> system less unusable...
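For reference, checking and switching the scheduler at runtime, and
inspecting the shared memory limits you mention, looks like this (the
device name is a placeholder):

  # show the available elevators and the active one (in brackets)
  cat /sys/block/sda/queue/scheduler
  # switch to deadline for that disk
  echo deadline > /sys/block/sda/queue/scheduler

  # inspect the shared memory limits mentioned above
  sysctl kernel.shmmax kernel.shmmni kernel.shmall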
> I'll gladly provide any other info y'all might need to help us improve
> this starvation issue.

If you cannot use nobarrier, or it does not help, you can use 'blktrace'
to record what's going on in the I/O scheduler while fsync is hanging.
I'm not sure how reproducible the big fsync latencies are, but from your
report it seems they are rather common. So just start:
  blktrace -d <device>
and run the DB restore to trigger big latencies. After some long fsync
occurs, stop blktrace, pack the resulting files, and attach them to a
bugzilla you create for this ;) Feel free to assign it to me
(jack@suse.com) so that it does not get missed.
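A full session would look roughly like this (the device and output names
are placeholders):

  # record block-layer events; writes one <name>.blktrace.<cpu> per CPU
  blktrace -d /dev/sda -o fsync-trace

  # ... reproduce the long fsync, then stop blktrace with Ctrl-C ...

  # pack the per-CPU trace files for the bug report
  tar czf fsync-trace.tar.gz fsync-trace.blktrace.*

  # (optional) take a quick local look at the trace with blkparse
  blkparse -i fsync-trace | less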
								Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR