Hello,
On Thu 19-01-12 19:01:18, rst@suissimage.ch wrote:
Yes, we were using ext4 on 11.2 as well. We had no noticeable latencies with 11.2, so I don't know exact values. The restored DB file is 20GB on a machine with 48GB of RAM.
OK.
We are already using the deadline scheduler. Noop seemed worse, but I didn't make any quantitative measurements. Cfq was definitely waaay worse.
Interesting what you wrote about some bug with fsync in 2.6.31. If fsync didn't work as it should have then maybe that's why massive write io by one process didn't impact others as much.
Had already planned to try the nobarrier mount option. Glad y'all are recommending it as well. Seeing we have a UPS and a BBU on the RAID card we should be fine ;-).
Just remounted our filesystems with barrier=0. It didn't help much, if at all.
OK, so one thing less to care about.
So I did some initial blktrace, blkparse, btt runs and boy does that deliver loads of numbers.
Here are the first few lines from a btt run over the combined trace and the tail from the blktrace:
Total (sda):
 Reads Queued:         979,    8,332KiB   Writes Queued:      60,185,   17,860MiB
 Read Dispatches:      943,    8,332KiB   Write Dispatches:   58,089,   17,860MiB
 Reads Requeued:         0                Writes Requeued:         0
 Reads Completed:      941,    8,324KiB   Writes Completed:   58,089,   17,860MiB
 Read Merges:           35,      636KiB   Write Merges:        2,096,   10,368KiB
 IO unplugs:           337                Timer unplugs:           0

Throughput (R/W): 142KiB/s / 305,389KiB/s
Events (sda): 423,130 entries
Skips: 0 forward (0 - 0.0%)
300MB/s write ain't that bad for a RAID10 of 6x300GB 10K rpm SAS drives. I am not sure our system was any faster under 11.2/2.6.31.
Yeah, 300MB/s looks reasonable. That's 100MB/s per drive. You could maybe do more with good SAS drives, but it's definitely not going to be the difference between "not noticeable latency" and "15 second latency". So I don't think the latency is caused by a drop in throughput as such.
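The per-drive figure can be checked with a few lines of arithmetic. This is only a sketch: it assumes the RAID10 is laid out as three two-way mirrored pairs striped together, which the thread does not spell out, and the function name is made up for illustration.

```python
# Figures taken from the blkparse summary in this mail.
WRITE_THROUGHPUT_KIB_S = 305_389   # "Throughput (R/W): 142KiB/s / 305,389KiB/s"
DRIVES = 6                         # 6x300GB SAS drives in RAID10


def raid10_per_drive_write_mib_s(array_kib_s: float, drives: int) -> float:
    """Write rate each drive must sustain, in MiB/s.

    Assumption: RAID10 means striping across drives/2 mirrored pairs,
    so every logical write lands on both drives of exactly one pair.
    """
    mirror_pairs = drives // 2
    return array_kib_s / 1024 / mirror_pairs


per_drive = raid10_per_drive_write_mib_s(WRITE_THROUGHPUT_KIB_S, DRIVES)
print(f"array: {WRITE_THROUGHPUT_KIB_S / 1024:.0f} MiB/s, "
      f"per drive: {per_drive:.0f} MiB/s")
```

Under that assumption the array writes roughly 298 MiB/s and each drive roughly 99 MiB/s, which matches the "100MB/s per drive" estimate.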
==================== All Devices ====================
            ALL           MIN           AVG           MAX           N
--------------- ------------- ------------- ------------- -----------
Q2Q               0.000000163   0.000956222   1.197385582       61163
Q2G               0.000000175   0.000348236   0.061476246     1416792
S2G               0.000854097   0.028974235   0.061474444       16992
G2I               0.000000246   0.000002060   0.003133293     1416792
Q2M               0.000000139   0.000000233   0.000007519       51144
I2D               0.000000118   0.006148337   0.048895925     1416792
M2D               0.000001943   0.017204386   0.041711330       51144
D2C               0.000022381   0.078485527   1.721045923       61162
Q2C               0.000023937   0.085357206   1.721609484       61162
==================== Device Overhead ====================
       DEV |       Q2G       G2I       Q2M       I2D       D2C
---------- | --------- --------- --------- --------- ---------
 (  8,  0) |   9.4506%   0.0559%   0.0002% 166.8560%  91.9495%
---------- | --------- --------- --------- --------- ---------
   Overall |   9.4506%   0.0559%   0.0002% 166.8560%  91.9495%
==================== Device Merge Information ====================
       DEV |       #Q       #D   Ratio |   BLKmin   BLKavg   BLKmax     Total
---------- | -------- -------- ------- | -------- -------- -------- ---------
 (  8,  0) |  1416768  1416768     1.0 |        8      605      640 857713920
What worries/puzzles me here is the Device Merge Ratio of 1.0...
If that means what I fear it means then that might be the cause. Now about fixing that... Maybe some buffers or queues have wrong values?
I don't think that's a problem. The average write request is 302 KB, which isn't bad (512 KB is the maximum), and throughput isn't bad either. If the requests were too small, throughput would suffer.
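The 302 KB figure follows from the BLKavg column of the merge table. A quick check, assuming btt counts blocks in 512-byte sectors (the helper name is illustrative only):

```python
SECTOR_BYTES = 512  # assumption: btt block counts are 512-byte sectors


def sectors_to_kib(sectors: int) -> float:
    """Convert a btt block count to KiB."""
    return sectors * SECTOR_BYTES / 1024


blk_avg, blk_max = 605, 640  # BLKavg and BLKmax from the merge table
print(sectors_to_kib(blk_avg))  # 302.5
print(sectors_to_kib(blk_max))  # 320.0
```

So the average dispatched request is about 302 KiB and the largest one seen in this trace is 320 KiB.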
Other numbers look pretty normal as well. So on average we are doing well. It's just that fsync takes longer than it used to.
In the meantime, this first update. Will wait with creating a bug report and uploading the blktrace, as that one is 22MB of data. And maybe the above merge ratio already is the cause or a good enough pointer towards the cause.
One more question: can you run 'echo w >/proc/sysrq-trigger' at the moment fsync is hanging, then take the output of 'dmesg' and add it to the bug as well? From that we should see exactly what fsync is waiting on. If you cannot easily detect when fsync is hanging, just sample /proc/<pid>/stack of the process whose fsync sometimes hangs, and also of the flush-8:0 and jbd2/sda processes, every second or so.
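The once-a-second sampling could be done with a small script along these lines. It is a sketch, not a tested tool: the function names and the `proc_root` parameter are made up for illustration, reading /proc/<pid>/stack normally requires root, and the name of the database process is a placeholder you would have to fill in.

```python
import os
import time


def find_pids(names, proc_root="/proc"):
    """Return {pid: comm} for processes whose comm starts with one of names."""
    found = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                comm = f.read().strip()
        except OSError:
            continue  # process exited between listdir() and open()
        if any(comm.startswith(name) for name in names):
            found[int(entry)] = comm
    return found


def read_stack(pid, proc_root="/proc"):
    """Return the kernel stack of pid as text (normally needs root)."""
    try:
        with open(os.path.join(proc_root, str(pid), "stack")) as f:
            return f.read()
    except OSError as exc:
        return f"<unreadable: {exc}>\n"


def sample(names, interval=1.0, count=60, proc_root="/proc", out=print):
    """Dump the stacks of all matching processes `count` times."""
    for _ in range(count):
        stamp = time.strftime("%H:%M:%S")
        for pid, comm in sorted(find_pids(names, proc_root).items()):
            stack = read_stack(pid, proc_root)
            out(f"--- {stamp} pid={pid} comm={comm}\n{stack}")
        time.sleep(interval)


# Example call (run as root); "mydb" is a placeholder for the real
# name of the process whose fsync hangs:
# sample(["flush-8:0", "jbd2/sda", "mydb"])
```

Matching on a name prefix is deliberate, since the jbd2 thread name carries a partition suffix that differs per filesystem. Redirect the output to a file and attach the part around a stall to the bug.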
Honza
--
Jan Kara