So I was thinking more about this. In the end I decided I wanted to verify that I correctly understand what's going on, so I added some more tracepoints to report why transaction starts are being blocked, how big transaction handles are, and why the resulting commits are so small in dioread_nolock mode. That revealed that there are usually only ~1500 reserved credits (out of 64k total credits in a transaction), which made it even clearer that the theory about reserved credits causing premature transaction commits was not correct and that something else must be going on: this amount of reserved credits could cause a regression of a few percent, but not really 20%.

After some more debugging I found that when we reserve a transaction handle but then don't use it, we do not properly return the reserved credits: we remove them from the reserved amount, but we forget to also remove them from the total number of credits tracked in the transaction. This leaves the transaction with lots of leaked credits, which then force an early transaction commit because we think the transaction is full (although in the end it is not).
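The accounting mistake described above can be illustrated with a toy model. This is only a sketch, not the jbd2 code: the class, its field names, and the 100 credits per handle are all made up for illustration; only the 64k credit limit comes from the numbers above.

```python
from dataclasses import dataclass

MAX_CREDITS = 65536  # "64k total credits in a transaction", from the text above


@dataclass
class ToyTransaction:
    """Toy stand-in for a running transaction (hypothetical, not a jbd2 struct)."""
    total: int = 0     # every credit charged to the transaction
    reserved: int = 0  # the subset of those held by reserved handles

    def reserve(self, credits: int) -> None:
        # Reserving a handle charges both counters.
        self.total += credits
        self.reserved += credits

    def unreserve_buggy(self, credits: int) -> None:
        # The bug as described: only the reserved amount is reduced,
        # so the credits stay counted in the total forever.
        self.reserved -= credits

    def unreserve_fixed(self, credits: int) -> None:
        # The fix as described: give the credits back in both places.
        self.reserved -= credits
        self.total -= credits

    def looks_full(self, wanted: int) -> bool:
        # A transaction that appears full forces a commit.
        return self.total + wanted > MAX_CREDITS


def after_unused_handles(count: int, credits: int, fixed: bool) -> ToyTransaction:
    """Reserve and then drop `count` handles without ever using them."""
    txn = ToyTransaction()
    for _ in range(count):
        txn.reserve(credits)
        if fixed:
            txn.unreserve_fixed(credits)
        else:
            txn.unreserve_buggy(credits)
    return txn


if __name__ == "__main__":
    for fixed in (False, True):
        txn = after_unused_handles(count=1000, credits=100, fixed=fixed)
        print(f"fixed={fixed}: total={txn.total} reserved={txn.reserved} "
              f"looks_full={txn.looks_full(1)}")
```

In the buggy variant the reserved count drops back to zero, which matches the tracepoints showing few reserved credits, while the total keeps growing until the transaction looks full.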
Fixing this leak also fixes the fsmark performance for me:

                                      fsmark                 fsmark
                                        lock        nolock-fixunrsv
Min       1-files/sec    46974.80 (   0.00%)    47322.70 (   0.74%)
1st-qrtle 1-files/sec    49614.60 (   0.00%)    49663.50 (   0.10%)
2nd-qrtle 1-files/sec    48644.50 (   0.00%)    49259.20 (   1.26%)
3rd-qrtle 1-files/sec    47583.90 (   0.00%)    47966.80 (   0.80%)
Max-1     1-files/sec    50754.30 (   0.00%)    51919.40 (   2.30%)
Max-5     1-files/sec    50754.30 (   0.00%)    51919.40 (   2.30%)
Max-10    1-files/sec    50754.30 (   0.00%)    51919.40 (   2.30%)
Max-90    1-files/sec    47356.50 (   0.00%)    47473.90 (   0.25%)
Max-95    1-files/sec    47356.50 (   0.00%)    47473.90 (   0.25%)
Max-99    1-files/sec    47356.50 (   0.00%)    47473.90 (   0.25%)
Max       1-files/sec    50754.30 (   0.00%)    51919.40 (   2.30%)
Hmean     1-files/sec    48540.95 (   0.00%)    48884.32 (   0.71%)

Another revelation for me was that in this workload ext4 actually starts lots of reserved transaction handles that end up unused. This is due to the way the ext4 writepages code works: it starts a transaction, then inspects the page cache and writes one extent if found. Then it starts a transaction again and checks whether there's more to write. So for single-extent files we always start a transaction twice, the second time only to find there's nothing more to write. This probably also deserves to be fixed, but a simple fix I made seems to break page writeback, so I need to dig into it more, and it doesn't seem to be a pressing issue. I'll push the jbd2 fix upstream.
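The writepages pattern described above can be modelled in a few lines to show where the unused handles come from. This is a simplified sketch under the assumption that each loop iteration starts one handle before looking at the page cache; the function name and its return shape are hypothetical and do not correspond to anything in ext4.

```python
from typing import Tuple


def writepages_handle_starts(dirty_extents: int) -> Tuple[int, int]:
    """Return (handles_started, handles_unused) for one writeback pass.

    Toy model of the loop described above: the handle is started first,
    and only then is the page cache checked for something to write.
    """
    started = 0
    unused = 0
    remaining = dirty_extents
    while True:
        started += 1        # start a transaction handle up front
        if remaining == 0:  # nothing (more) to write: the handle was wasted
            unused += 1
            break
        remaining -= 1      # write exactly one extent with this handle
    return started, unused


if __name__ == "__main__":
    for extents in (1, 3):
        started, unused = writepages_handle_starts(extents)
        print(f"{extents} extent(s): {started} handles started, {unused} unused")
```

In this model a single-extent file costs two handle starts, one of which is never used. A workload like fsmark that creates many small files would therefore drop roughly one unused reserved handle per file, and each of those handles leaked its credits before the fix.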