http://bugzilla.novell.com/show_bug.cgi?id=568319
http://bugzilla.novell.com/show_bug.cgi?id=568319#c17
--- Comment #17 from Nikanth K 2010-02-17 11:42:59 UTC ---
See
http://oss.sgi.com/bugzilla/show_bug.cgi?id=860#c12
http://oss.sgi.com/bugzilla/show_bug.cgi?id=860#c13
-------------------------------------------------
Info copied from the above links:
Comment #12 From Dave Chinner 2010-02-16 17:16:10 CST (-) [reply] -------
Nikanth,
Nice analysis. I think you're on the right track, but it looks to me that the
problem is more complex than you've outlined even though the solution is likely
to be the same.
Log IO (the barrier IO in question) in XFS is completed through the
xfslogd, not the xfsdatad, hence the pdflush threads that are blocked in log
forces are not blocked behind the xfsdatad that is waiting on an IO lock.
Because of these two independent IO completion paths in XFS, this lock interplay
is not usually a problem - the xfsdatad blocking on an inode lock does not hold
up log IO completion. Hence the barrier IO will complete, the log force
completes, the inode is unlocked when the transaction completes and then the
data IO can complete.
The problem is that the DM loop barrier implementation has to wait for data IO
to complete as well as the barrier. i.e. instead of there being two separate
completion channels that can block independently, DM loop barriers require both
data IO and log (barrier) IO from the underlying filesystem to complete through
the one channel (dm_flush) before being split back into two again. IOWs, XFS
can't block data IO completion on the lower filesystem while the upper
filesystem waits for log IO to complete because dm_flush() needs the data IO to
complete as well. Without barriers, this dependency between the upper and lower
filesystems does not exist, which is why the problem never showed up before.
IOWs, this isn't so much a bug as a reflection of the fact that a new feature
(DM loop barriers) has introduced an implicit IO completion order dependency
that never existed before.
As you suggested this could be fixed by adding per-filesystem xfsdatad threads,
but that is not an option because XFS is used on very large systems (e.g. 2048p
machines) and they often have tens of XFS filesystems mounted. i.e. it is not
feasible to have XFS create tens of thousands of threads on such machines.
Prioritising upper vs lower filesystem IO completion is not really practical,
either, because each filesystem has no context of what dependencies it might
have on other filesystems.
However, we can avoid blocking the xfsdatad on inode locks by using try-lock
semantics and requeuing the IO completion if we'd block on the inode lock. This
should avoid completion order dependent deadlocks like this one. I'll attach a
patch in a few minutes after some sanity testing.
------- Comment #13 From Dave Chinner 2010-02-16 18:21:23 CST (-) [reply]
-------
Created an attachment (id=288)
--> (http://bugzilla.novell.com/attachment.cgi?id=288) [details]
non-blocking file size updates during io completion
This patch should fix the problem being seen with barrier flushes. Can you
please try it?