https://bugzilla.novell.com/show_bug.cgi?id=836107
https://bugzilla.novell.com/show_bug.cgi?id=836107#c13
--- Comment #13 from Lidong Zhong
2013-08-21T18:21:57.154662-05:00 opensuse1 dlm_controld[1998]: 410 fence request 1084752300 pid 3674 nodedown time 1377127317 fence_all dlm_stonith
but fails because of no actor
Here is how dlm_controld works when a node goes down. The other two nodes record the nodeids of the members that are still alive as fence actors for the failed node. After a fence request completes, the fence result is sent to the other nodes. In the function receive_fence_result, the nodeid is cleared from the fence actors if the fence succeeded. So in the end all the nodeids in the fence actors are cleared.
2013-08-21T18:21:58.158838-05:00 opensuse1 dlm_controld[1998]: 411 fence request 1084752300 no actor
The actor returned from get_fence_actor here is only used to check whether the actor is the local node itself, in which case the local node can send a fence request. So I think this log message is part of the normal logic.
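To make the actor bookkeeping above easier to follow, here is a rough model in Python. It is only a sketch of the behaviour as I described it, not the dlm_controld source: the class FailedNode, the helper fence_decision, the rule of picking the lowest nodeid as actor, the return value 0 for "no actor", and the alive nodeids 1 and 2 are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class FailedNode:
    """State kept by a surviving node about a node that went down (hypothetical model)."""
    nodeid: int
    fence_actors: set = field(default_factory=set)


def record_actors(node, alive_nodeids):
    # When a node goes down, the surviving members are recorded as its fence actors.
    node.fence_actors = set(alive_nodeids)


def get_fence_actor(node):
    # Assumption: the lowest remaining nodeid is the actor; 0 means "no actor".
    return min(node.fence_actors) if node.fence_actors else 0


def receive_fence_result(node, success):
    # As described above: after a successful fence the actors end up cleared.
    if success:
        node.fence_actors.clear()


def fence_decision(node, our_nodeid):
    # The actor is only used to decide whether the local node sends the request.
    actor = get_fence_actor(node)
    if actor == 0:
        return "no actor"
    return "request" if actor == our_nodeid else "wait"


node = FailedNode(nodeid=1084752300)
record_actors(node, [1, 2])            # hypothetical nodeids of the two alive nodes
first = fence_decision(node, 1)        # local node is the actor
receive_fence_result(node, success=True)
second = fence_decision(node, 1)       # actors were cleared by the fence result
```

In this model the second check after a successful fence finds an empty actor set, which corresponds to the "no actor" line in the log and does not indicate a failed fence.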
On node recovery, it says the recovered node needs fencing
2013-08-21T18:22:28.326046-05:00 opensuse1 dlm_controld[1998]: 442 daemon joined 1084752300 needs fencing
When the node comes back up, the need_fencing flag is still set because the node was once lost. However, this flag will be cleared once the node is in CLEAN state, which it will be. It doesn't actually initiate a fence request.
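The flag handling on rejoin can be pictured with another small Python sketch. Again this is only a model of what I described, not the real code: the Node class, the JoinState enum, the STATEFUL member, and the functions node_lost and node_joined are hypothetical names. Only the need_fencing flag, the CLEAN state, and the log text come from this bug.

```python
from enum import Enum


class JoinState(Enum):
    CLEAN = "clean"
    STATEFUL = "stateful"  # hypothetical name for any non-clean state


class Node:
    def __init__(self, nodeid):
        self.nodeid = nodeid
        self.need_fencing = False
        self.fence_requests = 0  # counts fence requests started in this model


def node_lost(node):
    # The flag is set when the node is lost and stays set afterwards.
    node.need_fencing = True


def node_joined(node, state):
    # The join is logged with the stale flag; no fence request is started here.
    message = f"daemon joined {node.nodeid}"
    if node.need_fencing:
        message += " needs fencing"
    # Assumption: reaching CLEAN state is what clears the flag.
    if state is JoinState.CLEAN:
        node.need_fencing = False
    return message


rejoined = Node(1084752300)
node_lost(rejoined)
log_line = node_joined(rejoined, JoinState.CLEAN)
```

Here the "needs fencing" text in the join message only reflects the leftover flag, and the flag is gone once the node is seen as CLEAN.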
And finally initiates recovery on rejoin.
Goldwyn, I could see the ocfs2 recovery log a few days ago, but it no longer appears today. Have you made any changes to ocfs2? Overall, from my point of view the dlm seems to work normally during the fence.