[Bug 838705] New: mkfs.ocfs2/tunefs.ocfs2 hangs on dlm_unlock
https://bugzilla.novell.com/show_bug.cgi?id=838705
https://bugzilla.novell.com/show_bug.cgi?id=838705#c0

Summary: mkfs.ocfs2/tunefs.ocfs2 hangs on dlm_unlock
Classification: openSUSE
Product: openSUSE Factory
Version: 13.1 Milestone 4
Platform: x86-64
OS/Version: Other
Status: NEW
Severity: Normal
Priority: P5 - None
Component: High Availability
AssignedTo: lzhong@suse.com
ReportedBy: rgoldwyn@suse.com
QAContact: qa-bugs@suse.de
CC: lmb@suse.com, ygao@suse.com
Found By: Development
Blocker: ---

When run against existing partitions, mkfs.ocfs2 and tunefs.ocfs2 hang on dlm_unlock. Surprisingly, it is dlm_unlock that hangs rather than dlm_lock. I have tried to reproduce this outside of ocfs2-tools, but have not managed to narrow it down yet.

To reproduce this bug, start the cluster services and run mkfs.ocfs2 twice. It hangs on the second run if you started with a zeroed device, or on the first run if the device already held an ocfs2 partition.

Explanation of procedure: The tool creates a lockspace, then locks and unlocks each journal file on the ocfs2 partition to ensure the device is not mounted on another node. It hangs on the first unlock.

--
Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=838705#c1
Lars Marowsky-Bree
https://bugzilla.novell.com/show_bug.cgi?id=838705#c2
--- Comment #2 from Goldwyn Rodrigues
https://bugzilla.novell.com/show_bug.cgi?id=838705#c3
--- Comment #3 from Goldwyn Rodrigues
https://bugzilla.novell.com/show_bug.cgi?id=838705#c4
--- Comment #4 from Goldwyn Rodrigues
https://bugzilla.novell.com/show_bug.cgi?id=838705#c5
--- Comment #5 from Goldwyn Rodrigues
https://bugzilla.novell.com/show_bug.cgi?id=838705#c6
--- Comment #6 from Goldwyn Rodrigues
https://bugzilla.novell.com/show_bug.cgi?id=838705#c7
--- Comment #7 from Lidong Zhong
mkfs.ocfs2 1.8.2
Cluster stack: pcmk
Cluster name: mycluster
Stack Flags: 0x0
NOTE: Feature extended slot map may be enabled
Overwriting existing ocfs2 partition.
^C
Program received signal SIGINT, Interrupt.
0x00007ffff6797924 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
(gdb) bt
#0  0x00007ffff6797924 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007ffff77ccc04 in sync_write_v6 () from /usr/lib64/libdlm.so.3
#2  0x00007ffff77cdada in dlm_ls_unlock () from /usr/lib64/libdlm.so.3
#3  0x000000000042533b in o2dlm_unlock_lock_res_fsdlm (ctxt=0x642010, lockres=0x63e700) at o2dlm.c:923
This seems strange. The function called on line 923 of o2dlm.c is fsdlm_ls_unlock_wait(), which should be dlm_ls_unlock_wait from libdlm_lt.so.3. I don't know why the stack shows dlm_ls_unlock () from /usr/lib64/libdlm.so.3 instead. The code I checked is the latest branch from network:ha-clustering:Factory.
https://bugzilla.novell.com/show_bug.cgi?id=838705#c8
--- Comment #8 from Lidong Zhong
https://bugzilla.novell.com/show_bug.cgi?id=838705#c9
--- Comment #9 from Goldwyn Rodrigues
> Seems strange here. The function on line 923 in o2dlm.c is
> fsdlm_ls_unlock_wait(). It should be dlm_ls_unlock_wait from
> libdlm_lt.so.3. I don't know why the stack here shows it is
> dlm_ls_unlock () from /usr/lib64/libdlm.so.3. The code I checked is the
> latest branch from network:ha-clustering:Factory.
fs_dlm_ls_unlock maps to dlm_ls_unlock because the library is opened dynamically using dlopen(). Check the code next to the dlopen() call.
https://bugzilla.novell.com/show_bug.cgi?id=838705#c10
Goldwyn Rodrigues
> Hi Goldwyn,
> I couldn't reproduce this on your test machines. I booted the machine
> with kernel 3.11.0-rc7-1.g99e1318-desktop and removed the gfs2 RA. Then
> I ran
>   mkfs.ocfs2 -F --cluster-name mycluster --cluster-stack pcmk /dev/sdb1
> and there was no hang at all.

-F disables cluster checks, so you have to run without -F. Also ensure that ocfs2-kmp is installed and that the dlm service is running. Try the following steps:

# modprobe ocfs2_stack_user
# modprobe ocfs2
# echo "pcmk" > /sys/fs/ocfs2/cluster_stack
# mkfs.ocfs2 /dev/sdb1

You may have to execute mkfs.ocfs2 twice if the device is zeroed.
https://bugzilla.novell.com/show_bug.cgi?id=838705#c11
--- Comment #11 from Lidong Zhong
https://bugzilla.novell.com/show_bug.cgi?id=838705#c12
Lidong Zhong