[Bug 921449] New: [ocfs2] cannot run run_reflink_test() case successfully on opensuse13.2 with ocfs2-tools-1.8.2
http://bugzilla.suse.com/show_bug.cgi?id=921449

Bug ID: 921449
Summary: [ocfs2] cannot run run_reflink_test() case successfully on opensuse13.2 with ocfs2-tools-1.8.2
Classification: openSUSE
Product: openSUSE Distribution
Version: 13.2
Hardware: x86-64
OS: Other
Status: NEW
Severity: Normal
Priority: P5 - None
Component: High Availability
Assignee: lmb@suse.com
Reporter: ghe@suse.com
QA Contact: qa-bugs@suse.de
Found By: ---
Blocker: ---

This morning, I ran the run_reflink_test() case again and got the same result: the test node (open-nd1) was rebooted. I do not know why; it did not look like a system crash, and I wonder whether SBD killed the node.

The ocfs2 cluster installation steps are written down in setup_ocfs2_cluster.txt. I launched the test with the command "./single_run-WIP.sh -k /data/linux-kernel.tar.gz -m /mnt/shared/ -l /opt/ocfs2-test/log -d /dev/mapper/cluster--vg1-big--lv -s pcmk -n hacluster". I commented out the other test cases in the single_run-WIP.sh script; for more details, please refer to the opt.tar.gz file. The system messages were dumped into messages.log. If you need more information, please tell me.

As for the problem that the mmap_truncate test case hangs when cluster_size=1M, I will reproduce it again and report back to you.

rpm info:
open-nd1:/opt/ocfs2-test/bin # uname -r
3.16.6-2-desktop
open-nd1:/opt/ocfs2-test/bin # rpm -qa | grep ocfs
ocfs2console-1.8.2+git.1361836695.ff84eb5-11.1.4.x86_64
ocfs2-tools-debuginfo-1.8.2+git.1361836695.ff84eb5-11.1.4.x86_64
ocfs2-tools-devel-static-1.8.2+git.1361836695.ff84eb5-11.1.4.x86_64
ocfs2-tools-devel-1.8.2+git.1361836695.ff84eb5-11.1.4.x86_64
ocfs2-tools-o2cb-1.8.2+git.1361836695.ff84eb5-11.1.4.x86_64
ocfs2-tools-1.8.2+git.1361836695.ff84eb5-11.1.4.x86_64
ocfs2-test-1.0.4+git.1423834151.6a0aacd-0.x86_64
open-nd1:/opt/ocfs2-test/bin #

--
You are receiving this mail because:
You are on the CC list for the bug.
--- Comment #1 from Gang He
--- Comment #2 from Gang He
--- Comment #3 from Gang He
--- Comment #4 from Gang He
Hi Goldwyn,
Hi Gang,
Hi Goldwyn/Mark,
On 03/05/2015 01:46 AM, Gang He wrote:
This morning, I ran the run_reflink_test() case again and got the same result: the test node (open-nd1) was rebooted. I do not know why; it did not look like a system crash, and I wonder whether SBD killed the node.
The ocfs2 cluster installation steps are written down in setup_ocfs2_cluster.txt. I launched the test with the command "./single_run-WIP.sh -k /data/linux-kernel.tar.gz -m /mnt/shared/ -l /opt/ocfs2-test/log -d /dev/mapper/cluster--vg1-big--lv -s pcmk -n hacluster". I commented out the other test cases in the single_run-WIP.sh script; for more details, please refer to the opt.tar.gz file. The system messages were dumped into messages.log. If you need more information, please tell me.
Could you open a bugzilla bug for this? Also, could you extract more information about what is happening? Perhaps dig down to the system call level in the test. You could probably add print statements with timestamps to figure out what the process is doing.
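The "print statements with timestamps" idea above could be sketched roughly as follows. This is only an illustration, not part of ocfs2-test: the helper names `log_line` and `run_logged` and the log path are assumptions. Each line is flushed and fsync'ed so that it has a better chance of surviving a sudden reboot, and the log would presumably be kept on a local disk rather than on the OCFS2 mount under test.

```python
import os
import subprocess
import time


def log_line(log_file, message):
    """Write one timestamped line and push it to disk immediately."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    log_file.write(f"{stamp} {message}\n")
    log_file.flush()
    # fsync so the line is not lost in the page cache if the node is fenced.
    os.fsync(log_file.fileno())


def run_logged(cmd, log_path):
    """Run cmd (a list of arguments), logging a timestamp before and after."""
    with open(log_path, "a") as log_file:
        log_line(log_file, f"START {' '.join(cmd)}")
        result = subprocess.run(cmd)
        log_line(log_file, f"END rc={result.returncode} {' '.join(cmd)}")
    return result.returncode
```

A step that has a START line but no matching END line after the reboot would be the one that was running when the node went down.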
Today we have our doc-day program, so I will open a bugzilla bug next week. This bug is not a system crash or system hang; rather, the system is rebooted after running the run_reflink_test() case for some hours on my VM cluster node. I wonder whether the reboot is triggered by the cluster software (such as SBD).
Yes, most likely the reboot is triggered by SBD, but it is highly possible that it happened because ocfs2 was not responding. The job of the fencing agent is to reboot the node when there is no response. This is the reason I asked you to dig deeper and check what the last operation on ocfs2 was before the node rebooted. /proc/<pid>/stack is possibly the place you want to look if you know it is hung (just before the reboot). Thanks for the testing so far.
--
Goldwyn
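One way to follow the /proc/<pid>/stack suggestion above is to snapshot that file periodically while the test runs, so the last snapshot before the reboot shows where the process was in the kernel. The sketch below is a minimal illustration under assumptions: the function names, the interval, and the log path are hypothetical, and reading /proc/<pid>/stack normally requires root.

```python
import os
import time


def read_kernel_stack(pid, proc_root="/proc"):
    """Return the text of <proc_root>/<pid>/stack, or a note if unreadable."""
    path = os.path.join(proc_root, str(pid), "stack")
    try:
        with open(path) as f:
            return f.read()
    except OSError as err:
        return f"<could not read {path}: {err}>\n"


def snapshot_stack(pid, log_path, proc_root="/proc"):
    """Append one timestamped stack snapshot to log_path and fsync it."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    with open(log_path, "a") as log:
        log.write(f"=== {stamp} pid {pid} ===\n")
        log.write(read_kernel_stack(pid, proc_root))
        log.flush()
        os.fsync(log.fileno())


def watch_stack(pid, log_path, interval=5.0, count=None, proc_root="/proc"):
    """Snapshot every `interval` seconds while the process directory exists.

    Stops after `count` snapshots if count is given. Returns the number taken.
    """
    taken = 0
    proc_dir = os.path.join(proc_root, str(pid))
    while os.path.exists(proc_dir) and (count is None or taken < count):
        snapshot_stack(pid, log_path, proc_root)
        taken += 1
        time.sleep(interval)
    return taken
```

For example, `watch_stack(pid, "/var/tmp/reflink-stack.log")` could be run as root against the PID of the test process; both the PID and the path here are placeholders.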
Lars Marowsky-Bree
Goldwyn Rodrigues
--- Comment #5 from Gang He
Gang He