[Bug 921494] New: [ocfs2] run run_mmaptruncate() case on opensuse13.2 with ocfs2-tools-1.8.2 will hang in 10G volume
http://bugzilla.suse.com/show_bug.cgi?id=921494

Bug ID: 921494
Summary: [ocfs2] run run_mmaptruncate() case on opensuse13.2 with ocfs2-tools-1.8.2 will hang in 10G volume
Classification: openSUSE
Product: openSUSE Distribution
Version: 13.2
Hardware: x86-64
OS: Other
Status: NEW
Severity: Normal
Priority: P5 - None
Component: High Availability
Assignee: lmb@suse.com
Reporter: ghe@suse.com
QA Contact: qa-bugs@suse.de
Found By: ---
Blocker: ---

This is a very strange problem: running the run_mmaptruncate() case on openSUSE 13.2 with ocfs2-tools-1.8.2 hangs on a 10G volume, but running the same case on a larger volume (e.g. 60G) is OK. So I would like to know whether this behavior is considered a bug.
The exact step that hangs is the case with block-size=4096/cluster-size=1048576 (the last step).

--
You are receiving this mail because:
You are on the CC list for the bug.
http://bugzilla.suse.com/show_bug.cgi?id=921494

--- Comment #1 from Gang He <ghe@suse.com> ---
For installation and package information, please refer to Bug 921449.

Command line:
./single_run-WIP.sh -k /data/linux-kernel.tar.gz -m /mnt/shared/ -l /opt/ocfs2-test/log -d /dev/mapper/cluster--vg1-big--lv -s pcmk -n hacluster

Stack information while both processes are hung:
root 14516 0.0 0.0 0 0 ? S 15:45 0:00 [kworker/0:1]
root 14670 0.0 0.4 45588 8464 pts/2 S+ 15:49 0:01 vi single_run-WIP.sh
root 15245 0.0 0.0 0 0 ? D 15:59 0:00 [jbd2/dm-3-523]
root 15248 1.6 0.0 14476 1500 pts/1 D+ 15:59 0:09 mmap_truncate -c 20 -s 300 /mnt/shared//mmaptruncate.txt
root 15249 1.9 0.0 14476 396 pts/1 D+ 15:59 0:11 mmap_truncate -c 20 -s 300 /mnt/shared//mmaptruncate.txt
root 15272 0.0 0.0 0 0 ? S 16:00 0:00 [kworker/u4:3]
root 15274 0.0 0.0 0 0 ? S 16:00 0:00 [kworker/1:2]
root 15302 0.0 0.0 0 0 ? S 16:01 0:00 [kworker/1:3]
root 15439 0.0 0.0 0 0 ? S 16:06 0:00 [kworker/1:0]
root 15551 0.0 0.1 23212 2748 pts/3 R+ 16:09 0:00 ps aux

open-nd1:/opt/ocfs2-test/log/2015-03-10_15:34 # cat /proc/15248/stack
[<ffffffff81149fda>] sleep_on_page+0xa/0x10
[<ffffffff8114a0da>] __lock_page+0x6a/0x70
[<ffffffffa05e1198>] ocfs2_write_begin_nolock+0x1688/0x1fa0 [ocfs2]
[<ffffffffa0613275>] ocfs2_page_mkwrite+0x1f5/0x310 [ocfs2]
[<ffffffff81171b9e>] do_page_mkwrite+0x3e/0x80
[<ffffffff81174cd5>] do_shared_fault.isra.54+0x65/0x1d0
[<ffffffff81175e34>] handle_mm_fault+0x484/0x1170
[<ffffffff81048ce8>] __do_page_fault+0x158/0x530
[<ffffffff81623808>] async_page_fault+0x28/0x30
[<0000000000400eaa>] 0x400eaa
[<ffffffffffffffff>] 0xffffffffffffffff

open-nd1:/opt/ocfs2-test/log/2015-03-10_15:34 # cat /proc/15249/stack
[<ffffffff81323543>] call_rwsem_down_write_failed+0x13/0x20
[<ffffffffa05fb7f7>] ocfs2_truncate_file+0x117/0xa80 [ocfs2]
[<ffffffffa060171d>] ocfs2_setattr+0x52d/0xe20 [ocfs2]
[<ffffffff811d2d11>] notify_change+0x241/0x390
[<ffffffff811b55f5>] do_truncate+0x65/0x90
[<ffffffff811b594b>] do_sys_ftruncate.constprop.10+0x10b/0x160
[<ffffffff8162182d>] system_call_fastpath+0x1a/0x1f
[<00007f2ecab95207>] 0x7f2ecab95207
[<ffffffffffffffff>] 0xffffffffffffffff
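The hung processes in the listing above are the ones whose STAT column starts with `D` (uninterruptible sleep). As a rough sketch of how that filtering could be scripted, the following helper picks those entries out of `ps aux` style text. The function name `uninterruptible_processes` and the `SAMPLE` variable are made up for illustration and are not part of ocfs2-test; the sample rows are copied from the listing above.

```python
def uninterruptible_processes(ps_aux_output):
    """Return (pid, command) pairs for processes whose STAT starts with 'D'."""
    hung = []
    for line in ps_aux_output.splitlines():
        # ps aux columns: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
        fields = line.split(None, 10)
        if len(fields) < 11 or not fields[1].isdigit():
            continue  # skip the header row and malformed lines
        if fields[7].startswith("D"):
            hung.append((int(fields[1]), fields[10]))
    return hung


# A few rows taken from the ps listing in this comment.
SAMPLE = """\
root 14516 0.0 0.0 0 0 ? S 15:45 0:00 [kworker/0:1]
root 15245 0.0 0.0 0 0 ? D 15:59 0:00 [jbd2/dm-3-523]
root 15248 1.6 0.0 14476 1500 pts/1 D+ 15:59 0:09 mmap_truncate -c 20 -s 300 /mnt/shared//mmaptruncate.txt
root 15551 0.0 0.1 23212 2748 pts/3 R+ 16:09 0:00 ps aux
"""

for pid, command in uninterruptible_processes(SAMPLE):
    print(pid, command)
```

For each PID it reports, `/proc/<pid>/stack` can then be read as root, as was done by hand in this comment.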
http://bugzilla.suse.com/show_bug.cgi?id=921494

--- Comment #2 from Gang He <ghe@suse.com> ---
Latest system messages (/var/log/messages):
Mar 10 15:54:46 open-nd1 sudo[15069]: root : TTY=pts/1 ; PWD=/opt/ocfs2-test/bin ; USER=root ; COMMAND=/usr/bin/chown -R root /mnt/shared/
Mar 10 15:54:46 open-nd1 sudo[15069]: pam_unix(sudo:session): session opened for user root by root(uid=0)
Mar 10 15:54:46 open-nd1 sudo[15069]: pam_unix(sudo:session): session closed for user root
Mar 10 15:59:56 open-nd1 sudo[15212]: root : TTY=pts/1 ; PWD=/opt/ocfs2-test/bin ; USER=root ; COMMAND=/usr/bin/umount /mnt/shared/
Mar 10 15:59:56 open-nd1 sudo[15212]: pam_unix(sudo:session): session opened for user root by root(uid=0)
Mar 10 15:59:56 open-nd1 sudo[15212]: pam_unix(sudo:session): session closed for user root
Mar 10 15:59:56 open-nd1 sudo[15233]: root : TTY=pts/1 ; PWD=/opt/ocfs2-test/bin ; USER=root ; COMMAND=/sbin/mkfs.ocfs2 -x -b 4096 -C 1048576 --fs-features=sparse,unwritten,inline-data -N 1 -L sing
Mar 10 15:59:56 open-nd1 sudo[15233]: pam_unix(sudo:session): session opened for user root by root(uid=0)
Mar 10 15:59:56 open-nd1 kernel: ocfs2: Unmounting device (253,3) on (node local)
Mar 10 15:59:56 open-nd1 sudo[15233]: pam_unix(sudo:session): session closed for user root
Mar 10 15:59:56 open-nd1 sudo[15241]: root : TTY=pts/1 ; PWD=/opt/ocfs2-test/bin ; USER=root ; COMMAND=/usr/bin/mount -o data=writeback /dev/mapper/cluster--vg1-big--lv /mnt/shared/
Mar 10 15:59:56 open-nd1 sudo[15241]: pam_unix(sudo:session): session opened for user root by root(uid=0)
Mar 10 15:59:56 open-nd1 kernel: JBD2: Ignoring recovery information on journal
Mar 10 15:59:56 open-nd1 kernel: ocfs2: Mounting device (253,3) on (node local, slot 0) with writeback data mode.
Mar 10 15:59:56 open-nd1 sudo[15241]: pam_unix(sudo:session): session closed for user root
Mar 10 15:59:56 open-nd1 sudo[15246]: root : TTY=pts/1 ; PWD=/opt/ocfs2-test/bin ; USER=root ; COMMAND=/usr/bin/chown -R root /mnt/shared/
Mar 10 15:59:56 open-nd1 sudo[15246]: pam_unix(sudo:session): session opened for user root by root(uid=0)
Mar 10 15:59:56 open-nd1 sudo[15246]: pam_unix(sudo:session): session closed for user root
Mar 10 16:00:01 open-nd1 cron[15250]: pam_unix(crond:session): session opened for user root by (uid=0)
Mar 10 16:00:01 open-nd1 CRON[15250]: pam_unix(crond:session): session closed for user root
Mar 10 16:02:22 open-nd1 crmd[1755]: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_TIMER_POPPED origin=crm_timer_popped ]
Mar 10 16:02:22 open-nd1 crmd[1755]: notice: do_te_invoke: Processing graph 31 (ref=pe_calc-dc-1425974542-153) derived from /var/lib/pacemaker/pengine/pe-input-153.bz2
Mar 10 16:02:22 open-nd1 crmd[1755]: notice: run_graph: Transition 31 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-153.bz2): Complete
Mar 10 16:02:22 open-nd1 crmd[1755]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
Mar 10 16:02:22 open-nd1 pengine[1754]: notice: process_pe_message: Calculated Transition 31: /var/lib/pacemaker/pengine/pe-input-153.bz2
Mar 10 16:11:56 open-nd1 sshd[15642]: Accepted publickey for root from 192.168.100.1 port 55261 ssh2: RSA 2a:af:3e:08:7a:b9:0f:29:c0:a6:0b:4b:ab:d8:9f:ba [MD5]
Mar 10 16:11:56 open-nd1 sshd[15642]: pam_unix(sshd:session): session opened for user root by (uid=0)
Mar 10 16:12:27 open-nd1 crmd[1755]: notice: throttle_handle_load: High CPU load detected: 3.340000
Mar 10 16:12:57 open-nd1 crmd[1755]: notice: throttle_handle_load: High CPU load detected: 3.360000
http://bugzilla.suse.com/show_bug.cgi?id=921494

Lars Marowsky-Bree <lmb@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P5 - None                   |P3 - Medium
                 CC|                            |lmb@suse.com,
                   |                            |rgoldwyn@suse.com
           Assignee|lmb@suse.com                |ghe@suse.com
http://bugzilla.suse.com/show_bug.cgi?id=921494

Gang He <ghe@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |zren@suse.com
http://bugzilla.suse.com/show_bug.cgi?id=921494
http://bugzilla.suse.com/show_bug.cgi?id=921494#c3

--- Comment #3 from Gang He <ghe@suse.com> ---
Hello Eric,

I remember that you once reproduced this bug, and that it could be worked around by increasing the volume size.
Do you still encounter it? If so, I will look at it.

Thanks
Gang
http://bugzilla.suse.com/show_bug.cgi?id=921494
http://bugzilla.suse.com/show_bug.cgi?id=921494#c4

--- Comment #4 from zhen ren <zren@suse.com> ---
(In reply to Gang He from comment #3)
> Hello Eric,
>
> I remember that you once reproduced this bug, and that it could be worked around by increasing the volume size.
> Do you still encounter it? If so, I will look at it.
I think this issue should be resolved by this patch:

```
commit c33f0785bf292cf1d15f4fbe42869c63e205b21c
Author: Eric Ren <zren@suse.com>
Date:   Fri Sep 30 15:11:32 2016 -0700

    ocfs2: fix deadlock on mmapped page in ocfs2_write_begin_nolock()

    The testcase "mmaptruncate" of ocfs2-test deadlocks occasionally.

    In this testcase, we create a 2*CLUSTER_SIZE file and mmap() on it;
    there are 2 process repeatedly performing the following operations
    respectively: one is doing memset(mmaped_addr + 2*CLUSTER_SIZE - 1, 'a',
    1), while the another is playing ftruncate(fd, 2*CLUSTER_SIZE) and then
    ftruncate(fd, CLUSTER_SIZE) again and again.

    This is the backtrace when the deadlock happens:

    __wait_on_bit_lock+0x50/0xa0
    __lock_page+0xb7/0xc0
    ocfs2_write_begin_nolock+0x163f/0x1790 [ocfs2]
    ocfs2_page_mkwrite+0x1c7/0x2a0 [ocfs2]
    do_page_mkwrite+0x66/0xc0
    handle_mm_fault+0x685/0x1350
    __do_page_fault+0x1d8/0x4d0
    trace_do_page_fault+0x37/0xf0
    do_async_page_fault+0x19/0x70
    async_page_fault+0x28/0x30

    In ocfs2_write_begin_nolock(), we first grab the pages and then allocate
    disk space for this write; ocfs2_try_to_free_truncate_log() will be
    called if -ENOSPC is returned; if we're lucky to get enough clusters,
    which is usually the case, we start over again. But in
    ocfs2_free_write_ctxt() the target page isn't unlocked, so we will
    deadlock when trying to grab the target page again.

    Also, -ENOMEM might be returned in ocfs2_grab_pages_for_write().
    Another deadlock will happen in __do_page_mkwrite() if
    ocfs2_page_mkwrite() returns non-VM_FAULT_LOCKED, and along with a
    locked target page.

    These two errors fail on the same path, so fix them by unlocking the
    target page manually before ocfs2_free_write_ctxt().

    Jan Kara helps me clear out the JBD2 part, and suggest the hint for root
    cause.

    Changes since v1:
    1. Also put ENOMEM error case into consideration.

    Link: http://lkml.kernel.org/r/1474173902-32075-1-git-send-email-zren@suse.com
    Signed-off-by: Eric Ren <zren@suse.com>
    Reviewed-by: He Gang <ghe@suse.com>
    Acked-by: Joseph Qi <joseph.qi@huawei.com>
    Cc: Mark Fasheh <mfasheh@suse.de>
    Cc: Joel Becker <jlbec@evilplan.org>
    Cc: Junxiao Bi <junxiao.bi@oracle.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
```

Eric
> Thanks
> Gang
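The commit message quoted in this comment describes a retry loop that grabs the target page again while still holding its lock. A minimal userspace model of that pattern is sketched below, assuming a non-recursive lock stands in for the page lock. `Page`, `write_begin` and `unlock_before_retry` are hypothetical names invented for this illustration; this is not the kernel code, and it does not model the second process waiting in ocfs2_truncate_file().

```python
import errno
import threading


class Page:
    """Stand-in for a page cache page guarded by a non-recursive lock."""

    def __init__(self):
        self.lock = threading.Lock()


def write_begin(page, alloc_results, unlock_before_retry):
    """Model the retry loop and return "ok", "deadlock" or "enospc".

    alloc_results lists the outcome of each allocation attempt
    (0 for success, errno.ENOSPC for failure). A real kernel thread
    would sleep forever on the second lock attempt; a non-blocking
    acquire is used here so the model can report the condition.
    """
    for result in alloc_results:
        # Grab the target page first, then try to allocate space.
        if not page.lock.acquire(blocking=False):
            return "deadlock"  # the lock is still held from the last pass
        if result == 0:
            page.lock.release()
            return "ok"
        # Allocation failed: drop the write context and start over.
        if unlock_before_retry:
            page.lock.release()  # models the fix: unlock the page manually
    return "enospc"


print(write_begin(Page(), [errno.ENOSPC, 0], unlock_before_retry=False))
print(write_begin(Page(), [errno.ENOSPC, 0], unlock_before_retry=True))
```

With `unlock_before_retry=False` the second pass finds the page still locked by the same caller, which corresponds to the `__lock_page` frame at the top of the backtrace. With `unlock_before_retry=True` the retry succeeds.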
http://bugzilla.suse.com/show_bug.cgi?id=921494
http://bugzilla.suse.com/show_bug.cgi?id=921494#c5

--- Comment #5 from Gang He <ghe@suse.com> ---
The problem can still be reproduced on sles12sp2. I will verify the patch and, if it is effective, back-port it.

ocfs2te+ 18245 0.5 0.0 6108 1348 pts/0 D+ 14:33 0:08 mmap_truncate -c 20 -s 300 /mnt/shared/mmaptruncate.txt
ocfs2te+ 18246 0.8 0.0 6108 84 pts/0 D+ 14:33 0:11 mmap_truncate -c 20 -s 300 /mnt/shared/mmaptruncate.txt
root 18249 0.0 0.0 0 0 ? S 14:33 0:00 [kworker/3:2]
root 18376 0.0 0.0 0 0 ? S< 14:35 0:00 [kworker/2:0H]
root 18925 0.0 0.0 0 0 ? S 14:45 0:00 [kworker/3:0]
root 19371 0.0 0.0 0 0 ? S< 14:52 0:00 [kworker/2:2H]
root 19584 0.0 0.1 35592 3268 pts/2 R+ 14:56 0:00 ps aux

sles12sp2-nd1:/usr/local/ocfs2-test/bin # cat /proc/18245/stack
[<ffffffff81183b19>] __lock_page+0xa9/0xb0
[<ffffffffa0573f8b>] ocfs2_write_begin_nolock+0x131b/0x1830 [ocfs2]
[<ffffffffa059902b>] ocfs2_page_mkwrite+0x1ab/0x260 [ocfs2]
[<ffffffff811aea89>] do_page_mkwrite+0x69/0xb0
[<ffffffff811b14cd>] handle_pte_fault+0xfd/0x14f0
[<ffffffff811b37ae>] handle_mm_fault+0x29e/0x550
[<ffffffff810645ba>] __do_page_fault+0x18a/0x410
[<ffffffff810648ec>] trace_do_page_fault+0x3c/0x120
[<ffffffff815e3818>] async_page_fault+0x28/0x30
[<ffffffffffffffff>] 0xffffffffffffffff

sles12sp2-nd1:/usr/local/ocfs2-test/bin # cat /proc/18246/stack
[<ffffffff8131b4c3>] call_rwsem_down_write_failed+0x13/0x20
[<ffffffffa0589717>] ocfs2_truncate_file+0x127/0x6c0 [ocfs2]
[<ffffffffa058c1a8>] ocfs2_setattr+0x698/0xa90 [ocfs2]
[<ffffffffa05843f3>] ocfs2_inode_unlock+0x33/0x80 [ocfs2]
[<ffffffff8121587e>] notify_change+0x1ae/0x380
[<ffffffff811f8bfe>] do_truncate+0x5e/0x90
[<ffffffff811f8f58>] do_sys_ftruncate.constprop.11+0x108/0x160
[<ffffffff815e142e>] entry_SYSCALL_64_fastpath+0x12/0x6d
[<ffffffffffffffff>] 0xffffffffffffffff
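To compare a stack like the ones in this comment with the backtrace in the upstream commit message, it can help to reduce each frame to its symbol and module. The following is a small sketch under the assumption that frames look like `[<address>] symbol+0xoff/0xsize [module]`, as they do here. `FRAME`, `parse_stack` and `STACK_18245` are illustrative names, and the sample frames are an excerpt of the stack of PID 18245 above.

```python
import re

# One frame: "[<address>] symbol+0xoffset/0xsize", optionally followed by "[module]".
FRAME = re.compile(
    r"\[<[0-9a-f]+>\]\s+(\S+?)\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(\w+)\])?"
)


def parse_stack(text):
    """Return (symbol, module) pairs; module is None for core kernel frames."""
    return [(m.group(1), m.group(2)) for m in FRAME.finditer(text)]


# Excerpt of the stack of PID 18245 from this comment.
STACK_18245 = """\
[<ffffffff81183b19>] __lock_page+0xa9/0xb0
[<ffffffffa0573f8b>] ocfs2_write_begin_nolock+0x131b/0x1830 [ocfs2]
[<ffffffffa059902b>] ocfs2_page_mkwrite+0x1ab/0x260 [ocfs2]
[<ffffffff811aea89>] do_page_mkwrite+0x69/0xb0
[<ffffffffffffffff>] 0xffffffffffffffff
"""

for symbol, module in parse_stack(STACK_18245):
    print(symbol, module or "(kernel)")
```

Frames without a `symbol+offset/size` part, such as the trailing `0xffffffffffffffff` entry, are skipped by the pattern.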
http://bugzilla.suse.com/show_bug.cgi?id=921494
http://bugzilla.suse.com/show_bug.cgi?id=921494#c6

--- Comment #6 from Gang He <ghe@suse.com> ---
With the patch applied on sles12sp2, the bug is fixed.
I will back-port this patch to the sles12sp2, sles12sp3 and openSUSE-42.2 branches.

Thanks
Gang
http://bugzilla.suse.com/show_bug.cgi?id=921494
http://bugzilla.suse.com/show_bug.cgi?id=921494#c7

Gang He <ghe@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #7 from Gang He <ghe@suse.com> ---
The patch has been merged into these three branches, so I will close this bug.
http://bugzilla.suse.com/show_bug.cgi?id=921494

Swamp Workflow Management <swamp@suse.de> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Whiteboard|                            |obs:running:6362:important
http://bugzilla.suse.com/show_bug.cgi?id=921494

Swamp Workflow Management <swamp@suse.de> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Whiteboard|obs:running:6362:important  |obs:running:6362:important
                   |                            |ibs:running:4163:important
http://bugzilla.suse.com/show_bug.cgi?id=921494

Swamp Workflow Management <swamp@suse.de> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Whiteboard|obs:running:6362:important  |ibs:running:4163:important
                   |ibs:running:4163:important  |
http://bugzilla.suse.com/show_bug.cgi?id=921494
http://bugzilla.suse.com/show_bug.cgi?id=921494#c8

--- Comment #8 from Swamp Workflow Management <swamp@suse.de> ---
openSUSE-SU-2017:0456-1: An update that solves 11 vulnerabilities and has 98 fixes is now available.

Category: security (important)
Bug References: 1000092,1000619,1003077,1003253,1005918,1006469,1006472,1007729,1008742,1009546,1009674,1009718,1009911,1009969,1010612,1010690,1011176,1011250,1011602,1011660,1011913,1012422,1012829,1012910,1013000,1013001,1013273,1013531,1013540,1013542,1013792,1013994,1014120,1014392,1014410,1014701,1014710,1015038,1015212,1015359,1015367,1015416,1015840,1016250,1016403,1016517,1016884,1016979,1017164,1017170,1017410,1017589,1018100,1018316,1018358,1018385,1018446,1018813,1018913,1019061,1019148,1019260,1019351,1019594,1019630,1019631,1019784,1019851,1020214,1020488,1020602,1020685,1020817,1020945,1020975,1021248,1021251,1021258,1021260,1021294,1021455,1021474,1022304,1022429,1022476,1022547,1022559,1022971,1023101,1023175,921494,959709,960561,964944,966170,966172,966186,966191,969474,969475,969756,971975,974215,979378,981709,985561,987192,987576,991273
CVE References: CVE-2015-8709,CVE-2016-7117,CVE-2016-8645,CVE-2016-9793,CVE-2016-9806,CVE-2016-9919,CVE-2017-2583,CVE-2017-2584,CVE-2017-5551,CVE-2017-5576,CVE-2017-5577
Sources used:
openSUSE Leap 42.2 (src): kernel-debug-4.4.46-11.1, kernel-default-4.4.46-11.1, kernel-docs-4.4.46-11.3, kernel-obs-build-4.4.46-11.1, kernel-obs-qa-4.4.46-11.1, kernel-source-4.4.46-11.1, kernel-syms-4.4.46-11.1, kernel-vanilla-4.4.46-11.1
http://bugzilla.suse.com/show_bug.cgi?id=921494
http://bugzilla.suse.com/show_bug.cgi?id=921494#c9

--- Comment #9 from Swamp Workflow Management <swamp@suse.de> ---
SUSE-SU-2017:0575-1: An update that solves 11 vulnerabilities and has 95 fixes is now available.

Category: security (important)
Bug References: 1000092,1000619,1003077,1005918,1006469,1006472,1007729,1008742,1009546,1009674,1009718,1009911,1010612,1010690,1010933,1011176,1011602,1011660,1011913,1012382,1012422,1012829,1012910,1013000,1013001,1013273,1013540,1013792,1013994,1014120,1014410,1015038,1015367,1015840,1016250,1016403,1016517,1016884,1016979,1017164,1017170,1017410,1018100,1018316,1018358,1018446,1018813,1018913,1019061,1019148,1019168,1019260,1019351,1019594,1019630,1019631,1019784,1019851,1020048,1020214,1020488,1020602,1020685,1020817,1020945,1020975,1021082,1021248,1021251,1021258,1021260,1021294,1021455,1021474,1022304,1022429,1022476,1022547,1022559,1022971,1023101,1023175,1023762,1023884,1023888,1024081,1024234,1024508,1024938,1025235,921494,959709,964944,969476,969477,969479,971975,974215,981709,982783,985561,987192,987576,989056,991273,998106
CVE References: CVE-2015-8709,CVE-2016-7117,CVE-2016-9806,CVE-2017-2583,CVE-2017-2584,CVE-2017-5551,CVE-2017-5576,CVE-2017-5577,CVE-2017-5897,CVE-2017-5970,CVE-2017-5986
Sources used:
SUSE Linux Enterprise Workstation Extension 12-SP2 (src): kernel-default-4.4.49-92.11.1
SUSE Linux Enterprise Software Development Kit 12-SP2 (src): kernel-docs-4.4.49-92.11.3, kernel-obs-build-4.4.49-92.11.1
SUSE Linux Enterprise Server for Raspberry Pi 12-SP2 (src): kernel-default-4.4.49-92.11.1, kernel-source-4.4.49-92.11.1, kernel-syms-4.4.49-92.11.1
SUSE Linux Enterprise Server 12-SP2 (src): kernel-default-4.4.49-92.11.1, kernel-source-4.4.49-92.11.1, kernel-syms-4.4.49-92.11.1
SUSE Linux Enterprise Live Patching 12 (src): kgraft-patch-SLE12-SP2_Update_5-1-6.1
SUSE Linux Enterprise High Availability 12-SP2 (src): kernel-default-4.4.49-92.11.1
SUSE Linux Enterprise Desktop 12-SP2 (src): kernel-default-4.4.49-92.11.1, kernel-source-4.4.49-92.11.1, kernel-syms-4.4.49-92.11.1
OpenStack Cloud Magnum Orchestration 7 (src): kernel-default-4.4.49-92.11.1
http://bugzilla.suse.com/show_bug.cgi?id=921494

Swamp Workflow Management <swamp@suse.de> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Whiteboard|ibs:running:4163:important  |