[Bug 561368] New: System hang on accessing a mounted directory over NFS
http://bugzilla.novell.com/show_bug.cgi?id=561368

http://bugzilla.novell.com/show_bug.cgi?id=561368#c0

           Summary: System hang on accessing a mounted directory over NFS
    Classification: openSUSE
           Product: openSUSE 11.3
           Version: Factory
          Platform: x86-64
        OS/Version: SLES 11
            Status: NEW
          Severity: Major
          Priority: P5 - None
         Component: Basesystem
        AssignedTo: bnc-team-screening@forge.provo.novell.com
        ReportedBy: dmitri.zoguine@sun.com
         QAContact: qa@suse.de
          Found By: ---
           Blocker: ---

User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.1.5) Gecko/20091102 Firefox/3.5.5

Hello,

I have the SLES11 distro installed with Linux kernel 2.6.27.29-0.1, and I am constantly getting system hangs. Note that my /home directory is mounted over NFS. Typical stack traces are:

crash> bt -a
PID: 4815  TASK: ffff8800e255e800  CPU: 0  COMMAND: "bash"
 #0 [ffffffff80a3fea0] crash_nmi_callback at ffffffff8021f403
 #1 [ffffffff80a3feb0] notifier_call_chain at ffffffff8049fcb8
 #2 [ffffffff80a3fee0] notify_die at ffffffff80253654
 #3 [ffffffff80a3ff10] default_do_nmi at ffffffff8049de4f
 #4 [ffffffff80a3ff40] do_nmi at ffffffff8049e015
 #5 [ffffffff80a3ff50] nmi at ffffffff8049d8bf
    [exception RIP: xs_tcp_send_request+30]
    RIP: ffffffffa00040f0  RSP: ffff8800e41f99c8  RFLAGS: 00000206
    RAX: 000000000000006c  RBX: ffff8800e25c4e00  RCX: 0000000000000000
    RDX: ffff8800e192b808  RSI: ffff8800e006f80e  RDI: ffff8800e25c4e00
    RBP: ffff8800e344b2d0   R8: ffff8800e259f000   R9: ffff8800e192b870
    R10: ffff88011388c000  R11: ffffffffa0008500  R12: ffff8800e25c4e00
    R13: ffff88011388c000  R14: ffff8800e344b2d8  R15: ffff8800e259f000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
--- <exception stack> ---
 #6 [ffff8800e41f99c8] xs_tcp_send_request at ffffffffa00040f0
 #7 [ffff8800e41f99f0] xprt_transmit at ffffffffa00030e6
 #8 [ffff8800e41f9a30] call_transmit at ffffffffa0000bf4
 #9 [ffff8800e41f9a40] __rpc_execute at ffffffffa0006eb2
#10 [ffff8800e41f9a70] rpc_run_task at ffffffffa0001482
#11 [ffff8800e41f9a90] rpc_call_sync at ffffffffa0001571
#12 [ffff8800e41f9ae0] nfs3_rpc_wrapper at ffffffffa005e792
#13 [ffff8800e41f9b00] nfs3_proc_access at ffffffffa005ecf9
#14 [ffff8800e41f9c30] nfs_do_access at ffffffffa004ff18
#15 [ffff8800e41f9ca0] nfs_permission at ffffffffa0050059
#16 [ffff8800e41f9cd0] __inode_permission at ffffffff802b8e11
#17 [ffff8800e41f9cf0] path_permission at ffffffff802b8e77
#18 [ffff8800e41f9d10] __link_path_walk at ffffffff802ba67f
#19 [ffff8800e41f9d90] path_walk at ffffffff802bb486
#20 [ffff8800e41f9dc0] do_path_lookup at ffffffff802bb644
#21 [ffff8800e41f9e00] user_path_at at ffffffff802bbfdd
#22 [ffff8800e41f9ec0] vfs_stat_fd at ffffffff802b4a4e
#23 [ffff8800e41f9ef0] sys_newstat at ffffffff802b4ad6
#24 [ffff8800e41f9f80] system_call_fastpath at ffffffff8020bfbb

crash> bt -a
PID: 4450  TASK: ffff88007e5ba280  CPU: 0  COMMAND: "bash"
 #0 [ffffffff80a3fea0] crash_nmi_callback at ffffffff8021f403
 #1 [ffffffff80a3feb0] notifier_call_chain at ffffffff8049fcb8
 #2 [ffffffff80a3fee0] notify_die at ffffffff80253654
 #3 [ffffffff80a3ff10] default_do_nmi at ffffffff8049de4f
 #4 [ffffffff80a3ff40] do_nmi at ffffffff8049e015
 #5 [ffffffff80a3ff50] nmi at ffffffff8049d8bf
    [exception RIP: aa_revalidate_sk+60]
    RIP: ffffffff8032fb36  RSP: ffff880067cf7958  RFLAGS: 00000246
    RAX: 0000000000000000  RBX: ffff88006102e080  RCX: 0000000000000001
    RDX: 0000000000000068  RSI: ffffffff805cf9d4  RDI: ffff88007dcf8800
    RBP: 0000000000000068   R8: 0000000000000068   R9: 000000000000000e
    R10: ffff88007d5a4000  R11: ffffffff80331acb  R12: ffff88007dcf8800
    R13: ffffffff805cf9d4  R14: ffff88007d404008  R15: ffff88007d404008
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
--- <exception stack> ---
 #6 [ffff880067cf7958] aa_revalidate_sk at ffffffff8032fb36
 #7 [ffff880067cf7980] sock_sendmsg at ffffffff8041e148
 #8 [ffff880067cf7b20] kernel_sendmsg at ffffffff8041e442
 #9 [ffff880067cf7b40] xs_send_kvec at ffffffffa0003e8b
#10 [ffff880067cf7bb0] xs_sendpages at ffffffffa0003f1b
#11 [ffff880067cf7c00] xs_tcp_send_request at ffffffffa0004116
#12 [ffff880067cf7c30] xprt_transmit at ffffffffa00030e6
#13 [ffff880067cf7c70] call_transmit at ffffffffa0000bf4
#14 [ffff880067cf7c80] __rpc_execute at ffffffffa0006eb2
#15 [ffff880067cf7cb0] rpc_run_task at ffffffffa0001482
#16 [ffff880067cf7cd0] rpc_call_sync at ffffffffa0001571
#17 [ffff880067cf7d20] nfs3_rpc_wrapper at ffffffffa005e792
#18 [ffff880067cf7d40] nfs3_proc_getattr at ffffffffa005ef31
#19 [ffff880067cf7d80] __nfs_revalidate_inode at ffffffffa0052664
#20 [ffff880067cf7e90] nfs_getattr at ffffffffa0052bc3
#21 [ffff880067cf7ec0] vfs_stat_fd at ffffffff802b4a65
#22 [ffff880067cf7ef0] sys_newstat at ffffffff802b4ad6
#23 [ffff880067cf7f80] system_call_fastpath at ffffffff8020bfbb
    RIP: 00007f6eca014145  RSP: 00007fffb83b1470  RFLAGS: 00010297
    RAX: 0000000000000004  RBX: ffffffff8020bfbb  RCX: 00007fffb83b15b0
    RDX: 00007fffb83b1660  RSI: 00007fffb83b1660  RDI: 00007f6ecabd27c0
    RBP: 00007fffb83b17d0   R8: 00007f6ecabd3ad0   R9: 00007f6ecabd4c10
    R10: 0000000000000000  R11: 0000000000000246  R12: 00007f6ecabd27c0

This resembles the known issue
http://linux-nfs.org/pipermail/nfsv4/2009-May/010601.html
but, according to the source, that issue should already be fixed in the SLES11 kernel.

In one of the recent hangs I got slightly different stack traces:

crash> bt 4068
PID: 4068  TASK: ffff8800604b23c0  CPU: 1  COMMAND: "login"
 #0 [ffff88006857bae8] schedule at ffffffff8049c1bf
 #1 [ffff88006857bbf0] io_schedule at ffffffff8049c2de
 #2 [ffff88006857bc10] sync_page at ffffffff80282bb9
 #3 [ffff88006857bc20] __lock_page at ffffffff80282c2c
 #4 [ffff88006857bc80] find_lock_page at ffffffff80282e2c
 #5 [ffff88006857bca0] filemap_fault at ffffffff802838f9
 #6 [ffff88006857bd10] __do_fault at ffffffff80291c2c
 #7 [ffff88006857bd90] handle_mm_fault at ffffffff80293d94
 #8 [ffff88006857bdf0] do_page_fault at ffffffff8049f7c6
 #9 [ffff88006857bf50] error_exit at ffffffff8049d649
    RIP: 00007ffed8407fd7  RSP: 00007fff2ac518a0  RFLAGS: 00010213
    RAX: 0000000000000000  RBX: 000000000061bfb0  RCX: 00007ffed8407fcd
    RDX: 0000000000000001  RSI: 000000000000018a  RDI: 0000000000000004
    RBP: 00007fff2ac51a50   R8: 0000000000000004   R9: 0000000000000000
    R10: 0000000000000002  R11: 0000000000000213  R12: 0000000000000004
    R13: 000000000061bb50  R14: 000000000061bc90  R15: 00007ffed8e07000
    ORIG_RAX: ffffffffffffffff  CS: 0033  SS: 002b

crash> bt 5300
PID: 5300  TASK: ffff88005c11c7c0  CPU: 1  COMMAND: "lc_watchdogd"
 #0 [ffff88005ed61dc0] schedule at ffffffff8049c1bf
 #1 [ffff88005ed61ec8] lcw_dispatch_main at ffffffffa040c000
 #2 [ffff88005ed61f48] kernel_thread at ffffffff8020cf79

crash> bt 5523
PID: 5523  TASK: ffff88005c092880  CPU: 1  COMMAND: "bash"
 #0 [ffff88005dc13ab8] schedule at ffffffff8049c1bf
 #1 [ffff88005dc13bc0] rpc_wait_bit_killable at ffffffffa00067e9
 #2 [ffff88005dc13bd0] __wait_on_bit at ffffffff8049c5ef
 #3 [ffff88005dc13c10] out_of_line_wait_on_bit at ffffffff8049c689
 #4 [ffff88005dc13c80] __rpc_execute at ffffffffa0006f1c
 #5 [ffff88005dc13cb0] rpc_run_task at ffffffffa0001482
 #6 [ffff88005dc13cd0] rpc_call_sync at ffffffffa0001571
 #7 [ffff88005dc13d20] nfs3_rpc_wrapper at ffffffffa005e792
 #8 [ffff88005dc13d40] nfs3_proc_getattr at ffffffffa005ef31
 #9 [ffff88005dc13d80] __nfs_revalidate_inode at ffffffffa0052664
#10 [ffff88005dc13e90] nfs_getattr at ffffffffa0052bc3
#11 [ffff88005dc13ec0] vfs_stat_fd at ffffffff802b4a65
#12 [ffff88005dc13ef0] sys_newstat at ffffffff802b4ad6
#13 [ffff88005dc13f80] system_call_fastpath at ffffffff8020bfbb
    RIP: 00007fdecb35d145  RSP: 00007fff58fd6ff0  RFLAGS: 00010297
    RAX: 0000000000000004  RBX: ffffffff8020bfbb  RCX: 00007fff58fd7130
    RDX: 00007fff58fd71e0  RSI: 00007fff58fd71e0  RDI: 00007fdecbf27a90
    RBP: 00007fff58fd7350   R8: 00007fdecbf2d3c0   R9: 00007fdecbf32790
    R10: 0000000000000000  R11: 0000000000000246  R12: 00007fdecbf27a90
    R13: 0000000000000000  R14: 00007fff58fd70b0  R15: 0000000000000000
    ORIG_RAX: 0000000000000004  CS: 0033  SS: 002b

Reproducible: Always

Steps to Reproduce:
Create an NFS /home directory, then try to use mount, umount, df, etc.

Actual Results:
The system hangs and no login is possible, but the system is still pingable.

I can provide the crash dumps; please let me know where you would like me to FTP them.

-- 
Configure bugmail: http://bugzilla.novell.com/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
http://bugzilla.novell.com/show_bug.cgi?id=561368

http://bugzilla.novell.com/show_bug.cgi?id=561368#c

Dmitry Zogin <dmitri.zoguine@sun.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P5 - None                   |P2 - High
http://bugzilla.novell.com/show_bug.cgi?id=561368

http://bugzilla.novell.com/show_bug.cgi?id=561368#c

shuang qiu <sqiu@novell.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |sqiu@novell.com
         AssignedTo|bnc-team-screening@forge.pr |nfbrown@novell.com
                   |ovo.novell.com              |
           Severity|Major                       |Normal
http://bugzilla.novell.com/show_bug.cgi?id=561368

http://bugzilla.novell.com/show_bug.cgi?id=561368#c1

Neil Brown <nfbrown@novell.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |NEEDINFO
      Info Provider|                            |dmitri.zoguine@sun.com

--- Comment #1 from Neil Brown <nfbrown@novell.com> 2009-12-09 21:48:23 UTC ---
The stack traces are all quite different. They all show a thread trying to transmit an NFS3 request, but each is at a different point in the process, which suggests looping higher up the stack, maybe even in nfs3_rpc_wrapper.

So this could easily be a problem with the NFS server, or possibly with the network.

Is it possible to get a network trace (e.g. tcpdump -s 0) of the traffic between the NFS client and server? That would help isolate where the problem is.
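The observation in comment #1, that the threads share an RPC call path but are caught at different points below it, can be checked mechanically. The following is a minimal Python sketch and not part of the original report: the frame names are copied from the bt output for PIDs 4815, 4450, and 5523 (innermost frame first, trimmed at the nfs3_proc_* caller), and the helper name shared_frames is hypothetical.

```python
# Frame names copied from the crash "bt" output in the initial report,
# innermost frame first, trimmed at the nfs3_proc_* caller.
# NMI handler frames (#0 to #5) are omitted for PIDs 4815 and 4450.
TRACES = {
    4815: [
        "xs_tcp_send_request", "xprt_transmit", "call_transmit",
        "__rpc_execute", "rpc_run_task", "rpc_call_sync",
        "nfs3_rpc_wrapper", "nfs3_proc_access",
    ],
    4450: [
        "aa_revalidate_sk", "sock_sendmsg", "kernel_sendmsg",
        "xs_send_kvec", "xs_sendpages", "xs_tcp_send_request",
        "xprt_transmit", "call_transmit", "__rpc_execute",
        "rpc_run_task", "rpc_call_sync", "nfs3_rpc_wrapper",
        "nfs3_proc_getattr",
    ],
    5523: [
        "schedule", "rpc_wait_bit_killable", "__wait_on_bit",
        "out_of_line_wait_on_bit", "__rpc_execute", "rpc_run_task",
        "rpc_call_sync", "nfs3_rpc_wrapper", "nfs3_proc_getattr",
    ],
}


def shared_frames(traces):
    """Return the frames present in every trace, in the first trace's order.

    Hypothetical helper for illustration; it compares function names only
    and ignores addresses and offsets.
    """
    stacks = list(traces.values())
    first, rest = stacks[0], stacks[1:]
    return [frame for frame in first if all(frame in other for other in rest)]


if __name__ == "__main__":
    for frame in shared_frames(TRACES):
        print(frame)
```

With these inputs the shared frames are __rpc_execute, rpc_run_task, rpc_call_sync, and nfs3_rpc_wrapper. That is consistent with the looping hypothesis but does not confirm it on its own, which is why a network trace was requested.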
http://bugzilla.novell.com/show_bug.cgi?id=561368

http://bugzilla.novell.com/show_bug.cgi?id=561368#c2

diana klashman <diana.klashman@sun.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEEDINFO                    |RESOLVED
                 CC|                            |diana.klashman@sun.com
      Info Provider|dmitri.zoguine@sun.com      |
         Resolution|                            |FIXED

--- Comment #2 from diana klashman <diana.klashman@sun.com> 2010-02-02 23:34:04 UTC ---
In an email exchange from January 5, 2010, the submitter states "We have not seen any problems after upgrading to 2.6.27.39-0.3 kernel version."

This issue is resolved. Thanks!