[Bug 325913] New: NFS4 stress test raises "Call Trace" in opensuse10. 3alpha5 on power5
https://bugzilla.novell.com/show_bug.cgi?id=325913#c1 Summary: NFS4 stress test raises "Call Trace" in opensuse10.3alpha5 on power5 Product: openSUSE 10.3 Version: Beta 2 Platform: PowerPC-64 OS/Version: openSUSE 10.3 Status: NEW Severity: Normal Priority: P5 - None Component: Kernel AssignedTo: power-bugs@forge.provo.novell.com ReportedBy: bugproxy@us.ibm.com QAContact: qa@suse.de CC: power-bugs@forge.provo.novell.com Found By: --- Partner ID: LTC 36199 LTC Owner is: suzukikp@in.ibm.com LTC Originator is: mohd.omar@in.ibm.com ---problem description---------- nfs4 stress(fsstress) test raises "Call Trace"and "Oops" in opensuse10.3alpha5 on power5 ........... Jul 9 05:59:35 p630b kernel: RESERVE_SPACE(761) failed in function encode_lookup Jul 9 05:59:35 p630b kernel: ------------[ cut here ]------------ Jul 9 05:59:35 p630b kernel: kernel BUG at fs/nfs/nfs4xdr.c:849! Jul 9 05:59:35 p630b kernel: Oops: Exception in kernel mode, sig: 5 [#1] Jul 9 05:59:35 p630b kernel: SMP NR_CPUS=128 NUMA pSeries Jul 9 05:59:35 p630b kernel: Modules linked in: nfs lockd nfs_acl sunrpc ipv6 apparmor loop dm_mod fuse e100 mii sg sd_mod sym53c8xx scsi_transport_spi pata_sl82c105 libata scsi_mod Jul 9 05:59:35 p630b kernel: NIP: d00000000086fc68 LR: d00000000086fc58 CTR: c00000000008e408 Jul 9 05:59:35 p630b kernel: REGS: c0000000d7fe3210 TRAP: 0700 Tainted: G N (2.6.22-rc4-git3-2-ppc64) Jul 9 05:59:35 p630b kernel: MSR: 9000000000029032 <EE,ME,IR,DR> CR: 28000484 XER: 00000000 Jul 9 05:59:35 p630b kernel: TASK = c00000002fc02040[4120] 'fsstress' THREAD: c0000000d7fe0000 CPU: 3 Jul 9 05:59:35 p630b kernel: GPR00: 0000000000000001 c0000000d7fe3490 d0000000008b3000 0000000000000037 Jul 9 05:59:35 p630b kernel: GPR04: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 Jul 9 05:59:35 p630b kernel: GPR08: c0000001fe7a8040 0000000000000000 c000000005c87bf8 c0000000006493b0 Jul 9 05:59:35 p630b kernel: GPR12: 000000000000000a c0000000005e8e80 0000000000000000 0000000000000000 Jul 9 05:59:35 p630b kernel: GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 Jul 9 05:59:35 p630b kernel: GPR20: 0000000000000000 0000000000000000 c0000000d7fe3a30 d0000000007c6424 Jul 9 05:59:35 p630b kernel: GPR24: c0000001d31409a0 c0000000f064e618 c0000000d7fe3920 c0000001d31409a0 Jul 9 05:59:35 p630b kernel: GPR28: 00000000000002f1 0000000028000444 d0000000008ae980 0000000000000000 Jul 9 05:59:35 p630b kernel: NIP [d00000000086fc68] .encode_lookup+0x6c/0xc0 [nfs] Jul 9 05:59:35 p630b kernel: LR [d00000000086fc58] .encode_lookup+0x5c/0xc0 [nfs] Jul 9 05:59:35 p630b kernel: Call Trace: Jul 9 05:59:35 p630b kernel: [c0000000d7fe3490] [d00000000086fc58] encode_lookup+0x5c/0xc0 [nfs] (unreliable) Jul 9 05:59:35 p630b kernel: [c0000000d7fe3530] [d000000000871b38] nfs4_xdr_enc_lookup+0x78/0xbc [nfs] Jul 9 05:59:35 p630b kernel: [c0000000d7fe3600] [d00000000078b634] rpcauth_wrap_req+0xd8/0x108 [sunrpc] Jul 9 05:59:35 p630b kernel: [c0000000d7fe36b0] [d000000000782688] call_transmit+0x228/0x2d8 [sunrpc] Jul 9 05:59:35 p630b kernel: [c0000000d7fe3750] [d00000000078a330] __rpc_execute+0xd4/0x35c [sunrpc] Jul 9 05:59:35 p630b kernel: [c0000000d7fe37f0] [d000000000782fc0] rpc_call_sync+0xb4/0x104 [sunrpc] Jul 9 05:59:35 p630b kernel: [c0000000d7fe38a0] [d00000000086a3bc] nfs4_proc_lookup+0x110/0x2cc [nfs] Jul 9 05:59:35 p630b kernel: [c0000000d7fe39c0] [d00000000084d92c] nfs_lookup+0x158/0x33c [nfs] Jul 9 05:59:35 p630b kernel: [c0000000d7fe3bc0] [c0000000000faf14] lookup_hash+0xfc/0x140 Jul 9 05:59:35 p630b kernel: [c0000000d7fe3c60] [c0000000000fd7ec] sys_renameat+0x164/0x230 Jul 9 05:59:35 p630b kernel: [c0000000d7fe3e30] [c0000000000086c8] syscall_exit+0x0/0x40 Jul 9 05:59:35 p630b kernel: Instruction dump: Jul 9 05:59:35 p630b kernel: e8410028 7fa4eb78 7c7f1b79 7fa00026 40820014 e8be83a8 e87e8350 4800c245 Jul 9 05:59:35 p630b kernel: e8410028 7fa80120 7c000026 54001ffe <0b000000> 3800000f 7b850020 387f0008 Jul 9 06:45:34 p630b syslog-ng[2750]: STATS: dropped 0 ---steps followed::----- 1. Install ltp-full-20070630.tgz in client(p630b) 2. create ext2,ext3 partitions, export file systems and setup nfs-server in server(p520b) and mount via nfs4 in client (p630b). In client run the following.. 3. ltp-full-20070630/testcases/kernel/fs/fsstress/fsstress -d /NFSVCLI/NFSV1 -l 0 -n 1000 -p 2 -r 4. ltp-full-20070630/testcases/kernel/fs/fsstress/fsstress -d /NFSVCLI/NFSV3 -l 0 -n 1000 -p 2 -r ---uname -a --------- server:: Linux linux-0gxvrp 2.6.22-rc4-git3-2-ppc64 #1 SMP 2007/06/12 15:00:04 UTC ppc64 ppc64 ppc64 GNU/Linux Client:: Linux p630b 2.6.22-rc4-git3-2-ppc64 #1 SMP 2007/06/12 15:00:04 UTC ppc64 ppc64 ppc64 GNU/Linux -----machine info---- server:: p520 (power5) client:: p630 (power4) ----Did the system produce an OOPS message on the console?------ Jul 9 05:59:35 p630b kernel: kernel BUG at fs/nfs/nfs4xdr.c:849! Jul 9 05:59:35 p630b kernel: Oops: Exception in kernel mode, sig: 5 [#1] Created an attachment (id=29104) --> (https://bugzilla.novell.com/attachment.cgi?id=29104) var-log-messages Created an attachment (id=29105) --> (https://bugzilla.novell.com/attachment.cgi?id=29105) dmesg-output hi suzuki, I executed fsstress with NFSv4 mounted partitions in opensuse10.3alpha6 with kernel : 2.6.22.1-2-ppc64 machine: p520b(p5) Here the results ------------------ kernel BUG at fs/nfs/nfs4xdr.c:480! Oops: Exception in kernel mode, sig: 5 [#1] SMP NR_CPUS=128 NUMA pSeries Modules linked in: nfs ipv6 nfsd exportfs nfs_acl lockd sunrpc apparmor loop dm_mod fuse e1000 sg sd_mod pata_pdc2027x ipr libata scsi_mod firmware_class NIP: d0000000006441e0 LR: d0000000006441d4 CTR: d00000000041e934 REGS: c00000006af87730 TRAP: 0700 Tainted: G N (2.6.22.1-2-ppc64) MSR: 8000000000029032 <EE,ME,IR,DR> CR: 24008044 XER: 00000000 TASK = c00000007198e820[3106] 'rpciod/0' THREAD: c00000006af84000 CPU: 0 GPR00: 0000000000000001 c00000006af879b0 d000000000683080 0000000000000000 GPR04: 0000000000000240 c000000065b34400 bc91000000000000 bc918cf6f2f258f6 GPR08: 0005000000000000 c00000006698ab0c c00000006698a8c8 c00000006af87b68 GPR12: d00000000064df98 c0000000005f5880 0000000000000000 0000000000000000 GPR16: 0000000000000000 0000000000000000 0000000000000000 4000000001c00000 GPR20: c0000000005bfea0 00000000021c0110 0000000000272000 c0000000005c0110 GPR24: 00000000021bfea0 c00000006444e138 c000000063395808 c00000006af87b68 GPR28: c000000065b34400 0000000000000239 d00000000067ea00 c000000065352250 NIP [d0000000006441e0] .encode_string+0x34/0x60 [nfs] LR [d0000000006441d4] .encode_string+0x28/0x60 [nfs] Call Trace: [c00000006af87a40] [d000000000644784] .encode_open+0x3b0/0x3ec [nfs] [c00000006af87ae0] [d0000000006448f4] .nfs4_xdr_enc_open+0x88/0xf0 [nfs] [c00000006af87bb0] [d000000000413634] .rpcauth_wrap_req+0xd8/0x108 [sunrpc] [c00000006af87c60] [d00000000040a688] .call_transmit+0x228/0x2d8 [sunrpc] [c00000006af87d00] [d000000000412330] .__rpc_execute+0xd4/0x35c [sunrpc] [c00000006af87da0] [c00000000008f864] .run_workqueue+0x118/0x208 [c00000006af87e40] [c0000000000908b4] .worker_thread+0xf0/0x114 [c00000006af87f00] [c0000000000953f4] .kthread+0x78/0xc4 [c00000006af87f90] [c000000000028a00] .kernel_thread+0x4c/0x68 Instruction dump: fb81ffe0 fba1ffe8 7c9d2378 38840004 f8010010 f821ff71 7cbc2b78 78840020 48009dc9 e8410028 7c600074 7800d182 <0b000000> 7f84e378 7fa5eb78 4800908d thanks omar hi suzuki, test is still running and throwed one more Call trace, kernel BUG at fs/dcache.c:652! Oops: Exception in kernel mode, sig: 5 [#2] SMP NR_CPUS=128 NUMA pSeries Modules linked in: nfs ipv6 nfsd exportfs nfs_acl lockd sunrpc apparmor loop dm_mod fuse e1000 sg sd_mod pata_pdc2027x ipr libata scsi_mod firmware_class NIP: c00000000010dc24 LR: c00000000010dc20 CTR: c0000000000955b4 REGS: c000000072f97540 TRAP: 0700 Tainted: G N (2.6.22.1-2-ppc64) MSR: 8000000000029032 <EE,ME,IR,DR> CR: 24000084 XER: 00000006 TASK = c000000072f8a040[8] 'events/1' THREAD: c000000072f94000 CPU: 1 GPR00: c00000000010dc20 c000000072f977c0 c00000000072ae40 0000000000000288 GPR04: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR08: c0000000021bb090 c000000000760478 c000000072993bf8 0000000000000000 GPR12: 0000000000004000 c0000000005f5a80 0000000000000000 0000000000000000 GPR16: 0000000000000000 0000000000000000 0000000000000000 4000000001c00000 GPR20: c0000000005bfea0 00000000021c0110 c000000072f97ce0 d00000000066e938 GPR24: c000000072f97c10 c000000000752980 0000000000000000 0000000000000000 GPR28: 0000000000000006 c000000065352220 c000000000679e88 c0000000659ef7e0 NIP [c00000000010dc24] .shrink_dcache_for_umount_subtree+0x240/0x3a4 LR [c00000000010dc20] .shrink_dcache_for_umount_subtree+0x23c/0x3a4 Call Trace: [c000000072f977c0] [c00000000010dc20] shrink_dcache_for_umount_subtree+0x23c/0x3a4 (unreliable) [c000000072f97850] [c00000000010e9e4] .shrink_dcache_for_umount+0x54/0x78 [c000000072f978d0] [c0000000000f7e20] .generic_shutdown_super+0x3c/0x1b8 [c000000072f97960] [c0000000000f800c] .kill_anon_super+0x24/0x88 [c000000072f979f0] [d00000000062574c] .nfs_kill_super+0x20/0x44 [nfs] [c000000072f97a80] [c0000000000f81a8] .deactivate_super+0xf0/0x144 [c000000072f97b10] [c000000000114824] .mntput_no_expire+0x9c/0x138 [c000000072f97ba0] [c000000000115050] .expire_mount_list+0x1b8/0x22c [c000000072f97c70] [c000000000116d18] .mark_mounts_for_expiry+0xf8/0x13c [c000000072f97d10] [d00000000062f350] .nfs_expire_automounts+0x24/0x60 [nfs] [c000000072f97da0] [c00000000008f864] .run_workqueue+0x118/0x208 [c000000072f97e40] [c0000000000908b4] .worker_thread+0xf0/0x114 [c000000072f97f00] [c0000000000953f4] .kthread+0x78/0xc4 [c000000072f97f90] [c000000000028a00] .kernel_thread+0x4c/0x68 Instruction dump: e8a90040 e93d0090 80fd0000 7fa4eb78 e87e80c0 e8dd0038 e9690030 7ce707b4 39290228 e90b0000 4bf6bd09 60000000 <0fe00000> 48000000 ebfd0028 7fbfe800 thanks --omar static void encode_string(struct xdr_stream *xdr, unsigned int len, const char *str) { __be32 *p; p = xdr_reserve_space(xdr, 4 + len); BUG_ON(p == NULL); <<<--- BUG_ON hit here .. __be32 * xdr_reserve_space(struct xdr_stream *xdr, size_t nbytes) { __be32 *p = xdr->p; __be32 *q; /* align nbytes on the next 32-bit boundary */ nbytes += 3; nbytes &= ~3; q = p + (nbytes >> 2); if (unlikely(q > xdr->end || q < p)) return NULL; <<<-- NULL is returned from here. This occurs when there is not enough space in the XDR buffer to reserve for the requested size. Omar, Is it possible to enable xmon and drop the system to xmon when you hit the BUG_ON ? Thanks Suzuki Created an attachment (id=30198) --> (https://bugzilla.novell.com/attachment.cgi?id=30198) Debug patch to get more information Omar, Please build the kernel with the debug patch attached here and run the tests. When you hit the problem, please upload the dmesg o/p. Thanks Suzuki Created an attachment (id=30437) --> (https://bugzilla.novell.com/attachment.cgi?id=30437) dmesg-output I added the patch and compiled the kernel. Performed the NFS4 stress tests, It generated a Call Trace.. ------------[ cut here ]------------ kernel BUG at /usr/src/linux-2.6.22.1-10/net/sunrpc/xdr.c:480! Oops: Exception in kernel mode, sig: 5 [#1] SMP NR_CPUS=128 NUMA pSeries Modules linked in: nfs ipv6 nfsd exportfs nfs_acl lockd sunrpc apparmor loop dm_mod fuse e100 mii sl82c105 sg sd_mod sym53c8xx scsi_transport_spi pata_sl82c105 libata scsi_mod NIP: d000000000113994 LR: d000000000113990 CTR: c0000000000955b4 REGS: c0000001eed9f190 TRAP: 0700 Tainted: G N (2.6.22.1-debug-10-debug) MSR: 9000000000029032 <EE,ME,IR,DR> CR: 28000484 XER: 00000000 TASK = c0000001fd103820[30423] 'fsstress' THREAD: c0000001eed9c000 CPU: 0 GPR00: d000000000113990 c0000001eed9f410 d000000000147600 0000000000000065 GPR04: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR08: c000000004048090 c00000000071c478 c0000001fc0fbbf8 0000000000000000 GPR12: 000000000000000a c0000000005b8880 0000000000000000 0000000000000000 GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR20: 0000000000000000 0000000000000000 c0000001eed9fa30 d0000000001439ac GPR24: c0000000fa56f730 c0000000f9176270 c0000001eed9f920 c0000000fa56f730 GPR28: 00000000000002ef 00000000000002f7 d000000000144d80 c0000001eed9f5b8 NIP [d000000000113994] .xdr_reserve_space+0x60/0xb8 [sunrpc] LR [d000000000113990] .xdr_reserve_space+0x5c/0xb8 [sunrpc] Call Trace: [c0000001eed9f410] [d000000000113990] .xdr_reserve_space+0x5c/0xb8 [sunrpc] (unreliable) [c0000001eed9f490] [d000000000228c38] .encode_lookup+0x3c/0xc0 [nfs] [c0000001eed9f530] [d00000000022ab38] .nfs4_xdr_enc_lookup+0x78/0xbc [nfs] [c0000001eed9f600] [d000000000108634] .rpcauth_wrap_req+0xd8/0x108 [sunrpc] [c0000001eed9f6b0] [d0000000000ff688] .call_transmit+0x228/0x2d8 [sunrpc] [c0000001eed9f750] [d000000000107330] .__rpc_execute+0xd4/0x35c [sunrpc] [c0000001eed9f7f0] [d0000000000fffc0] .rpc_call_sync+0xb4/0x104 [sunrpc] [c0000001eed9f8a0] [d0000000002233bc] .nfs4_proc_lookup+0x110/0x2cc [nfs] [c0000001eed9f9c0] [d00000000020692c] .nfs_lookup+0x158/0x33c [nfs] [c0000001eed9fbc0] [c000000000101ac0] .lookup_hash+0xfc/0x140 [c0000001eed9fc60] [c000000000104398] .sys_renameat+0x164/0x230 [c0000001eed9fe30] [c0000000000086c8] syscall_exit+0x0/0x40 Instruction dump: 7d2b2a14 7fa90040 419d000c 7fa95840 40bc0028 e92a0008 e87e8018 7d645b78 80e90034 80c90030 48005d05 e8410028 <0fe00000> 48000000 f9230000 60000000 ---------------- thanks Here is the relevant information I got from the debug patch : xdr_reserve_space : FAILs : P: c00000000fd39084, nbytes: 760, xdr.buflen: 560, current len : 124 ------------[ cut here ]------------ kernel BUG at /usr/src/linux-2.6.22.1-10/net/sunrpc/xdr.c:480! Oops: Exception in kernel mode, sig: 5 [#1] XDR is trying to reserve 760 bytes in the XDR Buffer which has a capacity fo 560, out of which 124 bytes are already occupied. Debugging further. Thanks Suzuki Created an attachment (id=30498) --> (https://bugzilla.novell.com/attachment.cgi?id=30498) Debug patch to find who is exceeding the buffer limits We are facing a problem with buffer running out. This could be either from the RPC headers which gets added before the control reaches the NFS code or may be from the NFS code itself. The attached debug patch will detect if the RPC is exceeding its limits. Omar, Could you please run the tests with the two debug patches ? ( including this one and the previous one ) Please let us know the test results. Thanks Suzuki The problem was hit again : Aug 21 17:18:37 p630b kernel: xdr_reserve_space : FAILs : P: c0000001ef152884, nbytes: 468, xdr.buflen: 560, current len : 124 Aug 21 17:18:37 p630b kernel: ------------[ cut here ]------------ Aug 21 17:18:37 p630b kernel: kernel BUG at /usr/src/linux-2.6.22.1-16/net/sunrpc/xdr.c:480! With the debug kernel. This indicates that, the NFS is passing a larger data to encode which can not be held in the buffer. Debugging further. Thanks Suzuki I have built a new kernel with probes to check if the NFS is not checking the size of the name being looked up/opened. Will update the test results. Thanks Suzuki Created an attachment (id=30570) --> (https://bugzilla.novell.com/attachment.cgi?id=30570) var-log-message Hi suzuki, (In reply to comment #15)
I have built a new kernel with probes to check if the NFS is not checking the
size of the name being looked up/opened.
New build kernel throws Aug 22 11:48:19 p630b kernel: kernel BUG at /usr/src/linux-2.6.22.1-16/net/sunrpc/xdr.c:480! Aug 22 11:48:19 p630b kernel: Oops: Exception in kernel mode, sig: 5 [#1] Aug 22 11:48:19 p630b kernel: SMP NR_CPUS=128 NUMA pSeries Aug 22 11:48:20 p630b kernel: Modules linked in: nfs ipv6 nfsd exportfs nfs_acl lockd sunrpc apparmor loop dm_mod fuse e100 mii sg sd_mod sym53c8xx scsi_transport_spi pata_sl82c105 libata scsi_mod Aug 22 11:48:20 p630b kernel: NIP: d0000000006e6a08 LR: d0000000006e6a04 CTR: 0000000000000001 Aug 22 11:48:20 p630b kernel: REGS: c0000001e6142fa0 TRAP: 0700 Tainted: G N (2.6.22.1-16-debug) Aug 22 11:48:20 p630b kernel: MSR: 9000000000029032 <EE,ME,IR,DR> CR: 28008424 XER: 00000000 --thanks -omar Here is the debug information that I got from the debug probes. As you can see, we are trying to pass large names for file/dir, which is not checked anywhere else. Aug 22 11:48:16 p630b kernel: nfs4 lookup: name too large(285) : lfXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX Aug 22 11:48:16 p630b kernel: Call Trace: Aug 22 11:48:16 p630b kernel: [c0000001e611f800] [c00000000000f33c] show_stack+0x68/0x1b0 (unreliable) Aug 22 11:48:16 p630b kernel: [c0000001e611f8a0] [d000000000905328] nfs4_proc_lookup+0x7c/0x2f0 [nfs] Aug 22 11:48:16 p630b kernel: [c0000001e611f9c0] [d0000000008e892c] nfs_lookup+0x158/0x33c [nfs] Aug 22 11:48:16 p630b kernel: [c0000001e611fbc0] [c000000000101ab0] lookup_hash+0xfc/0x140 Aug 22 11:48:16 p630b kernel: [c0000001e611fc60] [c000000000104388] sys_renameat+0x164/0x230 Aug 22 11:48:16 p630b kernel: [c0000001e611fe30] [c0000000000086c8] syscall_exit+0x0/0x40 Aug 22 11:48:18 p630b kernel: nfs4 open: Name too long to encode : 799 ( f1XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX ) Aug 22 11:48:18 p630b kernel: Call Trace: Aug 22 11:48:18 p630b kernel: [c0000001e6143330] [c00000000000f33c] show_stack+0x68/0x1b0 (unreliable) Aug 22 11:48:18 p630b kernel: [c0000001e61433d0] [d00000000090f900] nfs4_xdr_enc_open+0x70/0x138 [nfs] Aug 22 11:48:18 p630b kernel: [c0000001e61434b0] [d0000000006db6a8] rpcauth_wrap_req+0xd8/0x108 [sunrpc] Aug 22 11:48:18 p630b kernel: [c0000001e6143560] [d0000000006d26f8] call_transmit+0x298/0x34c [sunrpc] Aug 22 11:48:18 p630b kernel: [c0000001e6143610] [d0000000006da3a4] __rpc_execute+0xd4/0x35c [sunrpc] Aug 22 11:48:18 p630b kernel: [c0000001e61436b0] [d0000000006da6d0] rpc_run_task+0x54/0x70 [sunrpc] Aug 22 11:48:18 p630b kernel: [c0000001e6143740] [d000000000904a44] _nfs4_proc_open+0x70/0x238 [nfs] Aug 22 11:48:18 p630b kernel: [c0000001e61437f0] [d000000000904f30] nfs4_do_open+0xe8/0x208 [nfs] Aug 22 11:48:18 p630b kernel: [c0000001e61438c0] [d0000000009069a4] nfs4_atomic_open+0xd8/0x19c [nfs] Aug 22 11:48:18 p630b kernel: [c0000001e61439c0] [d0000000008ea50c] nfs_atomic_lookup+0x128/0x228 [nfs] Aug 22 11:48:19 p630b kernel: [c0000001e6143a60] [c000000000101ab0] lookup_hash+0xfc/0x140 Aug 22 11:48:19 p630b kernel: [c0000001e6143b00] [c0000000001050f8] open_namei+0x134/0x86c Aug 22 11:48:19 p630b kernel: [c0000001e6143bf0] [c0000000000f3848] do_filp_open+0x38/0x70 Aug 22 11:48:19 p630b kernel: [c0000001e6143d10] [c0000000000f38f4] do_sys_open+0x74/0x130 Aug 22 11:48:19 p630b kernel: [c0000001e6143dc0] [c000000000015270] compat_sys_creat+0x14/0x28 Aug 22 11:48:19 p630b kernel: [c0000001e6143e30] [c0000000000086c8] syscall_exit+0x0/0x40 Created an attachment (id=30651) --> (https://bugzilla.novell.com/attachment.cgi?id=30651) dmesg-output suzuki , check the logs for yesterdays test. thanks --omar Hi, I believe this is due to a bug in the NFS server information. As you can see : nfs4 lookup: name too large(436) : server: -1, The server is the namelen permitted by the server, which is -1 !!! So something is going wrong with the Server connection. Debugging further. Thanks Suzuki log-message for debugged kernel. Aug 24 16:52:36 p630b kernel: ------------[ cut here ]------------ Aug 24 16:52:36 p630b kernel: kernel BUG at /usr/src/linux-2.6.22.1-16/fs/nfs/client.c:729! Aug 24 16:52:36 p630b kernel: Oops: Exception in kernel mode, sig: 5 [#1] Aug 24 16:52:36 p630b kernel: SMP NR_CPUS=128 NUMA pSeries Aug 24 16:52:36 p630b kernel: Modules linked in: nfs ipv6 nfsd exportfs nfs_acl lockd sunrpc apparmor loop dm_mod fuse e100 mii sg sd_mod sym53c8xx scsi_transport_spi pata_sl82c105 libata scsi_mod Aug 24 16:52:36 p630b kernel: NIP: d0000000008e5c48 LR: d0000000008e5c44 CTR: c0000000000955a4 Aug 24 16:52:36 p630b kernel: REGS: c0000000fbe3b290 TRAP: 0700 Tainted: G N (2.6.22.1-16-debug) Aug 24 16:52:36 p630b kernel: MSR: 9000000000029032 <EE,ME,IR,DR> CR: 24008484 XER: 00000000 Aug 24 16:52:36 p630b kernel: TASK = c000000005aea040[3888] 'mount.nfs4' THREAD: c0000000fbe38000 CPU: 0 Aug 24 16:52:36 p630b kernel: GPR00: d0000000008e5c44 c0000000fbe3b510 d00000000094e920 0000000000000033 Aug 24 16:52:36 p630b kernel: GPR04: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 Aug 24 16:52:36 p630b kernel: GPR08: c000000004048090 c00000000071c478 c000000004e33bf8 0000000000000000 Aug 24 16:52:36 p630b kernel: GPR12: 000000000000000a c0000000005b6880 0000000000000000 0000000000000000 Aug 24 16:52:36 p630b kernel: GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000003 Aug 24 16:52:36 p630b kernel: GPR20: 0000000000000000 0000000000000000 c0000001fe6ef5a0 0000000000000001 Aug 24 16:52:36 p630b kernel: GPR24: c0000000fbe3b824 d0000000007169c4 c0000000fbe3b824 c0000000fbe3b670 Aug 24 16:52:36 p630b kernel: GPR28: c0000001fe629c00 c00000010392e880 d000000000948f00 0000000000000000 Aug 24 16:52:36 p630b kernel: NIP [d0000000008e5c48] nfs_probe_fsinfo+0x5c0/0x670 [nfs] Aug 24 16:52:36 p630b kernel: LR [d0000000008e5c44] nfs_probe_fsinfo+0x5bc/0x670 [nfs] Aug 24 16:52:36 p630b kernel: Call Trace: Aug 24 16:52:36 p630b kernel: [c0000000fbe3b510] [d0000000008e5c44] nfs_probe_fsinfo+0x5bc/0x670 [nfs] (unreliable) Aug 24 16:52:36 p630b kernel: [c0000000fbe3b600] [d0000000008e75bc] nfs4_create_server+0x30c/0x47c [nfs] Aug 24 16:52:36 p630b kernel: [c0000000fbe3b790] [d0000000008f131c] nfs4_get_sb+0x258/0x39c [nfs] Aug 24 16:52:36 p630b kernel: [c0000000fbe3b910] [c0000000000f82dc] vfs_kern_mount+0xf0/0x1dc Aug 24 16:52:36 p630b kernel: [c0000000fbe3b9c0] [c0000000000f8438] do_kern_mount+0x60/0x144 Aug 24 16:52:36 p630b kernel: --- Exception: 301 at copy_mount_options+0xf8/0x1ac Aug 24 16:52:36 p630b kernel: LR = .copy_mount_options+0xf0/0x1ac Aug 24 16:52:36 p630b kernel: [c0000000fbe3ba70] [c000000000116390] do_mount+0x740/0x7e8 (unreliable) Aug 24 16:52:36 p630b kernel: [c0000000fbe3bd60] [c0000000001384e0] compat_sys_mount+0x228/0x2a4 Aug 24 16:52:36 p630b kernel: [c0000000fbe3be30] [c0000000000086c8] syscall_exit+0x0/0x40 Aug 24 16:52:36 p630b kernel: Instruction dump: Aug 24 16:52:36 p630b kernel: 4e800421 e8410028 2f830000 419c002c 8001007c 2b8000ff 78040020 909d00b4 Aug 24 16:52:36 p630b kernel: 409d0018 e87e80d8 48031311 e8410028 <0fe00000> 48000000 e93e80a0 3be00000 thanks --omar The problem is coming from the server side metadata information passed to the client : NFS Server has invalid namelen set : 4294967295 ( 0xffffffff = (int) -1 ) which allows to pass the lookup/open requests containing filenames <= the value above, which leads to a ENOSPC buffer problem. Debugging the NFS server. Thanks Suzuki I have found the root cause of the problem in the nfs4 client code : The problem is with the nfs_create_server code while trying to get the server handle. This problem occurs only if the mounting remote-directory is "/". In this case we don't try to get the file handle properly. I am testing a patch to fix the issue. Thanks Suzuki The NFS4 (at least nfs4) client does the following : 1) get the root dir handle for the remote server ( which would work fine ) 2) try to get the file handle for each directory in the path using lookup requests. for eg: mount -t nfs4 server:/IMAGES/Server/Linux tmp then: there will be a root handle lookup and then followed by lookup requests for IMAGES, Server & Linux. Now while trying to get the file handles for each of these components, the "/" are *eaten up* to avoid the server overheads. Here if we specify "/" as the remote server dir, the path is completely eaten up and hence, the client thinks that it has successfully finished finding the file handle for the remote directory and returns. But we haven't established a proper server connection with the server nor we have a proper file handle ! Now the bug submitter was trying : mount -t nfs4 <server>:/ <DIR> where the "/" dir was not exported at all ! Now the "/" path makes the client think that it has got a proper connection with the server and forwards the requests to the server. Due to the lack of proper initialization, the max_namelen allowed for the files is set to -1 (which is stored as unsigned int) in the client side. So, this makes any lookup /open requests to just pass through to the nfs server. But while trying to encode the request we run out of space for large names. The fix here would be to get the file handle for "/" also, to decide whether that particular directory is exported or not ! Will attach the patch later Thanks Suzuki It seems the mounting semantics is entirely different in NFS4, when compared to the previous versions. Reading through the NFS4 protocol ( RFC ) to get more information on what should be the ideal behaviour. Thanks Suzuki Created an attachment (id=31146) --> (https://bugzilla.novell.com/attachment.cgi?id=31146) Patch to fix the problem Hi, Omar has verified this patch on p630b to fix the problem that we have been facing. Thanks Suzuki -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=325913#c1 --- Comment #1 from LTC BugProxy <bugproxy@us.ibm.com> 2007-09-18 02:31:21 MST --- Created an attachment (id=172956) --> (https://bugzilla.novell.com/attachment.cgi?id=172956) var-log-messages -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=325913 Olaf Hering <olh@novell.com> changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |olh@novell.com AssignedTo|power-bugs@forge.provo.novell.com |kernel-maintainers@forge.provo.novell.com -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=325913#c2 --- Comment #2 from LTC BugProxy <bugproxy@us.ibm.com> 2007-09-18 02:32:37 MST --- Created an attachment (id=172957) --> (https://bugzilla.novell.com/attachment.cgi?id=172957) dmesg-output -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=325913#c3 --- Comment #3 from LTC BugProxy <bugproxy@us.ibm.com> 2007-09-18 02:35:06 MST --- Created an attachment (id=172958) --> (https://bugzilla.novell.com/attachment.cgi?id=172958) Debug patch to get more information -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=325913#c4 --- Comment #4 from LTC BugProxy <bugproxy@us.ibm.com> 2007-09-18 02:37:29 MST --- Created an attachment (id=172959) --> (https://bugzilla.novell.com/attachment.cgi?id=172959) dmesg-output -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=325913#c5 --- Comment #5 from LTC BugProxy <bugproxy@us.ibm.com> 2007-09-18 02:39:27 MST --- Created an attachment (id=172961) --> (https://bugzilla.novell.com/attachment.cgi?id=172961) Debug patch to find who is exceeding the buffer limits -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=325913#c6 --- Comment #6 from LTC BugProxy <bugproxy@us.ibm.com> 2007-09-18 02:42:54 MST --- Created an attachment (id=172963) --> (https://bugzilla.novell.com/attachment.cgi?id=172963) var-log-message -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=325913#c7 --- Comment #7 from LTC BugProxy <bugproxy@us.ibm.com> 2007-09-18 02:44:13 MST --- Created an attachment (id=172964) --> (https://bugzilla.novell.com/attachment.cgi?id=172964) dmesg-output -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=325913#c8 --- Comment #8 from LTC BugProxy <bugproxy@us.ibm.com> 2007-09-18 02:45:36 MST --- Created an attachment (id=172965) --> (https://bugzilla.novell.com/attachment.cgi?id=172965) Patch to fix the problem -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=325913#c10 LTC BugProxy <bugproxy@us.ibm.com> changed: What |Removed |Added ---------------------------------------------------------------------------- URL| |http:// --- Comment #10 from LTC BugProxy <bugproxy@us.ibm.com> 2007-10-24 05:36:11 MST --- ------- Comment From suzukikp@in.ibm.com 2007-10-24 07:31 EDT------- The following link contains a patch by Trond, which fixes the same problem. http://lkml.org/lkml/2007/9/24/402 Omar, Could you please test this patch ? Thanks Suzuki -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=325913#c11 --- Comment #11 from LTC BugProxy <bugproxy@us.ibm.com> 2007-10-25 06:25:51 MST --- ------- Comment From mohd.omar@in.ibm.com 2007-10-25 08:24 EDT------- Hi suzuki,
The following link contains a patch by Trond, which fixes the same problem. http://lkml.org/lkml/2007/9/24/402 Could you please test this patch ?
Patch works fine , except some messages in /var/log/messages nfs: couldn't resolve uid 562 to string nfs: couldn't resolve uid 0 to string nfs: couldn't resolve uid 0 to string nfs: couldn't resolve uid 14279 to string nfs: couldn't resolve uid 27909767 to string --thanks omar -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=325913#c12 --- Comment #12 from LTC BugProxy <bugproxy@us.ibm.com> 2007-10-29 03:40:53 MST --- ------- Comment From suzukikp@in.ibm.com 2007-10-29 05:39 EDT------- You may have to start the idmapd service for getting rid of those messages. I would suggest we go by the patch which is making into mainline. So, I am changing this to FIX_ALREADY_AVAIL. Novell, Please let us know your thoughts on picking up the patch mentioned in the lkml link Thanks Suzuki -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=325913#c13 Neil Brown <nfbrown@novell.com> changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED Status Whiteboard| |kernel:sl10.3 kernel:sles10-sp2 --- Comment #13 from Neil Brown <nfbrown@novell.com> 2007-10-29 18:10:09 MST --- The patch looks appropriate. It will be in the next update for 10.3 and I have included a similar fix for SLES10-SP2. Thanks. -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=325913#c14 --- Comment #14 from LTC BugProxy <bugproxy@us.ibm.com> 2007-10-30 02:36:00 MST --- ------- Comment From suzukikp@in.ibm.com 2007-10-30 04:31 EDT------- (In reply to comment #37)
------- Comment From nfbrown@novell.com 2007-10-29 18:10:09 MST------- The patch looks appropriate. It will be in the next update for 10.3 and I have included a similar fix for SLES10-SP2.
Thanks.
Just a double check. Novell could you confirm which patch has been picked up for the next update ? The patch from lkml ? OR The one which was attached initially with this report ? Thanks Suzuki -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=325913#c17 --- Comment #17 from Neil Brown <nfbrown@novell.com> 2007-10-30 05:13:53 MST --- (In reply to comment #14 from LTC BugProxy)
Just a double check. Novell could you confirm which patch has been picked up for the next update ?
The patch from lkml ? OR The one which was attached initially with this report ?
That patch I applied to OpenSUSE 10.3 is git commit 54af3bb543c071769141387a42deaaab5074da55 which is the same as the patch referenced in comment #10 -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=325913 User kgw@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=325913#c18 Klaus Wagner <kgw@novell.com> changed: What |Removed |Added ---------------------------------------------------------------------------- Status Whiteboard|kernel:sl10.3 kernel:sles10-sp2 |kernel:sles10-sp2 released:kernel:sl10.3 --- Comment #18 from Klaus Wagner <kgw@novell.com> 2008-01-16 02:49:03 MST --- Just for the record (and replying to comment 14): Patch: patches.fixes/nfs-name-len-limit included, enabled and active in 10.3 update kernel 2.6.22.12-0.1 dated Nov 7, 2007 and released Nov 9, 2007 Setting Whiteboard status for 10.3 --> released. -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=325913 User bugproxy@us.ibm.com added comment https://bugzilla.novell.com/show_bug.cgi?id=325913#c19 --- Comment #19 from LTC BugProxy <bugproxy@us.ibm.com> 2008-03-12 01:57:11 MST --- ------- Comment From mohd.omar@in.ibm.com 2008-03-12 03:55 EDT------- As per Comment #33. I tested on 10.3 GA release with no luck. And 11.0-Alpha2 had nfs-kernel(#42566) issue, which got fixed in factory. Need to check in Alpha3, which is suppose to release in third week of March. --Regards Omar M -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=325913 User bugproxy@us.ibm.com added comment https://bugzilla.novell.com/show_bug.cgi?id=325913#c20 --- Comment #20 from LTC BugProxy <bugproxy@us.ibm.com> 2009-02-16 04:03:47 MST --- ------- Comment From suzuki@in.ibm.com 2009-02-16 05:53 EDT------- Pavan, Could you please verify and close this one ? Thanks Suzuki -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=325913 User bugproxy@us.ibm.com added comment https://bugzilla.novell.com/show_bug.cgi?id=325913#c21 --- Comment #21 from LTC BugProxy <bugproxy@us.ibm.com> 2009-02-22 22:31:36 MST --- ------- Comment From suzuki@in.ibm.com 2009-02-23 00:22 EDT------- Pavan Thanks a lot for your efforts. Closing this one. Thanks Suzuki -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
participants (1)
-
bugzilla_noreply@novell.com