New subject: [Bug 325913] NFS4 stress test raises "Call Trace" in opensuse10. 3alpha5 on power5

18 Sep 2007

      https://bugzilla.novell.com/show_bug.cgi?id=325913#c1

           Summary: NFS4 stress test raises "Call Trace" in
                    opensuse10.3alpha5 on power5
           Product: openSUSE 10.3
           Version: Beta 2
          Platform: PowerPC-64
        OS/Version: openSUSE 10.3
            Status: NEW
          Severity: Normal
          Priority: P5 - None
         Component: Kernel
        AssignedTo: power-bugs@forge.provo.novell.com
        ReportedBy: bugproxy@us.ibm.com
         QAContact: qa@suse.de
                CC: power-bugs@forge.provo.novell.com
          Found By: ---
        Partner ID: LTC 36199

LTC Owner is: suzukikp@in.ibm.com
LTC Originator is: mohd.omar@in.ibm.com

---problem description----------
nfs4 stress(fsstress) test raises "Call Trace"and "Oops" in opensuse10.3alpha5
on power5
...........

Jul  9 05:59:35 p630b kernel: RESERVE_SPACE(761) failed in function
encode_lookup
Jul  9 05:59:35 p630b kernel: ------------[ cut here ]------------
Jul  9 05:59:35 p630b kernel: kernel BUG at fs/nfs/nfs4xdr.c:849!
Jul  9 05:59:35 p630b kernel: Oops: Exception in kernel mode, sig: 5 [#1]
Jul  9 05:59:35 p630b kernel: SMP NR_CPUS=128 NUMA pSeries
Jul  9 05:59:35 p630b kernel: Modules linked in: nfs lockd nfs_acl sunrpc ipv6
apparmor loop dm_mod fuse e100 mii sg sd_mod sym53c8xx scsi_transport_spi
pata_sl82c105 libata scsi_mod
Jul  9 05:59:35 p630b kernel: NIP: d00000000086fc68 LR: d00000000086fc58 CTR:
c00000000008e408
Jul  9 05:59:35 p630b kernel: REGS: c0000000d7fe3210 TRAP: 0700   Tainted: G   
  N  (2.6.22-rc4-git3-2-ppc64)
Jul  9 05:59:35 p630b kernel: MSR: 9000000000029032 <EE,ME,IR,DR>  CR: 28000484
 XER: 00000000
Jul  9 05:59:35 p630b kernel: TASK = c00000002fc02040[4120] 'fsstress' THREAD:
c0000000d7fe0000 CPU: 3
Jul  9 05:59:35 p630b kernel: GPR00: 0000000000000001 c0000000d7fe3490
d0000000008b3000 0000000000000037
Jul  9 05:59:35 p630b kernel: GPR04: 0000000000000000 0000000000000000
0000000000000000 0000000000000000
Jul  9 05:59:35 p630b kernel: GPR08: c0000001fe7a8040 0000000000000000
c000000005c87bf8 c0000000006493b0
Jul  9 05:59:35 p630b kernel: GPR12: 000000000000000a c0000000005e8e80
0000000000000000 0000000000000000
Jul  9 05:59:35 p630b kernel: GPR16: 0000000000000000 0000000000000000
0000000000000000 0000000000000000
Jul  9 05:59:35 p630b kernel: GPR20: 0000000000000000 0000000000000000
c0000000d7fe3a30 d0000000007c6424
Jul  9 05:59:35 p630b kernel: GPR24: c0000001d31409a0 c0000000f064e618
c0000000d7fe3920 c0000001d31409a0
Jul  9 05:59:35 p630b kernel: GPR28: 00000000000002f1 0000000028000444
d0000000008ae980 0000000000000000
Jul  9 05:59:35 p630b kernel: NIP [d00000000086fc68] .encode_lookup+0x6c/0xc0
[nfs]
Jul  9 05:59:35 p630b kernel: LR [d00000000086fc58] .encode_lookup+0x5c/0xc0
[nfs]
Jul  9 05:59:35 p630b kernel: Call Trace:
Jul  9 05:59:35 p630b kernel: [c0000000d7fe3490] [d00000000086fc58]
encode_lookup+0x5c/0xc0 [nfs] (unreliable)
Jul  9 05:59:35 p630b kernel: [c0000000d7fe3530] [d000000000871b38]
nfs4_xdr_enc_lookup+0x78/0xbc [nfs]
Jul  9 05:59:35 p630b kernel: [c0000000d7fe3600] [d00000000078b634]
rpcauth_wrap_req+0xd8/0x108 [sunrpc]
Jul  9 05:59:35 p630b kernel: [c0000000d7fe36b0] [d000000000782688]
call_transmit+0x228/0x2d8 [sunrpc]
Jul  9 05:59:35 p630b kernel: [c0000000d7fe3750] [d00000000078a330]
__rpc_execute+0xd4/0x35c [sunrpc]
Jul  9 05:59:35 p630b kernel: [c0000000d7fe37f0] [d000000000782fc0]
rpc_call_sync+0xb4/0x104 [sunrpc]
Jul  9 05:59:35 p630b kernel: [c0000000d7fe38a0] [d00000000086a3bc]
nfs4_proc_lookup+0x110/0x2cc [nfs]
Jul  9 05:59:35 p630b kernel: [c0000000d7fe39c0] [d00000000084d92c]
nfs_lookup+0x158/0x33c [nfs]
Jul  9 05:59:35 p630b kernel: [c0000000d7fe3bc0] [c0000000000faf14]
lookup_hash+0xfc/0x140
Jul  9 05:59:35 p630b kernel: [c0000000d7fe3c60] [c0000000000fd7ec]
sys_renameat+0x164/0x230
Jul  9 05:59:35 p630b kernel: [c0000000d7fe3e30] [c0000000000086c8]
syscall_exit+0x0/0x40
Jul  9 05:59:35 p630b kernel: Instruction dump:
Jul  9 05:59:35 p630b kernel: e8410028 7fa4eb78 7c7f1b79 7fa00026 40820014
e8be83a8 e87e8350 4800c245
Jul  9 05:59:35 p630b kernel: e8410028 7fa80120 7c000026 54001ffe <0b000000>
3800000f 7b850020 387f0008
Jul  9 06:45:34 p630b syslog-ng[2750]: STATS: dropped 0

---steps followed::-----

     1. Install ltp-full-20070630.tgz in client(p630b)
     2. create ext2,ext3 partitions, export file systems and setup nfs-server
in
server(p520b) and mount via nfs4 in client (p630b). In client run the
following..
     3. ltp-full-20070630/testcases/kernel/fs/fsstress/fsstress -d
/NFSVCLI/NFSV1 -l 0 -n 1000 -p 2 -r
     4. ltp-full-20070630/testcases/kernel/fs/fsstress/fsstress -d
/NFSVCLI/NFSV3 -l 0 -n 1000 -p 2 -r 

---uname -a ---------
server::
Linux linux-0gxvrp 2.6.22-rc4-git3-2-ppc64 #1 SMP 2007/06/12 15:00:04 UTC ppc64
ppc64 ppc64 GNU/Linux

Client:: 
Linux p630b 2.6.22-rc4-git3-2-ppc64 #1 SMP 2007/06/12 15:00:04 UTC ppc64 ppc64
ppc64 GNU/Linux

-----machine info----

server:: p520  (power5)
client:: p630  (power4)

----Did the system produce an OOPS message on the console?------

Jul  9 05:59:35 p630b kernel: kernel BUG at fs/nfs/nfs4xdr.c:849!
Jul  9 05:59:35 p630b kernel: Oops: Exception in kernel mode, sig: 5 [#1]

Created an attachment (id=29104)
 --> (https://bugzilla.novell.com/attachment.cgi?id=29104)
var-log-messages

Created an attachment (id=29105)
 --> (https://bugzilla.novell.com/attachment.cgi?id=29105)
dmesg-output

hi suzuki, 

 I executed fsstress with NFSv4 mounted partitions  in opensuse10.3alpha6  with
kernel : 2.6.22.1-2-ppc64 
machine: p520b(p5)

Here the results
------------------

kernel BUG at fs/nfs/nfs4xdr.c:480!
Oops: Exception in kernel mode, sig: 5 [#1]
SMP NR_CPUS=128 NUMA pSeries
Modules linked in: nfs ipv6 nfsd exportfs nfs_acl lockd sunrpc apparmor loop
dm_mod fuse e1000 sg sd_mod pata_pdc2027x ipr libata scsi_mod firmware_class
NIP: d0000000006441e0 LR: d0000000006441d4 CTR: d00000000041e934
REGS: c00000006af87730 TRAP: 0700   Tainted: G      N  (2.6.22.1-2-ppc64)
MSR: 8000000000029032 <EE,ME,IR,DR>  CR: 24008044  XER: 00000000
TASK = c00000007198e820[3106] 'rpciod/0' THREAD: c00000006af84000 CPU: 0
GPR00: 0000000000000001 c00000006af879b0 d000000000683080 0000000000000000 
GPR04: 0000000000000240 c000000065b34400 bc91000000000000 bc918cf6f2f258f6 
GPR08: 0005000000000000 c00000006698ab0c c00000006698a8c8 c00000006af87b68 
GPR12: d00000000064df98 c0000000005f5880 0000000000000000 0000000000000000 
GPR16: 0000000000000000 0000000000000000 0000000000000000 4000000001c00000 
GPR20: c0000000005bfea0 00000000021c0110 0000000000272000 c0000000005c0110 
GPR24: 00000000021bfea0 c00000006444e138 c000000063395808 c00000006af87b68 
GPR28: c000000065b34400 0000000000000239 d00000000067ea00 c000000065352250 
NIP [d0000000006441e0] .encode_string+0x34/0x60 [nfs]
LR [d0000000006441d4] .encode_string+0x28/0x60 [nfs]
Call Trace:
[c00000006af87a40] [d000000000644784] .encode_open+0x3b0/0x3ec [nfs]
[c00000006af87ae0] [d0000000006448f4] .nfs4_xdr_enc_open+0x88/0xf0 [nfs]
[c00000006af87bb0] [d000000000413634] .rpcauth_wrap_req+0xd8/0x108 [sunrpc]
[c00000006af87c60] [d00000000040a688] .call_transmit+0x228/0x2d8 [sunrpc]
[c00000006af87d00] [d000000000412330] .__rpc_execute+0xd4/0x35c [sunrpc]
[c00000006af87da0] [c00000000008f864] .run_workqueue+0x118/0x208
[c00000006af87e40] [c0000000000908b4] .worker_thread+0xf0/0x114
[c00000006af87f00] [c0000000000953f4] .kthread+0x78/0xc4
[c00000006af87f90] [c000000000028a00] .kernel_thread+0x4c/0x68
Instruction dump:
fb81ffe0 fba1ffe8 7c9d2378 38840004 f8010010 f821ff71 7cbc2b78 78840020 
48009dc9 e8410028 7c600074 7800d182 <0b000000> 7f84e378 7fa5eb78 4800908d 

thanks
omar

hi suzuki,

test is still running and throwed one more Call trace,

kernel BUG at fs/dcache.c:652!
Oops: Exception in kernel mode, sig: 5 [#2]
SMP NR_CPUS=128 NUMA pSeries
Modules linked in: nfs ipv6 nfsd exportfs nfs_acl lockd sunrpc apparmor loop
dm_mod fuse e1000 sg sd_mod pata_pdc2027x ipr libata scsi_mod firmware_class
NIP: c00000000010dc24 LR: c00000000010dc20 CTR: c0000000000955b4
REGS: c000000072f97540 TRAP: 0700   Tainted: G      N  (2.6.22.1-2-ppc64)
MSR: 8000000000029032 <EE,ME,IR,DR>  CR: 24000084  XER: 00000006
TASK = c000000072f8a040[8] 'events/1' THREAD: c000000072f94000 CPU: 1
GPR00: c00000000010dc20 c000000072f977c0 c00000000072ae40 0000000000000288 
GPR04: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 
GPR08: c0000000021bb090 c000000000760478 c000000072993bf8 0000000000000000 
GPR12: 0000000000004000 c0000000005f5a80 0000000000000000 0000000000000000 
GPR16: 0000000000000000 0000000000000000 0000000000000000 4000000001c00000 
GPR20: c0000000005bfea0 00000000021c0110 c000000072f97ce0 d00000000066e938 
GPR24: c000000072f97c10 c000000000752980 0000000000000000 0000000000000000 
GPR28: 0000000000000006 c000000065352220 c000000000679e88 c0000000659ef7e0 
NIP [c00000000010dc24] .shrink_dcache_for_umount_subtree+0x240/0x3a4
LR [c00000000010dc20] .shrink_dcache_for_umount_subtree+0x23c/0x3a4
Call Trace:
[c000000072f977c0] [c00000000010dc20]
shrink_dcache_for_umount_subtree+0x23c/0x3a4 (unreliable)
[c000000072f97850] [c00000000010e9e4] .shrink_dcache_for_umount+0x54/0x78
[c000000072f978d0] [c0000000000f7e20] .generic_shutdown_super+0x3c/0x1b8
[c000000072f97960] [c0000000000f800c] .kill_anon_super+0x24/0x88
[c000000072f979f0] [d00000000062574c] .nfs_kill_super+0x20/0x44 [nfs]
[c000000072f97a80] [c0000000000f81a8] .deactivate_super+0xf0/0x144
[c000000072f97b10] [c000000000114824] .mntput_no_expire+0x9c/0x138
[c000000072f97ba0] [c000000000115050] .expire_mount_list+0x1b8/0x22c
[c000000072f97c70] [c000000000116d18] .mark_mounts_for_expiry+0xf8/0x13c
[c000000072f97d10] [d00000000062f350] .nfs_expire_automounts+0x24/0x60 [nfs]
[c000000072f97da0] [c00000000008f864] .run_workqueue+0x118/0x208
[c000000072f97e40] [c0000000000908b4] .worker_thread+0xf0/0x114
[c000000072f97f00] [c0000000000953f4] .kthread+0x78/0xc4
[c000000072f97f90] [c000000000028a00] .kernel_thread+0x4c/0x68
Instruction dump:
e8a90040 e93d0090 80fd0000 7fa4eb78 e87e80c0 e8dd0038 e9690030 7ce707b4 
39290228 e90b0000 4bf6bd09 60000000 <0fe00000> 48000000 ebfd0028 7fbfe800 

thanks
--omar

static void encode_string(struct xdr_stream *xdr, unsigned int len, const char
*str)
{
        __be32 *p;

        p = xdr_reserve_space(xdr, 4 + len);
        BUG_ON(p == NULL); <<<--- BUG_ON hit here ..

__be32 * xdr_reserve_space(struct xdr_stream *xdr, size_t nbytes)
{
        __be32 *p = xdr->p;
        __be32 *q;

        /* align nbytes on the next 32-bit boundary */
        nbytes += 3;
        nbytes &= ~3;
        q = p + (nbytes >> 2);
        if (unlikely(q > xdr->end || q < p))
                return NULL; <<<-- NULL is returned from here.

This occurs when there is not enough space in the XDR buffer to reserve for the
requested size.

Omar,

Is it possible to enable xmon and drop the system to xmon when you hit the
BUG_ON ?

Thanks

Suzuki

Created an attachment (id=30198)
 --> (https://bugzilla.novell.com/attachment.cgi?id=30198)
Debug patch to get more information

Omar,

Please build the kernel with the debug patch attached here and run the tests.
When you hit the problem, please upload the dmesg o/p.

Thanks

Suzuki

Created an attachment (id=30437)
 --> (https://bugzilla.novell.com/attachment.cgi?id=30437)
dmesg-output

I added the patch and compiled the kernel. Performed the NFS4 stress tests, It
generated a Call Trace..

------------[ cut here ]------------
kernel BUG at /usr/src/linux-2.6.22.1-10/net/sunrpc/xdr.c:480!
Oops: Exception in kernel mode, sig: 5 [#1]
SMP NR_CPUS=128 NUMA pSeries
Modules linked in: nfs ipv6 nfsd exportfs nfs_acl lockd sunrpc apparmor loop
dm_mod fuse e100 mii sl82c105 sg sd_mod sym53c8xx scsi_transport_spi
pata_sl82c105 libata scsi_mod
NIP: d000000000113994 LR: d000000000113990 CTR: c0000000000955b4
REGS: c0000001eed9f190 TRAP: 0700   Tainted: G      N 
(2.6.22.1-debug-10-debug)
MSR: 9000000000029032 <EE,ME,IR,DR>  CR: 28000484  XER: 00000000
TASK = c0000001fd103820[30423] 'fsstress' THREAD: c0000001eed9c000 CPU: 0
GPR00: d000000000113990 c0000001eed9f410 d000000000147600 0000000000000065 
GPR04: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 
GPR08: c000000004048090 c00000000071c478 c0000001fc0fbbf8 0000000000000000 
GPR12: 000000000000000a c0000000005b8880 0000000000000000 0000000000000000 
GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 
GPR20: 0000000000000000 0000000000000000 c0000001eed9fa30 d0000000001439ac 
GPR24: c0000000fa56f730 c0000000f9176270 c0000001eed9f920 c0000000fa56f730 
GPR28: 00000000000002ef 00000000000002f7 d000000000144d80 c0000001eed9f5b8 
NIP [d000000000113994] .xdr_reserve_space+0x60/0xb8 [sunrpc]
LR [d000000000113990] .xdr_reserve_space+0x5c/0xb8 [sunrpc]
Call Trace:
[c0000001eed9f410] [d000000000113990] .xdr_reserve_space+0x5c/0xb8 [sunrpc]
(unreliable)
[c0000001eed9f490] [d000000000228c38] .encode_lookup+0x3c/0xc0 [nfs]
[c0000001eed9f530] [d00000000022ab38] .nfs4_xdr_enc_lookup+0x78/0xbc [nfs]
[c0000001eed9f600] [d000000000108634] .rpcauth_wrap_req+0xd8/0x108 [sunrpc]
[c0000001eed9f6b0] [d0000000000ff688] .call_transmit+0x228/0x2d8 [sunrpc]
[c0000001eed9f750] [d000000000107330] .__rpc_execute+0xd4/0x35c [sunrpc]
[c0000001eed9f7f0] [d0000000000fffc0] .rpc_call_sync+0xb4/0x104 [sunrpc]
[c0000001eed9f8a0] [d0000000002233bc] .nfs4_proc_lookup+0x110/0x2cc [nfs]
[c0000001eed9f9c0] [d00000000020692c] .nfs_lookup+0x158/0x33c [nfs]
[c0000001eed9fbc0] [c000000000101ac0] .lookup_hash+0xfc/0x140
[c0000001eed9fc60] [c000000000104398] .sys_renameat+0x164/0x230
[c0000001eed9fe30] [c0000000000086c8] syscall_exit+0x0/0x40
Instruction dump:
7d2b2a14 7fa90040 419d000c 7fa95840 40bc0028 e92a0008 e87e8018 7d645b78 
80e90034 80c90030 48005d05 e8410028 <0fe00000> 48000000 f9230000 60000000 
----------------
thanks

Here is the relevant information I got from the debug patch :

 xdr_reserve_space : FAILs : P: c00000000fd39084, nbytes: 760, xdr.buflen: 560,
current len : 124
------------[ cut here ]------------
kernel BUG at /usr/src/linux-2.6.22.1-10/net/sunrpc/xdr.c:480!
Oops: Exception in kernel mode, sig: 5 [#1]

XDR is trying to reserve 760 bytes in the XDR Buffer which has a capacity fo
560, out of which 124 bytes are already occupied.

Debugging further.

Thanks

Suzuki

Created an attachment (id=30498)
 --> (https://bugzilla.novell.com/attachment.cgi?id=30498)
Debug patch to find who is exceeding the buffer limits

We are facing a problem with buffer running out. This could be either from the
RPC headers which gets added before the control reaches the NFS code or may be
from the NFS code itself. 

The attached debug patch will detect if the RPC is exceeding its limits.

Omar,

Could you please run the tests with the two debug patches ? ( including this
one and the previous one )

Please let us know the test results.

Thanks

Suzuki

The problem was hit again :

Aug 21 17:18:37 p630b kernel:  xdr_reserve_space : FAILs : P: c0000001ef152884,
nbytes: 468, xdr.buflen: 560, current len : 124
Aug 21 17:18:37 p630b kernel: ------------[ cut here ]------------
Aug 21 17:18:37 p630b kernel: kernel BUG at
/usr/src/linux-2.6.22.1-16/net/sunrpc/xdr.c:480!

With the debug kernel.

This indicates that, the NFS is passing a larger data to encode which can not
be
held in the buffer.

Debugging further.

Thanks

Suzuki

I have built a new kernel with probes to check if the NFS is not checking the
size of the name being looked up/opened.

Will update the test results.

Thanks

Suzuki

Created an attachment (id=30570)
 --> (https://bugzilla.novell.com/attachment.cgi?id=30570)
var-log-message

Hi suzuki,
(In reply to comment #15)
...
I have built a new kernel with probes to check if the NFS is not checking the
...
size of the name being looked up/opened.
New build kernel throws 

Aug 22 11:48:19 p630b kernel: kernel BUG at
/usr/src/linux-2.6.22.1-16/net/sunrpc/xdr.c:480!
Aug 22 11:48:19 p630b kernel: Oops: Exception in kernel mode, sig: 5 [#1]
Aug 22 11:48:19 p630b kernel: SMP NR_CPUS=128 NUMA pSeries
Aug 22 11:48:20 p630b kernel: Modules linked in: nfs ipv6 nfsd exportfs nfs_acl
lockd sunrpc apparmor loop dm_mod fuse e100 mii sg sd_mod sym53c8xx
scsi_transport_spi pata_sl82c105 libata scsi_mod
Aug 22 11:48:20 p630b kernel: NIP: d0000000006e6a08 LR: d0000000006e6a04 CTR:
0000000000000001
Aug 22 11:48:20 p630b kernel: REGS: c0000001e6142fa0 TRAP: 0700   Tainted: G   
  N  (2.6.22.1-16-debug)
Aug 22 11:48:20 p630b kernel: MSR: 9000000000029032 <EE,ME,IR,DR>  CR: 28008424
 XER: 00000000

--thanks
-omar

Here is the debug information that I got  from the debug probes. As you can
see,
we are trying to pass large names for file/dir, which is not checked anywhere
else.

Aug 22 11:48:16 p630b kernel: nfs4 lookup: name too large(285) :
lfXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
Aug 22 11:48:16 p630b kernel: Call Trace:
Aug 22 11:48:16 p630b kernel: [c0000001e611f800] [c00000000000f33c]
show_stack+0x68/0x1b0 (unreliable)
Aug 22 11:48:16 p630b kernel: [c0000001e611f8a0] [d000000000905328]
nfs4_proc_lookup+0x7c/0x2f0 [nfs]
Aug 22 11:48:16 p630b kernel: [c0000001e611f9c0] [d0000000008e892c]
nfs_lookup+0x158/0x33c [nfs]
Aug 22 11:48:16 p630b kernel: [c0000001e611fbc0] [c000000000101ab0]
lookup_hash+0xfc/0x140
Aug 22 11:48:16 p630b kernel: [c0000001e611fc60] [c000000000104388]
sys_renameat+0x164/0x230
Aug 22 11:48:16 p630b kernel: [c0000001e611fe30] [c0000000000086c8]
syscall_exit+0x0/0x40
Aug 22 11:48:18 p630b kernel: nfs4 open: Name too long to encode : 799 (
f1XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX )
Aug 22 11:48:18 p630b kernel: Call Trace:
Aug 22 11:48:18 p630b kernel: [c0000001e6143330] [c00000000000f33c]
show_stack+0x68/0x1b0 (unreliable)
Aug 22 11:48:18 p630b kernel: [c0000001e61433d0] [d00000000090f900]
nfs4_xdr_enc_open+0x70/0x138 [nfs]
Aug 22 11:48:18 p630b kernel: [c0000001e61434b0] [d0000000006db6a8]
rpcauth_wrap_req+0xd8/0x108 [sunrpc]
Aug 22 11:48:18 p630b kernel: [c0000001e6143560] [d0000000006d26f8]
call_transmit+0x298/0x34c [sunrpc]
Aug 22 11:48:18 p630b kernel: [c0000001e6143610] [d0000000006da3a4]
__rpc_execute+0xd4/0x35c [sunrpc]
Aug 22 11:48:18 p630b kernel: [c0000001e61436b0] [d0000000006da6d0]
rpc_run_task+0x54/0x70 [sunrpc]
Aug 22 11:48:18 p630b kernel: [c0000001e6143740] [d000000000904a44]
_nfs4_proc_open+0x70/0x238 [nfs]
Aug 22 11:48:18 p630b kernel: [c0000001e61437f0] [d000000000904f30]
nfs4_do_open+0xe8/0x208 [nfs]
Aug 22 11:48:18 p630b kernel: [c0000001e61438c0] [d0000000009069a4]
nfs4_atomic_open+0xd8/0x19c [nfs]
Aug 22 11:48:18 p630b kernel: [c0000001e61439c0] [d0000000008ea50c]
nfs_atomic_lookup+0x128/0x228 [nfs]
Aug 22 11:48:19 p630b kernel: [c0000001e6143a60] [c000000000101ab0]
lookup_hash+0xfc/0x140
Aug 22 11:48:19 p630b kernel: [c0000001e6143b00] [c0000000001050f8]
open_namei+0x134/0x86c
Aug 22 11:48:19 p630b kernel: [c0000001e6143bf0] [c0000000000f3848]
do_filp_open+0x38/0x70
Aug 22 11:48:19 p630b kernel: [c0000001e6143d10] [c0000000000f38f4]
do_sys_open+0x74/0x130
Aug 22 11:48:19 p630b kernel: [c0000001e6143dc0] [c000000000015270]
compat_sys_creat+0x14/0x28
Aug 22 11:48:19 p630b kernel: [c0000001e6143e30] [c0000000000086c8]
syscall_exit+0x0/0x40

Created an attachment (id=30651)
 --> (https://bugzilla.novell.com/attachment.cgi?id=30651)
dmesg-output

suzuki , 
 check the logs for yesterdays test.

thanks
--omar

Hi,

I believe this is due to a bug in the NFS server information.

As you can see :

nfs4 lookup: name too large(436) : server: -1,

The server is the namelen permitted by the server, which is -1 !!! So something
is going wrong with the Server connection. 

Debugging further.

Thanks

Suzuki

log-message for debugged kernel.

Aug 24 16:52:36 p630b kernel: ------------[ cut here ]------------
Aug 24 16:52:36 p630b kernel: kernel BUG at
/usr/src/linux-2.6.22.1-16/fs/nfs/client.c:729!
Aug 24 16:52:36 p630b kernel: Oops: Exception in kernel mode, sig: 5 [#1]
Aug 24 16:52:36 p630b kernel: SMP NR_CPUS=128 NUMA pSeries
Aug 24 16:52:36 p630b kernel: Modules linked in: nfs ipv6 nfsd exportfs nfs_acl
lockd sunrpc apparmor loop dm_mod fuse e100 mii sg sd_mod sym53c8xx
scsi_transport_spi pata_sl82c105 libata scsi_mod
Aug 24 16:52:36 p630b kernel: NIP: d0000000008e5c48 LR: d0000000008e5c44 CTR:
c0000000000955a4
Aug 24 16:52:36 p630b kernel: REGS: c0000000fbe3b290 TRAP: 0700   Tainted: G   
  N  (2.6.22.1-16-debug)
Aug 24 16:52:36 p630b kernel: MSR: 9000000000029032 <EE,ME,IR,DR>  CR: 24008484
 XER: 00000000
Aug 24 16:52:36 p630b kernel: TASK = c000000005aea040[3888] 'mount.nfs4'
THREAD:
c0000000fbe38000 CPU: 0
Aug 24 16:52:36 p630b kernel: GPR00: d0000000008e5c44 c0000000fbe3b510
d00000000094e920 0000000000000033 
Aug 24 16:52:36 p630b kernel: GPR04: 0000000000000000 0000000000000000
0000000000000000 0000000000000000 
Aug 24 16:52:36 p630b kernel: GPR08: c000000004048090 c00000000071c478
c000000004e33bf8 0000000000000000 
Aug 24 16:52:36 p630b kernel: GPR12: 000000000000000a c0000000005b6880
0000000000000000 0000000000000000 
Aug 24 16:52:36 p630b kernel: GPR16: 0000000000000000 0000000000000000
0000000000000000 0000000000000003 
Aug 24 16:52:36 p630b kernel: GPR20: 0000000000000000 0000000000000000
c0000001fe6ef5a0 0000000000000001 
Aug 24 16:52:36 p630b kernel: GPR24: c0000000fbe3b824 d0000000007169c4
c0000000fbe3b824 c0000000fbe3b670 
Aug 24 16:52:36 p630b kernel: GPR28: c0000001fe629c00 c00000010392e880
d000000000948f00 0000000000000000 
Aug 24 16:52:36 p630b kernel: NIP [d0000000008e5c48]
nfs_probe_fsinfo+0x5c0/0x670 [nfs]
Aug 24 16:52:36 p630b kernel: LR [d0000000008e5c44]
nfs_probe_fsinfo+0x5bc/0x670 [nfs]
Aug 24 16:52:36 p630b kernel: Call Trace:
Aug 24 16:52:36 p630b kernel: [c0000000fbe3b510] [d0000000008e5c44]
nfs_probe_fsinfo+0x5bc/0x670 [nfs] (unreliable)
Aug 24 16:52:36 p630b kernel: [c0000000fbe3b600] [d0000000008e75bc]
nfs4_create_server+0x30c/0x47c [nfs]
Aug 24 16:52:36 p630b kernel: [c0000000fbe3b790] [d0000000008f131c]
nfs4_get_sb+0x258/0x39c [nfs]
Aug 24 16:52:36 p630b kernel: [c0000000fbe3b910] [c0000000000f82dc]
vfs_kern_mount+0xf0/0x1dc
Aug 24 16:52:36 p630b kernel: [c0000000fbe3b9c0] [c0000000000f8438]
do_kern_mount+0x60/0x144
Aug 24 16:52:36 p630b kernel: --- Exception: 301 at
copy_mount_options+0xf8/0x1ac
Aug 24 16:52:36 p630b kernel:     LR = .copy_mount_options+0xf0/0x1ac
Aug 24 16:52:36 p630b kernel: [c0000000fbe3ba70] [c000000000116390]
do_mount+0x740/0x7e8 (unreliable)
Aug 24 16:52:36 p630b kernel: [c0000000fbe3bd60] [c0000000001384e0]
compat_sys_mount+0x228/0x2a4
Aug 24 16:52:36 p630b kernel: [c0000000fbe3be30] [c0000000000086c8]
syscall_exit+0x0/0x40
Aug 24 16:52:36 p630b kernel: Instruction dump:
Aug 24 16:52:36 p630b kernel: 4e800421 e8410028 2f830000 419c002c 8001007c
2b8000ff 78040020 909d00b4 
Aug 24 16:52:36 p630b kernel: 409d0018 e87e80d8 48031311 e8410028 <0fe00000>
48000000 e93e80a0 3be00000 

thanks
--omar

The problem is coming from the server side metadata information passed to the
client :

NFS Server has invalid namelen set : 4294967295 ( 0xffffffff = (int) -1  )

which allows to pass the lookup/open requests containing filenames <= the value
above, which leads to a ENOSPC buffer problem.

Debugging the NFS server.

Thanks

Suzuki

I have found the root cause of the problem in the nfs4 client code :

The problem is with the nfs_create_server code while trying to get the server
handle. This problem occurs only if the mounting remote-directory is "/". In
this case we don't try to get the file handle properly. I am testing a patch to
fix the issue. 

Thanks

Suzuki

The NFS4 (at least nfs4) client does the following :

1) get the root dir handle for the remote server ( which would work fine )
2) try to get the file handle for each directory in the path using lookup
requests.

 for eg: mount -t nfs4 server:/IMAGES/Server/Linux tmp

 then: there will be a root handle lookup and then followed by lookup requests
for IMAGES, Server & Linux.

Now while trying to get the file handles for each of these components, the "/"
are *eaten up* to avoid the server overheads. Here if we specify "/" as the
remote server dir, the path is completely eaten up and hence, the client thinks
that it has successfully finished finding the file handle for the remote
directory and returns. But we haven't established a proper server connection
with the server nor we have a proper file handle !

Now the bug submitter was trying :

mount -t nfs4 <server>:/ <DIR>

where the "/" dir was not exported at all ! 

Now the "/" path makes the client think that it has got a proper connection
with
the server and forwards the requests to the server. Due to the lack of proper
initialization, the max_namelen allowed for the files is set to -1 (which is
stored as unsigned int) in the client side. 

So, this makes any lookup /open requests to just pass through to the nfs
server.
But while trying to encode the request we run out of space for large names.

The fix here would be to get the file handle for "/" also, to decide whether
that particular directory is exported or not !

Will attach the patch later 

Thanks

Suzuki

It seems the mounting semantics is entirely different in NFS4, when compared to
the previous versions. 

Reading through the NFS4 protocol ( RFC ) to get more information on what
should
be the ideal behaviour.

Thanks

Suzuki

Created an attachment (id=31146)
 --> (https://bugzilla.novell.com/attachment.cgi?id=31146)
Patch to fix the problem

Hi,

Omar has verified this patch on p630b to fix the problem that we have been
facing.

Thanks

Suzuki

-- 
Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.

[Bug 325913] New: NFS4 stress test raises "Call Trace" in opensuse10. 3alpha5 on power5

bugzilla_noreply＠novell.com

bugzilla_noreply＠novell.com

bugzilla_noreply＠novell.com

bugzilla_noreply＠novell.com

bugzilla_noreply＠novell.com

bugzilla_noreply＠novell.com

bugzilla_noreply＠novell.com

bugzilla_noreply＠novell.com

bugzilla_noreply＠novell.com

bugzilla_noreply＠novell.com

bugzilla_noreply＠novell.com

bugzilla_noreply＠novell.com

bugzilla_noreply＠novell.com

bugzilla_noreply＠novell.com

bugzilla_noreply＠novell.com

bugzilla_noreply＠novell.com

bugzilla_noreply＠novell.com

bugzilla_noreply＠novell.com

bugzilla_noreply＠novell.com

bugzilla_noreply＠novell.com

tags

participants (1)