[Bug 227613] nfsd oops in vfs_statfs

16 Feb 2007

      https://bugzilla.novell.com/show_bug.cgi?id=227613

eric@analyticinnovations.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |NEEDINFO
      Info Provider|                            |eric@analyticinnovations.com

------- Comment #9 from eric@analyticinnovations.com  2007-02-16 15:05 MST -------
I finally came up with a way to trigger the oops.  First, here is the oops:

Feb 16 12:52:40 lxnfs00 kernel: Unable to handle kernel paging request at
virtual address 2bfae12d
Feb 16 12:52:40 lxnfs00 kernel:  printing eip:
Feb 16 12:52:40 lxnfs00 kernel: c016384d
Feb 16 12:52:40 lxnfs00 kernel: *pde = 00000000
Feb 16 12:52:40 lxnfs00 kernel: Oops: 0000 [#1]
Feb 16 12:52:40 lxnfs00 kernel: SMP
Feb 16 12:52:40 lxnfs00 kernel: last sysfs file:
/firmware/edd/int13_dev82/extensions
Feb 16 12:52:40 lxnfs00 kernel: Modules linked in: nfsd exportfs lockd nfs_acl
sunrpc edd autofs4 ipv6 af_packet apparmor aamatch_pcre ext3 jbd loop dm_mod
hw_random uhci_hcd usbcore shpchp i2c_i801 i8xx_tco e1000 pci_hotplug ide_cd
i2c_core cdrom lp parport_pc ppdev parport reiserfs processor sg 3w_xxxx piix
sd_mod scsi_mod ide_disk ide_core
Feb 16 12:52:40 lxnfs00 kernel: CPU:    0
Feb 16 12:52:40 lxnfs00 kernel: EIP:    0060:[<c016384d>]    Not tainted VLI
Feb 16 12:52:40 lxnfs00 kernel: EFLAGS: 00010246   (2.6.16.21-0.25-smp #1)
Feb 16 12:52:40 lxnfs00 kernel: EIP is at vfs_getattr+0x39/0x9f
Feb 16 12:52:40 lxnfs00 kernel: eax: 2bfae0f1   ebx: f1cdb6c7   ecx: fa1bcc80  
edx: c71362a0
Feb 16 12:52:40 lxnfs00 kernel: esi: e3901f70   edi: c71362a0   ebp: e3901f70  
esp: e3901ef8
Feb 16 12:52:40 lxnfs00 kernel: ds: 007b   es: 007b   ss: 0068
Feb 16 12:52:40 lxnfs00 kernel: Process rpc.mountd (pid: 14593,
threadinfo=e3900000 task=f714ecb0)
Feb 16 12:52:40 lxnfs00 kernel: Stack: <0>f6d41440 00000000 e3901f70 e3901f10
e3900000 c01638da c71362a0 f6d41440
Feb 16 12:52:40 lxnfs00 kernel:        00000000 00000000 00000000 00000000
00000001 00000000 00001000 00000008
Feb 16 12:52:40 lxnfs00 kernel:        00000000 45d5d535 00000000 45d1c909
00000000 45d1c909 00000000 00000002
Feb 16 12:52:40 lxnfs00 kernel: Call Trace:
Feb 16 12:52:40 lxnfs00 kernel:  [<c01638da>] vfs_lstat_fd+0x27/0x39
Feb 16 12:52:40 lxnfs00 kernel:  [<c0163931>] sys_lstat64+0xf/0x23
Feb 16 12:52:40 lxnfs00 kernel:  [<c01441ed>] mempool_free+0x32/0x64
Feb 16 12:52:40 lxnfs00 kernel:  [<c0103bdb>] sysenter_past_esp+0x54/0x79
Feb 16 12:52:40 lxnfs00 kernel: Code: 8b 5a 0c f6 83 4d 01 00 00 02 75 19 83 3d
44 ff 3b c0 00 74 10 8b 0d 40 ff 3b c0 ff 91 c4 00 00 00 85 c0 75 66 8b 83 94
00 00 00 <8b> 70 3c 85 f6 74 0b 8b 04 24 89 e9 89 fa ff d6 eb 4e 89 d8 89
Feb 16 12:55:00 lxnfs00 kernel:  <1>Unable to handle kernel paging request at
virtual address 2bfae125
Feb 16 12:55:00 lxnfs00 kernel:  printing eip:
Feb 16 12:55:00 lxnfs00 kernel: c0167043
Feb 16 12:55:00 lxnfs00 kernel: *pde = 00000000
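
A quick sanity check on the first oops, offered as a sketch rather than a
diagnosis. The faulting bytes "<8b> 70 3c" decode as mov 0x3c(%eax),%esi, so
the fault address should equal eax plus 0x3c. That would be consistent with
vfs_getattr following a bad pointer (possibly the inode's operations table,
but the structure layout for this kernel is an assumption here). The second
report is truncated, so its register values are unknown; the script only
shows what offset it would correspond to if eax held the same value.

```python
# Values copied from the oops reports above.
EAX = 0x2BFAE0F1           # eax in the first report
FAULT_ADDR = 0x2BFAE12D    # faulting virtual address, first report
FAULT_ADDR_2 = 0x2BFAE125  # faulting virtual address, second (truncated) report

# Displacement taken from the faulting instruction: mov 0x3c(%eax),%esi
DISPLACEMENT = 0x3C


def effective_address(base: int, disp: int) -> int:
    """Return base + disp, wrapped to 32 bits as on i386."""
    return (base + disp) & 0xFFFFFFFF


computed = effective_address(EAX, DISPLACEMENT)
print(f"eax + 0x3c          = {computed:#010x}")
print(f"reported fault addr = {FAULT_ADDR:#010x}")
print("match:", computed == FAULT_ADDR)

# Hypothetical: only meaningful if the second oops used the same base pointer.
print(f"second fault offset from eax: {FAULT_ADDR_2 - EAX:#x}")
```

Both fault addresses sit within a few bytes of the same bogus base, which
hints at the same corrupt pointer being dereferenced in both cases, though
the truncated second report cannot confirm that.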

Here is what I did:

    - On the NFS server, rpc.mountd was started with the -N 2 flag.  (I was
      trying to stop it from serving NFSv2, but I evidently misunderstood
      what this flag really does.)

    - On the NFS client, I added nfsvers=2 as a mount option, expecting the
      mount to fail since rpc.mountd had been started with -N 2.

    - On the client, I ran the mount command.  It reported no error and
      exited with status zero.

    - On the client, I ran ls on the mounted directory, and it incorrectly
      appeared to be empty.

    - On the client, I ran df on the mounted directory.  This caused the
      NFS server to oops.

    - I repeated this twice with the same results.  I also repeated it with
      nfsvers=3, and it worked fine.
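
The steps above can be written down as a command plan. This is only a sketch:
it builds and prints the command lines without executing anything, the export
path and mount point are hypothetical placeholders, and the mount syntax
assumes a Linux-style client that accepts the nfsvers option.

```python
import shlex


def repro_commands(server, export, mountpoint, nfsvers=2):
    """Build the command lines for the steps above; nothing is executed."""
    target = f"{server}:{export}"
    return {
        # Server side: mountd started with -N 2, as in the report.
        "server": [["rpc.mountd", "-N", "2"]],
        # Client side: mount, list the directory, then df (the step that
        # coincided with the oops when nfsvers=2 was used).
        "client": [
            ["mount", "-o", f"nfsvers={nfsvers}", target, mountpoint],
            ["ls", mountpoint],
            ["df", mountpoint],
        ],
    }


# "/export" and "/mnt/test" are placeholders, not paths from the report.
plan = repro_commands("lxnfs00", "/export", "/mnt/test")
for host, commands in plan.items():
    for argv in commands:
        print(f"[{host}] {shlex.join(argv)}")
```

Calling repro_commands(..., nfsvers=3) gives the control run that worked fine.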

Finally, here is some annotated tethereal output:

# mount ... no errors and exits 0 (acts like it worked)
1704412 4585.960581 192.9.206.232 -> 192.9.206.238 Portmap V2 GETPORT Call
MOUNT(100005) V:2 UDP
1704414 4585.964579 192.9.206.232 -> 192.9.206.238 MOUNT V2 NULL Call
1704416 4585.964953 192.9.206.232 -> 192.9.206.238 MOUNT V1 NULL Call
1704419 4585.966203 192.9.206.232 -> 192.9.206.238 MOUNT V1 MNT Call[Packet
size limited during capture]
1704462 4585.986441 192.9.206.232 -> 192.9.206.238 Portmap V2 GETPORT Call
NLM(100021) V:1 UDP
1704464 4585.987191 192.9.206.232 -> 192.9.206.238 NLM V1 GRANTED Call[Packet
size limited during capture]
1704473 4585.990187 192.9.206.232 -> 192.9.206.238 Portmap V2 GETPORT Call
NFS(100003) V:2 TCP
1704486 4585.992562 192.9.206.232 -> 192.9.206.238 TCP 62807 > nfs [SYN] Seq=0
Ack=0 Win=49640 Len=0 MSS=1460
1704488 4585.992812 192.9.206.232 -> 192.9.206.238 TCP 62807 > nfs [ACK] Seq=1
Ack=1 Win=49640 Len=0
1704489 4585.993186 192.9.206.232 -> 192.9.206.238 NFS V2 NULL Call[Packet size
limited during capture]
1704492 4585.993436 192.9.206.232 -> 192.9.206.238 TCP 62807 > nfs [ACK]
Seq=121 Ack=29 Win=49640 Len=0
1704493 4585.993686 192.9.206.232 -> 192.9.206.238 TCP 62807 > nfs [FIN, ACK]
Seq=121 Ack=29 Win=49640 Len=0
1704495 4585.993936 192.9.206.232 -> 192.9.206.238 TCP 62807 > nfs [ACK]
Seq=122 Ack=30 Win=49640 Len=0
# ls -l on the mount point ... directory looks empty (incorrect)
1704512 4586.005808 192.9.206.232 -> 192.9.206.238 NFSACL V2 GETATTR
Call[Packet size limited during capture]
# df on the mount point ... the NFS server oopses (see the oops above)
1704514 4586.006303 192.9.206.232 -> 192.9.206.238 NFS V2 STATFS Call[Packet
size limited during capture]
1704576 4586.047029 192.9.206.232 -> 192.9.206.238 TCP 1009 > nfs [ACK] Seq=865
Ack=553 Win=49640 Len=0
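
To see at a glance which RPC calls the client sent, the capture can be
filtered down to program, version and procedure. A minimal sketch, assuming
the tethereal summary format shown above; the regular expression and the
rpc_calls helper are illustrative, and the sample holds only a few lines
copied from the capture, including their mail line wraps. Note that the
capture lists only client-to-server packets, so server replies are not
visible.

```python
import re

# Matches summary text such as "MOUNT V1 MNT Call" or "NFS V2 STATFS Call".
# \s* before "Call" tolerates a line wrap between the procedure and "Call".
RPC_CALL = re.compile(r"\b(MOUNT|NFS|NFSACL|NLM) V(\d+) (\w+)\s*Call")


def rpc_calls(capture_text):
    """Return (program, version, procedure) tuples in capture order."""
    return [(prog, int(ver), proc)
            for prog, ver, proc in RPC_CALL.findall(capture_text)]


SAMPLE = """\
1704414 4585.964579 192.9.206.232 -> 192.9.206.238 MOUNT V2 NULL Call
1704416 4585.964953 192.9.206.232 -> 192.9.206.238 MOUNT V1 NULL Call
1704419 4585.966203 192.9.206.232 -> 192.9.206.238 MOUNT V1 MNT Call[Packet
size limited during capture]
1704489 4585.993186 192.9.206.232 -> 192.9.206.238 NFS V2 NULL Call[Packet size
limited during capture]
1704512 4586.005808 192.9.206.232 -> 192.9.206.238 NFSACL V2 GETATTR
Call[Packet size limited during capture]
1704514 4586.006303 192.9.206.232 -> 192.9.206.238 NFS V2 STATFS Call[Packet
size limited during capture]
"""

for prog, ver, proc in rpc_calls(SAMPLE):
    print(f"{prog:7s} v{ver} {proc}")
```

In this sample the mount request itself goes out as MOUNT V1, and the last
call before the oops is the NFS V2 STATFS triggered by df.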

What do you make of this?  Can all the different oopses be explained by one
bug, for which I have now found one trigger, or do you think I'm looking at
more than one bug?

Another idea that might explain some of what I observed: remember I said
the oops would happen several times in a row?  I found a number of clients
that are set to use NFSv3 but which, according to the man page, will fall
back to NFSv2 (and, according to nfsstat, had occasionally done so).  I
wonder whether, after the NFS server crashed and I brought it back up, the
clients tried to connect with NFSv3, some failed because the server wasn't
quite up yet, and a second later they succeeded on their fallback to NFSv2.
They would then have crashed the server again through whatever NFSv2 bug
we're looking at.

