https://bugzilla.novell.com/show_bug.cgi?id=227613


eric@analyticinnovations.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |NEEDINFO
      Info Provider|                            |eric@analyticinnovations.com


------- Comment #9 from eric@analyticinnovations.com  2007-02-16 15:05 MST -------
I finally came up with a way to trigger the oops. First, here is the oops:

Feb 16 12:52:40 lxnfs00 kernel: Unable to handle kernel paging request at virtual address 2bfae12d
Feb 16 12:52:40 lxnfs00 kernel:  printing eip:
Feb 16 12:52:40 lxnfs00 kernel: c016384d
Feb 16 12:52:40 lxnfs00 kernel: *pde = 00000000
Feb 16 12:52:40 lxnfs00 kernel: Oops: 0000 [#1]
Feb 16 12:52:40 lxnfs00 kernel: SMP
Feb 16 12:52:40 lxnfs00 kernel: last sysfs file: /firmware/edd/int13_dev82/extensions
Feb 16 12:52:40 lxnfs00 kernel: Modules linked in: nfsd exportfs lockd nfs_acl sunrpc edd autofs4 ipv6 af_packet apparmor aamatch_pcre ext3 jbd loop dm_mod hw_random uhci_hcd usbcore shpchp i2c_i801 i8xx_tco e1000 pci_hotplug ide_cd i2c_core cdrom lp parport_pc ppdev parport reiserfs processor sg 3w_xxxx piix sd_mod scsi_mod ide_disk ide_core
Feb 16 12:52:40 lxnfs00 kernel: CPU:    0
Feb 16 12:52:40 lxnfs00 kernel: EIP:    0060:[<c016384d>]    Not tainted VLI
Feb 16 12:52:40 lxnfs00 kernel: EFLAGS: 00010246   (2.6.16.21-0.25-smp #1)
Feb 16 12:52:40 lxnfs00 kernel: EIP is at vfs_getattr+0x39/0x9f
Feb 16 12:52:40 lxnfs00 kernel: eax: 2bfae0f1   ebx: f1cdb6c7   ecx: fa1bcc80   edx: c71362a0
Feb 16 12:52:40 lxnfs00 kernel: esi: e3901f70   edi: c71362a0   ebp: e3901f70   esp: e3901ef8
Feb 16 12:52:40 lxnfs00 kernel: ds: 007b   es: 007b   ss: 0068
Feb 16 12:52:40 lxnfs00 kernel: Process rpc.mountd (pid: 14593, threadinfo=e3900000 task=f714ecb0)
Feb 16 12:52:40 lxnfs00 kernel: Stack: <0>f6d41440 00000000 e3901f70 e3901f10 e3900000 c01638da c71362a0 f6d41440
Feb 16 12:52:40 lxnfs00 kernel:        00000000 00000000 00000000 00000000 00000001 00000000 00001000 00000008
Feb 16 12:52:40 lxnfs00 kernel:        00000000 45d5d535 00000000 45d1c909 00000000 45d1c909 00000000 00000002
Feb 16 12:52:40 lxnfs00 kernel: Call Trace:
Feb 16 12:52:40 lxnfs00 kernel:  [<c01638da>] vfs_lstat_fd+0x27/0x39
Feb 16 12:52:40 lxnfs00 kernel:  [<c0163931>] sys_lstat64+0xf/0x23
Feb 16 12:52:40 lxnfs00 kernel:  [<c01441ed>] mempool_free+0x32/0x64
Feb 16 12:52:40 lxnfs00 kernel:  [<c0103bdb>] sysenter_past_esp+0x54/0x79
Feb 16 12:52:40 lxnfs00 kernel: Code: 8b 5a 0c f6 83 4d 01 00 00 02 75 19 83 3d 44 ff 3b c0 00 74 10 8b 0d 40 ff 3b c0 ff 91 c4 00 00 00 85 c0 75 66 8b 83 94 00 00 00 <8b> 70 3c 85 f6 74 0b 8b 04 24 89 e9 89 fa ff d6 eb 4e 89 d8 89
Feb 16 12:55:00 lxnfs00 kernel:  <1>Unable to handle kernel paging request at virtual address 2bfae125
Feb 16 12:55:00 lxnfs00 kernel:  printing eip:
Feb 16 12:55:00 lxnfs00 kernel: c0167043
Feb 16 12:55:00 lxnfs00 kernel: *pde = 00000000

Here is what I did:

- rpc.mountd on the NFS server was started with -N 2 flag (I was trying to disable it from doing NFSv2, but I obviously didn't understand what this flag really did)
- On the NFS client, I added nfsvers=2 as a mount option (thinking the mount would fail since I had started the rpc.mountd with -N 2).
- On the client, I ran the mount command. It gave no error and exited with zero.
- On the client, I ran an ls on the mounted directory, and it incorrectly appeared to be empty.
- On the client, I ran a df on the mounted directory. This caused the NFS server to oops.
- I repeated this twice with the same results. I also repeated it using nfsvers=3, and it worked fine.

Finally, here is some annotated tethereal output:

# mount ... no errors and exits 0 (acts like it worked)
1704412 4585.960581 192.9.206.232 -> 192.9.206.238 Portmap V2 GETPORT Call MOUNT(100005) V:2 UDP
1704414 4585.964579 192.9.206.232 -> 192.9.206.238 MOUNT V2 NULL Call
1704416 4585.964953 192.9.206.232 -> 192.9.206.238 MOUNT V1 NULL Call
1704419 4585.966203 192.9.206.232 -> 192.9.206.238 MOUNT V1 MNT Call[Packet size limited during capture]
1704462 4585.986441 192.9.206.232 -> 192.9.206.238 Portmap V2 GETPORT Call NLM(100021) V:1 UDP
1704464 4585.987191 192.9.206.232 -> 192.9.206.238 NLM V1 GRANTED Call[Packet size limited during capture]
1704473 4585.990187 192.9.206.232 -> 192.9.206.238 Portmap V2 GETPORT Call NFS(100003) V:2 TCP
1704486 4585.992562 192.9.206.232 -> 192.9.206.238 TCP 62807 > nfs [SYN] Seq=0 Ack=0 Win=49640 Len=0 MSS=1460
1704488 4585.992812 192.9.206.232 -> 192.9.206.238 TCP 62807 > nfs [ACK] Seq=1 Ack=1 Win=49640 Len=0
1704489 4585.993186 192.9.206.232 -> 192.9.206.238 NFS V2 NULL Call[Packet size limited during capture]
1704492 4585.993436 192.9.206.232 -> 192.9.206.238 TCP 62807 > nfs [ACK] Seq=121 Ack=29 Win=49640 Len=0
1704493 4585.993686 192.9.206.232 -> 192.9.206.238 TCP 62807 > nfs [FIN, ACK] Seq=121 Ack=29 Win=49640 Len=0
1704495 4585.993936 192.9.206.232 -> 192.9.206.238 TCP 62807 > nfs [ACK] Seq=122 Ack=30 Win=49640 Len=0

# ls -l on the mount point ... directory looks empty (incorrect)
1704512 4586.005808 192.9.206.232 -> 192.9.206.238 NFSACL V2 GETATTR Call[Packet size limited during capture]

# df on the mount point ... the NFS server has oops above
1704514 4586.006303 192.9.206.232 -> 192.9.206.238 NFS V2 STATFS Call[Packet size limited during capture]
1704576 4586.047029 192.9.206.232 -> 192.9.206.238 TCP 1009 > nfs [ACK] Seq=865 Ack=553 Win=49640 Len=0

What do you make of this? Can all the different oops be explained by one bug which I've currently found one way to trigger, or do you think I'm looking at more than one bug?

Another idea to explain some of what I observed.
Remember I said the oops would happen several times in a row? I found a number of clients which are set to use NFSv3, but which (according to the man page) will fall back to NFSv2 (and had done so a little bit according to nfsstat). I wonder if, after the NFS server crashed and I brought it back up, the clients were trying to connect with NFSv3, but some failed because the server wasn't quite up yet, and then a second later succeeded on their fallback to NFSv2. Then they proceeded to crash the server again due to whatever NFSv2 bug we're looking at.
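
As a sanity check on the first oops quoted in this comment, the faulting address can be related to the register dump with a little arithmetic. This is only a minimal sketch: it assumes the marked bytes "<8b> 70 3c" in the Code: line decode as a 32-bit load from eax + 0x3c, which is a reading of the x86 encoding and not something the oops itself states. The variable names are made up for illustration.

```python
# Values copied from the first oops in this comment.
fault_addr = 0x2bfae12d  # "Unable to handle kernel paging request at virtual address ..."
eax = 0x2bfae0f1         # eax from the register dump

# Assumption (not stated by the oops): the faulting instruction bytes
# "<8b> 70 3c" are a load from [eax + disp8], with disp8 as the third byte.
code_bytes = bytes.fromhex("8b703c")
displacement = code_bytes[2]

computed = eax + displacement
print(hex(computed), computed == fault_addr)  # prints: 0x2bfae12d True
```

If that assumption holds, the fault is a read through whatever value was in eax at vfs_getattr+0x39, and that value does not look like a normal kernel pointer. It says nothing yet about how the bad value got there, or whether the second oops (address 2bfae125, eip c0167043) has the same cause.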