[Bug 227613] New: nfsd oops in vfs_statfs
https://bugzilla.novell.com/show_bug.cgi?id=227613

Summary: nfsd oops in vfs_statfs
Product: SUSE Linux 10.1
Version: Final
Platform: i686
OS/Version: SuSE Linux 10.1
Status: NEW
Severity: Normal
Priority: P5 - None
Component: Kernel
AssignedTo: kernel-maintainers@forge.provo.novell.com
ReportedBy: eric@analyticinnovations.com
QAContact: qa@suse.de

I have an NFS server running 2.6.16.21-0.25-smp on i686 which oopses frequently. Some days it happens repeatedly; other times there may be several days between failures. The underlying filesystems are reiserfs.

I am not a kernel programmer, but I think the following contains an illegal value by the time nfsd_statfs calls vfs_statfs (and probably earlier):

fhp->fh_dentry->d_inode->i_sb

I have checked that I do not have memory or disk errors in the system. The filesystems are clean according to reiserfsck.

Eric Kamm
Analytic Innovations

==============
Unable to handle kernel paging request at virtual address 6100061f
printing eip:
c015a18a
*pde = 00000000
Oops: 0000 [#1]
SMP
last sysfs file: /firmware/edd/int13_dev82/extensions
Modules linked in: nfsd exportfs lockd nfs_acl sunrpc autofs4 edd ipv6 af_packet apparmor aamatch_pcre ext3 jbd loop dm_mod i8xx_tco hw_random i2c_i801 shpchp i2c_core uhci_hcd pci_hotplug usbcore ide_cd cdrom e1000 lp parport_pc ppdev parport reiserfs processor sg 3w_xxxx piix sd_mod scsi_mod ide_disk ide_core
CPU: 0
EIP: 0060:[<c015a18a>] Not tainted VLI
EFLAGS: 00010286 (2.6.16.21-0.25-smp #1)
EIP is at vfs_statfs+0x15/0x62
eax: 610005eb ebx: f4326004 ecx: 00000000 edx: ffffffda
esi: c59c00f9 edi: f4326000 ebp: f01bd000 esp: f43aff3c
ds: 007b es: 007b ss: 0068
Process nfsd (pid: 5373, threadinfo=f43ae000 task=c22831d0)
Stack: <0>f4326800 f4326004 f4326000 fa484c06 f7096200 f4326800 fa48b512 f7096200
0000001c fa4a94a8 fa4820cb f01bd018 f7096200 f7096264 f432605c fa4a94a8
fa43233c 00000017 f59ec280 fa4a9200 f7096240 f01bd018 000186a3 00000012
Call Trace:
[<fa484c06>] nfsd_statfs+0x2a/0x39 [nfsd]
[<fa48b512>] nfsd3_proc_fsstat+0x32/0x43 [nfsd]
[<fa4820cb>] nfsd_dispatch+0xbb/0x170 [nfsd]
[<fa43233c>] svc_process+0x366/0x5b2 [sunrpc]
[<fa48257b>] nfsd+0x18e/0x2eb [nfsd]
[<fa4823ed>] nfsd+0x0/0x2eb [nfsd]
[<c0102005>] kernel_thread_helper+0x5/0xb
Code: 14 0b 2c c0 8b 42 0c 89 3c 08 b0 01 86 86 b0 01 00 00 5b 5e 5f c3 57 85 c0 56 89 c6 53 89 d3 ba ed ff ff ff 74 4c 8b 40 20 b2 da <83> 78 34 00 74 41 31 c0 b9 15 00 00 00 89 df f3 ab 83 3d 44 ff

--
Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.
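Eric's guess about fhp->fh_dentry->d_inode->i_sb can be checked against the register dump. The sketch below rests on the editor's own reading of the "Code:" bytes, which the report does not state. In that reading, "8b 40 20" loads a pointer from offset 0x20 of the incoming argument (presumably sb->s_op), and the faulting "<83> 78 34 00" compares the word at offset 0x34 of that pointer (presumably s_op->statfs) with zero. If this is right, three things follow. The fault address is exactly eax + 0x34. edx already holds -ENOSYS, so sb passed the NULL check. And the super_block pointer saved in esi is not even 4-byte aligned. All three fit the theory that i_sb was already garbage when vfs_statfs ran.

```python
# Register values copied from the vfs_statfs oops in the original report.
# The instruction decoding in these comments is an editor's assumption.
EAX = 0x610005EB    # result of "8b 40 20": mov 0x20(%eax),%eax (presumed sb->s_op)
ESI = 0xC59C00F9    # "89 c6": mov %eax,%esi saved the incoming sb argument here
EDX = 0xFFFFFFDA    # "ba ed ff ff ff" then "b2 da": the pending return value
FAULT = 0x6100061F  # "Unable to handle kernel paging request at virtual address"
DISP = 0x34         # displacement in the faulting "<83> 78 34 00": cmpl $0x0,0x34(%eax)


def fault_address(base: int, disp: int) -> int:
    """Address read by a base+displacement load on 32-bit x86."""
    return (base + disp) & 0xFFFFFFFF


def as_signed(reg: int) -> int:
    """Interpret a 32-bit register as a signed int (kernel -errno convention)."""
    return reg - (1 << 32) if reg & 0x80000000 else reg


def is_word_aligned(ptr: int) -> bool:
    """Structure pointers such as struct super_block * are normally 4-byte aligned on i386."""
    return ptr % 4 == 0


print(hex(fault_address(EAX, DISP)))  # 0x6100061f, the reported fault address
print(as_signed(EDX))                 # -38, i.e. -ENOSYS on Linux
print(is_word_aligned(ESI))           # False: sb does not look like a real pointer
```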
https://bugzilla.novell.com/show_bug.cgi?id=227613

gregkh@novell.com changed:

What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |NEEDINFO
Info Provider| |eric@analyticinnovations.com

------- Comment #1 from gregkh@novell.com 2006-12-14 13:22 MST -------
Is this still seen on the 10.2 release?
https://bugzilla.novell.com/show_bug.cgi?id=227613

------- Comment #3 from lmb@novell.com 2007-01-09 10:48 MST -------
Eric, can you please provide the requested feedback?
https://bugzilla.novell.com/show_bug.cgi?id=227613

nfbrown@novell.com changed:

What |Removed |Added
----------------------------------------------------------------------------
Status|ASSIGNED |NEEDINFO
Info Provider| |eric@analyticinnovations.com

------- Comment #5 from nfbrown@novell.com 2007-01-28 22:18 MST -------
Setting to NEEDINFO - see comments 1 and 3.
https://bugzilla.novell.com/show_bug.cgi?id=227613

eric@analyticinnovations.com changed:

What |Removed |Added
----------------------------------------------------------------------------
Status|NEEDINFO |ASSIGNED
Info Provider|eric@analyticinnovations.com|

------- Comment #6 from eric@analyticinnovations.com 2007-01-29 10:26 MST -------
Sorry, I have not been able to test this using 10.2. I have redeployed the NFS shares on 10.1, using ext3 instead of ReiserFS to back the shares. The server ran smoothly for 35 days after this change. After this long stable period, it has started oopsing again:

Jan 18 12:04:47 lxnfs00 kernel: Oops: 0000 [#1]
Jan 18 12:19:59 lxnfs00 kernel: Oops: 0000 [#1]
Jan 18 13:24:40 lxnfs00 kernel: Oops: 0000 [#1]
Jan 18 13:43:01 lxnfs00 kernel: Oops: 0000 [#1]
Jan 22 15:18:06 lxnfs00 kernel: Oops: 0000 [#1]
Jan 22 15:48:48 lxnfs00 kernel: Oops: 0000 [#1]
Jan 22 15:56:23 lxnfs00 kernel: Oops: 0000 [#1]
Jan 22 16:09:26 lxnfs00 kernel: Oops: 0000 [#1]
Jan 25 09:56:41 lxnfs00 kernel: Oops: 0000 [#1]
Jan 25 10:24:16 lxnfs00 kernel: Oops: 0000 [#1]
Jan 27 22:27:03 lxnfs00 kernel: Oops: 0000 [#1]

As you can see, it is not uncommon for the machine to oops again right after I reboot it and start rcnfsserver again. I do not have the NFS server starting automatically at boot time. I can say confidently that the system is stable after the reboot, but often, moments after I run "rcnfsserver start" (sometimes in less than five seconds), the system oopses again.

I've been trying to imagine what sort of problem could cause these oopses. The system is an NFS server for about fifty systems, including RH8, SUSE 10, and Solaris 6/7/8/9/10. It is pretty heavily used: there are twenty people using it 8am-6pm on weekdays, and cron jobs at all hours cause automounts to occur. With all this use, I find it interesting that it will run for weeks at a time without error and then oops.
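The clustering Eric describes can be made concrete from the oops timestamps listed in comment #6. He reports several oopses shortly after each restart, then long quiet periods. The following is a minimal sketch rather than a diagnostic tool. Syslog omits the year, so 2007 is assumed from the comment date. The two-hour gap that separates one burst from the next is a hypothetical threshold chosen only for illustration.

```python
from datetime import datetime, timedelta

# Oops times from comment #6; the year (2007) is assumed, since syslog omits it.
STAMPS = [
    "Jan 18 12:04:47", "Jan 18 12:19:59", "Jan 18 13:24:40", "Jan 18 13:43:01",
    "Jan 22 15:18:06", "Jan 22 15:48:48", "Jan 22 15:56:23", "Jan 22 16:09:26",
    "Jan 25 09:56:41", "Jan 25 10:24:16", "Jan 27 22:27:03",
]


def bursts(stamps, gap=timedelta(hours=2)):
    """Group timestamps into bursts; a pause longer than `gap` starts a new one.

    The two-hour default is a hypothetical threshold, not from the report."""
    times = sorted(datetime.strptime(f"2007 {s}", "%Y %b %d %H:%M:%S") for s in stamps)
    groups = []
    for t in times:
        if groups and t - groups[-1][-1] <= gap:
            groups[-1].append(t)  # close enough to the previous oops: same burst
        else:
            groups.append([t])    # long pause: start a new burst
    return groups


groups = bursts(STAMPS)
for g in groups:
    print(g[0].strftime("%b %d"), f"{len(g)} oopses within {g[-1] - g[0]}")
```

With this threshold the eleven oopses fall into four bursts, the largest being four oopses in under two hours.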
It is also interesting that often after an oops, it will oops over and over after each restart, for anywhere from a few minutes to a couple of hours, and then eventually becomes stable again for a long period. Here are a couple of ideas:

1) a misbehaved (or older) client system triggers some infrequent operation on the NFS server, which triggers an nfsd or mountd bug; or
2) one of the filesystems being served is corrupted, the corrupted directory/file is infrequently accessed, and the NFS server does not handle the error case properly.

Here is the most recent oops:

Jan 27 22:27:03 lxnfs00 kernel: Unable to handle kernel paging request at virtual address 2c8ae132
Jan 27 22:27:03 lxnfs00 kernel: printing eip:
Jan 27 22:27:03 lxnfs00 kernel: c016384d
Jan 27 22:27:03 lxnfs00 kernel: *pde = 00000000
Jan 27 22:27:03 lxnfs00 kernel: Oops: 0000 [#1]
Jan 27 22:27:03 lxnfs00 kernel: SMP
Jan 27 22:27:03 lxnfs00 kernel: last sysfs file: /firmware/edd/int13_dev82/extensions
Jan 27 22:27:03 lxnfs00 kernel: Modules linked in: nfsd exportfs lockd nfs_acl sunrpc edd autofs4 ipv6 af_packet apparmor aamatch_pcre ext3 jbd loop dm_mod hw_random ide_cd i8xx_tco e1000 i2c_i801 uhci_hcd shpchp usbcore cdrom pci_hotplug i2c_core lp parport_pc ppdev parport reiserfs processor sg 3w_xxxx piix sd_mod scsi_mod ide_disk ide_core
Jan 27 22:27:03 lxnfs00 kernel: CPU: 0
Jan 27 22:27:03 lxnfs00 kernel: EIP: 0060:[<c016384d>] Not tainted VLI
Jan 27 22:27:03 lxnfs00 kernel: EFLAGS: 00010246 (2.6.16.21-0.25-smp #1)
Jan 27 22:27:03 lxnfs00 kernel: EIP is at vfs_getattr+0x39/0x9f
Jan 27 22:27:03 lxnfs00 kernel: eax: 2c8ae0f6 ebx: f6e057c7 ecx: fa1bfc80 edx: f7ff4b64
Jan 27 22:27:03 lxnfs00 kernel: esi: f5a8ff70 edi: f7ff4b64 ebp: f5a8ff70 esp: f5a8fef8
Jan 27 22:27:03 lxnfs00 kernel: ds: 007b es: 007b ss: 0068
Jan 27 22:27:03 lxnfs00 kernel: Process rpc.mountd (pid: 5795, threadinfo=f5a8e000 task=c22766d0)
Jan 27 22:27:03 lxnfs00 kernel: Stack: <0>dfd39f40 00000000 f5a8ff70 f5a8ff10 f5a8e000 c01638da f7ff4b64 dfd39f40
Jan 27 22:27:03 lxnfs00 kernel: 00000000 00000000 00000000 00000000 00000001 00000000 00001000 00000002
Jan 27 22:27:03 lxnfs00 kernel: 00000000 45bbcc01 00000000 457ec34f 00000000 457ec34f 00000000 00036e62
Jan 27 22:27:03 lxnfs00 kernel: Call Trace:
Jan 27 22:27:03 lxnfs00 kernel: [<c01638da>] vfs_lstat_fd+0x27/0x39
Jan 27 22:27:03 lxnfs00 kernel: [<c0163931>] sys_lstat64+0xf/0x23
Jan 27 22:27:03 lxnfs00 kernel: [<c01441ed>] mempool_free+0x32/0x64
Jan 27 22:27:03 lxnfs00 kernel: [<c0103bdb>] sysenter_past_esp+0x54/0x79
Jan 27 22:27:03 lxnfs00 kernel: Code: 8b 5a 0c f6 83 4d 01 00 00 02 75 19 83 3d 44 ff 3b c0 00 74 10 8b 0d 40 ff 3b c0 ff 91 c4 00 00 00 85 c0 75 66 8b 83 94 00 00 00 <8b> 70 3c 85 f6 74 0b 8b 04 24 89 e9 89 fa ff d6 eb 4e 89 d8 89

And the one prior:

Jan 25 10:24:16 lxnfs00 kernel: Unable to handle kernel paging request at virtual address 2c1ae132
Jan 25 10:24:16 lxnfs00 kernel: printing eip:
Jan 25 10:24:16 lxnfs00 kernel: c016384d
Jan 25 10:24:16 lxnfs00 kernel: *pde = 00000000
Jan 25 10:24:16 lxnfs00 kernel: Oops: 0000 [#1]
Jan 25 10:24:16 lxnfs00 kernel: SMP
Jan 25 10:24:16 lxnfs00 kernel: last sysfs file: /firmware/edd/int13_dev82/extensions
Jan 25 10:24:16 lxnfs00 kernel: Modules linked in: nfsd exportfs lockd nfs_acl sunrpc edd autofs4 ipv6 af_packet apparmor aamatch_pcre ext3 jbd loop dm_mod hw_random i2c_i801 shpchp pci_hotplug ide_cd cdrom uhci_hcd usbcore i8xx_tco i2c_core e1000 lp parport_pc ppdev parport reiserfs processor sg 3w_xxxx piix sd_mod scsi_mod ide_disk ide_core
Jan 25 10:24:16 lxnfs00 kernel: CPU: 0
Jan 25 10:24:16 lxnfs00 kernel: EIP: 0060:[<c016384d>] Not tainted VLI
Jan 25 10:24:16 lxnfs00 kernel: EFLAGS: 00010246 (2.6.16.21-0.25-smp #1)
Jan 25 10:24:16 lxnfs00 kernel: EIP is at vfs_getattr+0x39/0x9f
Jan 25 10:24:16 lxnfs00 kernel: eax: 2c1ae0f6 ebx: f6f9934f ecx: fa1bec80 edx: f6f199d8
Jan 25 10:24:16 lxnfs00 kernel: esi: fa4b2622 edi: f6f199d8 ebp: f2309eec esp: f2309ed0
Jan 25 10:24:16 lxnfs00 kernel: ds: 007b es: 007b ss: 0068
Jan 25 10:24:16 lxnfs00 kernel: Process nfsd (pid: 5891, threadinfo=f2308000 task=f2f75a30)
Jan 25 10:24:16 lxnfs00 kernel: Stack: <0>f7210840 f6fc4020 fa4b2622 f1cb7004 f7016a00 fa4b14de f1cb7010 00000002
Jan 25 10:24:16 lxnfs00 kernel: f1cb700c 00000000 00000002 00000000 f1cb7004 0000001c f1cb71d4 f1cb7004
Jan 25 10:24:16 lxnfs00 kernel: fa4ac5a1 00000001 dfc23d1c f7016a00 fa46dc80 fa46f72c 00000000 f1cb7004
Jan 25 10:24:16 lxnfs00 kernel: Call Trace:
Jan 25 10:24:16 lxnfs00 kernel: [<fa4b2622>] nfs3svc_encode_diropres+0x0/0x95 [nfsd]
Jan 25 10:24:16 lxnfs00 kernel: [<fa4b14de>] encode_post_op_attr+0x37/0x21b [nfsd]
Jan 25 10:24:16 lxnfs00 kernel: [<fa4ac5a1>] nfsd_lookup+0x47/0x36a [nfsd]
Jan 25 10:24:16 lxnfs00 rpc.mountd: authenticated unmount request from lx05:643 for /export/homes/mattp (/export/homes)
Jan 25 10:24:16 lxnfs00 kernel: [<fa4b2622>] nfs3svc_encode_diropres+0x0/0x95 [nfsd]
Jan 25 10:24:16 lxnfs00 kernel: [<fa4b269c>] nfs3svc_encode_diropres+0x7a/0x95 [nfsd]
Jan 25 10:24:16 lxnfs00 kernel: [<fa4b2622>] nfs3svc_encode_diropres+0x0/0x95 [nfsd]
Jan 25 10:24:16 lxnfs00 kernel: [<fa4a7135>] nfsd_dispatch+0x125/0x170 [nfsd]
Jan 25 10:24:16 lxnfs00 kernel: [<fa45733c>] svc_process+0x366/0x5b2 [sunrpc]
Jan 25 10:24:16 lxnfs00 kernel: [<fa4a757b>] nfsd+0x18e/0x2eb [nfsd]
Jan 25 10:24:16 lxnfs00 kernel: [<fa4a73ed>] nfsd+0x0/0x2eb [nfsd]
Jan 25 10:24:16 lxnfs00 kernel: [<c0102005>] kernel_thread_helper+0x5/0xb
Jan 25 10:24:16 lxnfs00 kernel: Code: 8b 5a 0c f6 83 4d 01 00 00 02 75 19 83 3d 44 ff 3b c0 00 74 10 8b 0d 40 ff 3b c0 ff 91 c4 00 00 00 85 c0 75 66 8b 83 94 00 00 00 <8b> 70 3c 85 f6 74 0b 8b 04 24 89 e9 89 fa ff d6 eb 4e 89 d8 89

And one prior to that:

Jan 25 09:56:41 lxnfs00 kernel: Unable to handle kernel paging request at virtual address 2b4ae105
Jan 25 09:56:41 lxnfs00 kernel: printing eip:
Jan 25 09:56:41 lxnfs00 kernel: c0167043
Jan 25 09:56:41 lxnfs00 kernel: *pde = 00000000
Jan 25 09:56:41 lxnfs00 kernel: Oops: 0000 [#1]
Jan 25 09:56:41 lxnfs00 kernel: SMP
Jan 25 09:56:41 lxnfs00 kernel: last sysfs file: /firmware/edd/int13_dev82/extensions
Jan 25 09:56:41 lxnfs00 kernel: Modules linked in: nfsd exportfs lockd nfs_acl sunrpc edd autofs4 ipv6 af_packet apparmor aamatch_pcre ext3 jbd loop dm_mod i2c_i801 i2c_core i8xx_tco hw_random shpchp pci_hotplug uhci_hcd ide_cd cdrom usbcore e1000 lp parport_pc ppdev parport reiserfs processor sg 3w_xxxx piix sd_mod scsi_mod ide_disk ide_core
Jan 25 09:56:41 lxnfs00 kernel: CPU: 0
Jan 25 09:56:41 lxnfs00 kernel: EIP: 0060:[<c0167043>] Not tainted VLI
Jan 25 09:56:41 lxnfs00 kernel: EFLAGS: 00010206 (2.6.16.21-0.25-smp #1)
Jan 25 09:56:41 lxnfs00 kernel: EIP is at permission+0x57/0xa3
Jan 25 09:56:41 lxnfs00 kernel: eax: 2b4ae0d1 ebx: d134b1d3 ecx: 00000000 edx: 00000004
Jan 25 09:56:41 lxnfs00 kernel: esi: d134b1d3 edi: 00000004 ebp: 00000000 esp: f6b23f08
Jan 25 09:56:41 lxnfs00 kernel: ds: 007b es: 007b ss: 0068
Jan 25 09:56:41 lxnfs00 kernel: Process nfsd (pid: 5796, threadinfo=f6b22000 task=f6bae1b0)
Jan 25 09:56:41 lxnfs00 kernel: Stack: <0>00000004 d134b1d3 f3db6e40 00000001 fa4a6ac2 fa4cae60 00000000 00000000
Jan 25 09:56:41 lxnfs00 kernel: fa4a6b94 f5de60ec f3db6e40 e5bfe21c 0000001f f5de6004 f5dd9800 f5de6000
Jan 25 09:56:41 lxnfs00 kernel: f600f400 fa4accec 00000000 f600f400 0000001c fa4cb1d0 eea1d000 fa4a40cb
Jan 25 09:56:41 lxnfs00 kernel: Call Trace:
Jan 25 09:56:41 lxnfs00 kernel: [<fa4a6ac2>] nfsd_permission+0x89/0xd7 [nfsd]
Jan 25 09:56:41 lxnfs00 kernel: [<fa4a6b94>] nfsd_access+0x84/0xcc [nfsd]
Jan 25 09:56:42 lxnfs00 kernel: [<fa4accec>] nfsacld_proc_access+0x86/0x8c [nfsd]
Jan 25 09:56:42 lxnfs00 kernel: [<fa4a40cb>] nfsd_dispatch+0xbb/0x170 [nfsd]
Jan 25 09:56:42 lxnfs00 kernel: [<fa45433c>] svc_process+0x366/0x5b2 [sunrpc]
Jan 25 09:56:42 lxnfs00 kernel: [<fa4a457b>] nfsd+0x18e/0x2eb [nfsd]
Jan 25 09:56:42 lxnfs00 kernel: [<fa4a43ed>] nfsd+0x0/0x2eb [nfsd]
Jan 25 09:56:42 lxnfs00 kernel: [<c0102005>] kernel_thread_helper+0x5/0xb
Jan 25 09:56:42 lxnfs00 kernel: Code: 3d 00 40 00 00 74 66 3d 00 a0 00 00 74 5f f6 83 4c 01 00 00 08 b8 f3 ff ff ff 75 56 8b 83 94 00 00 00 89 fa 83 e2 f7 85 c0 74 0f <8b> 70 34 85 f6 74 08 89 e9 89 d8 ff d6 eb 09 31 c9 89 d8 e8 cd

Is there some other kind of information gathering I can do to help track down the problem?

Thank you,
Eric
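The three oopses in comment #6 fault in vfs_getattr and permission rather than vfs_statfs, but they share the same shape. The following reading of the "Code:" bytes is an editor's assumption and is not stated in the thread. In that reading, "8b 83 94 00 00 00" loads inode->i_op from offset 0x94 of the inode held in ebx. The faulting "<8b> 70 3c" or "<8b> 70 34" then loads the getattr or permission method from that table. The sketch checks three things for each oops: the fault address equals eax plus that displacement, the loaded i_op value lies below the usual i386 kernel base of 0xC0000000 (assuming the default 3G/1G split), and whether the inode pointer itself is aligned.

```python
from dataclasses import dataclass

PAGE_OFFSET = 0xC0000000  # assumption: default i386 3G/1G user/kernel split


@dataclass
class Oops:
    when: str
    function: str
    fault: int  # "Unable to handle kernel paging request at virtual address ..."
    eax: int    # presumed inode->i_op, loaded by "8b 83 94 00 00 00"
    ebx: int    # presumed struct inode * being operated on
    disp: int   # displacement of the faulting "<8b> 70 xx" load


OOPSES = [
    Oops("Jan 27 22:27:03", "vfs_getattr", 0x2C8AE132, 0x2C8AE0F6, 0xF6E057C7, 0x3C),
    Oops("Jan 25 10:24:16", "vfs_getattr", 0x2C1AE132, 0x2C1AE0F6, 0xF6F9934F, 0x3C),
    Oops("Jan 25 09:56:41", "permission", 0x2B4AE105, 0x2B4AE0D1, 0xD134B1D3, 0x34),
]


def summarize(o: Oops) -> str:
    """Describe whether the register dump fits a load through a garbage i_op."""
    touched = (o.eax + o.disp) & 0xFFFFFFFF
    return (f"{o.when} {o.function}: eax+{o.disp:#x} = {touched:#010x}, "
            f"matches fault: {touched == o.fault}, "
            f"i_op looks like a kernel pointer: {o.eax >= PAGE_OFFSET}, "
            f"inode 4-byte aligned: {o.ebx % 4 == 0}")


for o in OOPSES:
    print(summarize(o))
```

In all three cases the inode pointer is misaligned in the same way (address mod 4 == 3).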
https://bugzilla.novell.com/show_bug.cgi?id=227613

nfbrown@novell.com changed:

What |Removed |Added
----------------------------------------------------------------------------
Status|ASSIGNED |NEEDINFO
Info Provider| |eric@analyticinnovations.com

------- Comment #7 from nfbrown@novell.com 2007-02-12 17:57 MST -------
Do you have any clients using NFSv2? (There is a suggestion in one of the traces that you are.) If so, can you try remounting them using NFSv3? I have had another report with similar symptoms, and one common aspect is that both seem to be using NFSv2, which I suspect is fairly uncommon these days.
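One plausible reading of "a suggestion in one of the traces" is the nfsacld_proc_access frame in the Jan 25 09:56:41 oops. This is an editor's guess; the thread does not say which frame was meant. In the fs/nfsd sources of this era, procedures follow a naming convention by protocol version:

- NFSv2 procedures live in nfsproc.c as nfsd_proc_*.
- NFSv3 procedures live in nfs3proc.c as nfsd3_proc_*.
- The NFSv2 ACL side protocol lives in nfs2acl.c as nfsacld_proc_*.

The sketch maps call-trace symbols to a protocol using those prefixes. The prefix table is an assumption based on that convention and is not an exhaustive list.

```python
import re

# Hypothetical prefix table based on fs/nfsd naming in 2.6.16-era kernels.
PREFIXES = [
    ("nfsacld_proc_", "NFSACL v2"),  # fs/nfsd/nfs2acl.c
    ("nfsd3_proc_", "NFSv3"),        # fs/nfsd/nfs3proc.c (and nfs3acl.c)
    ("nfs3svc_", "NFSv3"),           # fs/nfsd/nfs3xdr.c
    ("nfsd_proc_", "NFSv2"),         # fs/nfsd/nfsproc.c
]

# Matches frames such as " [<fa4accec>] nfsacld_proc_access+0x86/0x8c [nfsd]".
FRAME_RE = re.compile(r"\[<[0-9a-f]{8}>\]\s+(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def protocols_in_trace(trace: str) -> set:
    """Return the NFS protocol versions implied by the symbols in a call trace."""
    found = set()
    for symbol in FRAME_RE.findall(trace):
        for prefix, proto in PREFIXES:
            if symbol.startswith(prefix):
                found.add(proto)
                break
    return found


ACCESS_TRACE = """
 [<fa4a6ac2>] nfsd_permission+0x89/0xd7 [nfsd]
 [<fa4a6b94>] nfsd_access+0x84/0xcc [nfsd]
 [<fa4accec>] nfsacld_proc_access+0x86/0x8c [nfsd]
 [<fa4a40cb>] nfsd_dispatch+0xbb/0x170 [nfsd]
"""

FSSTAT_TRACE = """
 [<fa484c06>] nfsd_statfs+0x2a/0x39 [nfsd]
 [<fa48b512>] nfsd3_proc_fsstat+0x32/0x43 [nfsd]
 [<fa4820cb>] nfsd_dispatch+0xbb/0x170 [nfsd]
"""

print(protocols_in_trace(ACCESS_TRACE))  # {'NFSACL v2'}
print(protocols_in_trace(FSSTAT_TRACE))  # {'NFSv3'}
```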
https://bugzilla.novell.com/show_bug.cgi?id=227613

eric@analyticinnovations.com changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |eric@analyticinnovations.com
Status|NEEDINFO |ASSIGNED
Info Provider|eric@analyticinnovations.com|

------- Comment #8 from eric@analyticinnovations.com 2007-02-15 09:00 MST -------
I do have some clients using NFSv2:

lxnfs00:~ # nfsstat -sn
Server nfs v2:
null getattr setattr root lookup readlink
3 0% 9090 36% 0 0% 0 0% 15688 63% 0 0%
read wrcache write create remove rename
0 0% 0 0% 0 0% 0 0% 0 0% 0 0%
link symlink mkdir rmdir readdir fsstat
0 0% 0 0% 0 0% 0 0% 0 0% 12 0%

Server nfs v3:
null getattr setattr lookup access readlink
10794 0% 4039039 75% 13103 0% 544850 10% 450929 8% 439 0%
read write create mkdir symlink mknod
212332 3% 22024 0% 8569 0% 13 0% 0 0% 0 0%
remove rmdir rename link readdir readdirplus
8218 0% 14 0% 684 0% 1 0% 2159 0% 5072 0%
fsstat fsinfo pathconf commit
6890 0% 4283 0% 353 0% 9941 0%

I just learned to use tcpdump/tethereal, which may help gather more information about what leads up to the oops. I am planning to set up a rolling tcpdump capture so that I will have the network traffic preceding the next oops. Does this make sense to do?

I am tracking down all NFSv2 clients right now and trying to figure out how to make them use only NFSv3.

What is it that makes you suspect NFSv2 is involved? The stack dumps have a bunch of nfs3* functions in them. Are some of the functions you see NFSv2-specific? Just trying to learn something, if you have the time to teach.

Thank you!
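"nfsstat -s" reports server-wide counters only, so it confirms that NFSv2 traffic exists but not which clients send it. The following is a minimal sketch for pulling the per-operation NFSv2 counts out of output like the one in comment #8. The parser assumes exactly that layout: a "Server nfs vN:" title followed by alternating rows of operation names and "count pct%" pairs. It is not a general nfsstat parser. Among the non-zero v2 counters is fsstat, the NFSv2 STATFS procedure.

```python
def parse_nfsstat(text: str) -> dict:
    """Parse `nfsstat -s`-style text into {section title: {operation: count}}."""
    sections = {}
    current = None  # dict for the section being filled
    names = None    # operation names waiting for their counts row
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.endswith(":"):
            current = sections.setdefault(line[:-1], {})
            names = None
        elif names is None:
            names = line.split()
        else:
            counts = line.split()[0::2]  # drop the "NN%" columns
            current.update(zip(names, map(int, counts)))
            names = None
    return sections


SAMPLE = """
Server nfs v2:
null getattr setattr root lookup readlink
3 0% 9090 36% 0 0% 0 0% 15688 63% 0 0%
read wrcache write create remove rename
0 0% 0 0% 0 0% 0 0% 0 0% 0 0%
link symlink mkdir rmdir readdir fsstat
0 0% 0 0% 0 0% 0 0% 0 0% 12 0%
"""

v2 = parse_nfsstat(SAMPLE)["Server nfs v2"]
print(sum(v2.values()), {op: n for op, n in v2.items() if n})
# 24793 {'null': 3, 'getattr': 9090, 'lookup': 15688, 'fsstat': 12}
```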
https://bugzilla.novell.com/show_bug.cgi?id=227613

eric@analyticinnovations.com changed:

What |Removed |Added
----------------------------------------------------------------------------
Status|ASSIGNED |NEEDINFO
Info Provider| |eric@analyticinnovations.com

------- Comment #9 from eric@analyticinnovations.com 2007-02-16 15:05 MST -------
I finally came up with a way to trigger the oops. First, here is the oops:

Feb 16 12:52:40 lxnfs00 kernel: Unable to handle kernel paging request at virtual address 2bfae12d
Feb 16 12:52:40 lxnfs00 kernel: printing eip:
Feb 16 12:52:40 lxnfs00 kernel: c016384d
Feb 16 12:52:40 lxnfs00 kernel: *pde = 00000000
Feb 16 12:52:40 lxnfs00 kernel: Oops: 0000 [#1]
Feb 16 12:52:40 lxnfs00 kernel: SMP
Feb 16 12:52:40 lxnfs00 kernel: last sysfs file: /firmware/edd/int13_dev82/extensions
Feb 16 12:52:40 lxnfs00 kernel: Modules linked in: nfsd exportfs lockd nfs_acl sunrpc edd autofs4 ipv6 af_packet apparmor aamatch_pcre ext3 jbd loop dm_mod hw_random uhci_hcd usbcore shpchp i2c_i801 i8xx_tco e1000 pci_hotplug ide_cd i2c_core cdrom lp parport_pc ppdev parport reiserfs processor sg 3w_xxxx piix sd_mod scsi_mod ide_disk ide_core
Feb 16 12:52:40 lxnfs00 kernel: CPU: 0
Feb 16 12:52:40 lxnfs00 kernel: EIP: 0060:[<c016384d>] Not tainted VLI
Feb 16 12:52:40 lxnfs00 kernel: EFLAGS: 00010246 (2.6.16.21-0.25-smp #1)
Feb 16 12:52:40 lxnfs00 kernel: EIP is at vfs_getattr+0x39/0x9f
Feb 16 12:52:40 lxnfs00 kernel: eax: 2bfae0f1 ebx: f1cdb6c7 ecx: fa1bcc80 edx: c71362a0
Feb 16 12:52:40 lxnfs00 kernel: esi: e3901f70 edi: c71362a0 ebp: e3901f70 esp: e3901ef8
Feb 16 12:52:40 lxnfs00 kernel: ds: 007b es: 007b ss: 0068
Feb 16 12:52:40 lxnfs00 kernel: Process rpc.mountd (pid: 14593, threadinfo=e3900000 task=f714ecb0)
Feb 16 12:52:40 lxnfs00 kernel: Stack: <0>f6d41440 00000000 e3901f70 e3901f10 e3900000 c01638da c71362a0 f6d41440
Feb 16 12:52:40 lxnfs00 kernel: 00000000 00000000 00000000 00000000 00000001 00000000 00001000 00000008
Feb 16 12:52:40 lxnfs00 kernel: 00000000 45d5d535 00000000 45d1c909 00000000 45d1c909 00000000 00000002
Feb 16 12:52:40 lxnfs00 kernel: Call Trace:
Feb 16 12:52:40 lxnfs00 kernel: [<c01638da>] vfs_lstat_fd+0x27/0x39
Feb 16 12:52:40 lxnfs00 kernel: [<c0163931>] sys_lstat64+0xf/0x23
Feb 16 12:52:40 lxnfs00 kernel: [<c01441ed>] mempool_free+0x32/0x64
Feb 16 12:52:40 lxnfs00 kernel: [<c0103bdb>] sysenter_past_esp+0x54/0x79
Feb 16 12:52:40 lxnfs00 kernel: Code: 8b 5a 0c f6 83 4d 01 00 00 02 75 19 83 3d 44 ff 3b c0 00 74 10 8b 0d 40 ff 3b c0 ff 91 c4 00 00 00 85 c0 75 66 8b 83 94 00 00 00 <8b> 70 3c 85 f6 74 0b 8b 04 24 89 e9 89 fa ff d6 eb 4e 89 d8 89
Feb 16 12:55:00 lxnfs00 kernel: <1>Unable to handle kernel paging request at virtual address 2bfae125
Feb 16 12:55:00 lxnfs00 kernel: printing eip:
Feb 16 12:55:00 lxnfs00 kernel: c0167043
Feb 16 12:55:00 lxnfs00 kernel: *pde = 00000000

Here is what I did:

- rpc.mountd on the NFS server was started with the -N 2 flag (I was trying to stop it from serving NFSv2, but I obviously didn't understand what this flag really does).
- On the NFS client, I added nfsvers=2 as a mount option (thinking the mount would fail, since I had started rpc.mountd with -N 2).
- On the client, I ran the mount command. It gave no error and exited with status zero.
- On the client, I ran ls on the mounted directory, and it incorrectly appeared to be empty.
- On the client, I ran df on the mounted directory. This caused the NFS server to oops.
- I repeated this twice with the same results. I also repeated it using nfsvers=3, and it worked fine.

Finally, here is some annotated tethereal output:

# mount ... no errors and exits 0 (acts like it worked)
1704412 4585.960581 192.9.206.232 -> 192.9.206.238 Portmap V2 GETPORT Call MOUNT(100005) V:2 UDP
1704414 4585.964579 192.9.206.232 -> 192.9.206.238 MOUNT V2 NULL Call
1704416 4585.964953 192.9.206.232 -> 192.9.206.238 MOUNT V1 NULL Call
1704419 4585.966203 192.9.206.232 -> 192.9.206.238 MOUNT V1 MNT Call[Packet size limited during capture]
1704462 4585.986441 192.9.206.232 -> 192.9.206.238 Portmap V2 GETPORT Call NLM(100021) V:1 UDP
1704464 4585.987191 192.9.206.232 -> 192.9.206.238 NLM V1 GRANTED Call[Packet size limited during capture]
1704473 4585.990187 192.9.206.232 -> 192.9.206.238 Portmap V2 GETPORT Call NFS(100003) V:2 TCP
1704486 4585.992562 192.9.206.232 -> 192.9.206.238 TCP 62807 > nfs [SYN] Seq=0 Ack=0 Win=49640 Len=0 MSS=1460
1704488 4585.992812 192.9.206.232 -> 192.9.206.238 TCP 62807 > nfs [ACK] Seq=1 Ack=1 Win=49640 Len=0
1704489 4585.993186 192.9.206.232 -> 192.9.206.238 NFS V2 NULL Call[Packet size limited during capture]
1704492 4585.993436 192.9.206.232 -> 192.9.206.238 TCP 62807 > nfs [ACK] Seq=121 Ack=29 Win=49640 Len=0
1704493 4585.993686 192.9.206.232 -> 192.9.206.238 TCP 62807 > nfs [FIN, ACK] Seq=121 Ack=29 Win=49640 Len=0
1704495 4585.993936 192.9.206.232 -> 192.9.206.238 TCP 62807 > nfs [ACK] Seq=122 Ack=30 Win=49640 Len=0
# ls -l on the mount point ... directory looks empty (incorrect)
1704512 4586.005808 192.9.206.232 -> 192.9.206.238 NFSACL V2 GETATTR Call[Packet size limited during capture]
# df on the mount point ... the NFS server oopses (see above)
1704514 4586.006303 192.9.206.232 -> 192.9.206.238 NFS V2 STATFS Call[Packet size limited during capture]
1704576 4586.047029 192.9.206.232 -> 192.9.206.238 TCP 1009 > nfs [ACK] Seq=865 Ack=553 Win=49640 Len=0

What do you make of this? Can all the different oopses be explained by one bug, which I've now found one way to trigger, or do you think I'm looking at more than one bug?

Here is another idea to explain some of what I observed.
Remember I said the oops would happen several times in a row? I found a number of clients which are set to use NFSv3 but which (according to the man page) will fall back to NFSv2, and which had done so occasionally, according to nfsstat. I wonder whether, after the NFS server crashed and I brought it back up, the clients tried to connect with NFSv3, some failed because the server wasn't quite up yet, and a second later they succeeded on their fallback to NFSv2. They then crashed the server again through whatever NFSv2 bug we're looking at.
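The tethereal summary in comment #9 can also be used to find the remaining NFSv2 clients. MOUNT v1 (RFC 1094) and v2 are the mount protocol versions used with NFSv2, while MOUNT v3 (RFC 1813) goes with NFSv3. Any source address making MOUNT, NFS or NFSACL calls at version 2 or lower is therefore using the NFSv2 family. The following is a minimal sketch, assuming tethereal's one-line summary format exactly as pasted in comment #9. The helper name nfsv2_callers is hypothetical.

```python
import re
from collections import defaultdict

# One-line tethereal summaries such as:
#   1704514 4586.006303 192.9.206.232 -> 192.9.206.238 NFS V2 STATFS Call[...]
SUMMARY_RE = re.compile(
    r"^\s*\d+\s+[\d.]+\s+(?P<src>\S+)\s+->\s+(?P<dst>\S+)\s+"
    r"(?P<proto>NFSACL|NFS|MOUNT)\s+V(?P<vers>\d+)\s+(?P<proc>[A-Z]+)\s+Call"
)


def nfsv2_callers(lines):
    """Map each client address to the NFSv2-family RPC calls it made."""
    callers = defaultdict(list)
    for line in lines:
        m = SUMMARY_RE.match(line)
        # MOUNT v1/v2, NFS v2 and NFSACL v2 all belong to the NFSv2 family.
        if m and int(m["vers"]) <= 2:
            callers[m["src"]].append(f'{m["proto"]} V{m["vers"]} {m["proc"]}')
    return dict(callers)


CAPTURE = """\
1704412 4585.960581 192.9.206.232 -> 192.9.206.238 Portmap V2 GETPORT Call MOUNT(100005) V:2 UDP
1704414 4585.964579 192.9.206.232 -> 192.9.206.238 MOUNT V2 NULL Call
1704416 4585.964953 192.9.206.232 -> 192.9.206.238 MOUNT V1 NULL Call
1704419 4585.966203 192.9.206.232 -> 192.9.206.238 MOUNT V1 MNT Call[Packet size limited during capture]
1704489 4585.993186 192.9.206.232 -> 192.9.206.238 NFS V2 NULL Call[Packet size limited during capture]
1704512 4586.005808 192.9.206.232 -> 192.9.206.238 NFSACL V2 GETATTR Call[Packet size limited during capture]
1704514 4586.006303 192.9.206.232 -> 192.9.206.238 NFS V2 STATFS Call[Packet size limited during capture]
"""

for client, calls in nfsv2_callers(CAPTURE.splitlines()).items():
    print(client, calls)
```

Run over a full rolling capture instead of this excerpt, the same loop would list every address that has fallen back to NFSv2. Portmap, NLM and bare TCP lines are deliberately ignored.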
https://bugzilla.novell.com/show_bug.cgi?id=227613

------- Comment #10 from nfbrown@novell.com 2007-02-19 16:18 MST -------
Created an attachment (id=120007)
 --> (https://bugzilla.novell.com/attachment.cgi?id=120007&action=view)
Patch to fix problem.
https://bugzilla.novell.com/show_bug.cgi?id=227613

nfbrown@novell.com changed:

What |Removed |Added
----------------------------------------------------------------------------
Status|NEEDINFO |RESOLVED
Info Provider|eric@analyticinnovations.com|
Resolution| |FIXED

------- Comment #11 from nfbrown@novell.com 2007-02-19 16:19 MST -------
Thanks for the extra detail. I know what is happening now. The attachment above will fix the problem. This patch will be in the next security update for the kernel.