https://bugzilla.novell.com/show_bug.cgi?id=425056

           Summary: oops caused by kernel-based NFS server
           Product: openSUSE 11.1
           Version: Factory
          Platform: x86-64
        OS/Version: openSUSE 11.0
            Status: NEW
          Severity: Normal
          Priority: P5 - None
         Component: Kernel
        AssignedTo: bnc-team-screening@forge.provo.novell.com
        ReportedBy: hwit@a-domani.nl
         QAContact: qa@suse.de
          Found By: Beta-Customer

Created an attachment (id=238776)
 --> (https://bugzilla.novell.com/attachment.cgi?id=238776)
syslog, containing oops

Three days ago I re-installed a HP-DL380 with openSUSE_11.0 and applied all
the latest patches. The machine had been running 10.2 for a long, long time
without a hiccup.

This machine is a XEN DOM-0, and one of the DOM-U clients is doing all the
syncing from gwdg.de. The DOM-0 is the NFS server, the DOM-Us are NFS clients.

Several times, after hours of running perfectly, the clients froze solid as
soon as they accessed the NFS-mounted area.

I found this advice on the net for troubleshooting NFS:

  echo "2048" > /proc/sys/sunrpc/rpc_debug
  echo "1" > /proc/sys/sunrpc/nfs_debug

and then examine syslog.

All I noticed was a repeating message:

  Sep 9 17:12:40 gwdg6 kernel: RPC: Want update, refage=120, age=69

After several hours the messages stopped coming, and all the clients
complained that the NFS server was unreachable. I performed a
/etc/rc.d/portmap restart, after which the trace messages re-appeared and the
NFS clients were happy again.
(Not nice, but I can live with it and can script around it.)

-----------------------------------------------------------------------------
Sep 9 16:44:35 kc1002 kernel: [16700.533837] nfs: server nfs not responding, still trying
io timeout after 300 seconds -- exiting
Sep 9 16:53:05 kc1002 kernel: [17210.421426] nfs: server nfs OK
rsync error: timeout in data send/receive (code 30) at io.c(165) [receiver=2.6.9]
----------------------------------------------------------------------------

However, last evening I got a kernel oops (see attachment).

Info from proc:

/proc/net/rpc/auth.unix.gid/content:#uid cnt: gids...
/proc/net/rpc/auth.unix.ip/content:#class IP domain
/proc/net/rpc/auth.unix.ip/content:# expiry=2147483647 refcnt=1 flags=1
/proc/net/rpc/auth.unix.ip/content:nfsd 0.0.0.0 -test-client-
/proc/net/rpc/auth.unix.ip/content:# expiry=1220972829 refcnt=2 flags=1
/proc/net/rpc/auth.unix.ip/content:nfsd 192.168.0.202 192.168.0.0/24
/proc/net/rpc/auth.unix.ip/content:# expiry=1220973002 refcnt=2 flags=1
/proc/net/rpc/auth.unix.ip/content:nfsd 192.168.0.201 192.168.0.0/24
/proc/net/rpc/nfs4.idtoname/content:#domain type id [name]
/proc/net/rpc/nfs4.nametoid/content:#domain type name [id]
/proc/net/rpc/nfsd.export/content:#path domain(flags)
/proc/net/rpc/nfsd.export/content:# expiry=1220972862 refcnt=1 flags=1
/proc/net/rpc/nfsd.export/content:/srv/distro 192.168.0.0/24(rw,no_root_squash,sync,wdelay,no_subtree_check,fsid=0,uuid=2b2e46e0:abb9467f:986d104a:ef15c89a)
/proc/net/rpc/nfsd.fh/content:#domain fsidtype fsid [path]
/proc/net/rpc/nfsd.fh/content:# expiry=2147483647 refcnt=1 flags=1
/proc/net/rpc/nfsd.fh/content:192.168.0.0/24 7 0x0000000c00000000e0462e2b7f46b9ab4a106d989ac815ef /srv/distro
--------------------------------------------------------------------------
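
As a reading aid for the proc dump: the expiry= values look like seconds since
the Unix epoch, and 2147483647 (2^31 - 1) presumably marks an entry that never
expires. That interpretation is an assumption, not something confirmed in this
report. Under it, a small helper (the name cache_ttl is made up here) shows how
many seconds an entry had left at a given moment:

```shell
#!/bin/sh
# Sketch: remaining lifetime of an RPC cache entry from /proc/net/rpc/*/content.
# Assumption: expiry= is seconds since the Unix epoch, and
# 2147483647 means "never expires".
cache_ttl() {
    expiry=$1
    # Second argument is the reference time; defaults to the current time.
    now=${2:-$(date +%s)}
    if [ "$expiry" -eq 2147483647 ]; then
        echo never
    else
        # Negative result means the entry had already expired.
        echo $((expiry - now))
    fi
}
```

Calling cache_ttl with one of the expiry values from the dump and the epoch
time of a matching syslog line would show whether that entry was still valid
when the messages stopped.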
From the syslog:

Sep 9 21:37:28 gwdg6 kernel: BUG: unable to handle kernel paging request at ffff8800f134d008
Sep 9 21:37:28 gwdg6 kernel: IP: [<ffffffff80262cf1>] iov_iter_advance+0x68/0x7f
Sep 9 21:37:28 gwdg6 kernel: PGD 1e54067 PUD 2458067 PMD 25e2067 PTE 0
Sep 9 21:37:28 gwdg6 kernel: Oops: 0000 [1] SMP

Normally I would first suspect the memory, but these systems have error
correction and not a single error occurred while running 10.2, so I fear it is
something else.

Side note (probably not relevant): all the system has done (up till now at
least) is rsyncing. That means that at any given moment only a very small
number of files are open, but over the course of a day many, many files are
accessed (how much does gwdg hold? ;-)
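
For anyone hitting the same hang, the portmap restart workaround described in
this report could be scripted roughly as follows. This is only a sketch: it
assumes rpcinfo is installed on the server and that a UDP null call to NFS
version 3 on localhost is a fair liveness test. The function name check_nfs and
the PROBE and RESTART variables are made up for illustration.

```shell
#!/bin/sh
# Watchdog sketch: restart portmap when the local NFS server stops
# answering RPC null calls. Both commands can be overridden for dry runs.
PROBE=${PROBE:-"rpcinfo -u localhost nfs 3"}
RESTART=${RESTART:-"/etc/rc.d/portmap restart"}

check_nfs() {
    # rpcinfo -u exits non-zero when the service does not reply.
    if $PROBE >/dev/null 2>&1; then
        echo ok
    else
        $RESTART >/dev/null 2>&1
        echo restarted
    fi
}
```

Calling check_nfs every few minutes from cron would paper over the hang; it
does nothing about the oops itself.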