Hi,

My nfsserver is generally stable, but sometimes (at least once per week) the service just hangs (ONLY the nfsserver service).

When this happens, "rcnfsserver stop/start/restart/reload" and "exportfs -a" do not work (they just hang), and I can't even run "cat /proc/fs/nfs/exports" or "cat /etc/exports" (they hang too). The CPU load is very high (although all other services on the server are fine) and all clients hang (the users' home directories are NFS-mounted).

My /var/log/messages reports only this:

saobkp rpc.statd[1152]: Can't callback saobkp (100021,4), giving up.

The last resort is a hard reboot (power off)!

My configuration:

# saobkp is the nfsserver
# Server Suse 9.0
# Linux saobkp 2.4.21-273-smp4G #1 SMP i686 i686 i386 GNU/Linux
# nfs-utils-1.0.6-114

# /etc/exports
/home *(insecure,sync,rw)
/cdrom *(insecure,sync,ro)

# /etc/hosts
127.0.0.1 localhost
# special IPv6 addresses
::1 localhost ipv6-localhost ipv6-loopback
fe00::0 ipv6-localnet
ff00::0 ipv6-mcastprefix
ff02::1 ipv6-allnodes
ff02::2 ipv6-allrouters
ff02::3 ipv6-allhosts
172.22.50.6 saobkp.sao.jamef saobkp

# eth0 = 172.22.50.1

# host saobkp
saobkp.sao.jamef has address 172.22.50.6

# /etc/sysconfig/nfs
USE_KERNEL_NFSD_NUMBER="10"
STATD_HOSTNAME="saobkp"

# CLIENT CONFIGURATION uses autofs (automount)
# /etc/auto.master
/server/u /etc/auto.u

# /etc/auto.u
* -fstype=nfs nfs:/home/u/&

# And "host nfs"
nfs.sao.jamef has address 172.22.50.6

What is wrong?
Hi,

About my problem with nfsserver (see the message "nfsserver hangs randomly" on this list): I see this state in /proc for all of the nfsd processes:

# cat /proc/proc_number/status | grep State
State: D (disk sleep)

What does it mean?

My nfsserver problem is described in my earlier message (Tue, 2005-10-18, 10:00).

Rejaine da Silveira Monteiro
Information Technology
Tel: (31) 3419-8854 Fax: (31) 3419-8803
www.jamef.com.br
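The per-process check in this message can be run over every process in one pass instead of one PID at a time. The following is a minimal shell sketch, not something posted in the thread: the function name `list_d_state` and its optional directory argument (which defaults to /proc) are assumptions made for illustration. It relies only on the `Name:` and `State:` lines of `/proc/<pid>/status`.

```shell
#!/bin/sh
# Print "PID NAME" for every process whose State line reports D.
# The optional first argument is a hypothetical override for the
# proc directory, mainly useful for trying the function on test data.
list_d_state() {
    proc_root="${1:-/proc}"
    for status in "$proc_root"/[0-9]*/status; do
        # A process may exit between the glob and the read; skip it.
        [ -r "$status" ] || continue
        awk -v file="$status" '
            /^Name:/  { name = $2 }
            /^State:/ { state = $2 }
            END {
                if (state == "D") {
                    # The PID is the directory component before "status".
                    n = split(file, parts, "/")
                    print parts[n - 1], name
                }
            }' "$status" 2>/dev/null
    done
}
```

Calling `list_d_state` with no argument scans the live /proc. A single snapshot like this only shows which processes were in D at that instant, so short-lived entries are normal.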
Rejaine,

On Tuesday 18 October 2005 08:24, Rejaine Monteiro wrote:
> Hi,
>
> About my problem with nfsserver (see the message "nfsserver hangs randomly" on this list): I see this state in /proc for all of the nfsd processes:
>
> # cat /proc/proc_number/status | grep State
> State: D (disk sleep)
>
> What does it mean?

It's what the parenthetical says: a process sleeps in the D state when it's waiting for disk I/O. In practice it is not necessarily a disk per se that's being waited for, but what all uses of D sleep have in common is that the event that ends the sleep is expected to arrive very quickly (well under one second). When that doesn't happen and a process is seen to persist in a D sleep, it's a de facto sign of a problem. D sleeps, and all sleeps at an internal priority less than 0, cannot be interrupted by signals.

Randall Schulz
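Since the warning sign described here is a D sleep that persists, one rough way to look for it is to take two samples some seconds apart and keep only the PIDs that are in D both times. This is a sketch under assumptions, not a method from the thread: the function names, the default interval of 5 seconds, and the optional proc-directory argument are all hypothetical. A PID that appears in both samples may also have gone through two separate short sleeps, so the result is a hint rather than proof.

```shell
#!/bin/sh
# Print the PIDs currently in state D, one per line.
d_state_pids() {
    proc_root="${1:-/proc}"
    for status in "$proc_root"/[0-9]*/status; do
        [ -r "$status" ] || continue
        if grep -q '^State:[[:space:]]*D[[:space:]]' "$status" 2>/dev/null; then
            dir=${status%/status}      # strip the trailing "/status"
            echo "${dir##*/}"          # keep only the PID component
        fi
    done
}

# Print the PIDs that are in state D in two samples taken
# "interval" seconds apart (interval defaults to 5).
persistent_d() {
    interval="${1:-5}"
    proc_root="${2:-/proc}"
    first=$(d_state_pids "$proc_root")
    sleep "$interval"
    second=$(d_state_pids "$proc_root")
    for pid in $first; do
        # Keep only PIDs that are present in both samples.
        printf '%s\n' "$second" | grep -qx "$pid" && echo "$pid"
    done
    return 0
}
```

For example, `persistent_d 10` samples the live /proc twice, ten seconds apart.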
Randall R Schulz wrote:
> ...
> When that doesn't happen and a process is seen to persist in a D sleep, it's a de facto sign of a problem. D sleeps, and all sleeps at an internal priority less than 0, cannot be interrupted by signals.

What I've wondered about for a long time is why a method to nuke such tasks can't be found. There must be a reason, or it would have been done a long time ago. A reboot is drastic and disruptive. If such an occurrence happened on a mainframe running z/OS, there'd be hell to pay when users found themselves without service for the hour and then some it takes to completely feed such a greedy beast, and it ain't nice to be at the inevitable inquest, not nice for anyone.

Regards
Sid.
--
Sid Boyce ... Hamradio License G3VBV, licensed Private Pilot
Retired IBM/Amdahl Mainframes and Sun/Fujitsu Servers Tech Support Specialist
Microsoft Windows Free Zone - Linux used for all Computing Tasks
Sid,

On Tuesday 18 October 2005 13:07, Sid Boyce wrote:
> Randall R Schulz wrote:
> > ...
> > D sleeps, and all sleeps at an internal priority less than 0, cannot be interrupted by signals.
> > ...
>
> What I've wondered about for a long time is why a method to nuke such tasks can't be found. There must be a reason, or it would have been done a long time ago.

Of course a method could be found. It simply must not be allowed.

If you allow signals to interrupt negative-priority sleeps, the integrity of the internal data structures they protect cannot be guaranteed. In practice, what you'd likely see if you did is spreading rot in the kernel data structures.

The only solution is to correct the software errors that are positively indicated by a persistent D sleep.

Randall Schulz
Randall R Schulz wrote:
> ...
> If you allow signals to interrupt negative-priority sleeps, the integrity of the internal data structures they protect cannot be guaranteed.
> ...
> The only solution is to correct the software errors that are positively indicated by a persistent D sleep.

That makes sense. I knew there must be some such solid reasoning to do with system integrity; I just couldn't figure it through.

Regards
Sid.
On 10/18/05, Rejaine Monteiro wrote:
> Hi,
>
> About my problem with nfsserver (see the message "nfsserver hangs randomly" on this list): I see this state in /proc for all of the nfsd processes:
>
> # cat /proc/proc_number/status | grep State
> State: D (disk sleep)
>
> What does it mean?

[snipped lots of lines]

The "disk sleep" state is better known as "uninterruptible sleep" (the D stands for "disk sleep", as /proc shows, not for "deadlock", although a process stuck in it can look deadlocked). It occurs when a process waits inside the kernel for an operation such as I/O or an ioctl to finish. Processes that stay in this state are usually hung because they are waiting for a kernel resource (a semaphore, for instance) that never becomes available. In that case it is a permanent error, and you can only recover by rebooting, as you have already learned.

In conjunction with NFS, it is known to happen to clients that try to mount a share even though the portmapper is not running. Sorry, I can't tell what is happening with your NFS server.

\Steve
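As a follow-up to the portmapper remark, one thing that can be checked while the server is still responsive is whether the usual NFS-related RPC programs are registered with the portmapper. The sketch below reads `rpcinfo -p` output on standard input and prints each expected program number that does not appear. The function name `missing_rpc_programs` and the choice of programs are assumptions for illustration; the numbers are the standard registrations, and 100021 (nlockmgr) is the program named in the rpc.statd log line from the original report. This does not claim to identify the cause of the hang.

```shell
#!/bin/sh
# Read "rpcinfo -p" output on stdin and print "NUMBER NAME" for each
# expected RPC program that is not registered.
missing_rpc_programs() {
    awk '
        BEGIN {
            want[100000] = "portmapper"
            want[100003] = "nfs"
            want[100005] = "mountd"
            want[100021] = "nlockmgr"
            want[100024] = "status"
        }
        # The first column of rpcinfo -p is the program number.
        $1 in want { seen[$1] = 1 }
        END {
            for (p in want)
                if (!(p in seen))
                    print p, want[p]
        }' | sort -n
}
```

A possible invocation on the server from the thread would be `rpcinfo -p saobkp | missing_rpc_programs`; empty output means all five programs were listed.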
participants (4)
- Randall R Schulz
- Rejaine Monteiro
- Sid Boyce
- Steve Graegert