[opensuse] nfsv4 causes system to become unstable under heavy I/O ?
Probably a bit of a long shot, but never mind - is anyone using nfsv4?

I have a test system with root on NFSv4. I have been running CPU & I/O stress tests for about a week now, and they have all ended in a hung system after 8-16 hours (hard to say exactly when, usually in the middle of the night). The I/O stress tests target locally attached disk, not the NFS root. The symptoms are those of a slow death, but so far they have all ended in a complete hang, with the system not even responding to Ctrl-Alt-SysRq. This sounds to me like the NFS root not responding.

No interesting console output, and I've had a serial console attached for days. I was about to swap the disk controllers when I thought of trying NFSv3. This works much better ... the system has been chugging along for almost 24 hours now with no indications of any problems.

Does this sound familiar to anyone?

--
Per Jessen, Zürich (1.8°C)
http://www.dns24.ch/ - your free DNS host, made in Switzerland.

-- To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org To contact the owner, e-mail: opensuse+owner@opensuse.org
On 12/10/2014 08:57 AM, Per Jessen wrote:
> Does this sound familiar to anyone?
It should be easier to replicate the problem on a machine that does not have an NFS root. Put the system on a local disk, run your stress test, and access the NFSv4-mounted filesystem at the same time. That will probably leave you more post-mortem information in the logs.
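Florian's suggestion could be scripted roughly as follows. This is a minimal sketch, not something posted in the thread: it assumes the NFSv4 export is mounted at a hypothetical `/mnt/nfs4test` (for example with `mount -t nfs4 server:/export /mnt/nfs4test`, where `server:/export` is a placeholder), and the function name `nfs_probe`, the scratch-file name and the log format are all made up for illustration. The log itself should live on local disk so it survives a stalled mount.

```shell
#!/bin/sh
# Sketch only: periodically write, read back and delete a small file on an
# NFS mount, logging how long each round trip took.
# Usage: nfs_probe MOUNTPOINT LOGFILE INTERVAL [COUNT]
#   COUNT=0 (the default) means loop forever.
# LOGFILE is assumed to be on LOCAL disk, not on the NFS mount being probed.
nfs_probe() {
    mnt=$1; log=$2; interval=$3; count=${4:-0}; i=0
    probe_file="$mnt/.nfs_probe.$$"     # hypothetical scratch file name
    while [ "$count" -eq 0 ] || [ "$i" -lt "$count" ]; do
        start=$(date +%s)
        # Each step goes through the NFS client; closing the written file
        # flushes it to the server, and rm is a server operation.
        if echo "$start" > "$probe_file" &&
           cat "$probe_file" > /dev/null &&
           rm -f "$probe_file"; then
            status=ok
        else
            status=FAIL
        fi
        end=$(date +%s)
        # One line per round trip: timestamp, result, elapsed seconds.
        echo "$(date '+%F %T') $status $((end - start))s" >> "$log"
        i=$((i + 1))
        sleep "$interval"
    done
}
```

Started in the background next to the stress test, e.g. `nfs_probe /mnt/nfs4test /var/log/nfs_probe.log 10 &`, it appends a line every ten seconds or so. Round-trip times that creep up, a FAIL, or a log that simply stops would all roughly timestamp when the NFSv4 mount went bad; a probe blocked on a hard mount just waits, so a gap in the log is a more likely symptom than a FAIL line.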
Florian Gleixner wrote:
> It should be easier to replicate the problem on a machine that does not have an NFS root. Put the system on a local disk, run your stress test, and access the NFSv4-mounted filesystem at the same time. That will probably leave you more post-mortem information in the logs.
Thanks, that's a good idea. I'll have a look at that tomorrow.

--
Per Jessen, Zürich (3.8°C)
http://www.hostsuisse.com/ - virtual servers, made in Switzerland.
In the past, when I've had problems with a server locking up like this, I'll leave it logged in on multiple terminal screens, with a different command running on each:

    top
    tail -f /var/log/messages
    watch 'netstat -aln | grep tcp | wc -l'
    watch 'lsof | wc -l'

and leave it on the "top" screen. When the server is kind, it'll at least let me switch between console screens to see what was going on at the time. And if worst comes to worst and it totally freezes, I can at least see what it was doing at the moment of the freeze.

Chris
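Chris's screens can also be captured into a log file, which helps when the console itself is gone after a hang. A rough sketch, assuming the log is written to a locally attached disk (on Per's machine, anything under the NFS root would defeat the purpose); `sample_once`, `hangwatch` and any log path are hypothetical names, and `top -b -n 1` is simply batch-mode top run once.

```shell
#!/bin/sh
# Sketch only: snapshot roughly what Chris watched on his consoles
# (top, /var/log/messages, TCP socket count, open-file count).

# sample_once: print one timestamped snapshot to stdout.
sample_once() {
    printf '=== %s\n' "$(date '+%F %T')"
    top -b -n 1 2>/dev/null | head -n 15        # batch-mode top, one iteration
    tail -n 5 /var/log/messages 2>/dev/null     # last few syslog lines
    printf 'tcp sockets: %s\n' "$(netstat -aln 2>/dev/null | grep -c tcp)"
    printf 'open files:  %s\n' "$(lsof 2>/dev/null | wc -l)"
}

# hangwatch LOGFILE INTERVAL [COUNT]: append a snapshot every INTERVAL
# seconds; COUNT=0 (the default) means run until the machine dies.
hangwatch() {
    log=$1; interval=$2; count=${3:-0}; i=0
    while [ "$count" -eq 0 ] || [ "$i" -lt "$count" ]; do
        sample_once >> "$log"
        # Push the snapshot out of the page cache; without this the last
        # entries can be lost in a hard hang. sync flushes every filesystem,
        # so it may itself stall once the NFS root stops responding.
        sync
        i=$((i + 1))
        sleep "$interval"
    done
}
```

For example, `hangwatch /mnt/localdisk/hangwatch.log 60 &` (hypothetical path) before starting the stress test; after the reboot, the last `===` block shows roughly what the system looked like shortly before it died.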
On 12/10/2014 02:57 AM, Per Jessen wrote:
> No interesting console output, I've had a serial console attached for days. I was about to swap the disk controllers, when I thought of maybe trying NFSv3. This works much better ... the system has been chugging along for almost 24 hours now, no indications of any problems.
It might have something to do with one running over UDP and the other over TCP. UDP is non-blocking. There may also be issues with kernel memory buffer consumption and locking. It's been a long while since I looked at the kernel network code, so I'm guessing based on principles and the types of bugs that do occur.
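Whether the two setups really differ in transport is easy to check: on Linux, `/proc/mounts` lists the options each NFS mount actually ended up with, including `vers=` and `proto=` (`nfsstat -m` shows the same information). A small sketch that pulls those two out; the function name `nfs_transports` is hypothetical, and it matches `proto=` exactly so the separate `mountproto=` option shown for NFSv3 mounts is not picked up by mistake.

```shell
#!/bin/sh
# Sketch only: print mountpoint, NFS version and transport for every NFS
# mount. Reads /proc/mounts by default; pass a saved copy to inspect later.
nfs_transports() {
    mounts_file=${1:-/proc/mounts}
    awk '
        # Field 3 is the filesystem type: "nfs" (v2/v3) or "nfs4".
        $3 == "nfs" || $3 == "nfs4" {
            vers = "?"; proto = "?"
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++) {
                # Anchored matches, so "mountproto=" does not count as "proto=".
                if (opts[i] ~ /^vers=/)  vers  = substr(opts[i], 6)
                if (opts[i] ~ /^proto=/) proto = substr(opts[i], 7)
            }
            print $2, "vers=" vers, "proto=" proto
        }
    ' "$mounts_file"
}
```

Running `nfs_transports` once booted with the NFSv4 root and once with the NFSv3 root would show whether UDP is involved at all. NFSv4 is specified to run over a congestion-controlled transport such as TCP, so the hypothesis only holds if the NFSv3 root was mounted with `proto=udp`.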
participants (4)
- Anton Aylward
- Christopher Myers
- Florian Gleixner
- Per Jessen