[Bug 1189508] New: NFS server timeouts with kernel 5.13.4-1
http://bugzilla.opensuse.org/show_bug.cgi?id=1189508

            Bug ID: 1189508
           Summary: NFS server timeouts with kernel 5.13.4-1
    Classification: openSUSE
           Product: openSUSE Tumbleweed
           Version: Current
          Hardware: Other
                OS: openSUSE Tumbleweed
            Status: NEW
          Severity: Major
          Priority: P5 - None
         Component: Kernel
          Assignee: kernel-bugs@opensuse.org
          Reporter: lopa@mailbox.org
        QA Contact: qa-bugs@suse.de
          Found By: ---
           Blocker: ---

Setup: NFS server with either Leap 15.2 or Tumbleweed and a Linux client
mounting a directory from the server. It makes no difference whether NFSv3
or NFSv4 is used.

1. With openSUSE Leap 15.2 on the NFS server machine

Server:
# uname -a
Linux orion 5.3.18-lp152.78-default #1 SMP Tue Jun 1 14:53:21 UTC 2021 (556d823) x86_64 x86_64 x86_64 GNU/Linux

Client:
# mount
orion:/video on /var/spool/video.orion type nfs4 (rw,relatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,soft,proto=tcp,port=0,timeo=100,retrans=2,sec=sys,clientaddr=192.168.1.21,local_lock=none,addr=192.168.1.17)
# cd /var/spool/video.orion/dir
# dd if=00001.ts of=/dev/null bs=1M
2000+1 records in
2000+1 records out
2097399440 bytes (2.1 GB, 2.0 GiB) copied, 17.9036 s, 117 MB/s

No issue, the file transfer succeeds.

2. With openSUSE Tumbleweed on the NFS server machine

Server:
# uname -a
Linux orion 5.13.4-1-default #1 SMP Thu Jul 22 15:55:06 UTC 2021 (91a0cca) x86_64 x86_64 x86_64 GNU/Linux

Syslog:
Aug 12 23:46:52 orion kernel: [52908.001097] rpc-srv/tcp: nfsd: sent 102156 when sending 581760 bytes - shutting down socket

Client:
# mount
orion:/video on /var/spool/video.orion type nfs4 (rw,relatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,soft,proto=tcp,port=0,timeo=100,retrans=2,sec=sys,clientaddr=192.168.1.21,local_lock=none,addr=192.168.1.17)
# cd /var/spool/video.orion/dir
# dd if=00001.ts of=/dev/null bs=1M
dd: error reading '00001.ts': Input/output error
98+0 records in
98+0 records out
102760448 bytes (103 MB, 98 MiB) copied, 123.445 s, 832 kB/s

Syslog:
Aug 10 22:01:33 vdr kernel: [17893.504201] nfs: server orion not responding, timed out
Aug 10 22:01:36 vdr vdr: [1348] ERROR (tools.c,423): /var/spool/video: Eingabe-/Ausgabefehler
Aug 10 22:01:36 vdr kernel: [17896.128046] nfs: server orion not responding, timed out
Aug 10 22:02:02 vdr kernel: [17922.980069] nfs: server orion not responding, timed out

The errors don't occur immediately; transfers work for some time before the
NFS timeouts start. I've seen the rpc-srv/tcp error message in the syslog of
the NFS server only once. Normally there are only the NFS timeouts on the
client and no error messages on the server.

The issue looks very similar to
https://bugzilla.kernel.org/show_bug.cgi?id=213887 .

--
You are receiving this mail because:
You are on the CC list for the bug.
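Because the report says the timeouts only appear after some time, a small loop around the dd command above can help to reproduce them unattended. The following is only a sketch under assumptions: the function name read_until_error and the pass limit are made up for illustration and are not part of the original report.

```shell
#!/bin/sh
# read_until_error FILE [MAX_PASSES]
# Hypothetical helper (not from the bug report): reads FILE repeatedly
# with the same dd invocation as in the report and stops at the first
# failed read. Prints the number of completed passes on stdout and
# returns 1 if a read failed, 0 if all passes succeeded.
read_until_error() {
    file=$1
    max=${2:-100}          # assumed default pass limit
    pass=0
    while [ "$pass" -lt "$max" ]; do
        if ! dd if="$file" of=/dev/null bs=1M 2>/dev/null; then
            echo "read failed after $pass complete passes" >&2
            echo "$pass"
            return 1
        fi
        pass=$((pass + 1))
    done
    echo "$pass"
}

# Example call on the client, using the path from the report:
#   read_until_error /var/spool/video.orion/dir/00001.ts 50
```

Repeated reads of the same file may be served from the client's page cache instead of the server, so a file larger than the client's memory, or a set of different files, is likely to exercise the NFS connection more reliably.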
http://bugzilla.opensuse.org/show_bug.cgi?id=1189508

http://bugzilla.opensuse.org/show_bug.cgi?id=1189508#c5

--- Comment #5 from Lothar Paltins <lopa@mailbox.org> ---
(In reply to Neil Brown from comment #4)

OK, I've tried to switch off all offload functions. These messages appeared
for some of them:

# ethtool --offload enp7s0 tx-gso-partial off
Actual changes:
tx-gre-segmentation: off [not requested]
tx-gre-csum-segmentation: off [not requested]
tx-ipxip4-segmentation: off [not requested]
tx-ipxip6-segmentation: off [not requested]
tx-udp_tnl-segmentation: off [not requested]
tx-udp_tnl-csum-segmentation: off [not requested]
tx-gso-partial: off

# ethtool --offload enp7s0 tx-vlan-offload off
Actual changes:
tx-vlan-hw-insert: on [requested off]
Could not change any device features

# ethtool --offload enp7s0 rx-vlan-offload off
Actual changes:
tx-vlan-hw-insert: off [not requested]
rx-vlan-hw-parse: off

Now "ethtool --show-offload enp7s0 | grep -v fixed" shows everything as off.

Unfortunately, this doesn't change the behavior. The transfer speed is
almost the same as before, and the NFS timeouts occurred again after reading
about 20 GB from the server.
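For anyone repeating this test, switching the features off one by one can be scripted. The sketch below is an assumption about how that might be done, not the commands actually used in the comment above; the function names list_active_offloads and disable_active_offloads are hypothetical. It filters the "ethtool --show-offload" output for features that are on and not marked [fixed], then turns each one off separately so that a single rejected feature does not stop the rest.

```shell
#!/bin/sh
# list_active_offloads
# Hypothetical helper: reads "ethtool --show-offload DEV" output on stdin
# and prints the name of every feature that is "on" and not "[fixed]".
list_active_offloads() {
    awk -F: '
        $2 ~ /^[ \t]*on([ \t]|$)/ && $0 !~ /\[fixed\]/ {
            gsub(/^[ \t]+/, "", $1)   # strip the indentation of sub-features
            print $1
        }'
}

# disable_active_offloads DEV
# Hypothetical helper: tries to switch off each active feature in turn.
# Some names may be rejected by the driver, as seen in the comment above.
disable_active_offloads() {
    dev=$1
    ethtool --show-offload "$dev" | list_active_offloads |
    while read -r feature; do
        ethtool --offload "$dev" "$feature" off ||
            echo "could not disable $feature on $dev" >&2
    done
}

# Example call as root, using the interface name from the comment:
#   disable_active_offloads enp7s0
```

As the output in the comment shows, disabling one feature can change others as a side effect, so it is worth checking the result with "ethtool --show-offload" afterwards.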
http://bugzilla.opensuse.org/show_bug.cgi?id=1189508

http://bugzilla.opensuse.org/show_bug.cgi?id=1189508#c6

--- Comment #6 from Neil Brown <nfbrown@suse.com> ---
> But unfortunately, this doesn't change the behavior.

Thanks for testing. Even negative results can be useful.
I'll keep exploring possibilities and let you know when I find something else
worth testing.