[Bug 1217488] New: pynfs CID* tests fails: OP_SETCLIENTID should return NFS4_OK, instead got NFS4ERR_DELAY
https://bugzilla.suse.com/show_bug.cgi?id=1217488

            Bug ID: 1217488
           Summary: pynfs CID* tests fails: OP_SETCLIENTID should return NFS4_OK, instead got NFS4ERR_DELAY
    Classification: openSUSE
           Product: openSUSE Tumbleweed
           Version: Current
          Hardware: Other
                OS: Other
            Status: NEW
          Severity: Normal
          Priority: P5 - None
         Component: Kernel:Filesystems
          Assignee: nfbrown@suse.com
          Reporter: petr.vorel@suse.com
        QA Contact: petr.vorel@suse.com
  Target Milestone: ---
          Found By: ---
           Blocker: ---

Because of #1217128 we started testing with cdmackay/pynfs.git, which has a fix for "nfs4lib.BadCompoundRes: operation OP_SETCLIENTID should return NFS4_OK, instead got NFS4ERR_DELAY" [1]. Even so, we still get 9 tests failing with this error on Tumbleweed.

Any idea what could be wrong now? Or I can ask Calum Mackay on linux-nfs if you're busy with more important stuff.

[1] https://git.linux-nfs.org/?p=cdmackay/pynfs.git;a=commit;h=0d4d3fd0bb7a63860...

--
You are receiving this mail because:
You are on the CC list for the bug.
https://bugzilla.suse.com/show_bug.cgi?id=1217488

Petr Vorel <petr.vorel@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |petr.vorel@suse.com,
                   |                            |yosun@suse.com
           See Also|                            |https://bugzilla.suse.com/show_bug.cgi?id=1217128
https://bugzilla.suse.com/show_bug.cgi?id=1217488

Petr Vorel <petr.vorel@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P5 - None                   |P4 - Low
https://bugzilla.suse.com/show_bug.cgi?id=1217488
https://bugzilla.suse.com/show_bug.cgi?id=1217488#c2

Neil Brown <nfbrown@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |IN_PROGRESS

--- Comment #2 from Neil Brown <nfbrown@suse.com> ---
pynfs only waits for 10 seconds for the DELAY error to go away. I guess that isn't long enough.

I think that failing the OP_SETCLIENTID just because there are already lots of clients is a bad choice. Certainly fail if there is a real shortage of memory, but not otherwise. Certainly look for idle clients to clean up, but don't fail.

I'll post a patch upstream and see what they think.
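The client-side behaviour described in comment #2 (retry while the server answers NFS4ERR_DELAY, give up after a fixed wait) can be sketched roughly as below. This is not the pynfs code: `retry_on_delay`, `send_op`, `timeout` and `interval` are hypothetical names chosen for illustration, and only the 10 second default comes from the comment. The status values are the ones defined by the NFSv4 specification.

```python
import time

NFS4_OK = 0
NFS4ERR_DELAY = 10008  # numeric value from the NFSv4 specification


def retry_on_delay(send_op, timeout=10.0, interval=1.0,
                   clock=time.monotonic, sleep=time.sleep):
    """Call send_op() until it stops returning NFS4ERR_DELAY or time runs out.

    send_op is a hypothetical callable that sends one operation and
    returns its NFSv4 status code. The last status seen is returned, so
    a caller still sees NFS4ERR_DELAY when the server never recovers.
    """
    deadline = clock() + timeout
    while True:
        status = send_op()
        if status != NFS4ERR_DELAY:
            return status          # success or a different error
        if clock() >= deadline:
            return status          # waited long enough, report DELAY
        sleep(interval)
```

With such a loop, a server that keeps answering NFS4ERR_DELAY for longer than `timeout` makes the test fail even though nothing is wrong with the client, which matches the symptom reported here.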
https://bugzilla.suse.com/show_bug.cgi?id=1217488
https://bugzilla.suse.com/show_bug.cgi?id=1217488#c3

--- Comment #3 from Petr Vorel <petr.vorel@suse.com> ---
Neil's patch in ML:
https://lore.kernel.org/linux-nfs/171375175915.7600.6526208866216039031@nobl...

Thanks, Neil!
https://bugzilla.suse.com/show_bug.cgi?id=1217488
https://bugzilla.suse.com/show_bug.cgi?id=1217488#c4

--- Comment #4 from Petr Vorel <petr.vorel@suse.com> ---
Based on the upstream maintainer's comment about 1 GB not being enough [1], I tested with more RAM (QEMURAM=3600) and it solved the problem [2].

Let's see if Neil's v2 fix [3] is merged upstream or not.

[1] https://lore.kernel.org/linux-nfs/ZiZnbV+htcvGuGQl@tissot.1015granger.net/
[2] http://quasar.suse.cz/tests/3237
[3] https://lore.kernel.org/linux-nfs/171385732687.7600.2864936377155228614@nobl...
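One way to read the result in comment #4 is with a toy model in which the server refuses new clients once their number reaches a cap that grows with RAM. The sketch below only illustrates that reading. The per-GiB figure, the client count in the usage lines, the function names, and the assumption that QEMURAM is given in MiB are all made up for illustration and are not taken from the kernel or from this thread.

```python
NFS4_OK = 0
NFS4ERR_DELAY = 10008  # numeric value from the NFSv4 specification


def client_cap(ram_mib, clients_per_gib=1024):
    """Hypothetical cap on known clients, proportional to server RAM.

    clients_per_gib is an illustrative constant, not a documented value.
    The cap never drops below one GiB's worth of clients.
    """
    return max(clients_per_gib, ram_mib * clients_per_gib // 1024)


def setclientid_status(active_clients, ram_mib, clients_per_gib=1024):
    """Status a server using this toy policy would return for SETCLIENTID."""
    if active_clients >= client_cap(ram_mib, clients_per_gib):
        return NFS4ERR_DELAY   # too many clients for this much RAM
    return NFS4_OK


# Illustrative numbers only: 2000 leftover test clients on two VM sizes.
small_vm = setclientid_status(active_clients=2000, ram_mib=1024)
large_vm = setclientid_status(active_clients=2000, ram_mib=3600)
```

Under these assumptions the same client count is rejected on the 1 GiB guest and accepted on the larger one. The policy suggested in comment #2 would instead return NFS4_OK in both cases unless memory is actually short.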
https://bugzilla.suse.com/show_bug.cgi?id=1217488

Petr Vorel <petr.vorel@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P4 - Low                    |P5 - None
https://bugzilla.suse.com/show_bug.cgi?id=1217488
https://bugzilla.suse.com/show_bug.cgi?id=1217488#c5

Eric Bischoff <ebischoff@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ebischoff@suse.com

--- Comment #5 from Eric Bischoff <ebischoff@suse.com> ---
I think we saw it on SLES, after updating our NFS servers from SLES 15 SP5 to SLES 15 SP6.

From time to time, we saw this command

# mount -v -o defaults -t nfs ourserver:/srv/mirror /mirror

never return. tcpdump was showing:

    10.84.220.129.684 > 10.84.220.56.2049: Flags [P.], cksum 0xce58 (incorrect -> 0x4a73), seq 605:813, ack 285, win 501, options [nop,nop,TS val 964831646 ecr 678067752], length 208: NFS request xid 1672628359 204 getattr fh 0,2/43
10:25:30.671426 IP (tos 0x0, ttl 64, id 65389, offset 0, flags [DF], proto TCP (6), length 100)
    10.84.220.56.2049 > 10.84.220.129.684: Flags [P.], cksum 0xc17d (correct), seq 285:333, ack 813, win 504, options [nop,nop,TS val 678067753 ecr 964831646], length 48: NFS reply xid 1672628359 reply ok 44 getattr ERROR: Request couldn't be completed in time

Increasing the RAM on the NFS server from 2 GiB to 8 GiB seemed to solve the issue.

Version on server: nfs-kernel-server 150600.26.2
Version on client: nfs-client 2.1.1-150100.10.37.1
https://bugzilla.suse.com/show_bug.cgi?id=1217488
https://bugzilla.suse.com/show_bug.cgi?id=1217488#c6

--- Comment #6 from Petr Vorel <petr.vorel@suse.com> ---
Interesting. I would hope the kernel would explicitly warn in dmesg about connections dropped due to lack of RAM. I hoped that Neil's effort would bring that.

While there were some issues with the client, I believe Neil's "support automatic changes to nfsd thread count" patchset [1], specifically commits [2] and [3], handles (among others) this issue.

[1] https://lore.kernel.org/linux-nfs/20240715074657.18174-1-neilb@suse.de/
[2] https://lore.kernel.org/linux-nfs/20240715074657.18174-10-neilb@suse.de/
[3] https://lore.kernel.org/linux-nfs/20240715074657.18174-11-neilb@suse.de/