[Bug 986395] New: NFS client is using temp IPv6 address for mount
http://bugzilla.suse.com/show_bug.cgi?id=986395

Bug ID: 986395
Summary: NFS client is using temp IPv6 address for mount
Classification: openSUSE
Product: openSUSE Distribution
Version: Leap 42.2
Hardware: Other
OS: Other
Status: NEW
Severity: Normal
Priority: P5 - None
Component: Basesystem
Assignee: bnc-team-screening@forge.provo.novell.com
Reporter: christian.deckelmann@microfocus.com
QA Contact: qa-bugs@suse.de
CC: jbohac@suse.com, mt@suse.com
Found By: ---
Blocker: ---

Hi,

the NFS client is using the privacy IPv6 address to mount from a NFS server. This breaks the mount when the privacy IPs are renewed periodically.

Solution could be to implement RFC 5014 in the NFS client application. That would allow the client to not use the privacy IP as the source for an NFS mount. The same could be implemented for other clients that rely on stable connections for a long time.

Thanks,
Christian

--
You are receiving this mail because:
You are on the CC list for the bug.
http://bugzilla.suse.com/show_bug.cgi?id=986395

Chenzi Cao <chcao@suse.com> changed:

What     |Removed                                   |Added
----------------------------------------------------------------------------
Assignee |bnc-team-screening@forge.provo.novell.com |nfbrown@suse.com
http://bugzilla.suse.com/show_bug.cgi?id=986395#c1

Neil Brown <nfbrown@suse.com> changed:

What   |Removed |Added
----------------------------------------------------------------------------
Status |NEW     |CONFIRMED

--- Comment #1 from Neil Brown <nfbrown@suse.com> ---
I agree this is a bug, though it might have to turn into a feature request. I'll have a proper look in the next couple of weeks and see if there is an easy solution or not.
http://bugzilla.suse.com/show_bug.cgi?id=986395#c2

--- Comment #2 from Marius Tomaschewski <mt@suse.com> ---
(In reply to Christian Deckelmann from comment #0)
> Hi,
>
> the NFS client is using the privacy IPv6 address to mount from a NFS server.
> This breaks the mount when the privacy IPs are renewed periodically
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

They're _not_ renewed, but _deleted_. Only "normal" non-temporary dhcp6 and auto6 [$prefix:eui64_from_mac] addresses are renewed.

When the preferred_lft reaches 0, a new temp ($prefix:$random) address is added to the interface (for new connections) and the old one gets a deprecated flag. Once the valid_lft reaches 0, the address is deleted.

> Solution could be to implement RFC 5014 in the NFS client application.

Yes, there is also an example showing the (socket and) getaddrinfo flags to set in order to filter out or to prefer (as in the example) temp addresses (e.g. in a web browser) in source address selection.
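Editorial sketch of the RFC 5014 socket option discussed above, assuming a Linux host. The numeric constants are copied from Linux's `<linux/in6.h>` and are defined by hand in case Python's `socket` module does not export them; the helper names `prefer_public_source` and `connect_with_public_source` are hypothetical and are not taken from nfs-utils or from the patch in this bug.

```python
import socket

# Assumption: Linux values from <linux/in6.h>, spelled out explicitly here.
IPV6_ADDR_PREFERENCES = 72
IPV6_PREFER_SRC_TMP = 0x0001      # prefer temporary (privacy) source addresses
IPV6_PREFER_SRC_PUBLIC = 0x0002   # prefer public (non-temporary) source addresses


def prefer_public_source(sock: socket.socket) -> bool:
    """Ask the kernel to prefer public over temporary source addresses.

    Returns True if the option was accepted, False if the platform rejected it.
    """
    try:
        sock.setsockopt(socket.IPPROTO_IPV6, IPV6_ADDR_PREFERENCES,
                        IPV6_PREFER_SRC_PUBLIC)
    except OSError:
        return False
    return True


def connect_with_public_source(host: str, port: int) -> socket.socket:
    """Hypothetical helper: open a TCP/IPv6 connection with the preference set."""
    sock = socket.socket(socket.AF_INET6, socket.SOCK_STREAM)
    # The preference has to be set before connect(), because the source
    # address is selected when the connection is established.
    prefer_public_source(sock)
    sock.connect((host, port))
    return sock
```

The option only affects the socket it is set on, so other applications on the same host keep the system-wide default.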
http://bugzilla.suse.com/show_bug.cgi?id=986395#c3

--- Comment #3 from Neil Brown <nfbrown@suse.com> ---
I've looked into this a bit more.

I think that using a temporary address to connect to the server is a valid thing to do. As Marius clarified, that address will remain valid as long as the connection to the server persists. If the client needs to disconnect and reconnect it should get a new client address.

However for NFSv4.0 the client needs to tell the server how the server can call back to the client for various state management functions. If a temporary address is provided for that, the server may find that it cannot contact the client.

This is only needed for NFSv4.0. NFSv4.1 and later use the connection from client to server as a back-channel so the server never contacts the client directly.

So: I think that if the client uses NFSv4.1, the temporary address should not cause a problem. So mount with "-o vers=4.1", assuming the server supports that.

If v4.0 must be used, then you can specify the preferred client address with the "clientaddr" mount option, i.e.
   "mount -t nfs -o clientaddr=xx:xx:xx::xx ....."

If no clientaddr option is given, mount.nfs will choose one itself. It would be best if mount.nfs chose a non-temporary address and I will look into that.

However I'd like to be sure that I'm chasing the right problem. So please:

1/ explain exactly why you think it currently doesn't work - what symptoms are there?
2/ If possible, test with "-o vers=4.1" and see if the symptoms persist.
3/ Also test with "-o clientaddr=......" and see if the symptoms persist.

Thanks.
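Editorial sketch of the two test mounts requested above, written as a small Python helper that only assembles the command line. The function name `build_nfs_mount_argv`, the export `server.example.com:/export`, the mount point `/mnt` and the address `2001:db8::10` are all placeholders chosen for illustration; only the `vers` and `clientaddr` options come from the comment.

```python
from typing import List, Optional


def build_nfs_mount_argv(source: str, mountpoint: str,
                         vers: Optional[str] = None,
                         clientaddr: Optional[str] = None) -> List[str]:
    """Assemble a 'mount -t nfs' command line with the options under test."""
    options = []
    if vers is not None:
        options.append(f"vers={vers}")
    if clientaddr is not None:
        options.append(f"clientaddr={clientaddr}")

    argv = ["mount", "-t", "nfs"]
    if options:
        argv += ["-o", ",".join(options)]
    argv += [source, mountpoint]
    return argv


# Test 2/: force NFSv4.1 so no callback address is needed.
print(" ".join(build_nfs_mount_argv("server.example.com:/export", "/mnt",
                                    vers="4.1")))
# Test 3/: keep v4.0 but name a stable callback address explicitly.
print(" ".join(build_nfs_mount_argv("server.example.com:/export", "/mnt",
                                    vers="4.0", clientaddr="2001:db8::10")))
```

The resulting list could be passed to `subprocess.run` by someone with root privileges; the sketch itself does not mount anything.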
http://bugzilla.suse.com/show_bug.cgi?id=986395#c4

--- Comment #4 from Neil Brown <nfbrown@suse.com> ---
Created attachment 684009
  --> http://bugzilla.suse.com/attachment.cgi?id=684009&action=edit
Patch to use public IPv6 for callback.

This patch ensures that the default "clientaddr" is a public address. I'll post this upstream tomorrow to see what others think.

I think this will fix the only problem with IPv6 temp addresses, but I'm open to being educated in this matter.
http://bugzilla.suse.com/show_bug.cgi?id=986395#c11

--- Comment #11 from Marius Tomaschewski <mt@suse.com> ---
(In reply to Neil Brown from comment #9)
> But how often? RFC4941 doesn't list a minimum for TEMP_VALID_LIFETIME or
> TEMP_PREFERRED_LIFETIME, just recommended defaults of 1 week and 1 day.
> Cycling every 6 hours would probably be safe

IMO, it would not be safe at all. The defaults are veeeery long (and the defaults are similar to e.g. IEEE defaults for bridge STP causing up to 50s delay before forwarding packets, or sleep(random(1..10s)) before dhcp4 starts to do anything). In practice, it is probably more common to use e.g. a valid lifetime of 1 hour.

I'd not make any assumption about the times, but read the lifetime from the address.

What would probably be doable is to
a) prefer to use non-temporary addresses and (if there is none [a clear corner case])
b) start to switch over to a new temporary address when the preferred lft reaches 0 (the old one is still usable for existing connections until the valid lft goes to 0).

The temporary / privacy addresses are assigned in addition to the non-temporary ones. IMO there is currently no autoconf option in the kernel to assign a privacy address only, that is, there is basically always a non-temporary / renewable [e.g. MAC based EUI64] address.

When the user is not using autoconf, but only dhcp6, and assigns only a temp addr, you can IMO assume he knows what he is doing and does not want to use nfs/persistent connections at all, or he misconfigured the box.
http://bugzilla.suse.com/show_bug.cgi?id=986395#c12

--- Comment #12 from Neil Brown <nfbrown@suse.com> ---
> ... but read the lifetime from the address.

As far as I can tell there is no interface to do this. With a bit of effort I could probably get a list of interfaces, then a list of addresses for each interface together with their lifetimes. Then compare the address I got when I bound my socket to each of these to deduce the lifetime of the address. But that is awfully clumsy.

Always requesting a public address is certainly possible but seems to go explicitly against the configuration choice to use temporary private addresses for outgoing connections.

I'm currently tempted to not change anything. Once the local address does expire the client will stop getting replies from the server, will time out, and will reconnect. It should then get a new address. This should only take about 2 minutes, and should only happen if there has been constant traffic, with no gap of 5 minutes, for the entire lifetime of the temporary address.

I need to test that this is what actually happens. If it is, I would need a very strong argument for making any change given how clumsy such a change would have to be.
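Editorial sketch of the "list the addresses, then match the bound one" approach described above. It does not use any kernel interface directly; it assumes the text output of `ip -6 addr` is available as a string and that it follows the usual `inet6 ... scope ... valid_lft ... preferred_lft ...` layout. The function names `parse_ip6_addr` and `lifetimes_for` are hypothetical.

```python
import ipaddress
import re
from typing import Dict, List, Optional

_ADDR_RE = re.compile(
    r"^\s*inet6\s+(?P<addr>[0-9a-fA-F:]+)/(?P<plen>\d+)\s+scope\s+(?P<scope>\S+)(?P<rest>.*)$")
_LFT_RE = re.compile(r"valid_lft\s+(?P<valid>\S+)\s+preferred_lft\s+(?P<pref>\S+)")


def _parse_lft(token: str) -> Optional[int]:
    """Convert '277sec' to 277; 'forever' becomes None."""
    if token == "forever":
        return None
    return int(token[:-3] if token.endswith("sec") else token)


def parse_ip6_addr(text: str) -> List[Dict]:
    """Parse 'ip -6 addr' output into one dict per inet6 address."""
    entries: List[Dict] = []
    current: Optional[Dict] = None
    for line in text.splitlines():
        m = _ADDR_RE.match(line)
        if m:
            # Flags such as 'temporary', 'dynamic', 'deprecated' follow the scope.
            flags = m.group("rest").split("valid_lft")[0].split()
            current = {"address": ipaddress.IPv6Address(m.group("addr")),
                       "scope": m.group("scope"), "flags": flags,
                       "valid_lft": None, "preferred_lft": None}
            entries.append(current)
        elif line.strip().startswith("inet "):
            current = None  # an IPv4 entry; its lifetimes are not ours
        lft = _LFT_RE.search(line)
        if lft and current is not None:
            current["valid_lft"] = _parse_lft(lft.group("valid"))
            current["preferred_lft"] = _parse_lft(lft.group("pref"))
    return entries


def lifetimes_for(text: str, local_addr: str) -> Optional[Dict]:
    """Return the entry matching the address a socket was bound to, if any."""
    wanted = ipaddress.IPv6Address(local_addr)
    for entry in parse_ip6_addr(text):
        if entry["address"] == wanted:
            return entry
    return None
```

In use, `local_addr` would come from `sock.getsockname()[0]` after the connection is made, and the text from running `ip -6 addr`. This illustrates why the approach was called clumsy: it needs a second data source and a lookup for every connection.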
http://bugzilla.suse.com/show_bug.cgi?id=986395#c13

--- Comment #13 from Neil Brown <nfbrown@suse.com> ---
I've done some experimenting. An NFSv4 TCP connection never goes idle for more than about 5 minutes as it sends periodic "RENEW" requests, so there is no chance of the clean change-over to the new address happening automatically.

I set up the temp address to be deprecated after 120 seconds and expired after 500 so I could test things a bit more easily.

When the address I used to mount from expired, the client started ignoring replies from the server as expected and this continued for 5.5 minutes without the connection being broken, which is longer than I expected. It should be 3 minutes I think.

But then the connection just started working again. The client started using the same temp IP address as before.

Is that expected? I'll keep exploring.
http://bugzilla.suse.com/show_bug.cgi?id=986395#c15

--- Comment #15 from Neil Brown <nfbrown@suse.com> ---
> You're sure you haven't just overlooked something?

No, I'm not. I haven't been able to reproduce it. Other things are weird though.

It seems fairly easy to trigger the

   IPv6: ipv6_create_tempaddr: regeneration time exceeded - disabled temporary address support

message, which seems like it should be nearly impossible on a network with half a dozen hosts.

And "ip -6 addr" reports e.g.

    inet6 2406:3400:c:15:7036:9572:6256:d431/64 scope global temporary dynamic
       valid_lft 277sec preferred_lft 14177sec

so the preferred_lft is larger than the valid_lft.

# grep . /proc/sys/net/ipv6/conf/eth0/temp_*
/proc/sys/net/ipv6/conf/eth0/temp_prefered_lft:120
/proc/sys/net/ipv6/conf/eth0/temp_valid_lft:500

so preferred_lft should be less than 120.

A couple of minutes later it says:

       valid_lft 155sec preferred_lft 0sec

so it jumped to 0 (and became 'deprecated') rather quickly. But now establishing an IPv6 connection doesn't create a new temp address, it just uses the permanent one.

When I do have an NFS connection from a temporary address, and the temp address becomes invalid, I've seen "time ls -l /mnt" take 14 minutes, though 3.5 is more common. I don't think the NFS client does ever disconnect itself. It just waits for the networking layer to break the connection.

Yes, I saw the thread on "research", thanks. There is definitely something wrong, and not having temp_addresses as the default would probably be the best fix.

I'm starting to lean towards just telling NFS to always request a public address...

OK, more weirdness.. My temp address became

   valid_lft 0sec preferred_lft 14100sec

so it shouldn't work any more, and "ls -l" blocked for 210 seconds. Now that same temporary address is working again. The 'valid_lft' is 228sec. It never became deprecated. It just stopped working for a while, then started again.

I think I want to say "temporary addresses are obviously buggy, they shouldn't be used.."
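Editorial sketch of reading the per-interface temporary-address settings that the `grep` above prints, assuming the Linux `/proc/sys/net/ipv6/conf/<iface>/` layout. The function name `read_temp_addr_settings` is hypothetical; `use_tempaddr` and `max_desync_factor` are sibling sysctls in the same directory and are included here as an assumption about what is useful to inspect.

```python
from pathlib import Path
from typing import Dict

# Kernel spelling is 'temp_prefered_lft' (single 'r'), as in the grep output.
TEMP_ADDR_SYSCTLS = ("use_tempaddr", "temp_prefered_lft",
                     "temp_valid_lft", "max_desync_factor")


def read_temp_addr_settings(iface: str,
                            base: str = "/proc/sys/net/ipv6/conf") -> Dict[str, int]:
    """Return the temporary-address sysctls for one interface as integers.

    Entries that are missing or unreadable are skipped rather than raising.
    """
    settings: Dict[str, int] = {}
    for name in TEMP_ADDR_SYSCTLS:
        path = Path(base) / iface / name
        try:
            settings[name] = int(path.read_text().strip())
        except (OSError, ValueError):
            continue
    return settings


if __name__ == "__main__":
    for key, value in read_temp_addr_settings("eth0").items():
        print(f"{key} = {value}")
```

Comparing these configured values with the lifetimes that `ip -6 addr` reports is one way to spot the mismatch described in the comment, where the reported preferred_lft was far above the configured 120 seconds.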
http://bugzilla.suse.com/show_bug.cgi?id=986395#c17

--- Comment #17 from Marius Tomaschewski <mt@suse.com> ---
(In reply to Neil Brown from comment #12)
> Always requesting a public address is certainly possible but seems to go
> explicitly against the configuration choice to use temporary private
> addresses for outgoing connections.

But it is the right way to go in the nfs client case IMO.

While RFC3484 defined that a public address is preferred over a temporary one, RFC 6724 changes the default to prefer temporary addresses (-> #section-5 and changes summary in appendix-B), but it is completely fine and _correct_ when the application does not use the defaults due to its requirements, e.g.:

https://tools.ietf.org/html/rfc6724#section-5

   Implementations MUST provide a mechanism allowing an application to
   reverse the sense of this preference and prefer public addresses over
   temporary addresses (e.g., via appropriate API extensions such as
   [RFC5014]). Use of the mechanism MUST only affect the selection rules
   for the invoking application.

Even the Privacy Extensions RFC 4941 considers this, e.g.

section-3.6:
   The use of temporary addresses may cause unexpected difficulties with
   some applications. [...]
   In addition, some applications may not behave robustly if temporary
   addresses are used and an address expires before the application has
   terminated, or if it opens multiple sessions, but expects them to all
   use the same addresses.

section-6:
   The determination as to whether to use public versus temporary
   addresses can in some cases only be made by an application.

(In reply to Neil Brown from comment #15)
> A couple of minutes later it says:
>
>        valid_lft 155sec preferred_lft 0sec
>
> so it jumped to 0 (and became 'deprecated') rather quickly.
...
> OK, more weirdness.. My temp address became
>    valid_lft 0sec preferred_lft 14100sec

An address with preferred_lft 0 is deprecated and should not be used for new connections; the kernel is even permitted to remove it _when_ there are no connections using it. If valid_lft reaches 0 the address is not valid any more, and preferred_lft > valid_lft is an invalid lifetime.

When something (e.g. a router RA) contains a valid_lft of 0, it is a request to remove things (e.g. a prefix that is not valid any more). Sending preferred_lft 0 is a kind of "scheduled removal".

Your "valid_lft 0sec preferred_lft 14100sec" could even be related to all this, but some cleanup code is not implemented properly.

> I think I want to say "temporary addresses are obviously buggy, they
> shouldn't be used.."

:-) There were many changes in this area in the 4.2/4.4 kernels.
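Editorial sketch of the lifetime rules stated in this thread, written as a small classifier. The function name `address_state` and the four labels are made up for illustration; `None` stands for a lifetime of "forever". The check order is an assumption: an expired valid lifetime wins over everything else.

```python
from typing import Optional


def address_state(preferred_lft: Optional[int], valid_lft: Optional[int]) -> str:
    """Classify an IPv6 address by its remaining lifetimes (seconds).

    'invalid'      valid_lft has reached 0, the address must not be used
    'inconsistent' preferred_lft exceeds valid_lft, which is not a legal pair
    'deprecated'   preferred_lft has reached 0, existing connections only
    'preferred'    usable as source for new connections
    """
    if valid_lft is not None and valid_lft <= 0:
        return "invalid"
    if valid_lft is not None and (preferred_lft is None or preferred_lft > valid_lft):
        return "inconsistent"
    if preferred_lft is not None and preferred_lft <= 0:
        return "deprecated"
    return "preferred"


# The three readings reported in comment #15:
for pref, valid in ((14177, 277), (0, 155), (14100, 0)):
    print(f"preferred_lft={pref} valid_lft={valid} -> {address_state(pref, valid)}")
```

Applied to the readings from comment #15, only the middle one (`valid_lft 155sec preferred_lft 0sec`) is a state the rules above allow for; the other two are the anomalies under discussion.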
http://bugzilla.suse.com/show_bug.cgi?id=986395#c18

--- Comment #18 from Neil Brown <nfbrown@suse.com> ---
Thanks, setting max_desync_factor to 0 does make it behave a bit better, particularly if I "ifdown" and "ifup" after all the settings are in place.

But after the first temporary address becomes deprecated, it doesn't want to create another one. It just uses the permanent address.

Very occasionally the deprecated address, after it has disappeared, really does come back. I was watching closely this time.

When the address stays gone (which it mostly does), the TCP connection seems to time out after about 4.5 minutes. NFS then auto-reconnects and keeps working, now using the permanent address.

Arg - it happened again. I'm trying to use "telnet" to an "nc" server to minimise complexity and watch what is happening, and twice now the temporary address has become deprecated and disappeared from the output of "ip -6 addr", at which point traffic stopped, and then miraculously re-appeared about 3 minutes later, and the TCP connection started working again.

Very hard to work out exactly what is happening when the symptoms keep changing...
http://bugzilla.suse.com/show_bug.cgi?id=986395

Ludwig Nussel <lnussel@suse.com> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC   |        |lnussel@suse.com
http://bugzilla.suse.com/show_bug.cgi?id=986395#c19

--- Comment #19 from Neil Brown <nfbrown@suse.com> ---
Created attachment 686928
  --> http://bugzilla.suse.com/attachment.cgi?id=686928&action=edit
Patch sent upsteeam

> But is the right way to use in nfs client case IMO.

I'm coming around to your point of view. I've sent the attached patch upstream. If there is no disagreement, I'll apply it to 12-SP2.
http://bugzilla.suse.com/show_bug.cgi?id=986395#c20

--- Comment #20 from Ludwig Nussel <lnussel@suse.com> ---
Meanwhile Rudi gave up and changed the default to use_tempaddr=1 ... So the problem gets hidden again. I'd suggest submitting the fix anyway.
http://bugzilla.suse.com/show_bug.cgi?id=986395#c21

Neil Brown <nfbrown@suse.com> changed:

What       |Removed   |Added
----------------------------------------------------------------------------
Status     |CONFIRMED |RESOLVED
Resolution |---       |FIXED

--- Comment #21 from Neil Brown <nfbrown@suse.com> ---
The patch was accepted upstream so I've submitted it to SLE12-SP2. It should flow from there to Leap 42.2, and from upstream into Tumbleweed and other releases.

So: closing as fixed.

Thanks.
http://bugzilla.suse.com/show_bug.cgi?id=986395

Swamp Workflow Management <swamp@suse.de> changed:

What       |Removed                   |Added
----------------------------------------------------------------------------
Whiteboard |obs:running:5537:moderate |
http://bugzilla.suse.com/show_bug.cgi?id=986395#c23

--- Comment #23 from Swamp Workflow Management <swamp@suse.de> ---
openSUSE-RU-2016:2200-1: An update that has 5 recommended fixes can now be installed.

Category: recommended (moderate)
Bug References: 985845,986108,986395,987035,989323
CVE References:
Sources used:
openSUSE Leap 42.1 (src): nfs-utils-1.3.0-20.1