On 16.06.2016 at 20:51, Nikolai Zhubr wrote:
Hello Marius,
On 16.06.2016 19:04, Marius Tomaschewski wrote:
On 16.06.2016 at 17:59, Marius Tomaschewski wrote:
On 16.06.2016 at 14:42, Nikolai Zhubr wrote: [...]
But LINK_REQUIRED=no as a default is definitely the wrong way to go.
But now you'll be able to set such a default for yourself:
Good to know; I'll definitely add it to my emergency reminders, although it still feels somewhat hacky...
Because it is :-) The short version of all this: LINK_REQUIRED=no is not possible as a default, because it breaks requested and approved features and reverts to obsolete RFCs [that do not require duplicate address detection on IPv4], which were written before many of today's protocols/mechanisms existed.
I appreciate your detailed explanation of why changing the default is unacceptable, and I tend to agree that your reasons are valid.
I am deliberately spending my time here explaining this, in the hope that it will reach more people, perhaps even somebody willing to help. Yes, we definitely have to improve the documentation; any help here would be very much appreciated, e.g. using:
  https://en.opensuse.org/Portal:Wicked
  https://github.com/openSUSE/wicked/wiki
  https://github.com/openSUSE/wicked/tree/master/doc[/FAQ.txt]
  [merge them somehow?]
But then the question is: how do you properly set up the most trivial IPv4-only configuration, with a static (fixed forever) IP address and some trivial TCP service (say, httpd) bound to this fixed IP address only?
As I see it, there are currently 2 choices:
1. Leave everything at the defaults. This works fine until one day the link happens to be temporarily down at boot time. The service will then be unable to bind to the address it expects and will fail, which is unwise, because we know our address is fixed forever anyway.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Really? :-)
Usually, you don't install 1000 routers + 1000 servers for 1
workstation in the network.
It is exactly the other way around: you install ~1..2 routers
and a few servers for 2000 workstations.
For the routers, you can disable e.g. duplicate address detection
and declare "anybody else using this address is broken, not me".
For some servers, you can do the same. But be prepared to manually
verify the address and fix any machine already using it before you
install a new one, in case you unintentionally made a typo in the IP address.
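The manual check mentioned above can be done with an ARP probe before a new machine gets its static IPv4 address. A minimal sketch, assuming the iputils `arping`, whose `-D` (duplicate address detection) mode exits 0 when no other host answers; the function name `ipv4_addr_is_free` and the `ARPING` override variable are hypothetical, the latter added only so the logic can be tried without a network:

```shell
#!/bin/sh
# Hypothetical helper: probe whether an IPv4 address is already in use on a
# link before configuring it statically. Assumes iputils arping, whose -D
# (duplicate address detection) mode exits 0 if no host replied.
# ARPING may be overridden (e.g. for testing); it defaults to "arping".
ipv4_addr_is_free() {
    dev=$1
    addr=$2
    # -q: quiet, -D: DAD mode, -c 2: send two probes, -w 3: give up after 3s
    if "${ARPING:-arping}" -q -D -c 2 -w 3 -I "$dev" "$addr"; then
        echo "free"
        return 0
    fi
    # A non-zero exit also covers errors such as a missing interface.
    echo "in use (or probe failed)"
    return 1
}
```

For example, `ipv4_addr_is_free eth0 192.0.2.10` (placeholder interface and address, run as root or with CAP_NET_RAW) prints `free` only if nobody answered the probes. A failed probe is deliberately reported together with "in use", since in both cases the address should not be configured blindly.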
For workstations using a static address configuration (as with any
dynamic address, DAD is simply mandatory there), you would be very
unhappy if the person actually following your advice had mixed up the
gateway or server address with the IP address, and your advice had
been to disable duplicate address detection / set LINK_REQUIRED=no.
It is even worse with MAC address duplicates (e.g. in some
virtualization setups): IPv6 covers this, IPv4 not really.
Let's go through duplicate address detection only, just because it
is the most trivial case to test/see; it is not the only one.
In IPv6 this is clearer/more visible. For IPv4, the kernel unfortunately
does not provide such functionality/handling, but the situation is quite
similar: there are also various L2 protocols which all start with/need
a carrier (e.g. bond LACP, bridge STP, link authentication) and have to
finish before the address can really be used. For IPv6 DAD, instead of
ARP, the kernel uses IPv6 multicast (fe80 link-local <-> ff multicast
addresses).
When the kernel applies an IPv6 address (regardless of whether it is
static or not), it sets the tentative flag on it. [Yes, you can disable
this and run into e.g. MAC collisions, ... -> have fun].
The address is visible on the interface, but it is not ready yet
(pending) and a service cannot bind to it, so the service start will
simply fail, e.g. netcat here [the link _will_ get carrier later]:
# ip link set down eth1 ; echo ; ip a s dev eth1 ; echo
# ip link set up dev eth1 ; ip a s dev eth1 ; echo
# ip a a 2001:db8:a::b/64 dev eth1 ; echo ; ip a s dev eth1 ; echo
# netcat -v -6 -l 2001:db8:a::b 8888 ; echo
# ip a s dev eth1 ; echo ; sleep 2 ; ip a s dev eth1 ; echo
# sleep 5 ; ip a s dev eth1
3: eth1:
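If a service cannot be fixed right away, a start script can at least wait until the kernel has cleared the tentative flag before launching it. A minimal sketch, assuming iproute2's one-line output (`ip -o -6 addr show dev DEV`) in which the flags follow the scope, e.g. `scope global tentative`; the function names are hypothetical, and the address must be given in the same compressed form that `ip` prints. It is still an opportunistic workaround with a timeout, not a replacement for a service that handles addresses itself:

```shell
#!/bin/sh
# Hypothetical helper: reads `ip -o -6 addr show ...` output on stdin and
# prints the state of the address given as $1 (without prefix length):
#   absent    - not configured on the device
#   tentative - duplicate address detection still running, bind would fail
#   dadfailed - DAD detected a duplicate, the address is unusable
#   ready     - usable, a service can bind to it
ipv6_addr_state() {
    awk -v a="$1" '
        $3 == "inet6" {
            split($4, p, "/")              # "2001:db8:a::b/64" -> "2001:db8:a::b"
            if (p[1] != a) next
            if ($0 ~ / dadfailed( |$)/)      s = "dadfailed"
            else if ($0 ~ / tentative( |$)/) s = "tentative"
            else                             s = "ready"
        }
        END { print (s == "" ? "absent" : s) }
    '
}

# Hypothetical helper: poll once per second, up to TRIES times (default 10),
# until ADDR on DEV has finished duplicate address detection.
wait_for_ipv6_addr() {
    dev=$1
    addr=$2
    tries=${3:-10}
    state=absent
    while [ "$tries" -gt 0 ]; do
        state=$(ip -o -6 addr show dev "$dev" | ipv6_addr_state "$addr")
        case $state in
            ready)     return 0 ;;
            dadfailed) echo "DAD failed for $addr on $dev" >&2; return 2 ;;
        esac
        sleep 1
        tries=$((tries - 1))
    done
    echo "$addr on $dev is still $state" >&2
    return 1
}
```

For example, `wait_for_ipv6_addr eth1 2001:db8:a::b 10 && netcat -v -6 -l 2001:db8:a::b 8888` would start netcat only once the address has left the tentative state. Since DAD does not complete while the link has no carrier, the loop simply gives up after about 10 seconds in that case, which is exactly the timeout problem described in this thread.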
2. Manually tweak LINK_REQUIRED. This allows our service to start successfully no matter what. But it feels hacky, and you might not know about this LINK_REQUIRED thing until you get a surprise.
On a router, I'd do it, or at least disable duplicate address detection (accept_dad=0, CHECK_DUPLICATE_IP=no); otherwise, definitely not as a default, but only per interface if _really_ required.
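For the router case, a per-interface configuration could look roughly like the fragment below. This is only a sketch, assuming a wicked/SUSE-style ifcfg file; the interface name `eth0` and the address `192.0.2.1/24` are placeholders. The IPv6 DAD switch is shown as a plain sysctl command, because where to persist it depends on the setup:

```shell
# /etc/sysconfig/network/ifcfg-eth0  (eth0 and 192.0.2.1/24 are placeholders)
STARTMODE='auto'
BOOTPROTO='static'
IPADDR='192.0.2.1/24'
# Apply the config without waiting for carrier (this interface only, not a default)
LINK_REQUIRED='no'
# Skip IPv4 duplicate address detection for this interface
CHECK_DUPLICATE_IP='no'

# Not part of the ifcfg file: disable IPv6 DAD on eth0 at runtime (as root)
# sysctl -w net.ipv6.conf.eth0.accept_dad=0
```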
I'm not sure; both variants feel somewhat flawed to me (and this is a remarkably trivial use case!). I can live with the second one, but it looks like there is still room for improvement.
There is simply no "golden way" that works in every case, just a bunch of
config tweaks / configuration scenarios that allow you to address the issue.
"I just need to start after the network service has started / network-online
is reached" is a false conclusion.
It is not possible to block the system at boot time forever -> there is
always a timeout. In our case, it is the WAIT_FOR_INTERFACES setting.
Regardless of the state, we have to return & report once it expires.
About services: a service which binds to a specific address (or device,
in case of BindDevice) without monitoring the addresses / handling this
accordingly is just broken or misconfigured.
You have to assume that the devices are detected in random order and
may reach carrier (or not), also in random order, and all this at an
unspecified point in time. Anything else is opportunistic (a hack :).
So:
- Bind to 0.0.0.0 or :: and set ACLs as needed
- Start the service once the address is available
  (e.g. via POST_UP_SCRIPT="systemd:....")
- Fix the service (report a bug) so that it binds using IP_FREEBIND;
  from "man 7 ip":
    IP_FREEBIND (since Linux 2.4)
        If enabled, this boolean option allows binding to an IP
        address that is nonlocal or does not (yet) exist. This
        permits listening on a socket, without requiring the
        underlying network interface or the specified dynamic IP
        address to be up at the time that the application is
        trying to bind to it. This option is the per-socket
        equivalent of the ip_nonlocal_bind /proc interface
        described below.
  See e.g. [SLES Bug #958728, which led us to apply the patch]:
  https://github.com/rsyslog/rsyslog/blob/v8-stable/ChangeLog
  where we implemented setting this option for rsyslog.
See also:
https://www.freedesktop.org/wiki/Software/systemd/NetworkTarget/
which basically says the same.
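The man page excerpt refers to the system-wide ip_nonlocal_bind sysctl. As a much coarser fallback for a service that cannot set IP_FREEBIND itself yet, it could be enabled as sketched below. Note that it affects every socket on the host, and that the IPv6 counterpart only exists on more recent kernels; persisting it (e.g. via a file under /etc/sysctl.d) is left to the local setup:

```shell
# Run as root. Allow binding to addresses not (yet) present on any interface.
# System-wide equivalent of the per-socket IP_FREEBIND option: affects ALL sockets.
sysctl -w net.ipv4.ip_nonlocal_bind=1
# IPv6 counterpart (only available on more recent kernels)
sysctl -w net.ipv6.ip_nonlocal_bind=1
```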
Further, try to configure things explicitly. Quite often, the config does
not say exactly what it wants, and we fall back to defaults (e.g. the kernel
sysctl whose default is to autoconf IPv6 even on loopback, although the
kernel will never do it there ;-) that we think will fit and cause the
lowest number of bug reports / complaints / regressions.
And: do _not_ disable nanny use in wicked (keep use-nanny set to true in
/etc/wicked/common.xml or local.xml):
nanny is responsible for running "ifup" on an interface when it appears,
continuing the ifup when the carrier (or another prerequisite) arrives,
and calling POST_UP scripts.
With nanny disabled, wicked performs a direct/inline one-shot "ifup"
for 30 sec (by default), then exits and reports; it will _not_ continue
later when the carrier is there.
It is a kind of test mode that lets us debug what the finite state
machine is doing without having to debug a daemon (which may also handle
stuff that is not relevant, ...).
Without nanny, wicked also will not reapply the config on restart, e.g.
on update (not from /etc, which may have changed in the meantime; nanny
keeps a copy of the last applied config), and will not renew leases.
Regards,
Marius Tomaschewski