Re: [wicked-devel] Wicked totally fails for eth card if the link is physically down at startup

17 Jun 2016

      On 16.06.2016 at 20:51, Nikolai Zhubr wrote:
...
Hello Marius,
16.06.2016 19:04, Marius Tomaschewski:
...
On 16.06.2016 at 17:59, Marius Tomaschewski wrote:
...
On 16.06.2016 at 14:42, Nikolai Zhubr wrote:
[...]
But LINK_REQUIRED=no default is definitely the wrong way to go.
But now, you'll be able to set such a default for you:
https://github.com/openSUSE/wicked/pull/657
Good to know, I'll definitely add it to my emergency reminder.
Although it still feels somewhat hacky...
Because it is :-)

The short version of all this is:

LINK_REQUIRED=no is not possible as default, because it breaks requested
+ approved features and reverts to obsolete RFCs [that do
not require duplicate address detection on IPv4], which were
written before many of the protocols/mechanisms existing today.
...
I appreciate your detailed explanation of why changing the default is
unacceptable and I tend to agree that your reasons are valid.
I am deliberately spending my time here to explain this in the hope that
it will reach more people. Perhaps even somebody willing to help.

Yes, we definitely have to improve the documentation -- any help here
would be much appreciated.

e.g. using
 https://en.opensuse.org/Portal:Wicked
 https://github.com/openSUSE/wicked/wiki
 https://github.com/openSUSE/wicked/tree/master/doc[/FAQ.txt]
[merge them somehow?]
...
But then question is, how to properly set up a most trivial ipv4-only
configuration with a static (fixed forever) ip address and some trivial
tcp service (say httpd) bound to this fixed ip address only?
As I see it now, there are 2 choices currently:
1. Leave everything at the defaults. Works fine until one day the link
happens to be temporarily down at boot time. The service will
not be able to bind to the address it expects and will fail, which is
unwise because we know our address is fixed forever anyway.
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Really? :-)

Usually, you don't install 1000 routers + 1000 servers for 1
workstation in the network.
It is exactly the other way around: you install ~1..2 routers
and a few servers for 2000 workstations.

In case of the routers, you can disable e.g. duplicate address detection
and define "anybody using this address is broken, I'm not".

In case of some servers, you can do the same. But be prepared to
verify manually that no machine is already using the address (and fix
those that are) before you install a new one -- and remember that you
may unintentionally have a typo in the IP address.

In case of workstations using a static address configuration
(with any dynamic addresses, DAD is simply mandatory anyway), you would
be very unhappy if the person actually doing this on your advice
mixed up the gateway or server address with the workstation's IP address,
and your advice had been to disable duplicate address detection / set
LINK_REQUIRED=no.

It gets even worse with MAC address duplicates (e.g. in some
virtualization setups) -- IPv6 covers this, IPv4 not really.

Let's go through the duplicate address detection only -- just because
it is the most trivial variant to test/see, it's not the only one.

In IPv6 this is clearer / more visible. For IPv4, the kernel unfortunately
does not provide such functionality/handling, but the situation is quite
similar: there are also various L2 protocols which all need a carrier to
start (e.g. bond LACP, bridge STP, link authentication) and have to finish
before the address can really be used. For IPv6, instead of ARP, the kernel
uses multicasts (fe80 link local <-> ff multicast addresses).

When the kernel is applying an IPv6 address (regardless if static or
not), it sets a tentative flag on it. [Yes, you can disable it and
run e.g. into MAC collisions, ... -> have fun].

The address is visible on the interface, but it is not ready yet (pending)
and a service cannot bind to it, so a service start will simply
fail, e.g. netcat here [the link _will_ get carrier later]:

# ip link set down eth1 ; echo ; ip a s dev eth1 ; echo ; \
  ip link set up dev eth1 ; ip a s dev eth1 ; echo ; \
  ip a a 2001:db8:a::b/64 dev eth1 ; echo ; ip a s dev eth1 ; echo ; \
  netcat -v -6 -l 2001:db8:a::b 8888 ; echo ; ip a s dev eth1 ; echo ; \
  sleep 2 ; ip a s dev eth1 ; echo ; sleep 5 ; ip a s dev eth1

3: eth1:  mtu 1500 qdisc pfifo_fast state DOWN group default qlen 1000
    link/ether 68:05:ca:0a:39:e7 brd ff:ff:ff:ff:ff:ff

3: eth1:  mtu 1500 qdisc pfifo_fast state DOWN group default qlen 1000
    link/ether 68:05:ca:0a:39:e7 brd ff:ff:ff:ff:ff:ff

3: eth1:  mtu 1500 qdisc pfifo_fast state DOWN group default qlen 1000
    link/ether 68:05:ca:0a:39:e7 brd ff:ff:ff:ff:ff:ff
    inet6 2001:db8:a::b/64 scope global tentative
       valid_lft forever preferred_lft forever

netcat: Cannot assign requested address

3: eth1:  mtu 1500 qdisc pfifo_fast state DOWN group default qlen 1000
    link/ether 68:05:ca:0a:39:e7 brd ff:ff:ff:ff:ff:ff
    inet6 2001:db8:a::b/64 scope global tentative
       valid_lft forever preferred_lft forever

3: eth1:  mtu 1500 qdisc pfifo_fast state DOWN group default qlen 1000
    link/ether 68:05:ca:0a:39:e7 brd ff:ff:ff:ff:ff:ff
    inet6 2001:db8:a::b/64 scope global tentative
       valid_lft forever preferred_lft forever

3: eth1:  mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 68:05:ca:0a:39:e7 brd ff:ff:ff:ff:ff:ff
    inet6 2001:db8:a::b/64 scope global
       valid_lft forever preferred_lft forever
    inet6 fe80::6a05:caff:fe0a:39e7/64 scope link
       valid_lft forever preferred_lft forever

Once the link carrier has been detected, ... ipv6 assigns a tentative
link local address (you can think of it as "ipv6 MAC address", it is
quite often an EUI64 mapped MAC address).
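
As a rough illustration of that EUI-64 mapping, here is a small Python
sketch. The helper name eui64_link_local is made up for this mail, and it
assumes the classic modified EUI-64 scheme (flip the universal/local bit,
insert ff:fe in the middle), not privacy or stable-privacy addresses:

```python
import ipaddress


def eui64_link_local(mac):
    """Derive the fe80::/64 link local address from a 48-bit MAC (modified EUI-64)."""
    octets = bytes(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError("expected a 48-bit MAC like 68:05:ca:0a:39:e7")
    # Flip the universal/local bit of the first octet and insert ff:fe in the middle.
    interface_id = bytes([octets[0] ^ 0x02]) + octets[1:3] + b"\xff\xfe" + octets[3:]
    # fe80::/64 prefix: fe80 followed by 48 zero bits.
    prefix = bytes.fromhex("fe80") + bytes(6)
    return ipaddress.IPv6Address(prefix + interface_id)


print(eui64_link_local("68:05:ca:0a:39:e7"))  # fe80::6a05:caff:fe0a:39e7
```

With the MAC of eth1 from the output above, this gives the same link local
address that shows up there once the carrier is detected.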

Then it starts duplicate address detection of the link local address
as well as all other addresses you may have assigned before.

When all is fine, it will remove the tentative flag _and_ emit a
NEWADDR event -> programs are able to bind to the addresses.

In case of failure (duplicate has been detected), the kernel:
 - just deletes dynamic addresses (those with a finite lifetime) + emits
   DELADDR
 - removes the tentative flag and sets the dadfailed flag on persistent
   (infinite lifetime) addresses + emits NEWADDR [broken one].
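
To make the flag handling a bit more concrete, here is a small Python sketch
that reads "ip addr show" style text and decides whether an address can be
bound yet. The function names and the SAMPLE text are made up for
illustration (the sample is modelled on the output above), and parsing the
human-readable output is an assumption made for brevity; a real service
would rather listen for the NEWADDR/DELADDR events:

```python
def inet6_flags(ip_addr_output):
    """Map each inet6 address in `ip addr show` text to its set of flag words."""
    result = {}
    for line in ip_addr_output.splitlines():
        words = line.split()
        if len(words) < 2 or words[0] != "inet6":
            continue
        address = words[1].split("/")[0]
        # Words after "scope <name>" are flags such as "tentative" or "dadfailed".
        flags = set(words[words.index("scope") + 2:]) if "scope" in words else set()
        result[address] = flags
    return result


def is_bindable(flags):
    """Treat an address as bindable only when DAD is neither pending nor failed."""
    return not (flags & {"tentative", "dadfailed"})


# Hypothetical sample, modelled on the eth1 output shown earlier.
SAMPLE = """\
    inet6 2001:db8:a::b/64 scope global tentative
       valid_lft forever preferred_lft forever
    inet6 fe80::6a05:caff:fe0a:39e7/64 scope link
       valid_lft forever preferred_lft forever
"""

for address, flags in inet6_flags(SAMPLE).items():
    print(address, "bindable" if is_bindable(flags) else "not yet bindable")
```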

In ipv4, you simply send to /dev/null and don't see all this.
Of course, it is possible to monitor much more things [e.g. bridge
traffic on switches, neighbour discovery ...].
...
2. Manually tweak LINK_REQUIRED. This will allow our service to
successfully start no matter what. But feels hacky, and you might not
know about this LINK_REQUIRED thing until you get a surprise.
On a router, I'd do it or at least disable duplicate address detection
(accept_dad=0, CHECK_DUPLICATE_IP=no); otherwise definitely not as a
default but per interface if _really_ required.
...
I'm not sure, both variants feel somewhat flawed to me (And this is a
remarkably trivial use case!). I can live with the second one, but it
looks like there is still room for improvement.
There is simply no "golden way" that works in any case, just a bunch of
config tweaks / configuration scenarios permitting to address the issue.

"I just need to start after the network service started/network-online
is reached" is a false conclusion.
It is not possible to block the system at boot time forever -> there is
always a timeout. In our case, it is the WAIT_FOR_INTERFACES setting.
Regardless of the state, we have to return and report once it expires.

About services: a service which binds to a specific address (or device
in case of BindDevice) without monitoring the addresses and handling
changes accordingly is just broken or misconfigured.

You have to assume that the devices are detected in random order and
may reach carrier (or not) also in random order, and all this at an
unspecified point in time. Anything else is opportunistic (hack :).
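
One way for a service to cope with that is sketched below in Python, under
the assumption that simply retrying the bind is acceptable. The helper name,
the timeout and the interval are invented for this example; monitoring the
address events would be the cleaner approach:

```python
import errno
import socket
import time


def bind_when_available(address, port, timeout=60.0, interval=1.0):
    """Return a TCP socket bound to (address, port), retrying while the address is unusable."""
    family = socket.AF_INET6 if ":" in address else socket.AF_INET
    deadline = time.monotonic() + timeout
    while True:
        sock = socket.socket(family, socket.SOCK_STREAM)
        try:
            sock.bind((address, port))
            return sock
        except OSError as exc:
            sock.close()
            # EADDRNOTAVAIL is the "Cannot assign requested address" error,
            # seen while the address is missing or still tentative.
            if exc.errno != errno.EADDRNOTAVAIL:
                raise
            if time.monotonic() >= deadline:
                raise TimeoutError(f"{address} did not become available") from exc
        time.sleep(interval)
```

A caller would use e.g. bind_when_available("2001:db8:a::b", 8888) and only
then start listening; any other bind error is passed through unchanged.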

So:
  - Bind to 0.0.0.0 || :: + set ACLs as needed

  - Start the service once the address is available
    (e.g. via POST_UP_SCRIPT="systemd:....")

  - Fix the service (report a bug) to bind using a free bind,
    from "man 7 ip":

       IP_FREEBIND (since Linux 2.4)
              If enabled, this boolean option allows binding to an IP
              address that is nonlocal or does not (yet) exist.  This
              permits listening on a socket, without requiring the
              underlying network interface or the specified dynamic IP
              address to be up at the time that the application is trying
              to bind to it.  This option is the per-socket equivalent of
              the ip_nonlocal_bind /proc interface described below.

    See e.g. [SLES Bug #958728, which led to the patch being applied]:
       https://github.com/rsyslog/rsyslog/blob/v8-stable/ChangeLog
    where we've implemented setting of the option for rsyslog.
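
For illustration only, a free bind could look roughly like this in Python on
Linux. This is a sketch and not the rsyslog patch; the helper name is made
up, it covers IPv4 only, and the fallback value 15 is the IP_FREEBIND
constant from <linux/in.h>, used in case the Python socket module does not
export the name:

```python
import socket

# Assumption: fall back to the Linux value from <linux/in.h> when the
# socket module does not define IP_FREEBIND. Linux only.
IP_FREEBIND = getattr(socket, "IP_FREEBIND", 15)


def listen_freebind(address, port, backlog=16):
    """Listen on an IPv4 address even if it is not (yet) configured locally."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # Must be set before bind(); it relaxes the "address is local" check.
        sock.setsockopt(socket.IPPROTO_IP, IP_FREEBIND, 1)
        sock.bind((address, port))
        sock.listen(backlog)
    except OSError:
        sock.close()
        raise
    return sock
```

E.g. listen_freebind("192.0.2.10", 8888) is expected to succeed even while
the interface has no carrier and the address is not applied yet; connections
only arrive once the address really exists on the host.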

See also:
  https://www.freedesktop.org/wiki/Software/systemd/NetworkTarget/
which basically says the same.

Further, try to configure explicitly. Quite often, the config does
not say exactly what it wants and we use some defaults (e.g. the kernel
sysctl where the default is to autoconf IPv6 even on loopback, even though
the kernel will never do it there ;-) which we think will fit and
cause the lowest number of bug reports / complaints / regressions.

And: Do _not_ disable nanny use in wicked (set use-nanny true in
/etc/wicked/common.xml or local.xml):

nanny is responsible for bringing an interface up ("ifup") when it
appears, for continuing the ifup when the carrier arrives (or other
prerequisites are met), and for calling POST_UP scripts.

A wicked with disabled nanny makes a direct/inline one-shot "ifup"
for 30sec (by default) and exits+reports -- it will _not_ continue
later when the carrier is there.
It is a kind of test mode allowing us to debug what the finite state
machine is doing without having to debug a daemon (which may also handle
things that are not relevant, ...).

A wicked without nanny will not reapply the config on restart, e.g.
on update, or renew leases. (With nanny, the config is not reread from
/etc, which may have changed in the meantime; nanny keeps a copy of the
last applied config.)

Gruesse / Regards,
 Marius Tomaschewski
-- 
 SUSE LINUX GmbH, GF: Felix Imendörffer, Jane Smithard,
 Dilip Upmanyu, Graham Norton, HRB 21284 (AG Nürnberg),
 Maxfeldstraße 5, 90409 Nürnberg, Germany