[Bug 960153] Network manager and DNS forwarders don't work together

8 Feb 2017

      http://bugzilla.suse.com/show_bug.cgi?id=960153
http://bugzilla.suse.com/show_bug.cgi?id=960153#c21

--- Comment #21 from Jonathan Kang <sckang@suse.com> ---
(In reply to Jan Kara from comment #20)
...
So 'netconfig modify' did not manage to finish in the allocated 1s, so
NetworkManager killed it. I'm a bit puzzled that I don't see further debug
messages showing that resolv.conf was updated directly but whatever. The
code is pretty clearly showing that if updating using netconfig fails, it
does the update directly.
Yes. There isn't any debug message when NM directly writes to resolv.conf. And
by reading those debug messagesk, it's obvious that netconfig was killed by
NM.
...
So I dug a bit more into netconfig to see what it is doing. I've
instrumented netconfig script with some debug messages and here is the
output (messages should be relatively self-explanatory):
Entering netconfig  11:00:15:691579073
Before modify  11:00:15:697381985
Entering modify  11:00:15:698074098
Done reading modify input  11:00:15:700162938
Determining state  11:00:15:702411665
Exiting from modify  11:00:15:703101298
After modify  11:00:15:703873893
Before modules  11:00:15:704495286
Running module /etc/netconfig.d//dns-resolver  11:00:15:705215014
dns-resolver begins  11:00:15:707300709
dns-resolver parsed policy  11:00:15:717535930
dns-resolver before resolv.conf  11:00:15:718404372
dns-resolver file created  11:00:15:721682996
dns-resolver header  11:00:15:723014609
dns-resolver searchlist  11:00:15:723693154
dns-resolver nameserver  11:00:15:724343748
dns-resolver done  11:00:16:693133913
<killed>
We can see that we spent most of the 1s time limit in
netconfig_check_md5_and_move() (that is the only thing happening between
'nameserver' and 'done' messages in dns-resolver. During successful resume
attempts when resolv.conf gets properly updated, I can see netconfig
finishing just barely under 1s. So that explains why the problem sometimes
happens and sometimes not.
Looking into netconfig_check_md5_and_move() function, I'm not hugely
surprised that it takes close to 1s. That spawning of subshells, gawk
processes, md5sum processes is going to take up some time when everything is
resuming and fighting for CPU and disk. That being said I can investigate
that more closely if you think it would be worth it.
These info are should be sufficient and very helpful. Thanks.
...
But overall it seems that the 1s timeout to run netconfig is just too short
in some cases and bumping it up should fix my issues.
Quite right. 1 second seems to be not long enough.

This is the documentation of nm_utils_kill_child_sync() function call:
...
Kill a child process synchronously and wait. The function first checks if
the child already terminated and if it did, return the exit status.
Otherwise send one @sig signal. @sig  will always be sent unless the child
already exited. If the child does not exit within @wait_before_kill_msec
milliseconds, the function will send %SIGKILL and waits for the child
indefinitly. If @wait_before_kill_msec is zero, no %SIGKILL signal will be
sent.
So we can set @wait_before_kill_msec to a bigger value(> 1 second). Or set
@wait_before_kill_msec to 0(we don't try to kill netconfig, let it exit
itself).

-- 
You are receiving this mail because:
You are on the CC list for the bug.

[Bug 960153] Network manager and DNS forwarders don't work together

bugzilla_noreply＠novell.com