Comment # 20 on bug 960153 from
Ah, good point. I have actually rebooted couple of times since I set the debug
option so I had enough NetworkManager logs accumulated. Looking into the logs a
situation when resolv.conf is updated directly by NetworkManager is triggered
by the following sequence of events:

Feb 05 15:18:22 quack2.suse.cz NetworkManager[1105]: <debug>
[1486304302.827341] [dns-manager/nm-dns-manager.c:635] update_dns(): updating
resolv.conf
Feb 05 15:18:22 quack2.suse.cz NetworkManager[1105]: <debug>
[1486304302.827355] [dns-manager/nm-dns-manager.c:250] run_netconfig():
spawning '/sbin/netconfig modify --service NetworkManager' 
Feb 05 15:18:22 quack2.suse.cz NetworkManager[1105]: <debug>
[1486304302.828307] [dns-manager/nm-dns-manager.c:267] write_to_netconfig():
writing to netconfig: INTERFACE='NetworkManager'
Feb 05 15:18:22 quack2.suse.cz NetworkManager[1105]: <debug>
[1486304302.828338] [dns-manager/nm-dns-manager.c:267] write_to_netconfig():
writing to netconfig: DNSSEARCH='suse.cz'
Feb 05 15:18:22 quack2.suse.cz NetworkManager[1105]: <debug>
[1486304302.828355] [NetworkManagerUtils.c:805] nm_utils_kill_child_sync():
kill child process 'netconfig' (28627): waiting up to 1000 milliseconds for
process to terminate normally after sending no signal (0)...
Feb 05 15:18:23 quack2.suse.cz NetworkManager[1105]: <debug>
[1486304303.828418] [NetworkManagerUtils.c:821] nm_utils_kill_child_sync():
kill child process 'netconfig' (28627): sending SIGKILL...
Feb 05 15:18:23 quack2.suse.cz NetworkManager[1105]: <debug>
[1486304303.828620] [NetworkManagerUtils.c:855] nm_utils_kill_child_sync():
kill child process 'netconfig' (28627): after sending no signal (0) and
SIGKILL, process 28627 exited by signal 9 (1000266 usec elapsed)
Feb 05 15:18:23 quack2.suse.cz NetworkManager[1105]: <debug>
[1486304303.828908] [devices/nm-device.c:6716] nm_device_set_ip6_config():
[0x1e72d90] (em1): clear IP6Config instance
(/org/freedesktop/NetworkManager/IP6Config/10)
Feb 05 15:18:23 quack2.suse.cz NetworkManager[1105]: <debug>
[1486304303.828959] [dns-manager/nm-dns-manager.c:1040]
nm_dns_manager_begin_updates(): (device_ip6_config_changed): queueing DNS
updates (1)
Feb 05 15:18:23 quack2.suse.cz NetworkManager[1105]: <debug>
[1486304303.828979] [dns-manager/nm-dns-manager.c:1058]
nm_dns_manager_end_updates(): (nm_dns_manager_end_updates): DNS configuration
did not change
Feb 05 15:18:23 quack2.suse.cz NetworkManager[1105]: <debug>
[1486304303.828986] [dns-manager/nm-dns-manager.c:1062]
nm_dns_manager_end_updates(): (device_ip6_config_changed): no DNS changes to
commit (0)

So 'netconfig modify' did not manage to finish in the allocated 1s, so
NetworkManager killed it. I'm a bit puzzled that I don't see further debug
messages showing that resolv.conf was updated directly but whatever. The code
is pretty clearly showing that if updating using netconfig fails, it does the
update directly.

So I dug a bit more into netconfig to see what it is doing. I've instrumented
netconfig script with some debug messages and here is the output (messages
should be relatively self-explanatory):

Entering netconfig  11:00:15:691579073
Before modify  11:00:15:697381985
Entering modify  11:00:15:698074098
Done reading modify input  11:00:15:700162938
Determining state  11:00:15:702411665
Exiting from modify  11:00:15:703101298
After modify  11:00:15:703873893
Before modules  11:00:15:704495286
Running module /etc/netconfig.d//dns-resolver  11:00:15:705215014
dns-resolver begins  11:00:15:707300709
dns-resolver parsed policy  11:00:15:717535930
dns-resolver before resolv.conf  11:00:15:718404372
dns-resolver file created  11:00:15:721682996
dns-resolver header  11:00:15:723014609
dns-resolver searchlist  11:00:15:723693154
dns-resolver nameserver  11:00:15:724343748
dns-resolver done  11:00:16:693133913
<killed>

We can see that we spent most of the 1s time limit in
netconfig_check_md5_and_move() (that is the only thing happening between
'nameserver' and 'done' messages in dns-resolver. During successful resume
attempts when resolv.conf gets properly updated, I can see netconfig finishing
just barely under 1s. So that explains why the problem sometimes happens and
sometimes not.

Looking into netconfig_check_md5_and_move() function, I'm not hugely surprised
that it takes close to 1s. That spawning of subshells, gawk processes, md5sum
processes is going to take up some time when everything is resuming and
fighting for CPU and disk. That being said I can investigate that more closely
if you think it would be worth it.

But overall it seems that the 1s timeout to run netconfig is just too short in
some cases and bumping it up should fix my issues.


You are receiving this mail because: