[opensuse-buildservice] Random delays during worker start
Hello,

In our private OBS instance we use KVM workers on SLES12-SP3. The build servers are HP DL380 Gen9 machines with 40 cores, with 256 GB RAM for the big workers and 128 GB RAM for the smaller ones. On such hardware we run, for example, 10 instances with 3 jobs.

The problem is that after the initial install stage there are very often (I would say in about 1/3 of the builds) huge delays in the start of the VM.

A good one:

....
[ 10s] booting kvm...
[ 10s] ### VM INTERACTION START ###
[ 11s] /usr/bin/qemu-kvm -nodefaults -no-reboot -nographic -vga none -object rng-random,filename=/dev/random,id=rng0 -device virtio-rng-pci,rng=rng0 -runas qemu -cpu host -net n
[ 11s] [ 1.063999] dracut-pre-udev[161]: modprobe: FATAL: Module kqemu not found.
[ 12s] [ 2.047790] dracut-pre-udev[161]: modprobe: FATAL: Module ibmvscsi not found.
[ 12s] [ 2.052451] dracut-pre-udev[161]: modprobe: FATAL: Module ibmveth not found.
[ 12s] ### VM INTERACTION END ###
[ 12s] 2nd stage started in virtual machine
[ 12s] machine type: x86_64
...

A bad one looks like:

...
[ 11s] booting kvm...
[ 12s] ### VM INTERACTION START ###
[ 12s] /usr/bin/qemu-kvm -nodefaults -no-reboot -nographic -vga none -object rng-random,filename=/dev/random,id=rng0 -device virtio-rng-pci,rng=rng0 -runas qemu -cpu host -net n
[ 12s] [ 1.437692] dracut-pre-udev[161]: modprobe: FATAL: Module kqemu not found.
[ 111s] [ 98.122901] dracut-pre-udev[161]: modprobe: FATAL: Module ibmvscsi not found.
[ 112s] [ 99.982451] dracut-pre-udev[161]: modprobe: FATAL: Module ibmveth not found.
[ 112s] ### VM INTERACTION END ###
[ 112s] 2nd stage started in virtual machine
[ 112s] machine type: x86_64
...

The delay can also be as long as 300 seconds. I have not found any relationship to the size of the build jobs or the number of packages to install. Also, the build itself does not seem to be much slower after the delay. The host is not under high load at these moments; most of the CPUs are idle.

So my questions are: What exactly happens in that stage of the VM start? Are copies of the initial packages still in progress? Has anybody seen similar things?

OBS version is 2.92.

Thank you
Karsten Keil

The information in this e-mail is confidential. The contents may not be disclosed or used by anyone other than the addressee. Access to this e-mail by anyone else is unauthorised. If you are not the intended recipient, please notify Airbus immediately and delete this e-mail. Airbus cannot accept any responsibility for the accuracy or completeness of this e-mail as it has been sent over public networks. If you have any concerns over the content of this message or its Accuracy or Integrity, please contact Airbus immediately. All outgoing e-mails from Airbus are checked using regularly updated virus scanning software but you should take whatever measures you deem to be appropriate to ensure that this message and any attachments are virus free.
--
To unsubscribe, e-mail: opensuse-buildservice+unsubscribe@opensuse.org
To contact the owner, e-mail: opensuse-buildservice+owner@opensuse.org
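Editorial sketch, not part of the original mail: stalls like the one in the "bad" log can be located automatically by comparing the leading "[ NNs]" build timestamps of consecutive lines. The function name find_gaps and the 30-second threshold are assumptions chosen for illustration; the sample lines are taken from the log quoted in this thread.

```python
import re

# Matches the build-log prefix such as "[ 12s]" or "[ 111s]" and captures
# the elapsed seconds plus the rest of the line.
LINE_RE = re.compile(r"^\[\s*(\d+)s\]\s?(.*)$")


def find_gaps(log_text, min_gap=30):
    """Return (gap_seconds, line_before, line_after) for each jump >= min_gap."""
    gaps = []
    prev_time = None
    prev_msg = None
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue  # skip lines without a build timestamp
        now = int(match.group(1))
        msg = match.group(2)
        if prev_time is not None and now - prev_time >= min_gap:
            gaps.append((now - prev_time, prev_msg, msg))
        prev_time, prev_msg = now, msg
    return gaps


# Excerpt of the "bad" build log from this thread.
sample = """\
[ 12s] ### VM INTERACTION START ###
[ 12s] [ 1.437692] dracut-pre-udev[161]: modprobe: FATAL: Module kqemu not found.
[ 111s] [ 98.122901] dracut-pre-udev[161]: modprobe: FATAL: Module ibmvscsi not found.
[ 112s] ### VM INTERACTION END ###
"""

for gap, before, after in find_gaps(sample):
    print(f"{gap}s stall after: {before}")
```

Run over a set of saved build logs, this would show whether the stall always follows the same line, which narrows down where in the initrd boot the time is lost.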
On Wednesday, 5 December 2018, 19:54:54 CET, Keil, Karsten wrote:
> [...]
> So my questions are: What exact happens in that stage of the VM start, are there copies of the initial packages ongoing ? Does somebody see similar things ? OBS version is 2.92.
It seems to be the initrd booting...

The OBS version should not really matter here; IMHO it is more a question of which kernel/initrd is used. Do you use a kernel-obs-build here, or is it the one from the worker system?

--
Adrian Schroeter
SUSE Linux Products GmbH, Maxfeldstr. 5, 90409 Nuernberg, Germany
email: adrian@suse.de
Hello Adrian,

In this case it is kernel-obs-build-4.4.73-5.1.x86_64.rpm from the SLE12-SP3 SDK GM. So would updating it to the latest version from the SDK SP3 Update repository solve this issue? Normally we always build our software against the GM.

Thanks
Karsten Keil

-----Original Message-----
From: Adrian Schröter [mailto:adrian@suse.de]
Sent: Thursday, December 06, 2018 9:11 AM
To: opensuse-buildservice@opensuse.org
Cc: Keil, Karsten
Subject: Re: [opensuse-buildservice] Random delays during worker start

> It seems to be the initrd booting... OBS version should not really matter here, it is more a question which kernel/ initrd is used IMHO. Do you use a kernel-obs-build here or is it the one from the worker system?
On 06.12.18 at 10:30, Keil, Karsten wrote:
> Hello Adrian,
>
> In this case it is kernel-obs-build-4.4.73-5.1.x86_64.rpm from SLE12-SP3 SDK GM. So would updating it to the latest from the SDK SP3 Update repository solve this issue ?

Nobody knows ;-)

But for testing, you could add dracut debug options to the qemu-kvm call in the build script: search build-vm for vm_linux_kernel_parameter= and add "rd.debug" or similar to check what's going on.

> We build our software in normal case always against the GM.

Me too, and I have not yet seen such problems, but my workers are slightly older ;-)

--
Stefan Seyfried
"For a successful technology, reality must take precedence over public relations, for nature cannot be fooled." -- Richard Feynman
Hi Stefan,

On 10.12.18 at 16:59, Stefan Seyfried wrote:
> On 06.12.18 at 10:30, Keil, Karsten wrote:
>> In this case it is kernel-obs-build-4.4.73-5.1.x86_64.rpm from SLE12-SP3 SDK GM. So would updating it to the latest from the SDK SP3 Update repository solve this issue ?
>
> Nobody knows ;-)

It did not help.

> But for testing, you could add dracut debug options to the qemu-kvm call in the build script:
> search build-vm vm_linux_kernel_parameter= and add "rd.debug" or similar to check what's going on.

Very good hint. I added rd.debug to the qemu_append variable in build-vm-kvm. The builds stalled while loading virtio-rng.ko, so it seems that there was not enough entropy in the random pool of the worker servers. We installed haveged and the issue was gone.

Maybe this should be added to the worker setup section in the documentation.

Thanks a lot
Karsten
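Editorial sketch, not part of the original mail: the qemu-kvm command line in the logs passes filename=/dev/random to the rng-random object, so the guest's virtio-rng device draws from the host's /dev/random, which on kernels of that era can block when the pool estimate is low. A worker host could be checked as below. The threshold of 1000 bits and the function names are assumptions for illustration only, and on newer kernels, which handle /dev/random differently, the reported number may be less meaningful.

```python
from pathlib import Path

# Standard Linux procfs entry reporting the kernel's entropy estimate in bits.
ENTROPY_FILE = Path("/proc/sys/kernel/random/entropy_avail")

# Assumed warning threshold; not taken from the thread or from OBS.
LOW_WATERMARK = 1000


def parse_entropy(text):
    """Convert the content of entropy_avail (e.g. '3071\\n') to an int."""
    return int(text.strip())


def is_starved(bits, threshold=LOW_WATERMARK):
    """Return True when the entropy estimate is below the threshold."""
    return bits < threshold


def read_entropy(path=ENTROPY_FILE):
    """Read the current entropy estimate from the given procfs path."""
    return parse_entropy(path.read_text())


if __name__ == "__main__":
    if ENTROPY_FILE.exists():
        bits = read_entropy()
        state = "LOW" if is_starved(bits) else "ok"
        print(f"entropy_avail={bits} ({state})")
    else:
        print("entropy_avail is not available on this system")
```

Sampling this value on the worker host while several VMs boot at once, before and after installing haveged, would show whether the pool is being drained at the moments the stalls occur.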
participants (4)
- Adrian Schröter
- Karsten Keil
- Keil, Karsten
- Stefan Seyfried