[opensuse-buildservice] own OBS instance: worker -> putjob performance problem
Hi all,

I'm running an instance of OBS 2.9.3 with two workers, building all sorts of stuff including kiwi images.

Sometimes, it takes very long to upload the build results. Right now, the worker is uploading an image which finished building at 09:21:

# ls -lh /srv/obs/worker-root/root_9/.build.packages/KIWI/SLES_12SP2_SAAS_VM1.x86_64-2.4.0-Build3.1.tbz
-rw-r--r-- 1 root root 457M Jun 27 09:21 /srv/obs/worker-root/root_9/.build.packages/KIWI/SLES_12SP2_SAAS_VM1.x86_64-2.4.0-Build3.1.tbz

obs:~ # l /srv/obs/jobs/x86_64/.putjob.17789/SLES_12SP2_SAAS_VM1.x86_64-2.4.0-Build3.1.tbz -h
-rw-r--r-- 1 obsrun obsrun 88M Jun 27 12:31 /srv/obs/jobs/x86_64/.putjob.17789/SLES_12SP2_SAAS_VM1.x86_64-2.4.0-Build3.1.tbz

As you can see, it's three hours for 88MB... a bit slow for a multi-gbit ethernet link ;)

Stracing the worker process, I see it sending packets:

12:39:37 read(5, "\16\245\0Y\fEw\237M\16\367\227\30}\265'"..., 8192) = 8192
12:39:37 write(4, "2000\r\n\16\245\0Y\fEw\237M\16"..., 8200) = 8200
12:39:42 read(5, "\342\233\332\370\36W\331\177g=\272\337pbll"..., 8192) = 8192
12:39:42 write(4, "2000\r\n\342\233\332\370\36W\331\177g="..., 8200) = 8200

and the 5 second pause in between looks like it might be some missing TCP flag, or timing parameter or such?

This is my iperf summary for this link:

[ ID] Interval         Transfer     Bandwidth       Retr
[  4] 0.00-10.00 sec   6.82 GBytes  5.86 Gbits/sec  0     sender
[  4] 0.00-10.00 sec   6.82 GBytes  5.86 Gbits/sec        receiver

So the network itself should be fine.

Any hints how to debug that? I have not seen this before the update from 2.8.4 to 2.9.3.
-- 
Stefan Seyfried

"For a successful technology, reality must take precedence over public relations, for nature cannot be fooled." -- Richard Feynman
-- 
To unsubscribe, e-mail: opensuse-buildservice+unsubscribe@opensuse.org
To contact the owner, e-mail: opensuse-buildservice+owner@opensuse.org
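For anyone wanting to put numbers on the trace above, here is a rough sketch in Python. It checks that each 8200-byte write() is one 8192-byte payload in HTTP chunked framing ("2000" is hex for 8192, plus the size line's CRLF and the trailing CRLF), measures the pause between consecutive writes, and works out the average rate implied by the two ls listings. The helper names are made up for illustration, and the average assumes the upload of this file began around 09:21 and that "88M" means roughly 88 MiB, neither of which the listings actually prove.

```python
import re
from datetime import datetime

# The four strace lines quoted in the message, verbatim.
STRACE = r'''
12:39:37 read(5, "\16\245\0Y\fEw\237M\16\367\227\30}\265'"..., 8192) = 8192
12:39:37 write(4, "2000\r\n\16\245\0Y\fEw\237M\16"..., 8200) = 8200
12:39:42 read(5, "\342\233\332\370\36W\331\177g=\272\337pbll"..., 8192) = 8192
12:39:42 write(4, "2000\r\n\342\233\332\370\36W\331\177g="..., 8200) = 8200
'''

# Captures timestamp, hex chunk-size prefix and return value of each write().
WRITE_RE = re.compile(
    r'^(\d{2}:\d{2}:\d{2}) write\(\d+, "([0-9a-fA-F]+)\\r\\n.*\) = (\d+)$',
    re.MULTILINE,
)


def parse_writes(trace):
    """Return (timestamp, hex_chunk_size, bytes_written) per write() line."""
    return [
        (datetime.strptime(stamp, "%H:%M:%S"), hex_size, int(written))
        for stamp, hex_size, written in WRITE_RE.findall(trace)
    ]


def framed_length(hex_size):
    """Bytes on the wire for one chunk: size line, CRLF, payload, CRLF."""
    return len(hex_size) + 2 + int(hex_size, 16) + 2


def stall_report(writes):
    """Return (gap_seconds, payload_bytes_per_second) between writes."""
    report = []
    for (t0, _, _), (t1, hex_size, _) in zip(writes, writes[1:]):
        gap = (t1 - t0).total_seconds()
        payload = int(hex_size, 16)
        report.append((gap, payload / gap if gap else float("inf")))
    return report


def average_rate(size_bytes, start, end, fmt="%H:%M"):
    """Average bytes per second between two wall-clock times on one day."""
    elapsed = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return size_bytes / elapsed.total_seconds()


writes = parse_writes(STRACE)
for _, hex_size, written in writes:
    print(f"chunk {hex_size}: framed {framed_length(hex_size)}, written {written}")
for gap, rate in stall_report(writes):
    print(f"gap {gap:.0f} s, {rate:.1f} payload bytes/s")
# Assumption: 88 MiB arrived between 09:21 and 12:31.
print(f"average {average_rate(88 * 2**20, '09:21', '12:31') / 1024:.1f} KiB/s")
```

Run against a longer strace capture (strace -t gives this timestamp format), the same helpers would show whether the 5 second gap is constant or only appears now and then.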
On 27.06.2018 14:41, Stefan Seyfried wrote:
Any hints how to debug that? I have not seen this before the update from 2.8.4 to 2.9.3.
Some more facts:

on the build worker:

# ls -lh /srv/obs/worker-root/root_9/.build.packages/KIWI/
total 789M
-rw-r--r-- 1 root root  96K Jun 27 09:21 SLES_12SP2_SAAS_VM1-docker.x86_64-2.4.0-Build3.1.packages
-rw-r--r-- 1 root root 170K Jun 27 09:21 SLES_12SP2_SAAS_VM1-docker.x86_64-2.4.0-Build3.1.report
-rw-r--r-- 1 root root 332M Jun 27 09:21 SLES_12SP2_SAAS_VM1-docker.x86_64-2.4.0-Build3.1.tar.xz
-rw-r--r-- 1 root root  126 Jun 27 09:21 SLES_12SP2_SAAS_VM1-docker.x86_64-2.4.0-Build3.1.tar.xz.sha256
-rw-r--r-- 1 root root 2.1K Jun 27 09:21 SLES_12SP2_SAAS_VM1-docker.x86_64-2.4.0-Build3.1.verified
-rw-r--r-- 1 root root  96K Jun 27 09:21 SLES_12SP2_SAAS_VM1.x86_64-2.4.0-Build3.1.packages
-rw-r--r-- 1 root root 170K Jun 27 09:21 SLES_12SP2_SAAS_VM1.x86_64-2.4.0-Build3.1.report
-rw-r--r-- 1 root root 457M Jun 27 09:21 SLES_12SP2_SAAS_VM1.x86_64-2.4.0-Build3.1.tbz
-rw-r--r-- 1 root root  116 Jun 27 09:21 SLES_12SP2_SAAS_VM1.x86_64-2.4.0-Build3.1.tbz.sha256
-rw-r--r-- 1 root root 2.1K Jun 27 09:21 SLES_12SP2_SAAS_VM1.x86_64-2.4.0-Build3.1.verified

on the OBS server:

# ls -lh /srv/obs/jobs/x86_64/.putjob.17789/
total 558M
-rw-r--r-- 1 obsrun obsrun  96K Jun 27 09:21 SLES_12SP2_SAAS_VM1-docker.x86_64-2.4.0-Build3.1.packages
-rw-r--r-- 1 obsrun obsrun 170K Jun 27 09:21 SLES_12SP2_SAAS_VM1-docker.x86_64-2.4.0-Build3.1.report
-rw-r--r-- 1 obsrun obsrun 332M Jun 27 09:21 SLES_12SP2_SAAS_VM1-docker.x86_64-2.4.0-Build3.1.tar.xz
-rw-r--r-- 1 obsrun obsrun  126 Jun 27 09:21 SLES_12SP2_SAAS_VM1-docker.x86_64-2.4.0-Build3.1.tar.xz.sha256
-rw-r--r-- 1 obsrun obsrun 2.1K Jun 27 09:21 SLES_12SP2_SAAS_VM1-docker.x86_64-2.4.0-Build3.1.verified
-rw-r--r-- 1 obsrun obsrun  96K Jun 27 09:21 SLES_12SP2_SAAS_VM1.x86_64-2.4.0-Build3.1.packages
-rw-r--r-- 1 obsrun obsrun 170K Jun 27 09:21 SLES_12SP2_SAAS_VM1.x86_64-2.4.0-Build3.1.report
-rw-r--r-- 1 obsrun obsrun 224M Jun 27 13:37 SLES_12SP2_SAAS_VM1.x86_64-2.4.0-Build3.1.tbz

So the first part got transferred quite fast, but at some point the upload started to slow down. Storage for /srv/obs is a NetApp NFS share, which is not blazingly fast, but certainly much faster than the few kilobytes per second I'm seeing here.
-- 
Stefan Seyfried
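The two observations of the partial .tbz on the server (88M at 12:31 in the first message, 224M at 13:37 here) are enough for a rough growth rate and a guess at the time still needed for the full 457M. A minimal sketch follows, assuming the ls -h figures are MiB rounded to whole numbers and that the rate stays constant; the function names are invented for this example.

```python
from datetime import datetime

MIB = 2**20


def growth_rate(size_a, time_a, size_b, time_b, fmt="%H:%M"):
    """Bytes per second gained between two (size, wall-clock time) samples."""
    elapsed = datetime.strptime(time_b, fmt) - datetime.strptime(time_a, fmt)
    seconds = elapsed.total_seconds()
    if seconds <= 0:
        raise ValueError("second sample must be later than the first")
    return (size_b - size_a) / seconds


def eta_seconds(total_size, current_size, rate):
    """Seconds left at a constant rate (an assumption, not a prediction)."""
    return (total_size - current_size) / rate


# Sizes as shown by ls -h, so only accurate to about one MiB.
rate = growth_rate(88 * MIB, "12:31", 224 * MIB, "13:37")
remaining = eta_seconds(457 * MIB, 224 * MIB, rate)
print(f"{rate / 1024:.1f} KiB/s, about {remaining / 3600:.1f} h to go")
```

Sampling the file size every minute or so with the same calculation would show whether the transfer stalls in bursts or crawls steadily, which might help tell a network-side problem from a storage-side one.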
On 27.06.2018 15:40, Stefan Seyfried wrote:
On 27.06.2018 14:41, Stefan Seyfried wrote:
Any hints how to debug that? I have not seen this before the update from 2.8.4 to 2.9.3.
Some more facts:
It *seems* like the issue vanished after updating the workers from SLES11-SP4 to SLES12-SP3 (which had other "interesting" effects due to dracut doing stupid things to initrds... ;-))
-- 
Stefan Seyfried