[Bug 404039] New: Hangup of rpm file transfer for remoteurl projects
https://bugzilla.novell.com/show_bug.cgi?id=404039 Summary: Hangup of rpm file transfer for remoteurl projects Product: openSUSE.org Version: unspecified Platform: Other OS/Version: openSUSE 11.0 Status: NEW Severity: Blocker Priority: P5 - None Component: BuildService AssignedTo: bnc-team-screening@forge.provo.novell.com ReportedBy: martin.mohring@5etech.eu QAContact: adrian@novell.com CC: martin.mohring@5etech.eu Found By: --- OBS version in use: svn trunk -r 4276, built inside openSUSE:Tools:Unstable. I use 6 worker nodes and 1 OBS server machine that holds all the storage and the web server. Remoteurl projects currently cause the worker nodes to hang because the file transfer for the binaries gets stuck in the middle. You can see this from the files inside "remotecache": files of the form "upload*.cpio" stay there, and the remote .rpm files are no longer downloaded. -- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=404039 User martin.mohring@5etech.eu added comment https://bugzilla.novell.com/show_bug.cgi?id=404039#c1 --- Comment #1 from Martin Mohring <martin.mohring@5etech.eu> 2008-06-26 03:51:18 MDT --- A similar bug was already fixed a while ago. It had to do with the receive buffer size for https package transfers. Michael could not explain why the buffer size had to be 16k or 32k. Maybe this is related to that bug.
https://bugzilla.novell.com/show_bug.cgi?id=404039 Adrian Schröter <adrian@novell.com> changed: What |Removed |Added ---------------------------------------------------------------------------- AssignedTo|bnc-team-screening@forge.provo.novell.com |mls@novell.com
https://bugzilla.novell.com/show_bug.cgi?id=404039 User martin.mohring@5etech.eu added comment https://bugzilla.novell.com/show_bug.cgi?id=404039#c2 --- Comment #2 from Martin Mohring <martin.mohring@5etech.eu> 2008-06-26 04:57:21 MDT --- The hangup is accompanied by missing "stream_read_handler: EOF" messages in "log/src_server.log", after the transfer was initiated with "new rpc https://api.opensuse.org/public//build/......".
https://bugzilla.novell.com/show_bug.cgi?id=404039 User martin.mohring@5etech.eu added comment https://bugzilla.novell.com/show_bug.cgi?id=404039#c3 --- Comment #3 from Martin Mohring <martin.mohring@5etech.eu> 2008-06-26 05:19:03 MDT --- I had put "remotecache/" on another filesystem. Does that not work? Messages of the kind "serialize_end for /srv/obs/remotecache/440a5c87e0946f0e2f12b75c3051d6a5.lock" were missing.
https://bugzilla.novell.com/show_bug.cgi?id=404039 User martin.mohring@5etech.eu added comment https://bugzilla.novell.com/show_bug.cgi?id=404039#c4 --- Comment #4 from Martin Mohring <martin.mohring@5etech.eu> 2008-06-26 05:55:40 MDT --- Created an attachment (id=224538) --> (https://bugzilla.novell.com/attachment.cgi?id=224538) Deep recursion on subroutine "BSServerEvents::cpio_nextfile" Part of "log/src_server.log" containing a deep recursion message.
https://bugzilla.novell.com/show_bug.cgi?id=404039 User martin.mohring@5etech.eu added comment https://bugzilla.novell.com/show_bug.cgi?id=404039#c5 --- Comment #5 from Martin Mohring <martin.mohring@5etech.eu> 2008-06-26 06:38:05 MDT --- Attached you will find the log of a complete hangup situation. The workers' timers are still ticking, but nothing really happens anymore: - no new tasks are scheduled - viewing the worker's logfile in the web GUI produces a timeout. The situation is indicated by: - a file upload0d4c4a7ec3aca280350933f945a1be8f.cpio remaining inside remotecache/ - a partially unpacked file uploadf3910acd9227a018ead5b0d7b2ac6dd5:bash.rpm, which is only 8192 bytes in size, i.e. incomplete - the logfile with the situation is attached.
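Since the hang shows up as "upload*.cpio" files that never leave remotecache/, a small watchdog can flag them. This is a minimal sketch, not part of OBS: the function name find_stale_uploads and the 10-minute threshold are assumptions; only the /srv/obs/remotecache path and the "upload*.cpio" naming come from this report.

```python
import glob
import os
import time


def find_stale_uploads(remotecache_dir, max_age_seconds=600, now=None):
    """Return upload*.cpio files not modified for more than max_age_seconds.

    Hypothetical helper; the threshold is an assumption, not an OBS setting.
    """
    if now is None:
        now = time.time()
    stale = []
    # Only in-flight cpio archives match; partially unpacked ":name.rpm" files do not.
    for path in sorted(glob.glob(os.path.join(remotecache_dir, "upload*.cpio"))):
        if now - os.path.getmtime(path) > max_age_seconds:
            stale.append(path)
    return stale


if __name__ == "__main__":
    for p in find_stale_uploads("/srv/obs/remotecache"):
        print("possibly stuck transfer:", p)
```

Running it periodically (e.g. from cron) would only detect the symptom; it does not explain or fix the hang.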
https://bugzilla.novell.com/show_bug.cgi?id=404039 User martin.mohring@5etech.eu added comment https://bugzilla.novell.com/show_bug.cgi?id=404039#c6 --- Comment #6 from Martin Mohring <martin.mohring@5etech.eu> 2008-06-26 06:42:29 MDT --- Created an attachment (id=224546) --> (https://bugzilla.novell.com/attachment.cgi?id=224546) end of the logfile for the deadlock situation At the end of the logfile you will see the transfer of the broken cpio file, as described in the previous comment.
https://bugzilla.novell.com/show_bug.cgi?id=404039 User martin.mohring@5etech.eu added comment https://bugzilla.novell.com/show_bug.cgi?id=404039#c7 --- Comment #7 from Martin Mohring <martin.mohring@5etech.eu> 2008-06-26 07:53:12 MDT --- To be clear, the OBS server is running on openSUSE:11.0 x86-64, as indicated in the bug classification. I have installed only those packages from openSUSE:Tools and openSUSE:Tools:Unstable that are not provided by openSUSE:11.0.
https://bugzilla.novell.com/show_bug.cgi?id=404039 User martin.mohring@5etech.eu added comment https://bugzilla.novell.com/show_bug.cgi?id=404039#c8 --- Comment #8 from Martin Mohring <martin.mohring@5etech.eu> 2008-06-26 07:54:23 MDT --- Version used: openSUSE:Tools:Unstable, built from svn trunk -r 4278.
https://bugzilla.novell.com/show_bug.cgi?id=404039 User martin.mohring@5etech.eu added comment https://bugzilla.novell.com/show_bug.cgi?id=404039#c9 --- Comment #9 from Martin Mohring <martin.mohring@5etech.eu> 2008-06-26 08:12:18 MDT --- There is a forked child process /usr/bin/perl -w /usr/lib/obs/server//bs_srcserver that hangs when the leftover file remotecache/uploadca99e1a663008948c2c3aef03584cf4b.cpio stops loading.
https://bugzilla.novell.com/show_bug.cgi?id=404039 User martin.mohring@5etech.eu added comment https://bugzilla.novell.com/show_bug.cgi?id=404039#c10 --- Comment #10 from Martin Mohring <martin.mohring@5etech.eu> 2008-07-01 17:16:57 MDT --- It seems that the bigger the requested list of files (and thus the larger the transferred .cpio archive), the higher the probability of a hangup.
https://bugzilla.novell.com/show_bug.cgi?id=404039 User martin.mohring@5etech.eu added comment https://bugzilla.novell.com/show_bug.cgi?id=404039#c11 Martin Mohring <martin.mohring@5etech.eu> changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |NEEDINFO Info Provider| |martin.mohring@5etech.eu --- Comment #11 from Martin Mohring <martin.mohring@5etech.eu> 2008-07-07 12:59:58 MDT --- I can currently trigger this bug very easily after about 5-10 minutes, especially when the cache has grown stale after 1-2 days of little activity. I am not sure whether others also have the problem, but I can reproduce it easily.
https://bugzilla.novell.com/show_bug.cgi?id=404039 Martin Mohring <martin.mohring@5etech.eu> changed: What |Removed |Added ---------------------------------------------------------------------------- Severity|Blocker |Critical
https://bugzilla.novell.com/show_bug.cgi?id=404039 User mls@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=404039#c12 --- Comment #12 from Michael Schröder <mls@novell.com> 2008-07-08 10:36:41 MDT --- Not sure if I've found the problem you're facing, but there seems to be a bug somewhere in the lighttpd/fastcgi/ruby interaction: when the URL is too long, the answer is no longer sent in "chunked" mode, and no "content-length" is given either. The build service can't handle that currently. I'll add a workaround that makes it fetch at most 100 packages at a time.
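For context on why a response that is neither "chunked" nor carries a "content-length" is awkward: under HTTP/1.1 (RFC 7230, section 3.3.3) such a response body only ends when the server closes the connection, so a client that never considers that case waits forever. The sketch below is a simplified Python illustration of that rule (it ignores HEAD, 1xx, 204 and 304 responses) plus the batching idea of the workaround; body_framing and batches are hypothetical names, not bs_srcserver functions.

```python
def body_framing(headers):
    """Classify how an HTTP/1.1 response body is delimited (simplified RFC 7230 3.3.3).

    headers: dict mapping lower-case header names to values.
    """
    te = headers.get("transfer-encoding", "")
    codings = [c.strip().lower() for c in te.split(",") if c.strip()]
    if codings:
        # Transfer-Encoding overrides Content-Length; chunked must be the final coding.
        return "chunked" if codings[-1] == "chunked" else "until-close"
    if "content-length" in headers:
        return "content-length"
    # Neither header: the body ends only when the server closes the connection.
    return "until-close"


def batches(packages, size=100):
    """Split a package list into requests of at most `size` names each (keeps URLs short)."""
    return [packages[i:i + size] for i in range(0, len(packages), size)]
```

Batching bounds the URL length, which sidesteps the header problem described above rather than making the client handle the "until-close" case.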
https://bugzilla.novell.com/show_bug.cgi?id=404039 User martin.mohring@5etech.eu added comment https://bugzilla.novell.com/show_bug.cgi?id=404039#c13 --- Comment #13 from Martin Mohring <martin.mohring@5etech.eu> 2008-07-09 02:11:12 MDT --- I have tried out the new bugfix; it does not seem to help. I looked into the source code to see which lines you changed and tried the following: - remove lots of pkgs from the cache - reduce the number of pkgs fetched at a time to 10 to trigger lots of .cpio transfers. Result: hangup within less than 60 seconds. So there is another problem.
https://bugzilla.novell.com/show_bug.cgi?id=404039 User martin.mohring@5etech.eu added comment https://bugzilla.novell.com/show_bug.cgi?id=404039#c14 --- Comment #14 from Martin Mohring <martin.mohring@5etech.eu> 2008-07-09 02:18:07 MDT --- Another observation related to this bug: the mirror script "obs_mirror_project", which does a lot of file transfers, also hangs more than once while mirroring a distro like Fedora:8. I remember that there was a problem with the SSL buffer being overrun because packages were too large. The sizes involved were 8192 and 16384, and I don't remember where it was fixed.
https://bugzilla.novell.com/show_bug.cgi?id=404039 User martin.mohring@5etech.eu added comment https://bugzilla.novell.com/show_bug.cgi?id=404039#c15 --- Comment #15 from Martin Mohring <martin.mohring@5etech.eu> 2008-07-25 17:17:28 MDT --- Request to mls: can you help me change the code to reduce the number of concurrent cpio transfers? I want to reduce it to 1.
https://bugzilla.novell.com/show_bug.cgi?id=404039 User mls@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=404039#c16 --- Comment #16 from Michael Schröder <mls@novell.com> 2008-07-28 02:49:08 MDT --- Just assign some constant string instead of the md5sum to "$serialmd5" in bs_srcserver.
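Presumably "$serialmd5" is the key under which transfers are serialized (the lock files named by an md5 in remotecache/, mentioned in comment #3, suggest this), so a constant key makes every transfer wait for the same lock. The Python sketch below only illustrates that idea with in-process threading locks instead of lock files; serial_key, serial_lock and the constant "all-remote-transfers" are hypothetical and not taken from bs_srcserver.

```python
import hashlib
import threading

_locks = {}
_locks_guard = threading.Lock()


def serial_key(request_path, serialize_all=False):
    """Derive the serialization key for a transfer (hypothetical helper)."""
    if serialize_all:
        # Constant key: every transfer shares one lock, so at most one runs at a time.
        return "all-remote-transfers"
    # Per-request md5: identical requests queue up, different requests run in parallel.
    return hashlib.md5(request_path.encode()).hexdigest()


def serial_lock(key):
    """Return the lock shared by all transfers that use the same key."""
    with _locks_guard:
        return _locks.setdefault(key, threading.Lock())
```

A transfer would then run inside `with serial_lock(serial_key(path, serialize_all=True)):` to get the one-at-a-time behaviour requested in comment #15.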
https://bugzilla.novell.com/show_bug.cgi?id=404039 User martin.mohring@5etech.eu added comment https://bugzilla.novell.com/show_bug.cgi?id=404039#c17 --- Comment #17 from Martin Mohring <martin.mohring@5etech.eu> 2008-07-28 06:01:19 MDT --- I am currently testing with only one concurrent transfer (I cleared the cache beforehand). Another question to mls: what happens when just one segment is lost during the file transfer, or the last segment never arrives?
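On the question above: TCP retransmits lost segments, so a single lost segment is normally recovered transparently. The risky case is a peer that simply stops sending without closing the connection; a blocking read with no timeout then waits forever. The following Python sketch shows a read loop that fails loudly instead; read_exact, the 30-second default and the 16384-byte read size are illustrative assumptions, not OBS code or settings.

```python
import socket


def read_exact(sock, nbytes, timeout=30.0):
    """Read exactly nbytes from sock, raising instead of hanging if data stops arriving."""
    sock.settimeout(timeout)  # each recv() waits at most `timeout` seconds
    chunks = []
    remaining = nbytes
    while remaining > 0:
        try:
            data = sock.recv(min(remaining, 16384))
        except socket.timeout:
            raise TimeoutError(f"no data for {timeout}s, {remaining} bytes still missing") from None
        if not data:
            # Peer closed the connection before sending everything.
            raise EOFError(f"connection closed, {remaining} bytes still missing")
        chunks.append(data)
        remaining -= len(data)
    return b"".join(chunks)
```

With such a timeout, a stuck transfer would surface as an error that can be logged and retried, rather than leaving an "upload*.cpio" file behind indefinitely.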
https://bugzilla.novell.com/show_bug.cgi?id=404039 User martin.mohring@5etech.eu added comment https://bugzilla.novell.com/show_bug.cgi?id=404039#c18 --- Comment #18 from Martin Mohring <martin.mohring@5etech.eu> 2008-07-28 07:04:33 MDT --- Another question: can a broken cache cause the hangups?
https://bugzilla.novell.com/show_bug.cgi?id=404039 User aj@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=404039#c19 Andreas Jaeger <aj@novell.com> changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEEDINFO |NEW Info Provider|martin.mohring@5etech.eu | --- Comment #19 from Andreas Jaeger <aj@novell.com> 2008-10-24 05:30:07 MDT --- Is this issue solved?