[Bug 1079747] New: Kernel 4.15 seems often to stall the OBS workers
http://bugzilla.opensuse.org/show_bug.cgi?id=1079747

            Bug ID: 1079747
           Summary: Kernel 4.15 seems often to stall the OBS workers
    Classification: openSUSE
           Product: openSUSE Tumbleweed
           Version: Current
          Hardware: Other
                OS: Other
            Status: NEW
          Severity: Normal
          Priority: P5 - None
         Component: Kernel
          Assignee: kernel-maintainers@forge.provo.novell.com
          Reporter: dimstar@opensuse.org
        QA Contact: qa-bugs@suse.de
          Found By: ---
           Blocker: ---

Since the upgrade to Kernel 4.15, I see a much increased number of 'stalled'
OBS workers.

The stall happens in various packages, most prominently llvm5, ceph,
libreoffice and java-9-openjdk. Most of the time, they are 'fine' after
several attempts.

There seems to be one package reliably reproducing it: tar on i586

https://build.opensuse.org/package/live_build_log/openSUSE:Factory/tar/stand...

I'm CCing Adrian (OBS Admin) here, as he will likely be the only person that
can provide traces from the VM at the time this happens.

-- 
You are receiving this mail because:
You are on the CC list for the bug.

http://bugzilla.opensuse.org/show_bug.cgi?id=1079747

Dominique Leuenberger <dimstar@opensuse.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |adrian@suse.com

http://bugzilla.opensuse.org/show_bug.cgi?id=1079747#c3

Adrian Schröter <adrian@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
              Flags|needinfo?(adrian@suse.com)  |

--- Comment #3 from Adrian Schröter <adrian@suse.com> ---
build.opensuse.org is ready for handling some sysrq now.

You can use the latest osc (unstable build) from the devel:tools:scm project
and run e.g.:

  osc sendsysrq $project $package $repo $arch 9
  osc sendsysrq $project $package $repo $arch t
  osc sendsysrq $project $package $repo $arch w

to trigger SysRq requests (note, only a subset is allowed via a whitelist,
tell me if you miss one).

The alternative is to do it via the API, e.g.:

  osc api -X POST '/build/science:unstable?cmd=sendsysrq&arch=x86_64&sysrq=9&repository=openSUSE_Tumbleweed&package=FreeCAD'

or use your favourite tool to speak HTTP...

/me hopes that I am not the only one anymore who can debug guest kernels :)
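
For anyone scripting the API variant, here is a minimal sketch that only builds the request path in the same shape as the `osc api` example in comment #3. The helper name `sendsysrq_path` and the `EXAMPLE_KEYS` tuple are made up for illustration; the query parameters are copied from that example, and the real whitelist on the server may differ from the three keys listed. It does not send anything itself.

```python
from urllib.parse import quote, urlencode

# Keys mentioned in comment #3; the server-side whitelist may differ.
EXAMPLE_KEYS = ("9", "t", "w")


def sendsysrq_path(project, package, repository, arch, sysrq):
    """Build the API path for a sendsysrq POST (hypothetical helper).

    The parameter order follows the example URL from comment #3.
    """
    query = urlencode({
        "cmd": "sendsysrq",
        "arch": arch,
        "sysrq": sysrq,
        "repository": repository,
        "package": package,
    })
    # Keep ':' literal in the project name, as in the example URL.
    return "/build/{}?{}".format(quote(project, safe=":"), query)


if __name__ == "__main__":
    for key in EXAMPLE_KEYS:
        print(sendsysrq_path("science:unstable", "FreeCAD",
                             "openSUSE_Tumbleweed", "x86_64", key))
```

Each printed path could then be passed to `osc api -X POST '<path>'` or to any other HTTP client that is authenticated against the build service.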

http://bugzilla.opensuse.org/show_bug.cgi?id=1079747#c4

--- Comment #4 from Adrian Schröter <adrian@suse.com> ---
Ah, please note that this will only work when the build has started within the
last 15 minutes, otherwise it will still use a too old worker...

http://bugzilla.opensuse.org/show_bug.cgi?id=1079747#c5

--- Comment #5 from Dominique Leuenberger <dimstar@opensuse.org> ---
I gave this a try on Staging:B/tar/standard/i586, and the log spit out:

[  598s] 141: storing long sparse file names ok
[17760s] 142: listing sparse files bigger than 2^33 B [17744.267539] sysrq: SysRq : This sysrq operation is disabled.
Send sysrq 9 to Job
[17796s] [17780.647309] sysrq: SysRq : Show State
[17796s] [17780.648108] task PC stack pid father
[17796s] [17780.649201] build S 0 1 0 0x00000000
[17796s] [17780.650363] Call Trace:
[17796s] [17780.650885] ? __schedule+0x2a5/0x920
[17796s] [17780.651642] ? schedule+0x2d/0x80
[17796s] [17780.652332] ? do_wait+0x1af/0x220
[17796s] [17780.653038] ? kernel_wait4+0x70/0x110
[17796s] [17780.653810] ? task_stopped_code+0x60/0x60
[17796s] [17780.654666] ? do_int80_syscall_32+0x51/0x100
[17796s] [17780.655560] ? entry_INT80_32+0x36/0x36
[17796s] [17780.656351] kthreadd S 0 2 0 0x80000000
[17796s] [17780.658023] Call Trace:
[17796s] [17780.658555] ? __schedule+0x2a5/0x920
[17796s] [17780.659312] ? schedule+0x2d/0x80
[17796s] [17780.660000] ? kthreadd+0x195/0x1b0
[17796s] [17780.660720] ? kthread_create_on_cpu+0xa0/0xa0
[17796s] [17780.661635] ? ret_from_fork+0x2e/0x38
[17796s] [17780.662447] kworker/0:0H I 0 4 2 0x80000000
[17796s] [17780.663578] Call Trace:
[17796s] [17780.664098] ? __schedule+0x2a5/0x920
[17796s] [17780.664871] ? schedule+0x2d/0x80
[17796s] [17780.665569] ? worker_thread+0xa6/0x400
[17796s] [17780.666402] ? kthread+0xf0/0x110
[17796s] [17780.667096] ? process_one_work+0x3d0/0x3d0
[17796s] [17780.668750] ? kthread_create_worker_on_cpu+0x20/0x20
[17796s] [17780.669795] ? ret_from_fork+0x2e/0x38
[17796s] [17780.670607] mm_percpu_wq I 0 6 2 0x80000000
[17796s] [17780.671743] Call Trace:
[17796s] [17780.672273] ? __schedule+0x2a5/0x920
[17796s] [17780.673043] ? schedule+0x2d/0x80
[17796s] [17780.673743] ? rescuer_thread+0x2ce/0x310
[17796s] [17780.674608] ? preempt_schedule_common+0x11/0x30
[17796s] [17780.675567] ? kthread+0xf0/0x110
[17796s] [17780.676262] ? cancel_delayed_work_sync+0x20/0x20
[17796s] [17780.677239] ? kthread_create_worker_on_cpu+0x20/0x20
[17796s] [17780.678574] ? ret_from_fork+0x2e/0x38
[17796s] [17780.679351] ksoftirqd/0 S 0 7 2 0x80000000
[17796s] [17780.680475] Call Trace:
[17796s] [17780.680998] ? __schedule+0x2a5/0x920
[17796s] [17780.681754] ? schedule+0x2d/0x80
[17796s] [17780.682459] ? smpboot_thread_fn+0x1b4/0x200
[17796s] [17780.683348] ? kthread+0xf0/0x110
[17796s] [17780.684044] ? sort_range+0x20/0x20
[17796s] [17780.684779] ? kthread_create_worker_on_cpu+0x20/0x20
[17796s] [17780.685825] ? ret_from_fork+0x2e/0x38
[17796s] [17780.686917] rcu_preempt I 0 8 2 0x80000000
[...]
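
A "Show State" dump like the one in comment #5 gets long quickly, so a rough way to skim it is to pull out only the per-task lines and count the states. The sketch below is one possible approach, not an OBS or kernel tool: the regular expression and the names `TASK_LINE` and `parse_tasks` are invented here, and the column meaning (name, state, stack, pid, father, flags) is an assumption read off the header line and the rows in the log. It expects one log message per line, and task names containing spaces would not match.

```python
import re
from collections import Counter

# Assumed row layout, based on the rows in comment #5:
#   [<elapsed>s] [<kernel time>] <name> <state> <stack> <pid> <father> <flags>
TASK_LINE = re.compile(
    r"^\[\s*\d+s\]\s+\[\s*\d+\.\d+\]\s+"
    r"(?P<name>\S+)\s+(?P<state>[A-Z])\s+(?P<stack>\d+)\s+"
    r"(?P<pid>\d+)\s+(?P<father>\d+)\s+(?P<flags>0x[0-9a-f]+)\s*$"
)


def parse_tasks(log_text):
    """Return (name, state, pid) for every task row found in the log text."""
    tasks = []
    for line in log_text.splitlines():
        match = TASK_LINE.match(line.strip())
        if match:
            tasks.append((match["name"], match["state"], int(match["pid"])))
    return tasks


# A few lines taken from the log in comment #5.
SAMPLE = """\
[17796s] [17780.647309] sysrq: SysRq : Show State
[17796s] [17780.648108] task PC stack pid father
[17796s] [17780.649201] build S 0 1 0 0x00000000
[17796s] [17780.650363] Call Trace:
[17796s] [17780.650885] ? __schedule+0x2a5/0x920
[17796s] [17780.656351] kthreadd S 0 2 0 0x80000000
[17796s] [17780.662447] kworker/0:0H I 0 4 2 0x80000000
"""

if __name__ == "__main__":
    tasks = parse_tasks(SAMPLE)
    for name, state, pid in tasks:
        print(pid, state, name)
    # Tally how many tasks are in each state.
    print(Counter(state for _, state, _ in tasks))
```

For a stall, tasks in state D (uninterruptible sleep) would probably be the first ones to look at; none appear in the excerpt quoted here.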

http://bugzilla.opensuse.org/show_bug.cgi?id=1079747#c6

--- Comment #6 from Dominique Leuenberger <dimstar@opensuse.org> ---
Created attachment 759586
  --> http://bugzilla.opensuse.org/attachment.cgi?id=759586&action=edit
Complete log from OBS for Staging:B/tar/standard/i586

http://bugzilla.opensuse.org/show_bug.cgi?id=1079747#c9

--- Comment #9 from Dominique Leuenberger <dimstar@opensuse.org> ---
https://build.opensuse.org/package/live_build_log/openSUSE:Factory:Staging:G...

The kernel stall case is less promising:

[ 7976s] RPMLINT report:
[ 7976s] ===============
Send sysrq 9 to Job
Send sysrq t to Job
Send sysrq w to Job
[13451s] qemu-system-x86_64: terminating on signal 15 from pid 12851 ()
[13451s] qemu-system-x86_64: Failed to unlink socket /var/cache/obs/worker/root_6/root.monitor: Permission denied
Killed Job
[13451s] ### VM INTERACTION END ###
[13451s] No buildstatus set, either the base system is broken (kernel/initrd/udev/glibc/bash/perl)
[13451s] or the build host has a kernel or hardware problem...

There were no further reactions to the sysrq calls.