[Bug 1104331] New: OSD Segmentation fault in thread_name:safe_timer
http://bugzilla.suse.com/show_bug.cgi?id=1104331

Bug ID: 1104331
Summary: OSD Segmentation fault in thread_name:safe_timer
Classification: openSUSE
Product: openSUSE Distribution
Version: Leap 42.3
Hardware: Other
OS: Other
Status: NEW
Severity: Normal
Priority: P5 - None
Component: Other
Assignee: bnc-team-screening@forge.provo.novell.com
Reporter: eugen.block@suse.com
QA Contact: qa-bugs@suse.de
Found By: ---
Blocker: ---

We're running a Ceph Luminous cluster on Leap 42.3:

ceph1:~ # rpm -qi ceph-common
Name        : ceph-common
Version     : 12.2.5+git.1524775272.5e7ea8cf03
Release     : 2.1
Architecture: x86_64
Install Date: Thu May 24 15:14:56 2018
Group       : System/Filesystems
Size        : 18064537
License     : LGPL-2.1 and CC-BY-SA-3.0 and GPL-2.0 and BSL-1.0 and BSD-3-Clause and MIT
Signature   : RSA/SHA256, Fri Apr 27 01:48:28 2018, Key ID 98c97fe7324e6311
Source RPM  : ceph-12.2.5+git.1524775272.5e7ea8cf03-2.1.src.rpm
Build Date  : Fri Apr 27 01:10:29 2018
Build Host  : lamb72
Relocations : (not relocatable)
Vendor      : obs://build.opensuse.org/filesystems

For the past couple of days we have suddenly been experiencing random segfaults that cause OSDs to fail. We didn't update any packages on our Ceph servers; they ran stable until these segfaults started. Sometimes the OSDs recover by themselves, sometimes a whole host goes down and leaves the cluster degraded.

The description in [2] matches our issue exactly ([1] is a duplicate of [2]). I'll spare you the details since the upstream bug contains plenty of information. There seems to be a backport [3] for Luminous, but it is still in progress even though the change has been reviewed.

I just wanted to raise awareness of this issue since it affects our production cluster, which had been running stable for months. We'd appreciate it if updated packages fixing this issue could be made available soon. Please let me know if any further information is needed.
[1] https://tracker.ceph.com/issues/23431
[2] https://tracker.ceph.com/issues/23352
[3] https://tracker.ceph.com/issues/26871

--
You are receiving this mail because:
You are on the CC list for the bug.

Eugen Block <eugen.block@suse.com> changed:

What     |Removed |Added
----------------------------------------------------------------------------
CC       |        |jens.mozdzen@suse.com,
         |        |ncutler@suse.com
Severity |Normal  |Critical

Eugen Block <eugen.block@suse.com> changed:

What     |Removed  |Added
----------------------------------------------------------------------------
Severity |Critical |Major

http://bugzilla.suse.com/show_bug.cgi?id=1104331#c2

Tim Serong <tserong@suse.com> changed:

What  |Removed                     |Added
----------------------------------------------------------------------------
Flags |needinfo?(tserong@suse.com) |

--- Comment #2 from Tim Serong <tserong@suse.com> ---
> Tim, could you please review that PR?

LGTM

SMASH SMASH <smash_bz@suse.de> changed:

What       |Removed |Added
----------------------------------------------------------------------------
Whiteboard |        |maint:planned:update

http://bugzilla.suse.com/show_bug.cgi?id=1104331#c10

Eugen Block <eugen.block@suse.com> changed:

What  |Removed                         |Added
----------------------------------------------------------------------------
Flags |needinfo?(eugen.block@suse.com) |

--- Comment #10 from Eugen Block <eugen.block@suse.com> ---
We updated the rest of the cluster to 12.2.7 more than three weeks ago and haven't seen any OSD segfaults since then, so I can confirm that the issue has been resolved. Thanks for your help!

http://bugzilla.suse.com/show_bug.cgi?id=1104331#c11

--- Comment #11 from Swamp Workflow Management <swamp@suse.de> ---
SUSE-RU-2018:2974-1: An update that has four recommended fixes can now be installed.

Category: recommended (moderate)
Bug References: 1100101,1104331,1105251,1107857
CVE References:
Sources used:
SUSE Linux Enterprise Software Development Kit 12-SP3 (src): ceph-12.2.8+git.1536505967.080f2248ff-2.15.1
SUSE Linux Enterprise Server 12-SP3 (src): ceph-12.2.8+git.1536505967.080f2248ff-2.15.1
SUSE Linux Enterprise Desktop 12-SP3 (src): ceph-12.2.8+git.1536505967.080f2248ff-2.15.1
SUSE Enterprise Storage 5 (src): ceph-12.2.8+git.1536505967.080f2248ff-2.15.1
SUSE CaaS Platform ALL (src): ceph-12.2.8+git.1536505967.080f2248ff-2.15.1
SUSE CaaS Platform 3.0 (src): ceph-12.2.8+git.1536505967.080f2248ff-2.15.1

http://bugzilla.suse.com/show_bug.cgi?id=1104331#c12

--- Comment #12 from Swamp Workflow Management <swamp@suse.de> ---
openSUSE-RU-2018:3034-1: An update that has four recommended fixes can now be installed.

Category: recommended (moderate)
Bug References: 1100101,1104331,1105251,1107857
CVE References:
Sources used:
openSUSE Leap 42.3 (src): ceph-12.2.8+git.1536505967.080f2248ff-15.1, ceph-test-12.2.8+git.1536505967.080f2248ff-15.1

Swamp Workflow Management <swamp@suse.de> changed:

What       |Removed              |Added
----------------------------------------------------------------------------
Whiteboard |maint:planned:update |

Maintenance Robot <maint-coord+maintenance_robot@suse.de> changed:

What       |Removed |Added
----------------------------------------------------------------------------
Whiteboard |        |ibs:running:14612:important

Maintenance Robot <maint-coord+maintenance_robot@suse.de> changed:

What       |Removed                     |Added
----------------------------------------------------------------------------
Whiteboard |ibs:running:14612:important |