Bug ID 1104331
Summary OSD Segmentation fault in thread_name:safe_timer
Classification openSUSE
Product openSUSE Distribution
Version Leap 42.3
Hardware Other
OS Other
Status NEW
Severity Normal
Priority P5 - None
Component Other
Assignee bnc-team-screening@forge.provo.novell.com
Reporter eugen.block@suse.com
QA Contact qa-bugs@suse.de
Found By ---
Blocker ---

We're running a Ceph Luminous Cluster on Leap 42.3:

ceph1:~ # rpm -qi ceph-common 
Name        : ceph-common
Version     : 12.2.5+git.1524775272.5e7ea8cf03
Release     : 2.1
Architecture: x86_64
Install Date: Thu May 24 15:14:56 2018
Group       : System/Filesystems
Size        : 18064537
License     : LGPL-2.1 and CC-BY-SA-3.0 and GPL-2.0 and BSL-1.0 and BSD-3-Clause and MIT
Signature   : RSA/SHA256, Fri Apr 27 01:48:28 2018, Key ID 98c97fe7324e6311
Source RPM  : ceph-12.2.5+git.1524775272.5e7ea8cf03-2.1.src.rpm
Build Date  : Fri Apr 27 01:10:29 2018
Build Host  : lamb72
Relocations : (not relocatable)
Vendor      : obs://build.opensuse.org/filesystems

For the past few days we have been experiencing random segfaults that cause
OSDs to fail. We didn't update any packages on our Ceph servers; they ran
stably until these segfaults started.
Sometimes the OSDs recover by themselves, sometimes a whole host goes down and
leaves a degraded cluster. The description in [2] sounds exactly like our issue
([1] is a duplicate of [2]). I'll spare you the details since the upstream bug
has lots of information.
There seems to be a backport [3] for Luminous, but it's still in progress
although the change has been reviewed. I wanted to raise awareness of this
issue because it affects our production cluster, which had run stably for
months. We'd appreciate it if updated packages could be made available soon
to fix this issue.
Please let me know if any further information is needed.

[1] https://tracker.ceph.com/issues/23431
[2] https://tracker.ceph.com/issues/23352
[3] https://tracker.ceph.com/issues/26871

