Mailinglist Archive: opensuse-bugs (4258 mails)

[Bug 1008107] Potential XFS Kernel bug - _xfs_buf_find: Block out of range
  • From: bugzilla_noreply@xxxxxxxxxx
  • Date: Sun, 29 Jan 2017 19:44:51 +0000
  • Message-id: <bug-1008107-21960-bZeab5PKqE@http.bugzilla.suse.com/>
http://bugzilla.suse.com/show_bug.cgi?id=1008107
http://bugzilla.suse.com/show_bug.cgi?id=1008107#c8

--- Comment #8 from David Taylor <david@xxxxxxxxxxxxxxxxxxxxx> ---
(In reply to Luis Rodriguez from comment #6)
(In reply to David Taylor from comment #4)
I have not been able to reproduce the problem.

I left the machine as-is for over a month to see if it would happen again (I
took it out of production use and let it sit with a different IP address).

If this were easy to reproduce, I think we would have had many more reports
and would have fixed it by now. Since this seems to have been caused by log
rotation, I think the thing to do is to focus on stress-testing log rotation
to provoke possible mishaps.

I figure it is probably rare, given how long I've had this and other Leap 42.1
boxes (and prior 13.x) running without issue. I'm not sure I would characterize
it as an issue caused by log rotation, other than that it may simply have been
the thing that stepped on the mine. It certainly hasn't cropped up more than
this one time over the period that I've had this system running. FYI, my
calling it "production" is just a convenient operational term; this is a
lab/dev support machine, not a LOB-critical box.

After running cleanly during that period of time, I patched it

Stick to one kernel at a time, otherwise we cannot identify or fix things.

I get that, but please understand that I don't have spare hardware sitting
around (this is part of my work lab and I have to beg/borrow/steal to get what
I can). I was lucky to have some temporary slack on a Hyper-V box that I could
stand up to replace this one for a while. I had to return the Hyper-V
resources for another development effort, so given that the disk was not
showing damage and the problem did not recur, I put the box back into
service. Patching to the latest security fix is part of returning to service,
especially so given this is the proxy server. I posted this problem at the
behest of dcurtisfra over on the OpenSuSE forums; otherwise I would likely have
just moved on without posting here. Again, I'm just trying to help, but it was
over three months before the first queries on this came in (originally posted
11/2 last year). That's not a criticism, by the way; with too many things and
too few resources, not everything gets attention. I understand that very well
from 20+ years of experience in IT positions.



and put it
back into service. The problem has not cropped up since then. To be
honest, I'm not entirely certain what led up to that failure as it happened
at night during a quiet period. The only thing going on during that time
frame (for the most part) was logs being rolled by logrotate and logwatch
running reports as those run at 1AM.

:) Sounds like one thing to test against as I indicated in my last comment.

To correct the /var mounting issue, I had to flush the metadata. The fsck
hung otherwise. If there was something funky in it, it was destroyed in
attempting to get the volume back.

Oh well.

It is what it is. I wish I could have gotten more for you, but was trying to
get things back together.


I have the before and after syslog files saved off if they are of any use,
but otherwise I'm afraid I don't have much else I can provide at this point.

I think I've given enough info on what we can do with what we have. If
you are not up to trying really hard to reproduce it, I'm afraid this bug
should be closed, as we just don't have enough leads to address it. I will
make a note to try to draw up a program to test against this in the future.

I will see what I can do, as I mentioned in my response to your earlier
comments. Given the constraints on my time, it might be better to consider
closing it, and if I happen to run into it again, I'll post a new one and
reference this one.
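
For reference, the kind of log-rotation stress program mentioned above might
look roughly like the sketch below. It is only an illustration under stated
assumptions: the function names (rotate, stress), the file name stress.log,
and the rotate-and-gzip scheme are all made up here, it does not call
logrotate itself, and nothing in this bug confirms that this workload would
trigger the _xfs_buf_find error.

```python
import gzip
import os
import shutil
import tempfile


def rotate(path, keep=3):
    """Hypothetical rotation: shift path.N.gz up by one, gzip the live file
    to path.1.gz, then truncate the live file."""
    oldest = f"{path}.{keep}.gz"
    if os.path.exists(oldest):
        os.remove(oldest)
    # Shift older archives up, starting from the highest number.
    for i in range(keep - 1, 0, -1):
        src = f"{path}.{i}.gz"
        if os.path.exists(src):
            os.rename(src, f"{path}.{i + 1}.gz")
    with open(path, "rb") as fin, gzip.open(f"{path}.1.gz", "wb") as fout:
        shutil.copyfileobj(fin, fout)
    # Truncate the live log and force the change out to disk.
    with open(path, "wb") as f:
        f.flush()
        os.fsync(f.fileno())


def stress(directory, cycles=50, lines_per_cycle=200, keep=3):
    """Append to a log, fsync, and rotate it, repeated `cycles` times.
    Returns the sorted file names left in `directory`."""
    path = os.path.join(directory, "stress.log")
    for cycle in range(cycles):
        with open(path, "ab") as f:
            for n in range(lines_per_cycle):
                f.write(f"cycle={cycle} line={n}\n".encode())
            f.flush()
            os.fsync(f.fileno())
        rotate(path, keep)
    return sorted(os.listdir(directory))


if __name__ == "__main__":
    # Assumption: a temporary directory is used only so the sketch is safe to
    # run; a real test would pass a directory on a scratch XFS volume.
    with tempfile.TemporaryDirectory() as d:
        print(stress(d))
```

To exercise a real filesystem, stress() would be pointed at a directory on a
scratch XFS volume with a much larger cycle count, while watching the kernel
log for errors.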
