Comment # 11 on bug 1147412 from
(In reply to Coly Li from comment #10)
> (In reply to Coly Li from comment #8)
> > I found a bug in the bcache btree code which may cause a dirty btree node to
> > lack journal protection across a power failure. The result might be an
> > inconsistent and broken btree node, triggering a panic similar to the one
> > reported here.
> > 
> > Now I am working on a fix and will test it.
> > 
> > Coly Li
> > 
> > P.S the fix looks like this,
> > 
> > commit d48baef4543246ef910b262959ae89c5a6d197f7
> > Author: Coly Li <colyli@suse.de>
> > Date:   Wed Sep 25 22:16:33 2019 +0800
> > 
> >     bcache: fix fifo index swapping condition in journal_pin_cmp()
> > 
> 
> This patch is invalid; the original code is correct. So far I haven't found any
> suspicious place in the code.
> 
> Now I am doing a bcache backport for a series of fixes; let's see whether these
> fixes are helpful.

My guess is that the in-memory btree node was not flushed in time, and the
on-SSD btree node got corrupted. This is just a guess; let me explain my
reasoning.

Before v5.3 there was a problem: when I/O was busy, a race could happen while
flushing a dirty btree node onto the SSD. If the race was hit, the result was
undefined behavior.
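
To make the race more concrete, below is a minimal userspace sketch. It is NOT
the actual bcache code; every name in it (fake_node, flush_node, journal_flusher
and so on) is invented for illustration. It only shows the check-then-write
pattern that can go wrong when two paths flush the same dirty node without
coordination:

/*
 * Minimal userspace sketch, NOT the actual bcache code. All names are
 * invented for illustration. Build with: gcc -pthread race_sketch.c
 */
#include <pthread.h>
#include <stdio.h>
#include <string.h>

struct fake_node {
	char disk_copy[32];	/* simulated on-SSD content */
	char mem_copy[32];	/* simulated in-memory dirty content */
	int  dirty;
};

static struct fake_node node = { .dirty = 1 };

/* Racy flush: the dirty check and the write are not serialized. */
static void flush_node(struct fake_node *n, const char *who)
{
	if (!n->dirty)
		return;
	/* Window: the other flusher can pass the dirty check here too,
	 * so both end up writing, in an unpredictable order. */
	snprintf(n->mem_copy, sizeof(n->mem_copy), "written-by-%s", who);
	memcpy(n->disk_copy, n->mem_copy, sizeof(n->disk_copy));
	n->dirty = 0;
}

static void *journal_flusher(void *arg)
{
	(void)arg;
	flush_node(&node, "journal");
	return NULL;
}

static void *writeback_flusher(void *arg)
{
	(void)arg;
	flush_node(&node, "writeback");
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	pthread_create(&a, NULL, journal_flusher, NULL);
	pthread_create(&b, NULL, writeback_flusher, NULL);
	pthread_join(a, NULL);
	pthread_join(b, NULL);

	/* Which writer "wins" depends on scheduling; with real, larger
	 * I/O the on-disk copy could even be a torn mix of both. */
	printf("on-disk copy: %s\n", node.disk_copy);
	return 0;
}

With the dirty check and the write serialized (for example a per-node lock or a
"write in flight" flag held across both steps), the two flushers could not both
act on the same node at once.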

This bug was fixed by the following two patches:
commit 91be66e1318f ("bcache: performance improvement for btree_flush_write()")
commit 2aa8c529387c ("bcache: avoid unnecessary btree nodes flushing in
btree_flush_write()")

My suggestion is to back up the data, or at least make sure the data on the
backing device is consistent. Then update to the latest Tumbleweed kernel or a
recent Linux stable kernel, and re-make the bcache devices.

This is only my guess; so far it is the only related clue that comes to mind.

Coly Li

