Comment # 52 on bug 1063638 from Gabor Katona

(In reply to Oliver Kurz from comment #49)
> I work as a QA engineer at SUSE and try to help with resolving this bug.
> Currently I have the challenge to find a clear reproducer. So it would help
> very much if we find one scenario which we can automate the failure
> reproduction. Providing this to the development teams could help to fix the
> issue faster. With the help of openQA we can already automate a lot which
> are very realistic scenarios but I would appreciate some help now :)
> 
...
> Yes but only when we identified that the issues are actually different or at
> least the way how to reproduce are different. I would really appreciate if
> you could provide steps to reproduce this issue more easily.

Actually, the development team has two really important tasks, which could be
split into two bugs.

The first is to solve this CRITICAL bug. Rendering a system unusable (YES,
UNUSABLE) for several hours is more than critical. It is as if someone came
and took the computer away for a few hours. No, a restart does not help: the
balancing continues in the emergency state, and you additionally risk data
loss.

The second is just as important but more general. A fundamental system
component like a file system should NEVER eat up the CPU or render the system
unusable in any other way. Measures should be taken to avoid such scenarios
completely. Bugs come and go, but a file system should be coded so that it
cannot make the system unusable through 100% CPU usage. It should detect when
a process, a subcomponent, or anything else gets stuck, eats the CPU, etc.

BTRFS is currently experimental; the sooner you accept that, the faster you
can provide a solution: SKIP BTRFS for openSUSE.