Re: [opensuse-factory] BTRFS and Tumbleweed: Is it really stable to be considered default?
Hi Mauricio, On Thu, 2016-09-15 at 10:48 -0300, Mauricio Barbosa wrote:
Did you check the snapshots that snapper has taken? Maybe that is the source of your "lack of space" problem. If it is, delete the ones that are not useful...
snapper list or snapper -c $YOUR_CONFIG list
And if your system is broken after a "zypper dup", why not use the amazing "snapper rollback" feature that the openSUSE folks provided to you?
Yes, actually I reinstalled the system and after just one day I started to see the problem again (with only 8 snapshots). It is weird, and so far no one has been able to give me either a good explanation of what is happening or a workaround until it is fixed. The good side is that yesterday I figured out one way to "live with it". I will create 50 files of 3 GiB each in a dummy directory. Hence, when I see ENOSPC, I just have to delete one of those files :) It is ugly, but I think it will work :D Best regards, Ronan Arraes -- To unsubscribe, e-mail: opensuse-factory+unsubscribe@opensuse.org To contact the owner, e-mail: opensuse-factory+owner@opensuse.org
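The ballast-file workaround described above could be sketched roughly as follows. This is only an illustration under assumptions: the helper names make_ballast and release_ballast, the ballast-NN file naming, and the directory used in the usage note are hypothetical and do not come from the thread. It also assumes that zero-filled files really occupy space, which may not hold on a Btrfs mounted with compression.

```shell
# Hypothetical helpers for the "ballast file" workaround; names are illustrative.

# make_ballast DIR COUNT SIZE_MIB
# Create COUNT zero-filled files of SIZE_MIB MiB each inside DIR.
make_ballast() {
    local dir=$1 count=$2 size_mib=$3 i=1
    mkdir -p "$dir" || return 1
    while [ "$i" -le "$count" ]; do
        head -c "$((size_mib * 1024 * 1024))" /dev/zero \
            > "$dir/ballast-$(printf '%02d' "$i")" || return 1
        i=$((i + 1))
    done
}

# release_ballast DIR
# Delete one ballast file to free space; fail if none is left.
release_ballast() {
    local dir=$1 f
    for f in "$dir"/ballast-*; do
        [ -e "$f" ] || return 1   # glob did not match: no ballast left
        rm -f -- "$f"
        return 0
    done
    return 1
}
```

With these helpers, `make_ballast /var/tmp/ballast 50 3072` would create the 50 files of 3 GiB each mentioned in the message (the path is an assumption), and `release_ballast /var/tmp/ballast` would remove one of them when ENOSPC shows up.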
On Thu, Sep 15, 2016 at 8:03 AM, Ronan Arraes Jardim Chagas <ronisbr@gmail.com> wrote:
Yes, actually I reinstalled the system and after just one day I started to see the problem again (with only 8 snapshots). It is weird, and so far no one has been able to give me either a good explanation of what is happening or a workaround until it is fixed.
Good is relative. The explanation thus far is that they're working on it, and that it is mainly due to a metadata accounting problem. They haven't figured out why it's happening, but Jeff reported on linux-btrfs@ that he can consistently hit it with the btrfs/022 xfstest. He and Josef are presumably sorting through this. File systems are really hard, not least because they're really non-deterministic. From the moment they're created they change state as they get used. So there is some combination of things that's managed to get you into this state, and now you're stuck in it with this instance of the file system. That it's happened again after the file system was created anew is pretty damn weird from my perspective, because I'd expect others to hit it also. But I don't think it's just you, or Jeff wouldn't be hitting it consistently with btrfs/022. So basically, you're just going to have to wait until they have something to report and a fix. And/or reinstall and use XFS or something if you can't wait. *shrug* that is an explanation. It's not a great one for you, but that's where things are at. -- Chris Murphy
Hi Chris, On Thu, 2016-09-15 at 10:37 -0600, Chris Murphy wrote:
So basically, you're just going to have to wait until they have something to report and a fix. And/or reinstall and use XFS or something if you can't wait. *shrug* that is an explanation. It's not a great one for you, but that's where things are at.
Yes, that is precisely why I said that no good explanation was provided yet :) As of now, I am not even sure how safe the data in the BTRFS partition is. Since I don't know what is happening and what is causing it, how can I be 100% sure that my data will be safe? Hence, all my sensitive data was moved to EXT4. I mentioned a workaround on the btrfs mailing list that I think can help me, so I will continue to use BTRFS for root. Actually, I am only using BTRFS now because I embrace FOSS and I am an openSUSE member. Hence, I see it as my responsibility to help fix it. Unfortunately, I do not have the technical skills to dig into the source code, but I can help by sending these huge e-mails with all the debug data I can collect :) In the meantime, I think it is wise to verify how stable BTRFS is in Tumbleweed so that other users do not hit the same problem, which can be very bad if you don't know what is happening or what to do to fix it. This is why I started this thread. Best regards, Ronan Arraes
On Thu, Sep 15, 2016 at 11:05 AM, Ronan Arraes Jardim Chagas <ronisbr@gmail.com> wrote:
Yes, that is precisely why I said that no good explanation was provided yet :)
As of now, I am not even sure how safe the data in the BTRFS partition is. Since I don't know what is happening and what is causing it, how can I be 100% sure that my data will be safe? Hence, all my sensitive data was moved to EXT4. I mentioned a workaround on the btrfs mailing list that I think can help me, so I will continue to use BTRFS for root.
Seeing as you're in an edge case, the file system could implode at any time. Chances are it'd just go read only and you'd still be able to get your data off the volume, even if the file system is broken beyond repair.
Actually, I am only using BTRFS now because I embrace FOSS and I am an openSUSE member. Hence, I see it as my responsibility to help fix it. Unfortunately, I do not have the technical skills to dig into the source code, but I can help by sending these huge e-mails with all the debug data I can collect :)
I think your contribution has been helpful.
In the meantime, I think it is wise to verify how stable BTRFS is in Tumbleweed so that other users do not hit the same problem, which can be very bad if you don't know what is happening or what to do to fix it. This is why I started this thread.
It's important not to extrapolate from one's own experience. If I were to do the same thing based on my experience with Btrfs, I'd say make it the default everywhere, because I've had so few problems (that weren't deliberately self-created for the purpose of testing). The truth is it depends on the use case and user preference. In the unlikely event the file system has a problem it can't fix at mount time, fixing Btrfs quickly becomes non-trivial, non-obvious, and not for a typical user. Even in the current version of btrfs-progs the fsck with --repair is listed as do not use except under advisement by a developer, and as being dangerous. And then there are a bunch of other repairs, zero-log, chunk-recover, super-recover, init-csum-tree, init-extent-tree, and so on. It's hard to know what to do and in what order, more difficult than any other file system on Linux. So it goes from easy to WTF very quickly. -- Chris Murphy
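For orientation, the repair names listed above belong to different btrfs subcommands and options. The sketch below only maps each name to the command line it is part of; it runs nothing and implies no order in which to try them. The helper name btrfs_repair_cmd is hypothetical, and the device argument is left out on purpose.

```shell
# Hypothetical lookup helper: prints the btrfs-progs command line that a
# given repair name belongs to. It does not execute anything.
btrfs_repair_cmd() {
    case "$1" in
        repair)           echo "btrfs check --repair" ;;
        zero-log)         echo "btrfs rescue zero-log" ;;
        chunk-recover)    echo "btrfs rescue chunk-recover" ;;
        super-recover)    echo "btrfs rescue super-recover" ;;
        init-csum-tree)   echo "btrfs check --init-csum-tree" ;;
        init-extent-tree) echo "btrfs check --init-extent-tree" ;;
        *)                echo "unknown repair: $1" >&2; return 1 ;;
    esac
}
```

For example, `btrfs_repair_cmd zero-log` prints `btrfs rescue zero-log`. Which of these is appropriate for a given failure is exactly the open question raised in the message, so the mapping is a reading aid rather than a recovery procedure.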
On 15 September 2016 at 20:41, Chris Murphy <lists@colorremedies.com> wrote:
Even in the current version of btrfs-progs the fsck with --repair is listed as do not use except under advisement by a developer, and as being dangerous. And then there are a bunch of other repairs, zero-log, chunk-recover, super-recover, init-csum-tree, init-extent-tree, and so on. It's hard to know what to do and in what order, more difficult than any other file system on Linux. So it goes from easy to WTF very quickly.
You mean unless you read the documentation? https://btrfs.wiki.kernel.org/index.php/Btrfsck
On Thu, Sep 15, 2016 at 12:49 PM, Richard Brown <RBrownCCB@opensuse.org> wrote:
You mean unless you read the documentation?
Easy = you mount it and it fixes itself. WTF = you go read multiple pages of documentation, and also ask on the list, how to fix your problem, because which tool and which options you use depends on the problem you have. The btrfs check --repair tool is very clearly not intended for regular users, not least because it's still marked as do not use / dangerous in three places in all versions of the documentation. No other file system has an fsck that's explicitly not fail safe. People continue to lose their file systems due to bugs that creep into the fsck tool. There is no in between. As soon as you have a Btrfs that will not mount, or only mounts read-only, then unlike all other file systems, the fsck is the last choice on the list, not the first. Having to read documentation to learn that fact is the opposite of easy. Easy is "oh just use the fsck" like every other file system. Nope, not Btrfs, that has a good chance of blowing things up. And BTW a conversation about this very issue has been going on over at linux-btrfs@ for the last week. It's widely regarded as a problem. -- Chris Murphy
On Thu, Sep 15, 2016 at 1:12 PM, Chris Murphy <lists@colorremedies.com> wrote:
People continue to lose their file systems due to bugs that creep into the fsck tool.
Recent example, guess who reported the bug? https://bugzilla.kernel.org/show_bug.cgi?id=155791#c22 -- Chris Murphy
participants (3)
- Chris Murphy
- Richard Brown
- Ronan Arraes Jardim Chagas