[Bug 1052419] New: nfs-kernel-server and Btrfs should not be used together
http://bugzilla.opensuse.org/show_bug.cgi?id=1052419

            Bug ID: 1052419
           Summary: nfs-kernel-server and Btrfs should not be used together
    Classification: openSUSE
           Product: openSUSE Distribution
           Version: Leap 42.3
          Hardware: Other
                OS: openSUSE 42.3
            Status: NEW
          Severity: Major
          Priority: P5 - None
         Component: Kernel
          Assignee: kernel-maintainers@forge.provo.novell.com
          Reporter: pg@suse.for.sabi.co.uk
        QA Contact: qa-bugs@suse.de
          Found By: ---
           Blocker: ---

* Btrfs does not compute checksums on write in the case of O_DIRECT IO, and does not provide "stable writes" either, because they are enabled only through the regular IO mechanism.

* The 'nfs-kernel-server' does not use the regular IO mechanism either, so it has the same issues as O_DIRECT and should not be used to export Btrfs filesystems unless for read-only mode.

* The 'nfs-ganesha' server is simpler and runs in user mode, and does not have these problems with Btrfs, but AFAIK it is not part of openSUSE.

* Btrfs is the default installation filesystem for openSUSE, a choice that I think is very good ('bcachefs' might be soon a better choice).

The overall effect is not that data will be corrupted (usually, even if the lack of "stable writes" is an issue) but that checksums will be missing on data written via 'nfs-kernel-server' and this makes the system less resilient and causes baffling warnings to appear in logs.

My recommendation is to package the 'nfs-ganesha' server and use it as the default or only NFS server for openSUSE.

https://btrfs.wiki.kernel.org/index.php/Gotchas#Direct_IO_including_NFS_acce...

-- 
You are receiving this mail because:
You are on the CC list for the bug.
http://bugzilla.opensuse.org/show_bug.cgi?id=1052419
http://bugzilla.opensuse.org/show_bug.cgi?id=1052419#c1

Peter Grandi <pg@suse.for.sabi.co.uk> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P5 - None                   |P2 - High

--- Comment #1 from Peter Grandi <pg@suse.for.sabi.co.uk> ---
The issue may also happen with iSCSI.

Additional link as to the issue and suggested 'nfs-ganesha' configuration:

http://www.sabi.co.uk/blog/17-one.html?170424#170424
http://bugzilla.opensuse.org/show_bug.cgi?id=1052419
http://bugzilla.opensuse.org/show_bug.cgi?id=1052419#c2

Martin Pluskal <mpluskal@suse.com> changed:

           What    |Removed                                   |Added
----------------------------------------------------------------------------
                 CC|                                          |mpluskal@suse.com
           Assignee|kernel-maintainers@forge.provo.novell.com |pg@suse.for.sabi.co.uk

--- Comment #2 from Martin Pluskal <mpluskal@suse.com> ---
As you are adjusting priorities I assume that you are working on bringing nfs-ganesha to openSUSE
http://bugzilla.opensuse.org/show_bug.cgi?id=1052419

Jeff Mahoney <jeffm@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|nfs-kernel-server and Btrfs should not be used together |[opinion] nfs-kernel-server and Btrfs should not be used together
http://bugzilla.opensuse.org/show_bug.cgi?id=1052419

Jeff Mahoney <jeffm@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jeffm@suse.com
http://bugzilla.opensuse.org/show_bug.cgi?id=1052419
http://bugzilla.opensuse.org/show_bug.cgi?id=1052419#c3

--- Comment #3 from Jeff Mahoney <jeffm@suse.com> ---
And committing to maintain it afterward, of course.
http://bugzilla.opensuse.org/show_bug.cgi?id=1052419
http://bugzilla.opensuse.org/show_bug.cgi?id=1052419#c4

Jeff Mahoney <jeffm@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |IN_PROGRESS
                 CC|                            |pg@suse.for.sabi.co.uk
              Flags|                            |needinfo?(pg@suse.for.sabi.co.uk)

--- Comment #4 from Jeff Mahoney <jeffm@suse.com> ---
(In reply to Peter Grandi from comment #0)
> * Btrfs does not compute checksums on write in the case of O_DIRECT IO, and does not provide "stable writes" either, because they are enabled only through the regular IO mechanism.
> * The 'nfs-kernel-server' does not use the regular IO mechanism either, so it has the same issues as O_DIRECT and should not be used to export Btrfs filesystems unless for read-only mode.
What is your analysis for this? The only reference you've provided is the btrfs wiki that you've edited yourself to reflect this position.
> The overall effect is not that data will be corrupted (usually, even if the lack of "stable writes" is an issue) but that checksums will be missing on data written via 'nfs-kernel-server' and this makes the system less resilient and causes baffling warnings to appear in logs.
Reading through the NFSD code, it's pretty clear that it *doesn't* use direct I/O or anything like it. There are two write paths in btrfs: direct i/o or not. For it to be direct i/o, O_DIRECT needs to be set when the file is opened. NFSD doesn't do that, even if the remote client does. It goes through the buffered write path, that just doesn't happen to start at sys_write because it *shouldn't* since it's internal I/O. The pages are properly copied off via __btrfs_buffered_write well before any I/O is initiated or CRCs are computed.

The case where racing direct i/o threads could potentially introduce a similar issue may exist, but that is a *very* specific case that should not be extrapolated to include "all NFS writes" without well-documented and correct analysis.

In any case, even if the issue was as you described, the solution would be to correct the spurious warnings, not exchange the well-supported NFS server that we have many man-years of experience maintaining for a relatively new userspace project. There are use cases for Ganesha, but this isn't one of them.

What does your analysis show?
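As a userspace illustration of the distinction drawn in comment #4, the sketch below checks whether an open file descriptor carries the O_DIRECT flag. It does not model NFSD, which runs inside the kernel; it only shows that direct I/O is a flag on the open file and that a plain open goes through the buffered path. The helper names `uses_direct_io` and `open_buffered` are hypothetical, and the sketch assumes a Unix-like system where Python's `fcntl` module is available.

```python
import fcntl
import os
import tempfile

# os.O_DIRECT only exists on platforms that support it (e.g. Linux);
# fall back to 0 elsewhere so the check simply reports False.
O_DIRECT = getattr(os, "O_DIRECT", 0)


def uses_direct_io(fd: int) -> bool:
    """Return True if the open file behind fd has O_DIRECT set (hypothetical helper)."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFL)
    return bool(flags & O_DIRECT)


def open_buffered(path: str) -> int:
    """Open path for writing without O_DIRECT, i.e. via the page cache (hypothetical helper)."""
    return os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        fd = open_buffered(os.path.join(tmp, "example.dat"))
        try:
            os.write(fd, b"buffered write\n")
            print("O_DIRECT set:", uses_direct_io(fd))
        finally:
            os.close(fd)
```

Opening the same file with `os.O_DIRECT` added to the flags would take the direct path instead, but some filesystems (tmpfs, for instance) may reject that flag, so the sketch leaves it out.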
http://bugzilla.opensuse.org/show_bug.cgi?id=1052419
http://bugzilla.opensuse.org/show_bug.cgi?id=1052419#c5

Peter Grandi <pg@suse.for.sabi.co.uk> changed:

           What    |Removed                           |Added
----------------------------------------------------------------------------
             Status|IN_PROGRESS                       |RESOLVED
         Resolution|---                               |FIXED
              Flags|needinfo?(pg@suse.for.sabi.co.uk) |

--- Comment #5 from Peter Grandi <pg@suse.for.sabi.co.uk> ---
Hi, I am not interested in discussing this, nor am I an openSUSE user. The entry in the Btrfs wiki has all the relevant information.

I only noticed trying an openSUSE install in a VM (to check out KDE...) that openSUSE continues rightly to use Btrfs but does not package NFS Ganesha, and made a friendly (to both openSUSE and Btrfs) preventive report.

I'll close this report.
http://bugzilla.opensuse.org/show_bug.cgi?id=1052419
http://bugzilla.opensuse.org/show_bug.cgi?id=1052419#c6

Jeff Mahoney <jeffm@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|FIXED                       |INVALID

--- Comment #6 from Jeff Mahoney <jeffm@suse.com> ---
No, you've posted your incorrect analysis and opinion as fact on the btrfs wiki in addition to filing a public bug report. That translates to FUD and now we have to clean up the mess.

I'm re-closing this as INVALID.
http://bugzilla.opensuse.org/show_bug.cgi?id=1052419
http://bugzilla.opensuse.org/show_bug.cgi?id=1052419#c7

--- Comment #7 from Peter Grandi <pg@suse.for.sabi.co.uk> ---
I did not report an issue as to "all NFS writes" and your quote as to this seems to me to be malicious. I think it cannot be a mistake, because a simple string search on this page shows that the first appearance of those words is by you.

I have reported the *possibility* of data updates after checksums are computed, that is as a result of unstable writes, in both the case of direct IO and the NFS kernel server. In the case of direct IO this was confirmed by Josef Bacik, as per the link in the Btrfs wiki, and in the case of the NFS kernel server it was mentioned by a user on IRC.

Additionally, as a rule any superficial analysis cannot easily prove (or disprove) the absence of race conditions leading to unstable writes, and, however rare, race conditions and unstable writes on busy servers can result in extensive data corruption.

You have the authority to ignore my friendly report and make accusations of "FUD" that to me seem purely paranoid, and these remain on the record under your name.
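The concern in comment #7 (data changing after its checksum has been computed) and the counterpoint in comment #4 (pages are copied before CRCs are computed) can be sketched with a deterministic toy model. This is only an illustration under stated assumptions: it is not Btrfs or NFSD code, the function names are hypothetical, and `zlib.crc32` stands in for whatever checksum the filesystem actually uses.

```python
import zlib
from typing import Callable, Tuple

Mutator = Callable[[bytearray], None]


def write_checksum_on_live_buffer(buf: bytearray, mutate: Mutator) -> Tuple[int, bytes]:
    """Toy 'unstable write': checksum the caller's buffer, then let it change."""
    checksum = zlib.crc32(buf)   # checksum taken from the live buffer
    mutate(buf)                  # caller modifies the buffer while the write is in flight
    on_disk = bytes(buf)         # the "device" reads the buffer after the change
    return checksum, on_disk


def write_checksum_on_private_copy(buf: bytearray, mutate: Mutator) -> Tuple[int, bytes]:
    """Toy 'stable write': copy first, then checksum and write the copy."""
    snapshot = bytes(buf)        # private copy, analogous to a buffered write
    checksum = zlib.crc32(snapshot)
    mutate(buf)                  # later changes do not touch the snapshot
    return checksum, snapshot


def verifies(checksum: int, data: bytes) -> bool:
    """Return True if data still matches the stored checksum."""
    return zlib.crc32(data) == checksum


def flip_first_byte(buf: bytearray) -> None:
    buf[0] ^= 0xFF


if __name__ == "__main__":
    for write in (write_checksum_on_live_buffer, write_checksum_on_private_copy):
        checksum, data = write(bytearray(b"hello world"), flip_first_byte)
        print(write.__name__, "verifies:", verifies(checksum, data))
```

In the first function the stored checksum no longer matches the stored data; in the second it does, because the checksum and the written bytes both come from the same private copy. Whether a real write path behaves like one or the other is exactly what the thread disputes.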