[opensuse] XFS and CRC on Leap 42.2
Hi! Can anyone comment on XFS stability on Leap 42.2 using crc=1? I've read that crc=1 should be the new default. I created an XFS filesystem with explicit crc=1.

Everything was working fine until I used that filesystem as an NFS repository for ESXi. I tried to move some VMs to it and had a complete hang; hours later the filesystem went into read-only mode.

Trying to repair the FS, I get CRC errors on the pending changelog. I can purge the log, as the entries reference the directories of the VMs I tried to move, but I'm a little worried about keeping it like that. I've just moved from BTRFS trying to avoid these issues :/

Regards,
--
Ciro Iriarte
http://iriarte.it
--
To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org
To contact the owner, e-mail: opensuse+owner@opensuse.org
On 02.04.2017 21:56, Ciro Iriarte wrote:
> Hi! Can anyone comment on XFS stability on Leap 42.2 using crc=1? I've read that crc=1 should be the new default. I created an XFS filesystem with explicit crc=1.
> Everything was working fine until I used that filesystem as an NFS repository for ESXi. I tried to move some VMs to it and had a complete hang; hours later the filesystem went into read-only mode.
> Trying to repair the FS, I get CRC errors on the pending changelog. I can purge the log, as the entries reference the directories of the VMs I tried to move, but I'm a little worried about keeping it like that. I've just moved from BTRFS trying to avoid these issues :/
> Regards,
Just checked: crc=1 is default on Leap 42.2. I have some machines using XFS on Leap 42.2 without problems. Have you checked the logs and dmesg? You probably have a broken disk.
On Sun, 2 Apr 2017 22:34:01 +0200 Florian Gleixner <flo@redflo.de> wrote:
> Just checked: crc=1 is default on Leap 42.2. I have some machines using XFS on Leap 42.2 without problems. Have you checked the logs and dmesg? You probably have a broken disk.
Have you checked the XFS mailing list? http://xfs.org/index.php/XFS_email_list_and_archives

I'd expect any known problems to be discussed there, and it is also the best place to resolve unknown ones. Pay particular attention to their request for the information to supply when reporting problems.
Yup, there seems to be a bug related to this, but nothing definitive. All the reports are for older versions of the kernel :/

2017-04-02 18:36 GMT-04:00 Dave Howorth <dave@howorth.org.uk>:
> Have you checked the XFS mailing list? http://xfs.org/index.php/XFS_email_list_and_archives
> I'd expect any known problems to be discussed there, and it is also the best place to resolve unknown ones. Pay particular attention to their request for the information to supply when reporting problems.
--
Ciro Iriarte
http://iriarte.it
2017-04-02 16:34 GMT-04:00 Florian Gleixner <flo@redflo.de>:
> Just checked: crc=1 is default on Leap 42.2. I have some machines using XFS on Leap 42.2 without problems. Have you checked the logs and dmesg? You probably have a broken disk.
The weird thing is that, according to the man page, the default should be crc=0. Currently the FS sits on top of a dmraid array being reshaped from RAID 5 to RAID 6 (still in progress). Current messages complain with:

[13780.768541] XFS (md4): log record CRC mismatch: found 0x759a7a5e, expected 0x6131115a.
[13780.768546] ffffc90002040200: 00 00 00 01 00 00 00 00 69 01 00 00 40 aa fd a7  ........i...@...
[13780.768547] ffffc90002040210: 00 00 00 10 69 00 00 00 4e 41 52 54 28 00 00 00  ....i...NART(...

The dmesg errors from the hang event were already purged (I had a reboot since then). I'm waiting for the reshape to finish before attempting anything further, but I would like to know if disabling CRC is a good idea (or even an option).

Regards,
--
Ciro Iriarte
http://iriarte.it
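When collecting information for a bug report, it can help to pull every "log record CRC mismatch" line out of a saved dmesg dump. The following is a minimal sketch that assumes the message format matches the single line quoted in the message above; the function name `find_crc_mismatches` is hypothetical, and other XFS messages are simply ignored.

```python
import re

# Pattern modeled on the one kernel message quoted in this thread.
# Assumption: other kernels may word the message differently.
CRC_MISMATCH = re.compile(
    r"XFS \((?P<dev>[^)]+)\): log record CRC mismatch: "
    r"found (?P<found>0x[0-9a-fA-F]+), expected (?P<expected>0x[0-9a-fA-F]+)"
)


def find_crc_mismatches(dmesg_text):
    """Return a list of (device, found, expected) tuples, one per mismatch line."""
    results = []
    for line in dmesg_text.splitlines():
        match = CRC_MISMATCH.search(line)
        if match:
            results.append(
                (
                    match.group("dev"),
                    int(match.group("found"), 16),
                    int(match.group("expected"), 16),
                )
            )
    return results


# The sample line is copied from the message above.
sample = (
    "[13780.768541] XFS (md4): log record CRC mismatch: "
    "found 0x759a7a5e, expected 0x6131115a."
)
for dev, found, expected in find_crc_mismatches(sample):
    print(f"{dev}: found {found:#010x}, expected {expected:#010x}")
```

Feeding it the text of a saved dmesg dump gives a quick count of affected devices and records, which is the kind of detail the XFS list asks for in reports.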
On 03.04.2017 at 03:35, Ciro Iriarte wrote:
> 2017-04-02 16:34 GMT-04:00 Florian Gleixner <flo@redflo.de>:
> ...
>> Just checked: crc=1 is default on Leap 42.2. I have some machines using XFS on Leap 42.2 without problems. Have you checked the logs and dmesg? You probably have a broken disk.
> The weird thing is that, according to the man page, the default should be crc=0. Currently the FS sits on top of a dmraid array being reshaped from RAID 5 to RAID 6 (still in progress).
Also rechecked: systems installed with XFS as the root filesystem have crc=1 set, but whenever I add another volume and create a filesystem on it, crc=0 is set. It seems that YaST sets the option.
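One way to check which setting a given filesystem ended up with is to look for the `crc=` field in the output of `xfs_info`. The sketch below only parses text that has already been captured; the helper name `crc_enabled` and the sample excerpt are hypothetical illustrations, not output from any system in this thread.

```python
import re


def crc_enabled(xfs_info_text):
    """Return True for crc=1, False for crc=0, None if no crc field is present."""
    match = re.search(r"\bcrc=([01])\b", xfs_info_text)
    if match is None:
        return None
    return match.group(1) == "1"


# Illustrative excerpt in the style of xfs_info output (made-up values).
sample = (
    "meta-data=/dev/sdX1   isize=512    agcount=4, agsize=655360 blks\n"
    "         =            sectsz=512   attr=2, projid32bit=1\n"
    "         =            crc=1        finobt=1\n"
)
print(crc_enabled(sample))
```

The text could be captured with something like `subprocess.run(["xfs_info", mountpoint], capture_output=True, text=True)`. A result of `None` would mean the tool did not report the field at all, which is worth treating separately from an explicit crc=0.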
Ciro Iriarte wrote:
> Hi! Can anyone comment on XFS stability on Leap 42.2 using crc=1? I've read that crc=1 should be the new default. I created an XFS filesystem with explicit crc=1.
==== Personal opinion: the crc option is bundled with the new option to store free space in a free-space inode. Unfortunately, there is some obfuscation and non-engineering-speak as to why these features aren't available separately. This prevents unit testing and benchmarking of the features separately, which is, _at least_, annoying, since the free-space-inode feature is likely to improve performance on all mature XFS disks, where free space is scattered over the disk and cannot be consolidated. XFS's defragmenter can only defragment files -- not directories (which get VERY fragmented), nor free space.

To "prove" that the free-space-inode option wasn't needed to improve performance apart from the crc option, it was tested on 'new' (non-fragmented) disks, and was shown to provide no benefit or a slight slowdown (though not clearly a slowdown within statistical norms).

The excuse/reasoning given for bundling the options was that it would require less testing to only test the bundled product, and that providing them separately would lower overall quality. This goes against engineering "best practices", which have shown unit testing and development to provide the most reliable means for quality development. The idea that bundling multiple features is somehow more reliable than providing them separately for testing and benchmarking is indicative of burying problems, in the hope that they stay hidden in the bundle when they would otherwise be exposed.

I have nothing but respect for the lead developers on XFS and can only conclude that they are being *told* or forced by corporate sponsors to do it this way for marketing-related reasons. Most likely, the free-space-inode feature provides a clear performance benefit overall, while the crc option provides a clear performance *hit*, with a noticeable lowering of usability with 3rd-party tools that are not "special cased" or "permitted" to modify the meta information.
An example of this was something as simple as setting a disk label on such a disk. This altered the meta info and corrupted the disk, making it unrecoverable (AFAIK, you can't choose to ignore the CRC checking in order to recover valuable data while ignoring supposed metadata problems). I ran into this problem during my normal workflow, where I have a standard shell file to set params on XFS partitions during creation and to set the label. Setting the label would leave the partitions in an unusable state. (At the time, one couldn't proceed even in a read-only state.) The label option was later added as a "special case" so that it wouldn't disable a disk with the crc option.

I see the crc+free-space-inode bundling as being done to encourage a move to the bundle, as I suspect that on mature disks the free-space-inode feature would provide consistent performance gains -- probably enough to outweigh any performance deficits caused by the crc option.
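To illustrate why a tool that rewrites metadata without recomputing its checksum leaves it failing verification, here is a small sketch. XFS metadata checksums use CRC-32C; everything else here is an assumption for illustration only: the block layout (payload followed by a 4-byte little-endian checksum) and the helper names `seal` and `verify` are hypothetical and do not reflect the real XFS on-disk format.

```python
def crc32c(data: bytes) -> int:
    """Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift right one bit; XOR in the polynomial if the low bit was set.
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def seal(payload: bytes) -> bytes:
    """Hypothetical layout: append the payload's checksum as 4 little-endian bytes."""
    return payload + crc32c(payload).to_bytes(4, "little")


def verify(block: bytes) -> bool:
    """Recompute the checksum over the payload and compare with the stored one."""
    payload, stored = block[:-4], int.from_bytes(block[-4:], "little")
    return crc32c(payload) == stored


block = seal(b"label=old")
# A tool that edits the payload but keeps the old checksum bytes:
tampered = b"label=new" + block[-4:]

print(verify(block), verify(tampered))
```

A tool that is aware of the checksum would call the equivalent of `seal` again after changing the payload, which is presumably what the later label "special case" amounts to.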
> Everything was working fine until I used that filesystem as an NFS repository for ESXi. I tried to move some VMs to it and had a complete hang; hours later the filesystem went into read-only mode.
--- I was under the impression that crc-broken disks weren't even supposed to be readable. If that's changed, I'm less alarmed by the option, though still annoyed that the free-inode option isn't available independently -- even just to allow testing/benchmarking, because that would likely create more demand for it separately.
> Trying to repair the FS, I get CRC errors on the pending changelog. I can purge the log, as the entries reference the directories of the VMs I tried to move, but I'm a little worried about keeping it like that. I've just moved from BTRFS trying to avoid these issues :/
--- I'd avoid a critical reliance on the crc option in my systems for a while, until it has undergone more field testing. FWIW, there is a lot of activity in XFS to add some BTRFS-like features (which I suspect will be stable before BTRFS is equivalently so), and that may add instability for early adopters, which often includes openSUSE (for better or worse). However, I look forward to the arrival of those features in XFS, though with some trepidation associated with cutting (bleeding) edge features (probably less with XFS due to its more conservative development history).
participants (4)
- Ciro Iriarte
- Dave Howorth
- Florian Gleixner
- L A Walsh