On Tue, Aug 30, 2016 at 11:04 PM, Andrei Borzenkov wrote:
Sent from my iPhone
On 30 Aug 2016, at 11:55, Lindsay Mathieson wrote:
On 30/08/2016 6:35 PM, Richard Brown wrote: btrfs' RAID 5/6 implementation may not be one of the best ones out there, but all of them put data at risk thanks to the wonders of the write hole.
Incorrect; ZFS does not have the write hole.
And RAID6 has one big advantage over RAID10: it can always lose up to two drives without losing data. With RAID10, if the two failed drives are from the same mirror, then all data is lost.
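A quick way to see this is to enumerate the two-drive failures (a toy Python sketch; the 4-drive layout with fixed mirror pairs is assumed for illustration, not taken from any real array):

    import itertools

    mirrors = [(0, 1), (2, 3)]  # conventional RAID10: fixed mirror pairs

    for failed in itertools.combinations(range(4), 2):
        # RAID10 loses data only when both failed drives form one mirror
        raid10_lost = any(set(pair) == set(failed) for pair in mirrors)
        # RAID6 tolerates the loss of any two drives
        print(failed, "RAID10:", "lost" if raid10_lost else "ok", "| RAID6: ok")

Of the 6 possible two-drive failures, RAID10 loses data in 2 of them ((0,1) and (2,3)), while RAID6 survives all 6.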
Note that btrfs RAID10 only tolerates a single failed drive: due to the allocation pattern, it is unpredictable whether the loss of *any* second drive will result in data loss.
Yes, this could be a mkfs artifact. I found that with four identically sized devices, the kernel code consistently allocates the same block group stripe to each device, but mkfs initially allocates a different stripe. This puts two stripe layouts on each device, which causes this problem.

It'd be nice to figure out whether the kernel allocator is in fact consistent (or just appears to be, based on observation) and, if so, use the same assignment logic for mkfs. At least with identically sized devices and both data and metadata chunks using a raid10 profile, it should then look more like a conventional raid10.

The gotcha, though, is that adding any device will invariably cause the Btrfs kernel code to produce a different allocation, and then all bets are off again. The same applies if the metadata profile is single, DUP, or raid1, because that in effect makes the unallocated space differ among the devices, which affects block group allocation.

So for the time being, in practice it is true that you can only depend for sure on surviving 1 missing device with Btrfs raid1 or raid10; it is not scalable right now.

--
Chris Murphy
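For the curious, the effect Chris describes can be modeled with a toy Python sketch (this is not the btrfs allocator; the 4-device layout and the specific mirror pairings are assumed purely for illustration):

    import itertools

    # Each block group in a 4-device btrfs raid10 stripes data across
    # two mirror pairs; which devices end up paired can differ per
    # block group. (Pairings below are assumed, not measured.)
    kernel_bg = [(0, 1), (2, 3)]   # pairing the kernel allocator keeps reusing
    mkfs_bg   = [(0, 2), (1, 3)]   # different initial pairing from mkfs
    block_groups = [mkfs_bg, kernel_bg]

    def fatal(failed, bgs):
        # Data is lost if, in any block group, both members of some
        # mirror pair are among the failed devices.
        return any(set(pair) <= failed for bg in bgs for pair in bg)

    for combo in itertools.combinations(range(4), 2):
        print(combo, "data lost" if fatal(set(combo), block_groups) else "ok")

With both pairings present, 4 of the 6 two-device failure combinations are fatal; with a single consistent pairing (as in a conventional raid10), only 2 of 6 would be. That is why only the first missing device is guaranteed survivable.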