On Tue, Mar 17, 2015 at 11:43 AM, Greg Freemyer wrote:
Resend without HTML
On Monday, March 16, 2015, Anton Aylward wrote:
On 03/16/2015 09:32 AM, Greg Freemyer wrote:
If you are going to use drives with 4KB physical sectors you need to avoid filesystems with 1KB blocks.
I'm sure there are many in the silent majority here who don't have the detailed knowledge, or, like me, have let it pass them by since they deal with other matters (such as users and applications), and wonder about one or another implication in that statement.
We've seen the example with mkfs or extFS for various block sizes, but what about other file systems?
And more to the point for most of us:
How can we tell about these things?
* What block size the disks use
I suppose all late model disks are 4K :-)
* What size the file system blocks are for the file systems in use - not just extFS but XFS, reiserFS, BtrFS - if they are not 4K, what can we do about it?
* Some might ask about the other file systems in /proc/filesystems. - Does it matter with tmpfs? What about when it 'overflows' to /tmp?
* How can we tell if they are aligned? - if they are not, what can we do about it?
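Most of the questions above can be answered from sysfs and the usual filesystem tools. A sketch, assuming typical device names like /dev/sda (adjust for your system); the `classify_drive` helper at the end is purely illustrative, not part of any standard tool:

```shell
# 1) What sector sizes the disk reports (from sysfs):
#      cat /sys/block/sda/queue/logical_block_size    # e.g. 512
#      cat /sys/block/sda/queue/physical_block_size   # e.g. 4096
#    A 512e drive reports logical=512 / physical=4096; a 4Kn drive 4096/4096.
#
# 2) What block size the filesystem uses:
#      tune2fs -l /dev/sda1 | grep 'Block size'       # ext2/3/4
#      xfs_info /mountpoint | grep bsize              # XFS
#
# Illustrative helper: classify a drive from its reported sizes.
classify_drive() {
    local logical=$1 physical=$2
    if   [ "$physical" -eq 4096 ] && [ "$logical" -eq 512 ];  then echo 512e
    elif [ "$physical" -eq 4096 ] && [ "$logical" -eq 4096 ]; then echo 4Kn
    elif [ "$physical" -eq 512 ]  && [ "$logical" -eq 512 ];  then echo 512n
    else echo unknown
    fi
}

classify_drive 512 4096   # prints: 512e
```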
Either it matters, and these are the questions that emerge, or it doesn't.
It matters, but for most people, they do normal things and it works out fine. The developers have addressed their needs.
Right. Where it pops up is when the physical sector size isn't passed through the various layers. I think the libvirt stuff even has this right, but honestly I haven't checked to see if e.g. a qcow2 file on a 512e drive shows up inside the VM as a 512e drive. If blockdev is thwarted in learning the actual physical sector size, for whatever reason, then the mkfs utilities have no chance with defaults.

At least on x86: with ext[234] you probably don't have to worry; on anything larger than ~500MiB or whatever it is, you get a 4KiB block size.

XFS is always a 4KiB block size, but the journal updates can have a smaller unit matching the physical sector size. So if the physical sector size is 4096 bytes but is wrongly reported by blockdev as 512 bytes, there could be a performance impact with heavy metadata writes causing a lot of RMW in the drive, just for the journal writes however. I'm not sure how noticeable it'd be.

Btrfs is always a 4KiB block size, with a 16KiB node/leaf size by default, which is used efficiently (multiple file extent refs can be stored in one leaf, as can inline data for small files).
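To make the XFS journal point concrete: if the log writes in 512-byte units but the drive's physical sector is 4096 bytes, each sub-sector write forces the drive to read-modify-write a whole physical sector. A sketch of that amplification (the `rmw_factor` helper is hypothetical; the commented commands are the real checks and overrides):

```shell
# Real commands to check and, if misdetected, override the sector size:
#   blockdev --getpbsz /dev/sda        # physical sector size as the kernel sees it
#   xfs_info /mountpoint               # "sectsz=" shows what mkfs.xfs chose
#   mkfs.xfs -s size=4096 /dev/sda1    # force a 4096 B sector size at mkfs time
#
# Hypothetical illustration of the worst-case write amplification:
rmw_factor() {
    local physical=$1 write_unit=$2
    # how many write units fit in one physical sector the drive must rewrite
    echo $(( physical / write_unit ))
}

rmw_factor 4096 512   # prints: 8
```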
The defaults simply break in those conditions, so Felix has to concern himself with issues the rest of us never come up against.
Using VMs backed by qcow2 files would make this a lot easier and more space efficient. Resizes are both possible and easier.
If you look at the above you see he has a lot of partitions that start on an odd sector. That is bad today. Typical modern partitioning tools will align all partitions to 1MB. Felix needs to adopt a 21st century partitioning mechanism.
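Checking the 1MiB alignment Greg describes is straightforward: `parted /dev/sda align-check optimal 1` tests one partition, and `fdisk -l` shows the raw start sectors. A sketch of the underlying arithmetic, with `is_1mib_aligned` as a hypothetical helper (device names and sector numbers are examples):

```shell
# Real commands:
#   parted /dev/sda align-check optimal 1   # per-partition alignment check
#   fdisk -l /dev/sda                       # lists start sectors
#
# A partition is 1MiB aligned when start_sector * logical_sector_bytes
# is a multiple of 1048576.
is_1mib_aligned() {
    local start_sector=$1 logical_bytes=$2
    if [ $(( start_sector * logical_bytes % 1048576 )) -eq 0 ]; then
        echo aligned
    else
        echo misaligned
    fi
}

is_1mib_aligned 2048 512   # prints: aligned    (2048 * 512 = exactly 1MiB)
is_1mib_aligned 63 512     # prints: misaligned (the old DOS-era default start)
```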
If the MBR or any EBR sector were to corrupt or read error, all subsequent partition info is lost, unless this information is separately backed up. GPT of course has two copies of everything, and both are checksummed.
--
Chris Murphy
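On the backup point: `sfdisk --dump /dev/sda > sda.parts` saves a restorable text copy of an MBR (or GPT) layout, and `sgdisk --backup=sda.gpt /dev/sda` does the same for GPT. GPT's own redundancy works because the primary header sits at LBA 1 and the backup at the disk's last LBA; a sketch of where that backup lives (`backup_header_lba` is a hypothetical helper, disk size is an example):

```shell
# Real backup/restore commands:
#   sfdisk --dump /dev/sda > sda.parts      # save layout as text
#   sfdisk /dev/sda < sda.parts             # restore it
#   sgdisk --backup=sda.gpt /dev/sda        # binary GPT backup
#
# GPT backup header location: the last addressable sector of the disk.
backup_header_lba() {
    local disk_bytes=$1 sector_bytes=$2
    echo $(( disk_bytes / sector_bytes - 1 ))
}

backup_header_lba 1000204886016 512   # a typical "1 TB" disk; prints: 1953525167
```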