On Tue, Mar 17, 2015 at 11:43 AM, Greg Freemyer wrote:
Resend without HTML
On Monday, March 16, 2015, Anton Aylward wrote:
On 03/16/2015 09:32 AM, Greg Freemyer wrote:
If you are going to use drives with 4KB physical sectors you need to avoid filesystems with 1KB blocks.
I'm sure there are many in the silent majority here who don't have the detailed knowledge, or, like me, have let it pass them by since they deal with other matters (such as users and applications), and wonder about one or another implication in that statement.
We've seen the example with mkfs or extFS for various block sizes, but what about other file systems?
And more to the point for most of us:
How can we tell about these things?
* What block size the disks use
I suppose all late model disks are 4K :-)
* What size the file system blocks are for the file systems in use - not just extFS but XFS, reiserFS, BtrFS - if they are not 4K, what can we do about it?
* Some might ask about the other file systems in /proc/filesystems. - Does it matter with tmpfs? What about when it 'overflows' to /tmp?
* How can we tell if they are aligned? - if they are not, what can we do about it?
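Most of the questions above can be answered from sysfs and the usual filesystem tools. A sketch, assuming typical device names like /dev/sda (adjust for your system); the `classify_drive` helper at the end is purely illustrative, not part of any standard tool:

```shell
# 1) What sector sizes the disk reports (from sysfs):
#      cat /sys/block/sda/queue/logical_block_size    # e.g. 512
#      cat /sys/block/sda/queue/physical_block_size   # e.g. 4096
#    A 512e drive reports logical=512 / physical=4096; a 4Kn drive 4096/4096.
#
# 2) What block size the filesystem uses:
#      tune2fs -l /dev/sda1 | grep 'Block size'       # ext2/3/4
#      xfs_info /mountpoint | grep bsize              # XFS
#
# Illustrative helper: classify a drive from its reported sizes.
classify_drive() {
    local logical=$1 physical=$2
    if   [ "$physical" -eq 4096 ] && [ "$logical" -eq 512 ];  then echo 512e
    elif [ "$physical" -eq 4096 ] && [ "$logical" -eq 4096 ]; then echo 4Kn
    elif [ "$physical" -eq 512 ]  && [ "$logical" -eq 512 ];  then echo 512n
    else echo unknown
    fi
}

classify_drive 512 4096   # prints: 512e
```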
Either it matters, and these are the questions that emerge, or it doesn't.
It matters, but for most people, they do normal things and it works out fine. The developers have addressed their needs.
Right. Where it pops up is when the physical sector size isn't passed through the various layers. I think the libvirt stuff even has this right, but honestly I haven't checked to see if e.g. a qcow2 file on a 512e drive shows up inside the VM as a 512e drive. If blockdev is thwarted in learning the actual physical sector size, for whatever reason, then the mkfs utilities have no chance with defaults.

At least on x86: with ext[234] you probably don't have to worry; on anything larger than ~500MiB or whatever it is, you get a 4KiB block size.

XFS is always a 4KiB block size, but the journal updates can have a smaller unit matching the physical sector size. So if the physical sector size is 4096 bytes but is wrongly reported by blockdev as 512 bytes, there could be a performance impact with heavy metadata writes causing a lot of RMW in the drive, just for the journal writes however. I'm not sure how noticeable it'd be.

Btrfs is always a 4KiB block size, with a 16KiB node/leaf size by default, which is used efficiently (multiple file extent refs can be stored in one leaf, as can inline data for small files).
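To make the XFS journal point concrete: if the log writes in 512-byte units but the drive's physical sector is 4096 bytes, each sub-sector write forces the drive to read-modify-write a whole physical sector. A sketch of that amplification (the `rmw_factor` helper is hypothetical; the commented commands are the real checks and overrides):

```shell
# Real commands to check and, if misdetected, override the sector size:
#   blockdev --getpbsz /dev/sda        # physical sector size as the kernel sees it
#   xfs_info /mountpoint               # "sectsz=" shows what mkfs.xfs chose
#   mkfs.xfs -s size=4096 /dev/sda1    # force a 4096 B sector size at mkfs time
#
# Hypothetical illustration of the worst-case write amplification:
rmw_factor() {
    local physical=$1 write_unit=$2
    # how many write units fit in one physical sector the drive must rewrite
    echo $(( physical / write_unit ))
}

rmw_factor 4096 512   # prints: 8
```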
The defaults simply break in those conditions, so Felix has to concern himself with issues the rest of us never come up against.
Using VMs backed by qcow2 files would make this a lot easier and more space efficient. Resizes are both possible and easier.
If you look at the above you see he has a lot of partitions that start on an odd sector. That is bad today. Typical modern partitioning tools will align all partitions to 1MB. Felix needs to adopt a 21st century partitioning mechanism.
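Checking the 1MiB alignment Greg describes is straightforward: `parted /dev/sda align-check optimal 1` tests one partition, and `fdisk -l` shows the raw start sectors. A sketch of the underlying arithmetic, with `is_1mib_aligned` as a hypothetical helper (device names and sector numbers are examples):

```shell
# Real commands:
#   parted /dev/sda align-check optimal 1   # per-partition alignment check
#   fdisk -l /dev/sda                       # lists start sectors
#
# A partition is 1MiB aligned when start_sector * logical_sector_bytes
# is a multiple of 1048576.
is_1mib_aligned() {
    local start_sector=$1 logical_bytes=$2
    if [ $(( start_sector * logical_bytes % 1048576 )) -eq 0 ]; then
        echo aligned
    else
        echo misaligned
    fi
}

is_1mib_aligned 2048 512   # prints: aligned    (2048 * 512 = exactly 1MiB)
is_1mib_aligned 63 512     # prints: misaligned (the old DOS-era default start)
```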
If the MBR or any EBR sector were to corrupt or read error, all subsequent partition info is lost, unless this information is separately backed up. GPT of course has two copies of everything, and both are checksummed.
--
Chris Murphy
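On the backup point: `sfdisk --dump /dev/sda > sda.parts` saves a restorable text copy of an MBR (or GPT) layout, and `sgdisk --backup=sda.gpt /dev/sda` does the same for GPT. GPT's own redundancy works because the primary header sits at LBA 1 and the backup at the disk's last LBA; a sketch of where that backup lives (`backup_header_lba` is a hypothetical helper, disk size is an example):

```shell
# Real backup/restore commands:
#   sfdisk --dump /dev/sda > sda.parts      # save layout as text
#   sfdisk /dev/sda < sda.parts             # restore it
#   sgdisk --backup=sda.gpt /dev/sda        # binary GPT backup
#
# GPT backup header location: the last addressable sector of the disk.
backup_header_lba() {
    local disk_bytes=$1 sector_bytes=$2
    echo $(( disk_bytes / sector_bytes - 1 ))
}

backup_header_lba 1000204886016 512   # a typical "1 TB" disk; prints: 1953525167
```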