On Thu, Mar 19, 2015 at 5:53 AM, Per Jessen
Felix Miata wrote:
Greg Freemyer composed on 2015-03-16 09:32 (UTC-0400):
If the workload is small 1-page writes, then the penalty is huge. For every write, a read/modify/write cycle has to be performed.
Is it really? What kind of use case produces many writes of different small files in sequence or short order?
Almost any database server. Untarring the kernel source tarball?

More about untarring a tarball: even with files averaging 1MB in size, I believe there would be a significant impact. Remember that inodes are smaller than 4KB, so every file create involves inode updates. If the filesystem knows you have 4KB physical sectors, it tries hard to send only 4KB writes. If you have 1KB pages set up, it will send inode updates 1KB at a time. Every one of those will take an extra platter rotation.

Basically, any workload where the average write is less than a full track will see a major penalty if the writes are not properly sized and aligned to the physical sectors. (1MB/track is often a reasonable guesstimate. Again, it varies by where on the drive you are writing.)
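To make the sizing and alignment point concrete, here is a minimal Python sketch (not part of the original post) that counts how many physical sectors a single write covers completely and how many it covers only in part. The name `classify_write` and its parameters are hypothetical. It assumes the drive performs one read/modify/write for every partly covered sector and ignores any caching or write coalescing in the kernel or the drive.

```python
def classify_write(offset, length, sector_size=4096):
    """Return (full_sectors, partial_sectors) for a write of `length`
    bytes starting at byte `offset` on a drive with `sector_size`-byte
    physical sectors. Assumption: each partial sector costs one
    read/modify/write cycle."""
    if length <= 0:
        return (0, 0)
    end = offset + length                      # exclusive end of the write
    first = offset // sector_size              # first sector touched
    last = (end - 1) // sector_size            # last sector touched
    touched = last - first + 1
    first_full = -(-offset // sector_size)     # ceiling division
    full = max(0, end // sector_size - first_full)
    return (full, touched - full)


print(classify_write(0, 4096))       # aligned 4KB write: (1, 0)
print(classify_write(0, 1024))       # 1KB write: (0, 1)
print(classify_write(1024, 4096))    # misaligned 4KB write: (0, 2)
```

Note the last case: a write that is the right size but starts 1KB into a sector still touches two sectors partially, which is why both size and alignment matter.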
Perhaps a busy email server.
I can tell you parsing 50GB of PST files on rotating rust can take days, whereas the same workload drops to hours with SSD. (A few million seeks really add up, and most rotating drives can only do hundreds of random I/Os per second.) I would expect doing the same thing on a rotating drive with 1KB pages but 4KB sectors would take almost twice as long (i.e. a full week?). Clearly, I do this work on SSDs when I can.

With my tool of choice, every email has to be read out of the PST and dumped into an EML, then a follow-on process reads every EML and adds the metadata to a database. Lots of the EML files are small, and all of the database updates are small.

A 5KB EML is a reasonable example of a poor situation: a 5KB write is one full 4KB sector plus a partial sector. The full sector is not a problem. It is a pure write. The problem is the 1KB page at the end of the file. A 4KB sector drive will NOT allow that to go straight to disk. Instead, it has to read the current contents of the 4KB physical sector, modify the first KB of it, then write the full sector back out.

The reason for the read/modify/write cycle is the ECC information in the header/footer of the physical sector. If the drive allowed a partial physical sector write, the ECC data would be immediately out of date. Therefore, the drive only ever writes full physical sectors.

So for that 1KB at the end of the EML file, the drive has to: read the physical sector; wait 8.3 msecs for the platter to rotate around; write the updated physical sector.

For ease of calculation, let's round 8.3 msecs to 10 msecs. If you have a million 5KB emails to create on disk, that's 10,000 seconds of time wasted waiting for the disk to rotate around. Bad enough, but I forgot to say you need to update a million inodes as well. Those will also not be full physical sector updates, so double that to 20,000 wasted seconds. That's roughly 6 wasted hours for creating a million files.
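The back-of-the-envelope arithmetic above can be reproduced with a short Python sketch. The function names are hypothetical, and the 7200 RPM figure is my assumption: it is the spindle speed at which one rotation takes about 8.3 msecs. The model charges exactly one extra rotation per partial-sector write and nothing else, so treat the result as a rough estimate rather than a measurement.

```python
def rotation_ms(rpm):
    """Time for one full platter rotation, in milliseconds."""
    return 60_000 / rpm


def wasted_seconds(num_files, partial_writes_per_file, ms_per_rotation):
    """Seconds spent waiting on extra rotations, assuming one extra
    rotation for every partial-sector write."""
    return num_files * partial_writes_per_file * ms_per_rotation / 1000


print(rotation_ms(7200))                 # about 8.33 ms (assumed 7200 RPM)

# One partial write for the EML tail plus one for the inode update,
# using the rounded 10 msecs per rotation.
total = wasted_seconds(1_000_000, 2, 10)
print(total, "seconds")                  # 20000.0 seconds
print(total / 3600, "hours")             # a little under 6 hours
```

Using the unrounded 8.3 msecs instead of 10 brings the total down to about 16,700 seconds, still well over four hours for a million files.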
In my business life, I often kick off processes that create a workload of a million relatively small writes.

Greg