[opensuse] any real-life experiences with bcache / dm-cache and SSD?
I was studying bcache yesterday and just wondered if anyone had any real-life experiences to share? -- Per Jessen, Zürich (1.1°C) http://www.dns24.ch/ - your free DNS host, made in Switzerland. -- To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org To contact the owner, e-mail: opensuse+owner@opensuse.org
Le 11/02/2016 08:23, Per Jessen a écrit :
I was studying bcache yesterday and just wondered if anyone had any real-life experiences to share?
yes, and not that good: http://dodin.info/wiki/pmwiki.php?n=Doc.UsingBcache - too complicated to be worth the fuss. jdd
jdd wrote:
Le 11/02/2016 08:23, Per Jessen a écrit :
I was studying bcache yesterday and just wondered if anyone had any real-life experiences to share?
yes, and not that good
http://dodin.info/wiki/pmwiki.php?n=Doc.UsingBcache
too complicated to be worth the fuss
Hmm, I was hoping for some details about improved IO rates and such. I have some aging storage servers and I was wondering if I could improve the IO rates by adding an SSD cache instead of replacing the controllers or even the whole thing. -- Per Jessen, Zürich (3.0°C) http://www.hostsuisse.com/ - dedicated server rental in Switzerland.
Le 11/02/2016 10:55, Per Jessen a écrit :
jdd wrote:
Le 11/02/2016 08:23, Per Jessen a écrit :
I was studying bcache yesterday and just wondered if anyone had any real-life experiences to share?
yes, and not that good
http://dodin.info/wiki/pmwiki.php?n=Doc.UsingBcache
too complicated to be worth the fuss
Hmm, I was hoping for some details about improved IO rates and such. I have some aging storage servers and I was wondering if I could improve the IO rates by adding an SSD cache instead of replacing the controllers or even the whole thing.
My machine was (still is) a laptop with a 24GB SSD on the motherboard, in addition to the 500GB hard drive. Initially it was used by the Intel fast-start feature and Windows. I cut it into two partitions, one for openSUSE. But one has to build the system around bcache: if something goes wrong, the disk can't be read any more without complicated recovery steps (see my page). In fact I very soon decided to erase this disk, discard all the Windows support for it (I keep Windows on this machine for some rare uses) and install openSUSE on it - 24GB is really enough. This gives me a really fast openSUSE. I now install SSDs on all my computers. The Dell Optiplex 755 (a pretty old dual-core), with 6GB of RAM and two SSDs (one 120GB, the other 240GB), is now really fast. So if you can, install your system on an SSD :-) jdd
On Don, 2016-02-11 at 10:55 +0100, Per Jessen wrote:
jdd wrote:
Le 11/02/2016 08:23, Per Jessen a écrit :
I was studying bcache yesterday and just wondered if anyone had any real-life experiences to share?
yes, and not that good
http://dodin.info/wiki/pmwiki.php?n=Doc.UsingBcache
too complicated to be worth the fuss
Hmm, I was hoping for some details about improved IO rates and such. I have some aging storage servers and I was wondering if I could improve the IO rates by adding an SSD cache instead of replacing the controllers or even the whole thing.
I have been using bcache for a couple of years, and I'm very happy with it. My setup is a 4TB md RAID1 volume with a 120GB bcache on top of it, which in turn is used as an LVM physical volume. That is, about 3% of the total PV capacity can be cached, and all LVs share the cache. Because the cache is small, it only serves to speed up random-access reads of the active working set, and to speed up most writes (writeback cache mode). The net effect is that, even though all data (including user home directories) is stored on a pair of spinning disks, the cache manages to hide that fact well enough to make the machine feel very snappy in use - it's close enough to a real SSD for my needs. Sorry, I can't give any concrete numbers to back up my experiences. \Olav
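For the record, a stack like that is assembled roughly as follows. This is a sketch only: the device names are placeholders, and the exact bcache-tools and sysfs steps may differ between versions.

```shell
# Illustrative only: /dev/sdb and /dev/sdc stand in for the spinning
# disks, /dev/sda for the SSD. These commands destroy existing data.

# 1. Mirror the two spinning disks
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb /dev/sdc

# 2. Format the SSD as a cache device and the mirror as a backing device
make-bcache -C /dev/sda
make-bcache -B /dev/md0

# 3. Attach the cache set (UUID from 'bcache-super-show /dev/sda')
#    and enable writeback caching
echo <cache-set-uuid> > /sys/block/bcache0/bcache/attach
echo writeback > /sys/block/bcache0/bcache/cache_mode

# 4. Use the cached assembly as the LVM physical volume
pvcreate /dev/bcache0
vgcreate vg0 /dev/bcache0
```

The file systems then go on logical volumes inside vg0, so every LV shares the one cache.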
On 02/13/2016 07:51 PM, Olav Reinert wrote:
Because the cache is small, it only serves to speed up random-access reads of the active working set, and to speed up most writes (writeback cache mode).
The net effect is that, even though all data (including user home directories) is stored on a pair of spinning disks, the cache manages to hide that fact well enough to make the machine feel very snappy in use - it's close enough to a real SSD for my needs.
Sorry, I can't give any concrete numbers to back up my experiences.
It's worth considering some aspects of the logic of caching.

What might be an extreme case is one I saw in the last century and discussed at the North York UNIX Users group, back in the days when SCO UNIX was the dominant system for SMB and personal use as an alternative to Microsoft, certainly if one wanted to support a multi-user environment or run a "proper" database system. One SMB manager described the use of a disc accelerator board, which was really a cache. It too had write-through. The write-through basically released the system call immediately and wrote the block when convenient.

Now historically, one of the changes from classical V6 UNIX to classical V7 UNIX back in the 1970s was that the file system changed the order of the writes to preserve structural integrity. Data was written to the disk before the metadata and pointers. I won't go into why that ensured that FSCK could work properly - google for that yourself - but it is important that it was done that way. The use of this board could not guarantee that. A CompSci type from UofT warned of this. A couple of months later the SMB guy reported a corrupted and unrecoverable disk. There was an "I told you so" feeling.

Committing to disk, and committing in the right order, is important. A write cache that 'accelerates' by releasing the system call while deferring the actual transfer from the cache to storage simply cannot guarantee the integrity of the file system in the event of, for example, a power loss. Certainly an SSD, which will retain content, can, if the supporting software is there to do the job, ameliorate this. It gets back to the issue of the window/timing. But there are a few too many 'ifs' in this to inspire confidence in an old cynic like me.

Caching the read is also something that should be looked at critically. All in all, the internal algorithms for memory allocation and mapping can probably make better use of memory than if it is externalized, such as the memory on those accelerator boards.
This is true for a number of reasons, but most of all because the OS knows better what it needs to cache, based on the process activity. In the case of "virtual memory", more memory means the page-out happens less; retrieval from the page-out queue is in effect a cache operation. Late-model *NIX uses memory mapping for just about all open files, or can. Dynamic linking, along with shared libraries, means opened memory-mapped files with load-on-demand. Such pages may be on the out queue but they never need to be re-written.

Adding RAM is often the cheapest way to improve performance with a *NIX system that can make use of it. Again, there are a few "ifs" in that. This may not be your bottleneck; profiling is important. You may also come up against hardware limits: many PC motherboards have too few memory slots, or their chipset won't allow the use of larger-capacity memory modules. But before you spend >$50 on an SSD http://www.amazon.com/SanDisk-6-0GB-2-5-Inch-Height-SDSSDP-064G-G25/dp/B007ZWLRSU/ref=sr_1_3?ie=UTF8&qid=1455464006&sr=8-3&keywords=64g+ssd consider spending the same on RAM http://www.amazon.com/HP-500662-B21-SDRAM-Server-Memory/dp/B0029L0P9Y/ref=sr_1_4?s=pc&ie=UTF8&qid=1455464146&sr=1-4&keywords=ddr3+ram I'm sure you can find cheaper SSDs and more expensive RAM. Speed, capacity, scale all factor in.

Data pages, mapped files, are a yes-no-maybe situation. A program might need to read config files at start-up, but those are never read again and never written. Caching them is a waste of time, but how do you tell the cache that? The interface to the 'rotating rust' is also a cache in read mode, in that it reads a whole track at once. File system abstractions don't always permit this to be useful. It's all very well to have 4K pages and store files as contiguous blocks, but in reality a file access involves metadata access as well, and we're talking about a multi-user system with asynchronous demands.
Sometimes I think that the design of file systems such as ReiserFS, where small files can have the data in the metadata block, makes sense, especially when dealing with small start-up config files.

Anyway, we are back to the situation where only the application knows that this "data" file is a "read-once then discard" start-up config. Even the OS doesn't know there is no need to cache the mapped pages. They don't even need to be put on the out queue; they can be marked 'free' the moment the file is closed. Oh? Perhaps they are? But how does the OS communicate this to the SSD bcache or the disk controller buffer?

The discipline of the hierarchy of organization and standardization of interfaces and separation of function may have led to many good things, but in the old days of the IBM1040 and the like, where the application had control all the way down the stack to the disk, a lot of optimization was possible that now simply isn't, because the detailed information isn't available from the higher levels all the way down.

-- A: Yes. > Q: Are you sure? >> A: Because it reverses the logical flow of conversation. >>> Q: Why is top posting frowned upon?
On Son, 2016-02-14 at 10:54 -0500, Anton Aylward wrote:
On 02/13/2016 07:51 PM, Olav Reinert wrote:
Because the cache is small, it only serves to speed up random -access reads of the active working set, and to speed up most writes (writeback cache mode).
The net effect is that, even though all data (including user home directories) is stored on a pair of spinning disks, the cache manages to hide that fact well enough to make the machine feel very snappy in use - it's close enough to a real SSD for my needs.
Sorry, I can't give any concrete numbers to back up my experiences. [cut] One SMB manager described the use of a disc accelerator board, which was really a cache. It too had write-through. The write-through basically released the system call immediately and wrote the block when convenient.
Now historically, one of the changes from classical V6 UNIX to classical V7 UNIX back in the 1970s was that the file system changed the order of the writes to preserve structural integrity. Data was written to the disk before the metadata and pointers. I won't go into why that ensured that FSCK could work properly, google for that yourself. But it is important that it was done that way. The use of this board could not guarantee that. A CompSci type from UofT warned of this. A couple of months later the SMB guy reported a corrupted and unrecoverable disk. There was an "I told you so" feeling.
Committing to disk, and committing in the right order is important.
A write cache that 'accelerates' by releasing the system call while deferring the actual transfer from the cache to storage simply cannot guarantee the integrity of the file system in the event of, for example, a power loss. Certainly an SSD, which will retain content, can, if the supporting software is there to do the job, ameliorate this. It gets back to the issue of the window/timing. But there are a few too many 'ifs' in this to inspire confidence in an old cynic like me. [cut]
I agree that caching can be non-trivial. As for write caching, bcache also supports writethrough. With writeback, bcache releases system calls as soon as the blocks involved have been written to the SSD. Flushing to the underlying storage happens later, asynchronously. Importantly, bcache guarantees that the system can be safely shut down even if the cache is dirty; it will resume flushing at the next reboot. Regards, Olav
On 02/14/2016 11:34 AM, Olav Reinert wrote:
Certainly a SSD, which will retain content, can, if
the supporting software is there to do the job, ameliorate this. It gets back to the issue of the window/timing. But there are a few too many 'ifs" in this to inspire confidence in an old cynic like me.
I agree that caching can be non-trivial. As for write caching, bcache also supports writethrough.
I'd prefer to use that :-)
With writeback, bcache releases system calls as soon as the blocks involved have been written to the SSD. Flushing to the underlying storage happens later, asynchronously. Importantly, bcache guarantees that the system can be safely shut down even if the cache is dirty; it will resume flushing at the next reboot.
I hope so! IIRC there was even one rather more savvy cache manufacturer in the SCO days who had a battery backup on the cache board. I don't think they fared any better. Call me a cynic, but I still think that the order of the writes - or honouring the order that the OS and file system designers think the writes should happen in - does matter. -- A: Yes. > Q: Are you sure? >> A: Because it reverses the logical flow of conversation. >>> Q: Why is top posting frowned upon?
Anton Aylward wrote:
I agree that caching can be non-trivial. As for write caching, bcache also supports writethrough.
I'd prefer to use that :-)
LSI has had small-ish caches (512MB-1GB) for years on their RAID boards that had the option of either - as well as auto-switching between the two based on the onboard battery's status (it's frequently tested and temperature-monitored). When the battery goes through a complete recondition & test cycle, the board disables the WB automatically, and re-enables it when the test is done and the battery is recharged.

I usually use the forced-WB & on-disk caches as well, due to the system being on a UPS - which is usually good for 30-60 minutes before it will try to force an "orderly shutdown". (For the rare, longer outages, I can run generators, but it's been years since I really needed them.)

As for file system safety - xfs has the concept of "write-barriers" to allow enforcing write-ordering for safety, which default to assuming battery backup is not present (the safe option if you don't have battery backup and/or UPS-backed power). I set the barrier option on my "work disks" where there is a lot of writing (like "/home" or my "squid-cache") that either are backed up daily (/home) or wouldn't cause too much pain if they had to be recreated (cache partitions). Of course that's not something you can do on most new, default Linux setups that like to put everything on one partition.
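To make that concrete, the barrier behaviour is just a mount option. An illustrative /etc/fstab fragment follows - the device names are made up, and on the XFS of this era 'barrier' is the default, with 'nobarrier' only sane behind a battery-backed controller cache or UPS:

```shell
# /etc/fstab fragment (illustrative; devices are placeholders)
# barriers on for the heavily-written, backed-up /home;
# nobarrier only because the controller cache is battery-backed
/dev/sdb1   /home              xfs   defaults,barrier     0  2
/dev/sdc1   /var/cache/squid   xfs   defaults,nobarrier   0  0
```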
I hope so! IIRC there was even one rather more savvy cache manufacturer in the SCO days who had a battery backup on the cache board.
Dunno about the one you mention, but LSI still does -- and goes through extensive methods to ensure disk safety -- including configurable background "Patrol" reads to look for decayed disk bits that can be rebuilt from RAID redundancies.
I don't think they fared any better. Call me a cynic, but I still think that the order of the writes, or honouring the order that the OS and file system designers think the ordering of the writes should be, does matter.
They are. That's why more modern file systems offer users the choice of write-through or write-back, plus barriers, allowing both safety and speed depending on your HW and needs. While XFS was the first with such options, I think ext3/4 and jfs (among others) also have them.
Now historically, one of the changes from classical V6 UNIX to classical V7 UNIX back in the 1970s was that the file system changed the order of the writes to preserve the structural integrity. Data was written to the disk before the metadata and pointers. I won't go into why that ensured that FSCK could work properly, google for that yourself. But it is important that it was done that way.
Interesting, since the file-system structure, these days, on file systems that don't need or have an "fsck", is now held in the metadata that is written to the journal - before data is committed to disk.

Writing the data to disk first resulted in greater data savings, at the cost of different people getting other people's data in their files. That was considered bad for "security", so now disk blocks are meta-allocated first - and only when valid data has been written into them can they be read directly. Otherwise, newly-allocated space is flagged as needing to be "zeroed" before a user can read it. (If they "write" to it first, and never read it, the blocks don't need zeroing.)

But that's why the barriers exist - the file system needs to record the allocated blocks and mark them as needing cleaning first, which it does in the journal, so the next time the file system is mounted, it first "replays" any journal part, and the FS structure on disk remains intact. Writing the data first leads to lots of problems with blocks marked used but not attached to anything, or marked allocated but not cleaned yet, so users would come up with fragments (or complete copies) of other people's sensitive data. -l
On 02/15/2016 02:55 PM, Linda Walsh wrote:
Interesting, since the file-system structure, these days, on file systems that don't need or have an "fsck", is now held in the metadata that is written to the journal - before data is committed to disk.
As someone commented, we've come a long way with Linux and small systems since the SCO days! And yes, journalling is, to my mind, one of the most important steps along the way.

It's worth thinking about causes and economics. When the IBM PC first came out, IBM did not consider it a serious project; they thought the demand would be for about a quarter of a million units! Even in the SCO days, any of us who took computers seriously saw PCs as a harbinger of the future. Those of us who had grown up with minis - PDP-8/PDP-11 or Data General, Burroughs, or the smaller IBM and HP machines at university - found it easy enough to segue into using the AT and later-generation PCs as replacements.

SCO was just one of a number of vendors of UNIX/86 in the mid-to-late 1980s. Others came and went ... Xenix, Unity, CoData, Onyx, Cromix, TriData, UniSoft, Apollo, and of course SCO's probable competitor that I used extensively, Interactive. Some were more business-like than others; some were more about technocrats wanting to "do" UNIX.

Somewhere along the line, after about 1995, small PCs became serious business for SUN, IBM, Oracle. Linux had a lot to do with that :-) PCs became 'servers' as the Internet and the WWW rose. Looking back we can see the influence of "the Internet runs on Linux". That demanded failsafe mechanisms like robust file systems, fail-over, management tools. SUN and IBM pioneered a lot in this area - SUN growing upwards, IBM applying its expertise 'downwards'.

Linda's description of XFS, of journalling and write-barrier techniques, is quite to the point, but how we got there is a fascinating story of R&D coming from surprising places.

-- A: Yes. > Q: Are you sure? >> A: Because it reverses the logical flow of conversation. >>> Q: Why is top posting frowned upon?
On 2016-02-14 17:34, Olav Reinert wrote:
On Son, 2016-02-14 at 10:54 -0500, Anton Aylward wrote:
With writeback, bcache releases system calls as soon as the blocks involved have been written to the SSD. Flushing to the underlying storage happens later, asynchronously. Importantly, bcache guarantees that the system can be safely shut down even if the cache is dirty; it will resume flushing at the next reboot.
Would fsck work with a dirty cache? Ie, does fsck know that it has to read both the HD and the SSD to do its job properly? -- Cheers / Saludos, Carlos E. R. (from 13.1 x86_64 "Bottle" at Telcontar)
On 02/14/2016 12:43 PM, Carlos E. R. wrote:
On 2016-02-14 17:34, Olav Reinert wrote:
On Son, 2016-02-14 at 10:54 -0500, Anton Aylward wrote:
With writeback, bcache releases system calls as soon as the blocks involved have been written to the SSD. Flushing to the underlying storage happens later, asynchronously. Importantly, bcache guarantees that the system can be safely shut down even if the cache is dirty; it will resume flushing at the next reboot.
That's Olav, not me, that you're quoting.
Would fsck work with a dirty cache?
Cache where? What is 'dirty' in this context? Is there an ioctl that lets the sysadmin force the bcache to flush to disk? An equivalent of doing a "sync(2)", possibly via the CLI "sync(1)". Ah, memories of "sync;sync;sync;shutdown"
Ie, does fsck know that it has to read both the HD and the SSD to do its job properly?
I'm not sure that's a valid question. Oh, wait - yes, there are (or should be/may be) really three 'raw' devices here: one for the bcache, one for the rest of the hard drive (backing device), and one for the logical drive regardless of any caching. If the fsck is done on the logical raw rather than the physical raw, then it should work. Your question makes sense only if there is no 'logical raw', and raw access to the bcache device and the unflushed backing media is all that exists. Which gets back to the issue of forcing bcache to flush to the backing device.

I suspect that it's not as complicated as this. But I'd still expect a way to force the flush of the cache. Hmm, Googling for "bcache force flush": https://wiki.archlinux.org/index.php/Bcache#Force_flush_of_cache_to_backing_...
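For what it's worth, the knobs that wiki page describes are plain sysfs writes - roughly like this, assuming the assembly is /dev/bcache0 (the paths and exact behaviour may vary by kernel version):

```shell
# Stop caching new writes, then drain the dirty blocks to the
# backing device (sysfs paths assume the assembly is bcache0).
echo writethrough > /sys/block/bcache0/bcache/cache_mode
echo 0 > /sys/block/bcache0/bcache/writeback_percent

# Poll until bcache reports the cache clean
cat /sys/block/bcache0/bcache/state   # 'clean' once fully flushed
```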
On 2016-02-14 19:04, Anton Aylward wrote:
On 02/14/2016 12:43 PM, Carlos E. R. wrote:
On 2016-02-14 17:34, Olav Reinert wrote:
On Son, 2016-02-14 at 10:54 -0500, Anton Aylward wrote:
A write cache that 'accelerates' by releasing the system call while deferring the actual transfer from the cache to storage simply cannot guarantee the integrity of the file system in the event of, for example, a power loss. Certainly an SSD, which will retain content, can, if the supporting software is there to do the job, ameliorate this. It gets back to the issue of the window/timing. But there are a few too many 'ifs' in this to inspire confidence in an old cynic like me.
That paragraph was missing.
With writeback, bcache releases system calls as soon as the blocks involved have been written to the SSD. Flushing to the underlying storage happens later, asynchronously. Importantly, bcache guarantees that the system can be safely shut down even if the cache is dirty; it will resume flushing at the next reboot.
That's Olav, not me, that you're quoting.
Would fsck work with a dirty cache?
Cache where? What is "dirty' in this context?
Data or metadata in the SSD, waiting to be transferred to the rotating disk. Power failure, crash, whatever, and the thing is dirty.
Is there an IOCTl that lets the sysadmin force the bcache to flush to disk? An equivalent of doing a "sync(2)" possibly via CLI "sync(1)".
Dunno.
Ah, memories of "sync;sync;sync;shutdown"
Ie, does fsck know that it has to read both the HD and the SSD to do its job properly?
I'm not sure that's a valid question. oh, wait, yes there are (or should be/may be) really three 'raw' devices here. One for the bcache, one for the rest of the hard drive (backing device) and one for the logical drive regardless of any caching.
Yes.
On Son, 2016-02-14 at 18:43 +0100, Carlos E. R. wrote:
On 2016-02-14 17:34, Olav Reinert wrote:
With writeback, bcache releases system calls as soon as the blocks involved have been written to the SSD. Flushing to the underlying storage happens later, asynchronously. Importantly, bcache guarantees that the system can be safely shut down even if the cache is dirty; it will resume flushing at the next reboot.
Would fsck work with a dirty cache?
Ie, does fsck know that it has to read both the HD and the SSD to do its job properly?
Yes, it does. Assume /dev/sda is an SSD, /dev/sdb is a spinning hard disk, and bcache is set up to cache sdb on sda. Then you will have a /dev/bcache0 device to represent the caching assembly, and that's the device on which the file system is created - not sda or sdb. Consequently, /dev/bcache0 is the device to use in /etc/fstab and when running fsck. bcache writes special superblocks on both sda and sdb to identify the devices involved in a bcache assembly. (This allows automatic assembly of /dev/bcache0 at boot time via udev rules.) Hence, running fsck directly on sda or sdb won't work at all, because it won't find any file system superblocks. \Olav
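In command form, the assembly and the fsck look roughly like this (placeholder devices; a sketch rather than a recipe):

```shell
# /dev/sda = SSD (cache), /dev/sdb = spinning disk (backing device)
make-bcache -C /dev/sda        # format the cache device
make-bcache -B /dev/sdb        # format the backing device
# (udev then assembles /dev/bcache0 from the bcache superblocks)

mkfs.ext4 /dev/bcache0         # the file system lives on the assembly
fsck.ext4 -n /dev/bcache0      # ...so fsck must run there as well

fsck.ext4 -n /dev/sdb          # fails: no file system superblock here
```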
On Sun, Feb 14, 2016 at 10:54 AM, Anton Aylward
On 02/13/2016 07:51 PM, Olav Reinert wrote:
Because the cache is small, it only serves to speed up random-access reads of the active working set, and to speed up most writes (writeback cache mode).
The net effect is that, even though all data (including user home directories) is stored on a pair of spinning disks, the cache manages to hide that fact well enough to make the machine feel very snappy in use - it's close enough to a real SSD for my needs.
Sorry, I can't give any concrete numbers to back up my experiences.
Its worth considering some aspects of the logic of caching.
What might be an extreme case is one I saw in the last century and discussed at the North York UNIX Users group, back in the days when SCO UNIX was the dominant system for SMB and personal as an alternative to Microsoft, certainly if one wanted to support a multi-user environment or run a "proper" database system.
One SMB manager described the use of a disc accelerator board, which was really a cache. It too had write-through. The write-through basically released the system call immediately and wrote the block when convenient.
Now historically, one of the changes from classical V6 UNIX to classical V7 UNIX back in the 1970s was that the file system changed the order of the writes to preserve the structural integrity. Data was written to the disk before the metadata and pointers. I won't go into why that ensured that FSCK could work properly, google for that yourself. But it is important that it was done that way. The use of this board could not guarantee that. A CompSci type from UofT warned of this. A couple of months later the SMB guy reported corrupted and unrecoverable disk. There was a "I told you so" feeling.
When I was young a dinosaur ate my server, so now I always keep them in dinosaur-proof cages. I know, you say dinosaurs are extinct, but I say better safe than sorry. === In the modern era (as opposed to the pre-historic era of 40 years ago), we have huge disk controller caches that work similarly to what you describe. One solution, for at least the last two decades, is to use an onboard battery to maintain that disk controller cache. Around 2000 I was working with always-up (24x365) disk subsystems. The ones I knew well were made by DEC (I mean Compaq (I guess I mean HP)). The battery could maintain the cache contents for 7 days during a hardware fault / power outage. They had the battery connected to the system via a Y-connector. The Y-connector was accessible with the system operational, so if the battery died of old age, you connected up the new battery before you disconnected the old, ensuring you didn't have a window of vulnerability during the battery replacement. Further, if for some reason 7 days wasn't long enough, you could charge a spare battery offline, then swap it in and keep the cache valid indefinitely.
Committing to disk, and committing in the right order is important.
Yes, and in the last 40 years a lot of effort has gone into making sure the order is right. Even only 20 years ago, the only status of a cache was clean or dirty; if it was dirty, you had to issue a cache-flush command to ensure any given piece of data was written to disk. Fortunately, 15 years ago the ATA command set was enhanced with a "force unit access" (FUA) option that allows a specific data write to not be acknowledged as complete until the data written is on "stable storage". http://www.t13.org/documents/UploadedDocuments/docs2002/e01141r1.pdf If you read an XFS or RAID mailing list you will see they strongly urge against hardware cache systems that falsely report success for a FUA write operation. The exception is if the user has a battery backup system like the above that they trust, and therefore the cache itself is considered stable. For my normal activity I need reliable operation, but not 24x365, so I can easily accept a hardware cache that requires downtime to swap out the battery. The Linux kernel cache definitely allows FUA-type writes from the filesystem level to propagate down the software stack to force a specific write out to a disk controller.
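The same request for stable storage is visible from user space: opening a file with O_DSYNC (dd's 'dsync' output flag) makes each write block until the data is durable, which the Linux block layer implements with FUA writes or cache flushes, depending on what the device advertises. A small sketch:

```shell
# Copy a file with every write forced through to stable storage.
# oflag=dsync maps to O_DSYNC; the kernel turns that into FUA
# writes (or full cache flushes) as the device supports.
printf 'journal record\n' > record.txt
dd if=record.txt of=durable.txt oflag=dsync status=none
cat durable.txt
rm -f record.txt durable.txt
```

When this prints the copied line, the data has already been acknowledged as durable, not merely buffered.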
A write cache that 'accelerates' by releases the system call while deferring the actual transfer from the cache to storage simply cannot guarantee the integrity of the file system in the event of, for example, a power loss.
Thus the introduction of FUA writes 15 years ago. Hopefully any modern hardware-based disk caching solution honors FUA write requests.
Certainly a SSD, which will retain content, can, if the supporting software is there to do the job, ameliorate this. It gets back to the issue of the window/timing. But there are a few too many 'ifs" in this to inspire confidence in an old cynic like me.
As long as the FUA commands are honored, the filesystem should be trustworthy even with a large data cache.
Caching the read is also something that should be looked at critically. All in all, the internal algorithms for memory allocation and mapping can probably make better use of memory than if it is externalized, such as the memory on those accelerator boards. This is true for a number of reasons, but most of all because the OS knows better what it needs to cache based on the process activity.
Agreed except in the case of a shared external storage subsystem. I assume you've seen Per's posts about iSCSI systems. Basically you setup a server with lots of disk and ram to act as a dedicated storage server and export out iSCSI volumes. Then you have other machines (typically servers) that import those iSCSI volumes and mount them just as they would a physical ATA / SCSI disk.
In the case of "virtual memory" more memory means the page-out happens less, retrieval from the page-out queue is in effect a cache operation. Late model *NIX uses memory mapping for just about all open files, or can. Dynamic linking, along with shared libraries, are opened memory mapped files with a load-on-demand. Such pages may be on the out queue but they never need to be re-written.
Agreed
Adding RAM is often the cheapest way to improve performance with a *NIX system that can make use of it. Again there are a few "if" in that. This may not be your bottleneck. Profiling is important. You may also come up against hardware limits. Many PC motherboards have too few memory slots or their chip set won't allow the use of larger capacity memory cards. But before you spend >$50 on a SSD http://www.amazon.com/SanDisk-6-0GB-2-5-Inch-Height-SDSSDP-064G-G25/dp/B007ZWLRSU/ref=sr_1_3?ie=UTF8&qid=1455464006&sr=8-3&keywords=64g+ssd consider spending the same on RAM
A worthy comment, but RAM is still $7/GB or so and high-speed SSD is down below $0.50/GB. I've bought at least 4TB of SSD in the last 6 months. At $7/GB that would be a $28K expense vs the $2K expense it was.
I'm sure you can find cheaper SSD and more expensive RAM. Speed, capacity, scale, all factor in.
Data pages, mapped files, are a yes-no-maybe situation. A program might need to read config files at start-up, but those are never read again and never written. Caching them is a waste of time, but how do you tell the cache that?
DPO - Disable Page Out - Yet another low-level ATA / SCSI type command (actually a flag bit used in conjunction with other commands). It tells the disk that data is not likely to be re-used so don't bother keeping it in cache after you get it to stable storage. A disk controller cache that sits between the kernel and the stable storage should honor the DPO flag and not keep that data around after it is written to disk.
The interface to the 'rotating rust' is also a cache in read mode, in that it reads a whole track at once. File system abstractions don't always permit this to be useful. It's all very well to have 4K pages and store files as contiguous blocks, but in reality a file access involves metadata access as well, and we're talking about a multi-user system with asynchronous demands. Sometimes I think that the design of file systems such as ReiserFS, where small files can have the data in the metadata block, makes sense, especially when dealing with small start-up config files. Anyway, we are back to the situation where only the application knows that this "data" file is a "read-once then discard" start-up config. Even the OS doesn't know there is no need to cache the mapped pages. They don't even need to be put on the out queue, they can be marked 'free' the moment the file is closed. Oh? Perhaps they are? But how does the OS communicate this to the SSD bcache or the disk controller buffer?
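Within the kernel's own page cache there is at least a partial answer: posix_fadvise(2) lets a process declare "read once, then discard". Whether that hint propagates any further down, to an SSD cache or a controller, is exactly the open question here. A sketch, assuming Linux (the config file is invented for the example):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "startup.conf")
with open(path, "wb") as f:
    f.write(b"option = value\n")

fd = os.open(path, os.O_RDONLY)
try:
    config = os.read(fd, 4096)  # the read-once start-up config
    # Hint that these pages won't be needed again, so the page cache
    # can drop them immediately instead of ageing them out. The hint
    # stops at the kernel; nothing guarantees any lower cache hears it.
    os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
finally:
    os.close(fd)
```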
I believe both of the ATA commands "FLUSH CACHE" and "FLUSH CACHE EXTENDED" allow a block range to be defined. I further "believe" the DPO feature is actually implemented as a single bit flag. So when the kernel is done with a file, it can send FLUSH CACHE commands with the DPO flag set. I don't know if it does, but I certainly wouldn't be surprised if it did.
The discipline of the hierarchy of organization and standardization of interfaces and separation of function may have led to many good things, but the old days of the IBM 1040 and the like, where the application had control all the way down the stack to the disk, allowed a lot of optimization that is now simply not possible because the detailed information isn't available from the higher levels all the way down.
You haven't convinced me. There are low-level ATA/SCSI commands that you seem to ignore. If an app wants to control the caching scheme it can use O_DIRECT open() flag to bypass the kernel caching and implement its own. I'd be surprised if there isn't a way for userspace to trigger FUA / DPO / FLUSH_CACHE commands. After all, this is Linux we are talking about. Greg -- To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org To contact the owner, e-mail: opensuse+owner@opensuse.org
On 02/14/2016 02:20 PM, Greg Freemyer wrote:
On Sun, Feb 14, 2016 at 10:54 AM, Anton Aylward
wrote:
... [snip] ... In the modern era (as opposed to the pre-historic era of 40 years ago), we have huge controller disk caches that work similarly to what you describe. One solution for at least the last 2 decades is to use an on-board battery to maintain that disk controller cache.
I am so glad you decided to follow up further on my note about that. Yes, one of those vendors of cache boards for SCO UNIX developed a board with battery backup ... I thought I mentioned that.
Around 2000 I was working with always up (24x365) disk subsystems. The ones I knew well were made by DEC, (I mean Compaq, (I guess I mean HP)).
I know what you mean. I have some equipment in the basement, designed by DEC, built by Compaq, but with an HP label.
The battery could maintain the cache contents for 7 days during a hardware fault / power outage. They had the battery connected to the system via a Y-connector. The y-connector was accessible with the system operational. Thus if the battery died due to old age, you connected up the new battery before you disconnected the old, ensuring you didn't have a window of vulnerability during the battery replacement.
That's a good mechanism. Was there something that noticed if the battery died of old age? Was there a means for testing that the mechanism worked (how far do you want to regress this?) Did the mechanism nag, nag, nag endlessly? 'Cos I'm sure I'm not alone when I speak of a set-up like this where the sysop decided to turn off the alarm, and turn off the alarm ... and unplug the annoying alarm ... and forgot about it. What is it they say about idiot-proof systems? It's humans that are the problem :-(
Committing to disk, and committing in the right order is important.
Yes, and in the last 40 years a lot of effort has gone into making sure the order is right.
Beginning, as I said, with the momentous step in the change from V6 to V7 UNIX, and trying hard never to look back. I do wonder, however, just how much mistakes like this are taught in CS courses? I've noted many times that the #1 and #2 vulnerabilities in the SANS Top 20 list, SQL Injection and Buffer Overflow, have been around for more than 20 years. Buffer Overflow, if you recall, was the root cause of the Morris Worm of 1988, which took down an appreciable part of the Internet-as-it-then-was. My point here is that when I interview new intakes of programmers, or even talk with ones who've been working for my client for some years, even the ones that are aware of these tell me their schools & college courses never mentioned them.
Even only 20 years ago, the only status of a cache was clean or dirty. If it was dirty you had to issue a cache flush command to ensure any given piece of data was written to disk.
Not quite. The VM mechanisms of which I speak, which grew out of the VM of the original BSD 4.1 for the VAX of the 1982 era, had to deal with the lack of a few crucial 'bits' in the VAX VM hardware support. That's one reason for the different types of ageing/available/"dirty" queues. There are many internal pointers that give context to a 'dirty'. Only a data page can be dirty, and there may be a file pointer to a buffer in virtual memory that is linear, but uses the VM's scatter-gather mechanisms to point to pages in the write-out queue. Perhaps "on the write-out queue" is more meaningful, better descriptive, when speaking of VM mechanisms than "dirty".
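The flush-the-dirty-cache discipline being discussed reduces, in userspace terms, to ordering writes with explicit flushes: data first, barrier, then the record that declares the data valid. A sketch, with made-up file names:

```python
import os
import tempfile

def write_and_flush(path, payload):
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        os.write(fd, payload)
        os.fsync(fd)  # ordering barrier: later writes cannot be
                      # reordered ahead of this payload's durability
    finally:
        os.close(fd)

d = tempfile.mkdtemp()
data_path = os.path.join(d, "data.bin")
flag_path = os.path.join(d, "commit.flag")

# 1. The data itself must reach stable storage first...
write_and_flush(data_path, b"new contents")
# 2. ...and only then the record that says "the data is valid".
# A crash between the two steps leaves no flag, so recovery simply
# ignores the half-finished data instead of trusting garbage.
write_and_flush(flag_path, b"committed")
```

This is the same journaling pattern that a cache acknowledging FUA writes early would silently break.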
Fortunately, 15 years ago the ATA command set was enhanced to have a "force unit access" (FUA) command that allowed a specific data write to not be acknowledged as complete until the data written was on "stable storage".
And of course every writer of disk controllers unerringly makes use of that ...
If you read a XFS or RAID mailing list you will see they strongly urge against hardware cache systems that falsely report success for a FUA write operation.
Good. Well, sort of. "Strongly urge" seems a bit lily-livered[1].
The exception is if the user has a battery backup system like the above that they trust and therefore the cache itself is considered stable.
BAD :-( The security types have a saying "Trust, but verify". Unless there is that verification, one that somehow gets passed human failing, I would not trust.
For my normal activity, I need reliable operation, but not 7x365 so I can easily accept a hardware cache that requires downtime to swap out the battery.
We have a wide variety of users here. My home system ... I'm not sure I could justify a hardware cache anyway! But most of my clients have financial systems or web services where 7x365 *is* required and "Five Nines" may not be enough, and precautions and checks against human error or oversight are among the first things considered. I've read that Wall Street pretty much runs on Linux. They aren't the only financial hub in the world that uses Linux!
Hopefully any modern hardware-based disk caching solution honors FUA write requests.
Hopefully.
Certainly an SSD, which will retain content, can, if the supporting software is there to do the job, ameliorate this. It gets back to the issue of the window/timing. But there are a few too many 'ifs' in this to inspire confidence in an old cynic like me.
As long as the FUA commands are honored, the filesystem should be trustworthy even with a large data cache.
I agree with your proviso there. But it is a case of "if".
Adding RAM is often the cheapest way to improve performance with a *NIX system that can make use of it. Again there are a few "ifs" in that. This may not be your bottleneck. Profiling is important. You may also come up against hardware limits. Many PC motherboards have too few memory slots or their chip set won't allow the use of larger capacity memory cards. But before you spend >$50 on an SSD http://www.amazon.com/SanDisk-6-0GB-2-5-Inch-Height-SDSSDP-064G-G25/dp/B007ZWLRSU/ref=sr_1_3?ie=UTF8&qid=1455464006&sr=8-3&keywords=64g+ssd consider spending the same on RAM
A worthy comment, but RAM is still $7/GB or so and high-speed SSD is down below $0.50/GB.
We've had this argument before, with tape storage vs DASD 50 years ago. It's not the cost. If it were the cost, computers would not use RAM, they would access the SSD directly. My point is that the computer itself can make good use of more RAM ahead of faster disk. It can use the RAM because it's under the immediate and complete control of the OS. Storage on disk is always 'by proxy'. When you've run out of capacity on your motherboard, you look to other ways. Hmm, my desktop shipped from Dell with a single-core CPU. One of the first things I did was get a 4-core chip for it. Hmm, that had a larger CPU cache as well :-) But oh, dear. It's the wrong socket type for one of the 20-core chips :-(
Data pages, mapped files, ...
DPO - Disable Page Out - Yet another low-level ATA / SCSI type command (actually a flag bit used in conjunction with other commands). It tells the disk that data is not likely to be re-used so don't bother keeping it in cache after you get it to stable storage.
Nice idea. The problem is that we're getting to the point where there needs to be an 'elevator' on the side of the call hierarchy between the application, the libraries, the OS call, the OS interface to the file system model, and the file system interface to the device drivers. I think I have tried to be vocal about keeping work at the right level in the stack; let me make it clear again if I haven't. It's a while since I examined the Virtual File System Switch code; perhaps I'd better do that once again to see if there is such an 'elevator' mechanism. It strikes me that if a file system is mounted 'read only' then all access to it should have the DPO flag set. I note in passing that the VFS was invented by Sun for SunOS-3.x in the mid-1980s.
A disk controller cache that sits between the kernel and the stable storage should honor the DPO flag and not keep that data around after it is written to disk.
I'm not saying it shouldn't, I'm just questioning how the information that a file has been opened RO at the application level gets communicated all the way down to the device driver, somehow knowing that this is an ATA/SCSI device. A key part of the whole UNIX model was always that it was "device agnostic". Let's face it, the whole VFS model means that there is no presumption as to what file system the file being opened RO is on. Heck, you could have your whole system running on BtrFS, minixFS, qnx4FS or some experimental-yet-to-be-releasedFS -- which is the whole point, isn't it?
I don't know if it does, but I certainly wouldn't be surprised if it did.
It's worth looking into or raising on a different list. D**n-it-all-to-h**, another demand on my time that I can't afford.
The discipline of the hierarchy of organization and standardization of interfaces and separation of function may have led to many good things, but the old days of the IBM 1040 and the like, where the application had control all the way down the stack to the disk, allowed a lot of optimization that is now simply not possible because the detailed information isn't available from the higher levels all the way down.
You haven't convinced me.
I haven't convinced myself until I go look. Back to "Trust, but verify". I trust I'm right about there not being an 'elevator', but the VFS may be flexible enough. Don't expect me to find time to drill down through kernel code to explore this soon. Right now I have to deal with doing the laundry to have clean clothes for tomorrow!
There are low-level ATA/SCSI commands that you seem to ignore. If an app wants to control the caching scheme it can use O_DIRECT open() flag to bypass the kernel caching and implement its own.
So the app is now taking on responsibility for 'knowing' what the underlying device is?
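Exactly so: with O_DIRECT the application inherits the device's constraints. The open(2) flag Greg mentions is Linux-specific, rejects unaligned buffers, and some filesystems (tmpfs, for one) refuse it outright; a hedged sketch:

```python
import mmap
import os
import tempfile

def write_bypassing_page_cache(path, payload):
    """Sketch of the O_DIRECT route: bypass the kernel page cache and
    hand the device an aligned buffer. Returns 'ok' or 'unsupported',
    since not every platform or filesystem accepts O_DIRECT."""
    if not hasattr(os, "O_DIRECT"):
        return "unsupported"
    # O_DIRECT wants the buffer, offset and length aligned to the
    # logical block size; an anonymous mmap is page-aligned.
    buf = mmap.mmap(-1, 4096)
    buf[:len(payload)] = payload
    try:
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_DIRECT, 0o600)
    except OSError:
        return "unsupported"
    try:
        os.write(fd, buf)  # one full, aligned block
    except OSError:
        return "unsupported"
    finally:
        os.close(fd)
    return "ok"

status = write_bypassing_page_cache(
    os.path.join(tempfile.mkdtemp(), "direct.bin"), b"raw block")
```

The error handling is the point: the app now has to know, or discover at runtime, what the device and filesystem underneath will tolerate.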
I'd be surprised if there isn't a way for userspace to trigger FUA / DPO / FLUSH_CACHE commands.
After all, this is Linux we are talking about.
Indeed. There's nothing to say that you can't write an app that accesses some weird piece of hardware directly through the 'raw' device. Oh wait, isn't that how things like 'fdisk' and the older versions of X11 work? [1] http://www.merriam-webster.com/dictionary/lily%E2%80%93livered -- Any philosophy that can be put in a nutshell belongs there. -- Sydney J. Harris
Le 14/02/2016 23:26, Anton Aylward a écrit :
'Cos I'm sure I'm not alone when I speak of a set-up like this that the sysop decided to turn off the alarm, and turn off the alarm .. and um-plug the annoying alarm ... and forgot about it.
in the server room of my Linux user group, I thought my shoes were making an annoying "quiiiiirk" every so often. *Years* later, I noticed a UPS under the board was 10 years old and squeaking for a dead battery :-)) jdd
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA256 On 2016-02-14 23:26, Anton Aylward wrote:
I do wonder, however, just how much mistakes like this are taught in CS courses? I've noted many times that the #1 and #2 vulnerabilities in the SANS Top 20 list, SQL Injection and Buffer Overflow, have been around for more than 20 years. Buffer Overflow, if you recall, was the root cause of the Morris Worm of 1988, which took down an appreciable part of the Internet-as-it-then-was. My point here is that when I interview new intakes of programmers, or even talk with ones who've been working for my client for some years, even the ones that are aware of these tell me their schools & college courses never mentioned them.
Mine did. The teacher repeated several times what a dangerous and bad language the C he was teaching us was. He took pains to stress the point. I think we even practised how one variable would overflow and write another variable. Is this an error? - he would ask. - Yes. Is it bad? Yes. Is it dangerous? Yes. Will the compiler tell us? No. Me, I would add: Will the runtime whatever tell us? No. -- Cheers / Saludos, Carlos E. R. (from 13.1 x86_64 "Bottle" (Minas Tirith))
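The classroom exercise Carlos describes (one variable silently overflowing into its neighbour) can be reproduced safely from Python with ctypes; the struct layout below is invented to mimic two adjacent C char arrays:

```python
import ctypes

class TwoBuffers(ctypes.Structure):
    # Two fixed-size buffers laid out back to back in memory,
    # like two adjacent char arrays in a C stack frame.
    _fields_ = [("first", ctypes.c_char * 8),
                ("second", ctypes.c_char * 8)]

rec = TwoBuffers()
rec.second = b"SAFE"

# Write 12 bytes into the 8-byte 'first' field. Nothing objects at
# "compile" time or run time; the 4 excess bytes silently land in
# 'second' -- exactly the overflow the teacher demonstrated.
ctypes.memmove(ctypes.addressof(rec), b"AAAAAAAAOOPS", 12)

print(rec.second)  # b'OOPS' -- the neighbour was silently overwritten
```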
Le 14/02/2016 23:35, Carlos E. R. a écrit :
Me, I would add: Will the runtime whatever tell us? No.
sometimes one sees curious writes on screen :-) jdd
On 2016-02-14 23:39, jdd wrote:
Le 14/02/2016 23:35, Carlos E. R. a écrit :
Me, I would add: Will the runtime whatever tell us? No.
sometimes one sees curious writes on screen :-)
Yep. He said that sometimes the program would just produce erroneous output, sometimes very difficult to notice. Sometimes, however, it would crash, and he said this was the best thing to happen, considering. -- Cheers / Saludos, Carlos E. R. (from 13.1 x86_64 "Bottle" (Minas Tirith))
On 02/14/2016 05:39 PM, jdd wrote:
Le 14/02/2016 23:35, Carlos E. R. a écrit :
Me, I would add: Will the runtime whatever tell us? No.
sometimes one sees curious writes on screen :-)
Sometimes nothing seems to happen. If you are lucky you find a new file called "core" on your disk. Sometimes if you are not lucky you find you were glad you made a backup of your disk. I gave up programming in C when I discovered Perl. I gave up programming in Perl when I discovered Ruby. -- A: Yes. > Q: Are you sure? >> A: Because it reverses the logical flow of conversation. >>> Q: Why is top posting frowned upon?
On 02/14/2016 05:04 PM, Anton Aylward wrote:
I gave up programming in C when I discovered Perl. I gave up programming in Perl when I discovered Ruby. I went back to C when I couldn't do what I needed in Perl or Ruby.
I have always liked this exchange regarding the choice of C :) http://article.gmane.org/gmane.comp.version-control.git/57918 -- David C. Rankin, J.D.,P.E.
On 02/14/2016 03:19 PM, David C. Rankin wrote:
On 02/14/2016 05:04 PM, Anton Aylward wrote:
I gave up programming in C when I discovered Perl. I gave up programming in Perl when I discovered Ruby. I went back to C when I couldn't do what I needed in Perl or Ruby.
I have always liked this exchange regarding the choice of C :)
http://article.gmane.org/gmane.comp.version-control.git/57918
LoL. Talk about a bitchslap! Who dares argue with that guy. -- After all is said and done, more is said than done.
Le 15/02/2016 00:04, Anton Aylward a écrit :
I gave up programming in Perl when I discovered Ruby.
did you discover FORTH? :-) jdd
On 02/15/2016 03:12 AM, jdd wrote:
Le 15/02/2016 00:04, Anton Aylward a écrit :
I gave up programming in Perl when I discovered Ruby.
did you discover FORTH?
Yes, back in the 1970s when I was building embedded real time control systems.
On 14/02/16 05:04 PM, Anton Aylward wrote:
On 02/14/2016 05:39 PM, jdd wrote:
Le 14/02/2016 23:35, Carlos E. R. a écrit :
Me, I would add: Will the runtime whatever tell us? No.
sometimes one sees curious writes on screen :-) Sometimes nothing seems to happen. If you are lucky you find a new file called "core" on your disk. Sometimes if you are not lucky you find you were glad you made a backup of your disk.
I gave up programming in C when I discovered Perl. I gave up programming in Perl when I discovered Ruby.
And here I thought real programmers were still writing: cp stdin filename
On 02/15/2016 08:53 AM, Darryl Gregorash wrote:
And here I thought real programmers were still writing:
cp stdin filename
http://imgs.xkcd.com/comics/real_programmers.png Butterflies are particularly efficacious if you are writing FORTH for a single register machine. There's nothing quite like a single register machine implemented with asynchronous random logic for sheer linear speed. : forever ( -- ) 0 begin dup . 1+ again ; forever
Le 15/02/2016 15:20, Anton Aylward a écrit :
On 02/15/2016 08:53 AM, Darryl Gregorash wrote:
And here I thought real programmers were still writing:
cp stdin filename
http://imgs.xkcd.com/comics/real_programmers.png
Butterflies are particularly efficacious if you are writing FORTH for a single register machine. There's nothing quite like a single register machine implemented with asynchronous random logic for sheer linear speed.
: forever ( -- ) 0 begin dup . 1+ again ; forever
after several years with the HP-41C and HP-48C, I liked FORTH pretty much :-) jdd
Quoting Anton Aylward
On 02/15/2016 08:53 AM, Darryl Gregorash wrote:
And here I thought real programmers were still writing:
cp stdin filename
Real programmers enter, debug, and modify the program thru the front panel. Of course, only mossbacks use a physical front panel. However, like the physical front panel users, the ROM command-line debugger users fight between hex and octal. ;) Jeffrey, who has used most of the variants of the above debugging tools and prefers high-level language symbolic debuggers. And has retired from being a "real programmer".
On 02/15/2016 11:29 AM, Jeffrey L. Taylor wrote:
Real programmers enter, debug, and modify the program thru the front panel.
Of course, only mossbacks use a physical front panel. However, like the physical front panel users, the ROM command-line debugger users fight between hex and octal.
My first computer was an IMSAI 8080, which had a front panel. I also used to debug through the microcode on Data General Eclipse computers! https://en.wikipedia.org/wiki/IMSAI_8080 https://en.wikipedia.org/wiki/Data_General_Eclipse
On 02/15/2016 02:29 PM, James Knott wrote:
On 02/15/2016 11:29 AM, Jeffrey L. Taylor wrote:
Real programmers enter, debug, and modify the program thru the front panel.
Of course, only mossbacks use a physical front panel. However, like the physical front panel users, the ROM command-line debugger users fight between hex and octal.
My first computer was an IMSAI 8080, which had a front panel. I also used to debug through the microcode on Data General Eclipse computers!
We also had an Interdata 8/32 in one of the labs at university. No real instruction set in the conventional sense; it was all microcode. https://en.wikipedia.org/wiki/Interdata_7/32_and_8/32 In many ways it was a predecessor to the ideas of the ICL 2900 series, https://en.wikipedia.org/wiki/ICL_2900_Series where any program could also load custom microcode. So a program compiled in FORTRAN, for example, brought in microcode optimized for FORTRAN. Later models had demand-paged microcode. It was all a bit ridiculous. Surely there were better ways to optimise? In many ways this was a research project for Manchester University, subsidized by the government and floated on the back of British industry. Anyway, this all led me to working with bit-slice and the AMD 2900 chip set and building custom specific processing architectures and microcoding them. I'm sure there could have been a postgraduate thesis there at the time, but I wasn't interested in an academic career.
On 02/15/2016 01:03 PM, Anton Aylward wrote:
It was all a bit ridiculous. Surely there were better ways to optimise?
That method hung around for quite a while, even when it was hidden from Joe User by hardware. Back at university I was manning the console of the computer center's CDC 3200 when the BCD light came on. Nobody in earshot had the slightest idea what the hell that meant, but the job went through the queue and the light went off. The Professor in charge walked by, we asked what the light was for, he looked in some manuals no one else dared touch, and said: Sonofabitch! Someone ran a COBOL job! Damn business majors! Binary Coded Decimal arithmetic was done in pseudo-hardware (optionally), and only after the compiled code requested it and caused the microcode to be loaded. It was accurate to 18 positions on either (or both) sides of the decimal point.
On 02/15/2016 04:03 PM, Anton Aylward wrote:
Anyway, this all led me to working with bit-slice and AMD-2900 chip set
That's what the Eclipse used, IIRC.
Quoting James Knott
On 02/15/2016 11:29 AM, Jeffrey L. Taylor wrote:
Real programmers enter, debug, and modify the program thru the front panel.
Of course, only mossbacks use a physical front panel. However, like the physical front panel users, the ROM command-line debugger users fight between hex and octal.
My first computer was an IMSAI 8080, which had a front panel. I also used to debug through the microcode on Data General Eclipse computers!
https://en.wikipedia.org/wiki/IMSAI_8080 https://en.wikipedia.org/wiki/Data_General_Eclipse --
I also had an IMSAI 8080. I don't remember whether I grouped the red and blue switches by groups of 3 (octal) or groups of four (hex). And I used the microcode debugger on my LSI-11 (DEC). Plus the front panel on an AN/UYK-7 (militarized Univac) at work. Fond memories that I have no desire to re-enact. Jeffrey
On 02/15/2016 10:21 PM, Jeffrey L. Taylor wrote:
I also had an IMSAI 8080. I don't remember whether I grouped the red and blue switches by groups of 3 (octal) or groups of four (hex).
I had mine arranged for octal. It made remembering the op codes easier. Fond memories that I have no desire to re-enact. Mine started with 4K, but eventually made its way up to 20K. ;-) I used audio cassette for storage and hooked it up to an ASR 35 Teletype I bought used from my employer. I also had a keyboard from Southwest Technical Products, and later a Cherry keyboard that I modified. I also had a video card & security monitor for display.
On 02/14/2016 05:35 PM, Carlos E. R. wrote:
On 2016-02-14 23:26, Anton Aylward wrote:
I do wonder, however, just how much mistakes like this are taught in CS courses? I've noted many times that the #1 and #2 vulnerabilities in the SANS Top 20 list, SQL Injection and Buffer Overflow, have been around for more than 20 years. Buffer Overflow, if you recall, was the root cause of the Morris Worm of 1988, which took down an appreciable part of the Internet-as-it-then-was. My point here is that when I interview new intakes of programmers, or even talk with ones who've been working for my client for some years, even the ones that are aware of these tell me their schools & college courses never mentioned them. Mine did.
The teacher repeated several times how dangerous and bad language was the C that he was teaching us. Took pains to stress the point.
You are fortunate to have such a teacher. I was in Chapters, our big-box chain of bookstores, once, and it must have been that I was looking at some new language books. A student asked me about languages: did I know UNIX, did I know C? She had a problem with learning C. I checked the shelves for something better than her courseware books, which, while good on grammatical language constructs, if-then-else and lexical stuff, were poor on program constructs. One I found was Lions' Commentary. I showed her that. Do you recall, in the movie about Mozart, when Salieri comes to visit and Mozart is out, but his wife shows Salieri some of his work? The reaction was rather like that. OK, not /quite/ as dramatic. "This is wonderful, I understand this!". Sometimes I think that there is a gulf separating the languages developed in North America from the ones developed in Europe (and possibly Japan). Sometimes I think this is reflected not just in the aesthetics of the languages themselves but in the attitude towards teaching programming.
On 02/14/2016 04:26 PM, Anton Aylward wrote:
I do wonder, however, just how much mistakes like this are taught in CS courses? I've noted many times that the #1 and #2 vulnerabilities in the SANS Top 20 list, SQL Injection and Buffer Overflow, have been around for more than 20 years. Buffer Overflow, if you recall, was the root cause of the Morris Worm of 1988, which took down an appreciable part of the Internet-as-it-then-was. My point here is that when I interview new intakes of programmers, or even talk with ones who've been working for my client for some years, even the ones that are aware of these tell me their schools & college courses never mentioned them.
What bothers me more is the number of questions you find, say on programming sites like StackOverflow.com, of people actively trying to learn how to do stack smashing and buffer overflow exploits -- supposedly for "educational purposes"... Kinda makes you wonder what we are training the next generation to do....
On 02/14/2016 03:12 PM, David C. Rankin wrote:
On 02/14/2016 04:26 PM, Anton Aylward wrote:
I do wonder, however, just how much mistakes like this are taught in CS courses? I've noted many times that the #1 and #2 vulnerabilities in the SANS Top 20 list, SQL Injection and Buffer Overflow, have been around for more than 20 years. Buffer Overflow, if you recall, was the root cause of the Morris Worm of 1988, which took down an appreciable part of the Internet-as-it-then-was. My point here is that when I interview new intakes of programmers, or even talk with ones who've been working for my client for some years, even the ones that are aware of these tell me their schools & college courses never mentioned them.
What bothers me more is the number of questions you find, say on programming sites like StackOverflow.com, of people actively trying to learn how to do stack smashing and buffer overflow exploits -- supposedly for "educational purposes"... Kinda makes you wonder what we are training the next generation to do....
Work for NSA & DOD? I hear they're hurting for real cyber security talent. Schools are churning out CS grads who have never used UNIX/Linux and don't know what a "socket" is! If all you are taught in school is Windows, the real talent will have to be self-taught. Regards, Lew
On 02/14/2016 06:12 PM, David C. Rankin wrote:
On 02/14/2016 04:26 PM, Anton Aylward wrote:
I do wonder, however, just how much mistakes like this are taught in CS courses? I've noted many times that the #1 and #2 vulnerabilities in the SANS Top 20 list, SQL Injection and Buffer Overflow, have been around for more than 20 years. Buffer Overflow, if you recall, was the root cause of the Morris Worm of 1988, which took down an appreciable part of the Internet-as-it-then-was. My point here is that when I interview new intakes of programmers, or even talk with ones who've been working for my client for some years, even those who are aware of these tell me their school & college courses never mentioned them.
What bothers me more is the number of questions you find, say on programming sites like StackOverflow.com, from people actively trying to learn how to do stack smashing and buffer overflow exploits -- supposedly for "educational purposes"... Kinda makes you wonder what we are training the next generation to do....
Indeed. And the wannabes who post to the security groups on LinkedIn who have the same attitude and want to know how they can "break in" to the security business by showing their prowess in this manner. Yes, I've seen them use that phrase - "break in to" - without realising the irony of their Freudian slip. That, too, makes me wonder. -- A: Yes. > Q: Are you sure? >> A: Because it reverses the logical flow of conversation. >>> Q: Why is top posting frowned upon?
Anton Aylward wrote:
The battery could maintain the cache contents for 7 days during a hardware fault / power outage. They had the battery connected to the system via a Y-connector. The y-connector was accessible with the system operational. Thus if the battery died due to old age, you connected up the new battery before you disconnected the old, ensuring you didn't have a window of vulnerability during the battery replacement.
Sounds very much like Compaq Storageworks HSG80 controllers and associated write-cache batteries.
That's a good mechanism. Was there something that noticed if the battery died of old age? Was there a means of testing that the mechanism worked? (How far do you want to regress this?)
Did the mechanism nag, nag, nag endlessly?
LSI / 3ware cards require regular testing of the batteries. HP Smart Arrays will alert you when the batteries have insufficient capacity; I presume they do their own internal testing. When the capacity is insufficient, the write cache is disabled.
'Cos I'm sure I'm not alone when I speak of a set-up like this where the sysop decided to turn off the alarm, and turn off the alarm .. and unplug the annoying alarm ... and forgot about it.
Perhaps, but disabling the write cache will have a noticeable effect. /Per
On Wed, Feb 17, 2016 at 6:46 AM, Per Jessen
The battery could maintain the cache contents for 7 days during a hardware fault / power outage. They had the battery connected to the system via a Y-connector. The y-connector was accessible with the system operational. Thus if the battery died due to old age, you connected up the new battery before you disconnected the old, ensuring you didn't have a window of vulnerability during the battery replacement.
Sounds very much like Compaq Storageworks HSG80 controllers and associated write-cache batteries.
Actually, the prior incarnation (HSZ80) is what I was thinking of. The HSZ series had a true SCSI front-end interface. The HSG had a glass (fibre-channel) front-end. I worked with both at the time, but I don't recall doing a battery swap on an HSG. Greg -- Greg Freemyer www.IntelligentAvatar.net
Greg Freemyer wrote:
Adding RAM is often the cheapest way to improve performance with a *NIX system that can make use of it. Again, there are a few "ifs" in that. This may not be your bottleneck - profiling is important. You may also come up against hardware limits: many PC motherboards have too few memory slots, or their chipset won't allow the use of larger-capacity memory modules. But before you spend >$50 on an SSD http://www.amazon.com/SanDisk-6-0GB-2-5-Inch-Height-SDSSDP-064G-G25/dp/B007ZWLRSU/ref=sr_1_3?ie=UTF8&qid=1455464006&sr=8-3&keywords=64g+ssd consider spending the same on RAM.
A worthy comment, but RAM is still $7/GB or so and high-speed SSD is down below $0.50/GB
I'll have to check, but I don't think adding memory will do much. The filesystems cache is not used because there are no filesystems in use - the raw disk space is handed out over iSCSI, just as Greg described.
I've bought at least 4TB of SSD in the last 6 months. At $7/GB that would be a $28K expense vs the $2K expense it was.
What I'm looking for is an economical compromise that will keep this storage setup efficient for maybe another 2-3 years. A PCIe SSD card with e.g. 400GB is only about CHF 500.
There are low-level ATA/SCSI commands that you seem to ignore. If an app wants to control the caching scheme, it can use the O_DIRECT open() flag to bypass the kernel caching and implement its own.
I'd be surprised if there isn't a way for userspace to trigger FUA / DPO / FLUSH_CACHE commands.
Probably fcntl() and ioctl(). /Per
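For illustration, a minimal sketch of what's already reachable from the shell without writing any C: sync(1) for flushing dirty pages, and blockdev(8), which issues the BLKFLSBUF ioctl on a block device. The device name is a placeholder, and the blockdev call needs root, so it is shown commented out:

```shell
# Write a 1MB test file with an explicit fsync, then flush from userspace.
dd if=/dev/zero of=/tmp/flushdemo bs=4096 count=256 conv=fsync 2>/dev/null
sync                               # flush all dirty pages system-wide
# blockdev --flushbufs /dev/sdx   # flush + invalidate one device's buffers (root)
ls -l /tmp/flushdemo
```

For per-file control the corresponding syscalls are fsync(2)/fdatasync(2); FUA/DPO themselves are set per-command by the kernel's SCSI layer rather than directly from ordinary userspace.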
On Wed, Feb 17, 2016 at 2:28 PM, Per Jessen
I'll have to check, but I don't think adding memory will do much. The filesystems cache is not used because there are no filesystems in use - the raw disk space is handed out over iSCSI, just as Greg described.
Caches are per-block device, so unless you are using devices for direct IO they should still benefit from RAM caches.
Andrei Borzenkov wrote:
On Wed, Feb 17, 2016 at 2:28 PM, Per Jessen
wrote: I'll have to check, but I don't think adding memory will do much. The filesystems cache is not used because there are no filesystems in use - the raw disk space is handed out over iSCSI, just as Greg described.
Caches are per-block device, so unless you are using devices for direct IO they should still benefit from RAM caches.
Each storage server has 4GB, expandable to 32GB afaict - but ietd does use direct IO (type=blockio). I think I need to check up on fileio vs. blockio. Adding some more RAM is a lot easier than adding a bcache. /Per
On Wed, Feb 17, 2016 at 7:03 AM, Per Jessen
Andrei Borzenkov wrote:
On Wed, Feb 17, 2016 at 2:28 PM, Per Jessen
wrote: I'll have to check, but I don't think adding memory will do much. The filesystems cache is not used because there are no filesystems in use - the raw disk space is handed out over iSCSI, just as Greg described.
Caches are per-block device, so unless you are using devices for direct IO they should still benefit from RAM caches.
Each storage server has 4Gb, expandable to 32Gb afaict - but ietd does use direct IO. (type=blockio). I think I need to check up on fileio vs. blockio. Adding some more RAM is a lot easier than adding a bcache.
The purpose of O_DIRECT is to move the cache handling from kernel space to userspace. Does ietd not implement a cache in user space? If so, you should verify that the FUA, DPO and CACHE_FLUSH commands do / don't work, and that your filesystem is able to handle any false successes those commands trigger. XFS in particular has advice as to how to address features falsely reporting success. Note that if all those features are well implemented in ietd, then letting your filesystem leverage them will improve performance (at least theoretically). XFS has worked hard at performing well even if FUA in particular is not reliable, but it needs to be told. Greg
Greg Freemyer wrote:
On Wed, Feb 17, 2016 at 7:03 AM, Per Jessen
wrote: Andrei Borzenkov wrote:
On Wed, Feb 17, 2016 at 2:28 PM, Per Jessen
wrote: I'll have to check, but I don't think adding memory will do much. The filesystems cache is not used because there are no filesystems in use - the raw disk space is handed out over iSCSI, just as Greg described.
Caches are per-block device, so unless you are using devices for direct IO they should still benefit from RAM caches.
Each storage server has 4Gb, expandable to 32Gb afaict - but ietd does use direct IO. (type=blockio). I think I need to check up on fileio vs. blockio. Adding some more RAM is a lot easier than adding a bcache.
The purpose of O_DIRECT is to move the cache handling from kernel space to userspace. Does ietd not implement a cache in user space?
Almost certainly not - that would somewhat negate the idea of using direct IO. Still, I think it's an implementation left-over from years back. I have not found any reason for using blockio vs fileio (those are the two terms used by ietd), but I'll have to test it. /Per
17.02.2016 20:08, Per Jessen пишет:
The purpose of O_DIRECT is to move the cache handling from kernel space to userspace. Does ietd not implement a cache in user space?
Almost certainly not - that would somewhat negate the idea of using direct IO.
Not at all. Direct IO is used to avoid double buffering (which in some cases can result in a significant slowdown) and to give the application full control over when it flushes data to disk. It does not mean the application should refrain from caching.
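As a rough sketch of the double-buffering point, dd can show the difference between a page-cache read and an O_DIRECT read of the same file. The file name is arbitrary, and iflag=direct will fail on filesystems such as tmpfs that don't support O_DIRECT, so that case is handled:

```shell
# Write a 64MB test file, read it twice through the page cache, then once
# with O_DIRECT. The second cached read is served from RAM; the direct read
# bypasses the page cache and goes back to the device every time.
dd if=/dev/zero of=/tmp/dio_demo bs=1M count=64 conv=fsync 2>/dev/null
dd if=/tmp/dio_demo of=/dev/null bs=1M 2>&1 | tail -n1    # populate the cache
dd if=/tmp/dio_demo of=/dev/null bs=1M 2>&1 | tail -n1    # cached: fast
( dd if=/tmp/dio_demo of=/dev/null bs=1M iflag=direct \
    || echo "O_DIRECT not supported on this filesystem" ) 2>&1 | tail -n1
```

Comparing the reported throughput of the second and third reads makes the cost of bypassing the cache visible.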
Andrei Borzenkov wrote:
17.02.2016 20:08, Per Jessen пишет:
The purpose of O_DIRECT is to move the cache handling from kernel space to userspace. Does ietd not implement a cache in user space?
Almost certainly not - that would somewhat negate the idea of using direct IO.
Not at all. Direct IO is used to avoid double buffering (which in some cases can result in significant slow down) and give application full control over when it flushes data to disk. It does not mean application should refrain from caching.
Agree, but ietd is not the application, only the iscsi target server. ietd uses hardly any memory at all. /Per
On Wed, Feb 17, 2016 at 12:08 PM, Per Jessen
Greg Freemyer wrote:
On Wed, Feb 17, 2016 at 7:03 AM, Per Jessen
wrote: Andrei Borzenkov wrote:
On Wed, Feb 17, 2016 at 2:28 PM, Per Jessen
wrote: I'll have to check, but I don't think adding memory will do much. The filesystems cache is not used because there are no filesystems in use - the raw disk space is handed out over iSCSI, just as Greg described.
Caches are per-block device, so unless you are using devices for direct IO they should still benefit from RAM caches.
Each storage server has 4Gb, expandable to 32Gb afaict - but ietd does use direct IO. (type=blockio). I think I need to check up on fileio vs. blockio. Adding some more RAM is a lot easier than adding a bcache.
The purpose of O_DIRECT is to move the cache handling from kernel space to userspace. Does ietd not implement a cache in user space?
Almost certainly not - that would somewhat negate the idea of using direct IO. Still, I think it's an implementation left-over from years back. I have not found any reason for using blockio vs fileio (those are the two terms used by ietd), but I'll have to test it.
/Per
We may be talking about different things. The device /dev/sdx can be directly opened as a block device and it can perform blockio. By default I'm pretty sure the kernel would provide a cache for it. OTOH, if the O_DIRECT flag is passed in on the open() call, then the kernel is being instructed not to provide a cache. User space will typically handle caching itself if O_DIRECT is in use. Grep the source code for the O_DIRECT flag and see if it is used to open the block devices. That will let you know if the kernel is providing cache functionality. Greg
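That grep can be scripted; the source location here is an assumption (it defaults to the current directory, adjust SRC to wherever the ietd sources are unpacked):

```shell
# Look for O_DIRECT anywhere in a source tree and record the verdict
# in a file so it can be inspected afterwards.
SRC=${SRC:-.}
if grep -rqs "O_DIRECT" "$SRC"; then
    echo "O_DIRECT is used somewhere under $SRC" | tee /tmp/odirect_check
else
    echo "no O_DIRECT hits under $SRC" | tee /tmp/odirect_check
fi
```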
Greg Freemyer wrote:
On Wed, Feb 17, 2016 at 12:08 PM, Per Jessen
wrote: Greg Freemyer wrote:
On Wed, Feb 17, 2016 at 7:03 AM, Per Jessen
wrote: Andrei Borzenkov wrote:
On Wed, Feb 17, 2016 at 2:28 PM, Per Jessen
wrote: I'll have to check, but I don't think adding memory will do much. The filesystems cache is not used because there are no filesystems in use - the raw disk space is handed out over iSCSI, just as Greg described.
Caches are per-block device, so unless you are using devices for direct IO they should still benefit from RAM caches.
Each storage server has 4Gb, expandable to 32Gb afaict - but ietd does use direct IO. (type=blockio). I think I need to check up on fileio vs. blockio. Adding some more RAM is a lot easier than adding a bcache.
The purpose of O_DIRECT is to move the cache handling from kernel space to userspace. Does ietd not implement a cache in user space?
Almost certainly not - that would somewhat negate the idea of using direct IO. Still, I think it's an implementation left-over from years back. I have not found any reason for using blockio vs fileio (those are the two terms used by ietd), but I'll have to test it.
/Per
We may be talking about different things.
The device /dev/sdx can be directly opened as a block device and it can perform blockio. By default I'm pretty sure the kernel would provide a cache for it.
ietd uses "blockio" and "fileio" like this:

<!-- In fileio mode (default), it defines a mapping between a "Logical Unit Number" <lun> and a given device <device>, which can be any block device (including regular block devices like hdX and sdX and virtual block devices like LVM and Software RAID devices) or regular files.

In blockio mode, it defines a mapping between a "Logical Unit Number" <lun> and a given block device <device>. This mode will perform direct block i/o with the device, bypassing page-cache for all operations. This allows for efficient handling of non-aligned sector transfers (virtualized environments) and large block transfers (media servers). This mode works ideally with high-end storage HBAs and for applications that either do not need caching between application and disk or need the large block throughput. -->

Judging by that, blockio is the appropriate mode, but adding memory to the storage server will not help at all. /Per
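For reference, this is roughly how the two modes are selected in /etc/ietd.conf; the target name and paths below are made-up examples:

```
Target iqn.2016-02.ch.example:storage.lun0
    # blockio: direct block I/O with the device, bypassing the page cache
    Lun 0 Path=/dev/sdb,Type=blockio
    # fileio (the default) would go through the page cache instead:
    # Lun 0 Path=/srv/iscsi/lun0.img,Type=fileio
```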
Per Jessen wrote:
ietd uses "blockio" and "fileio" like this:
Judging by that, blockio is the appropriate mode, but adding memory to the storage server will not help at all.
LIO, the preferred iscsi target for openSUSE, also uses the terms "fileio" and "blockio". I guess they're native to the iSCSI spec. With LIO, creating a fileio backstore gives me this comment: "Note: block backstore preferred for best results." -- Per Jessen, Zürich (5.0°C) http://www.dns24.ch/ - your free DNS host, made in Switzerland.
On Thu 11 Feb 2016 08:23:17 AM CST, Per Jessen wrote:
I was studying bcache yesterday and just wondered if anyone had any real-life experiences to share?
Hi I wrote a blog entry about it https://forums.opensuse.org/entry.php/159-Setting-up-bcache-on-openSUSE-13-2 I've been using it for a few years now... -- Cheers Malcolm °¿° LFCS, SUSE Knowledge Partner (Linux Counter #276890) SUSE Linux Enterprise Desktop 12 SP1|GNOME 3.10.4|3.12.51-60.25-default up 6 days 19:50, 6 users, load average: 0.56, 0.27, 0.19 CPU AMD A4-5150M @ 2.70GHz | GPU Radeon HD 8350G
Malcolm wrote:
On Thu 11 Feb 2016 08:23:17 AM CST, Per Jessen wrote:
I was studying bcache yesterday and just wondered if anyone had any real-life experiences to share?
Hi I wrote a blog entry about it https://forums.opensuse.org/entry.php/159-Setting-up-bcache-on-openSUSE-13-2
I had to log in to get access, is that intentional? -- Per Jessen, Zürich (4.6°C) http://www.dns24.ch/ - free dynamic DNS, made in Switzerland.
On Thu 11 Feb 2016 01:32:59 PM CST, Per Jessen wrote:
Malcolm wrote:
On Thu 11 Feb 2016 08:23:17 AM CST, Per Jessen wrote:
I was studying bcache yesterday and just wondered if anyone had any real-life experiences to share?
Hi I wrote a blog entry about it https://forums.opensuse.org/entry.php/159-Setting-up-bcache-on-openSUSE-13-2
I had to log in to get access, is that intentional?
Hi It would appear to be the case. -- Cheers Malcolm °¿° LFCS, SUSE Knowledge Partner (Linux Counter #276890) SUSE Linux Enterprise Desktop 12 SP1|GNOME 3.10.4|3.12.51-60.25-default up 6 days 21:11, 6 users, load average: 0.72, 0.47, 0.43 CPU AMD A4-5150M @ 2.70GHz | GPU Radeon HD 8350G
Per Jessen wrote:
Malcolm wrote:
On Thu 11 Feb 2016 08:23:17 AM CST, Per Jessen wrote:
I was studying bcache yesterday and just wondered if anyone had any real-life experiences to share?
Hi I wrote a blog entry about it https://forums.opensuse.org/entry.php/159-Setting-up-bcache-on-openSUSE-13-2
I had to log in to get access, is that intentional?
On random read, it slows down quite a bit, I guess that is because it's actually reading from disk? I'm surprised to see how slow "Mixed 70/30 random read and write with 8K block size" is. Random read is reading from disk, but why is the random write so slow? -- Per Jessen, Zürich (5.2°C) http://www.dns24.ch/ - free dynamic DNS, made in Switzerland.
On Thu 11 Feb 2016 03:35:16 PM CST, Per Jessen wrote:
Per Jessen wrote:
Malcolm wrote:
On Thu 11 Feb 2016 08:23:17 AM CST, Per Jessen wrote:
I was studying bcache yesterday and just wondered if anyone had any real-life experiences to share?
Hi I wrote a blog entry about it https://forums.opensuse.org/entry.php/159-Setting-up-bcache-on-openSUSE-13-2
I had to log in to get access, is that intentional?
On random read, it slows down quite a bit, I guess that is because it's actually reading from disk? I'm surprised to see how slow "Mixed 70/30 random read and write with 8K block size" is. Random read is reading from disk, but why is the random write so slow?
Hi Possibly because it's set at writethrough? Test and tweak; things have also improved with later kernels. I've found it robust with bcache/xfs on RAID mirrors. Maybe a read here for tweaks: https://www.kernel.org/doc/Documentation/bcache.txt -- Cheers Malcolm °¿° LFCS, SUSE Knowledge Partner (Linux Counter #276890) SUSE Linux Enterprise Desktop 12 SP1|GNOME 3.10.4|3.12.51-60.25-default up 6 days 22:44, 6 users, load average: 0.38, 0.33, 0.38 CPU AMD A4-5150M @ 2.70GHz | GPU Radeon HD 8350G
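For anyone wanting to try that particular tweak, the cache mode is exposed in sysfs as described in Documentation/bcache.txt. A guarded sketch (bcache0 is an example device name, and writing the mode needs root):

```shell
# Show and (if present) change the bcache cache mode. Falls back to a
# message on machines without a bcache device.
SYSFS=/sys/block/bcache0/bcache/cache_mode
{
    if [ -w "$SYSFS" ]; then
        cat "$SYSFS"                  # e.g. "[writethrough] writeback writearound none"
        echo writeback > "$SYSFS"     # faster writes, at some risk if the SSD dies
        echo "cache_mode set to writeback"
    else
        echo "no writable $SYSFS on this machine"
    fi
} | tee /tmp/bcache_mode_check
```

Writeback mode caches writes on the SSD before they reach the backing disk, which is where the random-write numbers should improve.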
Malcolm wrote:
On Thu 11 Feb 2016 03:35:16 PM CST, Per Jessen wrote:
Per Jessen wrote:
Malcolm wrote:
On Thu 11 Feb 2016 08:23:17 AM CST, Per Jessen wrote:
I was studying bcache yesterday and just wondered if anyone had any real-life experiences to share?
Hi I wrote a blog entry about it https://forums.opensuse.org/entry.php/159-Setting-up-bcache-on-openSUSE-13-2
I had to log in to get access, is that intentional?
On random read, it slows down quite a bit, I guess that is because it's actually reading from disk? I'm surprised to see how slow "Mixed 70/30 random read and write with 8K block size" is. Random read is reading from disk, but why is the random write so slow?
Hi Possibly because it's set at writethrough?
Ah, okay.
Test and tweak, things have also improved with later kernels, I've found it robust with bcache/xfs on RAID mirrors.
Maybe a read here for tweaks; https://www.kernel.org/doc/Documentation/bcache.txt
Thanks - I think I might acquire one of those 128GB SSDs on PCIe, they're not too expensive to experiment with. /Per
On Don, 2016-02-11 at 15:35 +0100, Per Jessen wrote:
On random read, it slows down quite a bit, I guess that is because it's actually reading from disk? I'm surprised to see how slow "Mixed 70/30 random read and write with 8K block size" is. Random read is reading from disk, but why is the random write so slow?
I think the fio.bash test script is a bit misleading in this case. The interesting property to test is caching, meaning we expect repeated access of the same files to show increasing performance because the cache hit rate increases. The following are repeated runs of the 70/30 random rw tests from fio.bash on my bcache setup. The test files are created (but not cached on the SSD) during the first run, and reused for subsequent runs (which gradually causes more and more of the files to be cached):

# rm /home/8k7030test.*
# fio --size=128M --direct=1 --rw=randrw --refill_buffers --norandommap --ioengine=libaio --bs=8k --rwmixread=70 --iodepth=16 --numjobs=16 --runtime=10 --group_reporting --name=8k7030test --directory=/home | grep iops
read : io=64208KB, bw=6052.3KB/s, iops=756, runt= 10609msec
write: io=26704KB, bw=2517.2KB/s, iops=314, runt= 10609msec
# !fio
read : io=102888KB, bw=9474.1KB/s, iops=1184, runt= 10859msec
write: io=41888KB, bw=3857.5KB/s, iops=482, runt= 10859msec
# !fio
read : io=142144KB, bw=13066KB/s, iops=1633, runt= 10879msec
write: io=57912KB, bw=5323.3KB/s, iops=665, runt= 10879msec
# !fio
read : io=139864KB, bw=13272KB/s, iops=1659, runt= 10538msec
write: io=56968KB, bw=5405.1KB/s, iops=675, runt= 10538msec
# !fio
read : io=147000KB, bw=13905KB/s, iops=1738, runt= 10572msec
write: io=59616KB, bw=5639.5KB/s, iops=704, runt= 10572msec
# !fio
read : io=177832KB, bw=16286KB/s, iops=2035, runt= 10919msec
write: io=71288KB, bw=6528.9KB/s, iops=816, runt= 10919msec

It is also interesting to inspect the entire fio output. Among other things, it shows the performance of all physical devices involved, as well as the load distribution on each one of them. Regards, Olav
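The warm-up effect above can be reproduced with a small loop. This sketch uses /var/tmp instead of /home, and skips cleanly when fio isn't installed:

```shell
# Re-run the same 70/30 random-rw job three times and collect the iops
# lines; on a bcache setup, repeated runs should show iops climbing as
# the cache hit rate increases.
: > /tmp/fio_iops
if command -v fio >/dev/null 2>&1; then
    for run in 1 2 3; do
        echo "run $run:" >> /tmp/fio_iops
        fio --size=128M --direct=1 --rw=randrw --refill_buffers --norandommap \
            --ioengine=libaio --bs=8k --rwmixread=70 --iodepth=16 --numjobs=16 \
            --runtime=10 --group_reporting --name=8k7030test --directory=/var/tmp \
            2>/dev/null | grep iops >> /tmp/fio_iops || true
    done
else
    echo "fio not installed" >> /tmp/fio_iops
fi
cat /tmp/fio_iops
```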
On 02/14/2016 09:39 AM, Olav Reinert wrote:
The interesting property to test is caching, meaning we expect repeated access of the same files to show increasing performance because the cache hit rate increases.
Yes, but is that reasonable? In the real world, I mean?

Suppose we have a web site and the server is taking $LARGENUM hits per second. All manner of files will keep being accessed as well as the ones relating to the pages being hit (it gets more complicated if this is a dynamic site and pages come from databases): the .htaccess, the cgi code (perl, python, java, ruby, whatever), the static graphics files for the eye-candy. Over and over and over, the same files. It strikes me that this is a better test situation than a rigged test. It's the kind of thing which, if the caching mechanism works, is likely to sell because it has demonstrable value.

But it also has to compete with the system's own ability to cache open files via the memory mapping and the page caching algorithms, and gets back to the issue of whether this is better served by more RAM. Somewhere, the capacity of the motherboard to support more RAM will run out. That's where running a test like this becomes valid. -- A: Yes. > Q: Are you sure? >> A: Because it reverses the logical flow of conversation. >>> Q: Why is top posting frowned upon?
Per Jessen wrote:
I was studying bcache yesterday and just wondered if anyone had any real-life experiences to share?
Because I've been out googling this some and also looking for suitable SSD PCIe cards, I get lots of ads presented with such cards, including this HP card: HP IO Accelerator 3TB MLC PCI-E https://www.pcp.ch/de/Hewlett-Packard-HP-3TB-IO-Accelerator-MLC-PCIe-1a17932... A mere 45'000 francs .... ships in 3-5 days. -- Per Jessen, Zürich (11.4°C) http://www.hostsuisse.com/ - virtual servers, made in Switzerland.
On Sun 21 Feb 2016 11:09:43 AM CST, Per Jessen wrote:
Per Jessen wrote:
I was studying bcache yesterday and just wondered if anyone had any real-life experiences to share?
Because I've been out googling this some and also looking for suitable SSD PCIe cards, I get lots of ads presented with such cards, including this HP card:
HP IO Accelerator 3TB MLC PCI-E <snip link> A mere 45'000 francs .... ships in 3-5 days.
Hi Sounds like you need a couple of them ;) Just a note, you can run multiple disks through one bcache (bcache device), be interesting to try different formats across multiple disks... -- Cheers Malcolm °¿° LFCS, SUSE Knowledge Partner (Linux Counter #276890) SUSE Linux Enterprise Desktop 12 SP1|GNOME 3.10.4|3.12.51-60.25-default up 2 days 14:59, 4 users, load average: 0.30, 0.37, 0.27 CPU AMD A4-5150M @ 2.70GHz | GPU Radeon HD 8350G
participants (15)
- Andrei Borzenkov
- Anton Aylward
- Carlos E. R.
- Darryl Gregorash
- David C. Rankin
- Greg Freemyer
- James Knott
- jdd
- Jeffrey L. Taylor
- John Andersen
- Lew Wolfgang
- Linda Walsh
- Malcolm
- Olav Reinert
- Per Jessen