On Thu, May 2, 2013 at 2:03 AM, Yamaban <foerster@lisas.de> wrote:
On Thu, 2 May 2013 06:26, Jan Engelhardt <jengelh@...> wrote:
On Wednesday 2013-05-01 20:52, Claudio Freire wrote:
[snip]
particular order. If there's a crash or a power outage, you don't know which pages have been written and which haven't. Postgres and most of the reasonably reliable databases you mention have a WAL to make sure pages are written in the right order and no torn pages appear. The WAL is a file opened in sync I/O mode. It's slow, but safe. [...] With memory-mapped I/O, you have no way to force a specific write order.
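To make the WAL point concrete, here is a minimal sketch of an append-only log opened in sync I/O mode. It is not how Postgres implements its WAL; the record layout (length plus CRC32 header) and the names `open_wal`, `wal_append` and `wal_replay` are assumptions made up for illustration.

```python
import os
import struct
import zlib

HEADER = struct.Struct("<II")  # payload length, CRC32 of payload (hypothetical layout)

def open_wal(path):
    # O_SYNC asks the kernel to make each write() return only after the
    # data has been handed to storage; fall back to 0 where it is missing.
    flags = os.O_WRONLY | os.O_CREAT | os.O_APPEND | getattr(os, "O_SYNC", 0)
    return os.open(path, flags, 0o644)

def wal_append(fd, payload):
    record = memoryview(HEADER.pack(len(payload), zlib.crc32(payload)) + payload)
    while len(record) > 0:          # os.write may write fewer bytes than asked
        written = os.write(fd, record)
        record = record[written:]

def wal_replay(path):
    # Return every complete record; stop at the first torn or corrupt one.
    with open(path, "rb") as f:
        data = f.read()
    records, off = [], 0
    while off + HEADER.size <= len(data):
        length, crc = HEADER.unpack_from(data, off)
        body = data[off + HEADER.size : off + HEADER.size + length]
        if len(body) < length or zlib.crc32(body) != crc:
            break
        records.append(body)
        off += HEADER.size + length
    return records
```

Because records are only ever appended, a crash can damage at most the tail of the file, and replay simply discards it.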
You can have order with mmap. In fact, OpenLDAP developer Howard Chu presented the inner workings of a new MDB backend last year at LinuxCon Europe. mdb is all about mmap and uses COW with 2 snapshots {current,previous} at any time,[1,2] ensuring reliability whilst keeping speed.
[1] www.openldap.org/pub/hyc/mdm-paper.pdf [2] www.openldap.org/pub/hyc/mdm-slides.pdf
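As a rough illustration of the two-snapshot idea, the sketch below keeps two meta slots in an mmap'd file and only flips to a new snapshot after its data has been flushed. This is not MDB's actual on-disk format: the page layout, the CRC-protected meta record and the names `create`, `commit` and `read` are all assumptions for the example.

```python
import mmap
import struct
import zlib

PAGE = mmap.PAGESIZE
META = struct.Struct("<QQI")   # txn id, data page number, CRC32 of both (hypothetical)

def create(path):
    # Pages 0 and 1 hold the meta slots, pages 2 and 3 hold the data.
    with open(path, "wb") as f:
        f.truncate(PAGE * 4)

def _meta(mm, slot):
    txn, page, crc = META.unpack_from(mm, slot * PAGE)
    valid = txn > 0 and zlib.crc32(struct.pack("<QQ", txn, page)) == crc
    return (txn, page) if valid else None

def current(mm):
    # The newest slot with a valid checksum wins; (0, 0) means "empty".
    metas = [m for m in (_meta(mm, 0), _meta(mm, 1)) if m]
    return max(metas) if metas else (0, 0)

def commit(mm, payload):
    assert len(payload) <= PAGE - 4
    txn = current(mm)[0] + 1
    page = 2 + txn % 2             # never the page the current snapshot uses
    off = page * PAGE
    mm[off : off + 4] = struct.pack("<I", len(payload))
    mm[off + 4 : off + 4 + len(payload)] = payload
    mm.flush()                     # msync: data first ...
    crc = zlib.crc32(struct.pack("<QQ", txn, page))
    META.pack_into(mm, (txn % 2) * PAGE, txn, page, crc)
    mm.flush()                     # ... then the meta slot that publishes it

def read(mm):
    txn, page = current(mm)
    if txn == 0:
        return None
    off = page * PAGE
    (length,) = struct.unpack_from("<I", mm, off)
    return bytes(mm[off + 4 : off + 4 + length])
```

If the newest meta slot is torn by a crash, its checksum fails and readers fall back to the previous snapshot. The ordering relies entirely on the flush between the data write and the meta write actually being honoured by the layers below.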
Thanks for the links, but keep one thing in mind:
In LDAP there are far more reads than writes.
Syslogs are mostly writes and few reads. That, coupled with the need for reliable crash recovery, makes mmap and the like much less suited for use in logging, especially syslog.
Not only that. msync only flushes to the file system, but there is no guarantee the data will reach the disk. I.e., it flushes dirty pages but doesn't execute a write barrier. If you map an O_DIRECT file descriptor and call msync on it, the performance is abysmal under heavy writes. Or it was the last time I tried. So LDAP's technique doesn't scale for write-heavy workloads. I will certainly read the papers though...
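For reference, the usual belt-and-braces pattern is to follow msync with an fsync on the underlying descriptor. The sketch below shows only the call sequence; `durable_store` is a made-up helper name, and whether the data then really reaches stable storage still depends on the kernel, the filesystem, its mount options and the drive's write cache.

```python
import mmap
import os

def durable_store(path, offset, data):
    # Write `data` into an existing file through a shared mapping,
    # then ask for it to be pushed out as far as the OS will promise.
    fd = os.open(path, os.O_RDWR)
    try:
        with mmap.mmap(fd, 0) as mm:
            mm[offset : offset + len(data)] = data
            mm.flush()       # msync() on the mapped range (Unix)
            os.fsync(fd)     # additionally fsync the file descriptor
    finally:
        os.close(fd)
```

Doing this for every log line costs a full sync per record, which is where the write-heavy case gets slow.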