On 2010-08-15 03:57, Linda Walsh wrote:
Unfortunately, my machines have been run through the wringer, going for prolonged periods of time with problems, including unclean shutdowns (too many things on my plate to address issues that only happened at shutdown, with uptimes often measured in months).
The only non-user-error dataloss I've ever experienced has been attributable to 2 areas, neither was XFS.
The data loss problem that Greg refers to is quite specific, and its cause is known. It affects any modern filesystem that has a journal and keeps optimized structures in memory for speed. When the system writes to disk, the data may not actually be written at that moment; the real write can be delayed.

There is a classic example of how a file being edited is saved. If I remember it right:

 1. the in-memory version of the file is saved as file.new;
 2. the current file.bak is deleted;
 3. the current file.txt is renamed to file.bak;
 4. file.new is renamed to file.txt.

The idea is that you are not in danger of losing the file if the computer crashes during the operation. You could even rename the old file.bak to "file.bak;1" for extra security. However, there is no actual guarantee that the disk operations are really committed in that order, or at all. In fact, you may end up with a partially written file.new, no file.bak, and a deleted file.txt.

What programmers want (again, if I recall correctly) is a way to ensure that one operation happens only after the previous one has been committed to disk, and only when they explicitly ask for it. What the kernel chaps propose is that they use the existing "flush"; it seems the other type of operation is not contemplated in the standards. The big snag is that "flush" writes everything to disk, not only the operation we are interested in, which negates the efficiency of the memory buffers and of all the structures that speed up the filesystem. That is, all new disk operations stall until all pending operations are really written to disk, slowing performance notably.

There is a nice, extensive write-up where this is explained very clearly, posted here a year or so ago, but I don't have the link handy. Maybe someone remembers it.

And XFS is affected by this more than others, as it relies on in-memory structures more than the others do. A power-off at a very bad moment can cause data loss.
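Here is a minimal sketch of the file.new / file.bak / file.txt save sequence described above. It assumes the "flush" in question is the POSIX fsync() call, used here through Python's os module. The helper names save_with_backup and fsync_dir are hypothetical. Syncing a directory so that renames and deletions persist is a Linux/POSIX convention and will not work on Windows. Each step is forced to disk before the next one starts, which gives the ordering programmers want, but every fsync() makes the program wait for the disk.

```python
import os


def fsync_dir(dirpath):
    """Force a directory's entries (renames, deletions) to disk. POSIX/Linux only."""
    fd = os.open(dirpath, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def save_with_backup(path, data):
    """Hypothetical helper: save `data` (bytes) to `path` (e.g. file.txt),
    keeping the previous version as file.bak, with each step synced in order."""
    root, _ = os.path.splitext(path)
    new = root + ".new"
    bak = root + ".bak"
    dirpath = os.path.dirname(os.path.abspath(path))

    # Step 1: write the in-memory version to file.new and force its data out.
    with open(new, "wb") as f:
        f.write(data)
        f.flush()             # Python's buffer -> kernel page cache
        os.fsync(f.fileno())  # kernel page cache -> disk

    # Step 2: delete the current file.bak, if there is one.
    if os.path.exists(bak):
        os.remove(bak)
        fsync_dir(dirpath)

    # Step 3: rename the current file.txt to file.bak.
    if os.path.exists(path):
        os.rename(path, bak)
        fsync_dir(dirpath)

    # Step 4: rename file.new to file.txt.
    os.rename(new, path)
    fsync_dir(dirpath)
```

Calling save_with_backup("file.txt", b"...") twice leaves the second contents in file.txt and the first in file.bak. Each fsync() call stalls until the disk confirms the write. How much other pending data gets forced out along with it depends on the filesystem and its mount options.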
There may be other scenarios than the one I described above, but this is the one they referred to, IIRC.

-- 
Cheers / Saludos,
        Carlos E. R.
  (from 11.2 x86_64 "Emerald" GM (Elessar))