On 12/04/2021 12.52, Simon Lees wrote:
On 4/12/21 5:20 PM, Carlos E. R. wrote:
On 12/04/2021 03.47, David Haller wrote:
Hello,
...
The next (philosophical) question is why are those functions uninterruptible?
It could write an item of the cache, then another item, then the next... I see no "philosophical" reason why that can not be aborted. There must be a reason out there that I don't know.
Now, don't go investigating the code for me, it is just a curiosity :-)
Without investigating the code, I suspect it comes down to how you define an item and whether the sync command actually knows about said items. The sync command basically says: look at everything in cache that has yet to be written to disk and write it to disk, because I want to have the disk back in a consistent state (because I may want to eject a USB stick or power off a machine).
In all likelihood, all the sync command knows is which blocks of cache haven't been written to disk and where those blocks should be written on disk. At that level it should have no concept of the data that's actually in a block of cache, so it can't say "I know that if I write the following 4 blocks to disk, myfile.txt will be fully written and it might be safe to stop at this point". To further complicate this, say that when you saved myfile.txt it put the data across 8 blocks of cache; something may have decided, even before you called sync, that there was some spare time and had already written 3 of those 8 blocks to disk. So really the only way the OS can be sure that data isn't corrupted by the sync process is to allow it to run to completion (just as the only way you can be sure that data on your USB stick won't be corrupted when you pull it out is to have something call sync, which is what pressing the eject button does).
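To illustrate the point that the kernel tracks unwritten data as anonymous blocks rather than as files: on Linux, /proc/meminfo reports only aggregate "Dirty" and "Writeback" totals in kB. The following is a minimal sketch (the function names are made up for this example) that reads those two counters. It shows how much data a sync would still have to write, but says nothing about which files that data belongs to.

```python
def parse_dirty_kb(meminfo_text):
    """Return {'Dirty': kB, 'Writeback': kB} from /proc/meminfo-style text."""
    wanted = {"Dirty", "Writeback"}
    result = {}
    for line in meminfo_text.splitlines():
        name, sep, rest = line.partition(":")
        if sep and name in wanted:
            # Lines look like "Dirty:               124 kB"
            result[name] = int(rest.split()[0])
    return result


def read_dirty_kb(path="/proc/meminfo"):
    """Read the current counters from the running kernel (Linux only)."""
    with open(path) as f:
        return parse_dirty_kb(f.read())
```

Calling read_dirty_kb() before and after a sync should show the "Dirty" figure dropping, although other processes may dirty new pages in the meantime.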
Understood, that's good enough for me, thanks :-)
If you are curious, I have been investigating a problem I have with hibernation: sometimes it does not succeed, it stalls. When this happens, I cannot power off the machine cleanly; in the end I have to switch the power off.
What I have found is that, those times, the machine is trying to sync the filesystem and failing, precisely because texpire is running. Probably leaving the machine there for half an hour would succeed, but obviously sometimes one is in a hurry to hibernate or power off (battery running out, say), and I had no idea that waiting for half an hour might work.
I have found out more. In my case, there is a dedicated partition for /var/spool/news/ (formatted as reiserfs), and this partition goes 100% busy during texpire. If I run "sync" fifteen minutes after texpire finishes, it takes a minute to complete. A sync at any other point in the day takes about half a minute.
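Since the news spool lives on its own partition, the timing above could in principle be measured for that filesystem alone instead of for the whole system. Linux provides syncfs(2) for this, but Python's standard library has no direct wrapper for it as far as I know, so the sketch below calls it through ctypes. The helper names are invented for this example, and it assumes a Linux libc that exports syncfs.

```python
import ctypes
import os
import time


def sync_filesystem(path):
    """Flush only the filesystem containing `path`, using syncfs(2)."""
    # Assumption: the process's libc exports syncfs (glibc 2.14 or later).
    libc = ctypes.CDLL(None, use_errno=True)
    fd = os.open(path, os.O_RDONLY)
    try:
        if libc.syncfs(fd) != 0:
            err = ctypes.get_errno()
            raise OSError(err, os.strerror(err))
    finally:
        os.close(fd)


def timed(func, *args):
    """Call func(*args) and return the elapsed wall-clock seconds."""
    start = time.monotonic()
    func(*args)
    return time.monotonic() - start
```

For example, timed(sync_filesystem, "/var/spool/news") would return how long the news partition alone takes to flush, and timed(os.sync) would give the figure for a full sync to compare against.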
The partition was mounted "relatime,lazytime". If I take out the "lazytime" parameter, the sync completes in a second (except if texpire is running).
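To confirm which options are actually in effect after a remount, one can look at the options column of /proc/mounts. This is a small sketch with a hypothetical helper name; it takes the file contents as text so it can be tried on a sample string first.

```python
def mount_options(mounts_text, mount_point):
    """Return the option list for mount_point from /proc/mounts-style text.

    Returns None if the mount point is not listed.
    """
    options = None
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields: device, mount point, fs type, options, dump, pass
        if len(fields) >= 4 and fields[1] == mount_point:
            # Keep the last match: later entries are mounted on top.
            options = fields[3].split(",")
    return options


# Usage on a live system (Linux only):
#   with open("/proc/mounts") as f:
#       opts = mount_options(f.read(), "/var/spool/news")
#   print("lazytime" in opts)
```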
My conclusion is that lazytime is broken in the case of reiserfs (or in the case of news). The writing to disk is not delayed to an appropriate time, it is delayed for ever.
I guess it might depend on what you call an appropriate time; it's quite possible that while texpire is running and causing significant disk usage, it's deciding the appropriate time is after it has finished. You have to remember this is mostly a power-saving feature for laptops, so that rather than spending lots of energy constantly spinning up a rotating hard drive to write small amounts of data, it waits until you're done (or the cache is full) and spins up the drive once to write out all the data. By mounting with lazytime you're saying "I don't care when in the future this happens"; on the other hand, sometimes when you hibernate your laptop you do care.
Well, I read the documentation again and the current version says it can delay as much as 24 hours. I think that's way too much; at least it should be tunable. When the machine is idling, waiting an hour is more than enough. (If it is possible to sync only one partition, then a cron job would do.)
When texpire is running the problem is different: the machine cannot be suspended or halted. It can take half an hour to complete, and if during that time it is a laptop on battery, or a server on UPS during a power failure, the system should treat it as an emergency and handle the situation somehow. If it is a manual command I know I have to kill texpire, but if not, the battery will die sooner.
--
Cheers / Saludos,
Carlos E. R.
(from 15.2 x86_64 at Telcontar)