
On 4/12/21 5:20 PM, Carlos E. R. wrote:
On 12/04/2021 03.47, David Haller wrote:
Hello,
On Mon, 12 Apr 2021, Carlos E. R. wrote:
It is just a fact, it can not.
Situation.
The system is running the process "texpire", which is part of the leafnode package, an NNTP proxy server. texpire simply looks at every post and deletes those that are old according to certain rules. This is very intensive on the disk, as it looks at 1.2 million files. It runs for half an hour.
If during that process I issue a "sync", it does not succeed till texpire ends. It can not be killed.
You have to kill texpire, as that's continuously creating new stuff to be synced.
But I don't want to. In this case, I want sync to abandon the attempt because I made a mistake calling "sync". And I wonder why it can not.
And you can consider 'sync' to be half-dead already (just not yet done reporting back (or not usually, other than exiting with status 0)), as it's basically just issuing a 'sync', 'syncfs', 'fsync' or 'fdatasync' syscall (see the manpages in section 2) and reporting back. The source for sync is rather straightforward and a measly 5537 bytes in coreutils version 8.32.
==== comments by me after //DNH: ====
[..]
int
main (int argc, char **argv)
{
[..]
  if (! args_specified || (arg_file_system && ! HAVE_SYNCFS))
    mode = MODE_SYNC;
  else if (arg_file_system)
    mode = MODE_FILE_SYSTEM;    //DNH: will call syncfs(2) if available
  else if (! arg_data)
    mode = MODE_FILE;           //DNH: will call fsync(2)
  else
    mode = MODE_DATA;           //DNH: will call fdatasync(2)

  if (mode == MODE_SYNC)
    sync ();                    //DNH: well, duh ;)
[..]
  return ok ? EXIT_SUCCESS : EXIT_FAILURE;
}
====
So, the default is to just call 'sync(2)', unless you call 'sync(1)' with options or arguments, in which case it branches to a function and calls fsync(2) or fdatasync(2) or syncfs(2) depending on 'mode'. And:
==== man 2 sync ====
sync() causes all pending modifications to filesystem metadata and cached file data to be written to the underlying filesystems.
====
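For readers following along, the dispatch described above can be sketched roughly as below. This is a simplified illustration and not the coreutils code: the helper name `sync_path` and the local `enum sync_mode` are made up for this example, and the real tool handles several paths, open() fallbacks and error messages that are left out here. It assumes Linux with glibc (`syncfs` needs `_GNU_SOURCE`).

```c
#define _GNU_SOURCE             /* needed for syncfs(2) on glibc */
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical names for this sketch, modelled on the modes quoted above. */
enum sync_mode { MODE_SYNC, MODE_FILE_SYSTEM, MODE_FILE, MODE_DATA };

/* Flush according to `mode`. `path` is ignored for MODE_SYNC.
   Returns true on success, false if open() or the sync call failed. */
static bool sync_path(enum sync_mode mode, const char *path)
{
    if (mode == MODE_SYNC) {
        sync();                 /* flushes everything; returns void */
        return true;
    }

    int fd = open(path, O_RDONLY | O_NONBLOCK);
    if (fd < 0)
        return false;

    int rc;
    switch (mode) {
    case MODE_FILE_SYSTEM:
        rc = syncfs(fd);        /* only the file system containing path */
        break;
    case MODE_DATA:
        rc = fdatasync(fd);     /* file data, minimal metadata */
        break;
    default:
        rc = fsync(fd);         /* file data and metadata */
        break;
    }

    close(fd);
    return rc == 0;
}
```

A caller would use something like `sync_path(MODE_FILE_SYSTEM, "/var/spool/news")` to flush only that file system. Whichever branch is taken, the process blocks inside the kernel until the call returns.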
Anyway: all those functions (sync, fsync, fdatasync, syncfs) are syscalls and, as they're writing to storage, in uninterruptible sleep-state ('D' in ps/top etc.), and thus not "killable" from userland.
Ah. This is what I feared.
And if sync(1) is "in" the call to one of these functions, you can not kill it either until that function returns control to the sync(1) process.
So, your only option is to kill anything that still causes more stuff to be synced and wait. Or shut the machine off hard[1] via sysrq+b, sysrq+o, the reset- or powerbutton ...
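The 'D' state mentioned above is what ps and top read from /proc. As a rough sketch (assuming Linux with /proc mounted; the function name `proc_state` is made up for this example), the state letter of any process can be read like this:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Return the state letter ('R', 'S', 'D', ...) of process `pid` as found
   in /proc/<pid>/stat, or '?' if it cannot be read or parsed. */
static char proc_state(pid_t pid)
{
    char path[64];
    char buf[512];

    snprintf(path, sizeof path, "/proc/%ld/stat", (long)pid);
    FILE *f = fopen(path, "r");
    if (!f)
        return '?';

    size_t n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';

    /* Format is "pid (comm) state ...". The command name may itself
       contain ')' characters, so search for the last one. */
    char *p = strrchr(buf, ')');
    if (!p || p[1] != ' ' || p[2] == '\0')
        return '?';
    return p[2];
}
```

Calling `proc_state()` with the PID of a blocked sync(1) would be expected to return 'D' while it waits inside the syscall.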
HTH, -dnh
[1] as the kernel is already syncing ...
The next (philosophical) question is why are those functions uninterruptible?
It could write an item of the cache, then another item, then the next... I see no "philosophical" reason why that can not be aborted. There must be a reason out there that I don't know.
Now, don't go investigating the code for me, it is just a curiosity :-)
Without investigating the code, I suspect it comes down to how you define an item and whether the sync command actually knows about said items. The sync command basically says: look at everything in cache that has yet to be written to disk and write it to disk, because I want to have the disk back in a consistent state (because I may want to eject a USB stick or power off a machine). In all likelihood, all the sync command knows is which blocks of cache haven't been written to disk and where those blocks should be written on disk. At that level it should have no concept of the data that's actually in a block of cache, so it can't say "I know that if I write the following 4 blocks to disk, myfile.txt will be fully written and it might be safe to stop at this point". To further complicate this, say that when you saved myfile.txt it put the data across 8 blocks of cache; something may have decided, even before you called sync, that there was some spare time and had already written 3 of those 8 blocks to disk. So really the only way the OS can be sure that data isn't corrupted by the sync process is to allow it to run to completion (just as the only way you can be sure that data on your USB stick won't be corrupted when you pull it out is to have something call sync, which is what pressing the eject button does).
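How much data is sitting in cache waiting to be written can be watched from userland. A minimal sketch, assuming Linux, where /proc/meminfo exposes "Dirty" and "Writeback" counters in kB (the helper name `meminfo_kb` is made up for this example):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Return the value (in kB) of `field` from a meminfo-style file such as
   /proc/meminfo, or -1 if the file or the field is not found. */
static long meminfo_kb(const char *path, const char *field)
{
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;

    char line[256];
    long value = -1;
    size_t len = strlen(field);

    while (fgets(line, sizeof line, f)) {
        /* Match "Field:" exactly, so "Writeback" does not match
           "WritebackTmp". */
        if (strncmp(line, field, len) == 0 && line[len] == ':') {
            if (sscanf(line + len + 1, "%ld", &value) != 1)
                value = -1;
            break;
        }
    }
    fclose(f);
    return value;
}
```

Polling `meminfo_kb("/proc/meminfo", "Dirty")` while a sync is in progress gives a rough idea of how much is still queued. These are system-wide totals and say nothing about which file a given block belongs to, which is the point made above.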
If you are curious, I have been investigating a problem I have with hibernation: sometimes it does not succeed, it stalls. When this happens, I can not poweroff the machine, in the end I have to switch the power off.
What I have found is that the machine is, at those times, trying to sync the filesystem and failing, precisely because texpire is running. Probably leaving the machine there for half an hour would succeed - but obviously, sometimes one is in a hurry to hibernate or poweroff (battery running out, say), and I had no idea that waiting for half an hour might work.
I have found out more. In my case, there is a partition dedicated to /var/spool/news/ (formatted as reiserfs), and this partition goes 100% busy during texpire. If I run "sync" fifteen minutes after texpire finishes, it takes a minute to complete. A sync at any other point in the day takes about half a minute.
The partition was mounted "relatime,lazytime". If I take out the "lazytime" parameter, the sync completes in a second (except if texpire is running).
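To check which options a file system is really mounted with, the mount table can be queried programmatically. A sketch using glibc's <mntent.h> interface, assuming Linux, where the live table is available as /proc/self/mounts (the function name `mount_has_option` is made up for this example):

```c
#include <assert.h>
#include <mntent.h>
#include <stdio.h>
#include <string.h>

/* Look up `mount_point` in the mount table file `table` and report whether
   it carries `option`. Returns 1 if present, 0 if absent, and -1 if the
   table cannot be read or the mount point is not listed. */
static int mount_has_option(const char *table, const char *mount_point,
                            const char *option)
{
    FILE *f = setmntent(table, "r");
    if (!f)
        return -1;

    int result = -1;
    struct mntent *m;
    while ((m = getmntent(f)) != NULL) {
        /* Keep scanning: a later entry for the same directory is the
           mount that is currently on top. */
        if (strcmp(m->mnt_dir, mount_point) == 0)
            result = hasmntopt(m, option) != NULL;
    }
    endmntent(f);
    return result;
}
```

For the setup described here, `mount_has_option("/proc/self/mounts", "/var/spool/news", "lazytime")` would be expected to return 1 before the option was removed and 0 afterwards.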
My conclusion is that lazytime is broken in the case of reiserfs (or in the case of news). The writing to disk is not delayed to an appropriate time, it is delayed for ever.
I guess it might depend on what you call an appropriate time; it's quite possible that while texpire is running and causing significant disk usage, it's deciding the appropriate time is after texpire has finished. You have to remember this is mostly a power-saving feature for laptops, so that rather than spending lots of energy constantly spinning up a rotating hard drive to write small amounts of data, it waits until you're done (or the cache is full) and spins up the drive once to write out all the data. By mounting with lazytime you're saying "I don't care when in the future this happens"; on the other hand, sometimes when you hibernate your laptop you do care.
-- 
Simon Lees (Simotek)  http://simotek.net
Emergency Update Team  keybase.io/simotek
SUSE Linux  Adelaide Australia, UTC+10:30
GPG Fingerprint: 5B87 DB9D 88DC F606 E489 CEC5 0922 C246 02F0 014B