Re: Why can't the process "sync" be killed?

12 Apr 2021

On 4/12/21 5:20 PM, Carlos E. R. wrote:
...
On 12/04/2021 03.47, David Haller wrote:
...
Hello,
On Mon, 12 Apr 2021, Carlos E. R. wrote:
...
It is just a fact, it can not.
Situation.
The system is running the process "texpire", which is part of the leafnode
package, an NNTP proxy server. texpire simply looks at every post and
deletes those that are old according to certain rules. This is very
intensive on the disk, looking at 1.2 million files. It runs for half an
hour.
If during that process I issue a "sync", it does not succeed till texpire
ends. It can not be killed.
You have to kill texpire, as that's continuously creating new stuff to
be synced.
But I don't want to. In this case, I want sync to abandon the attempt
because I made a mistake calling "sync". And I wonder why it can not.
...
And you can consider 'sync' to be half-dead already (just
not yet done reporting back (or not usually, other than exiting with
status 0)), as it's basically just issuing a 'sync', 'syncfs', 'fsync'
or 'fdatasync' syscall (see the manpages in section 2) and reporting
back. The source for sync is rather straightforward and a measly 5537
bytes in coreutils version 8.32.
==== comments by me after //DNH: ====
[..]
int
main (int argc, char **argv)
{
[..]
   if (! args_specified || (arg_file_system && ! HAVE_SYNCFS))
     mode = MODE_SYNC;
   else if (arg_file_system)
     mode = MODE_FILE_SYSTEM;        //DNH: will call syncfs(2) if available
   else if (! arg_data)
     mode = MODE_FILE;               //DNH: will call fsync(2)
   else
     mode = MODE_DATA;               //DNH: will call fdatasync(2)
   if (mode == MODE_SYNC)
     sync ();                        //DNH: well, duh ;)
[..]
   return ok ? EXIT_SUCCESS : EXIT_FAILURE;
}
====
So, default is to just call 'sync(2)', unless you call 'sync(1)' with
options or arguments in which case it branches to a function and calls
fsync(2) or fdatasync(2) or syncfs(2) depending on 'mode'. And:
==== man 2 sync ====
        sync() causes all pending modifications to filesystem metadata and
        cached file data to be written to the underlying filesystems.
====
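
To make the mapping from mode to syscall concrete, here is a minimal C
sketch of the three portable calls. It is not the coreutils code: the names
flush_mode and flush_path are hypothetical, invented for this illustration,
and syncfs(2) is left out because it is Linux-specific and needs
_GNU_SOURCE.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical names for this sketch; they mirror, but are not, the
 * MODE_* values in coreutils sync. */
enum flush_mode { FLUSH_ALL, FLUSH_FILE, FLUSH_DATA };

/* Returns 0 on success, -1 on failure. */
int flush_path(const char *path, enum flush_mode mode)
{
    if (mode == FLUSH_ALL) {
        sync();                 /* void: it cannot report an error */
        return 0;
    }

    /* Read-only access is enough to flush a file on Linux. */
    int fd = open(path, O_RDONLY | O_NONBLOCK);
    if (fd < 0)
        return -1;

    /* fdatasync(2) may skip metadata that is not needed to read the
     * data back; fsync(2) flushes data and metadata. */
    int rc = (mode == FLUSH_DATA) ? fdatasync(fd) : fsync(fd);

    close(fd);
    return rc == 0 ? 0 : -1;
}
```

Whichever branch is taken, the process is inside the kernel for the whole
duration of the call; there is no loop in userland that a signal handler
could break out of.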
Anyway: all those functions (sync, fsync, fdatasync, syncfs) are
syscalls and, as they're writing to storage, the calling process sits in
uninterruptible sleep ('D' state in ps/top etc.), and is thus not
"killable" from userland.
Ah. This is what I feared.
...
And if sync(1) is "in" the call to these functions, you can
not kill it either until that function returns control to the sync(1)
process.
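
For anyone who wants to see that 'D' state without ps or top, below is a
rough sketch that reads the state field from /proc/<pid>/stat on Linux. The
function names state_from_stat_line and process_state are made up for this
example; the only assumption about the file is the "pid (comm) state ..."
layout described in proc(5).

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Extracts the one-letter state from a /proc/<pid>/stat line.
 * The command name may itself contain spaces or parentheses, so search
 * for the LAST ')' rather than splitting on spaces.
 * Returns '?' if the line does not look like a stat line. */
char state_from_stat_line(const char *line)
{
    const char *rparen = strrchr(line, ')');
    if (rparen == NULL || rparen[1] != ' ' || rparen[2] == '\0')
        return '?';
    return rparen[2];
}

/* Returns the state letter of `pid` ('R', 'S', 'D', ...), or '?' if the
 * stat file cannot be read (no such process, or no /proc). */
char process_state(pid_t pid)
{
    char path[64];
    char line[512];
    char state = '?';

    snprintf(path, sizeof path, "/proc/%ld/stat", (long)pid);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return '?';
    if (fgets(line, sizeof line, f) != NULL)
        state = state_from_stat_line(line);
    fclose(f);
    return state;
}
```

Called with the PID of a sync(1) that is stuck behind texpire,
process_state() would be expected to return 'D' for as long as the syscall
has not returned.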
So, your only option is to kill anything that still causes more stuff
to be synced and wait. Or shut the machine off hard[1] via sysrq+b,
sysrq+o, the reset- or powerbutton ...
HTH,
-dnh
[1] as the kernel is already syncing ...
The next (philosophical) question is why are those functions
uninterruptible?
It could write an item of the cache, then another item, then the next...
I see no "philosophical" reason why that can not be aborted. There must
be a reason out there that I don't know.
Now, don't go investigating the code for me, it is just a curiosity :-)
Without investigating the code, I suspect it comes down to how you define
an item and whether the sync command actually knows about said items.
The sync command basically says: look at everything in cache that has yet
to be written to disk and write it to disk, because I want to have the
disk back in a consistent state (because I may want to eject a USB stick
or power off a machine).

In all likelihood, all the sync command knows is which blocks of cache
haven't been written to disk and where those blocks should be written on
disk. At that level it should have no concept of the data that's actually
in each block of cache, so it can't reason "if I write the following 4
blocks to disk, myfile.txt will be fully written and it might be safe to
stop at this point". To further complicate this, say that when you saved
myfile.txt it put the data across 8 blocks of cache; something may have
decided, even before you called sync, that there was some spare time and
had already written 3 of those 8 blocks to disk. So really the only way
the OS can be sure that data isn't corrupted by the sync process is to
allow it to run to completion (just as the only way you can be sure that
data on your USB stick won't be corrupted when you pull it out is to have
something call sync, which is what pressing the eject button does).
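
The kernel does expose how much of that not-yet-written cache exists at any
moment: on Linux, /proc/meminfo has "Dirty" and "Writeback" lines, both in
kB. A small sketch for reading them follows; meminfo_line_kb and meminfo_kb
are hypothetical helper names, and nothing here is taken from the sync
source.

```c
#include <stdio.h>
#include <string.h>

/* If `line` has the form "<key>: <number> kB", returns the number;
 * otherwise returns -1. The ':' check keeps "Writeback" from matching
 * "WritebackTmp". */
long meminfo_line_kb(const char *line, const char *key)
{
    size_t klen = strlen(key);
    long kb;

    if (strncmp(line, key, klen) != 0 || line[klen] != ':')
        return -1;
    if (sscanf(line + klen + 1, "%ld", &kb) != 1)
        return -1;
    return kb;
}

/* Returns the value of one /proc/meminfo field in kB, or -1 if the file
 * cannot be read or the field is absent. */
long meminfo_kb(const char *key)
{
    char line[256];
    long kb = -1;

    FILE *f = fopen("/proc/meminfo", "r");
    if (f == NULL)
        return -1;
    while (kb < 0 && fgets(line, sizeof line, f) != NULL)
        kb = meminfo_line_kb(line, key);
    fclose(f);
    return kb;
}
```

Polling meminfo_kb("Dirty") and meminfo_kb("Writeback") while a sync is
pending gives a rough idea of whether the backlog is shrinking or whether
another process keeps refilling it.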
...
If you are curious, I have been investigating a problem I have with
hibernation: sometimes it does not succeed, it stalls. When this
happens, I can not poweroff the machine, in the end I have to switch the
power off.
What I have found is that the machine is, those times, trying to
sync the filesystem and failing, precisely because texpire is running.
Probably leaving the machine there for half an hour would succeed - but
obviously, sometimes one is in a hurry to hibernate or power off (battery
running out, say), and I had no idea that waiting for half an hour might
work.
I have found out more. In my case, there is a dedicated partition to
/var/spool/news/ (formatted as reiserfs), and this partition goes 100%
busy during texpire. If I run "sync" fifteen minutes after texpire
finishes, it takes a minute to complete. A sync at another point in the
day takes about half a minute.
The partition was mounted "relatime,lazytime". If I take out the
"lazytime" parameter, the sync completes in a second (except if texpire
is running).
My conclusion is that lazytime is broken in the case of reiserfs (or in
the case of news). The writing to disk is not delayed to an appropriate
time, it is delayed for ever.
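
A quick way to confirm whether a mount really carries lazytime is to look
at its option string in /proc/self/mounts. The sketch below uses the glibc
getmntent(3) interface; has_mount_option and mount_has_option are names
invented for this example, and it assumes the kernel lists lazytime among
the options it reports for the mount.

```c
#include <mntent.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if the comma-separated list `opts` contains exactly `name`
 * as one of its entries, else 0. Whole-token comparison, so "nolazytime"
 * does not count as "lazytime". */
int has_mount_option(const char *opts, const char *name)
{
    size_t nlen = strlen(name);
    const char *p = opts;

    while (*p != '\0') {
        size_t len = strcspn(p, ",");
        if (len == nlen && strncmp(p, name, nlen) == 0)
            return 1;
        p += len;
        if (*p == ',')
            p++;
    }
    return 0;
}

/* Returns 1 if `mount_point` is mounted with option `name`, 0 if it is
 * mounted without it, -1 if the mount point is not found or the mount
 * table cannot be read. */
int mount_has_option(const char *mount_point, const char *name)
{
    int result = -1;
    struct mntent *m;

    FILE *tab = setmntent("/proc/self/mounts", "r");
    if (tab == NULL)
        return -1;
    while ((m = getmntent(tab)) != NULL) {
        /* Keep the last match: later entries sit on top of earlier ones. */
        if (strcmp(m->mnt_dir, mount_point) == 0)
            result = has_mount_option(m->mnt_opts, name);
    }
    endmntent(tab);
    return result;
}
```

For the setup described above, the call would be
mount_has_option("/var/spool/news", "lazytime"), checked before and after
remounting.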
I guess it might depend on what you call an appropriate time. It's quite
possible that while texpire is running and causing significant disk
usage, it's deciding the appropriate time is after texpire has finished.
You have to remember this is mostly a power-saving feature for laptops, so
that rather than spending lots of energy constantly spinning up a
rotating hard drive to write small amounts of data, it waits
until you're done (or the cache is full) and spins up the drive once to
write out all the data. By mounting with lazytime you're saying "I don't
care when in the future this happens"; on the other hand, sometimes when
you hibernate your laptop you do care.

-- 
Simon Lees (Simotek)                            http://simotek.net

Emergency Update Team                           keybase.io/simotek
SUSE Linux                           Adelaide Australia, UTC+10:30
GPG Fingerprint: 5B87 DB9D 88DC F606 E489 CEC5 0922 C246 02F0 014B