
Hello,

On Mon, 12 Apr 2021, Carlos E. R. wrote:
On 12/04/2021 03.47, David Haller wrote:
On Mon, 12 Apr 2021, Carlos E. R. wrote:
It is just a fact, it can not.
Situation.
The system is running the process "texpire", which is part of the leafnode package, an nntp proxy server. texpire simply looks at every post and deletes those that are old according to certain rules. This is very intensive on the disk, looking at 1.2 million files. It runs for half an hour.
If during that process I issue a "sync", it does not succeed till texpire ends. It can not be killed.
You have to kill texpire, as that's continuously creating new stuff to be synced.
But I don't want to. In this case, I want sync to abandon the attempt because I made a mistake calling "sync". And I wonder why it can not.
See below. I just thought that it might help to trigger an emergency sync via sysrq+s ... [..]
Anyway: all those functions (sync, fsync, fdatasync, syncfs) are syscalls and, as they're writing to storage, they are in uninterruptible sleep state ('D' in ps/top etc.), and thus not "killable" from userland.
Ah. This is what I feared.
And if sync(1) is "in" the call to these functions, you can not kill it either until that function returns control to the sync(1) process.
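To check whether a stuck sync(1) really is in that 'D' state, one can read the state field from /proc/<pid>/stat. This is only a minimal sketch in Python; the function names are made up for illustration, and it assumes a Linux /proc filesystem.

```python
from pathlib import Path


def parse_state(stat_line: str) -> str:
    """Return the one-letter state field from a /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the last ')' rather than on whitespace.
    after_comm = stat_line.rsplit(")", 1)[1]
    return after_comm.split()[0]


def process_state(pid: int) -> str:
    """Read the current state letter of a running process (Linux only)."""
    return parse_state(Path(f"/proc/{pid}/stat").read_text())


def is_uninterruptible(pid: int) -> bool:
    """True if the process is in uninterruptible sleep ('D')."""
    return process_state(pid) == "D"
```

Calling is_uninterruptible() with the PID of the hanging sync should return True for as long as it is blocked inside the syscall.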
So, your only option is to kill anything that still causes more stuff to be synced and wait. Or shut the machine off hard[1] via sysrq+b, sysrq+o, the reset- or powerbutton ...
HTH, -dnh
[1] as the kernel is already syncing ...
The next (philosophical) question is why are those functions uninterruptible?
It could write an item of the cache, then another item, then the next... I see no "philosophical" reason why that can not be aborted. There must be a reason out there that I don't know.
I guess so that the filesystems are in a defined and consistent state. See fs/sync.c and mm/filemap.c for details.
Now, don't go investigating the code for me, it is just a curiosity :-)
Too late ;)
If you are curious, I have been investigating a problem I have with hibernation: sometimes it does not succeed, it stalls. When this happens, I can not poweroff the machine, in the end I have to switch the power off.
sysrq+r sysrq+e sysrq+i sysrq+s sysrq+u sysrq+o might help.
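For reference, here is what each key in that sequence asks the kernel to do, written as a small Python lookup table. It is only an illustrative sketch: the names SYSRQ_ACTIONS and describe_sequence are made up, and it only prints descriptions, it does not trigger anything.

```python
# Meaning of the magic SysRq keys mentioned in this thread.
SYSRQ_ACTIONS = {
    "r": "take the keyboard out of raw mode",
    "e": "send SIGTERM to all processes except init",
    "i": "send SIGKILL to all processes except init",
    "s": "sync all mounted filesystems",
    "u": "remount all filesystems read-only",
    "o": "power the machine off",
    "b": "reboot immediately, without syncing or unmounting",
}


def describe_sequence(keys: str) -> list[str]:
    """Return one description line per key, in the order given."""
    return [f"sysrq+{key}: {SYSRQ_ACTIONS[key]}" for key in keys]


if __name__ == "__main__":
    for line in describe_sequence("reisuo"):
        print(line)
```

The order matters: processes get a chance to exit (e, i) before the sync (s) and the read-only remount (u), and only then comes the poweroff (o).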
What I have found is that the machine, at those times, is trying to sync the filesystem and failing, precisely because texpire is running. Probably leaving the machine there for half an hour would succeed - but obviously,
Yep.
sometimes one is in a hurry to hibernate or poweroff (battery running out, say), and I had no idea that waiting for half an hour might work.
Sure.
I have found out more. In my case, there is a partition dedicated to /var/spool/news/ (formatted as reiserfs), and this partition goes 100% busy during texpire. If I run "sync" fifteen minutes after texpire finishes, it takes a minute to complete. A sync at any other point in the day takes about half a minute.
I use a loop-mounted reiserfs image as news-spool for leafnode:

$ df -h /var/spool/news/
Filesystem      Size  Used Avail Use% Mounted on
/dev/loop0      8.0G  3.3G  4.8G  41% /var/spool/news

I could probably shrink that a bit ;)

This works nicely and sync(1)-ing while texpire is done in a couple of seconds (on rotating rust).
The partition was mounted "relatime,lazytime". If I take out the "lazytime" parameter, the sync completes in a second (except if texpire is running).
I just use 'rw,acl,user_xattr,barrier=flush'...
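To compare the two setups, the options actually in effect for a mount point can be read from /proc/mounts. A minimal Python sketch follows; the function names are hypothetical, and it assumes the usual /proc/mounts layout (device, mount point, fs type, options, dump, pass) and a mount point without spaces in its path.

```python
def mount_options(mounts_text: str, mount_point: str) -> list[str]:
    """Return the option list for mount_point from /proc/mounts-style text."""
    target = mount_point.rstrip("/") or "/"
    options: list[str] = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields: device, mount point, fs type, options, dump, pass.
        if len(fields) >= 4 and fields[1] == target:
            # A later entry shadows an earlier one on the same mount point.
            options = fields[3].split(",")
    return options


def uses_lazytime(mounts_text: str, mount_point: str) -> bool:
    """True if the mount point is currently mounted with lazytime."""
    return "lazytime" in mount_options(mounts_text, mount_point)
```

For example, uses_lazytime(open("/proc/mounts").read(), "/var/spool/news/") tells whether the news spool currently has lazytime set.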
My conclusion is that lazytime is broken in the case of reiserfs (or in the case of news). The writing to disk is not delayed to an appropriate time, it is delayed forever. leafnode and texpire do a lot of timestamp changing, compared to other tools.
Probably. Reiserfs was always special ;)

HTH,
-dnh

-- 
Linux is not a desktop OS for people whose VCRs are still flashing "12:00".
    -- Paul Tomblin