On Sat, 21 Mar 2020 12:37:23 +0100 Per Jessen <per@computer.org> wrote:
Dave Howorth wrote:
On Sat, 21 Mar 2020 09:28:41 +0100 Per Jessen <per@computer.org> wrote:
You have one process which has disabled interrupts whilst in some bit of kernel code, maybe a driver, who knows. Disabling interrupts just means there is a bit of code that must complete without any asynchronous calls happening, most probably to guarantee data integrity. It's perfectly normal.
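Kernel code disables hardware interrupts with primitives that user programs cannot call, so that part is not shown here. A rough user-space analogue, assuming Linux, Python 3 and the main thread, is to block a signal around a critical section: a signal that arrives in the meantime is held pending rather than lost, and is delivered as soon as the old mask is restored. `run_with_sigterm_held` and `critical` are hypothetical names used only for this sketch.

```python
import signal
import threading


def run_with_sigterm_held(work):
    """Run work() with SIGTERM blocked in the calling thread.

    A SIGTERM arriving meanwhile stays pending and is delivered when the old
    mask is restored. Returns (work's result, whether SIGTERM was pending).
    """
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGTERM})
    try:
        result = work()
        pending = signal.SIGTERM in signal.sigpending()
    finally:
        # Unblocking lets the pending SIGTERM through; CPython runs the
        # Python-level handler before pthread_sigmask() returns.
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)
    return result, pending


if __name__ == "__main__":
    received = []
    signal.signal(signal.SIGTERM, lambda signum, frame: received.append(signum))

    def critical():
        # Aim SIGTERM at this very thread while it is blocked.
        signal.pthread_kill(threading.get_ident(), signal.SIGTERM)
        return len(received)  # the handler has not run yet

    result, pending = run_with_sigterm_held(critical)
    print("handler calls inside:", result, "pending:", pending, "after:", received)
```

Run as a script, it should show that the handler had not run inside the critical section, that SIGTERM was pending there, and that the handler ran once the mask was restored.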
Right, but kernel code that suspends interrupts is not supposed to persist indefinitely and should have been QA'd by kernel devs, no?
No and maybe, in that order :-)
It _is_ supposed to be able to suspend them indefinitely, but in practice usually not for very long (on the order of microseconds, probably). And yes, it has probably been QA'd and shown to work fine.
Ah, I think I understand. When the term 'interrupt' is used, Carlos and I think of a hardware capability. I gather you're thinking of an emulated software capability.
Yes, I'm looking at it as being "sat" inside a process. Hardware interrupts are usually not serviced by a process (kernel or user), but by an interrupt handler, which then queues whatever it is for further processing. (I'm not sure how HPET interrupts are handled, though.)
Carlos' 'midnight commander' is just a process accessing the FUSE filesystem mounted with sshfs. As it has disabled SIGKILL, it must be in kernel mode. I think disabling SIGKILL can only be interpreted to mean "this _must_ complete, to avoid corrupting data".
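Strictly speaking, a user program cannot disable SIGKILL itself: SIGKILL (like SIGSTOP) cannot be caught, ignored or blocked, which is why a process that survives it is best explained as being stuck inside the kernel. A minimal sketch for Linux that simply asks the kernel to do both; `can_ignore` and `can_block` are hypothetical helper names, and `signal.signal` must be called from the main thread.

```python
import signal


def can_ignore(sig: signal.Signals) -> bool:
    """Try to set sig to SIG_IGN and report whether the kernel accepted it."""
    try:
        previous = signal.signal(sig, signal.SIG_IGN)
    except OSError:
        # sigaction() rejects SIGKILL and SIGSTOP with EINVAL.
        return False
    if previous is not None:
        signal.signal(sig, previous)  # put the old disposition back
    return True


def can_block(sig: signal.Signals) -> bool:
    """Try to add sig to this thread's signal mask and report whether it stuck."""
    old = signal.pthread_sigmask(signal.SIG_BLOCK, {sig})
    # SIG_SETMASK restores the old mask and returns the mask that was in effect.
    attempted = signal.pthread_sigmask(signal.SIG_SETMASK, old)
    return sig in attempted


if __name__ == "__main__":
    for s in (signal.SIGTERM, signal.SIGKILL, signal.SIGSTOP):
        print(s.name, "ignorable:", can_ignore(s), "blockable:", can_block(s))
```

SIGTERM should come out as both ignorable and blockable, SIGKILL and SIGSTOP as neither. A request to block SIGKILL is silently dropped from the mask rather than reported as an error, which is why `can_block` inspects the resulting mask instead of catching an exception.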
OK, I think the difficulty we've had is that you've been using the word 'interrupt' when you should have been using the word 'signal'. That's the correct word according to https://www.gnu.org/software/libc/manual/html_node/Termination-Signals.html where it also notes: "In fact, if SIGKILL fails to terminate a process, that by itself constitutes an operating system bug which you should report." So I think Carlos should open a bugzilla.
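Before opening a report, it may help to confirm that SIGKILL really failed rather than merely taking a moment. A rough sketch, assuming Linux and its /proc/<pid>/stat format, with hypothetical helper names and an arbitrary 5-second timeout: send SIGKILL, then poll the process's state letter. A process that has vanished, or become a zombie (Z) waiting for its parent to reap it, has been terminated; one still showing D after the timeout is typically blocked in an uninterruptible kernel wait.

```python
import os
import signal
import time
from typing import Optional, Tuple


def proc_state(pid: int) -> Optional[str]:
    """Return the state letter from /proc/<pid>/stat, or None if the process is gone."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            data = f.read()
    except (FileNotFoundError, ProcessLookupError):
        return None
    # Field 2 (the command name) is in parentheses and may itself contain ')',
    # so the state letter sits two characters after the LAST ')'.
    return data[data.rindex(")") + 2]


def sigkill_and_confirm(pid: int, timeout: float = 5.0) -> Tuple[bool, Optional[str]]:
    """Send SIGKILL to pid and poll until it is dead or timeout seconds pass.

    Returns (terminated, last_state_seen). Caveat: if the PID is reused by a
    new process in between, the result describes that new process.
    """
    try:
        os.kill(pid, signal.SIGKILL)
    except ProcessLookupError:
        return True, None  # already gone
    deadline = time.monotonic() + timeout
    while True:
        state = proc_state(pid)
        # None: already reaped; 'Z'/'X': dead, not yet reaped by its parent.
        if state in (None, "Z", "X"):
            return True, state
        if time.monotonic() >= deadline:
            # Still alive; 'D' here usually means an uninterruptible kernel wait.
            return False, state
        time.sleep(0.05)
```

Polling /proc rather than calling `waitpid()` means this also works on a process that is not the caller's own child, such as an mc started from another shell.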
Plus, as Carlos says, since when has a network connection disappearing been unexpected, or had any effect on data integrity?
A network filesystem mount?
I have a number of systems running with root on NFS, root is always mounted with "hard,intr". That means "wait forever" in the case of loss of the connection.
But in that case the mount is not done by a user program (mc in Carlos' case) via FUSE.
A FUSE driver also has to use kernel services.
Going back to the very first post, I think the situation could have been remedied by resuming the machine at 192.168.1.134. Then Carlos' 'mc' would have been able to complete the "must complete" code and exit cleanly.
Yes, but that's the wrong answer. It might have been that the remote system broke or was destroyed, for example, so it cannot be restored. And it's not what Carlos wants anyway. He wants his system to hibernate. And specifically he wants to be able to kill the mc process. Maybe he's assessed any data integrity issues and decided he doesn't care, or at least that it's the least worst option.
--
To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org
To contact the owner, e-mail: opensuse+owner@opensuse.org