[opensuse] A user process blocks hibernation - what?
Hi,
This past night I could not hibernate my desktop machine, it refused. Looking at the log, I found a clue:
<4.5> 2020-03-20 03:07:40 Telcontar sudo - - - root : TTY=pts/76 ; PWD=/root ; USER=root ; COMMAND=/usr/bin/chvt 10
<10.6> 2020-03-20 03:07:40 Telcontar sudo - - - pam_unix(sudo:session): session opened for user root by (uid=0)
<10.6> 2020-03-20 03:07:41 Telcontar sudo - - - pam_unix(sudo:session): session closed for user root
<4.5> 2020-03-20 03:07:46 Telcontar sudo - - - root : TTY=pts/76 ; PWD=/root ; USER=root ; COMMAND=/usr/bin/systemctl hibernate
<10.6> 2020-03-20 03:07:46 Telcontar sudo - - - pam_unix(sudo:session): session opened for user root by (uid=0)
<10.6> 2020-03-20 03:07:46 Telcontar sudo - - - pam_unix(sudo:session): session closed for user root
<3.6> 2020-03-20 03:07:46 Telcontar systemd 1 - - Reached target Sleep.
<3.6> 2020-03-20 03:07:46 Telcontar systemd 1 - - Starting Hibernate...
<0.7> 2020-03-20 03:07:46 Telcontar kernel - - - [317397.413158] PM: Hibernation mode set to 'shutdown'
<3.6> 2020-03-20 03:07:46 Telcontar systemd-sleep 14514 - - INFO: running /usr/lib/systemd/system-sleep/grub2.sleep for hibernate
<3.6> 2020-03-20 03:07:46 Telcontar systemd-sleep 14514 - - INFO: Running prepare-grub ..
<3.6> 2020-03-20 03:07:46 Telcontar systemd-sleep 14514 - - 2020-03-20 03:07:46+01:00 - Hibernating the system now...
<3.4> 2020-03-20 03:07:46 Telcontar systemd-sh - - - Hibernating the system now...
<3.6> 2020-03-20 03:07:46 Telcontar systemd-sleep 14514 - - service: no such service upsd.service
<3.6> 2020-03-20 03:07:46 Telcontar systemd-sleep 14514 - - running kernel is grub menu entry Main_openSUSE (vmlinuz-4.12.14-lp151.28.40-default)
<3.6> 2020-03-20 03:07:46 Telcontar systemd-sleep 14514 - - preparing boot-loader: selecting entry Main_openSUSE, kernel /boot/4.12.14-lp151.28.40-default
<3.6> 2020-03-20 03:07:47 Telcontar systemd-sleep 14514 - - running /usr/sbin/grub2-once "Main_openSUSE"
<3.6> 2020-03-20 03:07:47 Telcontar systemd-sleep 14514 - - time needed for sync: 0.5 seconds, time needed for grub: 0.2 seconds.
<3.6> 2020-03-20 03:07:47 Telcontar systemd-sleep 14514 - - INFO: Done.
<3.6> 2020-03-20 03:07:51 Telcontar systemd-sleep 14514 - - Suspending system...
<0.4> 2020-03-20 03:07:51 Telcontar kernel - - - [317402.526273] PM: Tried to create trampoline again
<0.6> 2020-03-20 03:07:51 Telcontar kernel - - - [317402.559008] PM: Syncing filesystems ...
<3.4> 2020-03-20 03:08:12 Telcontar rtkit-daemon 5395 - - The canary thread is apparently starving. Taking action.
<3.6> 2020-03-20 03:08:12 Telcontar rtkit-daemon 5395 - - Demoting known real-time threads.
<3.5> 2020-03-20 03:08:12 Telcontar rtkit-daemon 5395 - - Successfully demoted thread 5399 of process 5394 (/usr/bin/pulseaudio).
<3.5> 2020-03-20 03:08:12 Telcontar rtkit-daemon 5395 - - Successfully demoted thread 5398 of process 5394 (/usr/bin/pulseaudio).
<3.5> 2020-03-20 03:08:12 Telcontar rtkit-daemon 5395 - - Successfully demoted thread 5394 of process 5394 (/usr/bin/pulseaudio).
<3.5> 2020-03-20 03:08:12 Telcontar rtkit-daemon 5395 - - Demoted 3 threads.
<0.6> 2020-03-20 03:08:12 Telcontar kernel - - - [317403.182461] PM: done.
<0.6> 2020-03-20 03:08:12 Telcontar kernel - - - [317403.182463] Freezing user space processes ...
<0.3> 2020-03-20 03:08:12 Telcontar kernel - - - [317423.185120] Freezing of tasks failed after 20.001 seconds (2 tasks refusing to freeze, wq_busy=0):
<0.6> 2020-03-20 03:08:12 Telcontar kernel - - - [317423.185147] pool D 0 14428 5381 0x00000004
<0.4> 2020-03-20 03:08:12 Telcontar kernel - - - [317423.185150] Call Trace:
<0.4> 2020-03-20 03:08:12 Telcontar kernel - - - [317423.185159] ? __schedule+0x27f/0x830
<0.4> 2020-03-20 03:08:12 Telcontar kernel - - - [317423.185161] schedule+0x28/0x80
<0.4> 2020-03-20 03:08:12 Telcontar kernel - - - [317423.185168] request_wait_answer+0x79/0x1e0 [fuse]
<0.4> 2020-03-20 03:08:12 Telcontar kernel - - - [317423.185172] ? wait_woken+0x80/0x80
<0.4> 2020-03-20 03:08:12 Telcontar kernel - - - [317423.185176] __fuse_request_send+0x78/0x80 [fuse]
<0.4> 2020-03-20 03:08:12 Telcontar kernel - - - [317423.185180] fuse_simple_request+0xbd/0x190 [fuse]
<0.4> 2020-03-20 03:08:12 Telcontar kernel - - - [317423.185184] fuse_do_getattr+0xf3/0x2b0 [fuse]
<0.4> 2020-03-20 03:08:12 Telcontar kernel - - - [317423.185190] fuse_update_attributes+0x7a/0x90 [fuse]
<0.4> 2020-03-20 03:08:12 Telcontar kernel - - - [317423.185198] vfs_statx+0x79/0xb0
<0.4> 2020-03-20 03:08:12 Telcontar kernel - - - [317423.185200] SYSC_newlstat+0x26/0x40
<0.4> 2020-03-20 03:08:12 Telcontar kernel - - - [317423.185203] do_syscall_64+0x7b/0x160
<0.4> 2020-03-20 03:08:12 Telcontar kernel - - - [317423.185204] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
<0.4> 2020-03-20 03:08:12 Telcontar kernel - - - [317423.185206] RIP: 0033:0x7fc61ad0f535
<0.4> 2020-03-20 03:08:12 Telcontar kernel - - - [317423.185207] RSP: 002b:00007fc60d1e8738 EFLAGS: 00000246 ORIG_RAX: 0000000000000006
<0.4> 2020-03-20 03:08:12 Telcontar kernel - - - [317423.185209] RAX: ffffffffffffffda RBX: 00007fc60413d5c0 RCX: 00007fc61ad0f535
<0.4> 2020-03-20 03:08:12 Telcontar kernel - - - [317423.185209] RDX: 00007fc60d1e87a0 RSI: 00007fc60d1e87a0 RDI: 00007fc604172fd0
<0.4> 2020-03-20 03:08:12 Telcontar kernel - - - [317423.185210] RBP: 0000559c247e1590 R08: 00007fc6040aa860 R09: 00007fc61ad60470
<0.4> 2020-03-20 03:08:12 Telcontar kernel - - - [317423.185211] R10: 0000000000100007 R11: 0000000000000246 R12: 00007fc604172fd0
<0.4> 2020-03-20 03:08:12 Telcontar kernel - - - [317423.185212] R13: 00007fc60d1e8920 R14: 00007fc604172fd0 R15: 00007fc604017110
<0.6> 2020-03-20 03:08:12 Telcontar kernel - - - [317423.185253] mc D 0 3425 1 0x00000004 <=======================
<0.4> 2020-03-20 03:08:12 Telcontar kernel - - - [317423.185255] Call Trace:
<0.4> 2020-03-20 03:08:12 Telcontar kernel - - - [317423.185257] ? __schedule+0x27f/0x830
<0.4> 2020-03-20 03:08:12 Telcontar kernel - - - [317423.185259] schedule+0x28/0x80
<0.4> 2020-03-20 03:08:12 Telcontar kernel - - - [317423.185261] request_wait_answer+0x110/0x1e0 [fuse]
<0.4> 2020-03-20 03:08:12 Telcontar kernel - - - [317423.185263] ? wait_woken+0x80/0x80
<0.4> 2020-03-20 03:08:12 Telcontar kernel - - - [317423.185265] __fuse_request_send+0x78/0x80 [fuse]
<0.4> 2020-03-20 03:08:12 Telcontar kernel - - - [317423.185268] fuse_simple_request+0xbd/0x190 [fuse]
<0.4> 2020-03-20 03:08:12 Telcontar kernel - - - [317423.185271] fuse_do_getattr+0xf3/0x2b0 [fuse]
<0.4> 2020-03-20 03:08:12 Telcontar kernel - - - [317423.185275] fuse_update_attributes+0x7a/0x90 [fuse]
<0.4> 2020-03-20 03:08:12 Telcontar kernel - - - [317423.185276] vfs_statx+0x79/0xb0
<0.4> 2020-03-20 03:08:12 Telcontar kernel - - - [317423.185278] SYSC_newlstat+0x26/0x40
<0.4> 2020-03-20 03:08:12 Telcontar kernel - - - [317423.185279] do_syscall_64+0x7b/0x160
<0.4> 2020-03-20 03:08:12 Telcontar kernel - - - [317423.185281] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
<0.4> 2020-03-20 03:08:12 Telcontar kernel - - - [317423.185282] RIP: 0033:0x7f9d3c1b1535
<0.4> 2020-03-20 03:08:12 Telcontar kernel - - - [317423.185282] RSP: 002b:00007fff021b3808 EFLAGS: 00000246 ORIG_RAX: 0000000000000006
<0.4> 2020-03-20 03:08:12 Telcontar kernel - - - [317423.185284] RAX: ffffffffffffffda RBX: 00007fff021b4940 RCX: 00007f9d3c1b1535
<0.4> 2020-03-20 03:08:12 Telcontar kernel - - - [317423.185285] RDX: 00007fff021b3840 RSI: 00007fff021b3840 RDI: 00007fff021b3940
<0.4> 2020-03-20 03:08:12 Telcontar kernel - - - [317423.185286] RBP: 00007fff021b3910 R08: 0000000000000000 R09: 00007f9d3c202290
<0.4> 2020-03-20 03:08:12 Telcontar kernel - - - [317423.185286] R10: 00007f9d3c1ffe30 R11: 0000000000000246 R12: 00007fff021b3940
<0.4> 2020-03-20 03:08:12 Telcontar kernel - - - [317423.185287] R13: 000056285945211a R14: 0000562859452124 R15: 00007fff021b3954
<0.6> 2020-03-20 03:08:12 Telcontar kernel - - - [317423.185291] OOM killer enabled.
<0.6> 2020-03-20 03:08:12 Telcontar kernel - - - [317423.185292] Restarting tasks ... done.
<3.5> 2020-03-20 03:08:12 Telcontar systemd 1 - - systemd-hibernate.service: Main process exited, code=exited, status=1/FAILURE
<3.3> 2020-03-20 03:08:12 Telcontar systemd 1 - - Failed to start Hibernate.
<3.4> 2020-03-20 03:08:12 Telcontar systemd 1 - - Dependency failed for Hibernate.
<4.6> 2020-03-20 03:08:12 Telcontar systemd-logind 1557 - - Operation 'sleep' finished.
<3.5> 2020-03-20 03:08:12 Telcontar systemd 1 - - hibernate.target: Job hibernate.target/start failed with result 'dependency'.
<3.6> 2020-03-20 03:08:12 Telcontar systemd 1 - - sleep.target: Unit not needed anymore. Stopping.
<3.5> 2020-03-20 03:08:12 Telcontar systemd 1 - - systemd-hibernate.service: Unit entered failed state.
<3.4> 2020-03-20 03:08:12 Telcontar systemd 1 - - systemd-hibernate.service: Failed with result 'exit-code'.
<3.6> 2020-03-20 03:08:12 Telcontar systemd 1 - - Stopped target Sleep.
The user run application 'mc' (Midnight Commander) was blocking hibernation (and now I see there was another application named 'pool').
I tried to kill 'mc' with killall -9. It still refused. I killed the terminal that had it, no way. In the end, I had to poweroff the machine instead.
I think that mc was blocked because it had open a remote directory externally:
sshfs cer@192.168.1.134:/ ~/fusermount/
and that other machine had been hibernated a minute before.
How can it be that a plebeian app stops the almighty kernel in its tracks?
--
Cheers
Carlos E. R.
(from 15.1 x86_64 at Telcontar)
--
To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org
To contact the owner, e-mail: opensuse+owner@opensuse.org
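As a reading aid for the log in this message: the kernel prints one header line per task that refused to freeze, right after the "Freezing of tasks failed" line. A minimal Python sketch for pulling those headers out of such a log, assuming the syslog layout shown in the post; the regular expression and the function name `refusing_tasks` are illustrative and not part of any existing tool:

```python
import re

# Matches kernel task-dump header lines such as
#   "[317423.185147] pool D 0 14428 5381 0x00000004"
# Assumed column order after the task name: state, free stack, PID,
# parent PID, flags. Task names containing spaces are not handled.
TASK_LINE = re.compile(
    r"\[\s*\d+\.\d+\]\s+(?P<name>\S+)\s+(?P<state>[A-Z])\s+\d+\s+"
    r"(?P<pid>\d+)\s+(?P<ppid>\d+)\s+0x[0-9a-f]+"
)

def refusing_tasks(log_text):
    """Return (name, state, pid, ppid) for each task header in log_text."""
    return [
        (m["name"], m["state"], int(m["pid"]), int(m["ppid"]))
        for m in TASK_LINE.finditer(log_text)
    ]

# Two header lines copied from the log in the post.
sample = (
    "<0.6> 2020-03-20 03:08:12 Telcontar kernel - - - "
    "[317423.185147] pool D 0 14428 5381 0x00000004\n"
    "<0.6> 2020-03-20 03:08:12 Telcontar kernel - - - "
    "[317423.185253] mc D 0 3425 1 0x00000004\n"
)

for task in refusing_tasks(sample):
    print(task)
```

Both tasks are reported in state D, and both call traces in the log pass through functions of the fuse module, which fits the sshfs mount mentioned in the post.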
20.03.2020 13:29, Carlos E. R. wrote:
Hi,
This past night I could not hibernate my desktop machine, it refused. Looking at the log, I found a clue:
[...]
The user run application 'mc' (Midnight Commander) was blocking hibernation (and now I see there was another application named 'pool').
I tried to kill 'mc' with killall -9. It still refused. I killed the terminal that had it, no way. In the end, I had to poweroff the machine instead.
I think that mc was blocked because it had open a remote directory externally:
sshfs cer@192.168.1.134:/ ~/fusermount/
and that other machine had been hibernated a minute before.
How can it be that a plebeian app stops the almighty kernel in its tracks?
Both threads are in kernel mode and, as you yourself said, cannot be interrupted. So there is little the kernel can do.
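Tasks blocked this way show up with state D (uninterruptible sleep), as in the "pool D" and "mc D" lines of the log. A minimal Python sketch for listing such processes on a running Linux system, assuming `/proc` is mounted; the function names `parse_stat` and `uninterruptible_tasks` are made up for this example, and only whole processes are checked, not individual threads:

```python
import os

def parse_stat(stat_text):
    """Split a /proc/<pid>/stat line into (pid, comm, state).

    The command name is wrapped in parentheses and may itself contain
    spaces or parentheses, so split on the LAST ')' instead of on spaces.
    """
    left, right = stat_text.rsplit(")", 1)
    pid_text, comm = left.split("(", 1)
    return int(pid_text), comm, right.split()[0]

def uninterruptible_tasks(proc_root="/proc"):
    """Yield (pid, comm) for every process currently in state D."""
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                pid, comm, state = parse_stat(fh.read())
        except OSError:
            continue  # the process exited while we were scanning
        if state == "D":
            yield pid, comm

if __name__ == "__main__":
    for pid, comm in uninterruptible_tasks():
        print(pid, comm)
```

A process that stays in this list for a long time is a candidate for the kind of freeze failure shown in the log.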
On Friday, 2020-03-20 at 14:12 +0300, Andrei Borzenkov wrote:
...
The user run application 'mc' (Midnight Commander) was blocking hibernation (and now I see there was another application named 'pool').
I tried to kill 'mc' with killall -9. It still refused. I killed the terminal that had it, no way. In the end, I had to poweroff the machine instead.
I think that mc was blocked because it had open a remote directory externally:
sshfs cer@192.168.1.134:/ ~/fusermount/
and that other machine had been hibernated a minute before.
How can it be that a plebeian app stops the almighty kernel in its tracks?
Both threads are in kernel mode and as you yourself said cannot be interrupted. So there is little kernel can do.
Sorry, I still do not understand why a user process such as mc can not be destroyed on order. No excuses.
--
Cheers,
Carlos E. R.
(from openSUSE 15.1 x86_64 at Telcontar)
Carlos E. R. wrote:
On Friday, 2020-03-20 at 14:12 +0300, Andrei Borzenkov wrote:
The user run application 'mc' (Midnight Commander) was blocking hibernation (and now I see there was another application named 'pool').
I tried to kill 'mc' with killall -9. It still refused. I killed the terminal that had it, no way. In the end, I had to poweroff the machine instead.
I think that mc was blocked because it had open a remote directory externally:
sshfs cer@192.168.1.134:/ ~/fusermount/
and that other machine had been hibernated a minute before.
How can it be that a plebeian app stops the almighty kernel in its tracks?
Both threads are in kernel mode and as you yourself said cannot be interrupted. So there is little kernel can do.
Sorry, I still do not understand why a user process such as mc can not be destroyed on order. No excuses.
A user process can enter kernel mode - this one did, and then disabled interrupts. I.e. it has to complete.
--
Per Jessen, Zürich (17.8°C)
http://www.hostsuisse.com/ - dedicated server rental in Switzerland.
On Fri, 20 Mar 2020 13:21:17 +0100 Per Jessen <per@computer.org> wrote:
Carlos E. R. wrote:
On Friday, 2020-03-20 at 14:12 +0300, Andrei Borzenkov wrote:
The user run application 'mc' (Midnight Commander) was blocking hibernation (and now I see there was another application named 'pool').
I tried to kill 'mc' with killall -9. It still refused. I killed the terminal that had it, no way. In the end, I had to poweroff the machine instead.
I think that mc was blocked because it had open a remote directory externally:
sshfs cer@192.168.1.134:/ ~/fusermount/
and that other machine had been hibernated a minute before.
How can it be that a plebeian app stops the almighty kernel in its tracks?
Both threads are in kernel mode and as you yourself said cannot be interrupted. So there is little kernel can do.
Sorry, I still do not understand why a user process such as mc can not be destroyed on order. No excuses.
A user process can enter kernel mode - this one did, and then disabled interrupts. I.e. it has to complete.
It sounds like a pretty severe bug to enter kernel mode, disable interrupts and then wait for some network event? And indeed a problem in the overall system architecture if it permits of such bugs! Or am I missing something?
On 20/03/2020 16.25, Dave Howorth wrote:
On Fri, 20 Mar 2020 13:21:17 +0100 Per Jessen <> wrote:
Carlos E. R. wrote:
On Friday, 2020-03-20 at 14:12 +0300, Andrei Borzenkov wrote:
The user run application 'mc' (Midnight Commander) was blocking hibernation (and now I see there was another application named 'pool').
I tried to kill 'mc' with killall -9. It still refused. I killed the terminal that had it, no way. In the end, I had to poweroff the machine instead.
I think that mc was blocked because it had open a remote directory externally:
sshfs cer@192.168.1.134:/ ~/fusermount/
and that other machine had been hibernated a minute before.
How can it be that a plebeian app stops the almighty kernel in its tracks?
Both threads are in kernel mode and as you yourself said cannot be interrupted. So there is little kernel can do.
Sorry, I still do not understand why a user process such as mc can not be destroyed on order. No excuses.
A user process can enter kernel mode - this one did, and then disabled interrupts. I.e. it has to complete.
Disabled interrupts? But all the processes were working, only this one was stuck. My training said that when interrupts were disabled, no one got access to them.
It sounds like a pretty severe bug to enter kernel mode, disable interrupts and then wait for some network event? And indeed a problem in the overall system architecture if it permits of such bugs!
Or am I missing something?
Same here. I still think the kernel should have control of everything, destroy all resources assigned to the process, and of course, keep the power to hibernate. If a process can't, ask the user. Ok, f**k that process.
And then, there was the issue that apparently caused this: mc had a directory opened, that happened to be a remote directory. The other machine had hibernated and thus disappeared from the network. Why be stuck as unkillable? It is a normal life occurrence, for another computer to disappear. The local process should still respond, even if the remote machine is gone and not responding.
--
Cheers / Saludos,
Carlos E. R.
(from 15.1 x86_64 at Telcontar)
Carlos E. R. wrote:
On 20/03/2020 16.25, Dave Howorth wrote:
On Fri, 20 Mar 2020 13:21:17 +0100 Per Jessen <> wrote:
Carlos E. R. wrote:
On Friday, 2020-03-20 at 14:12 +0300, Andrei Borzenkov wrote:
The user run application 'mc' (Midnight Commander) was blocking hibernation (and now I see there was another application named 'pool').
I tried to kill 'mc' with killall -9. It still refused. I killed the terminal that had it, no way. In the end, I had to poweroff the machine instead.
I think that mc was blocked because it had open a remote directory externally:
sshfs cer@192.168.1.134:/ ~/fusermount/
and that other machine had been hibernated a minute before.
How can it be that a plebeian app stops the almighty kernel in its tracks?
Both threads are in kernel mode and as you yourself said cannot be interrupted. So there is little kernel can do.
Sorry, I still do not understand why a user process such as mc can not be destroyed on order. No excuses.
A user process can enter kernel mode - this one did, and then disabled interrupts. I.e. it has to complete.
Disabled interrupts? But all the processes were working, only this one was stuck. My training said that when interrupts were disabled, no one got access to them.
I don't understand a word of that?
You have one process which disabled interrupts whilst in some bit of kernel code, maybe a driver, who knows. Disabling interrupts just means a bit of code that must complete without any asynchronous calls happening. Most probably to guarantee data integrity. It's perfectly normal.
--
Per Jessen, Zürich (13.4°C)
http://www.hostsuisse.com/ - virtual servers, made in Switzerland.
On Fri, 20 Mar 2020 19:58:44 +0100 Per Jessen <per@computer.org> wrote:
Carlos E. R. wrote:
On 20/03/2020 16.25, Dave Howorth wrote:
On Fri, 20 Mar 2020 13:21:17 +0100 Per Jessen <> wrote:
Carlos E. R. wrote:
On Friday, 2020-03-20 at 14:12 +0300, Andrei Borzenkov wrote:
The user run application 'mc' (Midnight Commander) was blocking hibernation (and now I see there was another application named 'pool').
I tried to kill 'mc' with killall -9. It still refused. I killed the terminal that had it, no way. In the end, I had to poweroff the machine instead.
I think that mc was blocked because it had open a remote directory externally:
sshfs cer@192.168.1.134:/ ~/fusermount/
and that other machine had been hibernated a minute before.
How can it be that a plebeian app stops the almighty kernel in its tracks?
Both threads are in kernel mode and as you yourself said cannot be interrupted. So there is little kernel can do.
Sorry, I still do not understand why a user process such as mc can not be destroyed on order. No excuses.
A user process can enter kernel mode - this one did, and then disabled interrupts. I.e. it has to complete.
Disabled interrupts? But all the processes were working, only this one was stuck. My training said that when interrupts were disabled, no one got access to them.
I don't understand a word of that ?
You have one process which disabled interrupts whilst in some bit of kernel code, maybe a driver, who knows. Disabling interrupts just means a bit of code that must complete without any asynchronous calls happening. Most probably to guarantee data integrity. It's perfectly normal.
Right, but kernel code that suspends interrupts is not supposed to persist indefinitely and should have been QAd by kernel devs, no? Plus as Carlos says, since when has a network connection disappearing been unexpected and have any effect on data integrity?
On 20/03/2020 20.55, Dave Howorth wrote:
On Fri, 20 Mar 2020 19:58:44 +0100 Per Jessen <per@computer.org> wrote:
Carlos E. R. wrote:
On 20/03/2020 16.25, Dave Howorth wrote:
On Fri, 20 Mar 2020 13:21:17 +0100 Per Jessen <> wrote:
Carlos E. R. wrote:
On Friday, 2020-03-20 at 14:12 +0300, Andrei Borzenkov wrote:
The user run application 'mc' (Midnight Commander) was blocking hibernation (and now I see there was another application named 'pool').
I tried to kill 'mc' with killall -9. It still refused. I killed the terminal that had it, no way. In the end, I had to poweroff the machine instead.
I think that mc was blocked because it had open a remote directory externally:
sshfs cer@192.168.1.134:/ ~/fusermount/
and that other machine had been hibernated a minute before.
How can it be that a plebeian app stops the almighty kernel in its tracks?
Both threads are in kernel mode and as you yourself said cannot be interrupted. So there is little kernel can do.
Sorry, I still do not understand why a user process such as mc can not be destroyed on order. No excuses.
A user process can enter kernel mode - this one did, and then disabled interrupts. I.e. it has to complete.
Disabled interrupts? But all the processes were working, only this one was stuck. My training said that when interrupts were disabled, no one got access to them.
I don't understand a word of that ?
When one process blocks interrupts, my teachers told me it applied to the entire computer, all processes. Nothing, not even the kernel, can intervene. The keyboard gets blocked, the clock interrupts get blocked. It is impossible to block interrupts for minutes - and the entire machine was responsive - except a single process. I also do not understand how a user process can block interrupts, that should be reserved to the kernel.
You have one process which disabled interrupts whilst in some bit of kernel code, maybe a driver, who knows. Disabling interrupts just means a bit of code that must complete without any asynchronous calls happening. Most probably to guarantee data integrity. It's perfectly normal.
Right, but kernel code that suspends interrupts is not supposed to persist indefinitely and should have been QAd by kernel devs, no?
Plus as Carlos says, since when has a network connection disappearing been unexpected and have any effect on data integrity?
Right. -- Cheers / Saludos, Carlos E. R. (from 15.1 x86_64 at Telcontar)
Carlos E. R. wrote:
On 20/03/2020 20.55, Dave Howorth wrote:
On Fri, 20 Mar 2020 19:58:44 +0100 Per Jessen <per@computer.org> wrote:
Carlos E. R. wrote:
On 20/03/2020 16.25, Dave Howorth wrote:
On Fri, 20 Mar 2020 13:21:17 +0100 Per Jessen <> wrote:
Carlos E. R. wrote:
> Sorry, I still do not understand why a user process such as mc > can not be destroyed on order. No excuses.
A user process can enter kernel mode - this one did, and then disabled interrupts. I.e. it has to complete.
Disabled interrupts? But all the processes were working, only this one was stuck. My training said that when interrupts were disabled, no one got access to them.
I don't understand a word of that ?
When one process blocks interrupts, my teachers told me it applied to the entire computer, all processes. Nothing, not even the kernel, can intervene. The keyboard gets blocked, the clock interrupts get blocked.
Interrupts are blocked per process. Very basic example: I might have a daemon I want to ignore Ctrl-C (SIGINT), so I block it. Does not mean any other process should also ignore it.
It is impossible to block interrupts for minutes
It is entirely possible to block interrupts for minutes.
I also do not understand how a user process can block interrupts, that should be reserved to the kernel.
Because it was in kernel mode, probably in a kernel driver. In your case, maybe some filesystem code. A user process does not always remain in user mode, it needs services from the kernel, for instance to do I/O. -- Per Jessen, Zürich (10.7°C) http://www.hostsuisse.com/ - virtual servers, made in Switzerland. -- To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org To contact the owner, e-mail: opensuse+owner@opensuse.org
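The per-process blocking Per describes applies to signals rather than hardware interrupts, and it can be tried from user space. The following is a minimal sketch, assuming Linux and Python 3; the function names `demo_block_sigint` and `can_block_sigkill` are made up for illustration. It blocks SIGINT in the calling thread only, shows that the signal then stays pending instead of being delivered, and shows that a user process asking to block SIGKILL is silently refused.

```python
import signal
import threading


def demo_block_sigint():
    """Block SIGINT in this thread, send it to ourselves, report if it is pending."""
    me = threading.get_ident()
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGINT})
    try:
        # Same signal that Ctrl-C generates, aimed at this thread only.
        signal.pthread_kill(me, signal.SIGINT)
        # Blocked signals are held back by the kernel, not lost.
        pending = signal.SIGINT in signal.sigpending()
        # Consume the pending signal so it is not delivered on unblock.
        signal.sigwait({signal.SIGINT})
    finally:
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)
    return pending


def can_block_sigkill():
    """Ask the kernel to block SIGKILL and report whether it ended up in the mask."""
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGKILL})
    try:
        # SIG_BLOCK with an empty set changes nothing and returns the current mask.
        current = signal.pthread_sigmask(signal.SIG_BLOCK, set())
        return signal.SIGKILL in current
    finally:
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)


if __name__ == "__main__":
    print("SIGINT pending while blocked:", demo_block_sigint())
    print("SIGKILL blockable from user space:", can_block_sigkill())
```

No other process is affected by the mask, which matches the daemon example above. That SIGKILL cannot be masked this way suggests that an "unkillable" process is not blocking the signal itself but is waiting inside the kernel.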
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 El 2020-03-21 a las 09:41 +0100, Per Jessen escribió:
Carlos E. R. wrote:
> A user process can enter kernel mode - this one did, and then > disabled interrupts. I.e. it has to complete.
Disabled interrupts? But all the processes were working, only this one was stuck. My training said that when interrupts were disabled, no one got access to them.
I don't understand a word of that ?
When one process blocks interrupts, my teachers told me it applied to the entire computer, all processes. Nothing, not even the kernel, can intervene. The keyboard gets blocked, the clock interrupts get blocked.
Interrupts are blocked per process. Very basic example: I might have a daemon I want to ignore Ctrl-C (SIGINT), so I block it. Does not mean any other process should also ignore it.
That's totally news to me. :-o But Ctrl-C is a signal issued by the kernel, not a hardware interrupt that some code has to handle in order to read the hardware keyboard interface. I'm talking of things like INT 01, INT 02, etc. Some hardware puts a high voltage on a line, and the CPU halts completely and jumps to a predefined address. Hardware.
It is impossible to block interrupts for minutes
It is entirely possible to block interrupts for minutes.
And then the entire kernel goes kaput. - -- Cheers Carlos E. R. (from openSUSE 15.1 (Legolas)) -----BEGIN PGP SIGNATURE----- iHoEARECADoWIQQZEb51mJKK1KpcU/W1MxgcbY1H1QUCXnYMzRwccm9iaW4ubGlz dGFzQHRlbGVmb25pY2EubmV0AAoJELUzGBxtjUfVbaYAn2iQp9wO5XSNQZ9t331M 7nx/ShOtAKCPZmc67pJOTM3PoVLm+PXURFfoyg== =zHJ5 -----END PGP SIGNATURE-----
Carlos E. R. wrote:
El 2020-03-21 a las 09:41 +0100, Per Jessen escribió:
Interrupts are blocked per process. Very basic example: I might have a daemon I want to ignore Ctrl-C (SIGINT), so I block it. Does not mean any other process should also ignore it.
That's totally news to me. :-o
But Ctrl-C is a signal issued by the kernel, not a hardware interrupt that some code has to handle in order to read the hardware keyboard interface. I'm talking of things like INT 01, INT 02, etc. Some hardware puts a high voltage on a line, and the CPU halts completely and jumps to a predefined address. Hardware.
Which has no bearing on the problem you have described. You said "I tried to kill 'mc' with killall -9. It still refused. " - that means SIGKILL has been disabled (only possible in kernel mode).
It is impossible to block interrupts for minutes
It is entirely possible to block interrupts for minutes.
And then the entire kernel goes kaput.
Uh, no. It just waits - as you have found out. -- Per Jessen, Zürich (10.9°C) http://www.cloudsuisse.com/ - your owncloud, hosted in Switzerland.
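A process that ignores `killall -9` is usually in uninterruptible sleep, shown as state `D` by `ps` and in `/proc/<pid>/stat`. While a task is in that state the kernel leaves signals, SIGKILL included, pending until the task wakes up. The thread does not confirm that 'mc' was in state `D`, so that part is an assumption. The sketch below, with the hypothetical helper names `parse_state` and `is_uninterruptible`, shows one way to check it on Linux.

```python
def parse_state(stat_line):
    """Return the one-letter state field from a /proc/<pid>/stat line."""
    # The command name sits in parentheses and may itself contain spaces
    # or ')', so everything up to the last ')' is skipped.
    after_comm = stat_line.rpartition(")")[2]
    return after_comm.split()[0]


def is_uninterruptible(pid):
    """True if the process is currently in uninterruptible sleep (state D)."""
    with open(f"/proc/{pid}/stat") as f:
        return parse_state(f.read()) == "D"


if __name__ == "__main__":
    import os

    # The running interpreter itself is normally in state R, not D.
    print("state D:", is_uninterruptible(os.getpid()))
```

The state letter only says that the task is waiting inside the kernel. It does not say what the task is waiting for.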
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 El 2020-03-21 a las 14:03 +0100, Per Jessen escribió:
Carlos E. R. wrote:
El 2020-03-21 a las 09:41 +0100, Per Jessen escribió:
Interrupts are blocked per process. Very basic example: I might have a daemon I want to ignore Ctrl-C (SIGINT), so I block it. Does not mean any other process should also ignore it.
That's totally news to me. :-o
But Ctrl-C is a signal issued by the kernel, not a hardware interrupt that some code has to handle in order to read the hardware keyboard interface. I'm talking of things like INT 01, INT 02, etc. Some hardware puts a high voltage on a line, and the CPU halts completely and jumps to a predefined address. Hardware.
Which has no bearing on the problem you have described. You said "I tried to kill 'mc' with killall -9. It still refused. " - that means SIGKILL has been disabled (only possible in kernel mode).
It is impossible to block interrupts for minutes
It is entirely possible to block interrupts for minutes.
And then the entire kernel goes kaput.
Uh, no. It just waits - as you have found out.
Not on hardware interrupts, which is what I was thinking about. - -- Cheers Carlos E. R. (from openSUSE 15.1 (Legolas)) -----BEGIN PGP SIGNATURE----- iHoEARECADoWIQQZEb51mJKK1KpcU/W1MxgcbY1H1QUCXnYUFhwccm9iaW4ubGlz dGFzQHRlbGVmb25pY2EubmV0AAoJELUzGBxtjUfV29wAoJg0jCBGrUlSpfX+C4On ol7K5yk6AJ9AjMtupdxO1lEzCKNNuV1X3lSfJA== =2kdp -----END PGP SIGNATURE-----
Dave Howorth wrote:
On Fri, 20 Mar 2020 19:58:44 +0100 Per Jessen <per@computer.org> wrote:
Sorry, I still do not understand why a user process such as mc can not be destroyed on order. No excuses.
A user process can enter kernel mode - this one did, and then disabled interrupts. I.e. it has to complete.
Disabled interrupts? But all the processes were working, only this one was stuck. My training said that when interrupts were disabled, no one got access to them.
I don't understand a word of that ?
You have one process which disabled interrupts whilst in some bit of kernel code, maybe a driver, who knows. Disabling interrupts just means a bit of code that must complete without any asynchronous calls happening. Most probably to guarantee data integrity. It's perfectly normal.
Right, but kernel code that suspends interrupts is not supposed to persist indefinitely and should have been QAd by kernel devs, no?
No and maybe, in that order :-) It _is_ supposed to suspend indefinitely, but usually not for very long. (in the order of microseconds probably). Yes, it probably has been QAed and shown to work fine.
Plus as Carlos says, since when has a network connection disappearing been unexpected and have any effect on data integrity?
A network filesystem mount ? I have a number of systems running with root on NFS, root is always mounted with "hard,intr". That means "wait forever" in the case of loss of the connection. -- Per Jessen, Zürich (10.6°C) http://www.dns24.ch/ - free dynamic DNS, made in Switzerland.
On Sat, 21 Mar 2020 09:28:41 +0100 Per Jessen <per@computer.org> wrote:
Dave Howorth wrote:
On Fri, 20 Mar 2020 19:58:44 +0100 Per Jessen <per@computer.org> wrote:
> Sorry, I still do not understand why a user process such as mc > can not be destroyed on order. No excuses.
A user process can enter kernel mode - this one did, and then disabled interrupts. I.e. it has to complete.
Disabled interrupts? But all the processes were working, only this one was stuck. My training said that when interrupts were disabled, no one got access to them.
I don't understand a word of that ?
You have one process which disabled interrupts whilst in some bit of kernel code, maybe a driver, who knows. Disabling interrupts just means a bit of code that must complete without any asynchronous calls happening. Most probably to guarantee data integrity. It's perfectly normal.
Right, but kernel code that suspends interrupts is not supposed to persist indefinitely and should have been QAd by kernel devs, no?
No and maybe, in that order :-)
It _is_ supposed to suspend indefinitely, but usually not for very long. (in the order of microseconds probably). Yes, it probably has been QAed and shown to work fine.
Ah, I think I understand. When the term 'interrupt' is used, Carlos and I think of a hardware capability. I gather you're thinking of an emulated software capability.
Plus as Carlos says, since when has a network connection disappearing been unexpected and have any effect on data integrity?
A network filesystem mount ?
I have a number of systems running with root on NFS, root is always mounted with "hard,intr". That means "wait forever" in the case of loss of the connection.
But in that case the mount is not done by a user program (mc in Carlos' case) via FUSE
Dave Howorth wrote:
On Sat, 21 Mar 2020 09:28:41 +0100 Per Jessen <per@computer.org> wrote:
You have one process which disabled interrupts whilst in some bit of kernel code, maybe a driver, who knows. Disabling interrupts just means a bit of code that must complete without any asynchronous calls happening. Most probably to guarantee data integrity. It's perfectly normal.
Right, but kernel code that suspends interrupts is not supposed to persist indefinitely and should have been QAd by kernel devs, no?
No and maybe, in that order :-)
It _is_ supposed to suspend indefinitely, but usually not for very long. (in the order of microseconds probably). Yes, it probably has been QAed and shown to work fine.
Ah, I think I understand. When the term 'interrupt' is used, Carlos and I think of a hardware capability. I gather you're thinking of an emulated software capability.
Yes, I'm looking at it as being "sat" inside a process. Hardware interrupts are usually not serviced by a process (kernel or user), but by an interrupt handler which then queues whatever it is (for processing). (I'm not sure how HPET interrupts are handled though). Carlos' 'midnight commander' is just a process, accessing the fuse filesystem that is mounted with sshfs. As it has disabled SIGKILL, it must be in kernel mode. I think disabling SIGKILL can only be interpreted to mean "this _must_ complete, to avoid corrupting data".
Plus as Carlos says, since when has a network connection disappearing been unexpected and have any effect on data integrity?
A network filesystem mount ?
I have a number of systems running with root on NFS, root is always mounted with "hard,intr". That means "wait forever" in the case of loss of the connection.
But in that case the mount is not done by a user program (mc in Carlos' case) via FUSE
A FUSE driver also has to use kernel services. Going back to the very first post, I think the situation could have been remedied by resuming the machine at 192.168.1.134. Now Carlos' 'mc' would have been able to complete the "must complete" code and exit cleanly. -- Per Jessen, Zürich (10.1°C) http://www.hostsuisse.com/ - virtual servers, made in Switzerland.
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 El 2020-03-21 a las 12:37 +0100, Per Jessen escribió:
Dave Howorth wrote:
...
Ah, I think I understand. When the term 'interrupt' is used, Carlos and I think of a hardware capability. I gather you're thinking of an emulated software capability.
Yes, I'm looking at it as being "sat" inside a process. Hardware interrupts are usually not serviced by a process (kernel or user), but by an interrupt handler which then queues whatever it is (for processing). (I'm not sure how HPET interrupts are handled though).
Carlos' 'midnight commander' is just a process, accessing the fuse filesystem that is mounted with sshfs. As it has disabled SIGKILL, it must be in kernel mode. I think disabling SIGKILL can only be interpreted to mean "this _must_ complete, to avoid corrupting data".
If the connection dies, it dies. So end whatever it is doing, no matter what; there is no recovering. And in fact, it was doing nothing; that terminal had not been used in hours.
Plus as Carlos says, since when has a network connection disappearing been unexpected and have any effect on data integrity?
A network filesystem mount ?
I have a number of systems running with root on NFS, root is always mounted with "hard,intr". That means "wait forever" in the case of loss of the connection.
But in that case the mount is not done by a user program (mc in Carlos' case) via FUSE
A FUSE driver also has to use kernel services.
Going back to the very first post, I think the situation could have been remedied by resuming the machine at 192.168.1.134. Now Carlos' 'mc' would have been able to complete the "must complete" code and exit cleanly.
That's a terrible solution. In this case, I might have done it. What if the other machine is remote? But what if it is a laptop with a dying battery? If it refuses to hibernate the battery goes and all data in all processes is lost, which is much worse than a single mc process not exiting cleanly. I see no excuses for not hibernating no matter what. Poweroff succeeded fast, it found no excuses to not power off. But of course, all possible data in everything is lost. - -- Cheers Carlos E. R. (from openSUSE 15.1 (Legolas)) -----BEGIN PGP SIGNATURE----- iHoEARECADoWIQQZEb51mJKK1KpcU/W1MxgcbY1H1QUCXnYP4Rwccm9iaW4ubGlz dGFzQHRlbGVmb25pY2EubmV0AAoJELUzGBxtjUfVbAMAn3a6Aae6nPcgeIjUPw67 xplkc1iAAKCDhRGMt75alD6BOL+ClCZsERQj6g== =8os2 -----END PGP SIGNATURE-----
Carlos E. R. wrote:
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
El 2020-03-21 a las 12:37 +0100, Per Jessen escribió:
Dave Howorth wrote:
...
Ah, I think I understand. When the term 'interrupt' is used, Carlos and I think of a hardware capability. I gather you're thinking of an emulated software capability.
Yes, I'm looking at it as being "sat" inside a process. Hardware interrupts are usually not serviced by a process (kernel or user), but by an interrupt handler which then queues whatever it is (for processing). (I'm not sure how HPET interrupts are handled though).
Carlos' 'midnight commander' is just a process, accessing the fuse filesystem that is mounted with sshfs. As it has disabled SIGKILL, it must be in kernel mode. I think disabling SIGKILL can only be interpreted to mean "this _must_ complete, to avoid corrupting data".
If the connection dies, it dies. So end whatever it is doing, no matter what; there is no recovering.
You're guessing. What you describe works perfectly fine with NFS, for instance.
Going back to the very first post, I think the situation could have been remedied by resuming the machine at 192.168.1.134. Now Carlos' 'mc' would have been able to complete the "must complete" code and exit cleanly.
That's a terrible solution. In this case, I might have done it. What if the other machine is remote?
Unless you have a way of waking it up remotely, don't hibernate it. -- Per Jessen, Zürich (8.0°C) http://www.cloudsuisse.com/ - your owncloud, hosted in Switzerland.
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 El 2020-03-21 a las 15:46 +0100, Per Jessen escribió:
Carlos E. R. wrote:
El 2020-03-21 a las 12:37 +0100, Per Jessen escribió:
Dave Howorth wrote:
...
Ah, I think I understand. When the term 'interrupt' is used, Carlos and I think of a hardware capability. I gather you're thinking of an emulated software capability.
Yes, I'm looking at it as being "sat" inside a process. Hardware interrupts are usually not serviced by a process (kernel or user), but by an interrupt handler which then queues whatever it is (for processing). (I'm not sure how HPET interrupts are handled though).
Carlos' 'midnight commander' is just a process, accessing the fuse filesystem that is mounted with sshfs. As it has disabled SIGKILL, it must be in kernel mode. I think disabling SIGKILL can only be interpreted to mean "this _must_ complete, to avoid corrupting data".
If the connection dies, it dies. So end whatever it is doing, no matter what; there is no recovering.
You're guessing. What you describe works perfectly fine with NFS, for instance.
IF the other machine goes up again. I was not going to restore that machine.
Going back to the very first post, I think the situation could have been remedied by resuming the machine at 192.168.1.134. Now Carlos' 'mc' would have been able to complete the "must complete" code and exit cleanly.
That's a terrible solution. In this case, I might have done it. What if the other machine is remote?
Unless you have a way of waking it up remotely, don't hibernate it.
Power failure, reboot, maintenance... things happen. The other machine is manned, and its owner decides to hibernate it that moment, after hours of doing nothing and idling. Those are excuses, a process has to cope with network failure. - -- Cheers Carlos E. R. (from openSUSE 15.1 (Legolas)) -----BEGIN PGP SIGNATURE----- iHoEARECADoWIQQZEb51mJKK1KpcU/W1MxgcbY1H1QUCXnZoVhwccm9iaW4ubGlz dGFzQHRlbGVmb25pY2EubmV0AAoJELUzGBxtjUfVuGwAn0Wy7+KV0BJh13hO+kuT OOPqqjuxAJ92XVrh3Rd1FaKWnW350glSGM8J8w== =mFL0 -----END PGP SIGNATURE-----
On Sat, 21 Mar 2020 12:37:23 +0100 Per Jessen <per@computer.org> wrote:
Dave Howorth wrote:
On Sat, 21 Mar 2020 09:28:41 +0100 Per Jessen <per@computer.org> wrote:
You have one process which disabled interrupts whilst in some bit of kernel code, maybe a driver, who knows. Disabling interrupts just means a bit of code that must complete without any asynchronous calls happening. Most probably to guarantee data integrity. It's perfectly normal.
Right, but kernel code that suspends interrupts is not supposed to persist indefinitely and should have been QAd by kernel devs, no?
No and maybe, in that order :-)
It _is_ supposed to suspend indefinitely, but usually not for very long. (in the order of microseconds probably). Yes, it probably has been QAed and shown to work fine.
Ah, I think I understand. When the term 'interrupt' is used, Carlos and I think of a hardware capability. I gather you're thinking of an emulated software capability.
Yes, I'm looking at it as being "sat" inside a process. Hardware interrupts are usually not serviced by a process (kernel or user), but by an interrupt handler which then queues whatever it is (for processing). (I'm not sure how HPET interrupts are handled though).
Carlos' 'midnight commander' is just a process, accessing the fuse filesystem that is mounted with sshfs. As it has disabled SIGKILL, it must be in kernel mode. I think disabling SIGKILL can only be interpreted to mean "this _must_ complete, to avoid corrupting data".
OK, I think the difficulty we've had is that you've been using the word 'interrupt' when you should have been using the word 'signal'. That's the correct word according to https://www.gnu.org/software/libc/manual/html_node/Termination-Signals.html where it also notes: "In fact, if SIGKILL fails to terminate a process, that by itself constitutes an operating system bug which you should report." So I think Carlos should open a bugzilla.
Plus as Carlos says, since when has a network connection disappearing been unexpected and have any effect on data integrity?
A network filesystem mount ?
I have a number of systems running with root on NFS, root is always mounted with "hard,intr". That means "wait forever" in the case of loss of the connection.
But in that case the mount is not done by a user program (mc in Carlos' case) via FUSE
A FUSE driver also has to use kernel services.
Going back to the very first post, I think the situation could have been remedied by resuming the machine at 192.168.1.134. Now Carlos' 'mc' would have been able to complete the "must complete" code and exit cleanly.
Yes, but that's the wrong answer. It might have been the remote system broke or was destroyed, for example, so it cannot be restored. And it's not what Carlos wants anyway. He wants his system to hibernate. And specifically he wants to be able to kill the mc process. Maybe he's assessed any data integrity issues and decided he doesn't care, or at least that it's the least worst option.
Dave Howorth wrote:
On Sat, 21 Mar 2020 12:37:23 +0100 Per Jessen <per@computer.org> wrote:
Dave Howorth wrote:
On Sat, 21 Mar 2020 09:28:41 +0100 Per Jessen <per@computer.org> wrote:
You have one process which disabled interrupts whilst in some bit of kernel code, maybe a driver, who knows. Disabling interrupts just means a bit of code that must complete without any asynchronous calls happening. Most probably to guarantee data integrity. It's perfectly normal.
Right, but kernel code that suspends interrupts is not supposed to persist indefinitely and should have been QAd by kernel devs, no?
No and maybe, in that order :-)
It _is_ supposed to suspend indefinitely, but usually not for very long. (in the order of microseconds probably). Yes, it probably has been QAed and shown to work fine.
Ah, I think I understand. When the term 'interrupt' is used, Carlos and I think of a hardware capability. I gather you're thinking of an emulated software capability.
Yes, I'm looking at it as being "sat" inside a process. Hardware interrupts are usually not serviced by a process (kernel or user), but by an interrupt handler which then queues whatever it is (for processing). (I'm not sure how HPET interrupts are handled though).
Carlos' 'midnight commander' is just a process, accessing the fuse filesystem that is mounted with sshfs. As it has disabled SIGKILL, it must be in kernel mode. I think disabling SIGKILL can only be interpreted to mean "this _must_ complete, to avoid corrupting data".
OK, I think the difficulty we've had is that you've been using the word 'interrupt' when you should have been using the word 'signal'.
I guess I tend to think of signals causing interrupts, i.e. asynchronous execution of code. Signals can be blocked. Yes, I use the two words interchangeably, mea culpa. -- Per Jessen, Zürich (7.8°C) http://www.dns24.ch/ - free dynamic DNS, made in Switzerland.
Per Jessen wrote:
Dave Howorth wrote:
OK, I think the difficulty we've had is that you've been using the word 'interrupt' when you should have been using the word 'signal'.
I guess I tend to think of signals causing interrupts, i.e. asynchronous execution of code.
Wrong way around. An interrupt, e.g. an IO operation completing, causes a signal to be delivered to the process. The process can be set up to ignore or to handle the signal.
Signals can be blocked. Yes, I use the two words interchangeably, mea culpa.
I like this definition, if anyone is interested - https://techterms.com/definition/interrupt "An interrupt is a signal sent to the processor that interrupts the current process. It may be generated by a hardware device or a software program." Anyway, we're bordering on off-topic. -- Per Jessen, Zürich (7.5°C) http://www.hostsuisse.com/ - virtual servers, made in Switzerland.
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Content-ID: <alpine.LSU.2.21.2003212026470.10293@Legolas.valinor> El 2020-03-21 a las 14:41 -0000, Dave Howorth escribió:
On Sat, 21 Mar 2020 12:37:23 +0100 Per Jessen <per@computer.org> wrote:
Dave Howorth wrote:
On Sat, 21 Mar 2020 09:28:41 +0100 Per Jessen <per@computer.org> wrote:
...
Ah, I think I understand. When the term 'interrupt' is used, Carlos and I think of a hardware capability. I gather you're thinking of an emulated software capability.
Yes, I'm looking at it as being "sat" inside a process. Hardware interrupts are usually not serviced by a process (kernel or user), but by an interrupt handler which then queues whatever it is (for processing). (I'm not sure how HPET interrupts are handled though).
Carlos' 'midnight commander' is just a process, accessing the fuse filesystem that is mounted with sshfs. As it has disabled SIGKILL, it must be in kernel mode. I think disabling SIGKILL can only be interpreted to mean "this _must_ complete, to avoid corrupting data".
OK, I think the difficulty we've had is that you've been using the word 'interrupt' when you should have been using the word 'signal'.
Yes, when I think of an interrupt I think of the pin in the CPU with that name. With variations: a single one, or one normal and another that can not be masked, or numbered interrupts by writing a number in some bus, specific or not, at the same time as or after raising the IRQ line. Not of the strange concept that Microsoft used in MS-DOS with numbered software interrupts, with support from the CPU. Could have been called predefined subroutine table or something. It confuses the hell out of me, sorry.
That's the correct word according to https://www.gnu.org/software/libc/manual/html_node/Termination-Signals.html where it also notes:
"In fact, if SIGKILL fails to terminate a process, that by itself constitutes an operating system bug which you should report."
So I think Carlos should open a bugzilla.
Ok, will do, thanks, if the log survived. The machine is being migrated, so the log may be in the new or the old machine, dunno. I can't access it now.
Plus as Carlos says, since when has a network connection disappearing been unexpected and have any effect on data integrity?
A network filesystem mount ?
I have a number of systems running with root on NFS, root is always mounted with "hard,intr". That means "wait forever" in the case of loss of the connection.
But in that case the mount is not done by a user program (mc in Carlos' case) via FUSE
A FUSE driver also has to use kernel services.
Going back to the very first post, I think the situation could have been remedied by resuming the machine at 192.168.1.134. Now Carlos' 'mc' would have been able to complete the "must complete" code and exit cleanly.
Yes, but that's the wrong answer. It might have been the remote system broke or was destroyed, for example, so it cannot be restored. And it's not what Carlos wants anyway. He wants his system to hibernate. And specifically he wants to be able to kill the mc process. Maybe he's assessed any data integrity issues and decided he doesn't care, or at least that it's the least worst option.
There is no filesystem data integrity issue, from my point of view. The terminal where mc was "running" had not been used for hours. There was no activity.

The use case is simply that I was going to bed, I was sleepy already, and not in the mood to fight a computer that refused to hibernate three times in a row while I got cold in my pyjamas. So I issued the command on both machines, as nearly simultaneously as typing it on both allows. The new machine is faster and I typed there first, anyway, so it went down fast.

Meaning, at that time I'm not thinking about what network connections I may have open. In fact, it is very possible there are ssh sessions in either direction. I never care about them; unless I want the history to be saved, I just hibernate. The next day the sessions are duly dead.

Consider a laptop and closing the lid. Would it be acceptable for it not to go to sleep immediately, and run the battery out? The kernel has to suspend the machine no matter what, no excuses accepted. What if the laptop goes into the backpack and then catches fire? I'm not imagining things, it has happened, albeit with Windows in the cases I heard.

It is not acceptable that a machine does not hibernate on order.

- -- Cheers Carlos E. R. (from openSUSE 15.1 (Legolas)) -----BEGIN PGP SIGNATURE----- iHoEARECADoWIQQZEb51mJKK1KpcU/W1MxgcbY1H1QUCXnZtHhwccm9iaW4ubGlz dGFzQHRlbGVmb25pY2EubmV0AAoJELUzGBxtjUfVjMoAnjkAAzTgtHEIQq6KJE+X kGGcf59GAKCP+vNBJi3eGLGMrT6dwzv+JzoFuQ== =JZyd -----END PGP SIGNATURE-----
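Since nobody remembers which network mounts are open at bedtime, one option is to list them just before hibernating. The sketch below is only an illustration, assuming Linux, where sshfs mounts appear in `/proc/mounts` with the type `fuse.sshfs`. The helper name `fuse_mounts` and the expanded home path in `SAMPLE` are hypothetical, since the thread only gives `~/fusermount/`.

```python
# One hypothetical /proc/mounts line for the sshfs mount from the first post.
SAMPLE = (
    "/dev/sda2 / ext4 rw,relatime 0 0\n"
    "cer@192.168.1.134:/ /home/cer/fusermount fuse.sshfs "
    "rw,nosuid,nodev,relatime,user_id=1000,group_id=100 0 0\n"
    "fusectl /sys/fs/fuse/connections fusectl rw,relatime 0 0\n"
)


def fuse_mounts(mounts_text):
    """Return (source, mount_point, fstype) for every FUSE mount listed."""
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        source, mount_point, fstype = fields[:3]
        # Plain "fuse" or a subtype such as "fuse.sshfs"; "fusectl" is the
        # control filesystem, not a user mount, so it is left out.
        if fstype == "fuse" or fstype.startswith("fuse."):
            found.append((source, mount_point, fstype))
    return found


if __name__ == "__main__":
    with open("/proc/mounts") as f:
        for source, mount_point, fstype in fuse_mounts(f.read()):
            print(f"{fstype}: {source} on {mount_point}")
```

A mount found this way could be unmounted before running `systemctl hibernate`. Whether that would have avoided the failed freeze in this case is not established by the thread.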
On Sat, 21 Mar 2020 20:38:05 +0100 (CET) "Carlos E. R." <robin.listas@telefonica.net> wrote:
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
Content-ID: <alpine.LSU.2.21.2003212026470.10293@Legolas.valinor>
El 2020-03-21 a las 14:41 -0000, Dave Howorth escribió:
On Sat, 21 Mar 2020 12:37:23 +0100 Per Jessen <per@computer.org> wrote:
Dave Howorth wrote:
On Sat, 21 Mar 2020 09:28:41 +0100 Per Jessen <per@computer.org> wrote:
...
Ah, I think I understand. When the term 'interrupt' is used, Carlos and I think of a hardware capability. I gather you're thinking of an emulated software capability.
Yes, I'm looking at it as being "sat" inside a process. Hardware interrupts are usually not serviced by a process (kernel or user), but by an interrupt handler which then queues whatever it is (for processing). (I'm not sure how HPET interrupts are handled though).
Carlos' 'midnight commander' is just a process, accessing the fuse filesystem that is mounted with sshfs. As it has disabled SIGKILL, it must be in kernel mode. I think disabling SIGKILL can only be interpreted to mean "this _must_ complete, to avoid corrupting data".
OK, I think the difficulty we've had is that you've been using the word 'interrupt' when you should have been using the word 'signal'.
Yes, when I think of an interrupt I think of the pin in the CPU with that name. With variations: a single one, or one normal and another that can not be masked, or numbered interrupts by writing a number in some bus, specific or not, at the same time as or after raising the IRQ line. Not of the strange concept that Microsoft used in MS-DOS with numbered software interrupts, with support from the CPU. Could have been called predefined subroutine table or something. It confuses the hell out of me, sorry.
That's the correct word according to https://www.gnu.org/software/libc/manual/html_node/Termination-Signals.html where it also notes:
"In fact, if SIGKILL fails to terminate a process, that by itself constitutes an operating system bug which you should report."
So I think Carlos should open a bugzilla.
Ok, will do, thanks, if the log survived. The machine is being migrated, so the log may be in the new or the old machine, dunno. I can't access it now.
Plus as Carlos says, since when has a network connection disappearing been unexpected and have any effect on data integrity?
A network filesystem mount ?
I have a number of systems running with root on NFS, root is always mounted with "hard,intr". That means "wait forever" in the case of loss of the connection.
But in that case the mount is not done by a user program (mc in Carlos' case) via FUSE
A FUSE driver also has to use kernel services.
Going back to the very first post, I think the situation could have been remedied by resuming the machine at 192.168.1.134. Now Carlos' 'mc' would have been able to complete the "must complete" code and exit cleanly.
Yes, but that's the wrong answer. It might have been the remote system broke or was destroyed, for example, so it cannot be restored. And it's not what Carlos wants anyway. He wants his system to hibernate. And specifically he wants to be able to kill the mc process. Maybe he's assessed any data integrity issues and decided he doesn't care, or at least that it's the least worst option.
There is no filesystem data integrity issue, from my point of view. The terminal where mc was "running" had not been used for hours. There was no activity.
The use case is simple: I was going to bed, I was already sleepy, and I was not in the mood to fight a computer that refused to hibernate three times in a row while I got cold in my pyjamas. So I issued the command on both machines at nearly the same time, as fast as I could type it on both. The new machine is faster and I typed there first anyway, so it went down fast.
Meaning, at that moment I'm not going to try to remember what network connections I may have open. In fact, it is very possible there are ssh sessions in either direction. I never care about them; unless I want the history to be saved, I just hibernate. The next day the sessions are duly dead.
Consider a laptop and closing the lid. Would it be acceptable for it not to go to sleep immediately, and to run the battery out? The kernel has to suspend the machine no matter what, no excuses accepted. What if the laptop goes into the backpack and then catches fire? I'm not imagining things, it has happened, albeit with Windows in the cases I heard of.
It is not acceptable for a machine to refuse to hibernate when ordered to.
Exactly so.
- -- Cheers Carlos E. R.
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 2020-03-21 at 11:15 -0000, Dave Howorth wrote:
On Sat, 21 Mar 2020 09:28:41 +0100 Per Jessen <> wrote:
Dave Howorth wrote:
On Fri, 20 Mar 2020 19:58:44 +0100 Per Jessen <per@computer.org> wrote:
I don't understand a word of that ?
You have one process which disabled interrupts whilst in some bit of kernel code, maybe a driver, who knows. Disabling interrupts just means a bit of code that must complete without any asynchronous calls happening. Most probably to guarantee data integrity. It's perfectly normal.
Right, but kernel code that suspends interrupts is not supposed to persist indefinitely and should have been QAd by kernel devs, no?
No and maybe, in that order :-)
It _is_ supposed to suspend indefinitely, but usually not for very long (on the order of microseconds, probably). Yes, it probably has been QAed and shown to work fine.
Ah, I think I understand. When the term 'interrupt' is used, Carlos and I think of a hardware capability. I gather you're thinking of an emulated software capability.
Indeed I think of hardware. I'm basically a hardware guy, my training is in electronics.
Plus as Carlos says, since when has a network connection disappearing been unexpected and have any effect on data integrity?
A network filesystem mount ?
I have a number of systems running with root on NFS, root is always mounted with "hard,intr". That means "wait forever" in the case of loss of the connection.
But in that case the mount is not done by a user program (mc in Carlos' case) via FUSE
The "mount" was done outside of 'mc' using sshfs, because mc's internal method has been broken for years. Still, it is FUSE and userland. Had I thought of it, I might have killed the sshfs process instead.

- -- Cheers Carlos E. R. (from openSUSE 15.1 (Legolas))

-----BEGIN PGP SIGNATURE-----

iHoEARECADoWIQQZEb51mJKK1KpcU/W1MxgcbY1H1QUCXnYOLRwccm9iaW4ubGlz
dGFzQHRlbGVmb25pY2EubmV0AAoJELUzGBxtjUfVaE0An0mUujwW1kdTezWLlRGA
vC0F+qjuAJ98kOFZ7KUI/p/5VskcR/9mI2cuxQ==
=pzcp
-----END PGP SIGNATURE-----
participants (4)
- Andrei Borzenkov
- Carlos E. R.
- Dave Howorth
- Per Jessen