Re: [opensuse-programming] heisenbug ...
Roger Oberholtzer wrote:
On Tue, Aug 9, 2016 at 11:33 AM, Per Jessen <per@computer.org> wrote:
This morning I came across a daemon that was apparently hanging on select() - when I attached an strace:
# strace -p 836
Process 836 attached - interrupt to quit
select(6, [4 5], [], NULL, NULL) = -1 EBADF (Bad file descriptor)
time(NULL) = 1470735061
stat64("/etc/localtime", {st_mode=S_IFREG|0644, st_size=1920, ...}) = 0
send(3, "<20>Aug 9 11:31:01 bwbemag[836]"..., 79, MSG_NOSIGNAL) = -1 ECONNREFUSED (Connection refused)
close(3) = 0
socket(PF_FILE, SOCK_DGRAM, 0) = 3
fcntl64(3, F_SETFD, FD_CLOEXEC) = 0
connect(3, {sa_family=AF_FILE, path="/dev/log"}, 110) = 0
send(3, "<20>Aug 9 11:31:01 bwbemag[836]"..., 79, MSG_NOSIGNAL) = 79
close(5) = -1 EBADF (Bad file descriptor)
close(5) = -1 EBADF (Bad file descriptor)
futex(0x804e3a4, FUTEX_CMP_REQUEUE_PRIVATE, 1, 2147483647, 0x804e36c, 0) = 1
futex(0xb7e06bd8, FUTEX_WAIT, 837, NULL) = 0
time(NULL) = 1470735061
stat64("/etc/localtime", {st_mode=S_IFREG|0644, st_size=1920, ...}) = 0
send(3, "<21>Aug 9 11:31:01 bwbemag[836]"..., 53, MSG_NOSIGNAL) = 53
exit_group(0) = ?
Process 836 detached
Does anyone have a suggestion as to why this was hanging instead of just having exited long ago? I think it's been hanging on that select() for days or even months.
It is not so much hanging as blocking: the select() call never reports any activity on the monitored descriptors.
Which would be fine except it suddenly returns "Bad file descriptor" when I attach strace.
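For reference, select() fails with EBADF whenever one of the descriptors in its sets is not open, and the trace also shows close(5) failing the same way. One plausible reading, not confirmed anywhere in this thread: a descriptor was closed by another thread while select() was already blocked (the futex calls suggest the daemon is multi-threaded). On Linux that does not wake the sleeping call, and attaching strace interrupts and restarts it, at which point the stale descriptor is noticed. The minimal Python sketch below only demonstrates the first part, EBADF on an already-closed descriptor; the helper names `select_errno` and `demo_closed_fd` are made up for illustration.

```python
import errno
import os
import select


def select_errno(read_fds, timeout=0.0):
    """Return None if select() succeeds, else the errno it failed with."""
    try:
        select.select(read_fds, [], [], timeout)
    except OSError as exc:
        return exc.errno
    return None


def demo_closed_fd():
    """Poll two pipe read ends, close one, then poll the same set again."""
    r1, w1 = os.pipe()
    r2, w2 = os.pipe()
    try:
        before = select_errno([r1, r2])  # both descriptors are open
        os.close(r2)                     # r2 is now a stale number
        after = select_errno([r1, r2])   # select() rejects the whole set
    finally:
        for fd in (r1, w1, w2):
            os.close(fd)
    return before, after


if __name__ == "__main__":
    before, after = demo_closed_fd()
    print("before close:", before)
    print("after close:", errno.errorcode.get(after, after))
```

This does not reproduce the long block followed by a late EBADF; that would need a second thread closing the descriptor while the first one sleeps in select().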
Perhaps file descriptor 4 or 5 is closed? You can check what they are in /proc/836/fd. Are both 4 and 5 still open?
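The /proc check suggested above can also be scripted. This is a rough sketch that assumes a Linux /proc filesystem and permission to read the target's fd directory (same user or root); `open_fds` and `missing_fds` are hypothetical helper names, not part of any existing tool.

```python
import os


def open_fds(pid):
    """Map each open descriptor number of `pid` to its /proc symlink target."""
    fd_dir = f"/proc/{pid}/fd"
    result = {}
    for name in os.listdir(fd_dir):
        try:
            result[int(name)] = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor went away between listdir() and readlink().
            continue
    return result


def missing_fds(pid, wanted):
    """Return the descriptors from `wanted` that `pid` does not have open."""
    fds = open_fds(pid)
    return [fd for fd in wanted if fd not in fds]


if __name__ == "__main__":
    # Inspect the current process; any pid could be passed instead.
    for fd, target in sorted(open_fds(os.getpid()).items()):
        print(fd, "->", target)
```

For the daemon in question the call would have been `missing_fds(836, [4, 5])`, run while the process was still alive.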
The process has exited :-(