[Bug 328848] New: nscd has too many open files after network outage
https://bugzilla.novell.com/show_bug.cgi?id=328848

           Summary: nscd has too many open files after network outage
           Product: openSUSE 10.3
           Version: RC 1
          Platform: Other
        OS/Version: Other
            Status: NEW
          Severity: Normal
          Priority: P5 - None
         Component: Basesystem
        AssignedTo: pbaudis@novell.com
        ReportedBy: mmarek@novell.com
         QAContact: qa@suse.de
                CC: schwab@novell.com
          Found By: ---

Hi,

nscd loops forever on my machine after a network outage and doesn't accept new connections:

# strace -p 12078
Process 12078 attached - interrupt to quit
accept(11, 0, NULL) = -1 EMFILE (Too many open files)
epoll_wait(12, {{EPOLLRDNORM, {u32=11, u64=11}}}, 100, 29988) = 1
accept(11, 0, NULL) = -1 EMFILE (Too many open files)
epoll_wait(12, {{EPOLLRDNORM, {u32=11, u64=11}}}, 100, 29988) = 1
accept(11, 0, NULL) = -1 EMFILE (Too many open files)
epoll_wait(12, {{EPOLLRDNORM, {u32=11, u64=11}}}, 100, 29988) = 1
accept(11, 0, NULL) = -1 EMFILE (Too many open files)
epoll_wait(12, {{EPOLLRDNORM, {u32=11, u64=11}}}, 100, 29988) = 1
accept(11, 0, NULL) = -1 EMFILE (Too many open files)
..

Also I'm unable to get a gdb backtrace of nscd :-(

gdb /usr/sbin/nscd 12078
GNU gdb 6.6.50.20070726-cvs
Copyright (C) 2007 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "x86_64-suse-linux"...
Using host libthread_db library "/lib64/libthread_db.so.1".
Attaching to program: /usr/sbin/nscd, process 12078
linux-nat.c:981: internal-error: linux_nat_attach: Assertion `pid == GET_PID (inferior_ptid) && WIFSTOPPED (status) && WSTOPSIG (status) == SIGSTOP' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Quit this debugging session? (y or n)

I'll keep the nscd process running for now in case you need more info.

--
Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
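The strace output in the report shows a listening socket that epoll keeps reporting as readable while every accept() fails with EMFILE. The following is a minimal sketch of that pattern, not nscd's code: it assumes Linux, lowers the soft RLIMIT_NOFILE of the current process, uses up the remaining descriptors, and then polls a listener that has one pending connection. The function name `demo_emfile_spin` and the limit of 64 are made up for illustration.

```python
import errno
import os
import resource
import select
import socket


def demo_emfile_spin(rounds=3, low_limit=64):
    """Return a list of (listener_ready, accept_errno), one entry per round."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    poller = select.epoll()
    hogs = []
    seen = []
    try:
        listener.bind(("127.0.0.1", 0))
        listener.listen(8)
        listener.setblocking(False)
        poller.register(listener.fileno(), select.EPOLLIN)
        # The kernel completes the handshake; the connection waits in the backlog.
        client.connect(listener.getsockname())

        # Lower the soft limit (illustrative value), then use up every free slot.
        if soft == resource.RLIM_INFINITY or soft > low_limit:
            resource.setrlimit(resource.RLIMIT_NOFILE, (low_limit, hard))
        while True:
            try:
                hogs.append(os.open(os.devnull, os.O_RDONLY))
            except OSError as exc:
                if exc.errno != errno.EMFILE:
                    raise
                break

        for _ in range(rounds):
            # Level-triggered epoll reports the listener for as long as the
            # pending connection has not been accepted.
            ready = bool(poller.poll(1.0))
            try:
                conn, _addr = listener.accept()
                conn.close()
                seen.append((ready, None))
            except OSError as exc:
                # accept() fails before the connection is taken off the queue.
                seen.append((ready, exc.errno))
    finally:
        for fd in hogs:
            os.close(fd)
        resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard))
        poller.close()
        client.close()
        listener.close()
    return seen


if __name__ == "__main__":
    print(demo_emfile_spin())
```

Because the failed accept() leaves the connection queued, the poller returns immediately on every round, which matches the alternating accept/epoll_wait lines in the trace. Whether nscd's loop has any back-off for this case is not something the report establishes.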
https://bugzilla.novell.com/show_bug.cgi?id=328848#c2
Petr Baudis
https://bugzilla.novell.com/show_bug.cgi?id=328848#c3
--- Comment #3 from Michal Marek
https://bugzilla.novell.com/show_bug.cgi?id=328848#c4
Petr Baudis
https://bugzilla.novell.com/show_bug.cgi?id=328848#c5
--- Comment #5 from Michal Marek
https://bugzilla.novell.com/show_bug.cgi?id=328848#c6
Ralf Haferkamp
> Ralf, could this be the same fd leak we've hit in nss_ldap before? Was that
> supposed to be fixed in 10.3?

You are talking about http://bugzilla.padl.com/show_bug.cgi?id=304? Yes, that should be fixed in 10.3.
@mmarek: Any hints how the problem can be reproduced? You mentioned that it all started after a network outage, what kind of outage was that?
https://bugzilla.novell.com/show_bug.cgi?id=328848#c7
Michal Marek
> @mmarek: Any hints how the problem can be reproduced? You mentioned that it
> all started after a network outage, what kind of outage was that?

One of our switches broke and had to be restarted. IIRC the link was physically up, but the packets didn't go in either direction. I don't remember if nscd ran amok already during the outage or only after the reboot of the switch.
https://bugzilla.novell.com/show_bug.cgi?id=328848#c8
Ralf Haferkamp
https://bugzilla.novell.com/show_bug.cgi?id=328848
Michal Marek
https://bugzilla.novell.com/show_bug.cgi?id=328848#c11
--- Comment #11 from Ralf Haferkamp
https://bugzilla.novell.com/show_bug.cgi?id=328848#c12
Petr Baudis
> Hmm, but it should be able to process only max-threads requests in parallel,
> and usual ulimit on open files is 1024... Is that really 8 fd per fork? Or
> did you mean that the fd usage growth is not linear?

Seems so. It seems I only had 32 threads running and nscd was hitting the
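The question of how many descriptors the daemon holds relative to its limit can be checked on a live process without attaching a debugger. A minimal sketch, assuming Linux with /proc mounted and sufficient privileges to read another user's process (typically root); `fd_usage` is a hypothetical helper, not part of nscd or glibc:

```python
import os


def fd_usage(pid):
    """Return (open_fd_count, soft_limit) for a Linux process, read from /proc.

    soft_limit is None if the limit is reported as "unlimited" or not found.
    When pid is the calling process, the count includes the descriptor that
    listing the directory itself holds open for a moment.
    """
    open_fds = len(os.listdir(f"/proc/{pid}/fd"))
    soft_limit = None
    with open(f"/proc/{pid}/limits") as limits:
        for line in limits:
            if line.startswith("Max open files"):
                # Columns after the name: soft limit, hard limit, units.
                value = line.split()[3]
                if value.isdigit():
                    soft_limit = int(value)
                break
    return open_fds, soft_limit


if __name__ == "__main__":
    count, limit = fd_usage(os.getpid())
    print(f"{count} descriptors open, soft limit {limit}")
```

Calling it with the PID from the report, for example `fd_usage(12078)`, would show how close the process is to the limit. Sampling it repeatedly alongside the thread count would show whether the growth per thread is linear.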
https://bugzilla.novell.com/show_bug.cgi?id=328848#c13
Ralf Haferkamp
https://bugzilla.novell.com/show_bug.cgi?id=328848#c14
--- Comment #14 from Michal Marek
https://bugzilla.novell.com/show_bug.cgi?id=328848#c15
--- Comment #15 from Michal Marek
https://bugzilla.novell.com/show_bug.cgi?id=328848#c16
--- Comment #16 from Michal Marek
https://bugzilla.novell.com/show_bug.cgi?id=328848#c17
--- Comment #17 from Michal Marek
https://bugzilla.novell.com/show_bug.cgi?id=328848#c18
--- Comment #18 from Ralf Haferkamp
https://bugzilla.novell.com/show_bug.cgi?id=328848#c19
--- Comment #19 from Michal Marek
https://bugzilla.novell.com/show_bug.cgi?id=328848#c20
--- Comment #20 from Ralf Haferkamp
> Attachment 183482 has full output of 'netstat -nap'.
Oh, sorry. Missed that.
https://bugzilla.novell.com/show_bug.cgi?id=328848
Michal Marek