[Bug 1198877] New: Firefox with high CPU usage on network change event(?)
http://bugzilla.opensuse.org/show_bug.cgi?id=1198877

            Bug ID: 1198877
           Summary: Firefox with high CPU usage on network change event(?)
    Classification: openSUSE
           Product: openSUSE Tumbleweed
           Version: Current
          Hardware: x86-64
                OS: Linux
            Status: NEW
          Severity: Normal
          Priority: P5 - None
         Component: Firefox
          Assignee: factory-mozilla@lists.opensuse.org
          Reporter: jengelh@inai.de
        QA Contact: qa-bugs@suse.de
          Found By: ---
           Blocker: ---

After about 2 hours of firefox 99 running (was fine with earlier versions i.e.
98/etc.), the subthread known as "Netlink Monitor" in /proc/xxx/stat goes into
an infinite loop. strace shows:

poll([{fd=22, events=POLLIN}, {fd=24, events=POLLIN}], 2, -1) = 1 ([{fd=24, revents=POLLERR}])
poll([{fd=22, events=POLLIN}, {fd=24, events=POLLIN}], 2, -1) = 1 ([{fd=24, revents=POLLERR}])

over and over again.

lsof:

firefox 30312 jengelh 22r FIFO 0,13 0t0 4037553 pipe
firefox 30312 jengelh 24u netlink 0t0 4041789 ROUTE

gdb:

(gdb) bt
#0  0x00007f5c92e8052f in poll () at /lib64/libc.so.6
#1  0x00007f5c88c8d68b in poll () at /usr/include/bits/poll2.h:39
#2  operator() () at /usr/src/debug/MozillaFirefox-99.0.1-1.1.x86_64/netwerk/system/netlink/NetlinkService.cpp:1205
#3  eintr_retry<mozilla::net::NetlinkService::Run()::<lambda()> > () at /usr/src/debug/MozillaFirefox-99.0.1-1.1.x86_64/netwerk/system/netlink/NetlinkService.cpp:45
#4  mozilla::net::NetlinkService::Run() () at /usr/src/debug/MozillaFirefox-99.0.1-1.1.x86_64/netwerk/system/netlink/NetlinkService.cpp:1203
#5  0x00007f5c8882b4a5 in nsThread::ProcessNextEvent(bool, bool*) () at /usr/src/debug/MozillaFirefox-99.0.1-1.1.x86_64/xpcom/threads/nsThread.cpp:1167
#6  0x00007f5c888043d8 in NS_ProcessNextEvent(nsIThread*, bool) () at /usr/src/debug/MozillaFirefox-99.0.1-1.1.x86_64/xpcom/threads/nsThreadUtils.cpp:467
#7  0x00007f5c88d5ae0d in mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*) () at /usr/src/debug/MozillaFirefox-99.0.1-1.1.x86_64/ipc/glue/MessagePump.cpp:300
#8  0x00007f5c88d349b5 in MessageLoop::RunInternal() () at /usr/src/debug/MozillaFirefox-99.0.1-1.1.x86_64/ipc/chromium/src/base/message_loop.cc:331
#9  MessageLoop::RunHandler() () at /usr/src/debug/MozillaFirefox-99.0.1-1.1.x86_64/ipc/chromium/src/base/message_loop.cc:324
#10 MessageLoop::Run() () at /usr/src/debug/MozillaFirefox-99.0.1-1.1.x86_64/ipc/chromium/src/base/message_loop.cc:306
#11 0x00007f5c888238c9 in nsThread::ThreadFunc(void*) () at /usr/src/debug/MozillaFirefox-99.0.1-1.1.x86_64/xpcom/threads/nsThread.cpp:389
#12 0x00007f5c92c3e110 in  () at /lib64/libnspr4.so
#13 0x000055b617ce67ea in set_alt_signal_stack_and_start(PthreadCreateParams*) () at /usr/src/debug/MozillaFirefox-99.0.1-1.1.x86_64/toolkit/crashreporter/pthread_create_interposer/pthread_create_interposer.cpp:80
#14 0x00007f5c92e042ba in start_thread () at /lib64/libc.so.6
#15 0x00007f5c92e8e460 in clone3 () at /lib64/libc.so.6

code in question:

1193   while (!shutdown) {
1194     if (mOutgoingMessages.Length() && !mOutgoingMessages[0]->IsPending()) {
1195       if (!mOutgoingMessages[0]->Send(netlinkSocket)) {
1196         LOG(("Failed to send netlink message"));
1197         mOutgoingMessages.RemoveElementAt(0);
1198         // try to send another message if available before polling
1199         continue;
1200       }
1201     }
1202
1203     int rc = eintr_retry([&]() {
1204       AUTO_PROFILER_THREAD_SLEEP;
1205       return poll(fds, 2, GetPollWait());
1206     });
1207
1208     if (rc > 0) {
1209       if (fds[0].revents & POLLIN) {
1210         // shutdown, abort the loop!
1211         LOG(("thread shutdown received, dying...\n"));
1212         shutdown = true;
1213       } else if (fds[1].revents & POLLIN) {
1214         LOG(("netlink message received, handling it...\n"));
1215         OnNetlinkMessage(netlinkSocket);
1216       }
1217     } else if (rc < 0) {
1218       rv = NS_ERROR_FAILURE;
1219       break;
1220     }
1221   }

stepping and repetition pattern:

(gdb) n
1194     if (mOutgoingMessages.Length() && !mOutgoingMessages[0]->IsPending()) {
(gdb)
1203     int rc = eintr_retry([&]() {
(gdb)
1208     if (rc > 0) {
(gdb)
1209       if (fds[0].revents & POLLIN) {
(gdb)
1213       } else if (fds[1].revents & POLLIN) {
(gdb)
1194     if (mOutgoingMessages.Length() && !mOutgoingMessages[0]->IsPending()) {

As shown above by strace, there's a POLLERR condition on the netlink socket,
and FF fails to handle it.

-- 
You are receiving this mail because:
You are the assignee for the bug.
http://bugzilla.opensuse.org/show_bug.cgi?id=1198877#c1

--- Comment #1 from Jan Engelhardt <jengelh@inai.de> ---
Well, it's an infinite loop by design. But it goes into high CPU use because
FF does not handle the POLLERR case and proceeds to the next iteration right
away, calling poll() again, which immediately returns with the same POLLERR.
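
The busy loop described in this comment comes from the branch after poll() only testing POLLIN. A minimal sketch of a branch that also accounts for error conditions follows, assuming the two-descriptor layout from the report (fds[0] is the shutdown pipe, fds[1] is the netlink socket). PollAction and ClassifyPollResult are hypothetical names used for illustration; they are not part of NetlinkService.cpp, and this is not claimed to be the fix that Firefox adopted.

```cpp
#include <cassert>
#include <poll.h>

// Hypothetical result type; not taken from the Firefox sources.
enum class PollAction { Shutdown, HandleMessage, HandleError, Nothing };

// Decide what to do after poll() returned a positive value.
//   shutdownRevents: revents of the shutdown pipe (fds[0] in the report)
//   netlinkRevents:  revents of the netlink socket (fds[1] in the report)
PollAction ClassifyPollResult(short shutdownRevents, short netlinkRevents) {
  if (shutdownRevents & POLLIN) {
    return PollAction::Shutdown;
  }
  // POLLERR, POLLHUP and POLLNVAL are reported by poll() even though they
  // were not requested in events, so they must be checked explicitly.
  if (netlinkRevents & (POLLERR | POLLHUP | POLLNVAL)) {
    return PollAction::HandleError;
  }
  if (netlinkRevents & POLLIN) {
    return PollAction::HandleMessage;
  }
  return PollAction::Nothing;
}
```

With the strace output from the report (revents=POLLERR on fd 24), this sketch returns HandleError instead of falling through. The caller would still have to clear the error or replace the socket before polling again; otherwise poll() keeps returning immediately.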
http://bugzilla.opensuse.org/show_bug.cgi?id=1198877#c2

Wolfgang Rosenauer <wolfgang@rosenauer.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |wolfgang@rosenauer.org

--- Comment #2 from Wolfgang Rosenauer <wolfgang@rosenauer.org> ---
I do not see anything like this with FF 99, but my Tumbleweed hasn't been
updated for some days. I wonder whether this is triggered by an external
change, especially as there was a report just yesterday in bug 1198817, which
I also cannot reproduce. Does it sound related?
http://bugzilla.opensuse.org/show_bug.cgi?id=1198877#c3

--- Comment #3 from Jan Engelhardt <jengelh@inai.de> ---
I'd ask the reporter of bug 1198817 to run strace just to confirm it's the
same issue.

I have IPv6 with frequent address changes (sysctl as follows):

net.ipv6.conf.all.use_tempaddr=2
net.ipv6.conf.all.router_solicitation_interval=60
net.ipv6.conf.all.max_addresses=1000
net.ipv6.conf.all.temp_prefered_lft=900
net.ipv6.conf.eth0.max_addresses=1000
net.ipv6.conf.eth0.temp_prefered_lft=900

# ip mon route
Deleted local xxxx:xxxx:xxxx:34ec:2ac2:f742:4e62:a58d dev eth0 table local proto kernel metric 0 pref medium
local xxxx:xxxx:xxxx:34ec:670a:f284:d22a:b1dc dev eth0 table local proto kernel metric 0 pref medium

(and this about every 15 minutes)

Not sure how the POLLERR comes to be. There really is just one place in the
Linux kernel that seems to bear any relevance,
(kernel)/net/core/datagram.c:datagram_poll::

	/* exceptional events? */
	if (sk->sk_err || !skb_queue_empty(&sk->sk_error_queue))
		mask |= POLLERR;

This itself is fed from:

net/netlink/af_netlink.c:	sk->sk_err = ENOBUFS;
net/netlink/af_netlink.c:	sk->sk_err = p->code;
net/netlink/af_netlink.c:	sk->sk_err = ENOBUFS;

This goes to show that Netlink _can_ cause POLLERR itself and that POLLERR is
not necessarily a result of, for example, erroneously calling poll() on a
closed file descriptor.
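
Since the kernel snippet above ties POLLERR to sk->sk_err, one possible way for userspace to find out which error is pending is getsockopt() with SO_ERROR, which returns the pending socket error and resets it. The sketch below assumes that is the relevant path; TakeSocketError and IsReceiveOverrun are hypothetical helper names, and the sketch does not drain sk_error_queue, the other condition checked in datagram_poll. Whether this alone would be an adequate fix for Firefox is not established by this thread.

```cpp
#include <cassert>
#include <cerrno>
#include <sys/socket.h>

// Hypothetical helper: fetch and clear the pending error on a socket.
// Returns the pending errno value (0 if none is pending), or -1 if the
// getsockopt() call itself failed (for example, fd is not a socket).
int TakeSocketError(int fd) {
  int err = 0;
  socklen_t len = sizeof(err);
  if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0) {
    return -1;
  }
  return err;
}

// Hypothetical policy check: ENOBUFS is the value af_netlink.c assigns to
// sk_err in two of the three places listed above, and it indicates that
// messages may have been lost, so cached state may need to be re-requested.
bool IsReceiveOverrun(int err) {
  return err == ENOBUFS;
}
```

A caller that sees POLLERR on the netlink descriptor could call TakeSocketError() once and, if IsReceiveOverrun() holds, schedule a fresh dump of addresses and routes before returning to poll().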