Hi,

I'm not even sure how to describe the issue or where to start looking. What is happening appears to be an issue with OOM or process limits, but I have no idea how to nail this down. My work habits have not changed: compared to when I was running Leap 42.2, I have as many browser tabs open as before (a lot), and as many terminal and editor windows open as well (again, a lot).

The symptoms I am seeing are:
- the systemd-coredump process runs off and on, and when it runs it sucks up 100% CPU
- I get "no more process" errors when trying to open a new terminal window
- Rhythmbox appears to be the most frequent victim; it gets killed or dies

dmesg does not have any "trace" information, but there is a message pointing to some kind of trouble:

[178155.684433] Corrupted low memory at ffff880000004000 (4000 phys) = 002777f2

Anyway, the whole thing is annoying and I'd sure like to figure out what's going on. Thoughts?

Thanks,
Robert
-- Robert Schweikert MAY THE SOURCE BE WITH YOU Distinguished Architect LINUX Team Lead Public Cloud rjschwei@suse.com IRC: robjo
On 09/06/2017 04:49 PM, Robert Schweikert wrote:
Hi,
Not even sure as to how to describe the issue or where to start looking.
What is happening appears to be an issue with OOM or process limits, but I have no idea how to nail this down. My work habits have not changed: compared to when I was running Leap 42.2, I have as many browser tabs open as before (a lot), and as many terminal and editor windows open as well (again, a lot).
The symptoms I am seeing are:
- the systemd-coredump process runs off and on, and when it runs it sucks up 100% CPU
- I get "no more process" errors when trying to open a new terminal window
- Rhythmbox appears to be the most frequent victim; it gets killed or dies
dmesg does not have any "trace" information, but there is a message pointing to some kind of trouble:
[178155.684433] Corrupted low memory at ffff880000004000 (4000 phys) = 002777f2
Anyway, the whole thing is annoying and I'd sure like to figure out what's going on.
And to follow up with a bit more information, I find lots of these in the system log:

Sep 06 16:30:01 mountain systemd[1]: Stopping User Manager for UID 0...
Sep 06 16:30:01 mountain systemd[11849]: Stopped target Default.
Sep 06 16:30:01 mountain systemd[11849]: Stopped target Basic System.
Sep 06 16:30:01 mountain systemd[11849]: Stopped target Sockets.
Sep 06 16:30:01 mountain systemd[11849]: Stopped target Timers.
Sep 06 16:30:01 mountain systemd[11849]: Reached target Shutdown.
Sep 06 16:30:01 mountain systemd[11849]: Starting Exit the Session...
Sep 06 16:30:01 mountain systemd[11849]: Stopped target Paths.
Sep 06 16:30:01 mountain systemd[11849]: Received SIGRTMIN+24 from PID 11897 (kill).
Sep 06 16:30:01 mountain systemd[11852]: pam_unix(systemd-user:session): session closed for user root

But I am not logged in as root, i.e. I am not constantly logging in and out. I do have a terminal window open where I am root, but that's it. Or maybe I get one of these blocks every time I run osc build?

Help is appreciated,
Robert
-- Robert Schweikert MAY THE SOURCE BE WITH YOU Distinguished Architect LINUX Team Lead Public Cloud rjschwei@suse.com IRC: robjo
On 2017-09-06 23:13, Robert Schweikert wrote:
On 09/06/2017 04:49 PM, Robert Schweikert wrote:
The symptoms I am seeing are
- systemd-coredump process running off and on and when it runs it sucks up 100% cpu
This means that some other process has crashed, and systemd-coredump is collecting and compacting its garbage. You have to find out what that other process is. Run "coredumpctl" and it will give you the list. In "journalctl" you will find some info around the times listed there.
- I get "no more process" errors when trying to open a new terminal window
- Rhythmbox appears to be the most frequent victim; it gets killed or dies
dmesg does not have any "trace" information, but there is a message pointing to some kind of trouble:
[178155.684433] Corrupted low memory at ffff880000004000 (4000 phys) = 002777f2
Probably unrelated.
Anyway, the whole thing is annoying and I'd sure like to figure out what's going on.
And to follow up with a bit more information, I find lots of these in the system log:
Sep 06 16:30:01 mountain systemd[1]: Stopping User Manager for UID 0...
Sep 06 16:30:01 mountain systemd[11849]: Stopped target Default.
Sep 06 16:30:01 mountain systemd[11849]: Stopped target Basic System.
Sep 06 16:30:01 mountain systemd[11849]: Stopped target Sockets.
Sep 06 16:30:01 mountain systemd[11849]: Stopped target Timers.
Sep 06 16:30:01 mountain systemd[11849]: Reached target Shutdown.
Sep 06 16:30:01 mountain systemd[11849]: Starting Exit the Session...
Sep 06 16:30:01 mountain systemd[11849]: Stopped target Paths.
Sep 06 16:30:01 mountain systemd[11849]: Received SIGRTMIN+24 from PID 11897 (kill).
Sep 06 16:30:01 mountain systemd[11852]: pam_unix(systemd-user:session): session closed for user root
Irrelevant. -- Cheers / Saludos, Carlos E. R. (from 42.2 x86_64 "Malachite" (Minas Tirith))
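A minimal sketch of the coredumpctl check suggested above. It assumes a systemd-based system with coredumpctl installed; the function name show_crashes is made up for illustration, and the script only prints a notice if the tool is missing or nothing has been recorded.

```shell
#!/bin/bash
# List recorded core dumps so the crashing program can be identified.
# show_crashes is a hypothetical helper name, not part of systemd.
show_crashes() {
    if ! command -v coredumpctl >/dev/null 2>&1; then
        echo "coredumpctl not found"
        return 0
    fi
    # 'coredumpctl list' exits non-zero when nothing is recorded.
    coredumpctl --no-pager list 2>/dev/null || echo "no core dumps recorded"
}

show_crashes
# For details on one entry, pass a PID taken from the list:
#   coredumpctl info <PID>
```

The executable name shown in the list is the process that crashed; systemd-coredump itself is only the collector.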
On 07.09.2017 00:13, Robert Schweikert wrote: ...
And to follow up with a bit more information, I find lots of these in the system log:
Sep 06 16:30:01 mountain systemd[1]: Stopping User Manager for UID 0...
Sep 06 16:30:01 mountain systemd[11849]: Stopped target Default.
Sep 06 16:30:01 mountain systemd[11849]: Stopped target Basic System.
Sep 06 16:30:01 mountain systemd[11849]: Stopped target Sockets.
Sep 06 16:30:01 mountain systemd[11849]: Stopped target Timers.
Sep 06 16:30:01 mountain systemd[11849]: Reached target Shutdown.
Sep 06 16:30:01 mountain systemd[11849]: Starting Exit the Session...
Sep 06 16:30:01 mountain systemd[11849]: Stopped target Paths.
Sep 06 16:30:01 mountain systemd[11849]: Received SIGRTMIN+24 from PID 11897 (kill).
Sep 06 16:30:01 mountain systemd[11852]: pam_unix(systemd-user:session): session closed for user root
But well I am not logged in as root, i.e. I am not constantly logging in and out.
Cron, for example, does. Check the log entries just before these to see what opens this session.
On 06.09.2017 at 22:49, Robert Schweikert wrote:
- systemd-coredump process running off and on and when it runs it sucks up 100% cpu
We see such errors when Chromium in development mode crashes. -- Johannes Weberhofer Weberhofer GmbH, Austria, Vienna -- To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org To contact the owner, e-mail: opensuse+owner@opensuse.org
On 2017-09-08 12:44, Johannes Weberhofer wrote:
On 06.09.2017 at 22:49, Robert Schweikert wrote:
- systemd-coredump process running off and on and when it runs it sucks up 100% cpu
We see such errors when Chromium in development mode crashes.
The log should say what it is.

systemd-coredump uses a lot of CPU because it compresses the core images, and they are always large, perhaps huge, files (gigabytes in one case of mine). You can configure it in "/etc/systemd/coredump.conf".

Ah, ulimit -c doesn't work.

I think systemd-coredump should be improved: initially dump without compression. Later, after dumping, compress just one file at a time, in the background. If a process dumps repeatedly because it is started again and again, don't dump.
-- Cheers / Saludos, Carlos E. R. (from 42.2 x86_64 "Malachite" at Telcontar)
On 09/08/2017 07:03 AM, Carlos E. R. wrote:
I think systemd-coredump should be improved: initially dump without compression. Later, after dumping, compress just one file at a time, in background. If a process dumps repeatedly, because it is started again and again, don't dump.
Isn't that what you get when you set in /etc/systemd/coredump.conf that storage is external and compression is off? These would still be cleaned by systemd-tmpfiles, wouldn't they?
-- After all is said and done, more is said than done.
On 2017-09-08 20:22, John Andersen wrote:
On 09/08/2017 07:03 AM, Carlos E. R. wrote:
I think systemd-coredump should be improved: initially dump without compression. Later, after dumping, compress just one file at a time, in background. If a process dumps repeatedly, because it is started again and again, don't dump.
Isn't that what you get when you set in /etc/systemd/coredump.conf that storage is external and compression is off?
I don't know what "external" means, but I guess it means "inside the log or outside" (man confirms). Compression can be set to off, yes. And they are deleted in a week by default.

But there is no adjustment possible for when something dies and restarts repeatedly: it kills all CPU cores. Also, there is no choice of compressor or options, such as using the fastest method, nor a filter to skip collecting the coredump of some processes.

I have a process that sometimes crashes and dumps core, sometimes a huge one, and the CPU is busy for many minutes; during this time I cannot start it again, and I need the process. The most I can do is disable compression.
These would still be cleaned by systemd-tmpfiles, wouldn't they?
Yes. -- Cheers / Saludos, Carlos E. R. (from 42.2 x86_64 "Malachite" at Telcontar)
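For reference, the settings discussed above would look roughly like this. It is a sketch, not a tested recommendation: Storage= and Compress= are documented coredump.conf options, while the scratch file and the variable name conf are only there for illustration so that nothing under /etc is touched.

```shell
#!/bin/bash
# Write the discussed settings to a scratch file for review.
conf="$(mktemp)"
cat > "$conf" <<'EOF'
[Coredump]
# Keep dumps as files under /var/lib/systemd/coredump instead of in the journal.
Storage=external
# Skip compression, which is what keeps the CPU busy after a crash.
Compress=no
EOF

cat "$conf"
# After review, merge these lines into /etc/systemd/coredump.conf (as root).
```

This only removes the compression cost; it does not address a process that crashes and restarts in a loop.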
On 09/08/2017 06:44 AM, Johannes Weberhofer wrote:
On 06.09.2017 at 22:49, Robert Schweikert wrote:
- systemd-coredump process running off and on and when it runs it sucks up 100% cpu
We see such errors when Chromium in development mode crashes.
Yeah I think Chrome is to blame here :( -- Robert Schweikert MAY THE SOURCE BE WITH YOU Distinguished Architect LINUX Team Lead Public Cloud rjschwei@suse.com IRC: robjo
On 11.09.2017 at 22:26, Robert Schweikert wrote:
On 09/08/2017 06:44 AM, Johannes Weberhofer wrote:
On 06.09.2017 at 22:49, Robert Schweikert wrote:
- systemd-coredump process running off and on and when it runs it sucks up 100% cpu
We see such errors when Chromium in development mode crashes.
Yeah I think Chrome is to blame here :(
Another possible reason is that you don't have enough processes. Chrome renders each tab in a different process. That means chrome needs a ton of entries in the process table. To see how many you have, use:

ulimit -a|grep proc

The command to check how many processes you use is:

ps -fTu $USER | wc -l

If the numbers are close, opening new tabs or terminals can fail because these operations create many new processes (BASH will need them to process the information in the start up scripts).

To fix this, add these two lines to /etc/security/limits.conf:

* hard nproc 1700
* soft nproc 1200

Regards,
-- Aaron "Optimizer" Digulla a.k.a. Philmann Dark "It's not the universe that's limited, it's our imagination. Follow me and I'll show you something beyond the limits." http://blog.pdark.de/
On 09/13/2017 02:46 PM, Aaron Digulla wrote:
On 11.09.2017 at 22:26, Robert Schweikert wrote:
On 09/08/2017 06:44 AM, Johannes Weberhofer wrote:
On 06.09.2017 at 22:49, Robert Schweikert wrote:
- systemd-coredump process running off and on and when it runs it sucks up 100% cpu
We see such errors when Chromium in development mode crashes.
Yeah I think Chrome is to blame here :(
Another possible reason is that you don't have enough processes. Chrome renders each tab in a different process. That means chrome needs a ton of entries in the process table. To see how many you have, use:
ulimit -a|grep proc
~> ulimit -a|grep proc
max user processes (-u) 1200
The command to check how many processes you use is:
ps -fTu $USER | wc -l
1056

Probably close enough to the limit to cause trouble when opening a few more tabs and windows.
If the numbers are close, opening new tabs or terminals can fail because these operations create many new processes (BASH will need them to process the information in the start up scripts).
To fix this, add these two lines to /etc/security/limits.conf:
* hard nproc 1700
* soft nproc 1200
Ahh configuration knobs, cranked up a bit. Thanks, Robert -- Robert Schweikert MAY THE SOURCE BE WITH YOU Distinguished Architect LINUX Team Lead Public Cloud rjschwei@suse.com IRC: robjo
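The comparison above can be sketched as a small script. This is an illustration under assumptions: headroom is a hypothetical helper name, the ps options assume the procps version of ps, and unlike "ps -fTu $USER | wc -l" the count here leaves out the header line.

```shell
#!/bin/bash
# Compare the number of tasks (processes plus threads) a user is running
# with the per-user limit. headroom is a hypothetical helper name.
headroom() {
    local limit="$1" used="$2"
    if [ "$limit" = "unlimited" ]; then
        echo "unlimited"
    else
        echo $(( limit - used ))
    fi
}

limit="$(ulimit -u)"
# -L lists one line per thread; 'lwp=' prints thread IDs without a header.
used="$(ps -L -u "$(id -un)" -o lwp= 2>/dev/null | wc -l)"
echo "limit=$limit used=$used headroom=$(headroom "$limit" "$used")"

# With the numbers reported above (limit 1200, 1056 in use):
headroom 1200 1056   # prints 144
```

A headroom of 144 is easily used up by a few more browser tabs, which fits the "no more process" errors when opening a terminal.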
On 13/09/17 02:46 PM, Aaron Digulla wrote:
Another possible reason is that you don't have enough processes. Chrome renders each tab in a different process. That means chrome needs a ton of entries in the process table.
Just out of interest, what is the algorithm for dealing with the proc table? I gather from what you write that it is a (somewhat) static array as opposed to a dynamically created tree? What is the search and/or insert or compression algorithm? Is there some hash which might also be expanded for faster lookup in the nearly full situation?
-- A: Yes. > Q: Are you sure? >> A: Because it reverses the logical flow of conversation. >>> Q: Why is top posting frowned upon?
On 13.09.2017 at 21:29, Anton Aylward wrote:
On 13/09/17 02:46 PM, Aaron Digulla wrote:
Another possible reason is that you don't have enough processes. Chrome renders each tab in a different process. That means chrome needs a ton of entries in the process table.
Just out of interest, what is the algorithm for dealing with the proc table? I gather from what you write that it is a (somewhat) static array as opposed to a dynamically created tree?
What is the search and/or insert or compression algorithm? Is there some hash which might also be expanded for faster lookup in the nearly full situation?
The process list is already dynamic. The limit is a security feature: https://en.wikipedia.org/wiki/Fork_bomb

In a nutshell: this is to prevent your computer from locking up because someone made a mistake (a program endlessly creates processes in a loop) or because of a denial-of-service attack (creating processes to bring the performance to a crawl).

Now, this is 2017 and people are starting to use all those nice CPU cores, so "1000 processes per user should be enough for anyone" is no longer true.

On my computer, Chrome needs 400 entries (each thread counts as one process), Thunderbird 50, Firefox 45. With version 55 of Firefox, the situation will get worse.

Maybe openSUSE should set the default to 2000?

Regards,
-- Aaron "Optimizer" Digulla a.k.a. Philmann Dark "It's not the universe that's limited, it's our imagination. Follow me and I'll show you something beyond the limits." http://blog.pdark.de/
* Aaron Digulla
On 13.09.2017 at 21:29, Anton Aylward wrote:
On 13/09/17 02:46 PM, Aaron Digulla wrote:
Another possible reason is that you don't have enough processes. Chrome renders each tab in a different process. That means chrome needs a ton of entries in the process table.
Just out of interest, what is the algorithm for dealing with the proc table? I gather from what you write that it is a (somewhat) static array as opposed to a dynamically created tree?
What is the search and/or insert or compression algorithm? Is there some hash which might also be expanded for faster lookup in the nearly full situation?
The process list is already dynamic. It's a security feature: https://en.wikipedia.org/wiki/Fork_bomb
In a nutshell: This is to prevent your computer from locking up because someone made a mistake (program endlessly creates processes in a loop) or a denial of service attack (creating processes to bring the performance to a crawl).
Now, this is 2017 and people are starting to use all those nice CPU cores so the "1000 processes per user should be enough for anyone" is no longer true.
On my computer, Chrome needs 400 entries (each thread counts as one process), Thunderbird 50, Firefox 45. With the Version 55 of Firefox, the situation will get worse.
Maybe openSUSE should set the default to 2000?
My Tumbleweed systems are set to 4096, and I didn't change them :)
-- (paka)Patrick Shanahan Plainfield, Indiana, USA @ptilopteri http://en.opensuse.org openSUSE Community Member facebook/ptilopteri Registered Linux User #207535 @ http://linuxcounter.net Photos: http://wahoo.no-ip.org/piwigo paka @ IRCnet freenode
On 13/09/17 05:10 PM, Aaron Digulla wrote:
What is the search and/or insert or compression algorithm? Is there some hash which might also be expanded for faster lookup in the nearly full situation?
The process list is already dynamic. It's a security feature: https://en.wikipedia.org/wiki/Fork_bomb
In a nutshell: This is to prevent your computer from locking up because someone made a mistake (program endlessly creates processes in a loop) or a denial of service attack (creating processes to bring the performance to a crawl).
Yes, I'm quite aware of fork bombs; they are not something new. Applying a per-user process limit, as opposed to merely the system-wide process limit, is likely adequate.

Having, for each user on a heavily multi-user system, Patrick's 4K per-user setting nearly filled by Chrome (or are we really talking about THREADS rather than complete processes? [1]) leads to a pretty big main proc table. Or are they indexed on a per-user basis as well?

The research I can find googling around on kernel hashing seems a bit out of date and general. I've seen it mentioned that it is
- a linear table
- a linked list
- a hash-indexed "table", but with the format of the "table" unspecified.

Please note, I'm not commenting on the data structure itself, only on its access methods, creation and destruction.

One thing I do realise: with a VM kernel, tables can be resized. Grab a new page set, copy into the larger space giving the table a new upper limit, reset pointers, release the old page set. Whether you SHOULD is quite another matter. The circumstances that force you to do this might be a problem that is in need of a solution first and foremost.

[1] I understand that for Chrome it is actually processes, but other applications seem to spawn threads, which look remarkably like processes to some process-listing tools. I run htop or "ps -eLf" and find firefox has 54 threads, thunderbird has 91.
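On the threads-versus-processes question: on Linux the per-user nproc limit counts threads as well as processes, so a per-command thread total is the more useful number. A rough sketch follows, assuming the procps version of ps; sum_threads is a hypothetical helper name.

```shell
#!/bin/bash
# Sum thread counts per command name from "NLWP COMMAND" lines on stdin.
# sum_threads is a hypothetical helper name.
sum_threads() {
    awk '{
        count = $1          # first field: number of threads
        $1 = ""             # the rest of the line is the command name
        sub(/^ +/, "")
        total[$0] += count
    }
    END { for (name in total) print total[name], name }' | sort -rn
}

# nlwp = number of threads in the process, comm = command name.
ps -u "$(id -un)" -o nlwp= -o comm= 2>/dev/null | sum_threads | head -n 10
```

Commands with many helper processes (a browser with one process per tab, for example) show up as a single summed line, which makes it easier to see who is using up the limit.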
On 13.09.2017 at 20:46, Aaron Digulla wrote:
Another possible reason is that you don't have enough processes. Chrome renders each tab in a different process. That means chrome needs a ton of entries in the process table. To see how many you have, use: ... To fix this, add these two lines to /etc/security/limits.conf:
* hard nproc 1700
* soft nproc 1200
Thanks for all your responses! Today I got another core dump:

Sep 18 11:54:51 c-web1 kernel: mmap: chromium (8060): VmData 2147631104 exceed data ulimit 2147483647. Update limits or use boot option ignore_rlimit_data.
Sep 18 11:54:51 c-web1 kernel: do_trap: 117 callbacks suppressed
Sep 18 11:54:51 c-web1 kernel: traps: chromium[8060] trap int3 ip:5630a8cc7d7e sp:7ffe4b3e3610 error:0
Sep 18 11:54:52 c-web1 systemd-coredump[14985]: Process 8060 (chromium) of user 1027 dumped core.

In my Leap 42.3 installation the values you suggested for /etc/security/limits.conf were already set. I have now increased them to 2200/2000; I wonder whether that PC will now stop dumping core.

Best regards
-- Johannes Weberhofer Weberhofer GmbH, Austria, Vienna
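Note that the kernel line in this log complains about the data-size limit, which is a separate resource limit from nproc, so raising nproc alone may not change this particular crash. A small sketch to look at the numbers; over_limit is a hypothetical helper name and the /proc lookup assumes Linux.

```shell
#!/bin/bash
# How far did the process exceed the data limit? Values are copied from
# the kernel message above. over_limit is a hypothetical helper name.
over_limit() {
    echo $(( $1 - $2 ))
}

over_limit 2147631104 2147483647   # bytes beyond the limit, prints 147457

# Current data-segment limit of this shell, in KiB (or "unlimited").
ulimit -d

# Data size of a running process as seen by the kernel (Linux /proc only);
# here the shell itself is used as the example process.
grep VmData /proc/$$/status 2>/dev/null || echo "VmData not available"
```

The limit in the log, 2147483647 bytes, is just under 2 GiB, and the message itself names the two ways out: update the limit or boot with ignore_rlimit_data.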
participants (8)
- Aaron Digulla
- Andrei Borzenkov
- Anton Aylward
- Carlos E. R.
- Johannes Weberhofer
- John Andersen
- Patrick Shanahan
- Robert Schweikert