On 12/01/2020 12:13, Per Jessen wrote:
Anton Aylward wrote:
On 12/01/2020 11:51, Per Jessen wrote:
jdd@dodin.org wrote:
When the system starts to run out of memory, the OOM killer will kick in and try to identify who is gobbling up the memory. If it picks an innocent process, it could be seen as the system crashing.
That is possible with the default setting, yes.
However you can also configure it so the OOM_killer targets the process that caused the OOM condition and only that.
Care to share an example?
The details of the what and how are in the long document on the virtual memory settings that I've mentioned here a number of times before: you have control via the VM settings over what happens in OOM conditions. Sadly the default is to scan ALL processes for candidates to kill, or to default to a PANIC. You can, if you read through the docco I referred to,

https://www.kernel.org/doc/Documentation/sysctl/vm.txt

alter that.

==============================================================
oom_kill_allocating_task

This enables or disables killing the OOM-triggering task in
out-of-memory situations.

If this is set to zero, the OOM killer will scan through the entire
tasklist and select a task based on heuristics to kill.  This normally
selects a rogue memory-hogging task that frees up a large amount of
memory when killed.

If this is set to non-zero, the OOM killer simply kills the task that
triggered the out-of-memory condition.  This avoids the expensive
tasklist scan.

If panic_on_oom is selected, it takes precedence over whatever value
is used in oom_kill_allocating_task.

The default value is 0.
==============================================================

and

=============================================================
panic_on_oom

This enables or disables panic on out-of-memory feature.

If this is set to 0, the kernel will kill some rogue process,
called oom_killer.  Usually, oom_killer can kill rogue processes and
system will survive.

If this is set to 1, the kernel panics when out-of-memory happens.
However, if a process limits using nodes by mempolicy/cpusets,
and those nodes become memory exhaustion status, one process
may be killed by oom-killer.  No panic occurs in this case.
Because other nodes' memory may be free.  This means system total
status may be not fatal yet.

If this is set to 2, the kernel panics compulsorily even on the
above-mentioned.  Even oom happens under memory cgroup, the whole
system panics.

The default value is 0.
1 and 2 are for failover of clustering.  Please select either
according to your policy of failover.
panic_on_oom=2+kdump gives you very strong tool to investigate
why oom happens.  You can get snapshot.
=============================================================

More to the point here, you have settings that let you ANALYSE why the OOM occurred.
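For the archives, here is roughly what that looks like in practice -- a minimal sketch, assuming you want the allocating task killed rather than the full tasklist scan. The sysctl names are straight out of vm.txt; the file name under /etc/sysctl.d/ is just my choice:

=============================================================
# take effect immediately
sysctl -w vm.oom_kill_allocating_task=1

# or, if you would rather have a panic you can post-mortem with kdump
sysctl -w vm.panic_on_oom=2

# make it stick across reboots
echo "vm.oom_kill_allocating_task = 1" > /etc/sysctl.d/90-oom.conf
=============================================================

You can check the current values with "sysctl vm.oom_kill_allocating_task vm.panic_on_oom".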
In my experience, the oom killer kills the _best_ process, which is the one that uses the most memory.
NOT! Suppose you have a memory-intensive analytic program that goes backwards and forwards over rows & columns of data and it's been running for a couple of weeks ... Then you start up a small interactive program with a memory leak. There's not much memory available, since the big analytic one has most of it, so this one leaks what it can get, which isn't much by comparison, and causes the OOM condition. You do not want the biggest, that analysis program, to be the one that is killed! You might wonder at the interactive one dying on you, but you definitely don't want to lose all that work!
There is a way of influencing the choice, I'm not sure how.
See above
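Since the question of influencing the choice keeps coming up: the usual knob is /proc/<pid>/oom_score_adj, which runs from -1000 (never pick this one) to +1000 (pick this one first). A rough sketch, with made-up PIDs:

=============================================================
# protect the long-running analytic job
echo -1000 > /proc/12345/oom_score_adj

# make the small leaky interactive program the preferred victim
echo 1000 > /proc/23456/oom_score_adj
=============================================================

The older /proc/<pid>/oom_adj interface still works but has been deprecated in favour of oom_score_adj.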
I had a situation over Christmas where a customer's webserver essentially died (I may have mentioned this situation before) due to trying to serve too many requests. Loads of apache threads gobbled up the memory and the OOM killer decided to get rid of mysql (an innocent, but important, victim).
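One way to keep an important service like mysql off the OOM killer's menu is the same mechanism, wired through systemd. A sketch only -- the unit name and drop-in path are guesses, on some systems the unit is called mariadb.service:

=============================================================
# /etc/systemd/system/mysql.service.d/oom.conf
[Service]
OOMScoreAdjust=-900
=============================================================

followed by "systemctl daemon-reload" and a restart of the service. Of course that doesn't fix the underlying Apache problem, it just changes who gets shot.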
In days of old the front-end Apache started up some threads, just a few, and had them waiting for requests. After servicing a request the thread died and Apache started a new one, so the downstream code, perhaps a Perl application, died with it, and any memory leaks it had accumulated went away too. Then some genius decided that the overhead of startup was too much, so let's have long-lived processes. Never mind if they have memory leaks.

If your Apache is continuously spawning threads then you have a problem. It may be a configuration problem; IIRC there is a setting which determines how many threads there can be (see the sketch below). Or it may be that you are being overwhelmed, either by real traffic or by a Denial-of-Service attack, and the way you have Apache set up it is not throttling, so it services network connections as fast as they come in, regardless of the rate of service.
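For reference, these are the directives I had in mind -- a rough sketch for the prefork MPM, with numbers plucked out of the air. If memory serves, on openSUSE they live in /etc/apache2/server-tuning.conf; your layout may differ.

=============================================================
<IfModule prefork.c>
    StartServers             5
    MinSpareServers          5
    MaxSpareServers         10
    ServerLimit            150
    MaxRequestWorkers      150     # called MaxClients in Apache 2.2 and earlier
    MaxConnectionsPerChild 10000   # recycle children so a leak cannot grow forever
</IfModule>
=============================================================

MaxRequestWorkers caps how many requests Apache will service at once; anything beyond that waits in the listen queue instead of spawning yet more children, which is exactly the throttling that was missing.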