Re: [opensuse] Hardware failure or software problem? Swap fills up, system unresponsive and dmesg is flooded

30 Jan 2020

      It's a HARDWARE problem.
Sort of.
Or a limitation of the software design imposed by a hardware limitation.

On 2020-01-30 10:53 a.m., Per Jessen wrote:
...
Andrei Borzenkov wrote:
...
On Thu, Jan 30, 2020 at 4:29 PM Per Jessen <per@computer.org> wrote:
...
In the process lists from dmesg, I see 266 processes using up about
1Gb (RSS) which doesn't seem like a lot?
It's in pages, so multiply by 4K. Active anonymous memory alone is
around 6GB.
Ah, thanks - I was thinking it was kilobytes.
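
For anyone else tripped up by the units, here is a minimal sketch of the conversion in Python. It assumes 4 KiB pages, as stated above; the page count in the example is a made-up figure chosen to land on the roughly 6GB mentioned, not a value taken from the actual dmesg output.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, as stated in the thread


def pages_to_bytes(pages, page_size=PAGE_SIZE):
    """Convert a count of pages (as the kernel process dump reports RSS) to bytes."""
    return pages * page_size


def human(nbytes):
    """Format a byte count using binary units."""
    for unit in ("B", "KiB", "MiB", "GiB", "TiB"):
        if nbytes < 1024 or unit == "TiB":
            return f"{nbytes:.1f} {unit}"
        nbytes /= 1024


if __name__ == "__main__":
    example_pages = 1_572_864  # hypothetical figure, for illustration only
    print(human(pages_to_bytes(example_pages)))
```

Reading the same number as kilobytes instead of pages understates the memory in use by a factor of four.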
...
This is known problem. There is a lot of active memory so every time
kernel has to search a lot to find free page (or page that can be
reclaimed).
that could lead to an oom condition?
You asked earlier about 'fragmentation'.
It SHOULD NOT happen with a VM system, but there IS a syndromic situation where
it COULD happen, and this seems to be it.

Some applications need HUGEPAGES, that is, blocks of pages that correspond to
large contiguous areas of physical memory.

It's a hardware problem!

Why is it a hardware problem?
Well, think of DMA.  As long as DMA is doing the 4K buffered IO it can do that
behind the scenes of the VM system.  The VM-IO can tell the DMA the 'real
address' and that's what the DMA uses.  The DMA does not use the VM mapping
because the hardware doesn't work that way.

<sidebar>
Not all hardware worked that way.
To the best of my knowledge, the old DEC VAX had some way round this but I don't
recall what it was[1].  It had other problems though that involved a
not-very-reliable method of tagging the page descriptors.   Kludges abound to
make up for hardware shortcomings.
</sidebar>

But some applications, such as some database apps, are more 'raw' and need huge
spans of address space to do IO into (or out of).  If the DMA were using the same
mapping tables then everything would be OK, but it's not; it's 'behind the
scenes'.   So the VM system has to deal with HUGEPAGES, that is, a block of pages
that corresponds to a large block of physical memory.  And it has to create those.

Now the regular VM operation might well have broken up memory so there isn't
such a span.  It will go into a frenzy of deallocation and swapping stuff out to
try to create such a span, perhaps more than one.  The deallocation might cause
anomalies in other execution as code pages vanish.

Meanwhile, the application has requested this and is waiting, waiting ...
	....
	mysqld: page allocation stalls for 21920ms,
	....

The irony is that there might be a great amount of 'free' (for some value of that
meaning) memory.  The VM system isn't good at juggling that around to create
large available spans, because that's not what it was designed for.   A
human would be good at that, but this is a general purpose computer, and it's
just gone beyond the more general use-case it was intended for.  We can do the
Times Crossword puzzle. (OK, some of us can. Others weren't designed for that.)
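
To make the "plenty free, but no large spans" point concrete, here is a rough sketch in Python that reads the layout of /proc/buddyinfo, where the kernel lists, per zone, how many free blocks exist at each order (a block of order k is 2^k contiguous pages). The sample text is invented for illustration and is not from the affected machine. Treating order 9 (2 MiB with 4 KiB pages) as the interesting size is my assumption, and the helper names are hypothetical.

```python
# Invented sample in the layout of /proc/buddyinfo; the numbers are illustrative only.
SAMPLE = """\
Node 0, zone      DMA      1      1      0      0      2      1      1      0      1      1      3
Node 0, zone    DMA32   3500   1200    300     40      5      0      0      0      0      0      0
Node 0, zone   Normal  90000  20000    900     12      0      0      0      0      0      0      0
"""


def parse_buddyinfo(text):
    """Return {(node, zone): [free block count at order 0, order 1, ...]}."""
    zones = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 5 or parts[0] != "Node":
            continue  # skip anything that is not a zone line
        node = parts[1].rstrip(",")
        zone = parts[3]
        zones[(node, zone)] = [int(x) for x in parts[4:]]
    return zones


def free_pages_at_or_above(counts, order):
    """Free pages held in blocks of at least 2**order contiguous pages."""
    return sum(n << k for k, n in enumerate(counts) if k >= order)


if __name__ == "__main__":
    for (node, zone), counts in parse_buddyinfo(SAMPLE).items():
        total = free_pages_at_or_above(counts, 0)
        large = free_pages_at_or_above(counts, 9)  # assumption: order 9 = 2 MiB spans
        print(f"node {node} zone {zone}: {total} free pages, {large} in order>=9 blocks")
```

In the invented Normal zone there are over a hundred thousand free pages and not one free block of order 9 or higher, which is the kind of situation described above. On a live system the same parser could be fed the contents of /proc/buddyinfo instead of SAMPLE.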

Can you 'tune' for this?  Probably.  I don't know.
I think I'm coming down with the 'flu, so please don't ask me to find out.  Try
reading the kernel VM docco yourself.  I'm off to get some hot lemon flu cure.

[1] I could look it up but I'm not going to.
-- 
         A: Yes.
     >   Q: Are you sure?
     >>  A: Because it reverses the logical flow of conversation.
     >>> Q: Why is top posting frowned upon?

Anton Aylward