It's a HARDWARE problem. Sort of. Or a limitation of the software design imposed by a hardware limitation.

On 2020-01-30 10:53 a.m., Per Jessen wrote:
> Andrei Borzenkov wrote:
>> On Thu, Jan 30, 2020 at 4:29 PM Per Jessen <per@computer.org> wrote:
>>> In the process lists from dmesg, I see 266 processes using up about 1GB (RSS), which doesn't seem like a lot?
>> It's in pages, so multiply by 4K. Active anonymous memory alone is around 6GB.
> Ah, thanks - I was thinking it was kilobytes.
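
For anyone else who reads the dmesg process table the same way, here is a rough sketch of the conversion. It assumes 4 KiB pages (the usual size on x86-64; `getconf PAGESIZE` will tell you what your machine uses). The page count in the example is made up for illustration and is not taken from Per's log.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, check with `getconf PAGESIZE`

def pages_to_gib(pages, page_size=PAGE_SIZE):
    """Convert a page count (e.g. a summed rss column) to GiB."""
    return pages * page_size / 2**30

# Hypothetical total, NOT a figure from the thread.
total_rss_pages = 1_000_000
print(f"{total_rss_pages} pages = {pages_to_gib(total_rss_pages):.2f} GiB")
```

So a number that looks like roughly 1GB when read as kilobytes is about four times that when read as pages.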
>> This is a known problem. There is a lot of active memory, so every time the kernel has to search a lot to find a free page (or a page that can be reclaimed).
> That could lead to an OOM condition?
You asked earlier about 'fragmentation'. It SHOULD NOT happen with a VM system, but there IS a pathological situation where it COULD happen, and this seems to be it.

Some applications need HUGEPAGES, that is, blocks of pages that correspond to large areas of physical memory.

It's a hardware problem! Why is it a hardware problem? Well, think of DMA. As long as the DMA is doing 4K buffered IO, it can do that behind the scenes of the VM system. The VM-IO can tell the DMA the 'real address', and that's what the DMA uses. The DMA does not use the VM mapping because the hardware doesn't work that way.

<sidebar>
Not all hardware worked that way. To the best of my knowledge, the old DEC VAX had some way round this, but I don't recall what it was[1]. It had other problems, though, that involved a not-very-reliable method of tagging the page descriptors. Kludges abound to make up for hardware shortcomings.
</sidebar>

But some applications, such as some database apps, are more 'raw' and need huge spans of address space to IO into (or out of). If the DMA were using the same mapping tables then everything would be OK, but it's not; it's 'behind the scenes'. So the VM system has to deal with HUGEPAGES, each a block of pages that corresponds to a large block of physical memory. And it has to create those.

Now, the regular VM operation might well have broken up memory so that there isn't such a span. The VM system will go into a frenzy of deallocation and swapping stuff out to try to create such a span, perhaps more than one. The deallocation might cause anomalies in other execution as code pages vanish. Meanwhile, the application has requested this and is waiting, waiting ...

.... mysqld: page allocation stalls for 21920ms, ....

The irony is that there might be a great amount of 'free' (for some value of that meaning) memory. The VM system isn't good at juggling that around to create the large available spans, because that's not what it was designed for.
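
If you want to see the "plenty free, but no big spans" situation for yourself, Linux exposes the free-block counts per size in /proc/buddyinfo: each column after the zone name is the number of free blocks of order 0, 1, 2, ... where a block of order k is 2**k contiguous pages. The following is only a sketch, not a diagnostic tool. The sample line is invented for illustration, the function names are my own, and treating order 9 as "one huge page" assumes 4 KiB base pages and 2 MiB huge pages as on x86-64.

```python
def parse_buddyinfo(text):
    """Map (node, zone) -> list of free block counts, indexed by order."""
    zones = {}
    for line in text.splitlines():
        parts = line.split()
        # Expected shape: "Node 0, zone Normal c0 c1 c2 ..."
        if len(parts) < 5 or parts[0] != "Node" or parts[2] != "zone":
            continue
        node = parts[1].rstrip(",")
        zones[(node, parts[3])] = [int(n) for n in parts[4:]]
    return zones

def free_pages(counts, min_order=0):
    """Pages sitting in free blocks of at least min_order."""
    return sum(n << order for order, n in enumerate(counts) if order >= min_order)

# Invented sample line; on a real system read /proc/buddyinfo instead.
SAMPLE = "Node 0, zone   Normal   9000   3000    800    100     10      2      0      0      0      0      0"

HUGE_ORDER = 9  # assumption: 2**9 pages * 4 KiB = 2 MiB

for (node, zone), counts in parse_buddyinfo(SAMPLE).items():
    total = free_pages(counts)
    big = free_pages(counts, HUGE_ORDER)
    print(f"node {node} zone {zone}: {total} pages free, "
          f"{big} of them in blocks of order >= {HUGE_ORDER}")
```

In the invented sample there are thousands of free pages, yet none of them sit in a block large enough for a single 2 MiB span, which is the kind of state I am describing above.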
A human would be good at that kind of juggling, but this is a general purpose computer, and it's just gone beyond the more general use-case it was intended for. We can do the Times Crossword puzzle. (OK, some of us can. Others weren't designed for that.)

Can you 'tune' for this? Probably. I don't know. I think I'm coming down with the 'flu, so please don't ask me to find out. Try reading the kernel VM docco yourself. I'm off to get some hot lemon flu cure.

[1] I could look it up but I'm not going to.

--
A: Yes.
> Q: Are you sure?
>> A: Because it reverses the logical flow of conversation.
>>> Q: Why is top posting frowned upon?