Add some OOM Killer by default in openSUSE
![](https://seccdn.libravatar.org/avatar/0effd334ce454cd0a6157ad75536eff8.jpg?s=120&d=mm&r=g)
Hello all! Some distributions enable an OOM killer by default. For example, Fedora and Ubuntu (both use systemd-oomd, since 34 and 22.04 respectively) and Garuda (nohang).

The following "killers" are available:

* systemd-oomd (https://www.freedesktop.org/software/systemd/man/systemd-oomd.service.html)
* nohang (https://github.com/hakavlad/nohang)
* earlyoom (https://github.com/rfjakob/earlyoom)
* oomd (Facebook variant, https://github.com/facebookincubator/oomd)

In my opinion, systemd-oomd may be the best solution. It is supported by Red Hat, it is part of systemd, and it has good configuration for desktop and server use.

I would like not just to find out which is better, but to get a "killer" added as a default to our favourite distribution :)
![](https://seccdn.libravatar.org/avatar/d977e460744bc9591586ffd46b60adf0.jpg?s=120&d=mm&r=g)
On Wed, 2022-07-13 at 17:17 +0000, Ivan Vorstanenko wrote:
Hello us! Some distributions have OOM Killer by default. For example, Fedora, Ubuntu (both use systemd-oomd since 34 and 22.04 respectively) and Garuda (nohang) There are next available "killers" : * systemd-oomd (https://www.freedesktop.org/software/systemd/man/systemd-oomd.service.html) * nohang (https://github.com/hakavlad/nohang) * earlyoom (https://github.com/rfjakob/earlyoom) * oomd (Facebook variant, https://github.com/facebookincubator/oomd)
From me : maybe, systemd-oomd will be the best solution. It supports by Red Hat, it is a part of systemd and it has good configuration for desktop/server.
I would like not just to find out which is better, but to achieve the addition of a "killer" as default to our favourite distribution :)
The kernel already has an exceptionally useful OOM killer.

All of the above suggestions primarily just add more complexity, trying to kill things before swap fills up, because a system that is swapping is (normally) utterly unresponsive.

And this is fundamentally never going to change unless disk access speeds dramatically increase while memory access speeds stop their historically faster pace of increase.

I think a far more reasonable solution would be to follow what we've already done in MicroOS and not use swap by default.

The idea of having swap to give a system a buffer when memory is running out just doesn't work well in 2022.

Better to accept that problem and remove the root cause, rather than add more junk to the stack to make the already obsolete safety blanket a waste of disk space that serves no purpose at all.

Regards,
Richard
![](https://seccdn.libravatar.org/avatar/ed90d0132a4f59f2d3a1cf82a1b70915.jpg?s=120&d=mm&r=g)
On 13.07.22 21:56, Richard Brown wrote:
I think a far more reasonable solution would be to follow what we've already done in MicroOS and do not use swap by default.
The idea of having swap to give a system a buffer when memory is running out just doesn't work well in 2022.
Historically, the kernel has also used swap to do things like memory defragmentation. Is this no longer the case today?

I personally configure a small amount of swap (counted in megabytes, not gigabytes), which allows these mechanisms to still work and which fills up fast enough to let the kernel's OOM killer do its job quickly in case something goes really wrong.

--
Stefan Seyfried

"For a successful technology, reality must take precedence over public relations, for nature cannot be fooled." -- Richard Feynman
![](https://seccdn.libravatar.org/avatar/77cb4da5f72bc176182dcc33f03a18f3.jpg?s=120&d=mm&r=g)
On 13/07/2022 22.12, Stefan Seyfried wrote:
On 13.07.22 21:56, Richard Brown wrote:
I think a far more reasonable solution would be to follow what we've already done in MicroOS and do not use swap by default.
The idea of having swap to give a system a buffer when memory is running out just doesn't work well in 2022.
Historically, the kernel has also used swap to do things like memory defragmentation. Is this no longer the case today?
I personally configure a low amount of swap (counted in megabytes, not gigabytes) which allows these mechanisms to still work and which fill up fast enough to let the kernel's oom killer do its job quickly in case of something going really wrong.
I would simply not be able to use this laptop without swap. It has 4 GiB of RAM and 11 of swap, of which 2.3 are in use this minute. This allows having 1.8 of free RAM and 832 of buffer. Sure, having more RAM would be nicer, but it is not going to happen.

--
Cheers / Saludos,
Carlos E. R.
(from openSUSE 15.3 (Legolas))
![](https://seccdn.libravatar.org/avatar/bce881f00c17a1bf997473f19b54e1d4.jpg?s=120&d=mm&r=g)
On Wed, Jul 13, Stefan Seyfried wrote:
On 13.07.22 21:56, Richard Brown wrote:
I think a far more reasonable solution would be to follow what we've already done in MicroOS and do not use swap by default.
The idea of having swap to give a system a buffer when memory is running out just doesn't work well in 2022.
Historically, the kernel has also used swap to do things like memory defragmentation. Is this no longer the case today?
No idea, but none of my desktop machines which still have swap enabled are using it... -- Thorsten Kukuk, Distinguished Engineer, Senior Architect SUSE Software Solutions Germany GmbH, Frankenstraße 146, 90461 Nuernberg, Germany Managing Director: Ivo Totev, Andrew Myers, Andrew McDonald, Martje Boudien Moerman (HRB 36809, AG Nürnberg)
![](https://seccdn.libravatar.org/avatar/e62278afb8c40f7be938fa93c59f10a2.jpg?s=120&d=mm&r=g)
On 14.07.22 08:47, Thorsten Kukuk <kukuk@suse.de> wrote:
On Wed, Jul 13, Stefan Seyfried wrote:
On 13.07.22 21:56, Richard Brown wrote:
I think a far more reasonable solution would be to follow what we've already done in MicroOS and do not use swap by default.
The idea of having swap to give a system a buffer when memory is running out just doesn't work well in 2022.
Historically, the kernel has also used swap to do things like memory defragmentation. Is this no longer the case today?
No idea, but none of my desktop machines which still have swap enabled are using it...
In the HPC area, a lot of deployment is done in ramdisks, with image sizes of several GB. If you now want to use the full RAM of your machine, you simply add swap space, and the ramdisk will be swapped out under memory pressure. So there are valid use cases for swap space in 2022. But if an OOM killer gets active, you are doomed anyway.

kind regards,
Christian
![](https://seccdn.libravatar.org/avatar/77cb4da5f72bc176182dcc33f03a18f3.jpg?s=120&d=mm&r=g)
On 14/07/2022 08.54, cgoll@suse.de wrote:
On 14.07.22 08:47, Thorsten Kukuk <kukuk@suse.de> wrote:
On Wed, Jul 13, Stefan Seyfried wrote:
On 13.07.22 21:56, Richard Brown wrote:
I think a far more reasonable solution would be to follow what we've already done in MicroOS and do not use swap by default.
The idea of having swap to give a system a buffer when memory is running out just doesn't work well in 2022.
Forgot to say, as someone who uses it, that swap actually works better (faster) in 2022: there are SSDs and NVMe. There are two improvements: I/O speed, and nearly no effect from fragmentation. The comparison with swap on rotating rust is dramatic.
Historically, the kernel has also used swap to do things like memory defragmentation. Is this no longer the case today?
No idea, but none of my desktop machines which still have swap enabled are using it...
In the HPC area lots of deployment is done in ramdisks, with images sizes of several GBs. If you want to use now the full RAM of your machine, you simply add swap space and the ramdisk will be swapped out on memory pressure. So there are valid cases for swap space in 2022. But if a OOM killer gets active you are doomed anyway.
Right. You can only expect a graceful system death :-) -- Cheers / Saludos, Carlos E. R. (from openSUSE 15.3 (Legolas))
![](https://seccdn.libravatar.org/avatar/ed90d0132a4f59f2d3a1cf82a1b70915.jpg?s=120&d=mm&r=g)
On 14.07.22 08:47, Thorsten Kukuk wrote:
Historically, the kernel has also used swap to do things like memory defragmentation. Is this no longer the case today?
No idea, but none of my desktop machines which still have swap enabled are using it...
For this use case it is usually on the order of a few memory pages. If I understood it correctly, in the case of fragmented memory the kernel just swapped out the used pages that were in a bad place (e.g. splitting a large, potentially free memory region into two smaller ones) and swapped them back in at another place (possibly later, in case they were really accessed again).

My understanding back then was that it was relatively hard to do this (atomically, under memory pressure, ...) without swap. But it is entirely possible that this was solved a decade ago. This is why I wanted to ask someone with knowledge of the kernel internals to confirm this, rather than ask for anecdotal evidence. Reading my original mail, I see that I failed to do that and asked too generally.

So I'm asking again whether there is someone with knowledge of the kernel implementation who can confirm that having swap is no longer advised.

Anecdotal evidence will not be useful IMHO. For me, many things work just fine which you certainly would not want to use / keep (NIS... :-P)

Have fun :-)

--
Stefan Seyfried

"For a successful technology, reality must take precedence over public relations, for nature cannot be fooled." -- Richard Feynman
![](https://seccdn.libravatar.org/avatar/ca718a38fd132c320ab1c8a9ab68d372.jpg?s=120&d=mm&r=g)
On 14.07.22 08:47, Thorsten Kukuk wrote:
On Wed, Jul 13, Stefan Seyfried wrote:
Historically, the kernel has also used swap to do things like memory defragmentation. Is this no longer the case today?
No idea, but none of my desktop machines which still have swap enabled are using it...
And how much paging to filesystems is going on?

Regards
Oliver
![](https://seccdn.libravatar.org/avatar/0effd334ce454cd0a6157ad75536eff8.jpg?s=120&d=mm&r=g)
Thanks for your feedback!

As I see it, Fedora, for example, has decided to use a separate tool (systemd-oomd) instead of something like an "experimental patch for the kernel OOM killer". Is this not an indication that the kernel's OOM killer is not enough?

Also, is there any way for the kernel's OOM killer to detect poor responsiveness of the user interface? AFAIK, when RAM is full, the TTY and services work normally and the user can interact with them, which I can't say about the DE/WM.
![](https://seccdn.libravatar.org/avatar/c21fc637ef5f0f7d2923ee4000ad0a86.jpg?s=120&d=mm&r=g)
On 7/13/22 2:24 PM, Ivan Vorstanenko wrote:
Also, is there any way for the kernel's OOM killer to detect poor responsiveness of the user interface? AFAIK, when RAM is full, the TTY and services work normally and the user can interact with them, which I can't say about the DE/WM.

I don't know if this is what you've been seeing, but a few years ago I was having severe performance problems when loading a (very) large Google spreadsheet (and with other sudden demands for large amounts of RAM). The system didn't run out of RAM+swap space, but it did exceed RAM, and response would, essentially, stall for minutes at a time. After reading https://doc.opensuse.org/documentation/leap/tuning/html/book-tuning/cha-tuni..., I now set:

sysctl vm/watermark_scale_factor=1000

and I haven't had that problem since. I assume kswapd is causing higher CPU overhead, but I don't notice it.

David
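To put the setting above into numbers: the kernel documentation describes `vm.watermark_scale_factor` in units of 1/10,000 of available memory, with a default of 10. The following is only a rough sketch of that arithmetic; the function name and the 8 GiB figure are made up for illustration, and the real watermarks are computed per memory zone, so actual values will differ.

```python
GIB = 1024 ** 3


def watermark_gap_bytes(total_bytes: int, scale_factor: int) -> int:
    """Approximate distance between memory watermarks (hypothetical helper).

    The unit of vm.watermark_scale_factor is fractions of 10,000,
    so 10 means about 0.1% of memory and 1000 means about 10%.
    """
    return total_bytes * scale_factor // 10_000


# Assumed example machine with 8 GiB of RAM (not a figure from the thread).
default_gap = watermark_gap_bytes(8 * GIB, 10)
tuned_gap = watermark_gap_bytes(8 * GIB, 1000)

print(f"default (10):  {default_gap / GIB:.3f} GiB")
print(f"tuned (1000):  {tuned_gap / GIB:.3f} GiB")
```

With a larger gap, kswapd starts reclaiming earlier and keeps more memory free, which is consistent with the higher CPU overhead mentioned above.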
![](https://seccdn.libravatar.org/avatar/0effd334ce454cd0a6157ad75536eff8.jpg?s=120&d=mm&r=g)
'''
After reading https://doc.opensuse.org/documentation/leap/tuning/html/book-tuning/cha-tuni..., I now set: sysctl vm/watermark_scale_factor=1000 and I haven't had that problem since. I assume kswapd is causing higher CPU overhead, but I don't notice it.
'''

I'm not sure that end users want to apply all these tips to fix freezes when RAM is full :) There is a reason why systemd-oomd was created and added to Fedora (instead of earlyoom). And I think the main reason is that the kernel's OOM killer can't cope 'out of the box' with DE/WM freezes when RAM is full.
![](https://seccdn.libravatar.org/avatar/db61becba2ff9d091e7d462f1b3178ac.jpg?s=120&d=mm&r=g)
On Thursday 14 July 2022, Ivan Vorstanenko wrote:
Thanks for your feedback!
As I see, today, for example, Fedora has decided to use another tool (systemd-oomd) instead something like "experimental patch for kernel oom killer". Is this not an indication that kernel's oom killer is not enough?
Also, are there any ways of kernel's oom killer to see a bad responsiveness of user interface? AFAIK, when RAM is filled, tty and services work normal and user can interract with them, what I can't say about DE/WM.
As far as the desktop is concerned, an Out of Memory killer seems like a sledgehammer approach. I suspect that in many cases a termination may be inappropriate. On a desktop system it's usually a user process that decides to run away with itself. It would be easy to put in place some kind of session-minder that monitors user processes and potentially auto-suspends runaways before things get out of hand. The session-minder could ask the user what to do or apply some preset policies.

On my own desktop I've gone part way down that track. I have a system-tray application that reports any process consuming excessive CPU or memory: https://github.com/digitaltrails/procno

procno has a bling GUI aspect, but the part most important to me is that it sits minimised in the tray and raises alerts when necessary. It allows me to bring up info on the process, and even to signal it with a terminate. I just have to get around to adding auto-suspend for non-critical processes.

Also, with respect to swap: as others have written, running deeply into swap is less of an issue if swapping to NVMe. My desktop is still responsive enough to take action while swap is rapidly filling up. As more of us start using NVMe, different approaches are likely to be appropriate rather than any one-size-fits-all OOM killer.

Michael
![](https://seccdn.libravatar.org/avatar/8df291265238395e793d45e1e572336d.jpg?s=120&d=mm&r=g)
On Thursday, 14 July 2022 5:26:09 AM ACST Richard Brown wrote:
[...] The idea of having swap to give a system a buffer when memory is running out just doesn't work well in 2022.
Better to accept that problem and remove the root cause, rather than add more junk to the stack to make the already obsolete safety blanket a waste of diskspace that serves no purpose at all.
Regards,
Richard
I haven't used swap for the last 5-6 years (at least). Totally unnecessary if you have enough RAM for your normal workload + headroom.

--
Rodney Baker
rodney.baker@iinet.net.au
![](https://seccdn.libravatar.org/avatar/c7f67274b53ad9fa88dd65366acee36e.jpg?s=120&d=mm&r=g)
On Wed, 2022-07-13 at 21:56 +0200, Richard Brown wrote:
On Wed, 2022-07-13 at 17:17 +0000, Ivan Vorstanenko wrote:
Hello us! Some distributions have OOM Killer by default. For example, Fedora, Ubuntu (both use systemd-oomd since 34 and 22.04 respectively) and Garuda (nohang) There are next available "killers" : * systemd-oomd (https://www.freedesktop.org/software/systemd/man/systemd-oomd.service.html) * nohang (https://github.com/hakavlad/nohang) * earlyoom (https://github.com/rfjakob/earlyoom) * oomd (Facebook variant, https://github.com/facebookincubator/oomd)
From me : maybe, systemd-oomd will be the best solution. It supports by Red Hat, it is a part of systemd and it has good configuration for desktop/server.
I would like not just to find out which is better, but to achieve the addition of a "killer" as default to our favourite distribution :)
The Kernel already has an exceptionally useful oom killer.
All of the above suggestions primarily add just more complexity trying to kill things before swap fills up, because a system that is swapping is (normally) utterly unresponsive.
And this is fundamentally never going to change unless disk access speeds dramatically increase while memory access speeds stop their historically faster pace of increase.
I think a far more reasonable solution would be to follow what we've already done in MicroOS and do not use swap by default.
The idea of having swap to give a system a buffer when memory is running out just doesn't work well in 2022.
Better to accept that problem and remove the root cause, rather than add more junk to the stack to make the already obsolete safety blanket a waste of diskspace that serves no purpose at all.
Full agreement, with one thing to add.

Running a desktop with many containers and/or Electron apps might put us back in a situation where swap space is useful. Since Flatpak apps of >500MB in size (looking at you, Teams) are a reality, I highly doubt that all of this RAM is going to be hot all the time.

If you run several of these big monsters on your desktop, you might end up in a situation where a lot of bloated code is never going to be used at all. Without swap, the Linux kernel cannot reuse this memory as disk buffer, with the resulting performance penalty. This is wasted memory on a not insignificant scale.

Especially when looking ahead to containerized desktop applications, and given the reality that bloated desktop containers exist, swap space might still be a relevant (yet complicated) tuning factor for a desktop system.

Best,
Felix
Regards,
Richard
![](https://seccdn.libravatar.org/avatar/5b748275c3dbb1ceee18ed554486547d.jpg?s=120&d=mm&r=g)
On Thursday 2022-07-14 09:52, Felix Niederwanger wrote:
Since having flatpak apps with >500MB in size (looking at you teams) is a reality, I highly doubt that all of this ram is going to be hot all the time.
It should be pretty hot.

Program code is mmapped and will not make it to swap (if it is needed again, it can simply be read back from its original /usr location).

So swap mostly contains the runtime variable data, which is normally a lot hotter than program code already. And then it's even hotter because of suboptimal programming factors, like unnecessary padding between variables, which contributes to low utilization within a block of memory, or the use of lots of pointers, which contributes to cache unfriendliness.
![](https://seccdn.libravatar.org/avatar/c7f67274b53ad9fa88dd65366acee36e.jpg?s=120&d=mm&r=g)
On Thu, 2022-07-14 at 11:19 +0200, Jan Engelhardt wrote:
On Thursday 2022-07-14 09:52, Felix Niederwanger wrote:
Since having flatpak apps with >500MB in size (looking at you teams) is a reality, I highly doubt that all of this ram is going to be hot all the time.
It should be pretty hot.
Program code is mmapped, will not make it to swap (if you read it back, you can just read it back from its original /usr location).
So swap mostly contains the runtime variable data, which is normally a lot hotter than program code already. And then it's even hotter, because of suboptimal programming factors like unnecessary padding between variables that contributes to low utilization within a block of memory, or the use of lots of pointers contributes to cache unfriendliness.
Thanks for the clarification! And also: ouch ... X-D
![](https://seccdn.libravatar.org/avatar/37ce46f3bb7af09b1da428d24b87bd4a.jpg?s=120&d=mm&r=g)
On Wed, Jul 13, 2022, at 3:56 PM, Richard Brown wrote:
The Kernel already has an exceptionally useful oom killer.
The kernel's OOM killer is designed to protect the kernel. It's not intended to maintain responsiveness of a system from the user's perspective. That's the realm of resource control.

If (open)SUSE folks want to consider a user space OOM killer, I think the simplest one is earlyoom. More sophisticated is systemd-oomd, but in that case you should also include and enable uresourced, which includes cgroupify. https://gitlab.freedesktop.org/benzea/uresourced

The effect of uresourced is to set minimum resources for the desktop environment. What should be true is that, if the minimum resources are available for all the core desktop processes that the responsiveness of the UI depends on, it won't hang or be significantly delayed. There is some pending work still to wire up the IO isolation for this, but the good news on (open)SUSE is the use of Btrfs, which is known to have good IO isolation capability with cgroup 2. The not so great news is that anything on device-mapper (including dm-crypt) doesn't yet, and I'm not sure how much work is needed or pending to make that happen; hence one of the reasons behind the Btrfs fscrypt effort.

The effect of cgroupify is application specific, particularly for web browsers, which spin off subprocesses for each tab. Since oomd only kills based on cgroups, when e.g. Firefox and all its tabs are in a single cgroup, oomd will kill the whole program rather than a particular tab. Whereas cgroupify will split out the tabs into their own cgroups, permitting oomd to, in effect, kill off specific tabs.

There is as yet no notification mechanism on any desktop for this. I think there should be a notification for the user rather than leaving it to them to figure out that, oh yeah, in fact some tabs are missing and it's not my memory that's at fault! These events are logged in the journal, but I think from a desktop UI/UX perspective that's insufficient. But the question then is what is sufficient while also not being confusing?

And then we'd need the user space OOM kill service to have something like a D-Bus API in order to communicate the kill event info to the desktop.
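The difference that per-tab cgroups make can be shown with a toy model. This is a minimal sketch only: the "kill the largest cgroup" policy, the cgroup names, and the memory figures are all hypothetical, and it is not how systemd-oomd actually selects a victim.

```python
from typing import Dict


def pick_victim(usage_by_cgroup: Dict[str, int]) -> str:
    """Return the cgroup using the most memory (hypothetical policy)."""
    return max(usage_by_cgroup, key=lambda name: usage_by_cgroup[name])


# Whole browser in one cgroup: the only possible victim is the entire program.
single_cgroup = {
    "app-firefox.scope": 3000,  # MiB, made-up numbers
    "app-editor.scope": 400,
}

# Hypothetical layout after the tabs are split into their own cgroups.
split_cgroups = {
    "firefox-main": 500,
    "firefox-tab-1": 2000,
    "firefox-tab-2": 500,
    "app-editor.scope": 400,
}

print(pick_victim(single_cgroup))  # the whole browser goes
print(pick_victim(split_cgroups))  # only the heaviest tab goes
```

In the second layout the rest of the browser survives, which is the effect described for cgroupify above.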
All of the above suggestions primarily add just more complexity trying to kill things before swap fills up, because a system that is swapping is (normally) utterly unresponsive.
Yeah so in later kernels this has improved considerably but was so bad before that I can't really quantify the improvement. It is more tolerable than before but still depends on the workload. There is upstream kernel work pending to improve things further, so that there aren't so many knobs and configuration to choose from for users (or distros for that matter). e.g. mm, swap, and zswap. Right now swap is also cgroup aware when it's on a plain partition or a btrfs swapfile, but zswap (not to be confused with zram) is not yet cgroup aware but that's planned. But anyway the idea is to make all these things just work correctly together.
And this is fundamentally never going to change unless disk access speeds dramatically increase while memory access speeds stop their historically faster pace of increase.
It is possible to put all the desktop GUI related things into their own cgroup, and set a minimum amount of resources: IO, memory, CPU. With uresourced 2 of 3 are set. The missing one is IO, and this right now needs knowledge of actual device bandwidth and latency. One idea is to use a database. Another is to dynamically determine device performance, but I'm not sure if that complexity is worth it just to better handle some SSDs with crappy firmware and significant performance dropoffs in certain heavy write workloads.
I think a far more reasonable solution would be to follow what we've already done in MicroOS and do not use swap by default.
This is worse. You end up with no method to remove anonymous pages from RAM, and that means that in low-memory situations the kernel has no choice but to drop file pages. When the file pages are needed again, they have to be read from disk. And now you get behavior that looks very much like swap thrashing.

https://chrisdown.name/2018/01/02/in-defence-of-swap.html

You definitely need some swap to allow stale anonymous pages to be evicted from memory. Otherwise you're just wasting RAM *and* CPU *and* IO, which results in the resource control problem you're trying to avoid.

The central problem with overcommitted resources is how to properly limit different classes of programs. There is no single opinion that's correct; it's all a bunch of tradeoffs. But we might be able to follow an 80/20 rule and do the right thing 80% of the time for most programs; and then 20% of the time it's wrong, and we'll need those programs, or a wrapper script for them, to give a hint to systemd so that those apps are exempt from oomd or have different-than-default limits on their resources as needed.

But in any case I don't think it's a completely unsolvable problem; we can't let perfect be the enemy of good.

--
Chris Murphy
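The anonymous versus file page split described above is visible in `/proc/meminfo`. The sketch below parses a hard-coded sample (made-up numbers, not taken from any machine in this thread) so that it runs anywhere; the helper names are hypothetical, and treating all file pages as droppable is a simplification, since dirty pages must be written back first.

```python
from typing import Dict

# Made-up sample in /proc/meminfo format; on a real system, read the file instead.
SAMPLE_MEMINFO = """\
MemTotal:        4000000 kB
Active(anon):    1500000 kB
Inactive(anon):   900000 kB
Active(file):     300000 kB
Inactive(file):   200000 kB
SwapTotal:             0 kB
"""


def parse_meminfo(text: str) -> Dict[str, int]:
    """Map each /proc/meminfo key to its numeric value (kB for most keys)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if sep and rest.split():
            values[key.strip()] = int(rest.split()[0])
    return values


def anon_kb(info: Dict[str, int]) -> int:
    """Anonymous pages: these can only leave RAM if there is swap."""
    return info["Active(anon)"] + info["Inactive(anon)"]


def file_kb(info: Dict[str, int]) -> int:
    """File-backed pages: these can be dropped and re-read from disk."""
    return info["Active(file)"] + info["Inactive(file)"]


info = parse_meminfo(SAMPLE_MEMINFO)
print("anonymous:", anon_kb(info), "kB; file-backed:", file_kb(info), "kB")
if info["SwapTotal"] == 0:
    print("no swap: only the file-backed part can be reclaimed")
```

In this sample, with no swap configured, the kernel could only reclaim from the much smaller file-backed portion, which is the situation the message above warns about.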
![](https://seccdn.libravatar.org/avatar/37ce46f3bb7af09b1da428d24b87bd4a.jpg?s=120&d=mm&r=g)
Upstream issues that relate to resource control and oomd:

* https://github.com/systemd/systemd/issues/23557
* https://github.com/systemd/systemd/issues/22903
* https://github.com/systemd/systemd/pull/22937
* https://github.com/systemd/systemd/issues/16403
* https://github.com/systemd/systemd/pull/23325
* https://github.com/systemd/systemd/issues/20649
* https://gitlab.gnome.org/GNOME/gnome-settings-daemon/-/merge_requests/295

--
Chris Murphy
![](https://seccdn.libravatar.org/avatar/44739d06bdcd4d7db592a906f33292a3.jpg?s=120&d=mm&r=g)
Dne 13. 07. 22 v 19:17 Ivan Vorstanenko napsal(a):
Hello us! Some distributions have OOM Killer by default. For example, Fedora, Ubuntu (both use systemd-oomd since 34 and 22.04 respectively) and Garuda (nohang) There are next available "killers" : * systemd-oomd (https://www.freedesktop.org/software/systemd/man/systemd-oomd.service.html) * nohang (https://github.com/hakavlad/nohang) * earlyoom (https://github.com/rfjakob/earlyoom) * oomd (Facebook variant, https://github.com/facebookincubator/oomd)
From me : maybe, systemd-oomd will be the best solution. It supports by Red Hat, it is a part of systemd and it has good configuration for desktop/server.
I would like not just to find out which is better, but to achieve the addition of a "killer" as default to our favourite distribution :)
I am not sure how good they are, as Ubuntu had issues after the introduction of systemd-oomd: https://www.omgubuntu.co.uk/2022/06/ubuntu-22-04-systemd-oom-killing-apps

I have higher hopes for properly fixing kernel memory management: https://www.phoronix.com/scan.php?page=search&q=MGLRU or https://lwn.net/Articles/900288/
![](https://seccdn.libravatar.org/avatar/af8a9293484ed04b89081d848929b19a.jpg?s=120&d=mm&r=g)
On Wed, Jul 13, 2022 at 7:47 PM Daniel Noga <noga.dany@gmail.com> wrote:
Dne 13. 07. 22 v 19:17 Ivan Vorstanenko napsal(a):
Hello us! Some distributions have OOM Killer by default. For example, Fedora, Ubuntu (both use systemd-oomd since 34 and 22.04 respectively) and Garuda (nohang) There are next available "killers" : * systemd-oomd (https://www.freedesktop.org/software/systemd/man/systemd-oomd.service.html) * nohang (https://github.com/hakavlad/nohang) * earlyoom (https://github.com/rfjakob/earlyoom) * oomd (Facebook variant, https://github.com/facebookincubator/oomd)
From me : maybe, systemd-oomd will be the best solution. It supports by Red Hat, it is a part of systemd and it has good configuration for desktop/server.
I would like not just to find out which is better, but to achieve the addition of a "killer" as default to our favourite distribution :)
I am not sure, how good they are as Ubuntu had issues after introduction of systemd-oomd: https://www.omgubuntu.co.uk/2022/06/ubuntu-22-04-systemd-oom-killing-apps
We haven't had the same problems in Fedora, so I don't know what they're doing wrong in Ubuntu. Maybe it has to do with snaps vs RPMs and Flatpaks?
I have higher hopes for proper fixing kernel memory management https://www.phoronix.com/scan.php?page=search&q=MGLRU or https://lwn.net/Articles/900288/
That's not the same as PSI handling and such. And kernel OOM management is not going to improve. We went through this discussion for a long time before we finally shipped earlyoom and later replaced it with systemd-oomd + swap on zram.

The issue and change documents:

* https://pagure.io/fedora-workstation/issue/98
* https://fedoraproject.org/wiki/Changes/CGroupsV2
* https://fedoraproject.org/wiki/Changes/EnableEarlyoom
* https://fedoraproject.org/wiki/Changes/EnableSystemdOomd
* https://fedoraproject.org/wiki/Changes/SwapOnZRAM

Linux systems need _some_ kind of swap, and swap on ZRAM works pretty well, even in low memory environments. Combined with cgroupv2, systemd-oomd, and btrfs, it's usually a pretty responsive experience.

--
真実はいつも一つ!/ Always, there's only one truth!
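For readers unfamiliar with PSI (pressure stall information): the kernel exposes it in files such as `/proc/pressure/memory`, with one `some` line and one `full` line of averaged stall percentages. Below is a minimal sketch of reading that format. The sample values and the 60% limit are invented for illustration, and the check is far simpler than what a real daemon such as systemd-oomd does.

```python
from typing import Dict

# Made-up sample in the PSI file format; on a real system,
# read /proc/pressure/memory or a cgroup's memory.pressure file.
SAMPLE_PSI = """\
some avg10=72.50 avg60=40.10 avg300=12.00 total=123456
full avg10=65.25 avg60=30.00 avg300=8.00 total=100000
"""


def parse_psi(text: str) -> Dict[str, Dict[str, float]]:
    """Parse PSI text into {"some": {...}, "full": {...}}."""
    result = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        kind, pairs = fields[0], fields[1:]
        result[kind] = {k: float(v) for k, v in (p.split("=", 1) for p in pairs)}
    return result


def over_limit(psi: Dict[str, Dict[str, float]], limit_percent: float) -> bool:
    """Hypothetical check: is the 10-second 'full' stall average above the limit?"""
    return psi["full"]["avg10"] > limit_percent


psi = parse_psi(SAMPLE_PSI)
print("full avg10:", psi["full"]["avg10"])
print("over 60% limit:", over_limit(psi, 60.0))
```

The point of a pressure-based signal is that it measures time tasks spend stalled on memory, rather than how much memory is free, which is why it is a different thing from reclaim improvements such as MGLRU.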
![](https://seccdn.libravatar.org/avatar/fd11770e2c72c423dd3e236b1ec2e094.jpg?s=120&d=mm&r=g)
Am 14.07.22 um 02:39 schrieb Neal Gompa:
* https://fedoraproject.org/wiki/Changes/SwapOnZRAM
Linux systems need _some_ kind of swap, and swap on ZRAM works pretty well, even in low memory environments.
Thanks for the tip! I've always wondered if something like this exists. The package is also available on Tumbleweed, though I guess for the regular user some kind of YaST integration would be nice. Aaron
![](https://seccdn.libravatar.org/avatar/ba6138f793e72be6644854fdc3ec2f02.jpg?s=120&d=mm&r=g)
Hello, Am 14.07.22 um 02:39 schrieb Neal Gompa:
* https://fedoraproject.org/wiki/Changes/SwapOnZRAM
Linux systems need _some_ kind of swap, and swap on ZRAM works pretty well, even in low memory environments.
The above article contains a link to https://chrisdown.name/2018/01/02/in-defence-of-swap.html which seems to be from the beginning of 2018, so perhaps it is already outdated?

I know nothing about kernel internal memory management, so the following could be stupid:

I wonder: when the kernel needs some space to do certain memory management optimization tasks, why can't the kernel automatically reserve such space, so that it works without manual admin configuration?

By the way:

In the past I learned that hibernation needs swap of about the size of the RAM, and it seems this is not outdated according to https://help.ubuntu.com/community/SwapFaq which is "last edited 2022-02-01".

But swap of about the size of the RAM conflicts with an efficient OOM killer, except perhaps when swap is on an SSD, which might be fast enough to avoid a "system hung up" user experience, i.e. the system gets slow (of course) but is still sufficiently usable.

Kind Regards
Johannes Meixner

--
SUSE Software Solutions Germany GmbH
Frankenstr. 146 - 90461 Nuernberg - Germany
GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman
(HRB 36809, AG Nuernberg)
![](https://seccdn.libravatar.org/avatar/af8a9293484ed04b89081d848929b19a.jpg?s=120&d=mm&r=g)
On Fri, Jul 15, 2022 at 6:49 AM Johannes Meixner <jsmeix@suse.de> wrote:
Hello,
Am 14.07.22 um 02:39 schrieb Neal Gompa:
* https://fedoraproject.org/wiki/Changes/SwapOnZRAM
Linux systems need _some_ kind of swap, and swap on ZRAM works pretty well, even in low memory environments.
the above article contains a link to https://chrisdown.name/2018/01/02/in-defence-of-swap.html which seems to be from beginning of 2018 so perhaps that is already outdated?
Nope, still true.
I know nothing about kernel internal memory management so the following could be stupid:
I wonder when the kernel needs some space to do certain memory management optimization tasks, why the kernel cannot automatically reserve such space so it could work without manual admin configuration?
Where would it reserve that space? It does some rudimentary stuff by not freeing memory automatically when pages are deallocated, but it doesn't do a lot more than that without swap being plugged into the system.
By the way:
In the past I had learned that for hibernation swap is needed that is about the size of the RAM and it seems this is not outdated according to https://help.ubuntu.com/community/SwapFaq which is "last edited 2022-02-01".
You cannot use hibernation on any system that is booted with UEFI secure boot at this time. The Linux kernel lockdown mode prevents it from working. There was some effort a while ago to fix this when Matthew Garrett was at Google, but it stalled out after he left. The majority of computers today will have lockdown triggered, so you can't use hibernation at all.
But swap that is about the size of the RAM contradicts with an efficient OOM killer - perhaps except when swap is on SSD which might be fast enough to avoid a "system hung up" user experience i.e. system gets slow (of course) but is still sufficiently usable.
Swap on ZRAM makes it so that you have one level of swap that is compressed pages on RAM. This can be complemented by a second swap file/partition on disk for memory pages that are not needed anytime soon (a "cold" page) or for supporting hibernation if your system permits it. -- 真実はいつも一つ!/ Always, there's only one truth!
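To see what such a two-level setup looks like on a running system, the swap areas and their priorities can be read from /proc/swaps; the kernel fills higher-priority areas first. The following is a minimal Python sketch and not part of any tool mentioned in this thread. The helper name `parse_swaps` and the sample device names, sizes, and priorities are made up for illustration.

```python
from pathlib import Path

# Hypothetical sample in /proc/swaps format: zram first, disk partition second.
SAMPLE = """\
Filename                Type        Size      Used  Priority
/dev/zram0              partition   4194300   0     100
/dev/nvme0n1p3          partition   8388604   0     -2
"""


def parse_swaps(text):
    """Return swap areas from /proc/swaps-style text, highest priority first."""
    areas = []
    for line in text.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) != 5:
            continue  # skip blank or malformed lines
        name, kind, size_kib, used_kib, priority = fields
        areas.append({
            "name": name,
            "type": kind,
            "size_kib": int(size_kib),
            "used_kib": int(used_kib),
            "priority": int(priority),
        })
    return sorted(areas, key=lambda a: a["priority"], reverse=True)


if __name__ == "__main__":
    path = Path("/proc/swaps")
    text = path.read_text() if path.exists() else SAMPLE
    for area in parse_swaps(text):
        print(f'{area["name"]}: priority {area["priority"]}, '
              f'{area["used_kib"]} of {area["size_kib"]} KiB used')
```

If the zram device shows the higher priority, cold pages only reach the disk-backed area once the zram device is full.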
![](https://seccdn.libravatar.org/avatar/ba6138f793e72be6644854fdc3ec2f02.jpg?s=120&d=mm&r=g)
Hello,
On 2022-07-15 13:43, Neal Gompa wrote:
On Fri, Jul 15, 2022 at 6:49 AM Johannes Meixner <jsmeix@suse.de> wrote:
Am 14.07.22 um 02:39 schrieb Neal Gompa:
* https://fedoraproject.org/wiki/Changes/SwapOnZRAM
Linux systems need _some_ kind of swap, and swap on ZRAM works pretty well, even in low memory environments. ... I wonder when the kernel needs some space to do certain memory management optimization tasks, why the kernel cannot automatically reserve such space so it could work without manual admin configuration?
Where would it reserve that space?
Where would the kernel reserve space for ZRAM?
When it can reserve space for ZRAM to be used as swap why can it not reserve some space directly for its memory management optimization tasks? It may even do it via ZRAM - the implementation details do not matter.
Kind Regards Johannes Meixner
-- SUSE Software Solutions Germany GmbH Frankenstr. 146 - 90461 Nuernberg - Germany GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman (HRB 36809, AG Nuernberg)
![](https://seccdn.libravatar.org/avatar/ed90d0132a4f59f2d3a1cf82a1b70915.jpg?s=120&d=mm&r=g)
(Disclaimer: my understanding of kernel memory management is weak, at best)
On 15.07.22 16:05, Johannes Meixner wrote:
Where would the kernel reserve space for ZRAM?
In main memory. But it is reserved once when setting up the zram device and then fixed. Maybe it is in the same "memory pool" as tmpfs and thus could even be swapped out onto real swap? I don't know.
When it can reserve space for ZRAM to be used as swap why can it not reserve some space directly for its memory management optimization tasks?
It certainly could. It's (as I understood it back then) just hard to do right. Atomic operations, locking, other ugly things, and all that hitting typically in times when memory is tight and you'd like a quick response time. And it is a tradeoff to make because this memory can then be used for nothing else (while ZRAM will actually be used to "swap out" other memory areas while compressing it, so it is not really lost for applications).
It may even do it via ZRAM - the implementation details do not matter.
Maybe this is the reason why "we only have swap on zram" works better than "no swap": because when the "normal" (non-zram) memory is fragmented, contiguous areas of it can be reclaimed by pushing some pages to zram swapdevice and paging it back in at some other place. -- Stefan Seyfried "For a successful technology, reality must take precedence over public relations, for nature cannot be fooled." -- Richard Feynman
![](https://seccdn.libravatar.org/avatar/ca718a38fd132c320ab1c8a9ab68d372.jpg?s=120&d=mm&r=g)
On 15.07.22 16:05, Johannes Meixner wrote:
Where would the kernel reserve space for ZRAM?
The kernel does not need to. The pages used for ZRAM come from the page allocator, like most other pages.
When it can reserve space for ZRAM to be used as swap why can it not reserve some space directly for its memory management optimization tasks? It may even do it via ZRAM - the implementation details do not matter.
ZRAM is effectively only a compression layer + cache in front of the real swap space. Regards Oliver
![](https://seccdn.libravatar.org/avatar/0e482cdf263bd0e0421da766878b774c.jpg?s=120&d=mm&r=g)
On 15/07/2022 13.43, Neal Gompa wrote:
On Fri, Jul 15, 2022 at 6:49 AM Johannes Meixner <jsmeix@suse.de> wrote:
Hello,
Am 14.07.22 um 02:39 schrieb Neal Gompa:
...
By the way:
In the past I had learned that for hibernation swap is needed that is about the size of the RAM and it seems this is not outdated according to https://help.ubuntu.com/community/SwapFaq which is "last edited 2022-02-01".
You cannot use hibernation on any system that is booted with UEFI secure boot at this time. The Linux kernel lockdown mode prevents it from working. There was some effort a while ago to fix this when Matthew Garrett was at Google, but it stalled out after he left.
The majority of computers today will have lockdown triggered, so you can't use hibernation at all.
I am confused. I am routinely using hibernation on machines with UEFI secure boot. Leap 15.3 default kernel, normal swap on SSD or NVME. -- Cheers / Saludos, Carlos E. R. (from openSUSE 15.3 (Legolas))
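One way to see which situation a given machine is in is to look at the kernel's own view. On kernels with the lockdown LSM, /sys/kernel/security/lockdown lists the modes with the active one in brackets, and /sys/power/state lists "disk" only when the kernel offers hibernation. The following is a rough Python sketch; the function names are invented here, it assumes securityfs is mounted, and distribution kernels may carry patches that change how lockdown and hibernation interact.

```python
import re
from pathlib import Path


def active_lockdown_mode(text):
    """Return the bracketed mode from lockdown file content, or None."""
    match = re.search(r"\[(\w+)\]", text)
    return match.group(1) if match else None


def hibernation_offered(power_state_text):
    """True if 'disk' is among the sleep states listed in /sys/power/state."""
    return "disk" in power_state_text.split()


def read_or_none(path):
    """Read a sysfs file, returning None if it is missing or unreadable."""
    try:
        return Path(path).read_text()
    except OSError:
        return None


if __name__ == "__main__":
    lockdown = read_or_none("/sys/kernel/security/lockdown")
    state = read_or_none("/sys/power/state")
    print("lockdown mode:",
          active_lockdown_mode(lockdown) if lockdown else "unknown")
    print("hibernation offered:",
          hibernation_offered(state) if state is not None else "unknown")
```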
![](https://seccdn.libravatar.org/avatar/ca718a38fd132c320ab1c8a9ab68d372.jpg?s=120&d=mm&r=g)
On 15.07.22 13:43, Neal Gompa wrote:
You cannot use hibernation on any system that is booted with UEFI secure boot at this time. The Linux kernel lockdown mode prevents it from working. There was some effort a while ago to fix this when Matthew Garrett was at Google, but it stalled out after he left.
Upstream now has a solution just about ready for inclusion. HTH Oliver
![](https://seccdn.libravatar.org/avatar/37ce46f3bb7af09b1da428d24b87bd4a.jpg?s=120&d=mm&r=g)
On Tue, Jul 19, 2022, at 9:16 AM, Oliver Neukum wrote:
On 15.07.22 13:43, Neal Gompa wrote:
You cannot use hibernation on any system that is booted with UEFI secure boot at this time. The Linux kernel lockdown mode prevents it from working. There was some effort a while ago to fix this when Matthew Garrett was at Google, but it stalled out after he left.
Upstream now has a solution just about ready for inclusion.
HTH Oliver
Can you elaborate? -- Chris Murphy
![](https://seccdn.libravatar.org/avatar/ca718a38fd132c320ab1c8a9ab68d372.jpg?s=120&d=mm&r=g)
On 20.07.22 19:17, Chris Murphy wrote:
On Tue, Jul 19, 2022, at 9:16 AM, Oliver Neukum wrote:
Upstream now has a solution just about ready for inclusion.
Can you elaborate?
There is an implementation that generates a key in the kernel and stores it in the TPM: https://lore.kernel.org/lkml/20220504232102.469959-1-evgreen@chromium.org/#t
In this case we had very few objections.
Regards Oliver
![](https://seccdn.libravatar.org/avatar/c21fc637ef5f0f7d2923ee4000ad0a86.jpg?s=120&d=mm&r=g)
On 7/15/22 3:49 AM, Johannes Meixner wrote:
I wonder when the kernel needs some space to do certain memory management optimization tasks, why the kernel cannot automatically reserve such space so it could work without manual admin configuration?
It was this kind of thinking that led me to look at /proc/sys/vm/watermark_scale_factor. Its setting doesn't reserve memory; rather it determines when kswapd starts running to free memory and how much memory to free. Its units are 0.01% of memory in the system, and the default value is 10 (0.1% of memory). (This is described in https://doc.opensuse.org/documentation/leap/tuning/html/book-tuning/cha-tuni...) This default may be good for systems with relatively predictable loads, as it would keep kswapd's overhead low, but desktop/laptop systems do not have predictable loads, and sudden demands for memory can be much greater than 0.1%. Of course, if you truly run out of memory, you have to kill something off, but I've found that using a startup script to set /proc/sys/vm/watermark_scale_factor to its maximum value of 1000 (10% of memory) makes degradation much smoother without noticeable overhead. I suggest reconsideration of the default value of /proc/sys/vm/watermark_scale_factor for desktop configurations. It's really much too low currently. David
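To put numbers on the units described above: the setting counts in steps of 0.01% of memory, so 10 corresponds to 0.1% and 1000 to 10%. The following Python sketch only does that arithmetic. The function names are made up, and the byte figure is a rough whole-system estimate, since the kernel computes watermarks per zone and also takes other settings into account.

```python
from pathlib import Path

SYSCTL_PATH = Path("/proc/sys/vm/watermark_scale_factor")


def scale_factor_to_percent(scale_factor):
    """Convert a watermark_scale_factor value (units of 0.01%) to percent."""
    return scale_factor / 100


def approx_watermark_gap_bytes(total_bytes, scale_factor):
    """Rough distance between watermarks for a given amount of memory.

    Simplification: treats all memory as one zone.
    """
    return total_bytes * scale_factor // 10_000


if __name__ == "__main__":
    total = 16 * 1024**3  # assumed example machine with 16 GiB of RAM
    for factor in (10, 1000):
        gap_mib = approx_watermark_gap_bytes(total, factor) / 1024**2
        print(f"factor {factor}: {scale_factor_to_percent(factor)}% "
              f"of memory, about {gap_mib:.0f} MiB")
    if SYSCTL_PATH.exists():
        print("current value:", SYSCTL_PATH.read_text().strip())
```

On the assumed 16 GiB machine, the default works out to roughly 16 MiB between watermarks, while the value of 1000 works out to roughly 1.6 GiB, which illustrates why kswapd starts so much earlier with the larger setting.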
![](https://seccdn.libravatar.org/avatar/0effd334ce454cd0a6157ad75536eff8.jpg?s=120&d=mm&r=g)
Also, I read Fedora's page about adding systemd-oomd, and I think we should add this OOM killer by default: https://fedoraproject.org/wiki/Changes/EnableSystemdOomd
![](https://seccdn.libravatar.org/avatar/977e0d76dacb86a9385d625f612fc0b3.jpg?s=120&d=mm&r=g)
On Thu, 2022-07-14 at 09:25 +0000, Ivan Vorstanenko wrote:
Also, I read Fedora's page about adding systemd-oomd, and I think we should add this OOM killer by default: https://fedoraproject.org/wiki/Changes/EnableSystemdOomd
This is a matter of resources on our side https://bugzilla.opensuse.org/show_bug.cgi?id=1200456
![](https://seccdn.libravatar.org/avatar/0effd334ce454cd0a6157ad75536eff8.jpg?s=120&d=mm&r=g)
Given the differing opinions, I think it would be right to try adding the "mainstream" solution (systemd-oomd) and, if there are problems, to think about replacing it with an alternative (or to continue testing and fixing it).
For everybody who wants to follow the changes: I have created a bug report that describes the lack of an OOM killer as a bug (because a system in the default configuration can freeze when a lot of apps are open): https://bugzilla.opensuse.org/show_bug.cgi?id=1201505. So if any developers want to add some other OOM killer, they can write there.
There is also a bug report about a problem with building the most "popular" OOM killer at the moment, systemd-oomd: https://bugzilla.opensuse.org/show_bug.cgi?id=1200456
I will try to fix the systemd-oomd build as best I can (I am not a "system" programmer, but openSUSE will help me become one :) ), so if someone else is also concerned about this problem, they can help fix the systemd-oomd build :)
![](https://seccdn.libravatar.org/avatar/ed90d0132a4f59f2d3a1cf82a1b70915.jpg?s=120&d=mm&r=g)
On 17.07.22 14:25, Ivan Vorstanenko wrote:
Given the differing opinions, I think it would be right to try adding the "mainstream" solution (systemd-oomd) and, if there are problems, to think about replacing it with an alternative (or to continue testing and fixing it).
For everybody who wants to follow the changes: I have created a bug report that describes the lack of an OOM killer as a bug (because a system in the default configuration can freeze when a lot of apps are open): https://bugzilla.opensuse.org/show_bug.cgi?id=1201505. So if any developers want to add some other OOM killer, they can write there.
There is also a bug report about a problem with building the most "popular" OOM killer at the moment, systemd-oomd: https://bugzilla.opensuse.org/show_bug.cgi?id=1200456
I'm missing the "how to reproduce" section in this bug report. Or, in other words: "fancy feature $FOO everyone else has (but which is not universally decided on being a good idea at all) is missing" is IMHO a weak bug report.
I'd rather see something like: "If I do $THIS, problem $THAT occurs. I'm pretty sure it can be avoided using $SOFTWARE", even better if it's "I avoided $THAT by using $SOFTWARE".
Right now I personally am not convinced that userspace OOM killers are such a brilliant idea. Even though it's obvious that the implementation of systemd-oomd certainly has some merit (looking at the memory pressure of the cgroups might be a good idea after all), I'm missing the hard facts beyond some fuzzy "it feels better" placebo effect. And if it is only for a placebo effect, the implementation is quite complex.
Maybe the same effect (or even some real improvement?) can be had much cheaper by putting vm.watermark_scale_factor = 1000 into sysctl.conf?
-- Stefan Seyfried "For a successful technology, reality must take precedence over public relations, for nature cannot be fooled." -- Richard Feynman
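For readers who want hard numbers rather than a feeling: the memory pressure mentioned above is exposed by the kernel as pressure stall information (PSI), system-wide in /proc/pressure/memory and per cgroup in the cgroup v2 file memory.pressure. The following Python sketch parses that format so the values can be logged while reproducing a problem. It is not how systemd-oomd is implemented; the function names, the sample values, and the threshold are all hypothetical.

```python
from pathlib import Path

# Made-up sample in the PSI file format ("some" and "full" lines).
SAMPLE = """\
some avg10=2.50 avg60=1.20 avg300=0.40 total=123456
full avg10=1.50 avg60=0.70 avg300=0.10 total=65432
"""


def parse_pressure(text):
    """Parse PSI text into {"some": {...}, "full": {...}} with float values."""
    result = {}
    for line in text.strip().splitlines():
        kind, *pairs = line.split()
        result[kind] = {
            key: float(value)
            for key, value in (pair.split("=", 1) for pair in pairs)
        }
    return result


def full_pressure_exceeds(pressure, limit_percent):
    """True if the 10-second 'full' average is above limit_percent."""
    return pressure.get("full", {}).get("avg10", 0.0) > limit_percent


if __name__ == "__main__":
    path = Path("/proc/pressure/memory")
    text = path.read_text() if path.exists() else SAMPLE
    pressure = parse_pressure(text)
    print(pressure)
    # 50.0 is an arbitrary illustration value, not a recommended limit.
    print("above limit:", full_pressure_exceeds(pressure, 50.0))
```

Logging these values with and without vm.watermark_scale_factor = 1000 under the same workload would be one way to get the comparison asked for above.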
![](https://seccdn.libravatar.org/avatar/59d914ad47e5c3fcd4c89668adcd43a2.jpg?s=120&d=mm&r=g)
Ivan Vorstanenko schrieb:
For everybody who wants to follow the changes: I have created a bug report that describes the lack of an OOM killer as a bug (because a system in the default configuration can freeze when a lot of apps are open): https://bugzilla.opensuse.org/show_bug.cgi?id=1201505. So if any developers want to add some other OOM killer, they can write there.
As others in this thread have said, there is no "lack of OOM Killer" in our openSUSE systems even now. The kernel has an OOM killer built in and has had one for ages, with a lot of improvements made over the years.
That said, adding a user-space OOM killer that can act before the emergency situation in which the kernel one kicks in has proven useful in some situations, so adding systemd-oomd or similar will definitely be a good idea for a slice of installations (not sure how large that slice is; it may be a big one).
Still, it's just wrong to talk as if there were no OOM killer around at all, because the kernel OOM killer is always there.
Robert Kaiser
![](https://seccdn.libravatar.org/avatar/0effd334ce454cd0a6157ad75536eff8.jpg?s=120&d=mm&r=g)
Yes, you're right, Robert Kaiser. I'm used to saying that because in user space it doesn't work the way I'd like it to (and not only me). But in fact, it is there. I'll get out of the habit and change the description of the bug report, thanks!
participants (20)
- Aaron Puchert
- Carlos E. R.
- Carlos E. R.
- cgoll@suse.de
- Chris Murphy
- Daniel Noga
- David Walker
- Felix Niederwanger
- Ivan Vorstanenko
- Jan Engelhardt
- Johannes Meixner
- Lubos Kocman
- Michael Hamilton
- Neal Gompa
- Oliver Neukum
- Richard Brown
- Robert Kaiser
- Rodney Baker
- Stefan Seyfried
- Thorsten Kukuk