[opensuse] How Much Swap for 512-GB of RAM?
Hi Folks,

Back in the old daze we used to allocate swap space three times as large as the installed RAM, as a rule of thumb. But I've got two new servers with 512-GB of ECC RAM and now I'm wondering: how much swap?

The motherboard has two 1-TB NVMe M.2 PCIe modules; it's tempting to use one for the operating system and the second for swap. Data will be stored on hardware RAID6 arrays and so isn't part of this calculation.

Any thoughts? 1 TB of swap on one M.2 for 0.5 TB of RAM?

Regards,
Lew
Lew Wolfgang wrote:
Hi Folks,
Back in the old daze we used to allocate swap space three times as large as the installed RAM as a rule of thumb. But I've got two new servers with 512-GB of ECC RAM and now I'm wondering, How Much Swap?
The motherboard has two-each 1-TB NVMe M.2 PCIe modules, it's tempting to use one for the operating system and the second for swap. Data will be stored on hardware RAID6 arrays and so aren't a part of this calculation.
Any thoughts? 1-TB of swap on one M.2 for .5-TB of RAM?
Much depends on your type of workload - anything that size we only use for virtual hosting, so no swap. If you're not doing virtual hosting, I expect you know the workload really well; there are not many things that require that amount of memory.

-- Per Jessen, Zürich (3.9°C)
On 09/01/2020 01:34, Per Jessen wrote:
there are not many things that require that amount of memory.
Pardon me playing semantic games: I'd say there are not many things that DEMAND that amount of memory, never mind needing any swap.

I have a Firefox-heavy, Thunderbird-heavy desktop on 8G that doesn't go to swap. It doesn't go to swap because I've configured my VM not to swap unless it is absolutely necessary. As it happens, my swap is on rotating rust, so it's slow and any swapping will slow the system down, so the adage is "don't do that!" If I had a SSD? Well, it's clear that I don't NEED to swap, so why would I degrade a SSD by using it for swap when I don't swap in the first place?

However I can imagine there are applications that make use of databases that can cache the DB in memory to improve performance. They don't need to run to swap either. There are applications that can make use of 'shared memory' for cooperating processes, and if they are told of larger amounts of shared memory they can perform better.

The adages about swap grew up, partially, in the days before virtual memory, when roll-out/roll-in systems were the norm for UNIX and a process was forked by swapping it out and then calling the in-core copy the fork, by playing games with the process table and pointers.

EVERYTHING to do with swap assumes inadequate memory. If you have adequate memory then, ipso facto, you don't swap. INADEQUATE MEMORY usually meant that you couldn't afford it, back in the days of core memory or 4Kx1 memory chips on boards for the PDP-11 or the PC-286. The progression DDR, DDR2, DDR3, DDR4 has made memory chips denser and cost-per-GB has fallen. That you are even talking about 0.5T RAM -- DROOL -- and a mobo that can handle it says a lot about how the industry is progressing. Can we expect discussions of double that here before long?

Yes, it will depend on your workload and its profile. One way to find out is to run without swap and see what crashes. Or what thrashes the VM.

Where I'm at:

# free -h --si
              total        used        free      shared  buff/cache   available
Mem:           7.8G        3.5G        2.5G        162M        1.8G        3.9G
Swap:          5.7G          0B        5.7G

but

# cat /proc/sys/vm/swappiness
10

And it is made permanent at boot time by

# tail /boot/sysctl.conf-5.4.8-1.g582f5cb-default
kernel.msgmnb = 65536
# Increase defaults for IPC (bnc#146656) (64-bit, 4k pages)
kernel.shmmax = 0xffffffffffffffff
# SHMALL = SHMMAX/PAGE_SIZE*(SHMMNI/16)
kernel.shmall = 0x0fffffffffffff00
# The desktop workload is sensitive to latency, so start writeout earlier
# (bnc#552883)
vm.dirty_ratio=20
#
vm.swappiness = 10

The point being that you can tune your VM system to control many aspects of how the VM system works. There's a lot of information on the various ways and means and reasons out there. Googling drowns you in it, many of the pages quite old.

<quote src="https://www.howtoforge.com/tutorial/linux-swappiness/">
Swappiness is the kernel parameter that defines how much (and how often) your Linux kernel will copy RAM contents to swap. This parameter's default value is “60” and it can take anything from “0” to “100”. The higher the value of the swappiness parameter, the more aggressively your kernel will swap.

Why change it?
--------------
The default value is a one-fit-all solution that can't possibly be equally efficient in all of the individual use cases, hardware specifications and user needs. Moreover, the swappiness of a system is a primary factor that determines the overall functionality and speed performance of an OS.
That said, it is very important to understand how swappiness works and how the various configurations of this element could improve the operation of your system and thus your everyday usage experience. As RAM memory is so much larger and cheaper than it used to be in the past, there are many users nowadays that have enough memory to almost never need to use the swap file. The obvious benefit that derives from this is that no system resources are ever occupied by the swapping process and that cached files are not moved back and forth from the RAM to the swap and vice versa for no reason.
....
The parameter value set to “60” means that your kernel will swap when RAM reaches 40% capacity. Setting it to “100” means that your kernel will try to swap everything. Setting it to 10 (like I did on this tutorial) means that swap will be used when RAM is 90% full, so if you have enough RAM memory, this could be a safe option that would easily improve the performance of your system.
</quote>

There's a countervailing feeling that emerges from the old dictum about VM: "Virtual memory means virtual performance", and the "eight megabytes and constantly swapping" description of an editor.

Most of us here are old enough to remember the days of MS-DOS, the highly interactive responsiveness of the applications on the original PC, despite the useless CPU and the 8-bit bus. Yes, it was a dedicated processor; yes, there was just the single thread -- the application. But more importantly the whole program was in memory. No paging, no swapping. Especially not on the systems that only had floppy disks.

====================================

No, really Lew, asking us to advise on how much swap is like asking on this forum how long it will take to get to ... London. You ask me, I might reply "about three hours", to which you ask "are they still running Concorde?" and I say, no, I'll use the 401.

There are 29 cities in the world called "London". There's even a "London" on Christmas Island. As far as the United States goes, you'll find two Londons in Alabama and Ohio and one each in Arizona, California, Indiana, Kentucky, Michigan, Minnesota, Missouri, Oregon, Pennsylvania, Tennessee, Texas, West Virginia and Wisconsin.

No, without knowing a lot about your workload and profiling it, we can't tell you how much swap to use .. or not.
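(For anyone who wants to try the tuning described above, a minimal sketch of changing swappiness, assuming a root shell; the sysctl.d file name below is only an example, not anything openSUSE mandates:)

  # runtime change, lost at the next reboot
  sysctl -w vm.swappiness=10

  # persistent change (example file name), then reload all sysctl settings
  echo 'vm.swappiness = 10' > /etc/sysctl.d/90-swappiness.conf
  sysctl --system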
On Thursday, 2020-01-09 at 08:33 -0500, Anton Aylward wrote:
On 09/01/2020 01:34, Per Jessen wrote:
there are not many things that require that amount of memory.
Pardon me playing semantic games:
I'd say there are not many things that DEMAND that amount of memory, never mind needing any swap.
I have a Firefox heavy, Thunderbird heavy desktop on 8G that doesn't go to swap. it doesn't go to swap because I've configured my VM not to swap unless it is absolutely necessary. As it happens, my swap is on rotating rust so its slow and any swapping will sow the system down, so the adage is "don't do that!" If I had a SSD? well, it's clear that I don't NEED to swap so why would I degrade a SSD by using it for swap when I don't swap in the first place?
This is not correct, but it is a common misconception. Your system will have less free memory and less memory used for buffers/cache. A machine with a certain amount of RAM and the same workload performs faster with swap than without.

About using SSD for swap, it is not currently a concern:

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   097   097   000    Old_age   Always       -       11291  *******
 12 Power_Cycle_Count       0x0032   099   099   000    Old_age   Always       -       942
177 Wear_Leveling_Count     0x0013   098   098   000    Pre-fail  Always       -       22     ****************
Where I'm at: # free -h --si total used free shared buff/cache available Mem: 7.8G 3.5G 2.5G 162M 1.8G 3.9G Swap: 5.7G 0B 5.7G
Your Thunderbird/Firefox are not that heavy.

Telcontar:~ # free -h --si
              total        used        free      shared  buff/cache   available
Mem:           8.0G        4.2G        2.3G        215M        1.4G        3.2G
Swap:           24G        7.5G         17G
Telcontar:~ #

-- Cheers, Carlos E. R. (from openSUSE 15.1 x86_64 at Telcontar)
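(For reference, the wear figures quoted above can be read with smartctl from smartmontools; a sketch, where /dev/sda and /dev/nvme0n1 are only example device names:)

  # SATA SSD: show the wear-related SMART attributes
  smartctl -A /dev/sda | grep -E 'Wear_Leveling_Count|Power_On_Hours|Reallocated_Sector_Ct'

  # NVMe devices report a "Percentage Used" figure instead
  smartctl -A /dev/nvme0n1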
On 09/01/2020 08:55, Carlos E. R. wrote:
This is not correct, but it is a common misconception. Your system will have less free memory and less memory used for buffers/cache. A machine with a certain amount of ram and the same workload performs faster with swap used than without.
????

Yes, it will have less free memory, but so what? What do you need the free memory FOR?

I can see settings where you need it for very dynamic new process creation, but let's face it, Linux takes the old UNIX model of shared binaries to a fantastic degree. The reality is that for many of us there is very little new process creation going on.

Yes, I can see that buffer space is needed. I'm not talking about absolute memory starvation/allocation. I still have around 2G available to be used for IO/network buffers/caching. I'm sure, even so, that there is already significant consumption for inode and dns caching.

The reality is that if, like me, you never swap AT ALL through the day, then what's the point of even creating, enabling swap?

No, the point I'm trying to make is to do PROFILING to see what is going on with your system. I'm saying that if you let swappiness=60 then you'll start swapping when only 40% of your memory is used. I think that is too low a threshold.
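(If anyone wants to do that profiling, a rough sketch of seeing which processes actually hold swap, using nothing more than /proc; the loop is an illustration, not openSUSE-specific:)

  # list the swap held per process, in kB, largest last
  for d in /proc/[0-9]*; do
    awk '/^Name:/{n=$2} /^VmSwap:/{print $2, n}' "$d/status" 2>/dev/null
  done | sort -n | tail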
On 09/01/2020 15.32, Anton Aylward wrote:
| On 09/01/2020 08:55, Carlos E. R. wrote:
|> This is not correct, but it is a common misconception. Your system will have less free memory and less memory used for buffers/cache. A machine with a certain amount of ram and the same workload performs faster with swap used than without.
|
| ????
|
| Yes, it will have less free memory, but so what? What do you need the free memory FOR?

Buffers and cache. You need those two as big as possible. They make the filesystem faster. This is measurable.

And free memory is needed because Linux is constantly starting and stopping processes, and if not, it is needed to enlarge the buffers/cache space when needed.

| I can see settings where you need it for very dynamic new process creation, but let's face it, Linux takes the old UNIX model of shared binaries to a fantastic degree. The reality is that for many of us there is very little new process creation going on.
|
| Yes, I can see that buffer space is needed. I'm not talking about absolute memory starvation/allocation. I still have around 2G available to be used for IO/network buffers/caching. I'm sure, even so, that there is already significant consumption for inode and dns caching.

You see 2G because your Thunderbird and Firefox are not as big as you claim they are. :-P

I have 4G free this minute.

| The reality is that if, like me, you never swap AT ALL though the day, then what's the point of even creating, enabling swap?

Bigger buffer/cache space.

| No, the point I'm trying to make is to do PROFILING to see what is going on with your system. I'm saying that if you let swappiness=60 then you'll start swapping when only 40% of your memory is used. I think that is too low a threshold.

My system simply crashes and OOMs if I kill swap. I know what to do, and that is purchasing another board that can take more RAM. Meanwhile, swap on SSD has delayed that by two years at least.

-- Cheers / Saludos, Carlos E. R. (from 15.1 x86_64 at Telcontar)
On 09/01/2020 14:50, Carlos E. R. wrote:
On 09/01/2020 15.32, Anton Aylward wrote: | On 09/01/2020 08:55, Carlos E. R. wrote: |> |> This is not correct, but it is a common misconception. Your |> system will have less free memory and less memory used for |> buffers/cache. A machine with a certain amount of ram and the |> same workload performs faster with swap used than without. | | ???? | | Yes, it will have less free memory, but so what? What do you need | the free memory FOR?
Buffers and cache. You need those two as big as possible. They make filesystem faster. This is measurable.
Up to a point. Beyond that it is of no benefit. In fact this isn't something you have control over in any direct manner. Many of those 'buffers' are actually code or data pages that are memory mapped disk pages. The VM system manages those. You only have indirect control over that.
And free memory is needed because Linux is constantly starting and stopping processes, and if not, it is needed to enlarge the buffers/cache space when needed.
I've discussed that. The starting and stopping makes use of shared resources; it always has, going all the way back, in my personal experience, to UNIX V5 in the 1970s. A new user logs in and gets a shell .. whatever ... and the code and some static data are already there. Only one instance of the code for the shell ... whatever ... no matter how many users. All done by pointers.

Modern Linux with VM does this in a more fine-grained manner, with the library modules being shared across different applications. And that's where the buffer/cache becomes the VM, because when a DOT-SO file is opened for use the disk image is mapped, and pages are loaded on demand as the execution progresses. AND ONLY THEN. So whether you call it a code page or a buffer is moot.

What makes that terminology even further confused is that the whole VM system consists of a series of pages on linked lists of some nature ... clean, as yet unused pages; dirty pages that are in use; dirty pages that have not been used for a while. As a page gets accessed it gets pulled to the tail of the queue. The head of the queue becomes the candidate for swapping -- MAYBE, according to an algorithm that uses many variables.

That a page has been 'swapped out' doesn't mean that it isn't still in memory, on a queue. In fact it might get accessed and pulled to the tail. But the shortcoming of the way this works means that its image is still out there on the swap. That doesn't get erased. There's a 'ratchet' mechanism, so to speak.

More to the point, the way this algorithm works, stuff gets swapped out when there is no shortage and no foreseeable need for recovered pages. This is what the 'swappiness' is about.
| I can see settings where you need it for very dynamic new process | creation, but let's face it, Linux takes the old UNIX model of | shared binaries to a fantastic degree. The reality is that for | many of us there is very little new process creation going on. | | Yes, I can see that buffer space is needed. I'm not talking about | absolute memory starvation/allocation. I still have around 2G | available to be used for IO/network buffers/caching. I'm sure, | even so, that there is already significant consumption for inode | and dns caching.
You see 2G because your Thunderbird and Firefox are not as big as you claim they are. :-P
I have 4G free this minute.
| | The reality is that if, like me, you never swap AT ALL though the | day, then what's the point of even creating, enabling swap?
Bigger buffer/cache space.
Which is, the way you are justifying it, an illusion. In actuality, not all memory is the same. There are special properties in low memory for pointers and certain types of tables that HAVE to be in low memory and they get allocated differently. There are page clustering effects that are NECESSARY so sometimes HUGE PAGES are created. The buffer/cache you speak of is not realistic and as a conceptual model, unhelpful and misleading.
| No, the point I'm trying to make is to do PROFILING to we what is | going on with your system. I'm saying that if you let | swappiness=60 then you'll start swapping when only 40% of you | memory is used. I think that is too low a threshold.
My system simply crashes and OOMS by killing swap.
You have control via VM settings of what happens in OOM conditions. Sadly the default is to scan ALL processes for candidates to kill, or default to a PANIC. You can, if you read through the docco I referred to, https://www.kernel.org/doc/Documentation/sysctl/vm.txt, alter that.

==============================================================
oom_kill_allocating_task

This enables or disables killing the OOM-triggering task in out-of-memory situations.

If this is set to zero, the OOM killer will scan through the entire tasklist and select a task based on heuristics to kill. This normally selects a rogue memory-hogging task that frees up a large amount of memory when killed.

If this is set to non-zero, the OOM killer simply kills the task that triggered the out-of-memory condition. This avoids the expensive tasklist scan.

If panic_on_oom is selected, it takes precedence over whatever value is used in oom_kill_allocating_task.

The default value is 0.
==============================================================

and

=============================================================
panic_on_oom

This enables or disables panic on out-of-memory feature.

If this is set to 0, the kernel will kill some rogue process, called oom_killer. Usually, oom_killer can kill rogue processes and system will survive.

If this is set to 1, the kernel panics when out-of-memory happens. However, if a process limits using nodes by mempolicy/cpusets, and those nodes become memory exhaustion status, one process may be killed by oom-killer. No panic occurs in this case. Because other nodes' memory may be free. This means system total status may be not fatal yet.

If this is set to 2, the kernel panics compulsorily even on the above-mentioned. Even oom happens under memory cgroup, the whole system panics.

The default value is 0. 1 and 2 are for failover of clustering. Please select either according to your policy of failover. panic_on_oom=2+kdump gives you very strong tool to investigate why oom happens. You can get snapshot.
=============================================================

More to the point here, you have settings that let you ANALYSE why the OOM occurred.
I know what to do, and is purchasing another board that can take more ram. Meanwhile, swap in ssd has delayed that by two years at least.
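(A sketch of the knob quoted above, for anyone who would rather have the OOM killer take out the allocating task instead of scanning the task list; assuming root, and the sysctl.d file name is only an example:)

  # runtime change only
  sysctl -w vm.oom_kill_allocating_task=1

  # persistent (example file name)
  echo 'vm.oom_kill_allocating_task = 1' > /etc/sysctl.d/91-oom.conf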
On 09/01/2020 23.26, Anton Aylward wrote:
| More to the point here, you have settings that let you ANALYSE why the OOM occurred.

I don't need to. Simply the sum of my processes is above 8 GB, and some need to be killed.

-- Cheers / Saludos, Carlos E. R. (from 15.1 x86_64 at Telcontar)
On 09/01/2020 23.37, Carlos E. R. wrote:
| On 09/01/2020 23.26, Anton Aylward wrote:
|> More to the point here, you have settings that let you ANALYSE why the OOM occurred.
|
| I don't need to. Simply the sum of my processes is above 8 GB, and some need to be killed.

Like a few minutes ago: I updated and restarted both Firefox and Thunderbird, and 4.5 GB were freed. If both apps use 4.5 GB, and the rest of the load is 3 or 4 GB, the machine simply doesn't run with 8 GB of RAM. No need for studies.

-- Cheers / Saludos, Carlos E. R. (from 15.1 x86_64 at Telcontar)
On 09/01/2020 17:41, Carlos E. R. wrote:
If both apps use 4.5 GB, and the rest of the load is 3 or 4 GB, simply the machine doesn't run with 8 GB of RAM. No need for studies.
What you are telling us is that you don't understand the purpose of a virtual memory system.

There is no need for all of the pages that make up an application to be loaded, only what is termed the 'working set'. Strictly speaking, not even that. A process can start running a program WITH NONE OF ITS PAGES IN PHYSICAL MEMORY. As it accesses each non-resident page it generates a page fault and the kernel handles this by allocating an available (read: free) page and initializing it appropriately. This goes on, building up the working set as the program loops and iterates. It may be that the code concerned with the start of the program only does things like read the command line and the config file, and after that is no longer accessed. It falls out of the working set and ages its way out of use.

Code pages, be they from the program or from a shared library, do not need to be saved in swap, for obvious reasons. If they get aged out of existence they can be read back in from the code file. Recall: the 'executables' are all mmap'd. But they DO contribute to the memory use. That 40% threshold of swappiness, for example. Really it should only measure the DATA pages that need to be saved.

Integrating the buffer cache with the VM, rather than treating it as a separately allocated size, makes it flexible. Pointing to it as a 'sized' resource, a fixed resource, is meaningless. It isn't. It is dynamic, allocated out of the VM lists as needed.

So, for example, a compute-intense system that is not network connected would load from disk then run, run, run, run, and all that so-called 'buffer cache' would become memory used by the compute-intense application. Contrariwise, because the buffer cache depends on how the system is being used, a network server that used a long-lived PHP application as a web service would see a 'buffer cache' that made intense use of network and disk allocation. Those are extremes, of course.
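(The demand paging described above can be watched from userspace; a sketch, assuming the procps ps and a running firefox, which is only an example process name:)

  # minor faults: the page was already in memory; major faults: it had to be read from disk
  ps -o pid,comm,min_flt,maj_flt -C firefox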
On 09/01/2020 17:41, Carlos E. R. wrote:
Like a few minutes ago I updated and restarted both Firefox and Thunderbird, and 4.5 GB were freed.
Yes. When you killed off the processes you, until they were restarted, removed all mention of them from ALL internal tables, and that deallocates EVERYTHING to do with them, including the mappings of their pages to swap. POOF! All gone.

No reference to the pages in swap. If you can't see where they are, no pointers or references, no way to find them or reference them, they might as well not be there. POOF! All gone. Perhaps the stuff is still there on the disk, but so what? You've no way of finding it. The allocation table has been blanked. POOF! All gone.

Quite possibly the swap actually still occupied & referenced (if there is some) is scattered over a larger area of the disk. But so what? The whole point is this is about virtual memory, and that need not be contiguous, so why should it be contiguous and reallocated towards the start? No reason at all. When more needs to be allocated it will be allocated to what space is available. The pages getting sent to swap aren't necessarily contiguous. This isn't the roll-out/roll-in where it is done with one DMA burst, as it was on PDP-11 UNIX V7.
On 09/01/2020 17:37, Carlos E. R. wrote:
On 09/01/2020 23.26, Anton Aylward wrote: | More to the point here, you have settings that let you ANALYSE why | the OOM occured.
I don't need to. Simply the sum of my processes is above 8 GB, and some need to be killed.
ERR ... the whole point of a virtual memory system is to let you run under those conditions. It can also be configured so that the needs of a single process exceed the amount of physical memory.

Strictly speaking, swap enabled the old V5/6/7 UNIX on the PDP-11 (and the Interdata in Australia, thanks to Richard Miller) to run with a sum of the processes that exceeded the physical memory. And that's before using the shared code feature.

That shared code feature allowed us to run with 40 users logged in, using shell scripts (rather than binary compiled programs) and editing, on a PDP-11/45 with 4 megabytes of memory, and get tolerable performance. Yes, the Bourne shell was smaller, but there only needed to be one copy of it. My memory informs me that the code segment of the old Bourne shell on the PDP-11/45 was 24K. The per-user data was initially no larger, but the forked-off copies that interpreted scripts could have much larger data segments. However there was one and only one copy of the code no matter how many users, no matter how many scripts were being run.

Sorry, Carlos, you are thinking in terms of the PC of the early 1980s or some other NON-UNIX model of memory management.
On 09/01/2020 08:55, Carlos E. R. wrote:
Your Thunderbird/Firefox are not that heavy.
Telcontar:~ # free -h --si total used free shared buff/cache available Mem: 8.0G 4.2G 2.3G 215M 1.4G 3.2G Swap: 24G 7.5G 17G Telcontar:~ #
You do not disprove my point; if anything you make it.

BECAUSE you have swappiness set to 60, your VM swaps pages out even when it is not necessary. That doesn't free up memory, it just means that there is an image of that page on swap. That starts happening the moment you get to use just 40% of your memory. I think this is ridiculous.

There's a simple test: turn swap off and see what crashes!

<sidebar>
I've found that swap, once created, is about high-water marks, not really a 'tidal flow'. If the need goes away then swap doesn't retreat. I've demonstrated this by doing a swap-off that resulted in nothing crashing and everything operating fine. There was no NEED for swap. I turned swap back on and nothing appeared immediately, not for a long while. Now I simply run with swappiness=10 and swap only happens 'in extremis'.
</sidebar>

Or a less aggressive one: set swappiness=10 in your /boot config for your kernel as I showed for mine in an earlier post. This will not stop swap happening if your system runs out of memory, it just sets the threshold higher. If you don't like that, try swappiness = 20.

The problem with Linux swapping is that it doesn't get released easily. That swappiness=60 means that stuff gets paged out when IT IS NOT NECESSARY.

Depending on your job profile (e.g. rate of process creation, amount of disk IO, amount of network IO), there's a lot of tuning of the VM you can do. If you really want to be aggressive about it, you can, for example, create a CGROUP for Firefox with a limit on how much memory it is allowed (use top to see what it IS using). You can force that low, never mind swappiness, and by forcing that low also force the CGROUPed FF to be paged out to swap earlier, without affecting the memory of other processes. Also without creating a virtual machine to run FF in :-) Some ISPs use this to control the allocation of the virtual machines they grant to customers :-)
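(One way to do the CGROUP trick mentioned above on a systemd-based system; a sketch, where the 2G ceiling is only an example value:)

  # start firefox in its own transient scope with a memory ceiling
  systemd-run --user --scope -p MemoryMax=2G firefox
  # (on older cgroup-v1 setups the property is MemoryLimit= instead)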
On 09/01/2020 15.51, Anton Aylward wrote:
| On 09/01/2020 08:55, Carlos E. R. wrote:
|> Your Thunderbird/Firefox are not that heavy.
|
| You do not disprove my point; if anything you make it.
|
| BECAUSE you have swapiness set to 60 your VM swaps pages out even when it is not necessary. That doesn't free up memory, it just means that there is an image of that page on swap. That starts happening the moment you get to use just 40% of your memory. I think this is ridiculous.

It does free memory, instantly.

| There's a simple test: turn swap off and see what crashes!

No, I'm not going to do that _again_.

-- Cheers / Saludos, Carlos E. R. (from 15.1 x86_64 at Telcontar)
On 09/01/2020 14:52, Carlos E. R. wrote:
On 09/01/2020 15.51, Anton Aylward wrote:
| On 09/01/2020 08:55, Carlos E. R. wrote:
|> Your Thunderbird/Firefox are not that heavy.
|>
|> Telcontar:~ # free -h --si
|>               total        used        free      shared  buff/cache   available
|> Mem:           8.0G        4.2G        2.3G        215M        1.4G        3.2G
|> Swap:           24G        7.5G         17G
|> Telcontar:~ #
|
| You do not disprove my point; if anything you make it.
|
| BECAUSE you have swapiness set to 60 your VM swaps pages out even when it is not necessary. That doesn't free up memory, it just means that there is an image of that page on swap. That starts happening the moment you get to use just 40% of your memory. I think this is ridiculous.
It does free memory, instantly.
No it doesn't. The page is still in memory. It MIGHT have been put on the 'dirty' queue marked as swapped out, so that it has a priority for being reclaimed for use, but then again, because of the image on it, it might get reused for what it is.

The concept of 'free memory' is sort of meaningless in the VM system. At best you have memory on a queue that is available for re-use if the demand to destroy its contents is there.
| | There's a simple test: turn swap off and see what crashes!
No, I'm not going to do that _again_.
So you'll never know.
On 10/01/2020 00.17, Anton Aylward wrote:
| On 09/01/2020 14:52, Carlos E. R. wrote:
|> | There's a simple test: turn swap off and see what crashes!
|>
|> No, I'm not going to do that _again_.
|
| So you'll never know.

I said "not again". I already did and it was a disaster. I *know* that it is disastrous.

-- Cheers / Saludos, Carlos E. R. (from 15.1 x86_64 at Telcontar)
Le 11/01/2020 à 22:08, Carlos E. R. a écrit :
I said "not again". I already did and it was a disaster. I *know* that it is disastrous.
just a question.

What is the difference between being short of real memory and being short of memory + swap?

jdd
On 11/01/2020 22.15, jdd@dodin.org wrote:
| Le 11/01/2020 à 22:08, Carlos E. R. a écrit :
|> I said "not again". I already did and it was a disaster. I *know* that it is disastrous.
|
| just a question.
|
| What is the difference between being short of real memory and being short of memory + swap?

If you are short of memory but there is swap available, swap will be used instead. If there is no swap at all, or it is also fully used, the kernel will kill "something". The algorithm for choosing that "something" is not ideal, and might kill something that is needed by the desktop and it will crash, or kill something very important, like a calculation that has been running for a month, so that killing that one is a disaster. Or, you may be lucky and it kills something that is not important and the machine survives.

There is a thread in the factory mail list where the OOM strategy is being discussed. There is a promising method for new kernels but it is not yet operative, I understand.

I don't know if it is possible in that circumstance to tell the kernel to create a swap file on the fly, automatically, and delete it when the crisis has passed.

-- Cheers / Saludos, Carlos E. R. (from 15.1 x86_64 at Telcontar)
On 11/01/2020 16:38, Carlos E. R. wrote:
The algorithm for choosing that "something" is not ideal, and might kill something that is needed by the desktop and it will crash, or kill something very important, like a calculation that has been running for a month, so that killing that one is a disaster. Or, you may be lucky and it kills something that is not important and the machine survives.
Yes and no.

If you go back to the page I referred to on the virtual memory settings, one of them controls the action on OOM. You are correct in describing the default setting; the oom-killer goes through the process list to find one or more likely candidates. But another setting means that the specific program that caused the OOM is the one that is killed.

https://www.kernel.org/doc/Documentation/sysctl/vm.txt
==============================================================
oom_kill_allocating_task

This enables or disables killing the OOM-triggering task in out-of-memory situations.

If this is set to zero, the OOM killer will scan through the entire tasklist and select a task based on heuristics to kill. This normally selects a rogue memory-hogging task that frees up a large amount of memory when killed.

If this is set to non-zero, the OOM killer simply kills the task that triggered the out-of-memory condition. This avoids the expensive tasklist scan.

If panic_on_oom is selected, it takes precedence over whatever value is used in oom_kill_allocating_task.

The default value is 0.
==============================================================

There are a few thresholds that can also affect if and when an OOM occurs. You can control the amount of overhead 'reserved' memory, the amount of memory dedicated to DMA block IO, and to some degree the amount of memory involved in the mapping tables. You might get an OOM even if there is adequate unused/available memory because the mapping tables are too small. This is something Lew might have to watch out for with his outsized memory. Many VM structures are 'sized' at boot time, but there are resources that HAVE to be in low memory because of the way the Intel architecture works. I've seen warnings about those but never drilled down on the details.

Of course processes aren't the only things that make demands on, consume or release virtual memory:

==============================================================
vfs_cache_pressure
------------------

This percentage value controls the tendency of the kernel to reclaim the memory which is used for caching of directory and inode objects.

At the default value of vfs_cache_pressure=100 the kernel will attempt to reclaim dentries and inodes at a "fair" rate with respect to pagecache and swapcache reclaim. Decreasing vfs_cache_pressure causes the kernel to prefer to retain dentry and inode caches. When vfs_cache_pressure=0, the kernel will never reclaim dentries and inodes due to memory pressure and this can easily lead to out-of-memory conditions. Increasing vfs_cache_pressure beyond 100 causes the kernel to prefer to reclaim dentries and inodes.

Increasing vfs_cache_pressure significantly beyond 100 may have negative performance impact. Reclaim code needs to take various locks to find freeable directory and inode objects. With vfs_cache_pressure=1000, it will look for ten times more freeable objects than there are.
=============================================================

and

==============================================================
zone_reclaim_mode:

Zone_reclaim_mode allows someone to set more or less aggressive approaches to reclaim memory when a zone runs out of memory. If it is set to zero then no zone reclaim occurs. Allocations will be satisfied from other zones / nodes in the system.

This is value ORed together of

1 = Zone reclaim on
2 = Zone reclaim writes dirty pages out
4 = Zone reclaim swaps pages

zone_reclaim_mode is disabled by default.
For file servers or workloads that benefit from having their data cached, zone_reclaim_mode should be left disabled as the caching effect is likely to be more important than data locality.

zone_reclaim may be enabled if it's known that the workload is partitioned such that each partition fits within a NUMA node and that accessing remote memory would cause a measurable performance reduction. The page allocator will then reclaim easily reusable pages (those page cache pages that are currently not used) before allocating off node pages.

Allowing zone reclaim to write out pages stops processes that are writing large amounts of data from dirtying pages on other nodes. Zone reclaim will write out dirty pages if a zone fills up and so effectively throttle the process. This may decrease the performance of a single process since it cannot use all of system memory to buffer the outgoing writes anymore but it preserve the memory on other nodes so that the performance of other processes running on other nodes will not be affected.

Allowing regular swap effectively restricts allocations to the local node unless explicitly overridden by memory policies or cpuset configurations.
==============================================================

All this is further complicated by "not all memory is the same" in a number of aspects. Quite apart from the special properties of low physical memory (Intel architecture) and high physical memory, there are forms of clustering that result in 'big pages'.

Then there are CGROUPS. If you have a high-demand single process, as Thunderbird or Firefox or a specific user might be, then creating a CGROUP allows much finer control over the resources any single process and its children might have: CPU, memory, disk, IO bandwidth...

Why should memory be grouped into larger amounts? It has to do with the mapping of the virtual to the physical. The tables involved do a base/range operation. The limit on the number of tables a single process can have is 64K, which is essentially not a limit, but could be a cause for OOM if memory gets too fragmented.

What I'm saying is that what's going on with virtual memory is not simplistic, and is far removed from the resident model of PC-DOS. The Linux model is also far removed from the IBM model that is taught to CS undergrads!
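(The knobs quoted above can all be read and set the same way as swappiness; a sketch, assuming root, and zone_reclaim_mode only appears on NUMA-capable kernels:)

  # read the current values
  sysctl vm.vfs_cache_pressure vm.zone_reclaim_mode vm.swappiness

  # runtime change; persist via a file in /etc/sysctl.d/ as shown earlier
  sysctl -w vm.vfs_cache_pressure=50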
Le 11/01/2020 à 22:38, Carlos E. R. a écrit :
On 11/01/2020 22.15, jdd@dodin.org wrote:
| Le 11/01/2020 à 22:08, Carlos E. R. a écrit :
|> I said "not again". I already did and it was a disaster. I *know* that it is disastrous.
|
| just a question.
|
| What is the difference between being short of real memory and being short of memory + swap?
If you are short of memory but there is swap available, swap will be used instead.
Sure, but does the kernel care that it's swap, not physical memory?

Put another way, are 5 GB of physical memory and 5 GB of swap identical to 10 GB of physical memory?

If there is no difference, the crash will happen whatever swap is created, in a time when RAM is nearly as big as disk space...

jdd
12.01.2020 11:57, jdd@dodin.org пишет:
Le 11/01/2020 à 22:38, Carlos E. R. a écrit :
On 11/01/2020 22.15, jdd@dodin.org wrote:
| Le 11/01/2020 à 22:08, Carlos E. R. a écrit :
|> I said "not again". I already did and it was a disaster. I *know* that it is disastrous.
|
| just a question.
|
| What is the difference between being short of real memory and being short of memory + swap?
If you are short of memory but there is swap available, swap will be used instead.
That is wrong.
sure, but do the kernel care that it's swap, not physical memory?
in an other way, is 5 Gb of physical memory and 5 Gb of swap identical of 10Gb of physical memory?
No. If there is no free memory to satisfy an allocation request, the kernel will look for cold data in memory and will try to reclaim it. Swap is used to preserve the content of anonymous memory that is reclaimed. To actually *use* swapped-out data the kernel must read it back into memory, which requires free memory, so we are back to square one. If no memory is available or can be reclaimed at this moment, data cannot be read back and so cannot be used. And of course not all memory content can actually be swapped out in the first place.

So no, swap is not a memory extension.

Oh, and not all reclaimed memory needs swap (which is why I said "anonymous" above). So even without swap it is possible to have more virtual memory in use than is physically available.
if there is no difference, crash will happen whatever swap is created in a time where ram is nearly as big as disk space...
jdd
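(The distinction drawn above between anonymous memory, which needs swap, and file-backed pages, which can simply be dropped and re-read, is visible in /proc/meminfo; a sketch:)

  # anonymous vs. file-backed vs. swap-cached pages, system wide
  grep -E 'AnonPages|^Cached|SwapCached|Dirty' /proc/meminfo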
On 12/01/2020 10.15, Andrei Borzenkov wrote:
| 12.01.2020 11:57, jdd@dodin.org пишет:
|> If you are short of memory but there is swap available, swap will be used instead.
|
| That is wrong.

In a manner of speaking, no, because you say the same as I do.

| No. If there is no free memory to satisfy allocation request, kernel will look for cold data in memory and will try to reclaim it. Swap is used to preserve content of anonymous memory that is reclaimed.

Of course.

| To actually *use* swapped out data kernel must read it back into memory, which requires free memory so we are back to square one.

Of course.

| If no memory is available or can be reclaimed at this moment, data cannot be read back and so cannot be used.
|
| And of course not all memory content can actually be swapped out in the first place.

Of course.

| So no, swap is not memory extension.
|
| Oh, and not every reclaimed memory needs swap (which is why I said "anonymous" above). So even without swap it is possible to have more virtual memory in use than is physically available.

-- Cheers / Saludos, Carlos E. R. (from 15.1 x86_64 at Telcontar)
On 12/01/2020 04:15, Andrei Borzenkov wrote:
12.01.2020 11:57, jdd@dodin.org пишет:
If you are short of memory but there is swap available, swap will be used instead.
That is wrong.
It's not that he's "wrong" per se but that he'd worded it badly. You go on to more accurately describe what is happening.
sure, but do the kernel care that it's swap, not physical memory?
in an other way, is 5 Gb of physical memory and 5 Gb of swap identical of 10Gb of physical memory?
No. If there is no free memory to satisfy allocation request, kernel will look for cold data in memory and will try to reclaim it.
All memory is either statically allocated for the kernel and/or its threads, or is available to the VM system. The latter is ALL on one of a number of linked lists. Not all memory has a corresponding swap location, so there is no need for size(swap) = size(RAM).

Why is this? You never need to swap out code!

Late model VM has memory mapped files. This is wonderful for the VM's "load on demand" capability if the CPU can do instruction restart. Most can these days, but it was a problem with some of the original set of 16-bit CPUs and unavailable with the 8-bit ones. SUN pulled a trick with a pair of 68000 chips, one a step behind the other, to fudge the restart capability on a chip that didn't have it. Fantastic engineering kludge!

You never need to swap out code, you just have to mark that page as one that can be put on the 'dirty' queue, and as it fails to be accessed it 'ages' along to the end, where it can, if necessary, be reclaimed. How fast that happens, and how aggressively the pages are plucked from that queue for re-use, are tunable parameters.

Note I said "as it fails to be accessed". If it is accessed, that page's code gets accessed -- i.e. executed -- then it is brought back to the tail of the queue.

Suppose that code page ages out and gets re-used 'cos the application isn't needing it ... for a while, but then the application gets around to needing that code fragment again. The page isn't there, a page fault is triggered, execution is suspended and the page needs to be brought back in. It is a page of a memory mapped file, so it is NOT in swap. It never was; it never will be.
Swap is used to preserve content of anonymous memory that is reclaimed.
I don't know that I'd call it 'anonymous'. This is the volatile memory that belongs to processes.

"Volatile memory"? Consider: at startup a process reads one or more config files. It opens them in read-only mode. The VM opens them as memory mapped files; the process digests them, then closes them. They may be in the data space, but they are not volatile, they are read-only. When closed, their mapping table entries and their mapped virtual memory are freed.

But while they are, strictly speaking, in the data space of the application, they are like code pages. If the demands of multi-tasking necessitate suspending that process, it may be that those pages get released so something else can run. (I agree, context and circumstances make it unlikely that a startup config read gets this treatment. But the logic does apply.) Like the code pages mentioned above, the contents of the read-only file can be re-mapped 'on demand'. They don't need to go to swap.

I think the term 'cold' is an interesting one. 'Hot' pages are the ones that see the traffic, the access.
To actually *use* swapped out data kernel must read it back into memory, which requires free memory so we are back to square one. If no memory is available or can be reclaimed at this moment, data cannot be read back and so cannot be used.
And of course not all memory content can actually be swapped out in the first place.
Damn right! Obviously you don't want to swap out essential kernel code, least of all the code that implements the VM[1].
So no, swap is not memory extension.
I'm not going to give this an expletive agreement. I insist that many people mis-use swap. Linux lets you swap out pages on a 'precautionary' basis even when there is a lot of memory still available. This is the 'swappiness' setting. Turning it down but not off means that I'm not hitting my rotating rust gratuitously. Swapping ALWAYS means delay, and usually means a configuration problem if there is still free memory available.

A lot of your ideas about VM are contaminated by the days when you still had to pre-allocate resources. I use LVM and dynamic provisioning 'cos I don't want to tie things down. We often see people asking how they can grow a partition. OUCH! I avoid the whole issue with LVM.

Strictly speaking you don't even need to configure your disk for a swap partition. As in "don't do that". The VM system will accept a file on a file system as swap. This makes things even more 'dynamic' and 'deferred design'. This gives you the opportunity to play with the allocation of swap even more than using LVM does! You can experiment with different types of file system and file allocation policies.
Oh, and not every reclaimed memory needs swap (which is why I said "anonymous" above). So even without swap it is possible to have more virtual memory in use than is physically available.
That is what the 'working set' is all about.

[1] File under "You really don't want to know about this". The ICL 2900 series machines allowed for language-specific microcode. (You don't want to know about the compilers that did that, either.) As such, the microcode also had to be pageable. That machine was a nightmare.
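(A sketch of the swap-in-a-file approach mentioned above; the path and size are only examples, and on btrfs the file needs nodatacow treatment before it can be used this way:)

  fallocate -l 16G /var/lib/swapfile      # pre-allocate the file (not sparse)
  chmod 600 /var/lib/swapfile
  mkswap /var/lib/swapfile
  swapon /var/lib/swapfile
  # to make it permanent, add a line like this to /etc/fstab:
  #   /var/lib/swapfile  none  swap  defaults  0 0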
Le 12/01/2020 à 17:15, Anton Aylward a écrit : (...)
that is what the 'working set' is all about.
Interesting, but not the question I am trying to understand.

Somebody said "when memory leaks, the system crashes".

Is this true (*crash*)? Is it different with swap (given the same total amount of memory)?

jdd
jdd@dodin.org wrote:
Le 12/01/2020 à 17:15, Anton Aylward a écrit : (...)
that is what the 'working set' is all about.
interesting, but not the question I try to understand
somebody said "when memory leaks, the system crashes".
is this true (*crash*)?
When the system starts to run out of memory, the OOM killer will kick in and try to identify who is gobbling up the memory. If it picks an innocent process, it could be seen as the system crashing.

-- Per Jessen, Zürich (3.1°C)
On 12/01/2020 11:51, Per Jessen wrote:
jdd@dodin.org wrote:
Le 12/01/2020 à 17:15, Anton Aylward a écrit : (...)
that is what the 'working set' is all about.
interesting, but not the question I try to understand
somebody said "when memory leaks, the system crashes".
is this true (*crash*)?
When the system starts to run out of memoru, the OOM killer will kick in and try to identify who is gobbling up the memory. If it pick an innocent process, it could be seen as the system crashing.
That is possible with the default setting, yes. However you can also configure it so the OOM killer targets the process that caused the OOM condition, and only that.
Anton Aylward wrote:
On 12/01/2020 11:51, Per Jessen wrote:
jdd@dodin.org wrote:
Le 12/01/2020 à 17:15, Anton Aylward a écrit : (...)
that is what the 'working set' is all about.
interesting, but not the question I try to understand
somebody said "when memory leaks, the system crashes".
is this true (*crash*)?
When the system starts to run out of memoru, the OOM killer will kick in and try to identify who is gobbling up the memory. If it pick an innocent process, it could be seen as the system crashing.
That is possible with the default setting, yes.
However you can also configure it so the OOM_killer targets the process that caused the OOM condition and only that.
Care to share an example? In my experience, the oom killer kills the _best_ process, which is the one that uses the most memory. There is a way of influencing the choice, I'm not sure how.

I had a situation over Christmas where a customer's webserver essentially died (I may have mentioned this situation before) due to trying to serve too many requests. Loads of apache threads gobbled up the memory and the OOM killer decided to get rid of mysql (an innocent, but important victim).

-- Per Jessen, Zürich (2.9°C)
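(The usual way of influencing that choice is oom_score_adj; a sketch, assuming a single mysqld process is the one to protect, which is only an example:)

  # push mysqld to the back of the OOM killer's list (-1000 would exempt it entirely)
  echo -900 > /proc/$(pidof mysqld)/oom_score_adj

  # or persistently, in the service's systemd unit file:
  #   OOMScoreAdjust=-900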
12.01.2020 20:13, Per Jessen пишет:
In my experience, the oom killer kills the _best_ process, which is the one that uses the most memory.
Which is not necessarily the process that caused OOM and that you want to kill. As trivial example - you have long running computation (something with matrices or whatever) using almost all of available memory, but not going to allocate more. You start browser that needs memory and provokes OOM. Which program would you want to kill? -- To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org To contact the owner, e-mail: opensuse+owner@opensuse.org
Andrei Borzenkov wrote:
12.01.2020 20:13, Per Jessen пишет:
In my experience, the oom killer kills the _best_ process, which is the one that uses the most memory.
Which is not necessarily the process that caused OOM and that you want to kill.
Exactly. That's what I was trying to say.
As trivial example - you have long running computation (something with matrices or whatever) using almost all of available memory, but not going to allocate more. You start browser that needs memory and provokes OOM. Which program would you want to kill?
Yep. Very much the situation my customer experienced over Christmas. The right thing would have been to kill off some apache threads, instead mysql got killed .... -- Per Jessen, Zürich (2.3°C) http://www.cloudsuisse.com/ - your owncloud, hosted in Switzerland. -- To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org To contact the owner, e-mail: opensuse+owner@opensuse.org
On Sun, 12 Jan 2020 18:27:40 +0100 Per Jessen <per@computer.org> wrote:
Andrei Borzenkov wrote:
12.01.2020 20:13, Per Jessen пишет:
In my experience, the oom killer kills the _best_ process, which is the one that uses the most memory.
Which is not necessarily the process that caused OOM and that you want to kill.
Exactly. That's what I was trying to say.
As trivial example - you have long running computation (something with matrices or whatever) using almost all of available memory, but not going to allocate more. You start browser that needs memory and provokes OOM. Which program would you want to kill?
Yep. Very much the situation my customer experienced over Christmas. The right thing would have been to kill off some apache threads, instead mysql got killed ....
But surely the customer shot himself in the foot? Apache offers various directives for tuning the number of threads. -- To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org To contact the owner, e-mail: opensuse+owner@opensuse.org
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 12/01/2020 18.37, Dave Howorth wrote: | On Sun, 12 Jan 2020 18:27:40 +0100 Per Jessen <per@computer.org> | wrote: | |> Andrei Borzenkov wrote: |> |>> 12.01.2020 20:13, Per Jessen пишет: |>>> |>>> In my experience, the oom killer kills the _best_ process, |>>> which is the one that uses the most memory. |>> |>> |>> Which is not necessary the process that caused OOM and that you |>> want to kill. |> |> Exactly. That's what I was trying to say. |> |>> As trivial example - you have long running computation |>> (something with matrices or whatever) using almost all of |>> available memory, but not going to allocate more. You start |>> browser that needs memory and provokes OOM. Which program would |>> you want to kill? |> |> Yep. Very much the situation my customer experienced over |> Christmas. The right thing would have been to kill off some |> apache threads, instead mysql got killed .... | | But surely the customer shot himself in the foot? Apache offers | various directives for tuning the number of threads. Why not automatic? If Apache sees memory pressure, give out? - -- Cheers / Saludos, Carlos E. R. (from 15.1 x86_64 at Telcontar) -----BEGIN PGP SIGNATURE----- iF0EARECAB0WIQQZEb51mJKK1KpcU/W1MxgcbY1H1QUCXhtdpAAKCRC1MxgcbY1H 1VlCAJ9sAOTjR8CsbxDDOW0/u3bdPP06NwCeKr/m6qeTbVRKPzCtAff8ESMk/yI= =NjSO -----END PGP SIGNATURE----- -- To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org To contact the owner, e-mail: opensuse+owner@opensuse.org
Dave Howorth wrote:
On Sun, 12 Jan 2020 18:27:40 +0100 Per Jessen <per@computer.org> wrote:
Andrei Borzenkov wrote:
As trivial example - you have long running computation (something with matrices or whatever) using almost all of available memory, but not going to allocate more. You start browser that needs memory and provokes OOM. Which program would you want to kill?
Yep. Very much the situation my customer experienced over Christmas. The right thing would have been to kill off some apache threads, instead mysql got killed ....
But surely the customer shot himself in the foot? Apache offers various directives for tuning the number of threads.
Certainly. Put it down to lack of skill & experience. With everyone running webservers and mailservers, the necessary skill pool was exhausted several years back :-( It was the default openSUSE apache config, I think that's max 150 threads. The server does not have the oomph to serve that, and would never under normal circumstances exceed maybe 5 (being very generous). It's running a PHP app, every request is processed. A distributed attempt to find a weakness by firing several random requests per second will quickly become a DDoS attack, with 150 threads gobbling up 2-3 GB of memory. The solution was indeed to just restrict the max number of apache threads. It still means a denial-of-service, but the server survives. -- Per Jessen, Zürich (2.5°C) http://www.cloudsuisse.com/ - your owncloud, hosted in Switzerland. -- To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org To contact the owner, e-mail: opensuse+owner@opensuse.org
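For reference, the limit lives in the MPM configuration; a minimal sketch for the prefork MPM, with directive names as in the Apache 2.4 documentation (the numbers are only illustrative, and openSUSE keeps its own defaults under /etc/apache2):

<IfModule mpm_prefork_module>
    StartServers       2
    ServerLimit       10
    MaxRequestWorkers 10
</IfModule>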
On 12/01/2020 12:27, Per Jessen wrote:
Andrei Borzenkov wrote:
12.01.2020 20:13, Per Jessen пишет:
In my experience, the oom killer kills the _best_ process, which is the one that uses the most memory.
Which is not necessarily the process that caused OOM and that you want to kill.
Exactly. That's what I was trying to say.
As trivial example - you have long running computation (something with matrices or whatever) using almost all of available memory, but not going to allocate more. You start browser that needs memory and provokes OOM. Which program would you want to kill?
Yep. Very much the situation my customer experienced over Christmas. The right thing would have been to kill off some apache threads, instead mysql got killed ....
Actually this sounds like a situation that could best be managed by using CGROUPS to allocate and constrain resources. -- A: Yes. > Q: Are you sure? >> A: Because it reverses the logical flow of conversation. >>> Q: Why is top posting frowned upon? -- To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org To contact the owner, e-mail: opensuse+owner@opensuse.org
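On a systemd-managed box that needs no hand-rolled cgroup setup; a sketch, assuming the unit is called apache2.service as on openSUSE and that the cgroup-v2 style MemoryHigh=/MemoryMax= directives are available:

# systemctl set-property apache2.service MemoryHigh=1500M MemoryMax=2G
# systemctl show apache2.service -p MemoryHigh -p MemoryMax   # verify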
Anton Aylward wrote:
On 12/01/2020 12:27, Per Jessen wrote:
Andrei Borzenkov wrote:
12.01.2020 20:13, Per Jessen пишет:
In my experience, the oom killer kills the _best_ process, which is the one that uses the most memory.
Which is not necessarily the process that caused OOM and that you want to kill.
Exactly. That's what I was trying to say.
As trivial example - you have long running computation (something with matrices or whatever) using almost all of available memory, but not going to allocate more. You start browser that needs memory and provokes OOM. Which program would you want to kill?
Yep. Very much the situation my customer experienced over Christmas. The right thing would have been to kill off some apache threads, instead mysql got killed ....
Actually this sounds like a situation that could best be managed by using CGROUPS to allocate and constrain resources.
I'll certainly pass it on to the customer, but I doubt it. It is an exceptional situation, essentially a DDoS attack. The best defense was simply to restrict the number of apache threads, second best to add some swap space. -- Per Jessen, Zürich (2.8°C) http://www.hostsuisse.com/ - dedicated server rental in Switzerland. -- To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org To contact the owner, e-mail: opensuse+owner@opensuse.org
On 12/01/2020 12:13, Per Jessen wrote:
Anton Aylward wrote:
On 12/01/2020 11:51, Per Jessen wrote:
jdd@dodin.org wrote:
When the system starts to run out of memory, the OOM killer will kick in and try to identify who is gobbling up the memory. If it picks an innocent process, it could be seen as the system crashing.
That is possible with the default setting, yes.
However you can also configure it so the OOM_killer targets the process that caused the OOM condition and only that.
Care to share an example?
The details of the what and how are in the long document on the virtual memory settings that I've mentioned here a number of times before: You have control via VM settings of what happens in OOM conditions. Sadly the default is to scan ALL processes for candidates to kill or default to a PANIC. You can, if you read through the docco I referred to, https://www.kernel.org/doc/Documentation/sysctl/vm.txt alter that. ============================================================== oom_kill_allocating_task This enables or disables killing the OOM-triggering task in out-of-memory situations. If this is set to zero, the OOM killer will scan through the entire tasklist and select a task based on heuristics to kill. This normally selects a rogue memory-hogging task that frees up a large amount of memory when killed. If this is set to non-zero, the OOM killer simply kills the task that triggered the out-of-memory condition. This avoids the expensive tasklist scan. If panic_on_oom is selected, it takes precedence over whatever value is used in oom_kill_allocating_task. The default value is 0. ============================================================== and ============================================================= panic_on_oom This enables or disables panic on out-of-memory feature. If this is set to 0, the kernel will kill some rogue process, called oom_killer. Usually, oom_killer can kill rogue processes and system will survive. If this is set to 1, the kernel panics when out-of-memory happens. However, if a process limits using nodes by mempolicy/cpusets, and those nodes become memory exhaustion status, one process may be killed by oom-killer. No panic occurs in this case. Because other nodes' memory may be free. This means system total status may be not fatal yet. If this is set to 2, the kernel panics compulsorily even on the above-mentioned. Even oom happens under memory cgroup, the whole system panics. The default value is 0. 1 and 2 are for failover of clustering. Please select either according to your policy of failover. panic_on_oom=2+kdump gives you very strong tool to investigate why oom happens. You can get snapshot. ============================================================= More to the point here, you have settings that let you ANALYSE why the OOM occurred.
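Pulling that together into something runnable, a sketch only (the drop-in file name under /etc/sysctl.d is my own choice, purely illustrative):

# echo 1 > /proc/sys/vm/oom_kill_allocating_task   # takes effect immediately
# printf 'vm.oom_kill_allocating_task = 1\nvm.panic_on_oom = 0\n' > /etc/sysctl.d/90-oom.conf
# sysctl --system   # reload all sysctl drop-ins; also applied at every boot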
In my experience, the oom killer kills the _best_ process, which is the one that uses the most memory.
NOT! Suppose you have a memory intensive analytic program that goes backwards and forwards over rows & columns of data and it's been running a couple of weeks ... Then you start up a small interactive program with a memory leak. There's not much memory available since the big analytic one has most of it, so this one leaks what it can get, which isn't much by comparison, and causes the OOM condition. You do not want the biggest, that analysis program, to be the one that is killed! You might wonder at the interactive one dying on you, but you definitely don't want to lose all that work!
There is a way of influencing the choice, I'm not sure how.
See above
I had a situation over Christmas where a customer's webserver essentially died (I may have mentioned this situation before) due to trying to serve too many requests. Loads of apache threads gobbled up the memory and the OOM killer decided to get rid of mysql (an innocent, but important victim).
In days of old the front end Apache started up some threads, just a few, and had them waiting for requests. After servicing a request the thread died and Apache started a new one, so the downstream code, perhaps a Perl application, also died, and any associated memory leaks went away with it. Then some genius decided that the overhead of startup was too much, so let's have long-lived processes. Never mind if they have memory leaks. If your Apache is continuously spawning threads then you have a problem. It may be a configuration problem. IIRC there is a setting which determines how many threads there can be. It may be that you are being overwhelmed, either by real traffic or a Denial-of-Service attack. And the way you have Apache set up, it is not throttling and so is servicing network connections as fast as they come in, regardless of the rate-of-service. -- A: Yes. > Q: Are you sure? >> A: Because it reverses the logical flow of conversation. >>> Q: Why is top posting frowned upon? -- To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org To contact the owner, e-mail: opensuse+owner@opensuse.org
Anton Aylward wrote:
On 12/01/2020 12:13, Per Jessen wrote:
Anton Aylward wrote:
On 12/01/2020 11:51, Per Jessen wrote:
jdd@dodin.org wrote:
When the system starts to run out of memory, the OOM killer will kick in and try to identify who is gobbling up the memory. If it picks an innocent process, it could be seen as the system crashing.
That is possible with the default setting, yes.
However you can also configure it so the OOM_killer targets the process that caused the OOM condition and only that.
Care to share an example?
In my experience, the oom killer kills the _best_ process, which is the one that uses the most memory.
NOT!
My apologies Anton, but that _is_ my experience and that _is_ the default OOM killer behaviour. -- Per Jessen, Zürich (2.8°C) http://www.dns24.ch/ - free dynamic DNS, made in Switzerland. -- To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org To contact the owner, e-mail: opensuse+owner@opensuse.org
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 12/01/2020 19.10, Per Jessen wrote: | Anton Aylward wrote: |> On 12/01/2020 12:13, Per Jessen wrote: |>> Anton Aylward wrote: |>>> On 12/01/2020 11:51, Per Jessen wrote: |>>>> jdd@dodin.org wrote: |> |>>>> When the system starts to run out of memoru, the OOM killer |>>>> will kick in and try to identify who is gobbling up the |>>>> memory. If it pick an innocent process, it could be seen |>>>> as the system crashing. |>>> |>>> That is possible with the default setting, yes. |>>> |>>> However you can also configure it so the OOM_killer targets |>>> the process that caused the OOM condition and only that. |>> |>> Care to share an example? |> |> |>> In my experience, the oom killer kills the _best_ process, |>> which is the one that uses the most memory. |> |> NOT! | | My apologies Anton, but that _is_ my experience and that _is_ the | default OOM killer behaviour. Mine too. It is being redesigned upstream. And factory is making changes meanwhile - -- Cheers / Saludos, Carlos E. R. (from 15.1 x86_64 at Telcontar) -----BEGIN PGP SIGNATURE----- iF0EARECAB0WIQQZEb51mJKK1KpcU/W1MxgcbY1H1QUCXhtm5gAKCRC1MxgcbY1H 1a22AJ45fMy97xznVPUCU/yZeOTvtDZwCQCeJhim2J9ghcmFXZj09c8m1+gvuYQ= =qsRC -----END PGP SIGNATURE----- -- To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org To contact the owner, e-mail: opensuse+owner@opensuse.org
On 12/01/2020 13:10, Per Jessen wrote:
In my experience, the oom killer kills the _best_ process, which is the one that uses the most memory. NOT! My apologies Anton, but that _is_ my experience and that _is_ the default OOM killer behaviour.
Yes, it is the default, just like, elsewhere in this thread, we have the default of the Apache server at 150 threads resulting in a terrible DoS->crash. No-one in that thread was bothered by the idea that what was needed was to configure Apache to limit the number of threads. Why should this be different? What's the objection to altering the default behaviour of the OOM_killer so that it kills the process that caused the OOM? Why be system configuration droids in the one case and not the other? -- A: Yes. > Q: Are you sure? >> A: Because it reverses the logical flow of conversation. >>> Q: Why is top posting frowned upon? -- To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org To contact the owner, e-mail: opensuse+owner@opensuse.org
Le 12/01/2020 à 20:26, Anton Aylward a écrit :
Why should this be different? What's the objection to altering the default behaviour of the OOM_killer so that it kills the process that caused the OOM? Why be system configuration droids in the one case and not the other?
because all these problems are exceptions, not usual. Many (like me) don't feel better than kernel developers as how to set defaults. eventually open a bugzilla on the subject, just in case this is an old setup forgotten... jdd -- http://dodin.org -- To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org To contact the owner, e-mail: opensuse+owner@opensuse.org
On Sun, 12 Jan 2020 14:26:45 -0500 Anton Aylward <opensuse@antonaylward.com> wrote:
Yes, it is the default, just like, elsewhere in this thread, we have the default of the Apache server at 150 threads resulting in a terrible DoS->crash.
No, that isn't the Apache default. That seems like a terribly misguided openSUSE change. I'm slightly interested to know why they would do that?
No-one in that thread was bothered by the idea that what was needed was to configure Apache to limit the number of threads.
Why should this be different? What's the objection to altering the default behaviour of the OOM_killer so that it kills the process that caused the OOM? Why be system configuration droids in the one case and not the other?
-- To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org To contact the owner, e-mail: opensuse+owner@opensuse.org
Anton Aylward wrote:
On 12/01/2020 13:10, Per Jessen wrote:
In my experience, the oom killer kills the _best_ process, which is the one that uses the most memory. NOT! My apologies Anton, but that _is_ my experience and that _is_ the default OOM killer behaviour.
Yes, it is the default, just like, elsewhere in this thread, we have the default of the Apache server at 150 threads resulting in a terrible DoS->crash. No-one in that thread was bothered by the idea that what was needed was to configure Apache to limit the number of threads.
Just for completeness - tuning apache only means the system will survive such an attack, but the denial-of-service remains. Adding swap will have the same effect. The firewall could also be tuned to reject more than X new requests per second, same result. The real issue is that the php app is being cranked up for every request, also invalid ones that lead to a 404. The real solution is probably to add some rewriting rules to only have valid requests processed by the php app. That is what I have recommended to the customer.
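A minimal, untested sketch of that rewrite idea, assuming the app is fronted by a single index.php and that anything which is neither an existing file nor the app entry point should be answered by Apache itself instead of waking PHP (the exact conditions depend entirely on the app's URL scheme):

RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_URI} !^/index\.php
RewriteRule . - [R=404,L]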
Why should this be different? What's the objection to altering the default behaviour of the OOM_killer so that it kills the process that caused the OOM?
That's no better than the default. Still plenty of options for killing an innocent process. -- Per Jessen, Zürich (4.3°C) http://www.hostsuisse.com/ - virtual servers, made in Switzerland. -- To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org To contact the owner, e-mail: opensuse+owner@opensuse.org
On Mon, 13 Jan 2020 11:24:26 +0100 Per Jessen <per@computer.org> wrote:
Anton Aylward wrote:
On 12/01/2020 13:10, Per Jessen wrote:
In my experience, the oom killer kills the _best_ process, which is the one that uses the most memory. NOT! My apologies Anton, but that _is_ my experience and that _is_ the default OOM killer behaviour.
Yes, it is the default, just like, elsewhere in this thread, we have the default of the Apache server at 150 threads resulting in a terrible DoS->crash. No-one in that thread was bothered by the idea that what was needed was to configure Apache to limit the number of threads.
Just for completeness - tuning apache only means the system will survive such an attack, but the denial-of-service remains. Adding swap will have the same effect. The firewall could also be tuned to reject more than X new requests per second, same result. The real issue is that the php app is being cranked up for every request, also invalid ones that lead to a 404. The real solution is probably to add some rewriting rules to only have valid requests processed by the php app. That is what I have recommended to the customer.
Another possibility (which I'm sure would go down like a lead balloon) would be to switch away from PHP to some technology that doesn't start an app for each request. Let me think - Perl had that about twenty years ago? :)
Why should this be different? What's the objection to altering the default behaviour of the OOM_killer so that it kills the process that caused the OOM?
That's no better than the default. Still plenty of options for killing an innocent process.
-- To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org To contact the owner, e-mail: opensuse+owner@opensuse.org
Dave Howorth wrote:
On Mon, 13 Jan 2020 11:24:26 +0100 Per Jessen <per@computer.org> wrote:
Anton Aylward wrote:
On 12/01/2020 13:10, Per Jessen wrote:
In my experience, the oom killer kills the _best_ process, which is the one that uses the most memory. NOT! My apologies Anton, but that _is_ my experience and that _is_ the default OOM killer behaviour.
Yes, it is the default, just like, elsewhere in this thread, we have the default of the Apache server at 150 threads resulting in a terrible DoS->crash. No-one in that thread was bothered by the idea that what was needed was to configure Apache to limit the number of threads.
Just for completeness - tuning apache only means the system will survive such an attack, but the denial-of-service remains. Adding swap will have the same effect. The firewall could also be tuned to reject more than X new requests per second, same result. The real issue is that the php app is being cranked up for every request, also invalid ones that lead to a 404. The real solution is probably to add some rewriting rules to only have valid requests processed by the php app. That is what I have recommended to the customer.
Another possibility (which I'm sure would go down like a lead balloon) would be to switch away from PHP to some technology that doesn't start an app for each request. Let me think - Perl had that about twenty years ago? :)
Switching is not an option, the app does not come in any other implementation :-) I guess they could use php fastcgi, but I think they are happy with the system surviving even if clients can't be served. -- Per Jessen, Zürich (5.6°C) http://www.hostsuisse.com/ - virtual servers, made in Switzerland. -- To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org To contact the owner, e-mail: opensuse+owner@opensuse.org
On 13/01/2020 05:45, Per Jessen wrote:
Switching is not an option, the app does not come in any other implementation :-) I guess they could use php fastcgi, but I think they are happy with the system surviving even if clients can't be served.
Elsewhere, a similar problem was solved ... The firewall did a round-robin direction of the requests to a number of small machines running the application front end. There was no sophisticated load balancing, no feedback from the machines. The round-robin serving was enough. There were enough machines so that any DoS attack was lessened and a programmatic attack only took out one machine. The front end machines described above insulated the back end database server. Not only could attackers not take down the DB, they could not directly attack it either. -- A: Yes. > Q: Are you sure? >> A: Because it reverses the logical flow of conversation. >>> Q: Why is top posting frowned upon? -- To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org To contact the owner, e-mail: opensuse+owner@opensuse.org
Anton Aylward wrote:
On 13/01/2020 05:45, Per Jessen wrote:
Switching is not an option, the app does not come in any other implementation :-) I guess they could use php fastcgi, but I think they are happy with the system surviving even if clients can't be served.
Elsewhere, a similar problem was solved ...
The firewall did a round-robin direction of the requests to a number of small machines running the application front end. There was no sophisticated load balancing, feedback from the machines.
Sounds like an LVS system. Yes, such a setup is very useful.
The round-robin serving was enough. There were enough machines so that any DoS attack was lessened and programmatic attack only took out one machine.
I'll suggest to the customer they may want to rent many more machines from us :-) -- Per Jessen, Zürich (7.6°C) http://www.dns24.ch/ - your free DNS host, made in Switzerland. -- To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org To contact the owner, e-mail: opensuse+owner@opensuse.org
On 13/01/2020 09:22, Per Jessen wrote:
I'll suggest to the customer they may want to rent many more machines from us :-)
Depending on how heavy (size, CPU/memory) the front ends are vs the database side processing, this may be a case for 'toilet paper' computing. Low end, disposable, munchkin machines stacked endlessly. -- A: Yes. > Q: Are you sure? >> A: Because it reverses the logical flow of conversation. >>> Q: Why is top posting frowned upon? -- To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org To contact the owner, e-mail: opensuse+owner@opensuse.org
On 13/01/2020 05:34, Dave Howorth wrote:
Another possibility (which I'm sure would go down like a lead balloon) would be to switch away from PHP to some technology that doesn't start an app for each request. Let me think - Perl had that about twenty years ago? :)
Yea, I remember doing that. Perhaps that was why I never learnt PHP. Back then, Perl was necessary and it has always been sufficient for my needs. -- A: Yes. > Q: Are you sure? >> A: Because it reverses the logical flow of conversation. >>> Q: Why is top posting frowned upon? -- To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org To contact the owner, e-mail: opensuse+owner@opensuse.org
On 13/01/2020 05:24, Per Jessen wrote:
Why should this be different? What's the objection to altering the default behaviour of the OOM_killer so that it kills the process that caused the OOM?
That's no better than the default. Still plenty of options for killing an innocent process.
I will grant you that the people who write technical documentation are often not the best at precisely expressing the purpose and intent, and even we two see subtle differences in the way we've grown up using the English language, but this one puzzles me. "Innocent"? To my mind an innocent process is one that didn't cause the OOM. Now the way I read the docco there are two settings. One walks through the process table killing off processes. In all probability the first one will be innocent. How much memory will be released? Why should the OOM_killer proceed to another one? The docco doesn't make clear just what constitutes 'necessary' and 'sufficient' in this case. Bad Docco! The alternative setting causes the OOM_killer to select, first and foremost, the process that caused the OOM. Certainly if this is a nasty_process for any one of a number of reasons, such as a pernicious memory leak, then this strikes me as a sensible move. The docco also notes that there are ways to take a memory dump and debug why this happens. Well, that's a bit of a techie/geek outlook, isn't it. Thee and Mee in a production environment won't have time for all that. Like Der Führer in Mahogany Row says "Get It Running Again And Keep It Running This Time!" That's what the customers want, that's what the shareholders want. Who are we mere myrmidons to deny them that? So the nasty_process that caused the OOM is killed off, freeing memory. Now contrast that with the situation where a randomly selected, possibly - nay, probably - innocent process was killed. "Now What"? What constitutes "necessary" and "sufficient"? What happens next? There are two possibilities here. The one I see is that the whole thing finishes there. The immediate cause for the OOM has gone, memory is free. Other processes, ALL other processes, continue. Perhaps in case #1 the nasty_process continues, continues to leak memory, and eventually causes another OOM. This is why I prefer case #2. Kill The Bugger. Then everyone else can proceed. The second is that there are some 'hidden variables' about what constitutes 'necessary' and 'sufficient' and the OOM_killer keeps killing processes, certainly in setting case 1, and now setting case 2 has degenerated into case 1. The documentation doesn't say anything about what happens next. BAD DOCCO! -- A: Yes. > Q: Are you sure? >> A: Because it reverses the logical flow of conversation. >>> Q: Why is top posting frowned upon? -- To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org To contact the owner, e-mail: opensuse+owner@opensuse.org
Le 13/01/2020 à 14:47, Anton Aylward a écrit :
in setting case 1 and now setting case 2 has degenerated into case 1. the documentation doesn't say anything about what happens next.
BAD DOCCO!
Maybe in the source code? Or ask the programmer :-( I guess that this part of the code was once very much checked. But when was this "once"? jdd -- http://dodin.org -- To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org To contact the owner, e-mail: opensuse+owner@opensuse.org
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 13/01/2020 14.47, Anton Aylward wrote: | On 13/01/2020 05:24, Per Jessen wrote: |> |>> Why should this be different? What's the object to alter the |>> default behaviour of the OOM_killer so that it kills the |>> process that caused the OOM? | |> That's no better than the default. Still plenty of options for |> killing an innocent process. | | I will grant you that the people who write technical documentation | are often not the best at precisely expressing the purpose and | intent, and even we two see subtle differences in the way we've | grown up using the English language, but this one puzzles me. | | "Innocent"? To my mind an innocent process is one that didn't cause | the OOM. | | Now the way I read the docco there are two settings. One walks | though the process table killing of processes. In all probability | the first one will be innocent. How much memory will be released? | Why should the OOM_killer proceed to another one? The docco does | make clear just what constitutes 'necessary' and 'sufficient' in | this case. | | Bad Docco! | | The alternative setting causes the OOM_killer to select, first and | foremost, the process that caused the OOM. Certainly if this is a | nasty_process for any one of a number of reasons such as a a | pernicious memory leak, then this strikes me as a sensible move. | The docco also notes that there are ways to take memory dump and | debug why this happens. Well, that's a bit of a techie/geek | outlook, isn't it. Thee and Mee in a production environment won't | have time for all that. Like Der Furhur in Mahogany Row says "Get | It Running Again And Keep It Running This Time!" That's what the | customers want, that's what the shareholders want. Who are we mere | myrmidons to deny them that? But who is the culprit of the OOM, the last process that requests memory and fails? Maybe there is a much earlier process that is leaking memory and ate most of it, so that the last process is just the last drop in the vase. It can happen that one is running a long calculation that takes a lot of memory; maybe I would prefer something else killed rather than this one, so that the big one finishes. Maybe instead of killing a big process, it could be sent to sleep and swapped to emergency file, to be recovered and restarted later. The admin is sent an email so that he takes action (like creating more swap temporarily, or killing something else: he decides). Maybe instead of killing, the system can create temporary swap file and mail the admin. - -- Cheers / Saludos, Carlos E. R. (from 15.1 x86_64 at Telcontar) -----BEGIN PGP SIGNATURE----- iF0EARECAB0WIQQZEb51mJKK1KpcU/W1MxgcbY1H1QUCXhx/wAAKCRC1MxgcbY1H 1QW9AKCNPsvQnapd4mlLRZSt0BgtJoZYQgCcCmUM9m5oOmcU0q/4LsW1/1/DxK0= =Uq+l -----END PGP SIGNATURE----- -- To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org To contact the owner, e-mail: opensuse+owner@opensuse.org
Le 13/01/2020 à 15:33, Carlos E. R. a écrit :
It can happen that one is running a long calculation that takes a lot of memory; maybe I would prefer something else killed rather than this one, so that the big one finishes.
I get the impression that all this is a matter of bad programming. I made calculations on my HP-41 with so little memory that my mouse probably has more... When a program knows it has to use much memory, nothing prevents it from asking first to see if the mem is available before using it. Think in terms of money: "so sorry, my banker, I just had an urgent need of money, so I made a huge check. Please accept it"? jdd -- http://dodin.org -- To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org To contact the owner, e-mail: opensuse+owner@opensuse.org
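To be fair, a program can look before it leaps: /proc/meminfo exposes MemAvailable as an estimate of what can be allocated without pushing the system into swap or OOM, though with the default overcommit policy a successful malloc() is only a promise anyway. A small sketch of the check from the shell:

# awk '/MemAvailable/ {printf "%.1f GiB available\n", $2/1048576}' /proc/meminfo
# cat /proc/sys/vm/overcommit_memory   # 0 = heuristic overcommit, the default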
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 13/01/2020 15.47, jdd@dodin.org wrote: | Le 13/01/2020 à 15:33, Carlos E. R. a écrit : | |> It can happen that one is running a long calculation that takes a |> lot of memory; maybe I would prefer something else killed rather |> than this one, so that the big one finishes. |> | | I get the impression that all this is matter of bad programming. I | made calculations on my HP-41 with so little memory my mouse have | probably more... Those things did not use dynamic memory assignments. | | when a program knows it have to use much memory, nothing prevent it | to ask first to see if the mem is available before using it. It may be impossible to calculate with exactitude. It depends on the calculations and the dataset. It is possible that to predict the memory needed you need to run it... But the program might check available memory and stop at midpoint, or save temporary results. But continuing a long calculation from status file is not easy to (create) program. | | think in term of money | | "so sorry, my banker, I just had an urgent need of money, so I made | a huge check. Please accept it"? It is called a credit line :-) - -- Cheers / Saludos, Carlos E. R. (from 15.1 x86_64 at Telcontar) -----BEGIN PGP SIGNATURE----- iF0EARECAB0WIQQZEb51mJKK1KpcU/W1MxgcbY1H1QUCXhyGFgAKCRC1MxgcbY1H 1SAnAJ4myL07Xe0/q5wLSNjGKiT+bEYYGgCdGC4XVvQqONknjiLyTiB8lMuQjP0= =1pIg -----END PGP SIGNATURE----- -- To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org To contact the owner, e-mail: opensuse+owner@opensuse.org
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 13/01/2020 15.47, jdd@dodin.org wrote: | I get the impression that all this is matter of bad programming. I | made calculations on my HP-41 with so little memory my mouse have | probably more... The lunar lander, I think it was that, had a "near memory exhausting" alarm, on the first mission that landed on the moon. - -- Cheers / Saludos, Carlos E. R. (from 15.1 x86_64 at Telcontar) -----BEGIN PGP SIGNATURE----- iF0EARECAB0WIQQZEb51mJKK1KpcU/W1MxgcbY1H1QUCXhyGhAAKCRC1MxgcbY1H 1crBAJ9F7uTtM3m0Isrgjn5XQ5T70USb5QCeMpHj+5Gy1iW4QbdsKy1jdy44VHs= =Q8M4 -----END PGP SIGNATURE----- -- To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org To contact the owner, e-mail: opensuse+owner@opensuse.org
Le 13/01/2020 à 16:02, Carlos E. R. a écrit :
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
On 13/01/2020 15.47, jdd@dodin.org wrote: | I get the impression that all this is matter of bad programming. I | made calculations on my HP-41 with so little memory my mouse have | probably more...
The lunar lander, I think it was that, had a "near memory exhausting" alarm, on the first mission that landed on the moon.
yes, and it didn't crash :-)) jdd -- http://dodin.org -- To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org To contact the owner, e-mail: opensuse+owner@opensuse.org
On Mon, 13 Jan 2020 16:20:28 +0100 "jdd@dodin.org" <jdd@dodin.org> wrote:
Le 13/01/2020 à 16:02, Carlos E. R. a écrit :
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
On 13/01/2020 15.47, jdd@dodin.org wrote: | I get the impression that all this is matter of bad programming. I | made calculations on my HP-41 with so little memory my mouse have | probably more...
The lunar lander, I think it was that, had a "near memory exhausting" alarm, on the first mission that landed on the moon.
yes, and it didn't crash :-))
https://en.wikipedia.org/wiki/Apollo_Guidance_Computer#PGNCS_trouble
jdd
-- To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org To contact the owner, e-mail: opensuse+owner@opensuse.org
On 13/01/2020 09:33, Carlos E. R. wrote:
But who is the culprit of the OOM, the last process that requests memory and fails? Maybe there is a much earlier process that is leaking memory and ate most of it, so that the last process is just the last drop in the vase.
A very good point! As I said, BAD DOCCO. So long as you can keep coming up with justifications like that, you are pointing out the inherent design flaws that the designers never considered. Good for you, bad for them. The problem with reporting that as a bug is that they are likely to simply dismiss it. Personally, I think that is a VERY realistic scenario!
It can happen that one is running a long calculation that takes a lot of memory; maybe I would prefer something else killed rather than this one, so that the big one finishes.
Agreed. Indubitably!
Maybe instead of killing a big process, it could be sent to sleep and swapped to emergency file, to be recovered and restarted later. The admin is sent an email so that he takes action (like creating more swap temporarily, or killing something else: he decides).
Maybe instead of killing, the system can create temporary swap file and mail the admin.
Maybe a lot of things. Maybe a lot of code. Maybe a case for UI design. "Who are you going to kill today, Mr Admin?" Oh, look, all those background processes started by systemd that are of questionable value in this context, like the print system, but if I kill them off using your "choose one of the above" UI, systemd just starts them again ... It does strike me as a lot of code to go into the kernel and the gurus thereof will, I suspect, argue that this is such an exceptional case that it doesn't warrant that much effort. Heck, I've encountered that even with applications. Oftentimes the design is architected so as to make fixing such problems difficult. -- A: Yes. > Q: Are you sure? >> A: Because it reverses the logical flow of conversation. >>> Q: Why is top posting frowned upon? -- To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org To contact the owner, e-mail: opensuse+owner@opensuse.org
On 13/01/2020 11:52, Anton Aylward wrote:
It does strike me as a lot of code to go into the kernel and the gurus thereof will, I suspect, argue that this is such an exceptional case that it doesn't warrant that much effort. Heck, I've encountered that even with applications. Oftentimes the design is architecture so as to make fixing such problems difficult.
It seems there is a way. Cgroups (or, if you want to approach it that way, 'containers'.) <quote src="https://lwn.net/Articles/590960/"> <BIG, BOLD>User-space out-of-memory handling</> March 19, 2014 by David Rientjes Users of Linux sometimes find themselves faced with the dreaded out-of-memory (OOM) killer, an unavoidable consequence of having overcommitted memory and finding swap completely filled up. The kernel finds itself with no other option than to abruptly kill a process when no memory can be reclaimed. The OOM killer has claimed web browsers, media players, and even X window environments as victims for years. It's very easy to lose a large amount of work in the blink of an eye. Occasionally, the OOM killer will actually do something helpful: it will kill a rogue memory-hogging process that is leaking memory and unfreeze everything else that is trying to make forward progress. Most of the time, though, it sacrifices something of importance without any notification; it's these encounters that we remember. One of my goals in my work at Google is to change that. I've recently proposed a patchset to actually give a process a notification of this impending doom and the ability to do something about it. Imagine, for example, being able to actually select what process is sacrificed at runtime, examine what is leaking memory, or create an artifact to save for debugging later. This functionality is needed if we want to do anything other than simply kill the process on the machine that will end up freeing the most memory — the only thing the OOM killer is guaranteed to do. Some influence on that heuristic is available through /proc/<pid>/oom_score_adj, which either biases or discounts an amount of memory for a process, but we can't do anything else and we can't possibly implement all practical OOM-kill responses into the kernel itself. So, for example, we can't force the newest process to be killed in place of a web server that has been running for over a year. We can't compare the memory usage of a process with what it is expected to be using to determine if it's out of bounds. We also can't kill a process that we deem to be the lowest priority. </quote> well isn't that a précis of what we've been discussing? I stepped through to see if I have the memory controller he describes in my kernel and whether it is mounted. Yes, and yes. I see in there mention of memory.oom_control: allows processes to register eventfd() notifications when this memcg is out of memory and control whether the kernel will kill a process or not. which is interesting. Drill down on that at another time. The crucial bit is this: <quote> My patch set adds another control file to this set: memory.oom_reserve_in_bytes: the amount of memory, in bytes, that can be charged by processes waiting for OOM notification. Keep reading to see why this is useful and necessary. The limit of the root memcg is infinite so that processes attached to it may charge as much memory as possible from the kernel. When memory.use_hierarchy is enabled, the usage, limit, and reserves of descendant memcgs are accounted to the parent as well. This allows a memcg to overcommit its resources, an important aspect of memcg that we'll talk about later. If a memcg limits its usage to 512 MiB and has two child memcgs with limits of 512 MiB and 256 MiB each, for example, then the group as a whole is overcommitted. </quote> It doesn't seem to be there. Ah well. It looks like this specific patch didn't make it.
Nevertheless, the CGROUP mechanism, call it 'containers' if you will, is going to let a process run with a certain allocation of memory. It may OOM its allocation, yes. That, too, tells you something, perhaps, about its future behaviour. ======================= Stop, wait, what was that about "oom_score_adj"? It seems that the OOM-killer isn't quite random. The kernel does assign a score that lets the oom_killer choose one process over another. This, too, needs looking into if this thread is to progress. -- A: Yes. > Q: Are you sure? >> A: Because it reverses the logical flow of conversation. >>> Q: Why is top posting frowned upon? -- To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org To contact the owner, e-mail: opensuse+owner@opensuse.org
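The score is readable per process, so it is possible to see in advance roughly whom the default heuristic would sacrifice. A small sketch listing the current top candidates:

# for p in /proc/[0-9]*; do echo "$(cat $p/oom_score 2>/dev/null) $(cat $p/comm 2>/dev/null) (pid ${p##*/})"; done | sort -rn | head -5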
Per Jessen wrote:
jdd@dodin.org wrote:
Le 12/01/2020 à 17:15, Anton Aylward a écrit : (...)
that is what the 'working set' is all about.
interesting, but not the question I try to understand
somebody said "when memory leaks, the system crashes".
is this true (*crash*)?
When the system starts to run out of memory, the OOM killer will kick in and try to identify who is gobbling up the memory. If it picks an innocent process, it could be seen as the system crashing.
There is another setting - you can let the system throw a kernel panic on out-of-memory, that sounds more like a crash. -- Per Jessen, Zürich (2.6°C) http://www.hostsuisse.com/ - virtual servers, made in Switzerland. -- To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org To contact the owner, e-mail: opensuse+owner@opensuse.org
On 12/01/2020 11:43, jdd@dodin.org wrote:
Le 12/01/2020 à 17:15, Anton Aylward a écrit : (...)
that is what the 'working set' is all about.
interesting, but not the question I try to understand
somebody said "when memory leaks, the system crashes".
is this true (*crash*)?
Yes, no, maybe. A memory leak is when an application repeatedly malloc()s space and doesn't return it. It's a green consumer. If it is a long lived process then perhaps it lasts long enough to consume prodigious amounts of memory, leading to the crash. Perhaps it's a short lived process and dies before that happens. Perhaps it's only associated with a user being logged in (FF with a leaky plug-in, perhaps) and MAYBE the user logs out before the crash happens. MAYBE.
is it different with swap? (given the same total amount of memory)?
The malloc()'d data counts as volatile so it's going to be eligible for paging out to swap. -- A: Yes. > Q: Are you sure? >> A: Because it reverses the logical flow of conversation. >>> Q: Why is top posting frowned upon? -- To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org To contact the owner, e-mail: opensuse+owner@opensuse.org
On 12/01/2020 11:15, Anton Aylward wrote:
And of course not all memory content can actually be swapped out in the first place.
Damn right! Obviously you don't want to swap out essential kernel code, least of all the code that implements the VM[1].
I said it before and I'll say it again. Not all memory content can actually be swapped out in the first place because it doesn't need to be swapped out. It's not eligible to be swapped out. It's code. It's non-volatile data (for example in a read only memory mapped file). -- A: Yes. > Q: Are you sure? >> A: Because it reverses the logical flow of conversation. >>> Q: Why is top posting frowned upon? -- To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org To contact the owner, e-mail: opensuse+owner@opensuse.org
On 2020-01-12 17:15, Anton Aylward wrote:
On 12/01/2020 04:15, Andrei Borzenkov wrote:
12.01.2020 11:57, jdd@dodin.org пишет:
If you are short of memory but there is swap available, swap will be used instead.
That is wrong.
It's not that he's "wrong" per se but that he'd worded it badly. You go on to more accurately describe what is happening.
sure, but does the kernel care that it's swap, not physical memory?
in another way, is 5 GB of physical memory and 5 GB of swap identical to 10 GB of physical memory?
No. If there is no free memory to satisfy allocation request, kernel will look for cold data in memory and will try to reclaim it.
All memory is either statically allocated for the kernel and/or its threads or is available to the VM system. The latter is ALL on one of a number of linked lists.
Not all memory has a corresponding swap location, so there is no need for size(swap) = size(RAM). Why is this? You never need to swap out code! Late model VM has memory mapped files. This is wonderful for the VM's "load on demand" capability if the CPU can do instruction restart. Most can these days but it was a problem with some of the original set of 16-bit CPUs and unavailable with the 8-bit ones. Sun pulled a trick with a pair of 68000 chips, one a step behind the other, to fudge the restart capability on a chip that didn't have it. Fantastic engineering kludge!
You never need to swap out code, you just have to mark that page as one that can be put on the 'dirty' queue, and as it fails to be accessed it 'ages' along to the end, where it can, if necessary, be reclaimed. How fast that happens, and how aggressively the pages are plucked from that queue for re-use, are tunable parameters. Note I said "as it fails to be accessed". If it is accessed -- that page's code gets executed -- then it is brought back to the tail of the queue.
Suppose that code page ages out and gets re-used 'cos the application isn't needing it ... for a while, but then the application gets around to needing that code fragment again. The page isn't there, a page-fault is triggered, execution is suspended and the page needs to be brought back in. It is a page of a memory mapped file, so it is NOT in swap. It never was; it never will be.
Swap is used to preserve content of anonymous memory that is reclaimed.
I don't know that I'd call it 'anonymous'. This is the volatile memory that belongs to processes.
"Volatile memory"? Consider: at startup a process reads one or more config files. It opens them in read-only mode. The VM opens them as memory mapped files; the process digests them, then closes them. They may be in the data space, but they are not volatile, they are read-only. When closed, their mapping table entries and their mapped virtual memory are freed. But while they are, strictly speaking, in the data space of the application, they are like code pages. If the demands of multi-tasking necessitate suspending that process, it may be that those pages get released so something else can run. (I agree, context and circumstances make it unlikely that a startup config read gets this treatment. But the logic does apply.) Like the code pages mentioned above, the contents of the read-only file can be re-mapped 'on demand'. They don't need to go to swap.
Nice discussion, revives old (swapped-out) memories. Doesn't the kernel try to make use of ALL available memory? Files that are written back to disc, but still kept in mem (just in case)... And all the code-pages that are once loaded into mem but never used, is there any need for them to stay in mem? (writing a number of mem-segments to swap might be faster than re-loading binary pages from the executable) Only DEC made it just as cpu-expensive afaicr. It's only those extreme mem-hogs like "some webbrowsers" and ill-programmed database queries that caused systems to go thrashing. But after all, needing 512GB mem or more? I really wonder why... I once had two 64-core machines with 2TB mem, but that was for running hundreds of virtual desktops. -- To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org To contact the owner, e-mail: opensuse+owner@opensuse.org
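That split is visible for any live process if anyone wants to check: /proc/<pid>/status reports how much of it actually sits in swap, and pmap marks which mappings are file-backed (the binary and its libraries) versus anonymous. A sketch against the current shell, purely illustrative:

# grep -E 'VmRSS|VmSwap' /proc/$$/status
# pmap -x $$ | head -15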
On Sat, 18 Jan 2020 22:14:53 +0100 suse@a-domani.nl wrote:
But after all, needing 512GB mem or more? I really wonder why...
Pretty much all searches go a lot faster if they're in memory rather than backing store, even if the backing store these days is SSD. And there are lots of applications where the data to be searched can get very large. Think pretty much any biological query or any query over a large population. Or both. Then it just depends on how many queries you're running. -- To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org To contact the owner, e-mail: opensuse+owner@opensuse.org
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 El 2020-01-18 a las 22:52 -0000, Dave Howorth escribió:
On Sat, 18 Jan 2020 22:14:53 +0100 suse@a-domani.nl wrote:
But after all, needing 512GB mem or more? I really wonder why...
Pretty much all searches go a lot faster if they're in memory rather than backing store, even if the backing store these days is SSD.
I wonder what is the time cost if they are in NVMe media. That stuff is directly addressable on the i/o bus. Not as fast as RAM, but the difference is not that big as it was. Orders of magnitude advances. - -- Cheers Carlos E. R. (from openSUSE 15.1 (Legolas)) -----BEGIN PGP SIGNATURE----- iHoEARECADoWIQQZEb51mJKK1KpcU/W1MxgcbY1H1QUCXiOajRwccm9iaW4ubGlz dGFzQHRlbGVmb25pY2EubmV0AAoJELUzGBxtjUfVfzMAnA1XNWwDGywK07a+1aNa pR6jlbTmAJ9+KtxH1loctaLatoUxDABY9SbZJw== =So5H -----END PGP SIGNATURE-----
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 El 2020-01-18 a las 22:14 +0100, suse@a-domani.nl escribió:
On 2020-01-12 17:15, Anton Aylward wrote:
On 12/01/2020 04:15, Andrei Borzenkov wrote:
12.01.2020 11:57, jdd@dodin.org пишет:
...
"Volatile memory"? Consider: at startup a processes reads one or more config files. It opens them in read-only mode. The VM opens them as memory mapped files; the process digested them then closes them. They may be in the data space, but they are not volatile, they are read-only. When closed, their mapping tables entries and their mapped virtual memory are freed. But while they are, strictly speaking, in the data space of the application, they are like code pages. if the demand of multi-tasking necessitates suspending that process it may be that those pages get released so something else can run. (I agree, context and circumstances make it unlikely that a startup config read gets this treatment. But the logic does apply.) Like the code pages mentioned above the contents of the read-only file can be re-mapped 'on demand'. They don't need to go to swap.
Nice discussion, revives old (swapped-out) memories. Doesn't the kernel try to make use of ALL available memory?
Yes.
Files that are written back to disc, but still kept in mem (just in case)... And all the code-pages, that are once loaded into mem, but never used, is there any need for them to stay in mem? (writing a number of mem-segments to swap might be faster, than re-loading binary pages from the executable)
Maybe. Windows did this trick of reloading code from the exe files. I saw it in the documentation of 3.10. It conserves disk space, and writing time, but it is slower reading because the object has to be found in disk, and because exes have a jump table that has to be recalculated; ie, every jump in the code on file has to be recalculated to match the position in ram. Frankly, I don't know myself if Linux does this or not. - -- Cheers Carlos E. R. (from openSUSE 15.1 (Legolas)) -----BEGIN PGP SIGNATURE----- iHoEARECADoWIQQZEb51mJKK1KpcU/W1MxgcbY1H1QUCXiOZdxwccm9iaW4ubGlz dGFzQHRlbGVmb25pY2EubmV0AAoJELUzGBxtjUfVBIoAn3roAOJnFvVnJju428up N3qmLmHUAKCNC9JYFKaBFedcUQ+nj3ve7pqQ8g== =WOxJ -----END PGP SIGNATURE-----
On 18/01/2020 18:49, Carlos E. R. wrote:
El 2020-01-18 a las 22:14 +0100, suse@a-domani.nl escribió:
On 2020-01-12 17:15, Anton Aylward wrote:
On 12/01/2020 04:15, Andrei Borzenkov wrote:
12.01.2020 11:57, jdd@dodin.org пишет:
...
"Volatile memory"? Consider: at startup a processes reads one or more config files. It opens them in read-only mode. The VM opens them as memory mapped files; the process digested them then closes them. They may be in the data space, but they are not volatile, they are read-only. When closed, their mapping tables entries and their mapped virtual memory are freed. But while they are, strictly speaking, in the data space of the application, they are like code pages. if the demand of multi-tasking necessitates suspending that process it may be that those pages get released so something else can run. (I agree, context and circumstances make it unlikely that a startup config read gets this treatment. But the logic does apply.) Like the code pages mentioned above the contents of the read-only file can be re-mapped 'on demand'. They don't need to go to swap.
Nice discussion, revives old (swapped-out) memories. Doesn't the kernel try to make use of ALL available memory?
Yes.
Files that are written back to disc, but still kept in mem (just in case)... And all the code-pages, that are once loaded into mem, but never used, is there any need for them to stay in mem? (writing a number of mem-segments to swap might be faster, than re-loading binary pages from the executable)
Maybe.
Windows did this trick of reloading code from the exe files. I saw it in the documentation of 3.10. It conserves disk space, and writing time, but it is slower reading because the object has to be found in disk, and because exes have a jump table that has to be recalculated; ie, every jump in the code on file has to be recalculated to match the position in ram.
Frankly, I don't know myself if Linux does this or not.
In late model Linux, that is, post about 2.6, all file IO is mmap()ed, either explicitly or by the kernel. So the executable binaries, either the stuff in /usr/bin or the DOT-SO files in /usr/lib (or wherever), are opened as files and mmap()d. That is as long as they are in use, that is, the program is not terminated and there is some process using the shared or dedicated libraries. As files, they have file descriptors. That is the important part. Perhaps that is what makes this different from Windows. The inodes are cached, of course, so the pointers to the disk, if and when the pages of code are to be read in, are there already. But once read in ... At worst, they become unused and are marked 'dirty', and the dirty queue is merely some pages that MIGHT be reclaimed. It doesn't mean they have been released, yet. I don't know what you mean by 'jump table' and don't know what you mean by it being recalculated every time ... Yes, the prime binary has the list of the libraries it makes use of. You can see that with the 'ldd' command:
ldd /bin/bash
        linux-vdso.so.1 (0x00007ffc7d95f000)
        libreadline.so.7 => /lib64/libreadline.so.7 (0x00007fbc6c498000)
        libdl.so.2 => /lib64/libdl.so.2 (0x00007fbc6c294000)
        libc.so.6 => /lib64/libc.so.6 (0x00007fbc6beda000)
        libtinfo.so.6 => /lib64/libtinfo.so.6 (0x00007fbc6bcac000)
        /lib64/ld-linux-x86-64.so.2 (0x00007fbc6c9ec000)
I don't know if you are talking about the VM mapping when you talk of jump tables. I don't see what 'every time' has to do with it. Once set up in the process's virtual memory, it is simply that the page tables are there. They constitute a complete entity. If the page is there then it is simply used in place, no recalculation. As far as I know, the OS sets that up for the hardware to do autonomously. If it isn't there, then a page fault is raised.
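To see that mapping in practice, the kernel exposes each process's mappings under /proc. A minimal sketch (the choice of bash and the filter pattern are only illustrative, not something from the thread); the libraries that ldd reports show up here as read-execute mmap()ed regions:

    # Pick some running process and list the files mapped read-execute into it
    pid=$(pidof -s bash)
    grep ' r-xp ' /proc/$pid/maps | awk 'NF == 6 {print $6}' | sort -u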
On 20/01/2020 18.01, Anton Aylward wrote:
| On 18/01/2020 18:49, Carlos E. R. wrote:
...
| I don't know what you mean by 'jump table' and don't know what you
| mean by it being recalculated every time ... Yes, the prime
| binary has the list of the libraries it makes use of. You can see
| that with the 'ldd' command:

I said "windows 3". -- Cheers / Saludos, Carlos E. R. (from 15.1 x86_64 at Telcontar)
On 12/01/2020 09.57, jdd@dodin.org wrote:
Le 11/01/2020 à 22:38, Carlos E. R. a écrit :
On 11/01/2020 22.15, jdd@dodin.org wrote: | Le 11/01/2020 à 22:08, Carlos E. R. a écrit : | |> I said "not again". I already did and it was a disaster. I *know* |> that it is disastrous. |> | | just a question. | | What is the difference between being short of real memory and being | short of memory + swap?
If you are short of memory but there is swap available, swap will be used instead.
sure, but does the kernel care that it's swap, not physical memory?
to put it another way, is 5 GB of physical memory and 5 GB of swap identical to 10 GB of physical memory?
if there is no difference, crash will happen whatever swap is created in a time where ram is nearly as big as disk space...
The processor can not use something that is in swap at this moment; it has to copy it first to RAM. So in a memory-almost-full situation the kernel first finds some block of memory that is not needed now (maybe by counting how long it has been since it was last used), writes it to swap, frees that block, then reads from swap the block that it needs and writes it to the freed RAM. So the operation is slow. The situation is called trashing; with a rotating disk you hear it working and the system becomes slow. Approximately. There may be details I do not consider but Andrei does :-) Ideally, if I'm working with Thunderbird, and then go to the workspace where LibreOffice is open, Thunderbird would be sent to swap, and LO would be taken out of swap. But I doubt the kernel knows my intentions of using LO for a while and not Th. Now, 5 gigs of ram + 5 gigs of swap approximates 10 gigs of memory. That imprecise saying is good enough for me :-) This instant, my 8 GiB system is using 7.2 GiB of swap, but there are 3.1 GiB of ram available (between fully free (2.66) and buffers/cache (1.1)). But as it is fresh out of hibernation, the proportion of swap is higher. -- Cheers / Saludos, Carlos E. R. (from 15.1 x86_64 at Telcontar)
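The page-out/page-in dance described above can be watched while it happens. A rough sketch with vmstat (the one-second interval and the sample count are arbitrary):

    # 'si' is KiB/s paged in from swap, 'so' is KiB/s paged out to swap;
    # sustained non-zero values in both columns is the thrashing case.
    vmstat 1 10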
Le 12/01/2020 à 11:47, Carlos E. R. a écrit :
Now, 5 gigs of ram + 5 gigs of swap approximates 10 gigs of memory. That imprecise saying is good enough for me :-)
in my present understanding, more ram is much better than more swap. If the situation is such that ram is nearly full, swap is of little use if there is only one application using most of the ram (it can't be swapped out). When one reaches 512 GB of ram and has only a 1 TB SSD for swap, I wonder if we are not at the extreme limit of the system? If 1 TB of swap is *really* useful, go for 1 TB more ram. It looks like, in a lack-of-ram situation, swap is only a solution if several memory chunks are not really used (for example unseen tabs in firefox), and if a crash can happen, it will happen sooner or later with swap too. jdd -- http://dodin.org
On 12/01/2020 11.57, jdd@dodin.org wrote:
| Le 12/01/2020 à 11:47, Carlos E. R. a écrit :
|
|> Now, 5 gigs of ram + 5 gigs of swap approximates 10 gigs of
|> memory. That imprecise saying is good enough for me :-)
|
| in my present understanding, more ram is much better than more
| swap.

Obviously, yes :-) For everybody, if we can obtain and install it. For example, my motherboard is maxed at 8 GiB, I can not install more.

| If the situation is such that ram is nearly full, swap is of little use if
| there is only one application using most of the ram (it can't be
| swapped out).

It can be partially swapped. The kernel doesn't normally swap applications fully, just portions of them.

| When one reaches 512 GB of ram and has only a 1 TB SSD for swap, I wonder
| if we are not at the extreme limit of the system?
|
| If 1 TB of swap is *really* useful, go for 1 TB more ram.

Well, it depends :-) For example, if the situation happens only once a month, it makes sense to go for swap. There is money involved, capability of the motherboard, etc. It may happen that the running processes have ram areas that they use once and then not again till seconds later. Or minutes, hours or even days. In that situation, swapping those areas has limited impact. It depends on the workload. Also, if the swap is on an nvme disk, the impact is much reduced, by orders of magnitude. You can choose not to pay for that ram increase and accept some impact in speed.

| It looks like, in a lack-of-ram situation, swap is only a solution
| if several memory chunks are not really used (for example unseen
| tabs in firefox),

Yes.

| and if a crash can happen, it will happen sooner or later with
| swap too.

No, the crash should not happen. It will happen if there is neither RAM nor swap. If there is swap, it will happen if the kernel can not juggle the ram and swap to make some ram available for the processing, and gets stuck. It will kill "something".

-- Cheers / Saludos, Carlos E. R. (from 15.1 x86_64 at Telcontar)
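Whether an application really is "partially swapped" can be checked per process through the VmSwap field in /proc. A small sketch (the sort/tail at the end is just one convenient way to see the biggest users):

    # Print swap usage (in kB) and name for every process with pages in swap,
    # biggest users last; processes may vanish mid-scan, hence 2>/dev/null.
    for p in /proc/[0-9]*/status; do
        awk '/^Name:/ {n=$2} /^VmSwap:/ && $2 > 0 {print $2, $3, n}' "$p"
    done 2>/dev/null | sort -n | tail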
Le 12/01/2020 à 13:10, Carlos E. R. a écrit :
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
On 12/01/2020 11.57, jdd@dodin.org wrote:
| and if crash can happen it will happen also soon or later with | swap
No, the crash should not happen. It will happen if there is neither RAM nor Swap. If there is swap,
you know, if something is available, it will be used sooner or later. Swap is only some worse ram, and it may be short as well - we speak of situations where ram is really useful. That said, I know of hosting that allows you to change ram/processor%/disk size *on an hourly basis* depending on your needs. You need more ram for two hours, that's ok... but given the number on the subject line, I'm not sure if it's still possible. What is the maximum ram possibly available right now? jdd -- http://dodin.org
On 01/12/2020 05:54 AM, jdd@dodin.org wrote:
What is the maximum ram possibly available right now?
IIRC the servers we have could have been ordered with 2-TB of ECC RAM. Sixteen-each 128GB DIMMs. That option is a bit costly. https://www.siliconmechanics.com/system/storform-r518.v7 Regards, Lew -- To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org To contact the owner, e-mail: opensuse+owner@opensuse.org
Le 12/01/2020 à 23:25, Lew Wolfgang a écrit :
On 01/12/2020 05:54 AM, jdd@dodin.org wrote:
What is the maximum ram possibly available right now?
IIRC the servers we have could have been ordered with 2-TB of ECC RAM. Sixteen-each 128GB DIMMs. That option is a bit costly.
yes, really. I wonder when we will have this on every computer (and find it too small :-)
thanks jdd -- http://dodin.org -- To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org To contact the owner, e-mail: opensuse+owner@opensuse.org
jdd@dodin.org wrote:
Le 12/01/2020 à 13:10, Carlos E. R. a écrit :
What is the maximum ram possibly available right now?
An HPE ProLiant DL580 Gen10 has 48 DIMM slots and accepts a maximum of 6 TB. -- Per Jessen, Zürich (5.2°C) http://www.dns24.ch/ - your free DNS host, made in Switzerland.
On 12/01/2020 05:57, jdd@dodin.org wrote:
Le 12/01/2020 à 11:47, Carlos E. R. a écrit :
Now, 5 gigs of ram + 5 gigs of swap approximates 10 gigs of memory. That imprecise saying is good enough for me :-)
in my present understanding, more ram is much better than more swap.
YES!
If the situation is such that ram is nearly full, swap is of little use if there is only one application using most of the ram (it can't be swapped out).
When one reaches 512 GB of ram and has only a 1 TB SSD for swap, I wonder if we are not at the extreme limit of the system?
I wonder that as well. There are aspects of the way page tables work in the Intel architecture that make demands on space in low memory. I'm not sure how that is scalable. Perhaps, after a point, the size of the pages has to increase. We went through that with file system blocks (back in the 1970s, UNIX BSD 4.1 etc), file system logical blocks and later with disk (even rotating rust) physical blocks. Are we going to go through that with VM page sizes as well?
If 1 TB of swap is *really* useful, go for 1 TB more ram.
Sadly, for many, if not all, of us, the motherboard only allows a certain degree of RAM memory expansion, whereas we can more easily allocate swap space on a drive or file system.
It looks like, in a lack-of-ram situation, swap is only a solution if several memory chunks are not really used (for example unseen tabs in firefox), and if a crash can happen, it will happen sooner or later with swap too.
On 12/01/2020 05:47, Carlos E. R. wrote:
On 12/01/2020 09.57, jdd@dodin.org wrote:
Le 11/01/2020 à 22:38, Carlos E. R. a écrit :
The processor can not use something that is in swap this moment; it has to copy it first to RAM.
So in a memory almost full situation the kernel first finds some block of memory that is not needed now (maybe by counting how long it has been since last used), writes it to swap, frees that block, then read from swap the block that it needs and writes it to the free ram.
It may not need to be written to swap. If this was a code page or a page from some other, similarly memory mapped file, then there is no need to write it to swap. If it is needed again it can be retrieved from the file system. Swap is only used for a program's volatile data.
So the operation is slow. The situation is called trashing; with a
                                                    ^^^^^^^^ tHrashing
rotating disk you hear it working and the system becomes slow.
Approximately. There may be details I do not consider but Andrei does :-)
Ideally, if I'm working with Thunderbird, and then go to the workspace where LibreOffice is open, Thunderbird would be sent to swap, and LO would be taken out of swap. But I doubt the kernel knows my intentions of using LO for a while and not Th.
Actually it makes some accurate prognostications. For a start, both make use of the same shared libraries. Not just libc but the GTK stuff as well. Both are heavy on startup, but the startup files and the startup code can then be discarded. Neither Thunderbird nor LO are 'sent to swap', only their volatile data. A lot of the difference between your TB volatile data and my TB volatile data lies in the details of configuration. Plugins contribute heavily!
Now, 5 gigs of ram + 5 gigs of swap approximates 10 gigs of memory. That imprecise saying is good enough for me :-)
This instant, my 8 GiB system is using 7.2 GiB of swap, but there are 3.1 GiB of ram available (between fully free (2.66) and buffers/cache (1.1)). But it is fresh out of hibernation, the proportion of swap is higher.
Unlike older models, that 'buffer/cache' is just an expression of where some of the dynamics of virtual memory is at the moment.

- a disk-less workstation that runs entirely off networking will have a very different view of the buffer/cache!

- a CPU compute intensive application (think: mining bitcoins?) is going to have a very different profile

- a 'headless' server, one not running X or the associated interactive applications (TB, FF, LO DT) and graphics, will use network IO buffers heavily. A simplistic memory report (perhaps to an ssh login) is going to give inadequate information about what buffers are being used.

For some of the above ... why would you cache a buffer?
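One rough way to see the split being described here - anonymous pages, which are the only thing that can end up in swap, versus file-backed page cache, which can simply be dropped and re-read - is /proc/meminfo. A minimal sketch (shared memory and tmpfs complicate the picture slightly, so treat the numbers as approximate):

    # AnonPages is roughly what could go to swap; Cached is file-backed and
    # reclaimable without swap; Dirty still has to be written back first.
    grep -E '^(MemTotal|MemAvailable|AnonPages|Cached|Dirty|SwapTotal|SwapFree):' /proc/meminfo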
On 12/01/2020 17.35, Anton Aylward wrote:
| On 12/01/2020 05:47, Carlos E. R. wrote:
|> On 12/01/2020 09.57, jdd@dodin.org wrote:
|>> Le 11/01/2020 à 22:38, Carlos E. R. a écrit :
|
|> The processor can not use something that is in swap at this moment;
|> it has to copy it first to RAM.
|>
|> So in a memory-almost-full situation the kernel first finds some
|> block of memory that is not needed now (maybe by counting how
|> long it has been since it was last used), writes it to swap, frees that
|> block, then reads from swap the block that it needs and writes it
|> to the freed RAM.
|
| It may not need to be written to swap. If this was a code page or
| a page from some other, similarly memory mapped file, then there is
| no need to write it to swap. If it is needed again it can be
| retrieved from the file system.
|
| Swap is only used for a program's volatile data.

"only". That "only" is typically 5 GiB in my system after both FF and Th have been running a day or two, with a few more apps. Is there somewhere a measure of the amount of memory mapped files that are not in RAM this second?

|> So the operation is slow. The situation is called trashing; with
|> a
| ^^^^^^^^ tHrashing

Blame Th speller. :-p

|> rotating disk you hear it working and the system becomes slow.
|
|> Approximately. There may be details I do not consider but Andrei
|> does :-)
|>
|> Ideally, if I'm working with Thunderbird, and then go to the
|> workspace where LibreOffice is open, Thunderbird would be sent
|> to swap, and LO would be taken out of swap. But I doubt the
|> kernel knows my intentions of using LO for a while and not Th.
|
| Actually it makes some accurate prognostications. For a start,
| both make use of the same shared libraries. Not just libc but the
| GTK stuff as well. Both are heavy on startup, but the startup
| files and the startup code can then be discarded.
|
| Neither Thunderbird nor LO are 'sent to swap', only their volatile
| data.

I said approximately ;-)

| A lot of the difference between your TB volatile data and my TB
| volatile data lies in the details of configuration. Plugins
| contribute heavily!
|
|> Now, 5 gigs of ram + 5 gigs of swap approximates 10 gigs of
|> memory. That imprecise saying is good enough for me :-)
|>
|> This instant, my 8 GiB system is using 7.2 GiB of swap, but
|> there are 3.1 GiB of ram available (between fully free (2.66)
|> and buffers/cache (1.1)). But as it is fresh out of hibernation,
|> the proportion of swap is higher.
|
| Unlike older models, that 'buffer/cache' is just an expression of
| where some of the dynamics of virtual memory is at the moment.
|
| - a disk-less workstation that runs entirely off networking will
| have a very different view of the buffer/cache!
|
| - a CPU compute intensive application (think: mining bitcoins?) is
| going to have a very different profile
|
| - a 'headless' server, one not running X or the associated
| interactive applications (TB, FF, LO DT) and graphics, will use
| network IO buffers heavily. A simplistic memory report (perhaps to
| an ssh login) is going to give inadequate information about what
| buffers are being used.
|
| For some of the above ... why would you cache a buffer?

-- Cheers / Saludos, Carlos E. R. (from 15.1 x86_64 at Telcontar)
On 12/01/2020 03:57, jdd@dodin.org wrote:
if there is no difference, crash will happen whatever swap is created in a time where ram is nearly as big as disk space...
You need to rephrase that; that wording is ambiguous. We are not talking about a roll-in/roll-out system. This is virtual memory and only the working set ever needs to be considered. That may include shared libraries. Memory 'exhaustion' may happen not because the applications are using too much memory but because the applications are using fragmented memory and the overhead of the mapping-range tables is a severe contribution. There are VM control characteristics that alter how aggressively (or not) fragmentation is allowed and packing is done. Some packing can be done by paging in a different way. There is a lot to be considered in a real world virtual memory system.
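Physical-memory fragmentation (related to, though not the same thing as, the mapping-table overhead mentioned above) can at least be glanced at. A small sketch:

    # Each column is the count of free blocks of 2^order contiguous pages,
    # per zone; many order-0 entries and few high orders means fragmentation.
    cat /proc/buddyinfo
    cat /proc/pagetypeinfo    # more detail; root may be required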
On 01/08/2020 10:34 PM, Per Jessen wrote:
Lew Wolfgang wrote:
Hi Folks,
Back in the old daze we used to allocate swap space three times as large as the installed RAM as a rule of thumb. But I've got two new servers with 512-GB of ECC RAM and now I'm wondering, How Much Swap?
The motherboard has two-each 1-TB NVMe M.2 PCIe modules, it's tempting to use one for the operating system and the second for swap. Data will be stored on hardware RAID6 arrays and so aren't a part of this calculation.
Any thoughts? 1-TB of swap on one M.2 for .5-TB of RAM? Much depends on your type of workload - anything thst size we only use for virtual hosting, so no swap. If you're not doing virtual hosting, I expect you know the workload really well, there are not many things that require that amount of memory.
My users process large datasets and visualize lots of it. They use python for much of the work, which tends to not use RAM efficiently. Their current machine has 128-GB of RAM, which is rather tight for them. Certainly they don't need 1-TB of swap, which probably wouldn't be effectively usable in the first place. But I vaguely recall that having too much swap was also bad, but that was a long time ago. Maybe I should just partition a smaller swap partition and save the extra space for a rainy day? But how small? Regards, Lew -- To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org To contact the owner, e-mail: opensuse+owner@opensuse.org
On Thu, 9 Jan 2020 07:23:33 -0800 Lew Wolfgang <wolfgang@sweet-haven.com> wrote:
On 01/08/2020 10:34 PM, Per Jessen wrote:
Lew Wolfgang wrote:
Hi Folks,
Back in the old daze we used to allocate swap space three times as large as the installed RAM as a rule of thumb. But I've got two new servers with 512-GB of ECC RAM and now I'm wondering, How Much Swap?
The motherboard has two-each 1-TB NVMe M.2 PCIe modules, it's tempting to use one for the operating system and the second for swap. Data will be stored on hardware RAID6 arrays and so aren't a part of this calculation.
Any thoughts? 1-TB of swap on one M.2 for .5-TB of RAM? Much depends on your type of workload - anything thst size we only use for virtual hosting, so no swap. If you're not doing virtual hosting, I expect you know the workload really well, there are not many things that require that amount of memory.
My users process large datasets and visualize lots of it. They use python for much of the work, which tends to not use RAM efficiently. Their current machine has 128-GB of RAM, which is rather tight for them. Certainly they don't need 1-TB of swap, which probably wouldn't be effectively usable in the first place. But I vaguely recall that having too much swap was also bad, but that was a long time ago. Maybe I should just partition a smaller swap partition and save the extra space for a rainy day? But how small?
What's the availability requirement of the machine? Using the SSD as a RAID 1 for the OS etc might be useful? -- To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org To contact the owner, e-mail: opensuse+owner@opensuse.org
Lew Wolfgang wrote:
On 01/08/2020 10:34 PM, Per Jessen wrote:
Lew Wolfgang wrote:
Hi Folks,
Back in the old daze we used to allocate swap space three times as large as the installed RAM as a rule of thumb. But I've got two new servers with 512-GB of ECC RAM and now I'm wondering, How Much Swap?
The motherboard has two-each 1-TB NVMe M.2 PCIe modules, it's tempting to use one for the operating system and the second for swap. Data will be stored on hardware RAID6 arrays and so aren't a part of this calculation.
Any thoughts? 1-TB of swap on one M.2 for .5-TB of RAM?
Much depends on your type of workload - anything thst size we only use for virtual hosting, so no swap. If you're not doing virtual hosting, I expect you know the workload really well, there are not many things that require that amount of memory.
My users process large datasets and visualize lots of it. They use python for much of the work, which tends to not use RAM efficiently. Their current machine has 128-GB of RAM, which is rather tight for them. Certainly they don't need 1-TB of swap, which probably wouldn't be effectively usable in the first place. But I vaguely recall that having too much swap was also bad, but that was a long time ago. Maybe I should just partition a smaller swap partition and save the extra space for a rainy day? But how small?
I can't imagine swapping being very good for your users' performance anyway, but at least have some _swap_ to avoid the OOM killer kicking in and killing off that job that was about to finish after running for 6 months. I think I would vote for hpj's suggestion of two partitions of 64Gb. -- Per Jessen, Zürich (5.9°C) http://www.dns24.ch/ - free dynamic DNS, made in Switzerland. -- To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org To contact the owner, e-mail: opensuse+owner@opensuse.org
On 09/01/2020 16.23, Lew Wolfgang wrote:
| On 01/08/2020 10:34 PM, Per Jessen wrote:
|> Lew Wolfgang wrote:
|>
|>> Hi Folks,
|>>
|>> Back in the old daze we used to allocate swap space three
|>> times as large as the installed RAM as a rule of thumb. But
|>> I've got two new servers with 512-GB of ECC RAM and now I'm
|>> wondering, How Much Swap?
|>>
|>> The motherboard has two-each 1-TB NVMe M.2 PCIe modules, it's
|>> tempting to use one for the operating system and the second for
|>> swap. Data will be stored on hardware RAID6 arrays and so
|>> aren't a part of this calculation.
|>>
|>> Any thoughts? 1-TB of swap on one M.2 for .5-TB of RAM?
|> Much depends on your type of workload - anything thst size we
|> only use for virtual hosting, so no swap. If you're not doing
|> virtual hosting, I expect you know the workload really well,
|> there are not many things that require that amount of memory.
|
| My users process large datasets and visualize lots of it. They
| use python for much of the work, which tends to not use RAM
| efficiently. Their current machine has 128-GB of RAM, which is
| rather tight for them. Certainly they don't need 1-TB of swap,
| which probably wouldn't be effectively usable in the first place.
| But I vaguely recall that having too much swap was also bad, but
| that was a long time ago. Maybe I should just partition a smaller
| swap partition and save the extra space for a rainy day? But how
| small?

Having lots of swap available should not be a problem (only that the kernel needs a map to it; I do not know the details). What is a problem is if a machine actually uses lots of swap continuously. Different thing, it means the machine wants more RAM.

Me, I had an issue some months ago (Leap 15.x): when swap usage reached about 6G (RAM is 8), the machine would crash. Currently it does not happen, this moment I'm using 8. I suspect kernel guys have corrected something (because it does not crash, for days). Or the trigger situation (something in Thunderbird or Firefox, I suspect) does not happen.

Telcontar:~ # free -h --si
              total        used        free      shared  buff/cache   available
Mem:           8.0G        3.8G        3.2G        139M        979M        3.8G
Swap:           24G        7.8G         16G
Telcontar:~ #

(as much swap used as ram)

In your case, using NVM, and if you fear that swap will be used, I think your idea of dedicating one device to it is a good idea because it allows you to track wear due to swapping. They are very fast devices already, so I suspect that with splitting swap in two devices with the same priority you will not notice an advantage.

Besides, you might want to hibernate. One fancy usage of hibernation for servers: on power failure. Suppose a long calculation run: you do not lose it. If the network connections survive, but that's a different issue.

-- Cheers / Saludos, Carlos E. R. (from 15.1 x86_64 at Telcontar)
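If one NVMe is dedicated to swap as suggested, both the swap usage and that device's wear can be checked separately. A sketch (the device name is only an example, and smartctl comes from the smartmontools package):

    # Which swap areas are active, their size, current usage and priority:
    swapon --show
    # NVMe wear indicator for the device holding swap (example device name):
    sudo smartctl -a /dev/nvme1n1 | grep -i 'percentage used'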
On 09/01/2020 05.22, Lew Wolfgang wrote:
| Hi Folks,
|
| Back in the old daze we used to allocate swap space three times as
| large as the installed RAM as a rule of thumb. But I've got two
| new servers with 512-GB of ECC RAM and now I'm wondering, How Much
| Swap?
|
| The motherboard has two-each 1-TB NVMe M.2 PCIe modules, it's
| tempting to use one for the operating system and the second for
| swap. Data will be stored on hardware RAID6 arrays and so aren't a
| part of this calculation.
|
| Any thoughts? 1-TB of swap on one M.2 for .5-TB of RAM?

Yes, you can :-D

I have never tried that much, but several gigabytes with 500 MB of ram, yes, I have done it. No problem - except that if it has to use them it goes slow on rotating rust.

Once upon a time, YaST1 (not 2) had a memory leak on update (YOU). Zypper had not been invented yet. A bunch of memory for every package. To update the machine I needed swap something like ten times bigger than the actual RAM (I don't remember the actual figures, but I have told the anecdote more than once here). The update took a few hours, but it finished.

Today, this minute:

cer@Telcontar:~> free -h --si
              total        used        free      shared  buff/cache   available
Mem:           8,0G        4,0G        2,8G        150M        1,2G        3,5G
Swap:           24G        7,7G         16G
cer@Telcontar:~>

Some months ago, this situation would crash Leap 15.x.

Wait, re-reading. I thought you had 500 MB of RAM. You have 500 GB of RAM. No, you do not have 512 GB of RAM; you said GB, not GiB. ;-P

Ok, in that situation you don't actually need swap except for hibernation, and the kernel is happier with some swap. You will know the amount after using the machine. Maybe a few GiB will suffice. If you intend to hibernate, you need more swap than RAM, but I don't know how much more.

-- Cheers / Saludos, Carlos E. R. (from 15.1 x86_64 at Telcontar)
On 09/01/2020 06:25, Carlos E. R. wrote:
Today, this minute:
cer@Telcontar:~> free -h --si
              total        used        free      shared  buff/cache   available
Mem:           8,0G        4,0G        2,8G        150M        1,2G        3,5G
Swap:           24G        7,7G         16G
cer@Telcontar:~>
Without knowing your 'swappiness' that's meaningless. If you are using the default setting of '60', that means the kernel will swap when RAM reaches 40% capacity. I consider that ridiculous. As it is, you are only using 50% of your RAM (total=8.0G, used=4.0G). Try setting it to 10:

sudo sysctl vm.swappiness=10

then flush your swap and continue.
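"Flush your swap" here just means turning swap off and on again, which pulls everything swapped out back into RAM first. A sketch, only safe when there is enough free RAM to hold what is currently swapped:

    # Disable all swap areas (forces swapped pages back into RAM), then
    # re-enable everything marked as swap in /etc/fstab:
    sudo swapoff -a && sudo swapon -a
    free -h    # confirm swap usage is back to zero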
On Thursday, 2020-01-09 at 08:24 -0500, Anton Aylward wrote:
On 09/01/2020 06:25, Carlos E. R. wrote:
Today, this minute:
cer@Telcontar:~> free -h --si
              total        used        free      shared  buff/cache   available
Mem:           8,0G        4,0G        2,8G        150M        1,2G        3,5G
Swap:           24G        7,7G         16G
cer@Telcontar:~>
Without knowing your 'swappiness' that's meaningless. If you are using the default setting of '60' that means the kernel will swap when RAM reaches 40% capacity. I consider that ridiculous. As it is, you are only using 50% of your RAM (total=8.0G, used=4.0G) Try setting it to 10
sudo sysctl vm.swappiness=10
flush your swap and continue
I'm using defaults, and I'm not changing them. It would make my system slower. -- Cheers, Carlos E. R. (from openSUSE 15.1 x86_64 at Telcontar)
On 09/01/2020 08:56, Carlos E. R. wrote:
If you are using the default setting of '60' that means the kernel will swap when RAM reaches 40% capacity. I consider that ridiculous. As it is, you are only using 50% of your RAM (total=8.0G, used=4.0G) Try setting it to 10
sudo sysctl vm.swappiness=10
flush your swap and continue
I'm using defaults, and I'm not changing them. It would make my system slower.
HUMBUG! I'm using rotating rust so any swap activity AT ALL would slow my system. With swappiness=10 I get no swap to rotating rust to slow my system. If YOU set swappiness=10 you'll simply get less, perhaps no, swap to your SSD. It will affect nothing else. It has not altered the tuning, the rate of recirculation or other characteristics of your virtual memory system. That is a separate matter. All it has done is change the THRESHOLD at which swapping starts. OBVIOUSLY, to me, with slow rotating rust, this matters greatly. BUT it matters to you as well. Although the write to an SSD is fast it is not instantaneous, and there is still the overhead of doing it. Avoiding that might not seem much, but it is there. Asserting that your system will run slower because it is not writing to swap so much, so readily, so unnecessarily, is not the case. While it is obvious in my case, it is also true in your case. It is a degree, not an absolute.
On 09/01/2020 16.02, Anton Aylward wrote:
| On 09/01/2020 08:56, Carlos E. R. wrote:
|>> If you are using the default setting of '60', that means the
|>> kernel will swap when RAM reaches 40% capacity. I consider
|>> that ridiculous. As it is, you are only using 50% of your RAM
|>> (total=8.0G, used=4.0G). Try setting it to 10:
|>>
|>> sudo sysctl vm.swappiness=10
|>>
|>> then flush your swap and continue.
|> I'm using defaults, and I'm not changing them. It would make my
|> system slower.
|
| HUMBUG!
|
| I'm using rotating rust so any swap activity AT ALL would slow my
| system. With swappiness=10 I get no swap to rotating rust to slow
| my system.
|
| If YOU set swappiness=10 you'll simply get less, perhaps no, swap
| to your SSD. It will affect nothing else.

And that's a bad thing.

| It has not altered the tuning, the rate of recirculation or other
| characteristics of your virtual memory system. That is a separate
| matter. All it has done is change the THRESHOLD at which swapping
| starts.
|
| OBVIOUSLY, to me, with slow rotating rust, this matters greatly. BUT
| it matters to you as well. Although the write to an SSD is fast it
| is not instantaneous, and there is still the overhead of doing
| it. Avoiding that might not seem much, but it is there.
|
| Asserting that your system will run slower because it is not writing
| to swap so much, so readily, so unnecessarily, is not the case.
|
| While it is obvious in my case, it is also true in your case.
| It is a degree, not an absolute.

Nope.

-- Cheers / Saludos, Carlos E. R. (from 15.1 x86_64 at Telcontar)
09.01.2020 16:24, Anton Aylward пишет:
On 09/01/2020 06:25, Carlos E. R. wrote:
Today, this minute:
cer@Telcontar:~> free -h --si
              total        used        free      shared  buff/cache   available
Mem:           8,0G        4,0G        2,8G        150M        1,2G        3,5G
Swap:           24G        7,7G         16G
cer@Telcontar:~>
Without knowing your 'swappiness' that's meaningless. If you are using the default setting of '60' that means the kernel will swap when RAM reaches 40% capacity.
Do not spread completely false information. -- To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org To contact the owner, e-mail: opensuse+owner@opensuse.org
Am Donnerstag, 9. Januar 2020, 05:22:33 CET schrieb Lew Wolfgang:
Hi Folks,
Back in the old daze we used to allocate swap space three times as large as the installed RAM as a rule of thumb. But I've got two new servers with 512-GB of ECC RAM and now I'm wondering, How Much Swap?
I consider swapping servers to be running in an exceptional state. My rule of thumb for *typical* server workloads: reduce swap relative to the amount of physical RAM. Depending on the estimated/desired safety margin, I go for something between 1/2 and 1/8 of RAM, and *disable* it for normal operations. With the large headroom that you have available, I would even split that across both NVMes and place, say, 2 * 64G swaps at the end of the NVMes. So you would need to describe your specific workload briefly in order to explain the large amounts of swap that you seem to have used before, and how you expect your servers to behave if significant parts of it are in continuous use (IOW, the server is swapping). Cheers, Pete
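As a sketch of the "2 * 64G at the end of the NVMes" idea (the partition names are only examples, and the partitions themselves would have to be created first with fdisk or parted): giving both areas the same priority makes the kernel interleave swap I/O across the two devices.

    # Initialise both partitions as swap space (example device names):
    sudo mkswap /dev/nvme0n1p3
    sudo mkswap /dev/nvme1n1p3
    # Activate them with equal priority so swap writes are interleaved:
    sudo swapon -p 10 /dev/nvme0n1p3
    sudo swapon -p 10 /dev/nvme1n1p3
    # Persistent equivalent in /etc/fstab:
    #   /dev/nvme0n1p3  none  swap  defaults,pri=10  0 0
    #   /dev/nvme1n1p3  none  swap  defaults,pri=10  0 0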
Lew Wolfgang schrieb am 09.01.20 um 05:22:
Hi Folks,
Back in the old daze we used to allocate swap space three times as large as the installed RAM as a rule of thumb. But I've got two new servers with 512-GB of ECC RAM and now I'm wondering, How Much Swap?
The motherboard has two-each 1-TB NVMe M.2 PCIe modules, it's tempting to use one for the operating system and the second for swap. Data will be stored on hardware RAID6 arrays and so aren't a part of this calculation.
Any thoughts? 1-TB of swap on one M.2 for .5-TB of RAM?
Regards, Lew
Well, this depends what you want to run on it :) I have 3 servers with 768 GB and one with 1.5 TB RAM. They have a swap space of 20 GB. The boxes are running SAP HANA databases. When they start to use swap, the system seems to be at a standstill, so for my purpose I hardly ever use swap. SAP recommends a maximum of 20 GB just for the same reason: a host should not start swapping, since it becomes slow when it does. When I buy a SAP HANA Appliance ready-to-use, they use the default swap value on SLES, which is 2 GB. Werner -- -- To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org To contact the owner, e-mail: opensuse+owner@opensuse.org
On 09/01/2020 10:08, Werner Flamme wrote:
a host should not start swapping, since it becomes slow when it does.
That is the ultimate issue. As I see it, there were two and ONLY two reasons why swap might happen. And on a VM system, 'swap' means paging to backing store.

1. Actual physical memory exhaustion. In the days when VM was a solution to memory being short because memory was expensive, the ditty was "virtual memory means virtual performance". Not enough physical memory meant the VM system was constantly working: page in, page out. Round and round it went. That was then, this is now. Memory is cheap; backing store is fast. (My limit is that the mobo won't take any more memory!)

2. The virtual memory system is badly configured. That means it is doing paging/swapping when it doesn't need to, when there is adequate memory for the demands being placed upon it. There are a LOT of things you can adjust about the VM system, the page queue, how pages age and circulate, but somewhere along the line pages might need to be written to backing store. How aggressively the ageing happens, how readily the swapping of processes happens, is key here. At one level, this is going to be determined by the process churn. There is some interaction with the IO churn as IO buffers will need to be created, but they are queued separately, cached, and there should not be a great deal of 'crossover'.

Of course this gets heartily confused with Linux for various reasons. One is 'binary reuse'. UNIX has always had the ability to share binaries, users using the same programs with different data space. With late model UNIX and with Linux this extends to the shared libraries. What makes this more complex is that those pieces of code are actually memory mapped to the disk locations. Every time a location on a 'page' is accessed for execution, that page is pulled back in the ageing queue. Similarly with data. It is the unused stuff that gets aged and becomes a candidate for page-out. Sometimes the page-out happens because pages are needed for input. That might be code or newly created data. Or data retrieved from swap. Which gets back to the old model of inadequate memory and thrashing.

Thus 'badly configured' is a very variable concept. While 'one size fits all' is never going to be absolutely true, there's a lot of 'one setting will suffice', be tolerable, for many situations. BUT realistically, the working profile and the specific demands need to be examined and possibly some experimentation needs to be done. There's a lot you CAN tune!

# ls /proc/sys/vm
admin_reserve_kbytes lowmem_reserve_ratio oom_kill_allocating_task block_dump max_map_count overcommit_kbytes compact_memory memory_failure_early_kill overcommit_memory compact_unevictable_allowed memory_failure_recovery overcommit_ratio dirty_background_bytes min_free_kbytes page-cluster dirty_background_ratio min_slab_ratio panic_on_oom dirty_bytes min_unmapped_ratio percpu_pagelist_fraction dirty_expire_centisecs mmap_min_addr stat_interval dirty_ratio mmap_rnd_bits stat_refresh dirtytime_expire_seconds mmap_rnd_compat_bits swappiness dirty_writeback_centisecs nr_hugepages unprivileged_userfaultfd drop_caches nr_hugepages_mempolicy user_reserve_kbytes extfrag_threshold nr_overcommit_hugepages vfs_cache_pressure hugetlb_shm_group numa_stat watermark_boost_factor laptop_mode numa_zonelist_order watermark_scale_factor legacy_va_layout oom_dump_tasks zone_reclaim_mode

Some guidelines for the informed at https://www.kernel.org/doc/Documentation/sysctl/vm.txt
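A few of those knobs can be read, and later changed, with sysctl. A minimal sketch (the particular parameters and the drop-in file name are only examples):

    # Current values of some of the parameters listed above:
    sysctl vm.swappiness vm.dirty_ratio vm.dirty_background_ratio vm.vfs_cache_pressure
    # A change made with 'sysctl -w' lasts until reboot; to keep it, put it in
    # a drop-in file (example name) and reload with 'sysctl --system':
    echo 'vm.vfs_cache_pressure = 50' | sudo tee /etc/sysctl.d/90-vm-tuning.conf
    sudo sysctl --system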
On 09/01/2020 16.08, Werner Flamme wrote:
| Lew Wolfgang schrieb am 09.01.20 um 05:22:
...
| Well, this depends what you want to run on it :)
|
| I have 3 servers with 768 GB and one with 1.5 TB RAM. They have a
| swap space of 20 GB.
|
| The boxes are running SAP HANA databases. When they start to use
| swap, the system seems to be at a standstill, so for my purpose I
| hardly ever use swap. SAP recommends a maximum of 20 GB just for
| the same reason: a host should not start swapping, since it becomes
| slow when it does.

You are probably using rotating rust. Swapping in Leap behaves noticeably worse than in pre-leap. For some reason, seeking requests increased; thus when I put swap on SSD the improvement was dramatic.

-- Cheers / Saludos, Carlos E. R. (from 15.1 x86_64 at Telcontar)
Carlos E. R. schrieb am 09.01.20 um 21:15:
On 09/01/2020 16.08, Werner Flamme wrote: | Lew Wolfgang schrieb am 09.01.20 um 05:22:
...
| Well, this depends what you want to run on it :) | | I have 3 servers with 768 GB and one with 1.5 TB RAM. They have a | swap space of 20 GB. | | The boxes are running SAP HANA databases. When they start to use | swap, the system seems to be at a standstill, so for my purpose I | hardly ever use swap. SAP recommends a maximum of 20 GB just for | the same reason: a host should not start swapping, since it becomes | slow when it does.
You are probably using rotating rust.
A RAID5 with 5 or 6 HDDs of 1.2-1.8 TB each (varies between hosts). This is the main LVM physical volume, where nearly everything is located, including a swap partition. The other RAID5 consists of 2 to 6 SSDs with 400/800 GB each; this is for /hana/data only.
Swapping in Leap behaves noticeably worse than in pre-leap. For some reason, seeking requests increased; thus when I put swap on SSD the improvement was dramatic.
SAP tells me that in SLES, tuning /proc/sys/vm/swappiness does not bring any advantages. I should use saptune (which uses tuned and adds some profile info) and be fine with it. So I am :) I'm running SAP systems on SLES for 17 years now, and SLES 15 is the most comfortable one for me yet. Only this systemd thing I can't warm up to. SAP HANA is a database that runs completely in memory. Every 15 minutes, a snapshot is written to disk (to /hana/data). If I had any swapping here, the database *would* feel like it is at a standstill. Running 9 databases on one host caused a permanent load of 2-5. The box has 112 logical CPUs... Only when doing the internal database check was there a load of ~100 (for about 5 secs, the check does not take more time). Werner
On 01/08/2020 10:22 PM, Lew Wolfgang wrote:
Hi Folks,
Back in the old daze we used to allocate swap space three times as large as the installed RAM as a rule of thumb. But I've got two new servers with 512-GB of ECC RAM and now I'm wondering, How Much Swap?
The motherboard has two-each 1-TB NVMe M.2 PCIe modules, it's tempting to use one for the operating system and the second for swap. Data will be stored on hardware RAID6 arrays and so aren't a part of this calculation.
Any thoughts? 1-TB of swap on one M.2 for .5-TB of RAM?
Regards, Lew
After RAM passed the 2G mark, I just stayed with a 2G swap no matter how much RAM I have. I'm sure there are use scenarios where more swap makes sense, but on my 8G laptop, I suspend just fine with a 2G swap. If you have a server, then suspend isn't a concern for swap size. I'd just look and see if you ever saturate your RAM, and if not, swap is pretty much superfluous. Like Anton, I set to minimize swap anyway in /etc/sysctl.conf with: vm.swappiness = 10 Good article: https://www.howtogeek.com/449691/what-is-swapiness-on-linux-and-how-to-chang... -- David C. Rankin, J.D.,P.E. -- To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org To contact the owner, e-mail: opensuse+owner@opensuse.org
participants (11)
- Andrei Borzenkov
- Anton Aylward
- Carlos E. R.
- Dave Howorth
- David C. Rankin
- Hans-Peter Jansen
- jdd@dodin.org
- Lew Wolfgang
- Per Jessen
- suse@a-domani.nl
- Werner Flamme