I agree with the blog post's technical contents, but I feel we came across too strongly in the title. For Ubicloud as a managed Postgres provider, we use strict memory overcommit. Our experience with operating Postgres at scale taught us that it's better to enable this than to go with the defaults.
However, I can see many other scenarios where using strict memory overcommit would have unanticipated side effects. That's why Linux doesn't use strict memory overcommit as its default.
I've gone through this exercise in the past on much older kernels, which they cover as well. Personally, I ran into fewer issues by leaving overcommit_memory at 0, dropping overcommit_ratio to 0, setting oom_score_adj [1] as low as -1000 for programs I wanted the OOM killer to leave alone, and of course using the Red Hat formulas for setting vm.min_free_kbytes, vm.admin_reserve_kbytes, and vm.user_reserve_kbytes. And of course, be vigilant in disallowing app owners from using every last bit of memory.
[1] - https://man7.org/linux/man-pages/man5/proc_pid_oom_score_adj...
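For anyone who wants to script the oom_score_adj part, here is a minimal Python sketch, assuming Linux procfs. The helper name `set_oom_score_adj` and its `proc_root` parameter are made up for illustration. The kernel accepts values from -1000 to 1000: -1000 exempts a process from the OOM killer, 1000 makes it the preferred victim, and lowering the value normally requires CAP_SYS_RESOURCE.

```python
from pathlib import Path

# Valid range for oom_score_adj as documented in proc(5).
OOM_ADJ_MIN, OOM_ADJ_MAX = -1000, 1000


def set_oom_score_adj(pid: int, value: int, proc_root: str = "/proc") -> Path:
    """Write `value` to <proc_root>/<pid>/oom_score_adj and return the path.

    `proc_root` is a hypothetical knob so the helper can be pointed at a
    scratch directory instead of the real /proc.
    """
    if not OOM_ADJ_MIN <= value <= OOM_ADJ_MAX:
        raise ValueError(
            f"oom_score_adj must be in [{OOM_ADJ_MIN}, {OOM_ADJ_MAX}]"
        )
    path = Path(proc_root) / str(pid) / "oom_score_adj"
    path.write_text(f"{value}\n")
    return path
```

Something like `set_oom_score_adj(os.getpid(), 500)` should work unprivileged, since a process may always raise its own score; protecting a daemon with -1000 needs root or the capability above.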
For now, we have overcommit_ratio set to a value that is stable from experience, but there really seems to be no silver bullet. Go is very happy to allocate a lot of virtual memory, but so are most managed languages. The best solution would probably be to host the backend and the database on separate servers.
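To get a feel for what a given overcommit_ratio buys you, here is a rough Python sketch of the commit limit arithmetic the kernel documentation describes for strict mode (vm.overcommit_memory=2): RAM minus huge pages, scaled by the ratio, plus swap. The function name and the 64 GiB host are hypothetical, and the formula does not apply if vm.overcommit_kbytes is set instead of the ratio.

```python
GIB = 1024 * 1024  # KiB per GiB


def commit_limit_kib(mem_total_kib: int, swap_total_kib: int,
                     overcommit_ratio: int, hugetlb_kib: int = 0) -> int:
    """CommitLimit in KiB: (RAM - hugetlb) * ratio / 100 + swap."""
    return (mem_total_kib - hugetlb_kib) * overcommit_ratio // 100 + swap_total_kib


# Hypothetical host: 64 GiB RAM, 8 GiB swap, no huge pages.
for ratio in (50, 80, 100):
    limit = commit_limit_kib(64 * GIB, 8 * GIB, ratio)
    print(f"overcommit_ratio={ratio}: CommitLimit = {limit / GIB:.1f} GiB")
```

The point of the exercise: with the default ratio of 50 and little swap, strict mode refuses allocations long before RAM is actually full, which is why the ratio usually has to be tuned per workload.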
GOMEMLIMIT works very well if you set it to around 90% of available memory as a rough heuristic. You should definitely profile your application to fine-tune this number (e.g. if you link with C libraries that hold large memory pools, then Go doesn't account for that) but also to identify sources of spiky/leaky allocations. For example, encoding/json is notorious for its inner sync.Pool hanging on to outsized buffers. There's usually a lot of low-hanging fruit.
In my experience Go can be extremely stable in terms of memory footprint at both small (~O(1MiB)) and large (~O(256GiB)) scales, and it takes only a small amount of effort.
As far as GC languages go, it is by far the easiest to work with.
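The 90% heuristic is easy to wire into a deploy script. A small Python sketch, assuming you already know the memory available to the process (for example a container limit); `gomemlimit_value` is a made-up helper name and the 4 GiB figure is only an example. GOMEMLIMIT accepts a byte count with an optional suffix such as MiB or GiB, and it is a soft limit, so memory held outside the Go heap (cgo, mmap) still needs headroom.

```python
def gomemlimit_value(available_bytes: int, fraction: float = 0.9) -> str:
    """Return a GOMEMLIMIT string in MiB for `fraction` of `available_bytes`."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    mib = int(available_bytes * fraction) // (1024 * 1024)
    return f"{mib}MiB"


# Hypothetical container with a 4 GiB memory limit.
print("GOMEMLIMIT=" + gomemlimit_value(4 * 1024**3))
```

The same limit can be set from inside the program with `debug.SetMemoryLimit` from `runtime/debug`; the environment variable is just the lower-friction option.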
Whether failed transactions are actually so much more desirable than an OOM-killed process isn't quite obvious, but it might be easier to troubleshoot.
Took k8s ages to get Swap support.
We lost something when we accepted that hyperscalers just tell you to use more memory. It was shitty 5 years ago, and it's worse today, especially after the RAM price increases.
And now, with PSI + MGLRU, the situation is much better, but there are still missing features/subsystems which would be nice to have. For example, there's no simple way to lock memory mlockall-style so that a rarely used daemon doesn't face a long cold-cache latency the first time it is accessed after a long idle period.
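For reference, PSI is exposed as plain text under /proc/pressure/, so watching it takes very little code. A minimal Python sketch of parsing the memory file; the `parse_psi` name, the sample numbers, and the 0.25 threshold are made up for illustration.

```python
def parse_psi(text: str) -> dict:
    """Parse /proc/pressure/<resource> text into {"some": {...}, "full": {...}}."""
    result = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        kind, *fields = line.split()
        # Each field looks like "avg10=1.25"; total is in microseconds.
        result[kind] = {k: float(v) for k, v in (f.split("=", 1) for f in fields)}
    return result


# Sample in the format of /proc/pressure/memory (numbers are invented).
SAMPLE = """\
some avg10=1.25 avg60=0.40 avg300=0.10 total=123456
full avg10=0.50 avg60=0.10 avg300=0.02 total=65432
"""

psi = parse_psi(SAMPLE)
# Hypothetical alert threshold: all tasks stalled >0.25% of the last 10 s.
if psi["full"]["avg10"] > 0.25:
    print("memory pressure: full avg10 =", psi["full"]["avg10"])
```

On a real system you would feed it the contents of /proc/pressure/memory instead of the sample string.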
Unfortunately, many programs commit twice as much memory as they actually use. Often I see ~32GB committed and ~16GB resident.
https://unix.stackexchange.com/questions/797835/disabling-ov...
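If you want to check the committed-versus-used gap on your own machine, /proc/meminfo has the numbers. A rough Python sketch: the helper names are hypothetical, the sample values are invented to mirror the 32GB/16GB case, and `MemTotal - MemAvailable` is only an approximation of memory actually in use, not a true resident figure.

```python
def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo text into {field: value}, values in kB where given."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            values[key.strip()] = int(parts[0])
    return values


def commit_report(meminfo: dict) -> dict:
    """Compare Committed_AS with an estimate of memory actually in use."""
    committed = meminfo["Committed_AS"]
    in_use = meminfo["MemTotal"] - meminfo["MemAvailable"]  # rough proxy
    return {
        "committed_kib": committed,
        "in_use_kib": in_use,
        "ratio": committed / in_use,
    }


# Invented sample: 32 GiB committed against roughly 16 GiB in use.
SAMPLE = """\
MemTotal:       33554432 kB
MemAvailable:   16777216 kB
CommitLimit:    25165824 kB
Committed_AS:   33554432 kB
"""

print(commit_report(parse_meminfo(SAMPLE)))
```

In this sample Committed_AS also exceeds CommitLimit, which is exactly the state strict overcommit would refuse to enter.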
I run Firefox, VSCodium with LSP, Discord, Signal and there's still space left for a game like CS2. I'm not a heavy user by any means.
> I'm not sure they would do much better than crash
I have yet to see a program that silently handles allocation failures and doesn't crash. These days everything is coded to crash if no memory :(
> About once a year a real runaway process (usually a throwaway program I'm working on) gets OOM-killed
In my case it killed system-critical processes with no way to recover. With overcommit disabled, it freezes for a while (usually for a minute or two), I close some random program of my choosing and then see in Resource Monitor what's eating my RAM.
Postgres handles allocation failures.
I don't think it has an option for that.
The Linux kernel OOM killer kills random things. Userspace OOM killers are meant to improve this, and they work well in a server situation when you already know in advance what is likely to go haywire and what is safe to kill. But they don't work well on desktop (some of them are improving but it doesn't seem to be a priority).
The Windows OOM killer by comparison usually kills something sensible (i.e. the program that is actually using all the memory), and asks the user for permission before killing it (when possible). You do see a lot of memes of situations where it fails.
By default, the Linux kernel kills the largest process in the system (unless OOM adjust was applied).
A memory allocator can implement overcommit, because you can separate reserving virtual memory and having it backed by physical memory into two different system calls. But from the point of view of the kernel, any time it promises to give you physical memory, that memory is backed either by RAM or by space reserved in the swap file.
The purpose of the system commit limit and commit charge is to track all uses of these resources to ensure they are never overcommitted — that is, that there is never more virtual address space defined than there is space to store its contents, either in RAM or in backing store (on disk).
- Windows Internals, 7th Edition
If no memory is available where a page file would make a difference, this leads to application crashes instead. A crash is (usually) worse than paging.
Certain applications, Photoshop being the historical example, will outright fail to run with no page file present.
Same happens if the page file is full. In that case, why don't those programs use disk directly instead?
No such problem would've ever occurred if programs hadn't allocated more than they actually use.
Typically, performance drops enough that the user kills the program or reboots before the page file expands to fill the disk. And other threads here suggest there is something that will prompt users to kill programs in states like this.
> No such problem would've ever occurred if programs hadn't allocated more than they actually use.
That's part of the issue, but sometimes things do in fact use too much memory as well as allocate too much.
Another part of the issue is that few programs are built to handle allocation failures.
And then you have a metrics issue. There's not really a good metric to know when you're out of memory, other than performance collapse. If your applications don't use disk, it's not too hard; but when they do use disk, performance will collapse once there's insufficient memory to provide the disk caching needed. In my experience, adding a small swap and monitoring swap I/O can be pretty helpful, and a small swap doesn't tend to allow long thrashing when memory use grows. But that's not universal and everybody loves to hate swap these days.
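Monitoring swap I/O mostly means sampling the cumulative `pswpin` and `pswpout` page counters in /proc/vmstat and looking at the rate. A minimal Python sketch, with made-up helper names and invented sample counters; the alerting threshold is left to you since it depends heavily on the workload.

```python
def parse_vmstat(text: str) -> dict:
    """Parse /proc/vmstat text ("name value" per line) into a dict of ints."""
    return {
        name: int(value)
        for name, value in (line.split() for line in text.splitlines() if line.strip())
    }


def swap_io_rate(before: dict, after: dict, interval_s: float) -> dict:
    """Pages swapped in/out per second between two vmstat samples."""
    return {
        key: (after[key] - before[key]) / interval_s
        for key in ("pswpin", "pswpout")
    }


# Two invented samples taken 10 seconds apart.
T0 = "pswpin 1000\npswpout 5000\n"
T1 = "pswpin 1600\npswpout 5300\n"

print(swap_io_rate(parse_vmstat(T0), parse_vmstat(T1), interval_s=10))
```

Sustained swap-in is usually the more worrying of the two numbers: occasional page-outs of cold memory are fine, but constant page-ins mean the working set no longer fits.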
Not in the age of NVMe it doesn't. Swap is fast now. Plus, at least on Linux, you can put zswap in front of the regular swap and introduce an even faster level of memory hierarchy and thereby make page-outs even more profitable.
An application that grows in such a way (besides having backing stores for memory-mapped files, as well) will often perform so poorly that it requires addressing (adding RAM, looking for application faults, etc).
A page file is insurance, one that can last you much longer than available system memory.
You don't need it if you have everything allocated upfront. TigerBeetle does this; everybody else can too.
Using something like Rust is already a huge win when compared to shipping a browser or running Node.js.
> Your argument falls flat when a page file can be multi-GB and automatically grow
This doesn't solve the original issue and only masks the underlying problem.