Thursday 28 March 2024

Re-thinking OOM Killer

 

Swap is a performance killer for web applications. Once a component in your stack goes into swap, it takes a very long time to recover. If you don't have the capability to move the traffic elsewhere quickly, restarting the service is often the best approach. Given the price of RAM, there's little reason not to add more as a long-term fix. Indeed, in recent years I have stopped provisioning swap on hosts at all. It felt wrong at first, much as it did the first time I built a computer without a floppy drive. While I've never had a reason to regret abandoning floppy disks, I am now questioning whether swap might matter after all.

Enter OOM Killer.

Linux has a facility called memory overcommit [1]. It exists because executables routinely ask for more memory than they will actually use, and because accurately tracking memory usage on a modern operating system is complicated. I've talked about the latter before [2]. Put simply, the OS pretends it has more memory (RAM plus swap) than is actually available. It is enabled by default on most (all?) Linux distributions. Most of the time it really does have a positive impact, but in extremis it can cause a lot of headaches. When the kernel realises that it can't honour all the promises it made to processes about memory, OOM Killer starts terminating them. On a server host, that usually means terminating the one job the host is expected to do. It does not ask nicely.
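To see overcommit in action on a running host, you can compare the kernel's `Committed_AS` figure in `/proc/meminfo` (memory promised to processes, used or not) with `MemTotal + SwapTotal`. The sketch below, assuming Python 3.9+ on Linux, also decodes `/proc/sys/vm/overcommit_memory`: 0 is the heuristic default, 1 always overcommits, 2 is strict accounting [1]. The function names are my own, for illustration only.

```python
from pathlib import Path

# Meanings of vm.overcommit_memory, per the kernel's overcommit-accounting docs.
OVERCOMMIT_MODES = {
    0: "heuristic overcommit (default)",
    1: "always overcommit",
    2: "strict accounting, no overcommit beyond CommitLimit",
}


def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style text into {field: value in kB}."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            values[key.strip()] = int(parts[0])
    return values


def overcommit_factor(meminfo: dict[str, int]) -> float:
    """Committed_AS as a fraction of RAM + swap; above 1.0 means more is promised than exists."""
    return meminfo["Committed_AS"] / (meminfo["MemTotal"] + meminfo["SwapTotal"])


# Only meaningful on Linux, where /proc exists.
if __name__ == "__main__" and Path("/proc/meminfo").exists():
    mode = int(Path("/proc/sys/vm/overcommit_memory").read_text())
    info = parse_meminfo(Path("/proc/meminfo").read_text())
    print(f"mode {mode}: {OVERCOMMIT_MODES.get(mode, 'unknown')}")
    print(f"committed / (RAM + swap) = {overcommit_factor(info):.2f}")
```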

If you go researching on the internet, you will find a LOT of articles recommending that you start playing with oom_score_adj, e.g. this article on Baeldung.com [3]. OMG NO! This only influences which process the OOM Killer will target first. It does not prevent the situation from arising. There might be an argument for preserving admin access during an OOM event, but if that depends on, for example, sshd forking, a new session starting and a shell spinning up, then oom_score_adj is not going to help: each of those steps has to allocate new memory. The mechanism by which the OOM Killer chooses a victim is complex, so even if this were a valid approach, the right values could only be found by trial and error.
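For contrast, this is roughly all that oom_score_adj gives you: a per-process bias (from -1000, which exempts the process, to 1000) on the `oom_score` the kernel reports for each process, with the OOM Killer picking the highest-scoring eligible victim. This sketch, assuming Python 3.9+ on Linux and a hypothetical helper name, lists the processes currently ranked highest. Nothing in it changes how much memory is committed; it only shows the order of the queue.

```python
from pathlib import Path


def top_oom_candidates(limit: int = 5) -> list[tuple[int, int, int, str]]:
    """Return up to `limit` (oom_score, oom_score_adj, pid, comm) tuples, highest score first."""
    rows = []
    for proc in Path("/proc").iterdir():
        if not proc.name.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            score = int((proc / "oom_score").read_text())
            adj = int((proc / "oom_score_adj").read_text())
            comm = (proc / "comm").read_text().strip()
        except (OSError, ValueError):
            continue  # process exited or is unreadable; ignore it
        rows.append((score, adj, int(proc.name), comm))
    rows.sort(reverse=True)
    return rows[:limit]


# Only meaningful on Linux, where /proc exists.
if __name__ == "__main__" and Path("/proc").is_dir():
    for score, adj, pid, comm in top_oom_candidates():
        print(f"{pid:>7} {comm:<16} oom_score={score:<5} oom_score_adj={adj}")
```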


The approach I had been using up to now was two-fold:

  1. Attempt to tune the application to stay within designated boundaries.

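  2. Set memory overcommit to a fixed limit (vm.overcommit_memory=2; I use vm.overcommit_ratio, but the limit can also be set in kilobytes with vm.overcommit_kbytes) and try to tune that ratio to the correct value. I have found a value of 20 a good starting point on a host which has exhibited OOM Killer behaviour. In this mode the ratio is a percentage of RAM only: with a value of 20, the kernel caps total committed memory at SWAP + 20% of RAM [1].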


(but simply adding more RAM should always be considered!)
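It helps to put numbers on the ratio. The kernel's overcommit documentation [1] and the proc(5) description of `CommitLimit` give the strict-mode (mode 2) limit as CommitLimit = (RAM - huge pages) × overcommit_ratio / 100 + swap, or `vm.overcommit_kbytes` + swap when that setting is non-zero. Below is a minimal sketch of that arithmetic, assuming sizes in kB; the function name is hypothetical.

```python
def commit_limit_kb(
    mem_total_kb: int,
    swap_total_kb: int,
    overcommit_ratio: int,
    hugetlb_kb: int = 0,
    overcommit_kbytes: int = 0,
) -> int:
    """Strict-mode CommitLimit in kB, following the kernel's documented formula."""
    if overcommit_kbytes:
        # vm.overcommit_kbytes, when non-zero, replaces the ratio-based share of RAM.
        allowed = overcommit_kbytes
    else:
        # The ratio applies to RAM only (minus any memory reserved for huge pages).
        allowed = (mem_total_kb - hugetlb_kb) * overcommit_ratio // 100
    # Swap is always added in full.
    return allowed + swap_total_kb


# A 16 GiB host with no swap and overcommit_ratio=20:
print(commit_limit_kb(16 * 1024 * 1024, 0, 20))  # prints 3355443 (kB, about 3.2 GiB)
```

Because swap is added in full while RAM is scaled by the ratio, provisioning swap raises the ceiling directly, independent of the ratio.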


The problem with this is that it is still hit and miss: either the ratio is too low or it's too high. And when it's too high, you only find out when OOM Killer does its thing.


Finally, I get to the point...

Swap is bad. But the system can tolerate a little swap usage before performance takes a nosedive. Further, allowing the system to start swapping (just a little) means we can actually see what the peak memory usage was! That gives us a basis for predicting how much overcommit we should actually allow.
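The signals worth watching are all in `/proc/meminfo`: swap in use (`SwapTotal` - `SwapFree`) growing above its usual baseline suggests the working set has outgrown RAM, and `CommitLimit` - `Committed_AS` shows how much more the kernel will promise before allocations start failing under strict accounting. A sketch, assuming Python 3.9+ on Linux; the function name is hypothetical.

```python
from pathlib import Path


def memory_pressure(meminfo_text: str) -> dict[str, int]:
    """Summarise swap use and commit headroom (kB) from /proc/meminfo text."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[key.strip()] = int(parts[0])
    return {
        # Growth above the usual baseline suggests RAM ran short at some point.
        "swap_used_kb": fields["SwapTotal"] - fields["SwapFree"],
        # Room left before mode 2 refuses new commitments; may be negative in
        # modes 0 and 1, where CommitLimit is reported but not enforced.
        "commit_headroom_kb": fields["CommitLimit"] - fields["Committed_AS"],
    }


# Only meaningful on Linux, where /proc exists.
if __name__ == "__main__" and Path("/proc/meminfo").exists():
    print(memory_pressure(Path("/proc/meminfo").read_text()))
```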

While the gap is narrowing, SSD storage is still 5-10 times cheaper than RAM. Further, in most corporate environments, adding or removing storage is a far smaller job than changing RAM.

So my revised strategy for responding to OOM Killer is:

  1. Ensure monitoring is set to alert when swap usage rises above its baseline level

  2. Provision swap – around 50% of RAM size

  3. sysctl vm.overcommit_memory=2

  4. sysctl vm.overcommit_ratio=10
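Steps 2 to 4 can be captured as a small plan generator. This is a sketch under my own assumptions: swap sized at half of RAM as above, and the two sysctls written as a drop-in file (the name `99-overcommit.conf` is hypothetical) so that they survive a reboot, since values set with `sysctl` on the command line do not persist. Creating the swap file and loading the settings (e.g. `sysctl --system`) are left to your usual tooling.

```python
def overcommit_plan(mem_total_kb: int, ratio: int = 10) -> dict[str, object]:
    """Swap size and sysctl.d content for the strategy above (hypothetical helper)."""
    swap_kb = mem_total_kb // 2  # step 2: swap at roughly 50% of RAM
    sysctl_conf = (
        "# /etc/sysctl.d/99-overcommit.conf (hypothetical filename)\n"
        "vm.overcommit_memory = 2\n"  # step 3: strict accounting
        f"vm.overcommit_ratio = {ratio}\n"  # step 4: starting ratio
    )
    # Resulting strict-mode CommitLimit (ignoring huge pages): RAM * ratio / 100 + swap.
    commit_limit_kb = mem_total_kb * ratio // 100 + swap_kb
    return {"swap_kb": swap_kb, "sysctl_conf": sysctl_conf, "commit_limit_kb": commit_limit_kb}


plan = overcommit_plan(16 * 1024 * 1024)
print(plan["sysctl_conf"], end="")
print(plan["swap_kb"], plan["commit_limit_kb"])
```

With these defaults the starting CommitLimit works out at about 60% of RAM (10% of RAM plus swap equal to 50% of RAM), so expect to raise the ratio from there.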

If you're seeing the system start to use swap and you can't slim down the application config, then it's time to buy more RAM. If your system is not filling up its RAM, increase overcommit_ratio. Repeat until you start tickling the swap, then back off.
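The tuning loop described above can be expressed as a single decision step, run after each observation period. This is a hypothetical helper, not a standard tool: the step size, the 20% "RAM not filling" threshold and the swap tolerance are my assumptions, to be adjusted for your workload, and the inputs would come from your monitoring (for example the `/proc/meminfo` fields `MemAvailable`, `MemTotal`, `SwapTotal` and `SwapFree`).

```python
def next_overcommit_ratio(
    current_ratio: int,
    swap_used_kb: int,
    swap_baseline_kb: int,
    mem_available_fraction: float,
    step: int = 5,
    swap_tolerance_kb: int = 65536,
    idle_ram_fraction: float = 0.2,
) -> int:
    """Suggest the next vm.overcommit_ratio (hypothetical tuning heuristic)."""
    if swap_used_kb > swap_baseline_kb + swap_tolerance_kb:
        # Swap is being tickled: back off, but never below zero.
        return max(0, current_ratio - step)
    if mem_available_fraction > idle_ram_fraction:
        # RAM is not filling up: allow the kernel to promise a little more.
        return current_ratio + step
    # Close to the sweet spot: leave the ratio alone.
    return current_ratio


# Ratio 10, swap at baseline, 40% of RAM still available: try 15.
print(next_overcommit_ratio(10, 0, 0, 0.4))  # prints 15
```

If the ratio keeps backing off even at low values, that is the point where slimming the application or buying more RAM comes in.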

Job done.


[1] https://www.kernel.org/doc/html/latest/mm/overcommit-accounting.html

[2] https://lampe2e.blogspot.com/2015/03/accurate-capacity-planning-with-apache.html

[3] https://www.baeldung.com/linux/memory-overcommitment-oom-killer