Nowadays I don't usually configure any swap on my servers. By the time a web application starts using swap it is already failing - the swap isn't really providing any contingency or safety net. At least that's what I thought up until today.
I was looking at an ELK cluster built by one of my colleagues, where I saw this:
$ free -m
              total        used        free      shared  buff/cache   available
Mem:          63922       50151        5180           1        8590       12907
Swap:         16383       10933        5450
Hmmm. Lots of swap used, but lots of free memory. And it was staying like this.
Checking with vmstat showed that, although there was a lot of data sitting in swap, nothing was actually moving in or out of it.
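For anyone wanting to reproduce the check: the columns to watch are si (pages swapped in per second) and so (pages swapped out per second). In this case both stayed at zero while swpd stayed high.

```shell
# Sample memory and swap activity every 5 seconds, 3 samples.
# si = KB/s swapped in from disk, so = KB/s swapped out to disk.
# A large swpd with zero si/so means the swapped pages are just sitting there.
vmstat 5 3
```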
After checking the value of VmSwap in /proc/*/status, it was clear that the footprint in swap was made up entirely of gunicorn processes. Gunicorn, in case you hadn't heard of it, is a Python application server. It is a pre-fork server, and the number of instances it runs is fixed, defined when the server is started. I've not seen a server like that in 20 years :).
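The per-process check can be rolled up into a one-liner. This is a sketch of the kind of thing I ran - it sums the VmSwap figure from each process's status file, grouped by command name (kernel threads simply have no VmSwap line and so drop out):

```shell
# Sum swap usage per command name across all processes.
# Reading other users' status files may need root.
for f in /proc/[0-9]*/status; do
  awk '/^Name:/ {name=$2} /^VmSwap:/ {print name, $2}' "$f" 2>/dev/null
done | awk '{sum[$1]+=$2} END {for (c in sum) printf "%d kB %s\n", sum[c], c}' | sort -rn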
- On an event-based server such as nginx or lighttpd, a new client connection just requires the server process to allocate RAM to handle the request.
- With the pre-fork servers I am familiar with, the server will adjust the number of processes to cope with the level of demand within a defined range. Some, like Apache httpd and php-fpm, implement hysteresis - they spin up new instances faster than they reap idle ones - to better cope with spikes in demand.
- Thread-based servers are (in my experience) a halfway house between the event-based and (variable) pre-fork servers.
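Apache httpd's prefork MPM expresses that hysteresis directly in its configuration: it keeps a pool of spare idle children between a minimum and a maximum, forking new ones when the pool runs low and reaping only the surplus. The directive names below are the real ones; the numbers are just illustrative:

```apacheconf
# prefork MPM: keep between 5 and 20 idle children in reserve.
StartServers          5
MinSpareServers       5
MaxSpareServers      20
MaxRequestWorkers   150
```

Gunicorn has no equivalent pair of knobs - you pick a single worker count (e.g. `gunicorn --workers 4 myapp:app`) and that is what runs, busy or idle.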
While the kernel is doing its job of ensuring that these idle processes are not consuming resources which could be better used elsewhere, it is perhaps a little over-zealous here. It will be more expensive to recover these processes from swap than it would be to fork a fresh instance. But changing to a variable number of processes is not really an option here. If I start seeing performance issues when this application comes under load I'll need to look at keeping these processes out of swap - which unfortunately comes at the cost of reducing available memory for the overnight batch processing handled on the cluster.
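If it comes to that, there are two obvious levers, sketched below. The service name gunicorn.service is an assumption about how the box is set up; MemorySwapMax requires systemd with cgroup v2:

```shell
# 1. System-wide: reduce the kernel's eagerness to swap out idle
#    anonymous pages (default is typically 60).
sudo sysctl vm.swappiness=10

# 2. Per-service: forbid swap entirely for the gunicorn cgroup,
#    leaving the rest of the system free to swap as it sees fit.
sudo systemctl set-property gunicorn.service MemorySwapMax=0
```

The second option is the more surgical one, but it is exactly the trade-off described above: those worker pages are then pinned in RAM whether the workers are busy or not.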