Monday, 16 March 2015

Accurate capacity planning with Apache - protecting your performance

While most operating systems support some sort of virtual memory, if the system starts paging memory out to disk, performance will take a nose dive. But performance will typically be heavily degraded even before the system runs out of memory, as the applications start stealing memory used for I/O caching. Hence setting an appropriate value for ServerLimit in Apache (or the equivalent for any multi-threaded/multi-process server) is good practice. For the remainder of the document I will be specifically focussing on Linux, but the theory and practice apply to all flavours of Unix and MS Windows too.

Tracking resource usage of the system as a whole is also good practice – but beyond the scope of what I'll be talking about today.

The immediate problem is determining what an appropriate limit is.

For pre-fork Apache 2.x, the number of processes is constrained by the ServerLimit setting.

For most systems the limit will be driven primarily by the amount of memory available. But trying to work out how much memory a process uses is actually surprisingly difficult. The executable code consists of memory-mapped files – these are typically read-only and shared between processes.

Running 'strace /usr/sbin/httpd2-prefork -f /etc/apache2/httpd.conf' causes over 4000 files to be “loaded” on my local Linux machine. Actually few of them are read from disk – they are shared object files already in memory which the kernel then presents at an address accessible to the httpd process. Code is typically loaded into such shared, read-only pages. Linux has a further way of conserving memory (copy-on-write): when it needs to copy memory which might be written to, the copy is deferred until a process attempts to write to the memory.
The net result is that the actual footprint on the physical memory is much, much less than the size of the address space that the process has access to.
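To see the gap between address space and physical footprint for a single process, one option is to total the counters the kernel exposes in /proc/<pid>/smaps. The following is a minimal sketch rather than the code used for this article; the function names are my own, and it assumes a kernel recent enough to report Pss (the proportional share of shared pages).

```python
import re

# Counters in /proc/<pid>/smaps worth totalling; all are reported in kB.
FIELDS = ("Rss", "Pss", "Shared_Clean", "Shared_Dirty",
          "Private_Clean", "Private_Dirty")
LINE = re.compile(r"^(\w+):\s+(\d+) kB$")


def summarise_smaps(text):
    """Sum the per-mapping counters found in the contents of an smaps file."""
    totals = dict.fromkeys(FIELDS, 0)
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if match and match.group(1) in totals:
            totals[match.group(1)] += int(match.group(2))
    return totals


def smaps_for_pid(pid):
    """Read and summarise /proc/<pid>/smaps (needs permission to read it)."""
    with open(f"/proc/{pid}/smaps") as handle:
        return summarise_smaps(handle.read())
```

Calling smaps_for_pid() with the PID of one httpd child returns a dictionary of totals in kB. The private pages are the part that grows with each extra process, while Rss also counts shared pages that are only held in memory once.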

Different URLs will have different footprints, and even different clients can affect the memory usage. Here is a typical distribution of memory usage per httpd process:

This is further complicated by the fact that our webserver might be doing other things – running PHP, MySQL and a mailserver being obvious cases – which may or may not be linked to the volume of HTTP traffic being processed.

In short, trying to synthetically work out how much memory you will need to support (say) 200 concurrent requests is not practical.

The most effective solution is to start with an optimistic guess for ServerLimit, set MaxSpareServers to around 5% of this value, and then measure how much memory is unused. (After the data capture exercise, you should raise MaxSpareServers to around 10% of ServerLimit + 3.) To take the measurements you'll need to set up a simple script running periodically as a daemon or from cron, capturing the output of the 'free' command and the number of httpd processes.
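A capture script along these lines might look like the sketch below. It is not the script used for this article: instead of parsing the output of 'free' it reads /proc/meminfo directly (which is where 'free' gets its figures), and the function names and the CSV layout are my own assumptions. The process name httpd2-prefork matches the binary mentioned earlier; adjust it for your distribution.

```python
import csv
import os
import time


def parse_meminfo(text):
    """Turn the contents of /proc/meminfo into a dict of numeric values (mostly kB)."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values


def used_less_buffers_and_cache(meminfo):
    """Memory used by applications (kB): total minus free, buffers and cache."""
    return (meminfo["MemTotal"] - meminfo["MemFree"]
            - meminfo["Buffers"] - meminfo["Cached"])


def count_processes(name, proc_root="/proc"):
    """Count processes whose command name in /proc/<pid>/comm equals name.

    The kernel truncates comm to 15 characters, so longer names never match.
    """
    count = 0
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "comm")) as handle:
                if handle.read().strip() == name:
                    count += 1
        except OSError:
            continue  # the process exited while we were looking
    return count


def record_sample(path, process_name="httpd2-prefork"):
    """Append one 'timestamp, process count, used kB' row to a CSV file."""
    with open("/proc/meminfo") as handle:
        meminfo = parse_meminfo(handle.read())
    row = [int(time.time()), count_processes(process_name),
           used_less_buffers_and_cache(meminfo)]
    with open(path, "a", newline="") as handle:
        csv.writer(handle).writerow(row)
    return row
```

Calling record_sample() with an output path of your choosing once a minute from cron builds up the data set. Note that the count includes the parent httpd process as well as its children.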

Here I've plotted the total memory used (less buffers and cache) against the number of httpd processes:

This system has 1GB of memory. Without any Apache instances running, the usage would be less than the projected 290MB – but that is outwith the bounds we expect to be operating in. From 2 httpd processes upwards, the average size and variation in size for each httpd process is very consistent – but since the per-process variation is consistent, the total usage envelope expands as the number of processes increases. The dashed red line is 2 standard deviations above the average usage, and hence (assuming usage is roughly normally distributed) there is approximately a 97.5% probability that memory usage will be below the dashed line.
I want to have around 200MB available for the VFS, so here, my ServerLimit is around 175.
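The arithmetic behind a limit like this can be sketched as follows. This is my reading of the approach, not the code used to produce the plot: fit a straight line to (process count, memory used), estimate the spread of the implied per-process size, and find the largest process count for which the average plus 2 standard deviations still leaves the reserved memory free. The function and parameter names are hypothetical.

```python
import math
import statistics


def fit_line(counts, used_mb):
    """Least-squares fit of used_mb = base + per_process * count."""
    mean_x = statistics.fmean(counts)
    mean_y = statistics.fmean(used_mb)
    sxx = sum((x - mean_x) ** 2 for x in counts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(counts, used_mb))
    per_process = sxy / sxx
    return mean_y - per_process * mean_x, per_process


def estimate_server_limit(counts, used_mb, total_mb, reserve_mb, sigmas=2.0):
    """Return (base, per_process, spread, limit) for the captured samples.

    Assumes the upper envelope is base + n * (per_process + sigmas * spread),
    i.e. that it widens linearly with the number of processes.
    """
    base, per_process = fit_line(counts, used_mb)
    # Implied per-process size for each sample, given the fitted baseline.
    sizes = [(y - base) / x for x, y in zip(counts, used_mb) if x > 0]
    spread = statistics.stdev(sizes)  # needs at least two samples
    worst_case = per_process + sigmas * spread
    limit = math.floor((total_mb - reserve_mb - base) / worst_case)
    return base, per_process, spread, limit
```

Feed it the process counts and the used-memory figures (converted to MB) from your own capture, along with the total memory and the amount you want to keep free for the VFS. Treat the result as a starting point to validate under load rather than a guarantee.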

Of course the story doesn't end there. How do you protect the server and manage the traffic effectively as it approaches the ServerLimit? How do you reduce the memory usage per httpd process to get more capacity? How do you turn around requests faster and therefore reduce concurrency? And how do you know how much memory to set aside for the VFS?

For help with finding the answers, the code run here, and more information on capacity and performance tuning for Linux, Apache and MySQL, see the book!

If you would like to learn more about how Linux memory management works, then this (731 page) document is a very good guide:


  1. Would you be interested in helping us fine-tune a Windows Apache server? We will gladly pay.

    1. This comment has been removed by a blog administrator.

  2. (In case anyone is wondering why I deleted the post above, Scott had helpfully shared his email address for the spambots to harvest. Unfortunately I did not have the time to help)