Scaling your Web App without dying in the attempt — PART 2

Agustin Lucchetti
6 min read · Feb 18, 2019

In the first part we learned how to stress test a web application to find the performance bottlenecks. Now it is time to fix them!

After reading this post, you will know how.

The idea is to squeeze all the performance we can out of our servers. This is key to scaling our infrastructure in a cost-efficient manner. If we need 200 USD in servers for every 100 concurrent users, things will get really expensive really fast.

For this part I personally prefer to run my tests against the simplest expression of the infrastructure required by the application. No autoscaling groups of Web Servers, no database clusters, no redundant load balancers, etc. Just the bare minimum. There are a couple of reasons why:

  • We will be changing and adjusting settings constantly, and we will need to test the result of each one of those changes. As you can imagine, we will have to run A LOT of stress tests, and having a smaller beast to feed will make our work easier.
  • There are a million variables that can affect the performance of a complex setup, especially when things like autoscaling are involved. We need to eliminate those variables and be able to zero in on each part of our application.

Whether you are using NGINX, Apache, Tomcat, uWSGI, Gunicorn, PM2, etc., or any combination of them, these are a few tips that will help you squeeze the most out of your hardware.

Pimp your Web Servers

Making sure that your Web Servers are properly configured for handling high traffic is really important. A bad configuration is probably the most common cause of sites going down when traffic spikes.

All Web Servers have some sort of settings to tune how many connections they can keep open simultaneously, and how many processes/threads they can spawn. Default values tend to be super conservative and you are supposed to change them when setting up your production servers. These are the most relevant settings for the two most popular options, NGINX and APACHE.

  • NGINX: worker_processes, worker_connections. These two settings manage how many processes NGINX will create and how many connections each one of them can handle. Also don’t forget to set worker_rlimit_nofile at least as high as worker_connections to avoid "too many open files" errors.

These are two great sources for NGINX performance tuning that you should always keep at hand: and The last one being a config generator that is a great starting point for most scenarios.
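As a rough starting point, the NGINX settings mentioned above could look like the sketch below. The numbers are illustrative assumptions, not recommendations; the right values depend on your hardware and should be validated with the stress tests.

```nginx
# Main (top-level) context of nginx.conf
worker_processes auto;         # spawn one worker process per CPU core
worker_rlimit_nofile 8192;     # open-file limit per worker (illustrative value)

events {
    # Max simultaneous connections per worker (illustrative value).
    # Connections to a proxied application server count too, so the
    # file limit above is kept higher than this number to leave headroom.
    worker_connections 4096;
}
```

With these values, a 4-core server could in theory hold up to 4 x 4096 connections open at once, although memory and the application server will usually become the limit long before that.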

APACHE: Processes and requests in Apache are handled by modules called MPMs (Multi-Processing Modules). The default one on many installations, mpm_prefork, is really inefficient, so the first thing you should check is that you are using mpm_worker instead (or the newer mpm_event, which I haven’t used yet). Then you should check the settings related to that MPM module: ServerLimit, ThreadsPerChild, ThreadLimit and MaxRequestWorkers are the most important ones.

Make sure that you are not using the older Apache 2.2, it is obsolete! Use the 2.4 branch.

Apache configuration can be a bit trickier than NGINX, and the available documentation is in general less user-friendly. This article goes straight to the point and gives you a good starting configuration to use.

Pimp your Application Servers

Unless your Web application is just a static page, you probably have an Application Server / Process Manager running your backend: PHP-FPM for PHP, uWSGI or Gunicorn for Python, PM2 for NodeJS, Passenger for Ruby, etc. They are a key part of the infrastructure and configure them properly you must, young padawan.

  • Don’t serve static resources with them! I can’t count how many times I’ve seen this mistake. Put an NGINX in front and let it serve the static resources, instead of passing all the requests to the application server. NGINX is faster and better suited for serving static files (css, js, images, fonts). Here is an example of what your configuration should look like:
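server {
    listen 80;

    # Serve the static files folder with NGINX.
    # alias maps /static/foo.css to /home/mysite/static/foo.css
    location /static/ {
        alias /home/mysite/static/;
    }

    # Pass the rest of the requests to the application server
    location / {
        # Placeholder address: point this at your own application server
        proxy_pass http://127.0.0.1:8000;
    }
}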

If your application doesn’t have all the static files grouped in folders, you can use a regular expression to match the files based on their extension.

location ~* \.(jpg|jpeg|gif|png|css|js|ico|svg|woff)$ {
    root /home/mysite/;
}

  • Use all CPU cores: Same as we did with the Web Servers, make sure that the Application Servers are taking advantage of multi-core CPUs by spawning multiple processes/threads. Missing this is particularly common with NodeJS applications, since a NodeJS process runs your JavaScript on a single thread by design. You will need to cluster your app using PM2 cluster mode, or spawn your app once per CPU core (on different ports, of course) and use NGINX as a local load balancer for the different NodeJS processes. If your application is not stateless, you are in for a painful ride...
  • Use sockets if you can. In general there are two ways to connect your Web Server (e.g. NGINX) to your Application Server (e.g. PHP-FPM): over the network stack (HTTP/TCP) or over Unix sockets. If the Application Server is local to the Web Server, use Unix sockets; they are faster and require less memory. Keep an eye on the logs, though: if NGINX gets a 502 error when trying to communicate with the socket during the stress tests, you may need to tune net.core.somaxconn in /etc/sysctl.conf. Otherwise, use HTTP.
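The last two tips can be sketched in NGINX roughly as follows. This is a minimal example under assumptions: the upstream names, the ports and the socket path are all hypothetical, and it assumes the application server speaks HTTP (for PHP-FPM you would use fastcgi_pass instead of proxy_pass).

```nginx
# Both upstream blocks go inside the http { } context.

# NodeJS: one process per CPU core, each on its own port (hypothetical ports).
# NGINX distributes requests between them round-robin by default.
upstream node_app {
    server 127.0.0.1:3000;
    server 127.0.0.1:3001;
    server 127.0.0.1:3002;
    server 127.0.0.1:3003;
}

# Application server on the same machine, reached over a Unix socket
# instead of a TCP port (hypothetical socket path).
upstream local_app {
    server unix:/run/mysite/app.sock;
}

server {
    listen 80;

    location / {
        proxy_pass http://node_app;    # or http://local_app for the socket variant
        proxy_set_header Host $host;   # keep the original Host header
    }
}
```

Plain round-robin only works well if any process can answer any request, which is why a stateful application makes this setup painful.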

Reduce the amount of requests per page

Each time a user loads your website, a lot of requests are made. The initial request will pull the HTML document, and from there all the JS/CSS/Images/Fonts will be requested from your servers.

This matters because most browsers open up to 6 concurrent connections per domain, so the number of simultaneous requests hitting your servers multiplies quickly as you get more users.

  • Concat and Minify: While having your JavaScript and CSS code organized in different files and folders is great, you don’t need to serve it like that to the browsers. Use tools like Grunt, Gulp or Webpack to squash all those files and reduce their size during the build/deploy of your application.
  • Use cache headers: Browsers don’t need to download your CSS/JS files or Images on every page view. Use the cache-control header to tell them to pull the files from their local cache. Check this article if you want to learn more about HTTP cache headers:
  • Use a CDN: If you can, use a Content Delivery Network (CDN). If you are not familiar with the term, it is basically a distributed network that can serve your static files faster and ease the load on your servers. AWS CloudFront is a great cost-effective CDN service.
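For the cache headers tip, a minimal NGINX sketch could look like this. The 30-day lifetime is an illustrative assumption; it is only safe if your file names change when their content changes (for example, hashed file names produced by your build tool), otherwise users may keep stale files.

```nginx
# Goes inside a server { } block
location ~* \.(jpg|jpeg|gif|png|css|js|ico|svg|woff)$ {
    root /home/mysite/;

    # Adds the Expires and Cache-Control: max-age headers to the response,
    # so browsers reuse their local copy for 30 days (illustrative value).
    expires 30d;
}
```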

Let someone else take care of SSL

HTTPS is a must-have nowadays, and with things like Let’s Encrypt there is no excuse for not having HTTPS on your site. But there is a downside: SSL encryption is a CPU-intensive process, and if you have to handle it for thousands of concurrent connections, the CPUs on your servers are going to take a hit. Luckily, there is an alternative.

Offload SSL to the load balancers. Cloud providers like AWS or Google Cloud have their own managed Load Balancer services (AWS Elastic Load Balancers, Google Cloud Load Balancing). You just point your DNS to them and you are done: they will handle whatever amount of connections you throw at them, and you won’t have to worry about scaling or availability issues. But there is more, they can also handle SSL! Whether you choose to use their free managed SSL certificates or upload your own, they will handle the SSL for you, at no extra cost.
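Once the load balancer terminates SSL, the Web Server behind it only needs to speak plain HTTP. A minimal sketch of that setup is shown below, assuming the load balancer sets the X-Forwarded-Proto header with the protocol the user originally used; the application server address is a placeholder.

```nginx
server {
    # Plain HTTP only: HTTPS is terminated at the load balancer in front
    listen 80;

    # If the user reached the load balancer over plain HTTP, send them to HTTPS
    if ($http_x_forwarded_proto = "http") {
        return 301 https://$host$request_uri;
    }

    location / {
        proxy_pass http://127.0.0.1:8000;   # placeholder application server address
        proxy_set_header Host $host;
        # Let the application know whether the original request was HTTPS
        proxy_set_header X-Forwarded-Proto $http_x_forwarded_proto;
    }
}
```

No certificates or ssl_* directives appear here, which is the point: the CPU cost of the encryption stays on the load balancer.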

This is all for part 2. In the next, and final, part we will take a look at the data layer of our applications: how to make the best of our database servers, optimize queries, add cache layers and even cache our entire website in memory to reduce load on our databases!