Scaling your Web App without dying in the attempt — PART 1

Agustin Lucchetti
Feb 11, 2019

After months of hard work, the project finally got its first production release and the client is starting to promote it. Everything is going well until you get a phone call on Sunday at 4 p.m.: the site is down, too much traffic.

This meme never gets old

Sound familiar? Maybe it was Monday instead of Sunday, maybe it was an email and not a phone call, but these situations are a common occurrence in today’s world of agile development and fast-growing startups.

All the hard work can be worthless if the site goes down the instant the Google Analytics live user count goes above two digits. Scalability doesn’t happen by accident: if the team didn’t take the time to optimize and test the application for high traffic, it is not ready.

“Scalability doesn’t happen by accident”

— Some random dude with bad English on Medium

Back to the basics: understanding traffic and performance bottlenecks

Before diving 150 pages deep in the documentation of NGINX trying to find that magic setting that will reduce your CPU usage by 40% or spending a fortune by resizing all your AWS instances to x49.7xLarge with 256 CPU cores and 4 petabytes of RAM, you need to identify what is killing your app.

Imagine that traffic is water and your application is a series of pipes connected one after another. As soon as one pipe overflows, all the pipes before it overflow as well: your application is only as strong as its weakest link. How do we find the weakest link? With a water pump and a hose!

Preparing our test bench

We need to generate load. The objective is to make our application slow and unresponsive, to kill it. You have probably heard of JMeter, k6 or other fancy tools, but for this first stage we don’t need them. In my experience, you can crash most websites that are not properly optimized with a single command.

DISCLAIMER: if you are going to try this against your production servers, make sure to schedule some downtime in advance. And please, don’t run stress tests on someone else’s website. Don’t be that guy.

Our hose will be a little tool called Hey. With this little program we can flood our application with requests using just one command. For those who are familiar with Apache Bench, this tool is pretty much the same but better (multi-core CPU and HTTP/2 support, it doesn’t act funky with SSL and high concurrency, etc.).

Now we need a water pump! Go to your AWS/Google Cloud/Azure account and spin up a new VM. Here are my tips for picking the right one:

  • Load generation requires a lot of CPU and very little RAM. Pick a VM class that is optimized for CPU-intensive tasks, like AWS C5 instances.
  • Don’t use burstable instances like the AWS T2/T3 instances. We need consistent results between runs to verify whether our changes are having a positive effect.
  • Pick a VM with at least 4 CPU cores. They are a bit pricey, but you will only need them for a couple of minutes. AWS per-second billing is your best friend. If you use another cloud provider, make sure you don’t get billed a full hour for a 5-minute test: it can get expensive if you are running large tests with multiple VMs.

Before running the test, we need to tune a few settings in our VM to make sure it can handle a lot of connections, otherwise we will start getting errors like too many open files during our tests.

Edit /etc/security/limits.conf and add these two lines:

* soft nofile 65535
* hard nofile 65535

Close your current shell and log in to the VM again. If you run ulimit -n (the open files limit, which is what nofile controls) you should see 65535. We are good to go.
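As a quick sanity check, the open-files limit of the current shell can be printed with the shell’s built-in ulimit. This is a minimal sketch: the variable names are illustrative, and 65535 is simply the value used in limits.conf in this article.

```shell
# Read the soft and hard open-file (nofile) limits of the current shell.
soft_limit=$(ulimit -Sn)
hard_limit=$(ulimit -Hn)
echo "soft nofile: $soft_limit"
echo "hard nofile: $hard_limit"

# Value configured in /etc/security/limits.conf in this article.
wanted=65535

# Warn if the new limit has not been picked up yet.
if [ "$soft_limit" != "unlimited" ] && [ "$soft_limit" -lt "$wanted" ]; then
  echo "soft limit is below $wanted: log out and back in, then check again"
fi
```

If the soft limit is still low after logging in again, the load generator will likely hit "too many open files" at high concurrency.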

Let’s get the water to flow: Stress testing!

Choosing the right URLs for the test is really important. For example, hitting the static homepage is not the same as hitting a logged-in user’s wall, which has to load a lot of data from the database. Also, keep in mind that requests made with Hey are like curl requests: if you hit an HTML page, it won’t fetch the static resources (CSS, JS, images, etc.) and it won’t execute XHR requests.

If you want to hit pages or API endpoints that are only accessible to logged-in users, you can log in as that user in your browser, copy the session cookies from the browser’s developer tools and send them with Hey using the -H option.
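A rough sketch of what sending a session cookie could look like. The cookie name and value, the URL and the function name are all hypothetical placeholders: use whatever your own application sets, and a site you own. The run itself is left commented out so nothing is sent by accident.

```shell
# Hypothetical values: paste the cookie copied from your browser and
# point the URL at an environment you are allowed to stress.
SESSION_COOKIE="sessionid=REPLACE_ME"
TARGET_URL="https://staging.example.com/dashboard"

# hey takes custom headers with -H; cookies travel in the Cookie header.
COOKIE_HEADER="Cookie: $SESSION_COOKIE"

# Small run first, to confirm the cookie is accepted before the real test.
run_authenticated_test() {
  ./hey -n 200 -c 10 -H "$COOKIE_HEADER" "$TARGET_URL"
}

echo "Header that will be sent: $COOKIE_HEADER"
# Uncomment once the placeholders above are filled in:
# run_authenticated_test
```

If the status code distribution shows redirects or 401/403 responses instead of 200s, the session has probably expired or the cookie was copied incompletely.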

Let’s run a quick test:

./hey -n 10000 -c 100

  • -c is the concurrency level: it defines how many requests run at the same time. The more concurrent requests, the more aggressive the test.
  • -n is the total number of requests to run. It determines the duration of the test: the more requests, the longer the test lasts.

Keep in mind that increasing the concurrency level makes the test finish faster, so you will need to adjust the number of requests to make sure the test lasts long enough to actually stress your servers (5 minutes is usually more than enough for this kind of test).
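A back-of-the-envelope way to pick -n: multiply the throughput you expect by the duration you want. The sketch below assumes a short trial run reported roughly 3,000 requests per second; that number is only an illustration and will differ for your site.

```shell
# Assumed throughput from a short trial run (requests per second).
baseline_rps=3000

# Target duration of the stress test: 5 minutes.
target_seconds=300

# Total requests needed to keep the test running that long.
total_requests=$((baseline_rps * target_seconds))

echo "use: -n $total_requests"
```

Throughput usually drops once the servers start to saturate, so the real test tends to run longer than this estimate. Recent builds of hey also appear to accept a -z duration flag (for example -z 5m); check ./hey -h on your version before relying on it.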

Total: 3.2972 secs
Slowest: 0.1080 secs
Fastest: 0.0022 secs
Average: 0.0323 secs
Requests/sec: 3032.8725

Response time histogram:
0.002 [1] |
0.013 [123] |■
0.023 [2070] |■■■■■■■■■■■■■■■■■■■
0.034 [4259] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
0.045 [2789] |■■■■■■■■■■■■■■■■■■■■■■■■■■
0.055 [218] |■■
0.066 [34] |
0.076 [161] |■■
0.087 [153] |■
0.097 [144] |■
0.108 [48] |

Latency distribution:
10% in 0.0207 secs
25% in 0.0240 secs
50% in 0.0295 secs
75% in 0.0365 secs
90% in 0.0420 secs
95% in 0.0681 secs
99% in 0.0943 secs

Details (average, fastest, slowest):
DNS+dialup: 0.0002 secs, 0.0022 secs, 0.1080 secs
DNS-lookup: 0.0003 secs, 0.0000 secs, 0.0621 secs
req write: 0.0000 secs, 0.0000 secs, 0.0165 secs
resp wait: 0.0311 secs, 0.0021 secs, 0.1036 secs
resp read: 0.0003 secs, 0.0001 secs, 0.0175 secs

Status code distribution:
[200] 10000 responses

The most important parts of this report are Requests/sec and the Status code distribution. In this test all requests returned a 200 and the Requests/sec value looks good. What counts as good changes from site to site and depends on how fast the site is; the idea of this first series of tests is to get the “baseline” value for your site when your servers are not saturated.

Now we can start increasing the concurrency level and the number of requests until we start seeing error codes in the results. Keep an eye on the CPU of the VM running the test with something like htop: if the Load Average goes above (1 * Number of CPU Cores), you need a bigger VM or more VMs.
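The load average rule of thumb can also be checked from a second terminal on the load-generating VM. This is a minimal sketch for Linux, assuming nproc and /proc/loadavg are available; the variable names are illustrative.

```shell
# Number of CPU cores on this machine.
cores=$(nproc)

# The first field of /proc/loadavg is the 1-minute load average.
load1=$(cut -d ' ' -f 1 /proc/loadavg)

echo "cores: $cores, 1-minute load average: $load1"

# Load averages are decimals, so compare them with awk instead of [ ].
if awk -v la="$load1" -v n="$cores" 'BEGIN { exit !(la > n) }'; then
  status="saturated"
  echo "load generator is saturated: use a bigger VM or add more VMs"
else
  status="ok"
  echo "load generator still has headroom"
fi
```

If the load generator itself is saturated, the numbers in the report describe the limits of the test VM, not the limits of your application.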

If you are unfamiliar with how Load Average works on Linux, check this great article about it:

Now that you have found the limits of your application, you need to identify the bottlenecks. Here are a few tips:

  • Monitor the CPU/RAM usage of your servers during the test. You can use htop (Terminator on Linux and iTerm2 on macOS support split terminals, which are great for running htop on multiple servers at the same time). You can also use AWS CloudWatch, Google Stackdriver or whatever monitoring stack you are running. Just make sure that your metric intervals are small (no more than 60 seconds), otherwise you can miss the spikes. CloudWatch, for example, defaults to 5-minute intervals: turn on Detailed Monitoring and set your graphs to 60-second intervals.
  • If you can’t find any server at 100% CPU or RAM utilization, you may have an ‘artificial’ bottleneck, usually the result of default settings on some service (for example, NGINX worker_connections set too low).
  • Check the service logs (NGINX, Apache, Tomcat, etc.) and the Linux system log, /var/log/messages (RHEL/CentOS) or /var/log/syslog (Debian/Ubuntu), for service crashes or OOM-killed processes.
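For the last tip, a simple grep over the system log is often enough to spot OOM kills. This is a sketch only: the function name is made up for the example, and the search patterns are an assumption about how the kernel usually words these messages, so adjust them to what your distribution actually logs.

```shell
# Print the lines of a log file that look like kernel OOM-killer activity.
# $1: path to a system log file
find_oom_events() {
  grep -iE 'out of memory|oom-killer|killed process' "$1"
}

# Check whichever system log exists and is readable on this distribution.
for log in /var/log/syslog /var/log/messages; do
  if [ -r "$log" ]; then
    echo "== $log =="
    find_oom_events "$log" || echo "no OOM events found"
  fi
done
```

Reading these files usually requires root, so the loop may print nothing when run as a regular user. Run it on each application and database server right after a test, while the relevant entries are still near the end of the log.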

In the next part we will talk about common performance bottlenecks, how to solve them and how to improve the performance of your web application by optimizing settings, adding cache layers, CDNs, and more! Stay tuned.

Part 2 is up!