A performance test of SQS in a full stack environment
SQS is a hosted service provided by Amazon. You can start using it without any startup fee. You pay only for the requests and the bandwidth you use.
Ok, let's try this out in a full stack production environment. I have previously built a Node.js web application to handle analytics tracking from web pages. This application is open source and called pixel-pong.
First, I created an image I could run inside EC2 instances. The image is based on Ubuntu 14.04. I then installed Node.js and my application, pixel-pong.
Secondly, I created an Amazon Elastic Load Balancer to distribute the traffic over several EC2 instances (c3.large). This lets me simulate our current production environment. I want to test the full stack and see how everything plays together. It's important for me to see where the bottleneck is.
Thirdly, I signed up for a LoadImpact account to perform a distributed load test that is as close to human behavior as possible. I configured my test with the maximum number of users (10,000) allowed by this subscription. Each user requests a new "page" every 2-3 seconds.
This should generate a peak traffic of about 3500 req/s in my first test without auto scaling enabled.
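As a rough sanity check on that figure, here is a minimal Python sketch of the arithmetic. It assumes all 10,000 virtual users are active at once and that each sends exactly one request per interval, which is an idealization; the function name `request_rate_bounds` is mine and not part of any tool.

```python
from typing import Tuple


def request_rate_bounds(users: int, min_interval_s: float, max_interval_s: float) -> Tuple[float, float]:
    """Return the (lowest, highest) aggregate req/s if every user sends one request per interval."""
    lowest = users / max_interval_s   # everyone waits the longest interval
    highest = users / min_interval_s  # everyone waits the shortest interval
    return lowest, highest


# Assumption: 10,000 concurrent users, one request every 2 to 3 seconds each.
low, high = request_rate_bounds(10_000, 2.0, 3.0)
print(f"{low:.0f} to {high:.0f} req/s")
```

Under these assumptions the expected 3500 req/s sits near the lower end of the range, which is plausible if users lean toward the longer interval.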
We need to enable auto scaling and try again!
This test has been done several times to be sure the results are correct.
As a control check, I'm going to test one web server and its integration with the SQS queue to try to find the limit of a single instance.
SSH into the web server and install Apache Benchmark:
$ sudo apt-get install apache2-utils
Running the ab test, 1 million requests with 30 simultaneous connections:
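The exact invocation is not shown here, so the following is only a sketch of what such a run could look like, built as an argument list in Python. The `-n` (total requests) and `-c` (concurrency) flags are standard `ab` options; the target URL is a hypothetical placeholder, and `build_ab_command` is a helper name I made up for illustration.

```python
import shlex
from typing import List


def build_ab_command(url: str, requests: int = 1_000_000, concurrency: int = 30) -> List[str]:
    """Build an Apache Benchmark argument list: -n total requests, -c concurrent requests."""
    return ["ab", "-n", str(requests), "-c", str(concurrency), url]


# Hypothetical target: the URL that was actually benchmarked is not given in the post.
cmd = build_ab_command("http://localhost:8080/")
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd)` on a machine where `ab` is installed.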
Apache Benchmark results:
Concurrency Level:      30
Time taken for tests:   1346.329 seconds
Complete requests:      1000000
Failed requests:        0
Total transferred:      292000000 bytes
HTML transferred:       43000000 bytes
Requests per second:    742.76 [#/sec] (mean)
Time per request:       40.390 [ms] (mean)
Time per request:       1.346 [ms] (mean, across all concurrent requests)
Transfer rate:          211.80 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       3
Processing:     1   40  16.4     40     177
Waiting:        0   40  16.3     40     176
Total:          1   40  16.4     40     177
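To show how the headline numbers in this output relate to each other, here is a small Python sketch that recomputes them from the request count, the total time, and the concurrency level. The variable names are mine; the formulas are the usual ones for these `ab` summary lines.

```python
# Inputs taken from the Apache Benchmark output above.
complete_requests = 1_000_000
time_taken_s = 1346.329
concurrency = 30

# Throughput: completed requests divided by wall-clock time.
requests_per_second = complete_requests / time_taken_s

# Mean time per request as seen by one of the 30 concurrent connections.
time_per_request_ms = concurrency * time_taken_s * 1000 / complete_requests

# Mean time per request across all concurrent connections.
time_per_request_across_ms = time_taken_s * 1000 / complete_requests

print(round(requests_per_second, 2))
print(round(time_per_request_ms, 3))
print(round(time_per_request_across_ms, 3))
```

The recomputed values agree with the reported 742.76 req/s, 40.390 ms, and 1.346 ms, so the single-instance limit of roughly 742 req/s follows directly from the run time.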
After planning to launch a fleet of tracking servers and have them run our raw Apache Benchmark test simultaneously, I stopped for a second and turned to Google.
A guy called Adam Warski had already done this and written a great blog post about it called Benchmarking SQS.
Results from his tests with 25 threads on each node:
| Number of nodes | 1 | 2 | 4 | 8 |
| Sender, per node and thread (msgs/s) | 354.15 | 338.52 | 305.03 | 317.33 |
| Sender, total (msgs/s) | 8853.75 | 16925.83 | 30503.33 | 63466.00 |
| Receiver, per node and thread (msgs/s) | 166.38 | 159.13 | 170.09 | 174.26 |
| Receiver, total (msgs/s) | 4159.50 | 7956.33 | 17008.67 | 34851.33 |
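The totals in the table can be reproduced from the per-thread figures: total = per-thread rate × 25 threads × number of nodes. A minimal Python sketch of that check follows. The numbers are copied from the table, the helper name `total_throughput` is mine, and small differences are expected because the per-thread values are rounded.

```python
THREADS_PER_NODE = 25  # stated for these tests

nodes = [1, 2, 4, 8]
sender_per_thread = [354.15, 338.52, 305.03, 317.33]
sender_total_reported = [8853.75, 16925.83, 30503.33, 63466.00]
receiver_per_thread = [166.38, 159.13, 170.09, 174.26]
receiver_total_reported = [4159.50, 7956.33, 17008.67, 34851.33]


def total_throughput(per_thread: float, node_count: int, threads: int = THREADS_PER_NODE) -> float:
    """Aggregate rate if every thread on every node sustains the per-thread rate."""
    return per_thread * threads * node_count


sender_total = [total_throughput(r, n) for r, n in zip(sender_per_thread, nodes)]
receiver_total = [total_throughput(r, n) for r, n in zip(receiver_per_thread, nodes)]

for n, s, r in zip(nodes, sender_total, receiver_total):
    print(f"{n} node(s): send {s:.2f}, receive {r:.2f}")
```

Each recomputed total lands within one unit of the reported value, and send throughput grows close to linearly with the number of nodes.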
The highest result he managed to get was:
Back to the questions I started this post with:
In theory, Amazon SQS seems to scale to whatever traffic volume you need. It's built upon EC2 instances with auto scaling. We just have to remember that auto scaling needs time to scale up. But why do my tests experience problems? Does SQS need more time to scale up?
With a total of 18 servers, each capable of handling 742 req/s, I should be able to handle 13,356 req/s.
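The capacity estimate is simple multiplication, sketched below in Python together with the number of servers the expected peak would need. It assumes throughput scales linearly with the number of servers and ignores any limit in the load balancer or in SQS itself, which is exactly what the test is meant to expose. The function names are mine.

```python
import math


def fleet_capacity(servers: int, per_server_rps: float) -> float:
    """Theoretical aggregate req/s, assuming perfectly linear scaling."""
    return servers * per_server_rps


def servers_needed(target_rps: float, per_server_rps: float) -> int:
    """Smallest number of servers whose combined capacity covers the target rate."""
    return math.ceil(target_rps / per_server_rps)


capacity = fleet_capacity(18, 742)   # 742 req/s measured with ab on one instance
needed = servers_needed(3500, 742)   # 3500 req/s is the expected peak of the load test
print(capacity, needed)
```

On paper, 18 servers leave a wide margin over the expected peak, so a shortfall in the full stack test points to something other than raw web server capacity.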
Amazon Simple Queue Service is intended as a simple queue for messages as they travel between computers. SQS queues can be created as topics, and access to the queues is controlled by roles or direct user access.
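To illustrate what putting a message on such a queue looks like, here is a minimal Python sketch. It is not the pixel-pong code, which runs on Node.js. It assumes a boto3-style SQS client whose `send_message` takes `QueueUrl` and `MessageBody`; the function name, queue URL, region, and event fields are hypothetical.

```python
import json


def enqueue_event(sqs_client, queue_url: str, event: dict) -> str:
    """Send one tracking event as a JSON message body and return the SQS MessageId."""
    response = sqs_client.send_message(
        QueueUrl=queue_url,
        MessageBody=json.dumps(event),
    )
    return response["MessageId"]


# Usage with boto3 (requires AWS credentials, so it is shown as comments only):
#   import boto3
#   sqs = boto3.client("sqs", region_name="eu-west-1")  # region is an assumption
#   enqueue_event(sqs, "https://sqs.eu-west-1.amazonaws.com/123456789012/tracking", {"page": "/"})
```

Passing the client in as a parameter keeps the function easy to exercise with a stub, without touching a real queue.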