Disclaimer: the author is the lead developer of a consistently poorly performing node web framework (as measured by framework benchmarks). I mean, it really sucks.
Benchmarking frameworks is fucking stupid.
Every few months someone comes up with yet another system to benchmark web frameworks. They set up a few simple scenarios like serving static content, a JSON reply, and sometimes rendering views or setting cookies. The typical examples contain almost no business logic. It is a theoretical test of how fast a framework performs when it accomplishes nothing.
In this scenario, the lighter the framework is – that is, the less functionality it is offering out of the box – the faster it is going to perform. It is pathetically obvious. It is one thing to compare the performance of various algorithms but when the biggest factor is how much other “stuff” is performed, you don’t need to write tests – you need to RTFM.
To those who occasionally bring up hapi’s poor performance on these ridiculous charts, I make two points.
First, hapi is slower than bare node and express because it does more. Don’t you want protection against your process going out of memory? What about event queue delay protection? What about client request timeouts? Server response timeouts? Protection against aborted requests? Built-in request lifecycle logging? Input validation? Security headers? Which one of these is optional? If you say most – hapi is clearly not for you.
Second, the Walmart mobile servers built using hapi were able to handle all mobile Black Friday traffic with about 10 CPU cores and 28GB of RAM (of course we used more but they were sitting idle at 0.75% load most of the time). This is mind-blowing traffic going through insignificant computing power. Why would anyone spend engineering resources trying to optimize it when it is clearly performant enough?
But this post is not about how stupid framework benchmarking is.
In addition to the optimizer, v8 has to perform continuous garbage collection. This is required to free up memory taken by objects that are no longer being used. In order to minimize its impact on performance, v8 tries to limit garbage collection to application idle time. Also, the longer an object “survives” garbage collection, the less likely it is to be removed quickly when it is no longer needed. And the more stuff you do, the more objects are generated and need to be cleaned up.
The other critical component is the node event loop. The event loop is the “single thread” running your code. It is not exactly a single thread but as far as your application is concerned, it is a single threaded engine. Everything that happens in node is called from the event loop. It is a queue of I/O events and timers which trigger your callbacks – basically, your entire node application is nothing but a collection of callbacks.
What allows node to handle a large number of requests is the fact that most activities block the event loop for a very short period of time. For example, typical web requests require some database items. When those are fetched, node puts the request on hold and handles other requests until the database comes back with the item. Node requires this downtime to handle multiple requests. v8 requires this downtime to perform garbage collection.
When v8 is performing garbage collection, the event loop is paused. When a callback takes a long time to return control back to the event loop, all other callbacks, including expired timeouts, are paused. If your business logic performs some calculation that takes 100ms to perform, you will not be able to handle even 10 requests per second. Simple math.
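To make the math concrete, here is a minimal sketch of a blocked event loop. The helper names (blockFor, maxRequestsPerSecond) are made up for illustration and are not part of node or hapi. A timer is scheduled for 10ms, then a synchronous loop holds the event loop for 100ms, so the timer cannot fire until the loop returns.

```javascript
// Sketch only: blockFor and maxRequestsPerSecond are illustrative names.

// Busy-wait to stand in for CPU-heavy business logic.
function blockFor(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) { /* spin: no other callback can run */ }
}

// Ceiling on throughput for a single process when every request
// holds the event loop for `blockingMs` milliseconds.
function maxRequestsPerSecond(blockingMs) {
  return 1000 / blockingMs;
}

const scheduled = Date.now();
setTimeout(() => {
  // Asked for 10ms, but this callback has to wait for blockFor() to return.
  console.log(`timer fired after ${Date.now() - scheduled}ms (asked for 10ms)`);
}, 10);

blockFor(100);
console.log(`ceiling at 100ms per request: ${maxRequestsPerSecond(100)} req/s`);
```

The ceiling is 10 requests per second only if the process does nothing else. Parsing, framework overhead, and garbage collection all come out of the same second, which is why the real number ends up below 10.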
Why does this matter for benchmarking? Because these benchmark systems focus on performance at maximum load. They basically measure how many requests a server can handle under heavy load. The goal is to squeeze everything you can out of your computing resources. The problem is that under 100% CPU, node’s performance is dreadful.
At very high CPU loads, node’s event loop is fighting with the v8 garbage collector over resources. They can’t both run at the same time. This means that instead of getting the most out of your resources, you are wasting energy switching between two competing forces. In fact, the vast majority of node applications should be kept at CPU load levels of under 50%. If you want to maximize your resources, run multiple processes on the same hardware (with enough margin for the operating system).
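One way to run multiple processes on the same hardware is node's built-in cluster module. The sketch below is an assumption about how that might look, not a description of our deployment. The workerCount helper, the "leave one core for the operating system" rule, and port 8000 are all hypothetical.

```javascript
const cluster = require('cluster');
const http = require('http');
const os = require('os');

// Hypothetical sizing rule: one worker per core, minus cores left for the OS,
// but never fewer than one worker.
function workerCount(cores, reservedForOs = 1) {
  return Math.max(1, cores - reservedForOs);
}

// Not invoked in this sketch; call start() from your own entry point.
function start(port = 8000) {
  if (cluster.isPrimary) { // named cluster.isMaster before Node 16
    const count = workerCount(os.cpus().length);
    for (let i = 0; i < count; i++) {
      cluster.fork();
    }
  } else {
    // Each worker has its own event loop and its own v8 heap,
    // and all of them share the same listening port.
    http.createServer((req, res) => res.end('ok')).listen(port);
  }
}

console.log(`would fork ${workerCount(os.cpus().length)} worker(s) on this machine`);
```

Each process still needs to stay well under its own CPU ceiling. More processes give you more event loops, not a license to run each one hot.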
If our production servers show more than single digit CPU load, we consider that a significant problem. If your node process is CPU bound, you are doing something wrong, your deployment is misconfigured, or you don’t have enough capacity.
What makes things worse when doing this sort of benchmarking is that the load is almost exclusively blocking because there is no business logic to go and create that downtime. Most of the internal framework facilities, such as parsing headers and cookies and processing payloads, are blocking activities that require better downtime management than an application with empty business logic provides.
There is still great value in benchmarking applications. But if performance under load isn’t meaningful, what is? That’s where performance at rest comes in.
Performance at rest is the best-case-scenario of your application under no load. It’s how fast you can drive from point A to point B without anyone else on the road. It is a very significant number because it directly translates to user experience and relative performance. In other words, if your server can do an unlimited number of requests per second, but they each take 60 seconds to complete, your amazing capacity means nothing because all your users will leave.
Measuring performance at rest is actually a bit more involved than just running a single request and measuring how long it takes to complete. This has a lot to do with the v8 garbage collector and the v8 runtime optimizer. These two are working for and against your application. The first time you make a request, nothing is optimized and your code will be very slow. The 10th time you make a request, the garbage collector might kick in and pause it in the middle. Testing once is not enough to get real numbers.
What you want to do is come up with a scenario in which you are making multiple requests continuously over time, while keeping your server CPU load within an acceptable range.
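A rough sketch of such a scenario follows. It sends requests one at a time with a pause in between, throws away the first batch while the optimizer warms up, and reports a median and a 99th percentile so a single garbage collection pause does not skew the result. Everything here is an assumption for illustration: the function names, the default counts, and the stand-in fakeRequest. In a real test makeRequest would call your server, ideally from a separate process or machine.

```javascript
const { performance } = require('perf_hooks');

// Drop the first `warmup` samples (unoptimized code), then pick percentiles
// from the sorted remainder.
function summarize(samplesMs, warmup) {
  const sorted = samplesMs.slice(warmup).sort((a, b) => a - b);
  const at = (p) => sorted[Math.min(sorted.length - 1, Math.floor(p * sorted.length))];
  return { count: sorted.length, median: at(0.5), p99: at(0.99) };
}

// Issue requests sequentially with a pause between them so the server stays
// far from CPU saturation. `makeRequest` is any function returning a promise.
async function measureAtRest(makeRequest, { total = 200, warmup = 50, pauseMs = 20 } = {}) {
  const samples = [];
  for (let i = 0; i < total; i++) {
    const started = performance.now();
    await makeRequest();
    samples.push(performance.now() - started);
    await new Promise((resolve) => setTimeout(resolve, pauseMs));
  }
  return summarize(samples, warmup);
}

// Stand-in "request" that resolves after roughly 5ms; replace with a real call.
const fakeRequest = () => new Promise((resolve) => setTimeout(resolve, 5));

measureAtRest(fakeRequest, { total: 20, warmup: 5, pauseMs: 1 })
  .then((result) => console.log(result));
```

Watch the server's CPU load while this runs and raise pauseMs if it climbs out of the range you consider acceptable.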
This is where slow performance indicates a problem. If under these conditions, and with the feature-set you require, your web framework is performing poorly, it should be fixed or replaced. If the overhead of the framework is making your requests too slow at rest, the framework is either too heavy for your use case, or is underperforming and should be fixed.
Understanding your application’s performance is critical. Benchmarking without taking into account the very nature of your platform is harmful.