SimonHF's Blog

What they didn’t tell you about creating scalable HTTP web services April 10, 2011

Filed under: Uncategorized — simonhf @ 3:22 am

Important (performance) factors to keep in mind when creating HTTP web services (with or without libSXE):

1. Accept()ing new sockets is an expensive and generally single-threaded kernel operation. It’s therefore important to serve as much traffic as possible over long-lived ‘keep-alive’ HTTP 1.1 connections. For example, your multi-threaded HTTP server might be able to hold 500,000 simultaneous HTTP connections and handle 250,000 keep-alive requests per second, yet still only be able to accept() new connections at a rate of 30,000 per second.
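To make the arithmetic behind those figures concrete, here is a rough back-of-the-envelope sketch. It assumes a simplified steady-state model (my assumption, not a measurement): new connections arrive no faster than the accept() rate, and every connection carries the same number of requests. The function names are made up for illustration.

```python
import math


def sustainable_requests_per_second(accept_rate, requests_per_connection, keepalive_ceiling):
    """Steady-state request rate when every connection carries a fixed number of requests.

    Simplified model (an assumption): new connections arrive no faster than
    accept_rate, and the server can never exceed keepalive_ceiling.
    """
    return min(accept_rate * requests_per_connection, keepalive_ceiling)


def min_requests_per_connection(accept_rate, keepalive_ceiling):
    """Fewest requests each connection must carry to reach the keep-alive ceiling."""
    return math.ceil(keepalive_ceiling / accept_rate)


# Figures taken from the example in point 1.
ACCEPT_RATE = 30_000
KEEPALIVE_CEILING = 250_000

one_shot = sustainable_requests_per_second(ACCEPT_RATE, 1, KEEPALIVE_CEILING)
needed = min_requests_per_connection(ACCEPT_RATE, KEEPALIVE_CEILING)
print(one_shot, needed)
```

Under this model, a server whose clients open a fresh connection for every request tops out at the accept() rate, and each connection has to carry roughly nine requests before the keep-alive ceiling becomes the limit instead.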

2. Parsing HTTP headers can be done with the sub-library lib-sxe-http. However, parsing the key/value structured headers is (unnecessarily) slow. Why slow? Because the parser has to loop over and examine each and every character in the headers in order to determine, e.g., whether it’s an end of line character or one of the end of header characters. Although such a loop seems simple and fast, when we loop over, say, 250,000 or more sets of headers per second, all that looping adds up to a lot of CPU instructions. Why unnecessarily slow? If you have control over the peer sending the HTTP request (e.g. because the peer is another of your servers, or your javascript library) then it’s possible to build the requests and/or responses, which are never ‘pipelined’, in such a way that your HTTP server doesn’t need to parse the headers. For simple GET requests, your server only needs to examine the last characters read in order to determine whether they are the end of header characters. For POST requests and responses, the ‘content-length’ header is useful, but to avoid parsing the headers we can embed this info at the end of the body at a fixed, known location. This technique works even if third-party HTTP proxies manipulate the headers en route. Conclusion: Avoid parsing HTTP headers if possible. These blog posts show how parsing HTTP headers causes node.js performance to more than halve: Node.Js Versus SXE “Hello World”; Node.Js Performance Revisited
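Here is a minimal sketch of what the header-free checks in point 2 could look like. It is not how lib-sxe-http does it. The trailer layout is purely hypothetical: I assume the sender ends every body with the total body length written as 10 ASCII digits, and `TRAILER_LEN` and all the function names are invented for the example.

```python
HEADER_END = b"\r\n\r\n"
TRAILER_LEN = 10  # hypothetical: body ends with its own length as 10 ASCII digits


def get_request_complete(buf):
    """A non-pipelined GET is complete once the buffer ends with a blank line."""
    return buf.endswith(HEADER_END)


def append_length_trailer(payload):
    """Sender side: append the total body length (payload + trailer) as fixed-width digits."""
    total = len(payload) + TRAILER_LEN
    return payload + str(total).zfill(TRAILER_LEN).encode("ascii")


def post_payload_if_complete(buf):
    """Return the payload if buf holds one whole message, else None.

    Only the last TRAILER_LEN bytes and the 4 bytes in front of the body are
    examined; the headers themselves are never scanned.
    """
    if len(buf) < len(HEADER_END) + TRAILER_LEN:
        return None
    trailer = buf[-TRAILER_LEN:]
    if not trailer.isdigit():
        return None
    body_len = int(trailer)
    body_start = len(buf) - body_len
    if body_len < TRAILER_LEN or body_start < len(HEADER_END):
        return None
    if buf[body_start - len(HEADER_END):body_start] != HEADER_END:
        return None
    return buf[body_start:-TRAILER_LEN]


message = (b"POST /x HTTP/1.1\r\nHost: a\r\nContent-Length: 15\r\n\r\n"
           + append_length_trailer(b"hello"))
```

Because the receiver checks that the end of header characters sit exactly where the trailer says the body starts, a partially read message is very unlikely to be mistaken for a complete one, though a real protocol would probably want something stronger (a magic marker, say) than this sketch provides.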

3. The general rule of thumb is that the HTTP server almost never has to close() a socket. The close() function is not as slow as the accept() function, but it’s not free either. Using close() might just prompt a malicious peer to connect again, causing another expensive accept(). So assuming your server has a lot of memory, it’s probably a better trade-off to just keep a badly behaving HTTP peer connected, and maybe even ‘tar pit’ it. The idea is that a little bit of memory for the peer buffers is a lesser evil than the potentially expensive close()/accept() bottleneck.
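One possible way to ‘tar pit’ a peer, sketched with Python’s standard selectors module rather than libSXE: stop asking the event loop about the socket, but keep it open. The `tarpit` helper and the `tarpitted` set are names invented for this illustration, and local socket pairs stand in for accepted connections.

```python
import selectors
import socket


def tarpit(selector, conn, tarpitted):
    """Stop servicing a misbehaving peer without close()ing its socket."""
    selector.unregister(conn)  # no more readiness events for this peer
    tarpitted.add(conn)        # keep a reference so the socket stays open


# Demonstration with local socket pairs standing in for accepted connections.
good_conn, good_peer = socket.socketpair()
bad_conn, bad_peer = socket.socketpair()

selector = selectors.DefaultSelector()
selector.register(good_conn, selectors.EVENT_READ)
selector.register(bad_conn, selectors.EVENT_READ)

tarpitted = set()
tarpit(selector, bad_conn, tarpitted)

good_peer.sendall(b"GET / HTTP/1.1\r\n\r\n")
bad_peer.sendall(b"garbage")

# Only the well-behaved connection is reported as readable.
ready = [key.fileobj for key, _ in selector.select(timeout=1)]
```

The tar-pitted peer costs the server a file descriptor and some kernel buffer memory but no further event-loop work. Once its unread data fills the receive buffer, the peer’s own sends stall as well.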

4. Another reason not to close() is that on very heavily loaded networks there is no guarantee that the TCP packets generated by a close() will even reach the peer. In fact it doesn’t even have to be a heavily loaded network. For example, a client which had an idle keep-alive connection to a server and suddenly powers off will not inform the server of the close; the server will therefore continue to believe that the TCP connection is alive unless it tries to send data to the client. So a good rule of thumb is not to rely on the close event (generated when a peer close()s) for cleaning up the resources, state, etc. belonging to a particular HTTP connection. So how else do we clean up a connection if the server never close()s and we cannot 100% rely on the close event? The obvious answer is to time out the particular connection. But there is an even simpler way: just close and reuse the oldest TCP connection. For example, say the server is configured to handle a maximum of 500,000 simultaneous connections. When we want to accept() the 500,001st connection, we simply close() the oldest connection in the connection pool. Chances are that we’re just reaping one of those ‘zombie’ connections.
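The reap-the-oldest idea from point 4 can be sketched in a few lines. This is an illustration under assumptions, not libSXE’s implementation: I take ‘oldest’ to mean least recently active, `ConnectionPool` and its methods are hypothetical names, and `FakeConn` is a stand-in for a real socket.

```python
from collections import OrderedDict


class ConnectionPool:
    """Fixed-size pool that close()s the oldest connection to make room for a new one."""

    def __init__(self, max_connections):
        self.max_connections = max_connections
        self._conns = OrderedDict()  # key -> connection, oldest first

    def add(self, key, conn):
        """Track a newly accepted connection, evicting the oldest if the pool is full."""
        evicted = None
        if len(self._conns) >= self.max_connections:
            _, evicted = self._conns.popitem(last=False)
            evicted.close()
        self._conns[key] = conn
        return evicted

    def touch(self, key):
        """Mark a connection as recently active so it is reaped last."""
        self._conns.move_to_end(key)

    def __contains__(self, key):
        return key in self._conns

    def __len__(self):
        return len(self._conns)


class FakeConn:
    """Stand-in for a socket; only records whether close() was called."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


pool = ConnectionPool(max_connections=2)
a, b, c = FakeConn(), FakeConn(), FakeConn()
pool.add("a", a)
pool.add("b", b)
pool.touch("a")   # "a" just served a request, so "b" is now the oldest
pool.add("c", c)  # pool is full: "b" is closed to make room
```

No timers or per-connection timeouts are needed: a zombie connection never gets touched, so it drifts to the front of the queue and is the first to go when the pool fills up.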
