SimonHF's Blog

Just another WordPress.com site

Concurrency, performance, threads, and locking brain dump December 3, 2012

As a ‘hard-core’ C developer with over 30,000 hours of hands-on development experience, I feel qualified to comment on concurrency and performance issues relating to ‘threads and locks’. So here goes my brain dump, dividing the territory into 7 common problem areas:

Problem #1: The hidden concurrency in single-threaded algorithms, *or* How developers think memory operations & mechanisms like bsearch() are fast:

It’s almost like — by starting to think about concurrency in terms of threads and locks — we’re jumping the gun. Because *before* even considering the problems of multi-threading, there’s a different concurrency war going on behind the scenes in your single-threaded program! Believe it or not, your higher level code might look single-threaded, but at run-time it comes down to assembler op codes being executed and the associated memory accesses. Often our intention in using concurrency is to achieve run-time performance, in which case it’s critical for the developer to understand how memory access works even in a single-threaded program. Why? Because it’s easily possible for an inexperienced developer to write a concurrent, multi-threaded algorithm which is way slower than a well-tuned single-threaded algorithm. So first make the single-threaded algorithm fast; then imagine that much faster single-threaded algorithm made multi-threaded. It’s going to be mega fast.

One common reason developers inadvertently make their single-threaded algorithm unnecessarily slow is the misconception that memory access is generally pretty fast. The reality is a bit different and more complicated. Why? Because memory is not as fast as you might think. Generally developers think memory is fast and disk is slow, so they try to access the disk as little as possible. This is a good rule of thumb. However, what very many developers don’t realize is that memory access on modern CPUs has a split personality: uncached RAM is much, much slower than CPU-cached RAM. So much so that the time difference between accessing uncached RAM and cached RAM is analogous to the difference between disk access and memory access. So how is RAM cached? In simple terms, RAM is cached in ‘cache lines’, whose size depends upon the CPU but is usually 64 bytes on modern 64-bit CPUs. In the old days, figuring out the speed of some assembler was as easy as adding up the number of clock cycles for a series of op codes; memory access speed was always the same and therefore predictable. These days, op codes still have an individual execution cost associated with them, but the total time to execute is blurred by two factors: pipe-lining of instructions causing variable execution time, and *highly* variable memory access times depending upon how memory has been cached. At this point I bet you’re thinking: Okay, so a bit of caching goes on; big deal; it can’t make that much difference to the execution speed of my program; at most a few percent or 10% or something low like that. Well, you’re wrong, and an easy way to illustrate the problem is with bsearch(). Yes, bsearch(). It’s easy to Google bsearch and find a whole bunch of developers willing to swear by their grandmother that it’s the fastest way to access a large chunk of data. The truth is that it might have been fast 10 years ago, but not any more.
And the reason it’s not fast is that if the data set being searched is much larger than the total cache capacity of the CPU, then the bsearch() algorithm by its very nature touches much memory which isn’t currently held in cache lines. Accessing even 1 byte of a 64-byte cache line causes the entire 64-byte cache line to be expensively ‘flown in’ from RAM to the cache on the CPU. How expensive is this cache line gathering process? As an example, consider a simple C program which creates a sorted array of 30 million equal-sized records taking up about 1 GB of RAM. If we use bsearch() to loop through all keys from 1 to 30 million in order, then on my laptop’s Intel i7 CPU about 8 seconds of wall-clock time go by. Fine, you say; 30 million is a lot of records and that sounds like good performance. However, now let’s look up the same 30 million keys but in random order. On the same CPU, about 40 seconds of wall-clock time go by. That’s *five* times slower. Not a few percent slower. Not ten percent slower. *Five* times slower. Obviously this is a huge difference. And it’s all down to cache line exhaustion. In both experiments we accessed exactly the same amount of memory and looped the same number of times, but in experiment #2 the CPU had to have cache lines expensively ‘flown in’ far more often. As a reference, the same 30 million records can be accessed in the form of a hash table in under 5 seconds of wall-clock time, and in any random order. Why? Because fewer cache lines must be expensively ‘flown in’ 🙂

What is the moral of this story? There are several: Always be paranoid and question the status quo. Never assume that your algorithm is the fastest no matter how simple it looks. Or that the only way to make it faster is to make it multi-threaded. When working with large data sets in memory then always be mindful of cache line exhaustion. Never try to measure small amounts of something; always performance test your algorithm working flat out, exercising 100% CPU, and with realistic data access patterns in order to determine the average tiny cost per iteration, e.g. 40 seconds divided by 30 million. Always try very hard to avoid multi-threaded code because it’s evil in so many ways.

Problem #2: Turning a single-threaded algorithm into a locked multi-threaded algorithm:

In problem #1 we patted ourselves on the back for optimizing the query algorithm so that instead of taking 40 seconds it now takes 5 seconds, using a hash table. We did this by understanding the not-in-our-face cache line concurrency issues happening in the CPU even with a single thread. Now we want to make our hash table multi-threaded. Let’s say we have 8 physical CPU cores and want each core to be able to read and write hash table records. The problem is that we need to lock the hash table so that no two cores try to write to it at the same time.

Pitfall #1 (which amazingly I have seen even experienced developers fall into): holding the lock longer than necessary. Put as little code inside the lock as possible. For example, there’s no need to calculate the hash of the key inside the lock!

Pitfall #2: Failing to use the hash of the key to reduce lock contention. For example, instead of locking the entire 1 GB hash table, have e.g. 100 x 10 MB hash tables, each with its own lock. Use the hash of the key to decide which key exclusively goes in which hash table. Now, when accessing the hash table concurrently, there’s a much bigger chance that a particular thread won’t block, because individual hash tables can be locked and accessed in parallel.

Pitfall #3: Using fancy locks. Some developers are tempted to use fancy hybrid, spin-lock type constructs that do fancy operations: for example, don’t take the lock while only readers are accessing, but do lock when a writer is writing. These fancy locks sound good but are in fact very expensive to execute — even if no actual locking is performed — due to the fact that multiple atomic assembler instructions must be used to implement the fancy algorithm.

Pitfall #4: Using fancy lock-less algorithms. Some developers think that fancy lock-less algorithms are the way to get around having to use locks and are therefore a kind of silver bullet. The problem is that all these lock-less algorithms rely on atomic assembler instructions which are expensive to execute. Why? Because they guarantee that a cache line must be expensively ‘flown in’ (see above), as well as doing a bunch of other expensive things like stalling the op code pipe-line.

Problem #3: Turning a locked multi-threaded algorithm into an unlocked multi-threaded algorithm:

Obviously using no locks is faster than using locks. Plus, fancy lock-less algorithms are kind of cheating and also don’t deliver the ultimate performance we’re looking for. So how do we turn our locked multi-threaded algorithm into an unlocked multi-threaded algorithm? One way is to let the kernel help us. Ultra-high-performance algorithms for reading and writing data often have many readers and relatively few writers. What if the readers could all read as much as they want, concurrently, *and* a writer could write as much as it wants, concurrently, *and* all without locks? It turns out that this is possible. How? Everybody has heard about copy-on-write memory, but it is mainly associated with fork()ing and multiple processes. What is little known is that it’s possible to do copy-on-write in a single process, and even in a single thread! Let’s go back to our first bsearch() example with the 1 GB memory block. We can memory map that 1 GB block and use it for the concurrent readers. However, we can also memory map the same 1 GB to another address range as a copy-on-write copy of the first memory map. The second 1 GB memory map uses the same physical memory pages as the first one and so takes up almost no extra physical memory. The concurrent writer can write to it, update multiple areas, and even apply a series of updates as an atomic transaction, all without disturbing the first memory map which the readers are happily reading from. When the writing is complete, the pointers to the memory maps get updated and… tadaa! We have a multi-threaded and truly lock-less algorithm. A side benefit of using a memory map is that it can survive our process restarting (and we can choose whether it is backed by a file on disk or just by virtual memory), which means that if the memory map is holding some sort of cache of data then it will be immediately ‘hot’ upon restarting.

Problem #4: My (unlocked) multi-threaded algorithm works too fast for the network card:

After having taken the CPU cache lines into consideration, and after having removed lock contention using copy-on-write, we finally end up with monster performance which scales across all CPUs. The only thing is, this doesn’t really help us very much if we can’t speak to the outside world fast enough. Let’s say our multi-threaded algorithm can read and write at 10 million transactions per second with all CPU cores pinned at 100%… how does this help us if we’re deploying to a box which has a 100 Mbit or 1,000 Mbit NIC only capable of a few hundred thousand packets per second? And this is probably the type of NIC commonly available from a service like Amazon EC2. The truth is that unless your NIC is 10,000 Mbit then you probably don’t need a multi-threaded algorithm in the first place. It is even said that C gurus can write code which handles all the packets of a 10,000 Mbit NIC using a single thread; it depends upon your algorithm, of course.

An exception to this is if you’re writing your cloud code in a language other than C. For example, node.js is fast to write but only relatively fast to run. A single-threaded node.js algorithm can easily be an order of magnitude slower than the same algorithm in C, mainly because in node.js the author has little control over the internal format of data structures, and therefore over the efficiency of accessing them in terms of CPU cache line invalidation. You would have to hack node.js itself to take advantage of CPU cache lines and/or copy-on-write memory, which would be so complicated that you might as well use C in the first place. It’s a similar story for plain old Java and other higher level languages. This is also the main reason that operating systems themselves and high performance software such as databases are generally written in C. This doesn’t mean you have to write everything in complicated C; just the generic, high performance components. Consider separating out the higher level business logic — the source code which will probably change more often — and writing it in a higher level language which leverages the generic, high performance components. If you want the higher level language to interface directly with C then think very carefully about which language to use. Most scripting languages have the ability to call C functions and vice-versa, but there can be enormous differences in speed when doing this. For example, some languages store their function names in an internal hash table, which means that if C calls a higher level function then the higher level language is going to do a hash table lookup for every call; expensive.

Problem #5: My unlocked multi-threaded algorithm is an efficient work of art but still under performing on the network:

Also of note — and you would have thought this problem was fixed long ago — is that it can’t be taken for granted that a particular operating system and NIC will operate as efficiently as expected. For example, your average 1,000 Mbit NIC will only deliver anywhere close to 1,000 Mbit if optimally sized, MTU-sized packets are being sent. Try to send 1,000 Mbits using smaller packets and watch system interrupt time go up while through-put goes down to levels as low as 200 Mbit. This could be partly due to the NIC hardware, partly due to the NIC driver, and/or partly due to operating system network stack tuning. The fact is, you might only be able to tune it so high and no more. This is the point at which you might want to try a different NIC and/or a different kernel version. Always test the operating system / NIC performance independently from your concurrent code. As we have seen before, it may not even be necessary for you to make your code concurrent in order to fulfill the performance requirements.

Problem #6: My unlocked multi-threaded algorithm works amazingly on my 10,000 Mbit NIC on my 3 servers but took too long to develop:

Oh dear. You optimized everything so much and managed to develop something which is an order of magnitude or two faster than anything else available and it’s running on 3 beefy server boxes with high end NICs, but it took way too long to develop. All that CPU cache line analysis and lower level coding in C took much longer to develop than in other languages. Maybe it would have been financially better to develop everything in node.js which needs 30 servers instead of 3? This could well be the situation you find yourself in. Only your business model knows the answer to this conundrum. 27 extra servers could easily be much cheaper than paying more expensive C programmers to develop fiddly code for longer. However, if you’re expecting the business to grow e.g. 10 fold in a reasonable period of time then maybe it’s worth paying up front for the more complicated C code because suddenly the extra overhead of the C developers looks cheap compared to 300 – 30 = 270 extra servers for the node.js solution.

Problem #7: I’m told that the GalacticTurboNoSQLDB 2.0 is as fast as anything and the ultimate concurrent solution:

Don’t believe them! One solution says 10,000 transactions per second, another says 100,000, another says 1,000,000, and yet another says 10,000,000 transactions per second. Always evaluate performance by creating a real-world performance / load test. Don’t be afraid to make the test as real as possible. For example, if you are expecting a million TCP connections to a particular server then have the test create a million TCP connections to each server; only then are we properly testing concurrency. Then ask yourself: could it be faster and/or use less memory and/or use less disk? Examine the CPU usage during the test. If the CPUs are not at 100% then maybe the algorithm is not optimal. If they are at 100% and it’s a network program then determine whether the maximum NIC through-put has been exhausted. If the NIC through-put has not been exhausted then there’s room for improvement. Once you have tested everything, compared all test metrics, and decided that concurrency and therefore performance is good, then ensure that these metrics can be monitored 24/7 during live production. It may be that the GalacticTurboNoSQLDB 2.0 is blazingly fast for 7 hours and your performance test only lasted for 5 hours. Because GalacticTurboNoSQLDB 2.0 is written in Java, it seemed to work well on your 128 GB monster server until garbage collection kicked in and it went on a bender for half an hour 😦 When production metrics are found to no longer reflect the carefully crafted performance tests then carefully craft the performance tests a bit more!

Not the end of threads and locking concurrency issues, but the end of this brain dump.


Screencast: Building libsxe March 27, 2011

Filed under: Uncategorized — simonhf @ 8:09 pm

Screencast: Click to play HD full-screen

I thought I’d try something new. So here’s a screencast showing how to download, build, and test libsxe on 64 bit Ubuntu. I also explain a bit about the layout of the libsxe source files and the various sub-libraries. Building the release, debug, and coverage targets for libsxe from scratch — including running the well over 1,000 tests on each target and enforcing 100% code coverage — takes about 1 minute 20 seconds in total on my VMware installation of Ubuntu. Quite fast but could be faster. The build is currently executed using consecutive steps. It’s on the ‘to do’ list to parallelize the build and make use of multiple cores to speed things up even more. The tests run so fast already — even on a single core — because we do sneaky things like faking time and mocking system calls to easily reproduce the most difficult to reproduce error conditions.

 

libsxe, shared-memory, and parallel state-driven algorithms February 27, 2011

I recently came across the following paper: “Memory Models: A Case for Rethinking Parallel Languages and Hardware” by Sarita V. Adve and Hans-J. Boehm

The paper starts off:

The era of parallel computing for the masses is here, but writing correct parallel programs remains far more difficult than writing sequential programs. Aside from a few domains, most parallel programs are written using a shared-memory approach. The memory model, which specifies the meaning of shared variables, is at the heart of this programming model. Unfortunately, it has involved a tradeoff between programmability and performance, and has arguably been one of the most challenging and contentious areas in both hardware architecture and programming language specification. Recent broad community-scale efforts have finally led to a convergence in this debate, with popular languages such as Java and C++ and most hardware vendors publishing compatible memory model specifications. Although this convergence is a dramatic improvement, it has exposed fundamental shortcomings in current popular languages and systems that prevent achieving the vision of structured and safe parallel programming. This paper discusses the path to the above convergence, the hard lessons learned, and their implications. …

And then introduces the idea of “disciplined shared-memory models”:

Moving forward, we believe a critical research agenda to enable “parallelism for the masses” is to develop and promote disciplined shared-memory models that:
• are simple enough to be easily teachable to undergraduates; i.e., minimally provide sequential consistency to programs that obey the required discipline;
• enable the enforcement of the discipline; i.e., violations of the discipline should not have undefined or horrendously complex semantics, but should be caught and returned back to the programmer as illegal;
• are general-purpose enough to express important parallel algorithms and patterns; and
• enable high and scalable performance.

This is interesting because libsxe has a disciplined shared-memory model which goes a long way towards fulfilling the criteria above in the form of the sxe pool library. So what is a sxe pool and how does it offer us a disciplined shared-memory model?

The sxe pool library was invented for different reasons than offering a disciplined shared-memory model. A shared-memory option was added later as a pool construction option. In short, sxe pools offer a way to create C arrays of structs with the following generic benefits:
• The size of the array is persisted
• Each element of the array gets its own state, which is persisted outside the element struct in a linked list
• Each element of the array gets its own timestamp, which is persisted outside the element struct in a linked list
• Each element is accessed using regular & concise C code, e.g. myarray[myindex].mystructmember

The sxe pool library caller can manipulate the state & timestamp element properties using the following API:

sxe_pool_get_number_in_state(void * array, unsigned state)
sxe_pool_index_to_state(void * array, unsigned id)
sxe_pool_set_indexed_element_state(void * array, unsigned id, unsigned old_state, unsigned new_state)
sxe_pool_set_oldest_element_state(void * array, unsigned old_state, unsigned new_state)
sxe_pool_get_oldest_element_index(void * array, unsigned state)
sxe_pool_get_oldest_element_time(void * array, unsigned state)
sxe_pool_get_element_time_by_index(void * array, unsigned element)
sxe_pool_touch_indexed_element(void * array, unsigned id)

Converting the sxe pool library API to support shared-memory was relatively simple. The sxe_pool_new() function got an option to share the pool memory. The API functions that change the pool element state use atomic assembler instructions if the pool was constructed as a shared-memory pool. It’s also interesting to note that sxe pools can be shared between processes as well as between threads in the same process. This is because the sxe pool library internal implementation avoids absolute pointers, which is also something that I encourage from libsxe developers and C developers in general.

This API is “general-purpose enough to express important parallel algorithms and patterns” and, most interestingly, is the same API whether the algorithm is threaded or not. It’s also “simple enough to be easily teachable to undergraduates” or even junior developers, as we have found out at Sophos. The atomic sxe_pool_set_[indexed|oldest]_element_state() API functions “enable the enforcement of the discipline” by requiring both the old state and the new state of the array element; if the developer supplies the wrong old state then sxe pool will assert. Because the sxe pool library manages the element states itself, an assert is very unlikely when using a single pool. However, more complicated algorithms often make use of chains of pools in order to implement multiplexing and/or combining of parallel results, etc. In these cases, it is common to keep references to pool array element indexes and/or pool array element states in the caller-supplied pool element structs. Finally, by implementing algorithms using the sxe pool API it is possible to “enable high and scalable performance” using a minimum of simple-to-understand C source code. The developer is forced into thinking about the algorithm as a state model, which often simplifies the hardest problems. And the generic part of the implementation complexity — e.g. locking, shared memory, keeping state, doubly linked lists, timeout handling, & memory allocation — is all handled by the sxe pool library and backed by automated tests with 100% library code coverage. The resulting performance is excellent, as can be seen from the figures published in earlier blog entries; tens of thousands of network requests per second per core.

As you can see, the sxe pool is an incredibly powerful and code saving and code simplifying generic data structure. It’s a sort of Swiss Army knife for parallel, event driven algorithms. In a future article I’ll show some of the implementation patterns.

 

node.js versus Lua “Hello World” October 13, 2010

Neil Watkiss — known among other things for many cool Perl modules — has created a non-optimized, experimental version of SXE (pronounced ‘sexy’) containing embedded Lua, called SXELua. So I thought it would be fun to redo the familiar – to readers of this blog – ‘Hello World’ benchmark using SXELua. And here is the Lua source code:

do
    local connect = function (sxe) end
    local read = function (sxe, content)
        if content:match("\r\n\r\n", -4) then sxe_write(sxe,"HTTP/1.0 200 OK\r\nConnection: Close\r\nContent-Type: text/html\r\nContent-Length: 14\r\n\r\nHello World\n\r\n")
        end
    end
    local close = function (sxe) end
    sxe_register(10001, function () sxe_listen(sxe_new_tcp("127.0.0.1", 8000, connect, read, close)) end)
end

Compare this with the slightly longer node.js equivalent from the last blog:

var net = require('net');
var server = net.createServer(function (stream) {
  stream.on('connect', function () {});
  stream.on('data', function (data) {
    var l = data.length;
    if (l >= 4 && data[l - 4] == 0xd && data [l - 3] == 0xa && data[l - 2] == 0xd && data[l - 1] == 0xa) {
      stream.write('HTTP/1.0 200 OK\r\nConnection: Keep-Alive\r\nContent-Type: text/html\r\nContent-Length: 13\r\n\r\nHello World\r\n');
    }
  });
  stream.on('end', function () {stream.end();});
});
server.listen(8000, 'localhost');

And now the updated results:

“Hello World”    Queries/ % Speed
Server           Second   of SXE
---------------- -------- -------
node.js+http     12,344    16%
node.js+net+crcr 23,224    30% <-- *1
node.js+net      28,867    37%
SXELua           66,731    85%
SXE              78,437   100%

In conclusion, calling Lua functions from C and vice-versa is very fast… close to the speed of C itself. I am very excited by how well Lua performed in the benchmark. The Lua “Hello World” program performed 2.3 times better than the fastest node.js equivalent. After a quick Google it looks like this isn’t the first time that JavaScript V8 has gone up against Lua; these results suggest that SXELua could get even faster after optimization. It looks like Lua will become part of SXE soon. Lua seems ideal for creating tests for SXE & SXELua programs alike, and for prototyping programs. Stay tuned…!

*1 Update: Somebody who knows JavaScript better than me offered faster code to detect the “\r\n\r\n”. I updated the script above and the resulting queries per second and % speed of SXE.

 

Nginx versus SXE “Hello World” October 2, 2010

After my last post, a colleague offered the criticism that comparing C to a scripting language is a bit like shooting fish in a barrel 🙂 I think the colleague missed the point, which is that often the main reason for choosing a scripting language in the first place is to achieve rapid application development at the expense of run-time performance and memory usage. The purpose of the post was to try to dispel this myth and show how few lines of C source code can be necessary to achieve ultimate performance. However, in order to keep the colleague happy, here is a similar head-to-head between nginx and SXE. What is nginx? Here’s what Wikipedia says about nginx: “Nginx quickly delivers static content with efficient use of system resources.” Now on with the “Hello World” comparison…

Here is the nginx.conf:

# cat /etc/nginx/nginx.conf
worker_processes  1;
events {
    worker_connections  10240;
}
http {
    server {
        listen 8000;
        access_log off;
        server_name  localhost;
        location / {
            root   html;
            index  index.html index.htm;
        }
    }
}

 And here is the index.html file:

# cat /usr/html/index.html
Hello World

I use the same http.c from the previous post in order to load test nginx. Here are the results:

# ./http -i 127.0.0.1 -p 8000 -n 50 -c 10000
20101002 181142.250 P00006a5f ------ 1 - connecting via ramp 10000 sockets to peer 127.0.0.1:8000
20101002 181142.290 P00006a5f    999 1 - connected: 1000
20101002 181142.328 P00006a5f   1999 1 - connected: 2000
20101002 181142.367 P00006a5f   2999 1 - connected: 3000
20101002 181142.406 P00006a5f   3999 1 - connected: 4000
20101002 181142.445 P00006a5f   4999 1 - connected: 5000
20101002 181142.484 P00006a5f   5999 1 - connected: 6000
20101002 181142.523 P00006a5f   6999 1 - connected: 7000
20101002 181142.562 P00006a5f   7999 1 - connected: 8000
20101002 181142.602 P00006a5f   8999 1 - connected: 9000
20101002 181142.641 P00006a5f   9999 1 - connected: 10000
20101002 181142.641 P00006a5f ------ 1 - starting writes: 500000 (= 10000 sockets * 50 queries/socket) queries
20101002 181142.641 P00006a5f ------ 1 - using query of 199 bytes:
20101002 181142.641 P00006a5f ------ 1 - 080552a0 47 45 54 20 2f 31 32 33 34 35 36 37 38 39 2f 31 GET /123456789/1
20101002 181142.641 P00006a5f ------ 1 - 080552b0 32 33 34 35 36 37 38 39 2f 31 32 33 34 35 36 37 23456789/1234567
20101002 181142.641 P00006a5f ------ 1 - 080552c0 38 39 2f 31 32 33 34 35 36 37 38 39 2f 31 32 33 89/123456789/123
20101002 181142.641 P00006a5f ------ 1 - 080552d0 34 35 36 37 38 39 2f 31 32 33 34 35 36 37 38 39 456789/123456789
20101002 181142.641 P00006a5f ------ 1 - 080552e0 2f 31 32 33 34 35 36 37 38 39 2f 31 32 33 34 35 /123456789/12345
20101002 181142.641 P00006a5f ------ 1 - 080552f0 36 37 2e 68 74 6d 20 48 54 54 50 2f 31 2e 31 0d 67.htm HTTP/1.1.
20101002 181142.641 P00006a5f ------ 1 - 08055300 0a 43 6f 6e 6e 65 63 74 69 6f 6e 3a 20 4b 65 65 .Connection: Kee
20101002 181142.641 P00006a5f ------ 1 - 08055310 70 2d 41 6c 69 76 65 0d 0a 48 6f 73 74 3a 20 31 p-Alive..Host: 1
20101002 181142.641 P00006a5f ------ 1 - 08055320 32 37 2e 30 2e 30 2e 31 3a 38 30 30 30 0d 0a 55 27.0.0.1:8000..U
20101002 181142.641 P00006a5f ------ 1 - 08055330 73 65 72 2d 41 67 65 6e 74 3a 20 53 58 45 2d 68 ser-Agent: SXE-h
20101002 181142.641 P00006a5f ------ 1 - 08055340 74 74 70 2d 6c 6f 61 64 2d 6b 65 65 70 61 6c 69 ttp-load-keepali
20101002 181142.641 P00006a5f ------ 1 - 08055350 76 65 2f 31 2e 30 0d 0a 41 63 63 65 70 74 3a 20 ve/1.0..Accept:
20101002 181142.641 P00006a5f ------ 1 - 08055360 2a 2f 2a 0d 0a 0d 0a                            */*....
20101002 181202.794 P00006a5f   9128 1 - read all expected http responses
20101002 181202.794 P00006a5f   9128 1 - time for all connections: 0.391057 seconds or 25571.718778 per second
20101002 181202.794 P00006a5f   9128 1 - time for all queries    : 20.152358 seconds or 24810.992567 per second
20101002 181202.794 P00006a5f   9128 1 - time for all            : 20.543415 seconds or 24338.699486 per second

Where nginx manages 25,571 connections per second, the SXE implementation manages 25,009 connections per second; a performance tie. Further, where nginx manages 24,810 queries per second, the SXE implementation manages 59,171 queries per second; a 2.4 fold increase. This is an especially great result for SXE because there is still scope for optimizing its code further.

During the test I also monitored memory usage of both the client and server processes:

# top -b -d1 | egrep "(nginx|http)"
27215 root      25   0 40788  836  344 S    0  0.0   0:00.00 nginx
27216 nobody    25   0 44712 5160  692 S    0  0.1   0:18.46 nginx
27231 root      15   0 18064  16m  516 R   79  0.4   0:00.79 http
27216 nobody    16   0 57468  17m  692 R   67  0.4   0:19.13 nginx
27215 root      25   0 40788  836  344 S    0  0.0   0:00.00 nginx
27216 nobody    17   0 60064  20m  692 R   98  0.5   0:20.12 nginx
27231 root      15   0 18064  16m  516 S   58  0.4   0:01.37 http
27215 root      25   0 40788  836  344 S    0  0.0   0:00.00 nginx
27216 nobody    19   0 60064  20m  692 R   96  0.5   0:21.09 nginx
27231 root      15   0 18064  16m  516 R   68  0.4   0:02.05 http
27215 root      25   0 40788  836  344 S    0  0.0   0:00.00 nginx
27216 nobody    20   0 60064  20m  692 R   97  0.5   0:22.07 nginx
27231 root      15   0 18064  16m  516 R   64  0.4   0:02.69 http
27215 root      25   0 40788  836  344 S    0  0.0   0:00.00 nginx
27216 nobody    24   0 60064  20m  692 R   97  0.5   0:23.05 nginx
27231 root      15   0 18064  16m  516 R   66  0.4   0:03.35 http
27215 root      25   0 40788  836  344 S    0  0.0   0:00.00 nginx
27216 nobody    25   0 60064  20m  692 R   86  0.5   0:23.91 nginx
27231 root      15   0 18064  16m  516 R   42  0.4   0:03.77 http
27215 root      25   0 40788  836  344 S    0  0.0   0:00.00 nginx
27216 nobody    25   0 60064  20m  692 R   99  0.5   0:24.90 nginx
27231 root      15   0 18064  16m  516 R   50  0.4   0:04.27 http
27215 root      25   0 40788  836  344 S    0  0.0   0:00.00 nginx
27216 nobody    25   0 60064  20m  692 R   99  0.5   0:25.90 nginx
27231 root      15   0 18064  16m  516 R   50  0.4   0:04.77 http
27215 root      25   0 40788  836  344 S    0  0.0   0:00.00 nginx
27216 nobody    25   0 60064  20m  692 R  100  0.5   0:26.91 nginx
27231 root      15   0 18064  16m  516 R   50  0.4   0:05.27 http
27215 root      25   0 40788  836  344 S    0  0.0   0:00.00 nginx
27216 nobody    25   0 60064  20m  692 R   99  0.5   0:27.91 nginx
27231 root      15   0 18064  16m  516 R   54  0.4   0:05.81 http
27215 root      25   0 40788  836  344 S    0  0.0   0:00.00 nginx
27216 nobody    25   0 60064  20m  692 R   99  0.5   0:28.91 nginx
27231 root      15   0 18064  16m  516 R   53  0.4   0:06.34 http
27215 root      25   0 40788  836  344 S    0  0.0   0:00.00 nginx
27216 nobody    25   0 60064  20m  692 R   99  0.5   0:29.91 nginx
27231 root      15   0 18064  16m  516 R   50  0.4   0:06.84 http
27215 root      25   0 40788  836  344 S    0  0.0   0:00.00 nginx
27216 nobody    25   0 60064  20m  692 R   99  0.5   0:30.90 nginx
27231 root      15   0 18064  16m  516 R   51  0.4   0:07.35 http
27215 root      25   0 40788  836  344 S    0  0.0   0:00.00 nginx
27216 nobody    25   0 60064  20m  692 R  100  0.5   0:31.91 nginx
27231 root      15   0 18064  16m  516 R   50  0.4   0:07.85 http
27215 root      25   0 40788  836  344 S    0  0.0   0:00.00 nginx
27216 nobody    25   0 60064  20m  692 D   54  0.5   0:32.45 nginx
27231 root      15   0 18064  16m  516 S   28  0.4   0:08.13 http
27215 root      25   0 40788  836  344 S    0  0.0   0:00.00 nginx
27216 nobody    19   0 60064  20m  692 R   89  0.5   0:33.35 nginx
27231 root      15   0 18064  16m  516 S   61  0.4   0:08.74 http
27215 root      25   0 40788  836  344 S    0  0.0   0:00.00 nginx
27216 nobody    21   0 60064  20m  692 R   97  0.5   0:34.33 nginx
27231 root      15   0 18064  16m  516 R   66  0.4   0:09.40 http
27215 root      25   0 40788  836  344 S    0  0.0   0:00.00 nginx
27216 nobody    22   0 60064  20m  692 R   68  0.5   0:35.01 nginx
27231 root      15   0 18064  16m  516 R   34  0.4   0:09.74 http
27215 root      25   0 40788  836  344 S    0  0.0   0:00.00 nginx
27216 nobody    25   0 60064  20m  692 R   98  0.5   0:36.00 nginx
27231 root      15   0 18064  16m  516 S   66  0.4   0:10.40 http
27215 root      25   0 40788  836  344 S    0  0.0   0:00.00 nginx
27216 nobody    25   0 60064  20m  692 R   97  0.5   0:36.98 nginx
27231 root      15   0 18064  16m  516 R   52  0.4   0:10.92 http
27215 root      25   0 40788  836  344 S    0  0.0   0:00.00 nginx
27216 nobody    25   0 44712 5160  692 S   63  0.1   0:37.61 nginx
27215 root      25   0 40788  836  344 S    0  0.0   0:00.00 nginx
27215 root      25   0 40788  836  344 S    0  0.0   0:00.00 nginx

Unlike SXE, nginx uses a dynamic memory model, and top shows that its peak memory usage is only 20MB, which is very similar to SXE's peak memory usage of 16MB; another tie.

In conclusion, if you’re planning to serve static content and CPU is your bottleneck, then using nginx could cause you to employ up to 2.4 times as many servers as an SXE implementation would. It would be interesting to create a real static content delivery system using SXE and post a more realistic head-to-head comparison. If anybody has ideas on what that comparison might look like then please comment below.


node.js versus SXE “Hello World”; complexity, speed, and memory usage October 1, 2010

A new technology that has been given a lot of press lately is node.js, which describes itself as “an easy way to build scalable network programs”. Since I designed, and have for some time been working with some talented colleagues on, similar technology (read: SXE) using plain old C instead of JavaScript, I thought it might be good to do a head-to-head comparison in terms of quantity & complexity of source code, run-time performance, and memory usage. What is SXE? SXE is “an easy way to build scalable network programs” but without compromising run-time performance or memory usage. One of the goals of SXE is to make developing in C almost as easy as developing in a high-level scripting language, but how this is achieved is a subject for another time. Now on with the “Hello World” comparison…

Here is the source code for the node.js example “Hello World” HTTP server:


var http = require('http');
http.createServer(function (req, res) {
  res.writeHead(200, {'Content-Type': 'text/plain'});
  res.end('Hello World\n');
}).listen(8000, "127.0.0.1");

And here is the equivalent source code implemented using C and SXE:

#include <errno.h>
#include <string.h>
#include "ev.h"
#include "sxe.h"
#include "sxe-log.h"
#include "sxe-util.h"
SXE * listener;
static char canned_reply_keep_alive[] = "HTTP/1.0 200 OK\r\n" "Connection: Keep-Alive\r\n" "Content-Type: text/html\r\nContent-Length: 14\r\n\r\nHello World\n\r\n";
static void event_read(SXE * this, int length) {
    SXE_UNUSED_ARGUMENT(length);
    if (! SXE_BUF_STRNSTR(this,"\r\n\r\n")) { goto SXE_EARLY_OUT; }
    sxe_write(this, (void *)&canned_reply_keep_alive[0], sizeof(canned_reply_keep_alive) - 1);
    SXE_BUF_CLEAR(this);
    SXE_EARLY_OR_ERROR_OUT:
}
int main(int argc, char *argv[]) {
    sxe_register(10100, 0);
    sxe_init();
    listener = sxe_new_tcp(NULL, "127.0.0.1", 8000, NULL, event_read, NULL);
    sxe_listen(listener);
    ev_loop(ev_default_loop(EVFLAG_AUTO), 0);
    return 0;
}

And here is the instrumented version of the code used in the test below. (Unlike JavaScript, an advantage of C is that it can offer a release build and a more heavily instrumented debug build of the same code, without any run-time performance penalty for the release build.)

# cat httpd.c
/* Copyright (c) 2010 Simon Hardy-Francis.
 *
 * Permission is hereby granted, free of charge, to any person obtaining a copy
 * of this software and associated documentation files (the "Software"), to deal
 * in the Software without restriction, including without limitation the rights
 * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 * copies of the Software, and to permit persons to whom the Software is
 * furnished to do so, subject to the following conditions:
 *
 * The above copyright notice and this permission notice shall be included in
 * all copies or substantial portions of the Software.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
 * THE SOFTWARE.
 */

#include <errno.h>
#include <string.h>

#include "ev.h"
#include "sxe.h"
#include "sxe-log.h"
#include "sxe-util.h"

/**
 * - Example http session (using keep-alive):
 *   - http -i 127.0.0.1 -p 8000 -n 50 -c 10000
 * - Example ab sessions:
 *   - Without keep-alive: ab -n 50000 -c 500    http://localhost:9090/
 *   - With    keep-alive: ab -n 50000 -c 500 -k http://localhost:9090/
 */

SXE * listener;

static char canned_reply_no_keep_alive[] = "HTTP/1.0 200 OK\r\n" "Connection: Close\r\n"      "Content-Type: text/html\r\nContent-Length: 14\r\n\r\nHello World\n\r\n";
static char canned_reply____keep_alive[] = "HTTP/1.0 200 OK\r\n" "Connection: Keep-Alive\r\n" "Content-Type: text/html\r\nContent-Length: 14\r\n\r\nHello World\n\r\n";

static void
event_close(SXE * this)
{
    SXEE60I("httpd::event_close()");
    SXE_UNUSED_ARGUMENT(this);
    SXEL60I("Peer disconnected; do nothing");
    SXER60I("return");
} /* event_close() */

static void
event_read(SXE * this, int length)
{
    SXEE61I("httpd::event_read(length=%d)", length);
    SXE_UNUSED_ARGUMENT(length);

    if (! SXE_BUF_STRNSTR(this,"\r\n\r\n")) {
        SXEL10I("Read partial header; waiting for remainder to be appended");
        goto SXE_EARLY_OUT;
    }

    if (SXE_BUF_STRNCASESTR(this,"Connection: Keep-Alive")) {
        (void)sxe_write(this, (void *)&canned_reply____keep_alive[0], sizeof(canned_reply____keep_alive) - 1);
        SXEL60I("Connection: Keep-Alive: found");
    }
    else {
        (void)sxe_write(this, (void *)&canned_reply_no_keep_alive[0], sizeof(canned_reply_no_keep_alive) - 1);
        SXEL60I("Connection: Keep-Alive: not found; closing");
        sxe_close(this);
    }

    SXE_BUF_CLEAR(this);

    SXE_EARLY_OR_ERROR_OUT:

    SXER60I("return");
} /* event_read() */

int
main(int argc, char *argv[]) {
    SXE_RETURN result;

    SXE_UNUSED_ARGUMENT(argc);
    SXE_UNUSED_ARGUMENT(argv);
    SXEL60("httpd starting");

    sxe_register(10100, 0);
    SXEA10((result = sxe_init()) == SXE_RETURN_OK, "sxe_init failed");
    SXEA10((listener  = sxe_new_tcp(NULL, "127.0.0.1", 8000, NULL, event_read, event_close)) != NULL, "sxe_new_tcp failed");
    SXEA10((result = sxe_listen(listener)) == SXE_RETURN_OK, "sxe_listen failed");

    SXEL60("httpd calling ev_loop()");
    ev_loop(ev_default_loop(EVFLAG_AUTO), 0);

    SXEL60("httpd exiting");
    return 0;
} /* main() */

When comparing quantity of source code, node.js wins. However, many readers may be surprised at how little C source code is necessary. And if I were to create an SXE deployment service similar to, e.g., Joyent or Heroku, then the main() function and #include statements would disappear from the C source code, leaving an event handler which is barely more lines of code than its node.js counterpart.

Now let’s find out about performance and memory usage. I decided to create a simple HTTP load generator using C and SXE. On the command line I can specify which IP and port to connect to, how many simultaneous TCP sessions to open, and how many queries to send over each connection. The HTTP load generator first creates all its connections, and then starts sending the queries. Here is the source code for http.c:

/* Copyright (c) 2010 Simon Hardy-Francis.
 *
 * Permission is hereby granted, free of charge, to any person obtaining a copy
 * of this software and associated documentation files (the "Software"), to deal
 * in the Software without restriction, including without limitation the rights
 * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 * copies of the Software, and to permit persons to whom the Software is
 * furnished to do so, subject to the following conditions:
 *
 * The above copyright notice and this permission notice shall be included in
 * all copies or substantial portions of the Software.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
 * THE SOFTWARE.
 */

#include <errno.h>
#include <string.h>
#include <getopt.h>
#include <stdlib.h>

#include "ev.h"
#include "sxe.h"
#include "sxe-log.h"
#include "sxe-util.h"

#define SXE_CONCURRENCY_MAX 10000
#define SXE_WRITE_RAMP      10
#define SXE_CONNECTION_RAMP 16

char     source_ip_address[] = "127.0.0.1";
char     peer_ip_default[] = "127.0.0.1";
char   * peer_ip = peer_ip_default;
int      peer_port = 9090;
SXE    * sender[SXE_CONCURRENCY_MAX];
int      sender_index = 0;
int      connection_index = 0;
int      per_connection_sender_writes[SXE_CONCURRENCY_MAX];
double   per_connection_time_at_connect[SXE_CONCURRENCY_MAX];
double   per_connection_time_at_connected[SXE_CONCURRENCY_MAX];
int      response_count = 0;
int      connect_count = 0;
int      connect_batch_count = 0;
int      sxe_concurrency = 1000;
int      sxe_writes_per_connection = 1;
double   time_at_start;
double   time_at_all_connected;
double   time_at_all_responses;

/*                                              0         10        20        30        40        50        60        70        80 */
static char canned_query____keep_alive[] = "GET /123456789/123456789/123456789/123456789/123456789/123456789/123456789/1234567.htm HTTP/1.1\r\nConnection: Keep-Alive\r\nHost: 127.0.0.1:8000\r\nUser-Agent: SXE-http-load-keepalive/1.0\r\nAccept: */*\r\n\r\n";

static void
event_close(SXE * this)
{
    SXEE60I("http::event_close()");
    SXE_UNUSED_ARGUMENT(this);
    SXEL60I("peer disconnected; do nothing");
    SXER60I("return");
} /* event_close() */

static void event_connect(SXE * this);
static void event_read(SXE * this, int length);

static void
connect_ramp(SXE * this)
{
    int i;

    SXE_UNUSED_ARGUMENT(this);

    SXEE60I("http::connect_ramp()");

    SXEL63I("connecting max %d http connects starting with connection %d of %d", SXE_CONNECTION_RAMP, 1 + connection_index, sxe_concurrency);
    for (i = 0; i < SXE_CONNECTION_RAMP; i++)
    {
        if (connection_index == sxe_concurrency) {
            goto SXE_EARLY_OUT;
        }
        per_connection_sender_writes[connection_index] = 0;
        sender[connection_index] = sxe_new_tcp(NULL, &source_ip_address[0], 0, event_connect, event_read, event_close);
        per_connection_time_at_connect[connection_index] = sxe_get_time_in_seconds();
        sxe_connect(sender[connection_index], peer_ip, peer_port);
        connection_index ++;
    }

    SXE_EARLY_OR_ERROR_OUT:

    SXER60I("return");
} /* connect_ramp() */

static void
write_ramp(SXE * this)
{
    int i;

    SXEE60I("http::write_ramp()");
    SXE_UNUSED_ARGUMENT(this);

    SXEL63I("writing max %d http queries starting with query %d of %d", SXE_WRITE_RAMP, 1 + sender_index, sxe_concurrency);
    for (i = 0; i < SXE_WRITE_RAMP; i++)
    {
        if (sender_index == sxe_concurrency) {
            goto SXE_EARLY_OUT;
        }
        per_connection_sender_writes[sender_index] ++;
        sxe_write(sender[sender_index], canned_query____keep_alive, sizeof(canned_query____keep_alive) - 1);
        sender_index ++;
    }

    SXE_EARLY_OR_ERROR_OUT:

    SXER60I("return");
} /* write_ramp() */

static void
event_connect(SXE * this)
{
    SXEE60I("http::event_connect()");

    per_connection_time_at_connected[SXE_ID(this)] = sxe_get_time_in_seconds();
    double per_connection_seconds_to_connect = per_connection_time_at_connected[SXE_ID(this)] - per_connection_time_at_connect[SXE_ID(this)];
    if (per_connection_seconds_to_connect > 1.0) {
        SXEL11I("finished connection to peer in %f seconds (suspiciously long time)", per_connection_seconds_to_connect);
    }
    else {
        SXEL61I("finished connection to peer in %f seconds", per_connection_seconds_to_connect);
    }
    connect_count ++;
    if ((connect_count % 1000) == 0) {
        SXEL11I("connected: %d", connect_count);
    }
    if (connect_count == sxe_concurrency) {
        time_at_all_connected = sxe_get_time_in_seconds();
        SXEL13("starting writes: %d (= %d sockets * %d queries/socket) queries", sxe_concurrency * sxe_writes_per_connection, sxe_concurrency, sxe_writes_per_connection);
        SXEL11("using query of %d bytes:", strlen(canned_query____keep_alive));
        SXED10(canned_query____keep_alive, strlen(canned_query____keep_alive));
        write_ramp (NULL);
    }

    connect_batch_count ++;
    if (connect_batch_count == SXE_CONNECTION_RAMP) {
        connect_batch_count = 0;
        connect_ramp (NULL);
    }

    SXE_EARLY_OR_ERROR_OUT:

    SXER60I("return");
} /* event_connect() */

static void
event_read(SXE * this, int length)
{
    SXEE61I("http::event_read(length=%d)", length);
    SXE_UNUSED_ARGUMENT(length);

    if (sender_index < sxe_concurrency) {
        write_ramp(this);
    }

    if (! SXE_BUF_STRNSTR(this,"\r\n\r\n")) {
        SXEL10I("read partial header; waiting for remainder to be appended");
        goto SXE_EARLY_OUT;
    }

    if (per_connection_sender_writes[SXE_ID(this)] < sxe_writes_per_connection) {
        per_connection_sender_writes[SXE_ID(this)] ++;
        sxe_write(this, canned_query____keep_alive, sizeof(canned_query____keep_alive) - 1);
    }

    response_count ++;

    if (response_count == (sxe_concurrency * sxe_writes_per_connection)) {
        SXEL10I("read all expected http responses");
        time_at_all_responses = sxe_get_time_in_seconds();
        double seconds_for_connections = (time_at_all_connected - time_at_start        );
        double seconds_for_responses   = (time_at_all_responses - time_at_all_connected);
        double seconds_for_all         = seconds_for_connections + seconds_for_responses;
        SXEL12I("time for all connections: %f seconds or %f per second", seconds_for_connections, (sxe_concurrency                            ) / seconds_for_connections);
        SXEL12I("time for all queries    : %f seconds or %f per second", seconds_for_responses  , (sxe_concurrency * sxe_writes_per_connection) / seconds_for_responses  );
        SXEL12I("time for all            : %f seconds or %f per second", seconds_for_all        , (sxe_concurrency * sxe_writes_per_connection) / seconds_for_all        );
        exit(0);
    }

    SXE_BUF_CLEAR(this);

    SXE_EARLY_OR_ERROR_OUT:

    SXER60I("return");
} /* event_read() */

static void
usage(void)
{
    fprintf(stderr, "Usage   : http [-i ip] [-p port] [-n queries per socket] [-c sockets]\n");
    fprintf(stderr, "Defaults: http -i %s -p %d -n %d -c %d\n", peer_ip, peer_port, sxe_writes_per_connection, sxe_concurrency);
    exit(2);
} /* usage() */

int
main(int argc, char *argv[])
{
    int c;
    (void) argc;
    (void) argv;

    SXEL60("http starting");

    if (argc == 1) {
        usage();
    }

    while ((c = getopt(argc, argv, "i:p:n:c:")) != -1) {
        switch (c) {
        case 'i': peer_ip = optarg;
                  break;
        case 'p': peer_port = atoi(optarg);
                  if (peer_port < 1    ) { SXEL10("ERROR: -p must be >= 1"   ); }
                  if (peer_port > 65535) { SXEL10("ERROR: -p must be < 65536"); }
                  break;
        case 'n': sxe_writes_per_connection = atoi(optarg);
                  if (sxe_writes_per_connection < 1) { SXEL10("ERROR: -n must be >= 1"); }
                  break;
        case 'c': sxe_concurrency = atoi(optarg);
                  if (sxe_concurrency < 1                  ) { SXEL10("ERROR: -c must be >= 1"                     ); }
                  if (sxe_concurrency > SXE_CONCURRENCY_MAX) { SXEL11("ERROR: -c must be < %d", SXE_CONCURRENCY_MAX); }
                  break;
        default:  usage();
        }
    }

    sxe_register(1 + SXE_CONCURRENCY_MAX, 0);

    sxe_init();

    SXEL13("connecting via ramp %d sockets to peer %s:%d", sxe_concurrency, peer_ip, peer_port);
    time_at_start = sxe_get_time_in_seconds();
    connect_ramp(NULL);

    SXEL60("http calling ev_loop()");
    ev_loop(ev_default_loop(EVFLAG_AUTO), 0);

    SXEL60("http exiting");
    return 0;
} /* main() */

So I compiled http.c and used it to test the node.js example “Hello World” server. Here are the results:

# ./http -i 127.0.0.1 -p 8000 -n 50 -c 10000
20100929 215440.709 P00002d94 ------ 1 - connecting via ramp 10000 sockets to peer 127.0.0.1:8000
20100929 215443.718 P00002d94    184 1 - finished connection to peer in 3.000193 seconds (suspiciously long time)
20100929 215443.718 P00002d94    185 1 - finished connection to peer in 3.000282 seconds (suspiciously long time)
20100929 215443.718 P00002d94    187 1 - finished connection to peer in 3.000287 seconds (suspiciously long time)
20100929 215443.718 P00002d94    188 1 - finished connection to peer in 3.000316 seconds (suspiciously long time)
20100929 215443.718 P00002d94    190 1 - finished connection to peer in 3.000316 seconds (suspiciously long time)
20100929 215446.725 P00002d94    446 1 - finished connection to peer in 2.999944 seconds (suspiciously long time)
20100929 215449.741 P00002d94    756 1 - finished connection to peer in 3.000473 seconds (suspiciously long time)
20100929 215449.741 P00002d94    758 1 - finished connection to peer in 3.000475 seconds (suspiciously long time)
20100929 215449.741 P00002d94    760 1 - finished connection to peer in 3.000446 seconds (suspiciously long time)
20100929 215449.741 P00002d94    761 1 - finished connection to peer in 3.000442 seconds (suspiciously long time)
20100929 215449.741 P00002d94    763 1 - finished connection to peer in 3.000420 seconds (suspiciously long time)
20100929 215449.741 P00002d94    765 1 - finished connection to peer in 3.000389 seconds (suspiciously long time)
20100929 215449.741 P00002d94    767 1 - finished connection to peer in 3.000359 seconds (suspiciously long time)
20100929 215449.748 P00002d94    999 1 - connected: 1000
20100929 215452.751 P00002d94   1132 1 - finished connection to peer in 2.999929 seconds (suspiciously long time)
20100929 215452.751 P00002d94   1134 1 - finished connection to peer in 2.999924 seconds (suspiciously long time)
20100929 215452.751 P00002d94   1135 1 - finished connection to peer in 2.999931 seconds (suspiciously long time)
20100929 215455.763 P00002d94   1396 1 - finished connection to peer in 2.999883 seconds (suspiciously long time)
20100929 215455.763 P00002d94   1397 1 - finished connection to peer in 2.999903 seconds (suspiciously long time)
20100929 215455.764 P00002d94   1398 1 - finished connection to peer in 2.999897 seconds (suspiciously long time)
20100929 215455.764 P00002d94   1399 1 - finished connection to peer in 2.999888 seconds (suspiciously long time)
20100929 215455.764 P00002d94   1400 1 - finished connection to peer in 2.999875 seconds (suspiciously long time)
20100929 215455.764 P00002d94   1401 1 - finished connection to peer in 2.999867 seconds (suspiciously long time)
20100929 215455.764 P00002d94   1402 1 - finished connection to peer in 2.999858 seconds (suspiciously long time)
20100929 215455.764 P00002d94   1403 1 - finished connection to peer in 2.999850 seconds (suspiciously long time)
20100929 215455.764 P00002d94   1404 1 - finished connection to peer in 2.999842 seconds (suspiciously long time)
20100929 215455.764 P00002d94   1405 1 - finished connection to peer in 2.999832 seconds (suspiciously long time)
20100929 215455.764 P00002d94   1406 1 - finished connection to peer in 2.999825 seconds (suspiciously long time)
20100929 215455.764 P00002d94   1407 1 - finished connection to peer in 2.999818 seconds (suspiciously long time)
20100929 215458.772 P00002d94   1686 1 - finished connection to peer in 3.000733 seconds (suspiciously long time)
20100929 215458.772 P00002d94   1688 1 - finished connection to peer in 3.000710 seconds (suspiciously long time)
20100929 215458.772 P00002d94   1689 1 - finished connection to peer in 3.000703 seconds (suspiciously long time)
20100929 215458.772 P00002d94   1691 1 - finished connection to peer in 3.000672 seconds (suspiciously long time)
20100929 215458.772 P00002d94   1693 1 - finished connection to peer in 3.000650 seconds (suspiciously long time)
20100929 215458.772 P00002d94   1694 1 - finished connection to peer in 3.000642 seconds (suspiciously long time)
20100929 215458.781 P00002d94   1999 1 - connected: 2000
20100929 215501.784 P00002d94   2091 1 - finished connection to peer in 3.000436 seconds (suspiciously long time)
20100929 215501.784 P00002d94   2093 1 - finished connection to peer in 3.000445 seconds (suspiciously long time)
20100929 215501.784 P00002d94   2094 1 - finished connection to peer in 3.000451 seconds (suspiciously long time)
20100929 215504.793 P00002d94   2397 1 - finished connection to peer in 3.000772 seconds (suspiciously long time)
20100929 215504.793 P00002d94   2399 1 - finished connection to peer in 3.000767 seconds (suspiciously long time)
20100929 215507.803 P00002d94   2598 1 - finished connection to peer in 2.999878 seconds (suspiciously long time)
20100929 215507.803 P00002d94   2599 1 - finished connection to peer in 2.999888 seconds (suspiciously long time)
20100929 215507.803 P00002d94   2600 1 - finished connection to peer in 2.999882 seconds (suspiciously long time)
20100929 215507.803 P00002d94   2601 1 - finished connection to peer in 2.999873 seconds (suspiciously long time)
20100929 215507.803 P00002d94   2602 1 - finished connection to peer in 2.999866 seconds (suspiciously long time)
20100929 215507.803 P00002d94   2603 1 - finished connection to peer in 2.999860 seconds (suspiciously long time)
20100929 215507.803 P00002d94   2604 1 - finished connection to peer in 2.999851 seconds (suspiciously long time)
20100929 215507.803 P00002d94   2605 1 - finished connection to peer in 2.999848 seconds (suspiciously long time)
20100929 215507.803 P00002d94   2606 1 - finished connection to peer in 2.999843 seconds (suspiciously long time)
20100929 215507.803 P00002d94   2607 1 - finished connection to peer in 2.999832 seconds (suspiciously long time)
20100929 215510.812 P00002d94   2952 1 - finished connection to peer in 2.999587 seconds (suspiciously long time)
20100929 215510.812 P00002d94   2954 1 - finished connection to peer in 2.999570 seconds (suspiciously long time)
20100929 215510.812 P00002d94   2956 1 - finished connection to peer in 2.999538 seconds (suspiciously long time)
20100929 215510.812 P00002d94   2958 1 - finished connection to peer in 2.999508 seconds (suspiciously long time)
20100929 215510.816 P00002d94   2999 1 - connected: 3000
20100929 215513.823 P00002d94   3304 1 - finished connection to peer in 2.999448 seconds (suspiciously long time)
20100929 215513.824 P00002d94   3306 1 - finished connection to peer in 3.000435 seconds (suspiciously long time)
20100929 215513.824 P00002d94   3308 1 - finished connection to peer in 3.000412 seconds (suspiciously long time)
20100929 215513.824 P00002d94   3309 1 - finished connection to peer in 3.000419 seconds (suspiciously long time)
20100929 215513.824 P00002d94   3311 1 - finished connection to peer in 3.000398 seconds (suspiciously long time)
20100929 215516.834 P00002d94   3606 1 - finished connection to peer in 2.999794 seconds (suspiciously long time)
20100929 215516.834 P00002d94   3607 1 - finished connection to peer in 2.999819 seconds (suspiciously long time)
20100929 215516.834 P00002d94   3608 1 - finished connection to peer in 2.999822 seconds (suspiciously long time)
20100929 215516.834 P00002d94   3609 1 - finished connection to peer in 2.999820 seconds (suspiciously long time)
20100929 215516.834 P00002d94   3610 1 - finished connection to peer in 2.999812 seconds (suspiciously long time)
20100929 215516.834 P00002d94   3611 1 - finished connection to peer in 2.999810 seconds (suspiciously long time)
20100929 215516.834 P00002d94   3612 1 - finished connection to peer in 2.999814 seconds (suspiciously long time)
20100929 215516.834 P00002d94   3613 1 - finished connection to peer in 2.999805 seconds (suspiciously long time)
20100929 215516.834 P00002d94   3614 1 - finished connection to peer in 2.999794 seconds (suspiciously long time)
20100929 215516.834 P00002d94   3615 1 - finished connection to peer in 2.999796 seconds (suspiciously long time)
20100929 215519.844 P00002d94   3978 1 - finished connection to peer in 2.999932 seconds (suspiciously long time)
20100929 215519.844 P00002d94   3979 1 - finished connection to peer in 2.999958 seconds (suspiciously long time)
20100929 215519.844 P00002d94   3982 1 - finished connection to peer in 2.999906 seconds (suspiciously long time)
20100929 215519.844 P00002d94   3983 1 - finished connection to peer in 2.999900 seconds (suspiciously long time)
20100929 215519.845 P00002d94   3999 1 - connected: 4000
20100929 215522.853 P00002d94   4287 1 - finished connection to peer in 2.999896 seconds (suspiciously long time)
20100929 215525.861 P00002d94   4590 1 - finished connection to peer in 2.999847 seconds (suspiciously long time)
20100929 215528.870 P00002d94   4901 1 - finished connection to peer in 3.000493 seconds (suspiciously long time)
20100929 215528.870 P00002d94   4902 1 - finished connection to peer in 3.000520 seconds (suspiciously long time)
20100929 215528.870 P00002d94   4904 1 - finished connection to peer in 3.000509 seconds (suspiciously long time)
20100929 215528.870 P00002d94   4906 1 - finished connection to peer in 3.000480 seconds (suspiciously long time)
20100929 215528.870 P00002d94   4907 1 - finished connection to peer in 3.000472 seconds (suspiciously long time)
20100929 215528.870 P00002d94   4909 1 - finished connection to peer in 3.000441 seconds (suspiciously long time)
20100929 215528.870 P00002d94   4911 1 - finished connection to peer in 3.000409 seconds (suspiciously long time)
20100929 215528.872 P00002d94   4999 1 - connected: 5000
20100929 215531.878 P00002d94   5211 1 - finished connection to peer in 3.000253 seconds (suspiciously long time)
20100929 215531.878 P00002d94   5212 1 - finished connection to peer in 3.000276 seconds (suspiciously long time)
20100929 215531.878 P00002d94   5213 1 - finished connection to peer in 3.000270 seconds (suspiciously long time)
20100929 215531.878 P00002d94   5214 1 - finished connection to peer in 3.000265 seconds (suspiciously long time)
20100929 215531.878 P00002d94   5215 1 - finished connection to peer in 3.000269 seconds (suspiciously long time)
20100929 215534.889 P00002d94   5582 1 - finished connection to peer in 3.000186 seconds (suspiciously long time)
20100929 215537.898 P00002d94   5887 1 - finished connection to peer in 3.000733 seconds (suspiciously long time)
20100929 215537.901 P00002d94   5999 1 - connected: 6000
20100929 215540.907 P00002d94   6200 1 - finished connection to peer in 3.000404 seconds (suspiciously long time)
20100929 215540.907 P00002d94   6202 1 - finished connection to peer in 3.000402 seconds (suspiciously long time)
20100929 215540.907 P00002d94   6204 1 - finished connection to peer in 3.000378 seconds (suspiciously long time)
20100929 215540.907 P00002d94   6206 1 - finished connection to peer in 3.000350 seconds (suspiciously long time)
20100929 215540.907 P00002d94   6207 1 - finished connection to peer in 3.000347 seconds (suspiciously long time)
20100929 215543.916 P00002d94   6505 1 - finished connection to peer in 3.000724 seconds (suspiciously long time)
20100929 215543.916 P00002d94   6507 1 - finished connection to peer in 3.000727 seconds (suspiciously long time)
20100929 215543.916 P00002d94   6509 1 - finished connection to peer in 3.000703 seconds (suspiciously long time)
20100929 215543.916 P00002d94   6511 1 - finished connection to peer in 3.000674 seconds (suspiciously long time)
20100929 215546.922 P00002d94   6762 1 - finished connection to peer in 2.999642 seconds (suspiciously long time)
20100929 215546.922 P00002d94   6763 1 - finished connection to peer in 2.999665 seconds (suspiciously long time)
20100929 215546.922 P00002d94   6764 1 - finished connection to peer in 2.999662 seconds (suspiciously long time)
20100929 215546.922 P00002d94   6765 1 - finished connection to peer in 2.999657 seconds (suspiciously long time)
20100929 215546.922 P00002d94   6766 1 - finished connection to peer in 2.999656 seconds (suspiciously long time)
20100929 215546.922 P00002d94   6767 1 - finished connection to peer in 2.999651 seconds (suspiciously long time)
20100929 215546.929 P00002d94   6999 1 - connected: 7000
20100929 215549.933 P00002d94   7135 1 - finished connection to peer in 3.000380 seconds (suspiciously long time)
20100929 215552.944 P00002d94   7524 1 - finished connection to peer in 3.000323 seconds (suspiciously long time)
20100929 215552.944 P00002d94   7526 1 - finished connection to peer in 3.000320 seconds (suspiciously long time)
20100929 215552.944 P00002d94   7528 1 - finished connection to peer in 3.000304 seconds (suspiciously long time)
20100929 215552.944 P00002d94   7530 1 - finished connection to peer in 3.000274 seconds (suspiciously long time)
20100929 215552.944 P00002d94   7532 1 - finished connection to peer in 3.000242 seconds (suspiciously long time)
20100929 215552.944 P00002d94   7533 1 - finished connection to peer in 3.000236 seconds (suspiciously long time)
20100929 215552.944 P00002d94   7535 1 - finished connection to peer in 3.000204 seconds (suspiciously long time)
20100929 215555.955 P00002d94   7931 1 - finished connection to peer in 2.999945 seconds (suspiciously long time)
20100929 215555.955 P00002d94   7933 1 - finished connection to peer in 2.999942 seconds (suspiciously long time)
20100929 215555.955 P00002d94   7935 1 - finished connection to peer in 2.999919 seconds (suspiciously long time)
20100929 215555.957 P00002d94   7999 1 - connected: 8000
20100929 215558.966 P00002d94   8328 1 - finished connection to peer in 3.000006 seconds (suspiciously long time)
20100929 215558.966 P00002d94   8330 1 - finished connection to peer in 3.000006 seconds (suspiciously long time)
20100929 215558.966 P00002d94   8331 1 - finished connection to peer in 3.000007 seconds (suspiciously long time)
20100929 215558.966 P00002d94   8333 1 - finished connection to peer in 2.999987 seconds (suspiciously long time)
20100929 215558.966 P00002d94   8335 1 - finished connection to peer in 2.999958 seconds (suspiciously long time)
20100929 215601.978 P00002d94   8724 1 - finished connection to peer in 3.000369 seconds (suspiciously long time)
20100929 215601.978 P00002d94   8726 1 - finished connection to peer in 3.000387 seconds (suspiciously long time)
20100929 215601.978 P00002d94   8728 1 - finished connection to peer in 3.000369 seconds (suspiciously long time)
20100929 215601.978 P00002d94   8730 1 - finished connection to peer in 3.000336 seconds (suspiciously long time)
20100929 215601.978 P00002d94   8732 1 - finished connection to peer in 3.000306 seconds (suspiciously long time)
20100929 215601.978 P00002d94   8734 1 - finished connection to peer in 3.000278 seconds (suspiciously long time)
20100929 215601.985 P00002d94   8999 1 - connected: 9000
20100929 215604.986 P00002d94   9006 1 - finished connection to peer in 3.000625 seconds (suspiciously long time)
20100929 215604.986 P00002d94   9007 1 - finished connection to peer in 3.000642 seconds (suspiciously long time)
20100929 215607.997 P00002d94   9390 1 - finished connection to peer in 3.000640 seconds (suspiciously long time)
20100929 215611.003 P00002d94   9579 1 - finished connection to peer in 3.000221 seconds (suspiciously long time)
20100929 215611.003 P00002d94   9580 1 - finished connection to peer in 3.000249 seconds (suspiciously long time)
20100929 215611.003 P00002d94   9581 1 - finished connection to peer in 3.000256 seconds (suspiciously long time)
20100929 215611.003 P00002d94   9582 1 - finished connection to peer in 3.000249 seconds (suspiciously long time)
20100929 215611.003 P00002d94   9583 1 - finished connection to peer in 3.000238 seconds (suspiciously long time)
20100929 215614.014 P00002d94   9966 1 - finished connection to peer in 3.000609 seconds (suspiciously long time)
20100929 215614.015 P00002d94   9999 1 - connected: 10000
20100929 215614.015 P00002d94 ------ 1 - starting writes: 500000 (= 10000 sockets * 50 queries/socket) queries
20100929 215614.015 P00002d94 ------ 1 - using query of 198 bytes:
20100929 215614.015 P00002d94 ------ 1 - 080562c0 47 45 54 20 2f 31 32 33 34 35 36 37 38 39 2f 31 GET /123456789/1
20100929 215614.015 P00002d94 ------ 1 - 080562d0 32 33 34 35 36 37 38 39 2f 31 32 33 34 35 36 37 23456789/1234567
20100929 215614.015 P00002d94 ------ 1 - 080562e0 38 39 2f 31 32 33 34 35 36 37 38 39 2f 31 32 33 89/123456789/123
20100929 215614.015 P00002d94 ------ 1 - 080562f0 34 35 36 37 38 39 2f 31 32 33 34 35 36 37 38 39 456789/123456789
20100929 215614.015 P00002d94 ------ 1 - 08056300 2f 31 32 33 34 35 36 37 38 39 2f 31 32 33 34 35 /123456789/12345
20100929 215614.015 P00002d94 ------ 1 - 08056310 36 37 2e 68 74 6d 20 48 54 54 50 2f 31 2e 31 0d 67.htm HTTP/1.1.
20100929 215614.015 P00002d94 ------ 1 - 08056320 0a 43 6f 6e 6e 65 63 74 69 6f 6e 3a 20 4b 65 65 .Connection: Kee
20100929 215614.015 P00002d94 ------ 1 - 08056330 70 2d 41 6c 69 76 65 0d 0a 48 6f 73 74 3a 20 31 p-Alive..Host: 1
20100929 215614.015 P00002d94 ------ 1 - 08056340 32 37 2e 30 2e 30 2e 31 3a 38 30 30 30 0d 0a 55 27.0.0.1:8000..U
20100929 215614.015 P00002d94 ------ 1 - 08056350 73 65 72 2d 41 67 65 6e 74 3a 20 53 58 45 2d 68 ser-Agent: SXE-h
20100929 215614.015 P00002d94 ------ 1 - 08056360 74 74 70 2d 6c 6f 61 64 2d 6b 65 65 70 61 6c 69 ttp-load-keepali
20100929 215614.015 P00002d94 ------ 1 - 08056370 76 65 2f 31 2e 30 0d 0a 41 63 63 65 70 74 3a 20 ve/1.0..Accept:
20100929 215614.015 P00002d94 ------ 1 - 08056380 2a 2f 2a 0d 0a 0d                               */*...
20100929 215654.519 P00002d94   4010 1 - read all expected http responses
20100929 215654.519 P00002d94   4010 1 - time for all connections: 93.305586 seconds or 107.174719 per second
20100929 215654.519 P00002d94   4010 1 - time for all queries    : 40.504647 seconds or 12344.262617 per second
20100929 215654.519 P00002d94   4010 1 - time for all            : 133.810233 seconds or 3736.634997 per second

On the positive side, the node.js example “Hello World” server managed a respectable 12,344 queries per second at a concurrency of 10,000 connections. On the negative side, there seems to be some kind of bug in node.js's connection handling, because node only managed 107 connections per second. Also, 134 of the 10,000 connections took about 3 seconds to connect.

During the test I also monitored memory usage of both the client and server processes:

# top -b -d1 | egrep "(node|http)"
11665 root      18   0  628m 9020 5100 S    0  0.2   0:00.05 node
11665 root      18   0  628m 9020 5100 S    0  0.2   0:00.05 node
11665 root      18   0  628m 9020 5100 S    0  0.2   0:00.05 node
11665 root      18   0  629m  10m 5108 S    1  0.3   0:00.06 node
11668 root      17   0 18032  16m  524 S    1  0.4   0:00.01 http
11665 root      18   0  629m  10m 5108 S    0  0.3   0:00.06 node
11668 root      17   0 18032  16m  524 S    0  0.4   0:00.01 http
11665 root      18   0  629m  10m 5108 S    0  0.3   0:00.06 node
11668 root      17   0 18032  16m  524 S    0  0.4   0:00.01 http
11665 root      15   0  629m  12m 5108 S    2  0.3   0:00.08 node
11668 root      18   0 18032  16m  552 S    1  0.4   0:00.02 http
11665 root      15   0  629m  12m 5108 S    0  0.3   0:00.08 node
11668 root      18   0 18032  16m  552 S    0  0.4   0:00.02 http
11665 root      15   0  629m  12m 5108 S    0  0.3   0:00.08 node
11668 root      18   0 18032  16m  552 S    0  0.4   0:00.02 http
11665 root      15   0  633m  17m 5108 S    2  0.4   0:00.10 node
11668 root      18   0 18032  16m  552 S    1  0.4   0:00.03 http
11665 root      15   0  633m  17m 5108 S    0  0.4   0:00.10 node
11668 root      18   0 18032  16m  552 S    0  0.4   0:00.03 http
11665 root      15   0  633m  17m 5108 S    0  0.4   0:00.10 node
11668 root      18   0 18032  16m  552 S    0  0.4   0:00.03 http
11665 root      15   0  633m  18m 5108 S    1  0.5   0:00.11 node
11668 root      18   0 18032  16m  552 S    1  0.4   0:00.04 http
11665 root      15   0  633m  18m 5108 S    0  0.5   0:00.11 node
11668 root      18   0 18032  16m  552 S    0  0.4   0:00.04 http
11665 root      15   0  633m  18m 5108 S    0  0.5   0:00.11 node
11668 root      18   0 18032  16m  552 S    0  0.4   0:00.04 http
11665 root      16   0  635m  21m 5108 S    3  0.5   0:00.14 node
11668 root      18   0 18032  16m  552 S    0  0.4   0:00.04 http
11665 root      16   0  635m  21m 5108 S    0  0.5   0:00.14 node
11668 root      18   0 18032  16m  552 S    0  0.4   0:00.04 http
11665 root      16   0  635m  21m 5108 S    0  0.5   0:00.14 node
11668 root      18   0 18032  16m  552 S    0  0.4   0:00.04 http
11665 root      15   0  636m  24m 5108 S    2  0.6   0:00.16 node
11668 root      18   0 18032  16m  552 S    1  0.4   0:00.05 http
11665 root      15   0  636m  24m 5108 S    0  0.6   0:00.16 node
11668 root      18   0 18032  16m  552 S    0  0.4   0:00.05 http
11665 root      15   0  636m  24m 5108 S    0  0.6   0:00.16 node
11668 root      18   0 18032  16m  552 S    0  0.4   0:00.05 http
11665 root      15   0  636m  25m 5108 S    2  0.6   0:00.18 node
11668 root      18   0 18032  16m  552 S    1  0.4   0:00.06 http
11665 root      15   0  636m  25m 5108 S    0  0.6   0:00.18 node
11668 root      18   0 18032  16m  552 S    0  0.4   0:00.06 http
11665 root      15   0  636m  25m 5108 S    0  0.6   0:00.18 node
11668 root      18   0 18032  16m  552 S    0  0.4   0:00.06 http
11665 root      15   0  636m  27m 5108 S    1  0.7   0:00.19 node
11668 root      18   0 18032  16m  552 S    1  0.4   0:00.07 http
11665 root      15   0  636m  27m 5108 S    0  0.7   0:00.19 node
11668 root      18   0 18032  16m  552 S    0  0.4   0:00.07 http
11665 root      15   0  636m  27m 5108 S    0  0.7   0:00.19 node
11668 root      18   0 18032  16m  552 S    0  0.4   0:00.07 http
11665 root      15   0  643m  35m 5108 S    3  0.9   0:00.22 node
11668 root      18   0 18032  16m  552 S    1  0.4   0:00.08 http
11665 root      15   0  643m  35m 5108 S    0  0.9   0:00.22 node
11668 root      18   0 18032  16m  552 S    0  0.4   0:00.08 http
11665 root      15   0  643m  35m 5108 S    0  0.9   0:00.22 node
11668 root      18   0 18032  16m  552 S    0  0.4   0:00.08 http
11665 root      15   0  643m  35m 5108 S    3  0.9   0:00.25 node
11668 root      18   0 18032  16m  552 S    1  0.4   0:00.09 http
11665 root      15   0  643m  35m 5108 S    0  0.9   0:00.25 node
11668 root      18   0 18032  16m  552 S    0  0.4   0:00.09 http
11665 root      15   0  643m  35m 5108 S    0  0.9   0:00.25 node
11668 root      18   0 18032  16m  552 S    0  0.4   0:00.09 http
11665 root      16   0  643m  38m 5108 S    1  1.0   0:00.26 node
11668 root      18   0 18032  16m  552 S    1  0.4   0:00.10 http
11665 root      16   0  643m  38m 5108 S    0  1.0   0:00.26 node
11668 root      18   0 18032  16m  552 S    0  0.4   0:00.10 http
11665 root      16   0  643m  38m 5108 S    0  1.0   0:00.26 node
11668 root      18   0 18032  16m  552 S    0  0.4   0:00.10 http
11665 root      15   0  644m  40m 5108 S    2  1.0   0:00.28 node
11668 root      18   0 18032  16m  552 S    0  0.4   0:00.10 http
11665 root      15   0  644m  40m 5108 S    0  1.0   0:00.28 node
11668 root      18   0 18032  16m  552 S    0  0.4   0:00.10 http
11665 root      15   0  644m  40m 5108 S    0  1.0   0:00.28 node
11668 root      18   0 18032  16m  552 S    0  0.4   0:00.10 http
11665 root      16   0  644m  40m 5108 S    2  1.0   0:00.30 node
11668 root      18   0 18032  16m  552 S    1  0.4   0:00.11 http
11665 root      15   0  644m  40m 5108 S    0  1.0   0:00.30 node
11668 root      18   0 18032  16m  552 S    0  0.4   0:00.11 http
11665 root      15   0  644m  40m 5108 S    0  1.0   0:00.30 node
11668 root      18   0 18032  16m  552 S    0  0.4   0:00.11 http
11665 root      15   0  644m  43m 5108 S    2  1.1   0:00.32 node
11668 root      18   0 18032  16m  552 S    1  0.4   0:00.12 http
11665 root      15   0  644m  43m 5108 S    0  1.1   0:00.32 node
11668 root      18   0 18032  16m  552 S    0  0.4   0:00.12 http
11665 root      15   0  644m  43m 5108 S    0  1.1   0:00.32 node
11668 root      18   0 18032  16m  552 S    0  0.4   0:00.12 http
11665 root      15   0  644m  45m 5108 S    1  1.1   0:00.33 node
11668 root      18   0 18032  16m  552 S    1  0.4   0:00.13 http
11665 root      15   0  644m  45m 5108 S    0  1.1   0:00.33 node
11668 root      18   0 18032  16m  552 S    0  0.4   0:00.13 http
11665 root      15   0  644m  45m 5108 S    0  1.1   0:00.33 node
11668 root      18   0 18032  16m  552 S    0  0.4   0:00.13 http
11665 root      15   0  644m  47m 5108 S    2  1.2   0:00.35 node
11668 root      18   0 18032  16m  552 S    1  0.4   0:00.14 http
11665 root      15   0  644m  47m 5108 S    0  1.2   0:00.35 node
11668 root      18   0 18032  16m  552 S    0  0.4   0:00.14 http
11665 root      15   0  644m  47m 5108 S    0  1.2   0:00.35 node
11668 root      18   0 18032  16m  552 S    0  0.4   0:00.14 http
11665 root      15   0  652m  56m 5108 S    4  1.4   0:00.39 node
11668 root      18   0 18032  16m  552 S    1  0.4   0:00.15 http
11665 root      15   0  652m  56m 5108 S    0  1.4   0:00.39 node
11668 root      18   0 18032  16m  552 S    0  0.4   0:00.15 http
11665 root      15   0  652m  56m 5108 S    0  1.4   0:00.39 node
11668 root      18   0 18032  16m  552 S    0  0.4   0:00.15 http
11665 root      15   0  652m  57m 5108 S    1  1.4   0:00.40 node
11668 root      18   0 18032  16m  552 S    0  0.4   0:00.15 http
11665 root      15   0  652m  57m 5108 S    0  1.4   0:00.40 node
11668 root      18   0 18032  16m  552 S    0  0.4   0:00.15 http
11665 root      15   0  652m  57m 5108 S    0  1.4   0:00.40 node
11668 root      18   0 18032  16m  552 S    0  0.4   0:00.15 http
11665 root      15   0  652m  59m 5108 S    2  1.5   0:00.42 node
11668 root      18   0 18032  16m  552 S    1  0.4   0:00.16 http
11665 root      15   0  652m  59m 5108 S    0  1.5   0:00.42 node
11668 root      18   0 18032  16m  552 S    0  0.4   0:00.16 http
11665 root      15   0  652m  59m 5108 S    0  1.5   0:00.42 node
11668 root      18   0 18032  16m  552 S    0  0.4   0:00.16 http
11665 root      15   0  652m  61m 5108 S    1  1.6   0:00.43 node
11668 root      18   0 18032  16m  552 S    1  0.4   0:00.17 http
11665 root      15   0  652m  61m 5108 S    0  1.6   0:00.43 node
11668 root      18   0 18032  16m  552 S    0  0.4   0:00.17 http
11665 root      15   0  652m  61m 5108 S    0  1.6   0:00.43 node
11668 root      18   0 18032  16m  552 S    0  0.4   0:00.17 http
11665 root      15   0  652m  64m 5108 S    2  1.6   0:00.45 node
11668 root      18   0 18032  16m  552 S    1  0.4   0:00.18 http
11665 root      15   0  652m  64m 5108 S    0  1.6   0:00.45 node
11668 root      18   0 18032  16m  552 S    0  0.4   0:00.18 http
11665 root      15   0  652m  64m 5108 S    0  1.6   0:00.45 node
11668 root      18   0 18032  16m  552 S    0  0.4   0:00.18 http
11665 root      16   0  653m  64m 5108 S    7  1.6   0:00.52 node
11668 root      18   0 18032  16m  552 S    0  0.4   0:00.18 http
11665 root      15   0  653m  64m 5108 S    0  1.6   0:00.52 node
11668 root      18   0 18032  16m  552 S    0  0.4   0:00.18 http
11665 root      15   0  653m  64m 5108 S    0  1.6   0:00.52 node
11668 root      18   0 18032  16m  552 S    0  0.4   0:00.18 http
11665 root      16   0  665m  76m 5108 S    5  1.9   0:00.57 node
11668 root      18   0 18032  16m  552 S    1  0.4   0:00.19 http
11665 root      15   0  665m  76m 5108 S    0  1.9   0:00.57 node
11668 root      18   0 18032  16m  552 S    0  0.4   0:00.19 http
11665 root      15   0  665m  76m 5108 S    0  1.9   0:00.57 node
11668 root      18   0 18032  16m  552 S    0  0.4   0:00.19 http
11665 root      15   0  665m  76m 5108 S    2  1.9   0:00.59 node
11668 root      18   0 18032  16m  552 S    2  0.4   0:00.21 http
11665 root      15   0  665m  76m 5108 S    0  1.9   0:00.59 node
11668 root      18   0 18032  16m  552 S    0  0.4   0:00.21 http
11665 root      15   0  665m  76m 5108 S    0  1.9   0:00.59 node
11668 root      18   0 18032  16m  552 S    0  0.4   0:00.21 http
11665 root      16   0  665m  77m 5108 S    1  1.9   0:00.60 node
11668 root      18   0 18220  16m  552 S    1  0.4   0:00.22 http
11665 root      15   0  665m  77m 5108 S    0  1.9   0:00.60 node
11668 root      18   0 18220  16m  552 S    0  0.4   0:00.22 http
11665 root      15   0  665m  77m 5108 S    0  1.9   0:00.60 node
11668 root      18   0 18220  16m  552 S    0  0.4   0:00.22 http
11665 root      15   0  666m  77m 5108 S    3  2.0   0:00.63 node
11668 root      18   0 18220  16m  552 S    1  0.4   0:00.23 http
11665 root      15   0  666m  77m 5108 S    0  2.0   0:00.63 node
11668 root      18   0 18220  16m  552 S    0  0.4   0:00.23 http
11665 root      15   0  666m  77m 5108 S    0  2.0   0:00.63 node
11668 root      18   0 18220  16m  552 S    0  0.4   0:00.23 http
11665 root      15   0  666m  77m 5108 S    1  2.0   0:00.64 node
11668 root      18   0 18220  16m  552 S    1  0.4   0:00.24 http
11665 root      15   0  666m  77m 5108 S    0  2.0   0:00.64 node
11668 root      18   0 18220  16m  552 S    0  0.4   0:00.24 http
11665 root      15   0  666m  77m 5108 S    0  2.0   0:00.64 node
11668 root      18   0 18220  16m  552 S    0  0.4   0:00.24 http
11665 root      16   0  666m  77m 5108 S    8  2.0   0:00.72 node
11668 root      18   0 18220  16m  552 S    0  0.4   0:00.24 http
11665 root      15   0  666m  77m 5108 S    0  2.0   0:00.72 node
11668 root      18   0 18220  16m  552 S    0  0.4   0:00.24 http
11665 root      15   0  666m  77m 5108 S    0  2.0   0:00.72 node
11668 root      18   0 18220  16m  552 S    0  0.4   0:00.24 http
11665 root      15   0  666m  77m 5108 S    2  2.0   0:00.74 node
11668 root      18   0 18220  16m  552 S    1  0.4   0:00.25 http
11665 root      15   0  666m  77m 5108 S    0  2.0   0:00.74 node
11668 root      18   0 18220  16m  552 S    0  0.4   0:00.25 http
11665 root      15   0  666m  77m 5108 S    0  2.0   0:00.74 node
11668 root      18   0 18220  16m  552 S    0  0.4   0:00.25 http
11665 root      15   0  678m  89m 5108 S    5  2.3   0:00.79 node
11668 root      18   0 18220  16m  552 S    2  0.4   0:00.27 http
11665 root      15   0  678m  89m 5108 S    0  2.3   0:00.79 node
11668 root      18   0 18220  16m  552 S    0  0.4   0:00.27 http
11665 root      15   0  678m  89m 5108 S    0  2.3   0:00.79 node
11668 root      18   0 18220  16m  552 S    0  0.4   0:00.27 http
11665 root      15   0  678m  89m 5108 S    1  2.3   0:00.80 node
11668 root      18   0 18220  16m  552 S    0  0.4   0:00.27 http
11665 root      15   0  678m  89m 5108 S    0  2.3   0:00.80 node
11668 root      18   0 18220  16m  552 S    0  0.4   0:00.27 http
11665 root      15   0  678m  89m 5108 S    0  2.3   0:00.80 node
11668 root      18   0 18220  16m  552 S    0  0.4   0:00.27 http
11665 root      16   0  695m 105m 5120 R   66  2.7   0:01.46 node
11668 root      15   0 18220  16m  552 S   21  0.4   0:00.48 http
11665 root      17   0  711m 121m 5120 R  100  3.1   0:02.47 node
11668 root      15   0 18220  16m  552 S   24  0.4   0:00.72 http
11665 root      19   0  728m 138m 5120 R  100  3.5   0:03.48 node
11668 root      15   0 18220  16m  552 S   26  0.4   0:00.98 http
11665 root      21   0  736m 146m 5120 R   99  3.7   0:04.48 node
11668 root      15   0 18220  16m  552 S   13  0.4   0:01.11 http
11665 root      25   0  736m 146m 5120 R  100  3.7   0:05.49 node
11668 root      15   0 18220  16m  552 S   13  0.4   0:01.24 http
11665 root      25   0  737m 147m 5120 R   99  3.7   0:06.49 node
11668 root      15   0 18220  16m  552 S   11  0.4   0:01.35 http
11665 root      25   0  737m 147m 5120 R   99  3.7   0:07.49 node
11668 root      15   0 18220  16m  552 R   12  0.4   0:01.47 http
11665 root      25   0  717m 127m 5124 R   99  3.2   0:08.48 node
11668 root      15   0 18220  16m  552 S   13  0.4   0:01.60 http
11665 root      25   0  734m 143m 5124 R   99  3.6   0:09.48 node
11668 root      15   0 18220  16m  552 S   12  0.4   0:01.72 http
11665 root      25   0  737m 146m 5124 R  100  3.7   0:10.49 node
11668 root      15   0 18220  16m  552 S    8  0.4   0:01.80 http
11665 root      25   0  737m 146m 5124 R   99  3.7   0:11.49 node
11668 root      15   0 18220  16m  552 S   13  0.4   0:01.93 http
11665 root      25   0  737m 146m 5124 R   99  3.7   0:12.49 node
11668 root      15   0 18220  16m  552 R   11  0.4   0:02.04 http
11665 root      25   0  737m 146m 5124 R  101  3.7   0:13.50 node
11668 root      15   0 18220  16m  552 R   10  0.4   0:02.14 http
11665 root      25   0  737m 146m 5124 R   96  3.7   0:14.46 node
11668 root      15   0 18220  16m  552 R   16  0.4   0:02.30 http
11665 root      25   0  737m 146m 5124 R   84  3.7   0:15.30 node
11668 root      15   0 18220  16m  552 S   17  0.4   0:02.47 http
11665 root      25   0  737m 146m 5124 R   82  3.7   0:16.12 node
11668 root      15   0 18220  16m  552 S   18  0.4   0:02.65 http
11665 root      25   0  738m 147m 5124 R   94  3.7   0:17.06 node
11668 root      15   0 18220  16m  552 R   18  0.4   0:02.83 http
11665 root      25   0  738m 147m 5124 R   98  3.7   0:18.05 node
11668 root      15   0 18220  16m  552 S   21  0.4   0:03.04 http
11665 root      25   0  738m 148m 5124 R   97  3.7   0:19.03 node
11668 root      15   0 18220  16m  552 S   17  0.4   0:03.21 http
11665 root      25   0  738m 148m 5124 R   97  3.7   0:20.00 node
11668 root      15   0 18220  16m  552 S   20  0.4   0:03.41 http
11665 root      25   0  738m 147m 5124 R  100  3.7   0:21.01 node
11668 root      15   0 18220  16m  552 R   18  0.4   0:03.59 http
11665 root      25   0  738m 148m 5124 R  100  3.7   0:22.02 node
11668 root      15   0 18220  16m  552 S   29  0.4   0:03.88 http
11665 root      25   0  738m 148m 5124 R   99  3.7   0:23.02 node
11668 root      15   0 18220  16m  552 S   26  0.4   0:04.14 http
11665 root      25   0  738m 148m 5124 R  100  3.7   0:24.03 node
11668 root      15   0 18220  16m  552 R   32  0.4   0:04.46 http
11665 root      25   0  738m 148m 5124 R  100  3.7   0:25.04 node
11668 root      15   0 18220  16m  552 R   27  0.4   0:04.73 http
11665 root      25   0  738m 148m 5124 R   99  3.7   0:26.04 node
11668 root      15   0 18220  16m  552 S   28  0.4   0:05.01 http
11665 root      25   0  737m 147m 5124 R  100  3.7   0:27.05 node
11668 root      15   0 18220  16m  552 S   28  0.4   0:05.29 http
11665 root      25   0  737m 147m 5124 R  100  3.7   0:28.06 node
11668 root      15   0 18220  16m  552 S   30  0.4   0:05.59 http
11665 root      25   0  738m 148m 5124 R   99  3.7   0:29.06 node
11668 root      15   0 18220  16m  552 S   28  0.4   0:05.87 http
11665 root      25   0  719m 129m 5124 R  100  3.3   0:30.07 node
11668 root      15   0 18220  16m  552 R   21  0.4   0:06.08 http
11665 root      25   0  735m 145m 5124 R   99  3.7   0:31.07 node
11668 root      15   0 18220  16m  552 S   30  0.4   0:06.38 http
11665 root      25   0  735m 145m 5124 R   99  3.7   0:32.07 node
11668 root      15   0 18220  16m  552 S   26  0.4   0:06.64 http
11665 root      25   0  735m 145m 5124 R  100  3.7   0:33.08 node
11668 root      15   0 18220  16m  552 S   32  0.4   0:06.96 http
11665 root      25   0  737m 147m 5124 R  100  3.7   0:34.09 node
11668 root      15   0 18220  16m  552 R   27  0.4   0:07.23 http
11665 root      25   0  737m 147m 5124 R   99  3.7   0:35.09 node
11668 root      15   0 18220  16m  552 R   31  0.4   0:07.54 http
11665 root      25   0  737m 147m 5124 R  100  3.7   0:36.10 node
11668 root      15   0 18220  16m  552 S   28  0.4   0:07.82 http
11665 root      25   0  737m 147m 5124 R   96  3.7   0:37.07 node
11668 root      15   0 18220  16m  552 R   21  0.4   0:08.03 http
11665 root      25   0  737m 147m 5124 R   99  3.7   0:38.07 node
11668 root      15   0 18220  16m  552 S   25  0.4   0:08.28 http
11665 root      25   0  737m 147m 5124 R  100  3.7   0:39.08 node
11668 root      15   0 18220  16m  552 S   27  0.4   0:08.55 http
11665 root      25   0  738m 147m 5124 R   99  3.7   0:40.08 node
11668 root      15   0 18220  16m  552 R   24  0.4   0:08.79 http
11665 root      25   0  735m 117m 5124 R   99  3.0   0:41.08 node
11665 root      25   0  678m  57m 5124 S   29  1.5   0:41.37 node
11665 root      25   0  678m  57m 5124 S    0  1.5   0:41.37 node
11665 root      25   0  678m  57m 5124 S    0  1.5   0:41.37 node

During the connection part of the load test there was little CPU used (the possible node.js bug?), but the memory of the node process steadily increased from about 9MB to about 89MB. This means that node is allocating about 8KB of dynamic memory per connection. During the query part of the load test the node process gets close to 100% CPU, which is good. However, the memory of the node process peaks at about 148MB. This means that, in addition to the 8KB of dynamic memory already allocated per connection, node allocates about a further 6KB per connection upon receiving the query; so about 14KB per connection in total.

Now I test the speed of the SXE example “Hello World” server:

# ./http -i 127.0.0.1 -p 8000 -n 50 -c 10000
20100929 220002.920 P00002dce ------ 1 - connecting via ramp 10000 sockets to peer 127.0.0.1:8000
20100929 220002.964 P00002dce    999 1 - connected: 1000
20100929 220003.003 P00002dce   1999 1 - connected: 2000
20100929 220003.043 P00002dce   2999 1 - connected: 3000
20100929 220003.082 P00002dce   3999 1 - connected: 4000
20100929 220003.122 P00002dce   4999 1 - connected: 5000
20100929 220003.161 P00002dce   5999 1 - connected: 6000
20100929 220003.201 P00002dce   6999 1 - connected: 7000
20100929 220003.240 P00002dce   7999 1 - connected: 8000
20100929 220003.281 P00002dce   8999 1 - connected: 9000
20100929 220003.320 P00002dce   9999 1 - connected: 10000
20100929 220003.320 P00002dce ------ 1 - starting writes: 500000 (= 10000 sockets * 50 queries/socket) queries
20100929 220003.320 P00002dce ------ 1 - using query of 198 bytes:
20100929 220003.320 P00002dce ------ 1 - 080562c0 47 45 54 20 2f 31 32 33 34 35 36 37 38 39 2f 31 GET /123456789/1
20100929 220003.320 P00002dce ------ 1 - 080562d0 32 33 34 35 36 37 38 39 2f 31 32 33 34 35 36 37 23456789/1234567
20100929 220003.320 P00002dce ------ 1 - 080562e0 38 39 2f 31 32 33 34 35 36 37 38 39 2f 31 32 33 89/123456789/123
20100929 220003.320 P00002dce ------ 1 - 080562f0 34 35 36 37 38 39 2f 31 32 33 34 35 36 37 38 39 456789/123456789
20100929 220003.320 P00002dce ------ 1 - 08056300 2f 31 32 33 34 35 36 37 38 39 2f 31 32 33 34 35 /123456789/12345
20100929 220003.320 P00002dce ------ 1 - 08056310 36 37 2e 68 74 6d 20 48 54 54 50 2f 31 2e 31 0d 67.htm HTTP/1.1.
20100929 220003.320 P00002dce ------ 1 - 08056320 0a 43 6f 6e 6e 65 63 74 69 6f 6e 3a 20 4b 65 65 .Connection: Kee
20100929 220003.320 P00002dce ------ 1 - 08056330 70 2d 41 6c 69 76 65 0d 0a 48 6f 73 74 3a 20 31 p-Alive..Host: 1
20100929 220003.320 P00002dce ------ 1 - 08056340 32 37 2e 30 2e 30 2e 31 3a 38 30 30 30 0d 0a 55 27.0.0.1:8000..U
20100929 220003.320 P00002dce ------ 1 - 08056350 73 65 72 2d 41 67 65 6e 74 3a 20 53 58 45 2d 68 ser-Agent: SXE-h
20100929 220003.320 P00002dce ------ 1 - 08056360 74 74 70 2d 6c 6f 61 64 2d 6b 65 65 70 61 6c 69 ttp-load-keepali
20100929 220003.320 P00002dce ------ 1 - 08056370 76 65 2f 31 2e 30 0d 0a 41 63 63 65 70 74 3a 20 ve/1.0..Accept:
20100929 220003.320 P00002dce ------ 1 - 08056380 2a 2f 2a 0d 0a 0d                               */*...
20100929 220011.770 P00002dce   3165 1 - read all expected http responses
20100929 220011.770 P00002dce   3165 1 - time for all connections: 0.399857 seconds or 25008.937931 per second
20100929 220011.770 P00002dce   3165 1 - time for all queries    : 8.450056 seconds or 59171.206630 per second
20100929 220011.770 P00002dce   3165 1 - time for all            : 8.849913 seconds or 56497.731297 per second

Where the node.js implementation manages 107 connections per second, the SXE implementation manages 25,009 connections per second; a 233.7 fold increase. I ignore this result in the conclusion below because it is so lopsided that it must be down to a bug in node.js. Further, where the node.js implementation manages 12,344 queries per second, the SXE implementation manages 59,171 queries per second; a 4.8 fold increase.

Likewise, I also monitored memory usage:

# top -b -d1 | egrep "(node|http)"
11715 root      17   0 17788  16m  380 S    0  0.4   0:00.01 httpd
11715 root      17   0 17788  16m  380 S    0  0.4   0:00.01 httpd
11726 root      18   0 18224  16m  524 R   84  0.4   0:00.84 http
11715 root      16   0 18180  16m  392 R   61  0.4   0:00.62 httpd
11715 root      17   0 18180  16m  392 R  100  0.4   0:01.63 httpd
11726 root      17   0 18224  16m  524 R   95  0.4   0:01.79 http
11715 root      18   0 18180  16m  392 R   99  0.4   0:02.63 httpd
11726 root      17   0 18224  16m  524 R   94  0.4   0:02.74 http
11715 root      20   0 18180  16m  392 R   99  0.4   0:03.63 httpd
11726 root      17   0 18224  16m  524 R   94  0.4   0:03.69 http
11715 root      23   0 18180  16m  392 R  100  0.4   0:04.64 httpd
11726 root      17   0 18224  16m  524 R   95  0.4   0:04.65 http
11715 root      25   0 18180  16m  392 R   99  0.4   0:05.64 httpd
11726 root      17   0 18224  16m  524 R   94  0.4   0:05.60 http
11715 root      25   0 18180  16m  392 R  101  0.4   0:06.65 httpd
11726 root      17   0 18224  16m  524 R   95  0.4   0:06.55 http
11715 root      25   0 18180  16m  392 R  100  0.4   0:07.65 httpd
11726 root      17   0 18224  16m  524 R   95  0.4   0:07.50 http
11715 root      25   0 18180  16m  396 R   99  0.4   0:08.65 httpd
11726 root      15   0     0    0    0 R   87  0.0   0:08.37 http
11715 root      25   0 18320  16m  396 S   15  0.4   0:08.80 httpd
11715 root      25   0 18320  16m  396 S    0  0.4   0:08.80 httpd
11715 root      25   0 18320  16m  396 S    0  0.4   0:08.80 httpd

While the node.js implementation peaks at 148MB of memory, the SXE implementation stays at a constant 16MB, which is 9.25 times smaller.

In conclusion, if you’re planning to build scalable network programs and memory is your bottleneck then implementing with node.js will cause you to employ 9.25 times as many servers as if you had implemented with SXE. Similarly, if CPU is your bottleneck then implementing with node.js will cause you to employ 4.8 times as many servers as if you had implemented with SXE.

Update: I minimized the amount of work the SXE “Hello World” server does when examining the HTTP header of each query. Previously it inefficiently searched through the header for “Connection: Keep-Alive” and obeyed it; I removed this code so that the server always assumes the connection is keep-alive. It also inefficiently searched for the end of the HTTP headers; it now just examines the last four bytes of the accumulated data read on the socket, so no searching. The updated source code looks like this:

    //ignore headers if (! SXE_BUF_STRNSTR(this,"\r\n\r\n")) {
    //ignore headers     SXEL10I("Read partial header; waiting for remainder to be appended");
    //ignore headers     goto SXE_EARLY_OUT;
    //ignore headers }
    //detect end of HTTP headers without searching
    if ((SXE_BUF(this)[SXE_BUF_USED(this)-4] != 0xd)
    ||  (SXE_BUF(this)[SXE_BUF_USED(this)-3] != 0xa)
    ||  (SXE_BUF(this)[SXE_BUF_USED(this)-2] != 0xd)
    ||  (SXE_BUF(this)[SXE_BUF_USED(this)-1] != 0xa)) {
        SXEL60I("Read partial header; waiting for remainder to be appended");
        goto SXE_EARLY_OUT;
    }
    //ignore headers if (SXE_BUF_STRNCASESTR(this,"Connection: Keep-Alive")) {
    //ignore headers     (void)sxe_write(this, (void *)&canned_reply____keep_alive[0], sizeof(canned_reply____keep_alive) - 1);
    //ignore headers     SXEL60I("Connection: Keep-Alive: found");
    //ignore headers }
    //ignore headers else {
    //ignore headers     (void)sxe_write(this, (void *)&canned_reply_no_keep_alive[0], sizeof(canned_reply_no_keep_alive) - 1);
    //ignore headers     SXEL60I("Connection: Keep-Alive: not found; closing");
    //ignore headers     sxe_close(this);
    //ignore headers }
    //assume HTTP request is always keep-alive
    (void)sxe_write(this, (void *)&canned_reply____keep_alive[0], sizeof(canned_reply____keep_alive) - 1);

After these optimizations the SXE “Hello World” server performs 78,437 queries per second. So that’s now 6.4 times faster than node.js (instead of only 4.8 times faster without the optimizations) 🙂
