SimonHF's Blog


libsxe code metrics February 6, 2011

Filed under: Uncategorized — simonhf @ 7:08 pm

There aren’t too many quality assurance statistics available online about things like ratio of test lines of code to production lines of code, or ratio of tests to production lines of code. So I thought I’d publish a quick snapshot of such information for libsxe:

[libsxe]# find | grep -v /build | grep /lib-sxe | egrep "\.c" | xargs cat | wc -l
17712 <-- Number of lines in .c files; production source code and test source code

[libsxe]# find | grep -v /build | grep /lib-sxe | egrep "\.c" | egrep "test-" | xargs cat | wc -l
7839 <-- Number of lines in .c files; test source code only

[libsxe]# find | grep -v /build | grep /lib-sxe | egrep "\.c" | egrep -v "test-" | xargs egrep "SXE[ERLD]" | wc -l
680 <-- Number of lines of permanent instrumentation; production source code only

[libsxe]# find | grep -v /build | grep /lib-sxe | egrep "\.c" | egrep "test-" | xargs grep "plan_tests" | perl -lane '($c) = $_ =~ m~plan_tests\s*\(\s*(\d+)~; $total+=$c; sub END { printf qq[%d <-- Minimum tests run per platform before P4 submit\n], $total; }'
1291 <-- Minimum automated tests run per platform to achieve 100% code coverage before source code check-in

That leaves 17712 - 7839 = 9873 lines of production source code, giving:

Ratio of production lines to test lines: 9873 : 7839 or 1.3 : 1
Ratio of production lines to tests run: 9873 : 1291 or 7.6 : 1
Ratio of production lines to permanent instrumentation: 9873 : 680 or 14.5 : 1
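
For context, each test counted above comes from a plan_tests() declaration in a test program. Here is a minimal example of what such a test looks like, assuming the classic libtap interface (this sketch is mine, not taken from libsxe's test suite):

#include <tap.h>

int main(void)
{
    plan_tests(2);                        /* declare how many tests will run */
    ok(1 + 1 == 2, "arithmetic works");   /* each ok() counts as one test    */
    ok(sizeof(int) >= 4, "int is at least 32 bits");
    return exit_status();                 /* non-zero if any test failed     */
}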


Thought Experiment: Scaling Facebook January 30, 2011

I was recently asked the question:

“Let’s imagine it’s circa 2004 and we’ve just launched a website that is about to evolve into Facebook. We’re “pre-architecture” – we have a single instance of MySQL running on a linux machine out of a dorm room at Harvard. The site starts to take off. Status updates are hitting the database at ferocious pace. Now what???!!! If we don’t do something soon, the system will crash. What kinds of caching layers do we build, and how will they interact with the database backend? Do we transition to a colo, or keep everything in house? Let’s see if we can put our heads together and think through Facebook’s many scaling challenges, all the way from the ground up …”

Here’s my reply:

I love this thought experiment 🙂 It’s one of the things which I dream about when I go to bed at night and which my mind first thinks about when I wake up in the morning. No kidding. I also had to architect and lead the development of a smaller cloud system for my employer which scales up to the tens of millions of Sophos users… it would scale further but we don’t have e.g. one billion users! But I dream about much more complex clouds which scale to a billion or more users… even though such systems don’t exist yet. I believe that to create such systems we have to go back to computer science fundamentals and be prepared to think outside the box if necessary. For example, I love the emerging NoSQL technology… but all of the systems are flawed in some way, and performance varies enormously, by orders of magnitude. Few off-the-shelf technologies appear to come close to BerkeleyDB performance, which is a disappointment.

Back to scaling Facebook: one of the fundamental problems is striving to use the smallest number of servers while serving the largest population of users. The cost of the servers and the server traffic is going to play an enormous role in the business plan. Therefore I need to think in terms of creating servers, or groups of servers, which act as building blocks that can be duplicated in order to scale the cloud as large as required. Because we’ll be scaling up to hundreds or thousands of servers, it becomes cost-effective to develop efficient technology which doesn’t exist yet. For example, the sexy new node.js technology is wonderful for prototyping cloud services or running clouds which don’t have to be scaled too big. However, node.js uses nearly ten times more memory than, say, C does (https://simonhf.wordpress.com/2010/10/01/node-js-versus-sxe-hello-world-complexity-speed-and-memory-usage/) for the same task… and this may mean having to use ten times more servers. So for a monster cloud service like Facebook I’d forget about all the scripting language solutions like node.js and Ruby on Rails, etc. Instead I’d go for the most efficient solution, which also happens to be the same challenge that my employer gave me: to achieve the most efficient run-time performance per server while allowing as many servers as necessary to work in parallel. This can be done efficiently by using the C language mixed with kernel asynchronous events.

However, in order to make working in C fast and productive, some changes need to be made. The C code needs to work with a fixed memory architecture — almost unheard of in the computing world. This is really thinking out of the box. Without the overhead of constantly allocating memory and/or garbage collecting, the C code becomes faster than normally imaginable. The next thing is to make the development of the C code much faster… nearing the speed of development of scripting languages. Some of the things which make C development slow are the constant editing of header files and makefiles. So I designed a system where this is largely done automatically. Next, C pointers cause confusion for programmers young and old, so I removed much of the need to use pointers in regular code. Another problem is how to develop protocols, keep state, and debug massively parallel and asynchronous code while keeping the code as concise and readable as possible. These problems have also been solved.
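
To make the fixed-memory idea concrete, here is a minimal sketch in C (my illustration, not SXE's actual API; the pool size and field names are made up) of a connection pool whose memory is allocated once up front, with objects addressed by integer handles instead of pointers:

#include <stdio.h>

#define POOL_SIZE 100000                /* capacity fixed at compile time */

typedef struct {
    int  next_free;                     /* index of next free slot, or -1 */
    char buffer[2048];                  /* per-connection read buffer     */
} connection;

static connection pool[POOL_SIZE];      /* all memory allocated up front  */
static int        free_head;            /* head of the free list          */

static void pool_init(void)
{
    for (int i = 0; i < POOL_SIZE; i++)
        pool[i].next_free = i + 1 < POOL_SIZE ? i + 1 : -1;
    free_head = 0;
}

static int pool_acquire(void)           /* returns a handle; -1 if full   */
{
    int id = free_head;
    if (id != -1)
        free_head = pool[id].next_free;
    return id;
}

static void pool_release(int id)
{
    pool[id].next_free = free_head;
    free_head = id;
}

int main(void)
{
    pool_init();
    int id = pool_acquire();            /* e.g. on TCP accept             */
    snprintf(pool[id].buffer, sizeof(pool[id].buffer), "GET / HTTP/1.0");
    printf("connection %d: %s\n", id, pool[id].buffer);
    pool_release(id);                   /* e.g. on TCP close              */
    return 0;
}

Because every handle is just an array index, a log line can name a connection with a small integer, and there is no malloc() or free() anywhere on the hot path.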
In short, Facebook helped to solve their scaling problem by developing HipHop technology which allowed them to program in PHP and compile the PHP to ‘highly optimized C++’. According to the blurb, the compiled PHP runs 50% faster, and in theory this reduces the number of servers needed accordingly. My approach is from the other direction: make programming in C so comfortable that using a scripting language isn’t necessary. Also, use C instead of C++, because C++ (generally) relies on dynamic memory allocation, which is an unnecessary overhead at run-time. Languages which support dynamic memory allocation are great for general purpose programs, which are generally not designed to use all the memory on a box. In contrast, in our cloud we will have clusters of servers running daemons which already have all the memory allocated at run-time that they will ever use. So there is no need, for example, to have any ‘swap’. If a box has, say, 16GB RAM then a particular daemon might be using 15.5GB of that RAM all the time. This technique also has some useful side-effects: we don’t have to worry about garbage collection or ever debug memory leaks, because the process memory does not change at run-time. Also, DDoS attacks will not send the servers into some sort of unstable, memory swap nightmare. Instead, as soon as the DDoS passes, everything immediately returns to business as usual without instability problems etc. So being able to rapidly develop the Facebook business logic in C using a fixed memory environment is going to enable the use of fewer servers (because of faster code using less memory) and result in a commercial advantage.

But there is still a problem: where do we store all the data that all the Facebook users are generating? Any SQL solution is not going to scale cost-effectively. The NoSQL offerings also have their own limitations. Amazon storage looks good, but can I do it cheaper myself? Again, I’d create my own technology: a hierarchical, distributed, redundant hash table (HDRHT). But unlike a big virtual hard disk, the HDRHT can store both very small and very large files without being wasteful, and we can make it bigger as fast as we can hook up new drives and servers. The files would be compressed, chunked, and stored via SHA1 (similar-ish to Venti although more efficient; http://doc.cat-v.org/plan_9/4th_edition/papers/venti/) in order to avoid duplication of data; see the sketch at the end of this post. I’d probably take one of these (http://www.engadget.com/2009/07/23/cambrionix-49-port-usb-hub-for-professionals-nerds/) per server and connect 49 2TB external drives to it, giving 49TB of redundant data per server (although in deployment the data would be redundant across different servers in different geographic locations). There would be 20 such servers to provide one petabyte of redundant data, 200 for ten petabytes, etc. Large parts of the system can be and should be in a colo in order to keep the costs minimal. Other parts need to be in-house.

The easy way to build robust & scalable network programs without compromising run-time performance or memory usage is called SXE (https://simonhf.wordpress.com/2010/10/09/what-is-sxe/) and is a work-in-progress recently open-sourced by my employer, Sophos. Much of what I’ve written about exists right now. The other stuff is right around the corner… 🙂
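
And here is the promised sketch of the chunk-and-hash idea (my own illustration, not the HDRHT itself; the 8KB chunk size is an arbitrary assumption, and it assumes OpenSSL's SHA1, so link with -lcrypto). Deriving a content-addressed key per chunk means identical chunks are stored only once:

#include <stdio.h>
#include <openssl/sha.h>

#define CHUNK_SIZE 8192                 /* arbitrary chunk size for the sketch */

/* Derive a content-addressed key for one chunk; identical chunks map to
 * identical keys, so storing chunks by key deduplicates them automatically */
static void chunk_key(const unsigned char *chunk, size_t len,
                      char key_hex[2 * SHA_DIGEST_LENGTH + 1])
{
    unsigned char digest[SHA_DIGEST_LENGTH];

    SHA1(chunk, len, digest);
    for (int i = 0; i < SHA_DIGEST_LENGTH; i++)
        sprintf(&key_hex[2 * i], "%02x", digest[i]);
}

int main(void)
{
    unsigned char chunk[CHUNK_SIZE] = "status update: hello world";
    char          key[2 * SHA_DIGEST_LENGTH + 1];

    chunk_key(chunk, CHUNK_SIZE, key);
    printf("store chunk under key %s\n", key);   /* e.g. as a path on a drive */
    return 0;
}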

 

How to compile 32 bit libsxe on a 64 bit host? December 29, 2010

Filed under: Uncategorized — simonhf @ 1:53 am

So believe it or not we don’t actually have a 64 bit build of libsxe yet. At work we currently only need a 32 bit build. I just got my Christmas present to myself set up (a nice new Xeon-based PC running 64 bit Ubuntu, among other things) and tried to compile libsxe. Of course, it didn’t work. So here’s the makefile voodoo I did to make it work:


# echo get the latest libsxe from github
# mkdir 20101228-sxe
# cd 20101228-sxe/
# wget --output-document=sxe.tar.gz --no-check-certificate https://github.com/jimbelton/sxe/tarball/master
# tar -xvf sxe.tar.gz
# cd jimbelton-sxe-d3b82bb/libsxe/

# echo only if 64 bit host: install 32bit dev stuff
# apt-get install libc6-dev-i386

# echo only if 64 bit host: force makefile stuff to 32bit
# pushd ../mak
# cp mak-unix.mak mak-unix.mak.orig
# echo edit mak-unix.mak
# diff mak-unix.mak.orig mak-unix.mak
< CC = gcc
> CC = gcc -m32
< LINK = gcc
> LINK = gcc -m32
# popd

# echo only if 64 bit host: force third party makefile stuff to 32bit
# pushd lib-ev
# cp GNUmakefile GNUmakefile.orig
# echo edit GNUmakefile
# diff GNUmakefile.orig GNUmakefile
< @cd $(DST.dir)/$(THIRD_PARTY.dir) && ./configure
> @cd $(DST.dir)/$(THIRD_PARTY.dir) && env CFLAGS=-m32 LDFLAGS=-m32 ./configure --build=i686-unknown-linux-gnu --disable-ld64
# popd

# echo build release, debug, and coverage versions and run all the tests
# time make check
...
All pre-submit automated tests completed successfully!
Please have your source code changes reviewed before submit!

real 1m13.457s
user 0m40.390s
sys 0m16.190s
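
And a quick way (my own suggestion, not part of the libsxe build) to sanity-check that the -m32 flags actually took effect is to compile and run a one-liner with the same compiler settings:

/* Build with: gcc -m32 sizecheck.c -o sizecheck
 * A 32 bit build prints 4; a 64 bit build prints 8 */
#include <stdio.h>

int main(void)
{
    printf("sizeof(void *) = %zu\n", sizeof(void *));
    return 0;
}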
 

node.js versus Lua “Hello World” — Postmortem October 23, 2010

Filed under: Uncategorized — simonhf @ 4:54 pm

Last time I performance tested an experimental version of the soon-to-be-released SXELua against node.js. What I found is that SXELua is 2.9 times faster with the simple “Hello World” test. However, what I didn’t mention is why SXELua is so much faster. When first running the test with a very early version of SXELua, the results were about half as good as the node.js+net+crcr results! The reason for this is the way that Lua handles its strings. During the simple “Hello World” test, HTTP queries are read using the following code path: libev C code calls SXE C code, which read()s the HTTP query string and then passes the string to the Lua read event handler. During this last step, Lua ‘internalizes’ the string. This means that Lua does a quick hash (alert: relatively expensive operation) over the string, allocates memory for it, and copies it… and later, unfortunately, garbage collects it too. All these operations take lots of CPU. The simple “Hello World” benchmark sends 500,000 HTTP query strings which are each 199 bytes. So you can imagine how even heavily optimized string internalization code might get bogged down with such a large number of strings to internalize and eventually garbage collect. In fact, take any small overhead and multiply it by 500,000 and suddenly it becomes a large overhead.

So in the end Neil changed SXELua so that Lua’s string handling code isn’t used, and instead Lua calls back out to SXE C code to do all its string handling. This works very fast because C is very fast, calling C from Lua or vice-versa is very fast, and SXE itself uses neither malloc() nor dynamic memory garbage collection techniques while processing queries. So does this mean that SXELua is limited to only using an if… then… call subset of Lua syntax? Yes! And if you think about it, this makes perfect sense. Why? Because it’s faster to do the generic ‘heavy lifting’ operations (for example, string operations) in C than it is in Lua… especially if this avoids internalization and garbage collection. The part that I’d like to rapidly code in Lua is the lighter-weight, non-generic, business logic of the program.

In the end this works out really well and SXELua is 2.9 times faster than the node.js simple “Hello World” test. This is probably mainly because node.js has three different types of overhead in this particular test: 1. JavaScript string manipulation code is slower than C string manipulation code, and this difference gets multiplied 500,000 times. 2. JavaScript creates 500,000 strings. 3. JavaScript garbage collects 500,000 strings. So it’s easy to see why the SXELua and SXE simple “Hello World” tests are 2.9 and 3.4 times faster, respectively, than the node.js equivalent.
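
Here is a minimal sketch of that ‘Lua calls back out to C’ pattern (my own illustration assuming a standard Lua 5.x embedding, not SXELua's actual binding; link with -llua). Lua only ever sees an integer connection id, so no query string is created or interned on the Lua side:

#include <string.h>
#include <lua.h>
#include <lauxlib.h>
#include <lualib.h>

#define MAX_CONNECTIONS 10000

static char   buffers[MAX_CONNECTIONS][2048];  /* C owns all query buffers */
static size_t lengths[MAX_CONNECTIONS];

/* Lua passes an integer id; C inspects the buffer and returns a boolean */
static int c_has_end_of_headers(lua_State *L)
{
    int    id  = (int)luaL_checkinteger(L, 1);
    size_t len = lengths[id];

    lua_pushboolean(L, len >= 4 &&
                       memcmp(&buffers[id][len - 4], "\r\n\r\n", 4) == 0);
    return 1;                                  /* one return value */
}

int main(void)
{
    lua_State *L = luaL_newstate();

    luaL_openlibs(L);
    lua_pushcfunction(L, c_has_end_of_headers);
    lua_setglobal(L, "has_end_of_headers");

    strcpy(buffers[0], "GET / HTTP/1.0\r\n\r\n");    /* simulate a read() */
    lengths[0] = strlen(buffers[0]);

    luaL_dostring(L, "print(has_end_of_headers(0))"); /* prints: true */
    lua_close(L);
    return 0;
}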

 

node.js versus Lua “Hello World” October 13, 2010

Neil Watkiss — known among other things for many cool Perl modules — has created a non-optimized, experimental version of SXE (pronounced ‘sexy’) containing embedded Lua, called SXELua. So I thought it would be fun to redo the familiar (to readers of this blog) “Hello World” benchmark using SXELua. And here is the Lua source code:

do
    local connect = function (sxe) end
    local read = function (sxe, content)
        if content:match("\r\n\r\n", -4) then sxe_write(sxe,"HTTP/1.0 200 OK\r\nConnection: Close\r\nContent-Type: text/html\r\nContent-Length: 14\r\n\r\nHello World\n\r\n")
        end
    end
    local close = function (sxe) end
    sxe_register(10001, function () sxe_listen(sxe_new_tcp("127.0.0.1", 8000, connect, read, close)) end)
end

Compare this with the slightly longer node.js equivalent from the last blog:

var net = require('net');
var server = net.createServer(function (stream) {
  stream.on('connect', function () {});
  stream.on('data', function (data) {
    var l = data.length;
    if (l >= 4 && data[l - 4] == 0xd && data[l - 3] == 0xa && data[l - 2] == 0xd && data[l - 1] == 0xa) {
      stream.write('HTTP/1.0 200 OK\r\nConnection: Keep-Alive\r\nContent-Type: text/html\r\nContent-Length: 13\r\n\r\nHello World\r\n');
    }
  });
  stream.on('end', function () {stream.end();});
});
server.listen(8000, 'localhost');

And now the updated results:

"Hello World"    Queries/ % Speed
Server           Second   of SXE
---------------- -------- -------
node.js+http     12,344    16%
node.js+net+crcr 23,224    30% <-- *1
node.js+net      28,867    37%
SXELua           66,731    85%
SXE              78,437   100%

In conclusion, calling Lua functions from C and vice-versa is very fast… close to the speed of C itself. I am very excited by how well Lua performed in the benchmark. The Lua “Hello World” program performed 2.9 times better than the node.js equivalent. After a quick Google it looks like this isn’t the first time that JavaScript V8 has gone up against Lua; these results suggest that SXELua could get even faster after optimization. It looks like Lua will become part of SXE soon. Lua seems ideal for creating tests for SXE & SXELua programs alike, and for prototyping programs. Stay tuned…!

*1 Update: Somebody who knows JavaScript better than me offered faster code to detect the trailing "\r\n\r\n". I updated the script above and the resulting queries per second and % speed of SXE.

 

What is SXE? October 9, 2010

Filed under: Uncategorized — simonhf @ 10:05 pm

Since I’ve blogged a few times about its performance, several people have asked: What is SXE?

What is SXE? In a nutshell, SXE is an easy way to build robust & scalable network programs without compromising run-time performance or memory usage.

How does SXE achieve this?

1. By providing a software development environment for the fastest language, C, where rapid development (not traditionally associated with the C language) can occur.

How does the environment enable rapid development?

a. Header files and makefiles, traditionally tedious to create and maintain, are generated automatically, meaning that the developer need only concentrate on creating .c files. Creating a new library is as easy as creating a new folder with a new .c file.

b. Release, debug, and code coverage builds happen automatically. SXE itself is developed using the same mechanism and is protected by 99.9% code coverage. This keeps SXE rock solid, and programs developed using the environment will be rock solid too; bugs stay minimal and the latest github version is always a release candidate.

c. The necessity of using a traditional debugger to slowly single-step through code has been eliminated by the use of extensive code instrumentation techniques, which also double as an efficient code comprehension mechanism.

d. Although not forbidden, the traditional use of pointers in C is strongly discouraged (to reduce complexity), and alternative mechanisms are provided to achieve similar results using simple, human-friendly, and easy-to-debug integers.

2. By providing a generic layer of network handling code on top of the cross-platform and asynchronous libev. Want to create an HTTP- / SMTP- / DNS- / whatever- server? The common network handling code has already been created for you. All you have to do is write some simple handlers for network connect, read, and close events.

3. By providing powerful and generic data structures to maintain state and chronological information even in a multi-threaded environment; see the sketch below. It’s a bit like having an ultra-powerful NoSQL solution right inside your program. For example, you have one million peers connected and want to know who has been in the read state the longest? The generic data structure will tell you. Want to know which is the second longest? The data structure will tell you. Want to multiplex between the one million peers and another limited resource? Just use two data structures and request the oldest member in state ‘waiting for resource’ whenever any member of the other data structure becomes state ‘resource free’. Memory leaks (a traditional problem with C programs) are completely eliminated by using fixed memory structures which effectively mirror the maximum capacity of the server hardware they are running on. The effects of DDoS attacks become a thing of the past because programs are engineered to use a fixed amount of memory, and therefore even in an attack the server will never ‘go into SWAP’ or become unstable; as soon as the attack ends, regular functionality resumes immediately.

In this way it’s possible to rapidly create robust & scalable network programs with surprisingly few lines of code and without compromising run-time performance or memory usage.
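
As a rough illustration of the kind of structure point 3 describes (a minimal sketch of my own, not SXE's actual data structure or API), here is a fixed-memory, per-state queue over plain arrays: every object lives in exactly one state, changing state is O(1), and the oldest member of any state is always at the head of that state's list:

#include <stdio.h>

#define MAX_OBJECTS 1000000   /* fixed capacity, allocated up front */

enum { STATE_FREE, STATE_READING, STATE_WAITING, NUM_STATES };

static int state[MAX_OBJECTS];                    /* current state per object */
static int next [MAX_OBJECTS], prev[MAX_OBJECTS]; /* intrusive per-state list  */
static int head [NUM_STATES],  tail[NUM_STATES];  /* oldest is at the head     */

static void init(void)
{
    for (int s = 0; s < NUM_STATES; s++)
        head[s] = tail[s] = -1;
    for (int i = 0; i < MAX_OBJECTS; i++) {        /* all objects start free   */
        state[i] = STATE_FREE;
        prev[i]  = i - 1;
        next[i]  = i + 1 < MAX_OBJECTS ? i + 1 : -1;
    }
    head[STATE_FREE] = 0;
    tail[STATE_FREE] = MAX_OBJECTS - 1;
}

static void unlink_object(int id)
{
    int s = state[id];
    if (prev[id] != -1) next[prev[id]] = next[id]; else head[s] = next[id];
    if (next[id] != -1) prev[next[id]] = prev[id]; else tail[s] = prev[id];
}

/* Move an object to a new state in O(1); append at the tail so that the
 * head of each state's list is always its chronologically oldest member */
static void set_state(int id, int s)
{
    unlink_object(id);
    state[id] = s;
    prev[id]  = tail[s];
    next[id]  = -1;
    if (tail[s] != -1) next[tail[s]] = id; else head[s] = id;
    tail[s]   = id;
}

static int oldest_in_state(int s) { return head[s]; }  /* -1 if state empty */

int main(void)
{
    init();
    set_state(42, STATE_READING);                 /* 42 starts reading first  */
    set_state(7,  STATE_READING);                 /* then 7 starts reading    */
    printf("longest in read state: object %d\n",  /* prints 42                */
           oldest_in_state(STATE_READING));
    return 0;
}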

 

node.js versus SXE “Hello World”; node.js performance revisited October 8, 2010

Filed under: Uncategorized — simonhf @ 4:42 am

About a week ago I tested a simple node.js “Hello World” server against one written using SXE. Naively, I copy-and-pasted the example server from the node.js documentation. Since then I have discovered that it’s possible to get faster queries per second speeds out of node.js. So I decided to test the queries per second of different variations of the node.js “Hello World” server. Here is the original version that I tested earlier:

var http = require('http');
http.createServer(function (req, res) {
  res.writeHead(200, {'Content-Type': 'text/plain'});
  res.end('Hello World\n');
}).listen(8000, "127.0.0.1");

Here is a version with very nearly the same functionality as the SXE “Hello World” server, which doesn’t search the headers. After taking the advice of Ryan Dahl, I have opted to use the faster node.js ‘net’ package instead of the slower node.js ‘http’ package, which apparently performs unwanted (for the purposes of the simple “Hello World” test) processing on the HTTP headers. ‘Very nearly the same functionality’ means that, unlike SXE, the data read from consecutive packets on the same socket are not automatically accumulated, e.g. in case the HTTP request is bigger than the MTU. I’m not sure of the most efficient way to do this in JavaScript so I just left it out for this test, where all the packets are smaller than the MTU anyway.

var net = require('net');
var server = net.createServer(function (stream) {
 stream.setEncoding('utf8');
 stream.on('connect', function () {});
 stream.on('data', function (data) {
   if(data.match("\r\n\r\n$")){
     stream.write('HTTP/1.0 200 OK\r\nConnection: Keep-Alive\r\nContent-Type: text/html\r\nContent-Length: 14\r\n\r\nHello World\n\r\n');
   }
 });
 stream.on('end', function () {stream.end();});
});
server.listen(8000, 'localhost');

And here is hopefully the simplest and fastest node.js variant. It doesn’t check for the end of the HTTP headers, so it’s difficult to imagine it doing anything useful with HTTP, but it gives us an idea of the fastest possible speed at which node.js can handle socket communications:

var net = require('net');
var server = net.createServer(function (stream) {
 stream.on('connect', function () {});
 stream.on('data', function (data) {
      stream.write('HTTP/1.0 200 OK\r\nConnection: Keep-Alive\r\nContent-Type: text/html\r\nContent-Length: 14\r\n\r\nHello World\n\r\n');
 });
 stream.on('end', function () {stream.end();});
});
server.listen(8000, 'localhost');

And now the results:

"Hello World"    Queries/ % Speed
Server           Second   of SXE
---------------- -------- -------
node.js+http     12,344    16%
node.js+net+crcr 18,448    24%
node.js+net      28,867    37%
SXE              78,437   100%

In conclusion, if you want to get the very best performance out of node.js then you might consider using the node.js ‘net’ package instead of the ‘http’ package. Doing so will let your node.js application approach about a quarter of the speed of the C-language-based SXE equivalent.

 

 