G-WAN versus SXE “Hello World” April 26, 2012

Recently I've been very impressed by the performance figures published for G-WAN:
http://gwan.ch/benchmark

G-WAN has quite the licensing model, with the binary being freeware and support costing a great deal of money:
http://gwan.ch/buy

So I decided to do a simple libsxe versus G-WAN performance test like I did for libsxe versus NGINX and libsxe versus node.js. However, for this test I used the multi-threaded load tool that G-WAN itself recommends for benchmarking, weighttp:
http://redmine.lighttpd.net/projects/weighttp/wiki

I modified the simple libsxe HTTP server to make it take advantage of multiple CPUs.
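
To show roughly what that involved, here is a minimal sketch of the pre-fork pattern: one listening socket is created before fork(), and one process per detected CPU then accepts on it, which matches the "parent forking 7 times" line in the server log further down. This is illustrative only and is not the actual libsxe API or the real sxe-httpd source; the socket setup and the canned HTTP reply are assumptions.

/* Minimal pre-fork sketch (not the actual libsxe API): one listening socket,
 * one process per detected CPU, all accepting on the shared socket. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    long               ncpus = sysconf(_SC_NPROCESSORS_ONLN); /* e.g. 8 on the test box */
    int                fd    = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family      = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port        = htons(8080);

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 || listen(fd, 10000) < 0) {
        perror("bind/listen");
        return 1;
    }

    for (long i = 1; i < ncpus; i++)   /* parent plus (ncpus - 1) children */
        if (fork() == 0)
            break;                     /* each child falls through to the accept loop */

    for (;;) {                         /* every process accepts on the shared socket */
        int client = accept(fd, NULL, NULL);
        if (client >= 0) {
            static const char reply[] = "HTTP/1.1 200 OK\r\nContent-Length: 0\r\n\r\n";
            (void)write(client, reply, sizeof(reply) - 1);
            close(client);
        }
    }
}

The real sxe-httpd obviously does far more than this (keep-alive handling, an event loop, and so on); the point is only that each core gets its own process sharing a single listening socket.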

These tests were run on an Ubuntu 11.04 instance running on a dual quad-core i7 processor.

First the G-WAN figures:

I don't know why G-WAN reports 16 cores at startup, because my i7 only has 8!

simon@ubuntu:~/gwan_linux64-bit$ sudo ./gwan

allowed Cores: 8 ('sudo ./gwan' to let G-WAN use your 16 Core(s))

loading
> 'all.java': to use Java (*.java) scripts, install 'javac' (sudo apt-get install javac)
> 'hello.mm': to use Objective-C++ (*.mm) scripts, install 'gobjc++' (sudo apt-get install gobjc++)
> 'loan.java': to use Java (*.java) scripts, install 'javac' (sudo apt-get install javac)..
> 'argv.java': to use Java (*.java) scripts, install 'javac' (sudo apt-get install javac).
> 'hello.java': to use Java (*.java) scripts, install 'javac' (sudo apt-get install javac).
> 'hello.m': to use Objective-C (*.m) scripts, install 'gobjc' (sudo apt-get install gobjc)
> 'report.java': to use Java (*.java) scripts, install 'javac' (sudo apt-get install javac)..

G-WAN 3.3.28 (pid:3110)

simon@ubuntu:~/weighttp$ ./build/default/weighttp -n 10000000 -c 1000 -t 4 -k "http://127.0.0.1:8080/100.html"
weighttp - a lightweight and simple webserver benchmarking tool

host: '127.0.0.1', port: 8080
starting benchmark...
spawning thread #1: 250 concurrent requests, 2500000 total requests
spawning thread #2: 250 concurrent requests, 2500000 total requests
spawning thread #3: 250 concurrent requests, 2500000 total requests
spawning thread #4: 250 concurrent requests, 2500000 total requests
progress: 10% done
progress: 20% done
progress: 30% done
progress: 40% done
progress: 50% done
progress: 60% done
progress: 70% done
progress: 80% done
progress: 90% done
progress: 100% done

finished in 61 sec, 501 millisec and 457 microsec, 162597 req/s, 59862 kbyte/s
requests: 10000000 total, 10000000 started, 10000000 done, 10000000 succeeded, 0 failed, 0 errored
status codes: 10000000 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 3770000000 bytes total, 2770000000 bytes http, 1000000000 bytes data
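
As a quick sanity check on the weighttp summary (4 threads of 250 concurrent keep-alive connections issuing 10,000,000 requests in total), the reported rates follow directly from the totals, and the 1,000,000,000 data bytes spread over 10,000,000 requests confirm the 100-byte /100.html payload:

\[
\frac{10{,}000{,}000\ \text{requests}}{61.501\ \text{s}} \approx 162{,}597\ \text{req/s},
\qquad
\frac{3{,}770{,}000{,}000\ \text{bytes}}{61.501\ \text{s} \times 1024} \approx 59{,}862\ \text{kbyte/s}
\]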

Now the libsxe figures:

simon@ubuntu:~/sxe-httpd/sxe-httpd$ ./build-linux-64-release/sxe-httpd 127.0.0.1 8080 10000
20120426 211759.525 T 10198 ------ 1 - sxe-httpd starting // detected cpus: 8
20120426 211759.525 T 10198 ------ 1 - sxe-httpd parent forking 7 times
20120426 211759.525 T 10199 ------ 1 - sxe-httpd child created
20120426 211759.525 T 10200 ------ 1 - sxe-httpd child created
20120426 211759.525 T 10201 ------ 1 - sxe-httpd child created
20120426 211759.526 T 10202 ------ 1 - sxe-httpd child created
20120426 211759.526 T 10203 ------ 1 - sxe-httpd child created
20120426 211759.526 T 10204 ------ 1 - sxe-httpd child created
20120426 211759.526 T 10205 ------ 1 - sxe-httpd child created

simon@ubuntu:~/weighttp$ ./build/default/weighttp -n 10000000 -c 1000 -t 4 -k "http://127.0.0.1:8080/100.html"
weighttp - a lightweight and simple webserver benchmarking tool

host: '127.0.0.1', port: 8080
starting benchmark...
spawning thread #1: 250 concurrent requests, 2500000 total requests
spawning thread #2: 250 concurrent requests, 2500000 total requests
spawning thread #3: 250 concurrent requests, 2500000 total requests
spawning thread #4: 250 concurrent requests, 2500000 total requests
progress: 10% done
progress: 20% done
progress: 30% done
progress: 40% done
progress: 50% done
progress: 60% done
progress: 70% done
progress: 80% done
progress: 90% done
progress: 100% done

finished in 34 sec, 79 millisec and 878 microsec, 293428 req/s, 108316 kbyte/s
requests: 10000000 total, 10000000 started, 10000000 done, 10000000 succeeded, 0 failed, 0 errored
status codes: 10000000 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 3780000000 bytes total, 2780000000 bytes http, 1000000000 bytes data

Conclusion:

At 162,597 requests per second for G-WAN versus 293,428 for libsxe, libsxe is significantly faster (roughly 1.8 times) in this simple performance test using 8 cores. Although G-WAN calls itself the fastest web server available, and admittedly it is very fast, it evidently carries quite a bit of internal overhead even for a trivial test like this one. With libsxe the CPU bottleneck is really the networking layer in the kernel, so what is G-WAN doing with all those spare CPU cycles? It looks like G-WAN still has room for optimization. Or maybe the difference is partly due to libsxe's fixed memory model, which does away with the unnecessary and repetitive malloc()/free() cycle? I guess we'll never know, since G-WAN is closed source.
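
For anyone unfamiliar with the fixed memory model mentioned above, here is the general idea: all objects, such as per-connection state, are allocated once up front and recycled through a free list, so the steady state never calls malloc() or free(). The sketch below is a minimal illustration of that pattern only and is not the actual libsxe pool implementation; the connection structure and pool size are invented for the example.

/* Sketch of a fixed-size object pool with a free list (illustrative only;
 * not the actual libsxe pool code). Slots are allocated once and recycled. */
#include <stdio.h>

#define POOL_SIZE 10000                  /* e.g. one slot per allowed connection */

typedef struct connection {
    int                fd;               /* per-connection state lives in the slot */
    struct connection *next_free;        /* links unused slots into a free list    */
} connection;

static connection  pool[POOL_SIZE];
static connection *free_list;

static void pool_init(void)              /* one-time setup: chain all slots together */
{
    for (int i = 0; i < POOL_SIZE - 1; i++)
        pool[i].next_free = &pool[i + 1];
    pool[POOL_SIZE - 1].next_free = NULL;
    free_list = &pool[0];
}

static connection *pool_acquire(void)    /* O(1), no heap allocation on the hot path */
{
    connection *c = free_list;
    if (c != NULL)
        free_list = c->next_free;
    return c;
}

static void pool_release(connection *c)  /* O(1), no heap free on the hot path */
{
    c->next_free = free_list;
    free_list    = c;
}

int main(void)
{
    pool_init();
    connection *c = pool_acquire();      /* new connection arrives */
    c->fd = 42;
    pool_release(c);                     /* connection goes away   */
    printf("pool demo ok\n");
    return 0;
}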

EDIT: Since running this test we have found two potential problems with G-WAN which mean that these figures are unreliable (see the thread below): (a) G-WAN's performance appears to be highly tuned to particular processors, yet it is supplied as a single binary executable, so results can vary wildly from machine to machine; and (b) G-WAN does not scale linearly as the number of cores increases, even with the simplest of performance tests.

 

Screencast: Building libsxe March 27, 2011


[Screencast: Building libsxe - click to play HD full-screen]

I thought I'd try something new, so here's a screencast showing how to download, build, and test libsxe on 64-bit Ubuntu. I also explain a bit about the layout of the libsxe source files and the various sub-libraries. Building the release, debug, and coverage targets for libsxe from scratch, including running the well over 1,000 tests on each target and enforcing 100% code coverage, takes about 1 minute 20 seconds in total on my VMware installation of Ubuntu. That's quite fast, but it could be faster: the build currently runs its steps sequentially, and it's on the 'to do' list to parallelize it across multiple cores to speed things up even more. The tests run so fast already, even on a single core, because we do sneaky things like faking time and mocking system calls to easily reproduce the hardest-to-reproduce error conditions.
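
As a rough illustration of that mocking idea, and explicitly not the actual libsxe test framework, the sketch below routes send() through a replaceable function pointer so a test can force an error path, such as a full socket buffer, that is otherwise awkward to reproduce on demand.

/* Sketch of mocking a system call in a test (illustrative only; not the
 * actual libsxe test framework). Production code calls send() through a
 * replaceable hook, and the test swaps in a stub that forces EAGAIN. */
#include <errno.h>
#include <stdio.h>
#include <sys/socket.h>
#include <sys/types.h>

/* The hook defaults to the real send(); a test may point it at a stub. */
static ssize_t (*send_hook)(int, const void *, size_t, int) = send;

/* Code under test: classify the outcome of a non-blocking write. */
static const char *try_send(int fd, const void *buf, size_t len)
{
    ssize_t sent = send_hook(fd, buf, len, 0);

    if (sent < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return "would-block handled";

    return sent < 0 ? "error" : "sent";
}

/* Test stub: pretend the kernel socket buffer is full. */
static ssize_t send_stub_eagain(int fd, const void *buf, size_t len, int flags)
{
    (void)fd; (void)buf; (void)len; (void)flags;
    errno = EAGAIN;
    return -1;
}

int main(void)
{
    send_hook = send_stub_eagain;            /* install the mock             */
    printf("%s\n", try_send(3, "x", 1));     /* prints "would-block handled" */
    return 0;
}

The same trick works for faking time: route clock reads through a hook and let the test advance a fake clock instantly instead of sleeping.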