Recently I’ve been very impressed reading about the performance figures for G-WAN:
http://gwan.ch/benchmark
G-WAN has quite the licensing model with the G-WAN binary being freeware and support costing very much money:
http://gwan.ch/buy
So I decided to do a simple libsxe versus G-WAN performance test like I did for libsxe versus NGINX and libsxe versus node.js. However, for this test I decided to use G-WAN’s very own multi-threaded load tool called weighttp:
http://redmine.lighttpd.net/projects/weighttp/wiki
I modified the simple libsxe HTTP server to make it take advantage of multiple CPUs.
These tests were run on a Ubuntu 11.04 instance running on a dual quad core i7 processor.
First the G-WAN figures:
I don’t know why G-WAN is talking about 16 cores upon starting because my i7 only has 8!
simon@ubuntu:~/gwan_linux64-bit$ sudo ./gwan
allowed Cores: 8 (‘sudo ./gwan’ to let G-WAN use your 16 Core(s))
loading
> ‘all.java’: to use Java (*.java) scripts, install ‘javac’ (sudo apt-get install javac)
> ‘hello.mm’: to use Objective-C++ (*.mm) scripts, install ‘gobjc++’ (sudo apt-get install gobjc++)
> ‘loan.java’: to use Java (*.java) scripts, install ‘javac’ (sudo apt-get install javac)..
> ‘argv.java’: to use Java (*.java) scripts, install ‘javac’ (sudo apt-get install javac).
> ‘hello.java’: to use Java (*.java) scripts, install ‘javac’ (sudo apt-get install javac).
> ‘hello.m’: to use Objective-C (*.m) scripts, install ‘gobjc’ (sudo apt-get install gobjc)
> ‘report.java’: to use Java (*.java) scripts, install ‘javac’ (sudo apt-get install javac)..
G-WAN 3.3.28 (pid:3110)
simon@ubuntu:~/weighttp$ ./build/default/weighttp -n 10000000 -c 1000 -t 4 -k “http://127.0.0.1:8080/100.html”
weighttp – a lightweight and simple webserver benchmarking tool
host: ‘127.0.0.1’, port: 8080
starting benchmark…
spawning thread #1: 250 concurrent requests, 2500000 total requests
spawning thread #2: 250 concurrent requests, 2500000 total requests
spawning thread #3: 250 concurrent requests, 2500000 total requests
spawning thread #4: 250 concurrent requests, 2500000 total requests
progress: 10% done
progress: 20% done
progress: 30% done
progress: 40% done
progress: 50% done
progress: 60% done
progress: 70% done
progress: 80% done
progress: 90% done
progress: 100% done
finished in 61 sec, 501 millisec and 457 microsec, 162597 req/s, 59862 kbyte/s
requests: 10000000 total, 10000000 started, 10000000 done, 10000000 succeeded, 0 failed, 0 errored
status codes: 10000000 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 3770000000 bytes total, 2770000000 bytes http, 1000000000 bytes data
Now the libsxe figures:
simon@ubuntu:~/sxe-httpd/sxe-httpd$ ./build-linux-64-release/sxe-httpd 127.0.0.1 8080 10000
20120426 211759.525 T 10198 —— 1 – sxe-httpd starting // detected cpus: 8
20120426 211759.525 T 10198 —— 1 – sxe-httpd parent forking 7 times
20120426 211759.525 T 10199 —— 1 – sxe-httpd child created
20120426 211759.525 T 10200 —— 1 – sxe-httpd child created
20120426 211759.525 T 10201 —— 1 – sxe-httpd child created
20120426 211759.526 T 10202 —— 1 – sxe-httpd child created
20120426 211759.526 T 10203 —— 1 – sxe-httpd child created
20120426 211759.526 T 10204 —— 1 – sxe-httpd child created
20120426 211759.526 T 10205 —— 1 – sxe-httpd child created
simon@ubuntu:~/weighttp$ ./build/default/weighttp -n 10000000 -c 1000 -t 4 -k “http://127.0.0.1:8080/100.html”
weighttp – a lightweight and simple webserver benchmarking tool
host: ‘127.0.0.1’, port: 8080
starting benchmark…
spawning thread #1: 250 concurrent requests, 2500000 total requests
spawning thread #2: 250 concurrent requests, 2500000 total requests
spawning thread #3: 250 concurrent requests, 2500000 total requests
spawning thread #4: 250 concurrent requests, 2500000 total requests
progress: 10% done
progress: 20% done
progress: 30% done
progress: 40% done
progress: 50% done
progress: 60% done
progress: 70% done
progress: 80% done
progress: 90% done
progress: 100% done
finished in 34 sec, 79 millisec and 878 microsec, 293428 req/s, 108316 kbyte/s
requests: 10000000 total, 10000000 started, 10000000 done, 10000000 succeeded, 0 failed, 0 errored
status codes: 10000000 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 3780000000 bytes total, 2780000000 bytes http, 1000000000 bytes data
Conclusion:
At 162597 versus 293428 requests per second, libsxe is significantly — or 1.8 times — faster than G-WAN for this simple performance test using 8 cores. Although G-WAN calls itself the fastest web server available — and admittedly is very fast — it obviously suffers internally from quite a bit of overhead even for such a trivial performance test such as this one. And with libsxe the CPU bottleneck is really the networking layer in the kernel… so what is G-WAN doing with all those spare CPU cycles? Looks like G-WAN might have room for optimization yet? Or maybe it’s partly due to libsxe’s fixed memory model which does away with the unnecessary and repetitive malloc() / free() cycle? I guess we’ll never know since G-WAN is closed source.
EDIT: Since running this test we have found two potential problems with G-WAN which mean that these figures are unreliable (see thread below): (a) G-WAN’s performance seems highly tuned to particular processors but it’s supplied as a single binary executable meaning that performance tests may vary wildly, and (b) G-WAN doesn’t scale linearly as the number of cores increase even with the simplest of performance tests.