SimonHF's Blog

G-WAN versus SXE “Hello World”

April 26, 2012

Recently I’ve been very impressed reading about the performance figures for G-WAN:
http://gwan.ch/benchmark

G-WAN has an unusual licensing model, with the G-WAN binary being freeware and support costing a great deal of money:
http://gwan.ch/buy

So I decided to do a simple libsxe versus G-WAN performance test like I did for libsxe versus NGINX and libsxe versus node.js. However, for this test I decided to use G-WAN’s very own multi-threaded load tool called weighttp:
http://redmine.lighttpd.net/projects/weighttp/wiki

I modified the simple libsxe HTTP server to make it take advantage of multiple CPUs.

These tests were run on an Ubuntu 11.04 instance running on a dual quad core i7 processor.

First the G-WAN figures:

I don’t know why G-WAN is talking about 16 cores upon starting because my i7 only has 8!

simon@ubuntu:~/gwan_linux64-bit$ sudo ./gwan

allowed Cores: 8 (‘sudo ./gwan’ to let G-WAN use your 16 Core(s))

loading
> ‘all.java’: to use Java (*.java) scripts, install ‘javac’ (sudo apt-get install javac)
> ‘hello.mm’: to use Objective-C++ (*.mm) scripts, install ‘gobjc++’ (sudo apt-get install gobjc++)
> ‘loan.java’: to use Java (*.java) scripts, install ‘javac’ (sudo apt-get install javac)..
> ‘argv.java’: to use Java (*.java) scripts, install ‘javac’ (sudo apt-get install javac).
> ‘hello.java’: to use Java (*.java) scripts, install ‘javac’ (sudo apt-get install javac).
> ‘hello.m’: to use Objective-C (*.m) scripts, install ‘gobjc’ (sudo apt-get install gobjc)
> ‘report.java’: to use Java (*.java) scripts, install ‘javac’ (sudo apt-get install javac)..

G-WAN 3.3.28 (pid:3110)

simon@ubuntu:~/weighttp$ ./build/default/weighttp -n 10000000 -c 1000 -t 4 -k "http://127.0.0.1:8080/100.html"
weighttp - a lightweight and simple webserver benchmarking tool

host: '127.0.0.1', port: 8080
starting benchmark...
spawning thread #1: 250 concurrent requests, 2500000 total requests
spawning thread #2: 250 concurrent requests, 2500000 total requests
spawning thread #3: 250 concurrent requests, 2500000 total requests
spawning thread #4: 250 concurrent requests, 2500000 total requests
progress: 10% done
progress: 20% done
progress: 30% done
progress: 40% done
progress: 50% done
progress: 60% done
progress: 70% done
progress: 80% done
progress: 90% done
progress: 100% done

finished in 61 sec, 501 millisec and 457 microsec, 162597 req/s, 59862 kbyte/s
requests: 10000000 total, 10000000 started, 10000000 done, 10000000 succeeded, 0 failed, 0 errored
status codes: 10000000 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 3770000000 bytes total, 2770000000 bytes http, 1000000000 bytes data

Now the libsxe figures:

simon@ubuntu:~/sxe-httpd/sxe-httpd$ ./build-linux-64-release/sxe-httpd 127.0.0.1 8080 10000
20120426 211759.525 T 10198 ------ 1 - sxe-httpd starting // detected cpus: 8
20120426 211759.525 T 10198 ------ 1 - sxe-httpd parent forking 7 times
20120426 211759.525 T 10199 ------ 1 - sxe-httpd child created
20120426 211759.525 T 10200 ------ 1 - sxe-httpd child created
20120426 211759.525 T 10201 ------ 1 - sxe-httpd child created
20120426 211759.526 T 10202 ------ 1 - sxe-httpd child created
20120426 211759.526 T 10203 ------ 1 - sxe-httpd child created
20120426 211759.526 T 10204 ------ 1 - sxe-httpd child created
20120426 211759.526 T 10205 ------ 1 - sxe-httpd child created
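
The startup log above shows one process per detected CPU: the parent plus 7 forked children. A minimal sketch of that pre-fork pattern follows, written in Python rather than libsxe's C. The names `worker_plan`, `prefork` and `serve` are hypothetical, and having the children inherit a single listening socket is an assumption on my part, not something the log confirms.

```python
import os


def worker_plan(cpus):
    """Number of children to fork so that parent + children == cpus."""
    return max(cpus - 1, 0)


def prefork(listener, children, serve):
    """Fork `children` processes that inherit `listener`; return their pids."""
    pids = []
    for _ in range(children):
        pid = os.fork()
        if pid == 0:
            # Child: handle connections on the inherited socket, then exit
            # immediately so it never falls back into the parent's code path.
            try:
                serve(listener)
            finally:
                os._exit(0)
        pids.append(pid)
    return pids
```

In use, the parent would create and bind the listening socket first, call `prefork(listener, worker_plan(os.cpu_count() or 1), serve)`, and then run `serve(listener)` itself so that all CPUs are busy.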

simon@ubuntu:~/weighttp$ ./build/default/weighttp -n 10000000 -c 1000 -t 4 -k "http://127.0.0.1:8080/100.html"
weighttp - a lightweight and simple webserver benchmarking tool

host: '127.0.0.1', port: 8080
starting benchmark...
spawning thread #1: 250 concurrent requests, 2500000 total requests
spawning thread #2: 250 concurrent requests, 2500000 total requests
spawning thread #3: 250 concurrent requests, 2500000 total requests
spawning thread #4: 250 concurrent requests, 2500000 total requests
progress: 10% done
progress: 20% done
progress: 30% done
progress: 40% done
progress: 50% done
progress: 60% done
progress: 70% done
progress: 80% done
progress: 90% done
progress: 100% done

finished in 34 sec, 79 millisec and 878 microsec, 293428 req/s, 108316 kbyte/s
requests: 10000000 total, 10000000 started, 10000000 done, 10000000 succeeded, 0 failed, 0 errored
status codes: 10000000 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 3780000000 bytes total, 2780000000 bytes http, 1000000000 bytes data

Conclusion:

At 293428 versus 162597 requests per second, libsxe is about 1.8 times faster than G-WAN for this simple performance test using 8 cores. Although G-WAN calls itself the fastest web server available, and admittedly it is very fast, it appears to suffer internally from quite a bit of overhead even for a performance test as trivial as this one. And with libsxe the CPU bottleneck is really the networking layer in the kernel… so what is G-WAN doing with all those spare CPU cycles? Looks like G-WAN might have room for optimization yet? Or maybe it’s partly due to libsxe’s fixed memory model which does away with the unnecessary and repetitive malloc() / free() cycle? I guess we’ll never know since G-WAN is closed source.
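
As a sanity check on the figures, the requests per second and the speed-up can be recomputed from the elapsed times that weighttp printed above. This is only arithmetic on the reported numbers; the variable names are mine.

```python
REQUESTS = 10_000_000

# Elapsed times reported by weighttp, converted to seconds.
gwan_secs = 61 + 501 / 1_000 + 457 / 1_000_000
sxe_secs = 34 + 79 / 1_000 + 878 / 1_000_000

gwan_rps = REQUESTS / gwan_secs
sxe_rps = REQUESTS / sxe_secs
speedup = sxe_rps / gwan_rps

# weighttp appears to truncate rather than round, so int() is used here.
print(f"G-WAN:   {int(gwan_rps)} req/s")
print(f"libsxe:  {int(sxe_rps)} req/s")
print(f"speedup: {speedup:.2f}x")
```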

EDIT: Since running this test we have found two potential problems with G-WAN which mean that these figures are unreliable (see thread below): (a) G-WAN’s performance seems highly tuned to particular processors but it’s supplied as a single binary executable, meaning that performance test results may vary wildly, and (b) G-WAN doesn’t scale linearly as the number of cores increases, even with the simplest of performance tests.

16 Responses to “G-WAN versus SXE “Hello World””

  1. s00th Says:

    It says 16 cores because of hyperthreading (2 threads/core).

  2. lucassiba Says:

    Simon, I think all these “versus” tests of yours are pretty much worthless. You’re comparing (in most cases) real http web servers to what is essentially a bare-bones “tcp server”. Node, Nginx and (I’m assuming) G-Wan are all parsing the http headers to make them available for call-backs and data processing (because every realistic web server scenario needs to read at least some header information). You’ve built up an unrealistic test where your code knows ahead of time not to bother with any of the extra http data, and you’re “challenging” other servers knowing full well they are “real web servers” and will parse all available data (because that’s what web servers are supposed to do).

    In the last year of working with high performance C-based http web servers, I found that about 10-15% of the CPU’s time is spent cracking SSL, and about 20-50% (depending on the application) is spent parsing the (not so parsing friendly) http packet. So I’m not surprised that skipping as much of the http parsing as you can gives you a big performance win.

    As far as I’m concerned any “versus” test looking at native web servers must include at least some header and body data parsing, otherwise it’s just not realistic and of no value in my opinion.

    • simonhf Says:

      Lucas, you make a very good point and you’re in good company too as Ryan Dahl also made the same point… which is why I re-ran the test against node in TCP only mode 🙂 Plus I also pitted Lua against node rather than straight C. See, so I do listen and I am fair and it’s not all apples to oranges comparisons as you are suggesting! I feel you are being a little unfair towards me because in the case of G-WAN then I haven’t built the test itself as you suggested… I used the same test program that G-WAN itself uses. And it’s an interesting test and the libsxe-based code causes that performance test to pass… so it seems realistic to me. It is true that the libsxe-based code that I performance tested doesn’t parse the http headers in the same way as the other products.

      But I put it to you: Is it really necessary to parse all the headers? Obviously not in order to pass performance tests like ab and the G-WAN multi-threaded ab variant. Let’s take the case of node and headers. For each tiny HTTP request then node parses each header and stores it at great CPU and memory expense in a hash… whether the header or headers are needed or not. Seems like a waste of time at least for the case of the off-the-shelf performance tests. All that header hash memory from so many parallel requests actually causes node to run out of memory and effectively limit its own ability to scale! And in what other cases would parsing headers be a waste of time? I would say that in the web today then apart from video then the next biggest traffic gobbler for HTTP requests is AJAX… and all other traffic, e.g. for regular JavaScript and html and graphics, falls a long way behind. Just try leaving your gmail or facebook page open and idling and monitor the constant and on-going AJAX requests.

      So if we can agree that AJAX requests will make up the majority of HTTP requests (assuming we’re not using the web server to be the next youtube) then please let me know how parsing HTTP headers is going to be useful for these requests. Usually there is no caching with AJAX requests. Time stamps are also not useful. We can assume keep-alive is on all the time… incidentally the libsxe-based code actually *does* scan for the keep-alive header, which is a complete waste of time 🙂 But that’s way quicker than doing what node does and painstakingly grooming each header into a memory hungry hash structure. And if you think about it a bit more then maybe the ab performance tests of this world are realistic after all when considering most of the HTTP traffic is AJAX traffic, and headers aren’t really that useful for processing AJAX traffic. Plus, for AJAX then we can assume that you are responsible for the Javascript or flash running on the clients generating the AJAX requests. So in theory you have even more advantages and opportunities to groom your HTTP AJAX request for easy parsing on the server end. But how is this going to be possible with a web server on the other end which insists on jumping through hoops parsing this, that and the other… even when it’s not strictly necessary?

      Lucas, you mention SSL but I haven’t mentioned SSL in my blog so I’ll just ignore that. You also mention 20-50% CPU time for parsing. Obviously, in my post then G-WAN is nearly twice as slow and if the extra CPU usage is down to parsing then maybe the G-WAN parser is closer to the 50% figure that you mention? But again… G-WAN didn’t need to do all that parsing to fulfill the performance test. So why not do parsing on demand or something out-of-the-box like that?

      Anyway, soon when the whole world is using SPDY then we won’t have to have this exchange 🙂 In the meantime, let’s assume that to convince you then you take my G-WAN performance figures, build your own HTTP header parser code into the libsxe-based code, and because you’ve spent a whole year perfecting header parsing then your header parsing code that you build in only takes the lower end of your given range or 20% CPU time… et voila G-WAN still is slower. Now try building your header parsing code into G-WAN. Ahh, sorry I forgot it’s closed source 🙂 And lastly, my tip to you, my friend, is: If you really, really want to do high performance HTTP web servers then I suggest figuring out ways to avoid the headers and not ways to parse the headers faster 🙂
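
To make the “parse everything into a hash” versus “scan on demand” contrast concrete, here is a rough Python sketch. It is not node's parser and not the libsxe code; `parse_all_headers` and `find_header` are hypothetical names, and the sketch ignores details a real server has to handle, such as repeated headers.

```python
def parse_all_headers(raw: bytes) -> dict:
    """Eagerly parse every header line into a dict (the 'hash' approach)."""
    head = raw.split(b"\r\n\r\n", 1)[0]
    headers = {}
    for line in head.split(b"\r\n")[1:]:      # skip the request line
        name, sep, value = line.partition(b":")
        if sep:
            headers[name.strip().lower()] = value.strip()
    return headers


def find_header(raw: bytes, name: bytes):
    """Scan for a single header on demand; return its value or None."""
    end = raw.find(b"\r\n\r\n")
    head = raw if end < 0 else raw[:end]
    needle = b"\r\n" + name.lower() + b":"
    pos = head.lower().find(needle)           # header names are case-insensitive
    if pos < 0:
        return None
    start = pos + len(needle)
    stop = head.find(b"\r\n", start)
    if stop < 0:
        stop = len(head)
    return head[start:stop].strip()


request = (b"GET /100.html HTTP/1.1\r\n"
           b"Host: 127.0.0.1:8080\r\n"
           b"Connection: keep-alive\r\n\r\n")
print(find_header(request, b"Connection"))
print(parse_all_headers(request))
```

The on-demand version builds no per-request dictionary, which is the memory cost the comment above is objecting to; whether that matters in practice depends on how many headers the application actually needs.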

  3. lucassiba Says:

    The TCP only mode test was also a complete waste of time in my opinion. Realistically what are you testing at that point? What product can call the system epoll() and read/write() functions the quickest? What are you proving? I think it’s common knowledge that a C-based program that’s just calling system calls over and over again will out perform any higher level programming language. You’re not showing how great libsxe is, you’re showing that C is still faster than javascript.

    You claim that you’re being “apples to apples” by comparing Lua vs Node, but again the two sets of code are doing completely different things. Your Lua code knows ahead of time not to bother to really parse any of the http (it just looks for a hard-coded string), while the Node code assumes it has to act like a real web server. So what are you proving? Lua code doing almost no parsing is faster than javascript parsing an entire http packet. Wow, you just blew my mind with surprise on that one…

    You asked: “Is it really necessary to parse all the headers?” My answer is “absolutely”. I agree that a large portion of today’s web traffic is AJAX based, and that most of the headers are probably not necessary, but I will NOT agree that you can completely ignore all headers on the server side from an AJAX request. I unfortunately cannot show you an example of hundreds of big web server deployments where they are utilizing the http headers in AJAX requests (I just don’t have access to that data), but if you’re going to claim that they are not using the headers, please supply more reasoning/evidence for it. To further this, if you’re trying to sell this libsxe example as an AJAX request handler (doing minimal actual http parsing) then why are you comparing it to real web servers? Find something (if such a thing exists) that’s designed to just ignore http headers and pump out AJAX responses as fast as you can. (And when you don’t find one, I believe it’s because people/companies/products need real web servers that do real http parsing.)

    I do agree that a “parsing on demand” architecture would be a performance improvement and I also agree (if what you say about node.js memory model is true) then there are certainly plenty of ways to improve performance. But if node.js is such a “dog in performance” why is it taking off so much? Why are yahoo, cloud9 and linkedin all claiming it’s the best thing since sliced-bread? (see nodejs.org for more corporate praise)

    Could it be that in the real world, node is just fast enough? That the 2-3X speed improvement you’re promoting in libsxe is being dwarfed by what the actual application code behind the server is doing anyway? Could it be that the time-to-market, the pretty good performance and the ease of programming in javascript are a bigger, more important “win” than “absolute performance” to most people?

    If you want to impress me, build a fully functional web server out of libsxe, one that “parses on demand” and provides the basic features of a real web server (and has a clean, high level easy to use interface). Then show me how much faster it is than all the others… We’ll see how those numbers turn out…

    Absolute performance is meaningless without functionality.

    • simonhf Says:

      Thanks for writing again, Lucas. Getting feedback on a blog is the best thing that can happen! I’ll address your comments one by one:

      TCP only mode test / Lua vs node / apples to apples: Lucas, please examine the node code in this test (https://simonhf.wordpress.com/2010/10/13/node-js-versus-lua-hello-world/) carefully and notice that “Node.js+net” is testing node without what you refer to as a ‘real web server’. Later you inaccurately refer to a “2-3X speed improvement”. In the apples to apples test then “The Lua ”Hello World” program performed 3.6 times better than the node.js equivalent” so faster than the speed improvement that you refer to. However, let’s not forget that node is single threaded so the apples to apples test was also single threaded. So in reality the Lua “Hello World” program would perform very much faster on a multi-core box than the Node.js+net program ever could with its single thread.

      “common knowledge that a C-based program … will out perform any higher level programming language”: Not necessarily. It’s all to do with algorithm. Lucas, I think this is where you are not thinking out-of-the-box enough and making assumptions which aren’t true. In previous tests then NGINX managed 24,810 requests per second but the SXELua test program managed 66,731 requests per second. Hmmm… looks like Lua is a higher level programming language than C and is over twice as fast as a C program. Food for thought. So why is that? I think it’s a combination of several factors. Firstly, C is fast but C developers can be guilty of sloppy code, too much code, fat building up, using the wrong algorithm, doing too much too soon, using too much memory, doing too much memory allocation and de-allocation, etc. So using C is no guarantee of out performing a high level language. Secondly, most high level languages have C or C++ under the covers. That means when the OS notifies the high level language of a network event then it first notifies e.g. C code which is part of the language implementation. That C code may end up calling a function written in the high level language. How C calls high level language functions is a big potential bottleneck and is implemented differently from language to language. Languages like Javascript have a huge overhead when being called from C compared to the overhead of calling Lua from C. That’s the main reason why libsxe has Lua built in… because it’s very economic to call Lua functions from C and vice-versa. It’s an algorithm thing. Thirdly, memory management has a huge effect on code speed. Modern processors are blazingly fast when everything is in the cache and ready to go but much slower when things aren’t in the cache and ready to go. Obviously a smaller memory footprint means that the necessary bytes will be in the CPU cache more often. This means the code runs faster. 

      Javascript uses comparatively lots of memory even to do the simplest stuff… this means it is working at a cache disadvantage from the get go. Lua has the same problem which is why libsxe does the memory management for Lua so that in fact Lua is only used to make business logic decisions. Otherwise Lua would perform very poorly. Similarly, there is no default way in C for developers to handle memory management. Therefore, each C implementation is going to be a little bit different. While one C implementation might use a similar amount of memory to another, one implementation might allocate and de-allocate memory much more often causing performance problems and/or memory fragmentation. FYI libsxe suffers from neither of these problems since no memory is allocated dynamically during the test. This is why I think you are generalizing too much when talking about testing “What product can call the system epoll() and read/write() functions the quickest?”. However, this is still an interesting question in itself and maybe the answer is to not use epoll! See below about moving part of libsxe into the kernel.
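
The idea of allocating everything once at startup and never during request handling can be sketched as a fixed pool with a free list. This is only an illustration of the general technique in Python; `FixedPool` is a hypothetical name and says nothing about how libsxe lays out its memory internally.

```python
class FixedPool:
    """Preallocate `size` buffers once; hand them out and take them back."""

    def __init__(self, size, buf_len):
        self._bufs = [bytearray(buf_len) for _ in range(size)]
        self._free = list(range(size))        # indices of unused buffers

    def acquire(self):
        """Return (index, buffer), or None when the pool is exhausted."""
        if not self._free:
            return None
        index = self._free.pop()
        return index, self._bufs[index]

    def release(self, index):
        """Return a buffer to the pool so the next connection can reuse it."""
        self._free.append(index)


pool = FixedPool(size=2, buf_len=1024)
first = pool.acquire()
second = pool.acquire()
third = pool.acquire()                        # pool is empty at this point
pool.release(first[0])
reused = pool.acquire()                       # same buffer object as `first`
```

The trade-off is a hard upper limit on concurrent connections, fixed when the pool is created, in exchange for no allocation or fragmentation while serving requests.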

      “I agree … most of the headers are probably not necessary … I unfortunately cannot show you an example … where they are utilizing the http headers in AJAX requests”: Lucas, I think you’ve proved my point here very nicely that headers often aren’t necessary 🙂 If you, with your one year experience of writing high performance HTTP header parsing code, can’t think of why the headers would be useful for AJAX requests then I rest my case 🙂 The other thing is that nobody is saying here that this HTTP server or that HTTP server is good for doing everything. Usually, you pick the right tool for the job. It’s not unusual for websites and on-line services to use a variety of different web servers and/or technologies for different tasks, for example, NGINX is very well known for delivering static content very quickly and it’s not uncommon to e.g. speed up an Apache-based web-site by getting NGINX to deliver the static content, etc.

      “node.js is such a dog in performance”: I disagree. Performance is a relative thing. Compared to Apache then node.js is a race horse. You have to pick the right tool for the right job. Why is node successful? Why is Java so successful? Probably the main reason is that there are so many Java programmers available. If you’re the boss then why not hire Java programmers who are two a penny rather than advertise for Erlang programmers and get no responses for your advert?! It’s the same principle with node. Lots of developers already know Javascript so node is a great choice and the performance isn’t bad for many tasks. I agree with all the praise and positive statements you have to say about node. But that’s not to say that it’s the ultimate development tool. I don’t think we’re going to see any kernels being written in Javascript soon. And I don’t think we’re going to see any ~ 1 billion user, Facebook-like cloud applications being written in Javascript any time soon. But there’s plenty of stuff you can do in between very effectively with node and Javascript. Having said all that, it’s all about the business model. AFAIK Facebook write their servers in a sub-set of PHP and then use a special tool to compile that PHP into C code which runs on their servers. So if you did try to re-write Facebook using node then what would happen. If you read my previous posts then what would happen is that you’d probably run out of memory per box before running out of performance. But with the right architecture then node will scale but maybe you’d need — let’s say — 10 times more boxes to run node Facebook on than the real Facebook. That’s a lot of extra boxes and expense. How are you going to compete with Facebook with the extra cost of 10 times as many servers? So what it comes down to in the end is not just the technology selection but the business model too. Let’s look at Salesforce. Now there’s a successful company virtually printing money. 

      They also do everything in the cloud. What sort of developers do they hire? Only Java developers. Could they be using an order of magnitude fewer servers if using different technology? Yes. Can they afford to use way more servers and still make lots and lots of money? Yes. It all comes down to the business model…

      “build a fully functional web server out of libsxe”: libsxe is not a web server but rather a well thought out library enabling the rapid coding of high performance network applications, e.g. a web server. I get requests all the time to build this that and the other network applications. Alas, there is only one of me 🙂 The intention of this blog is to educate readers about what’s possible and what’s not possible. The performance metrics set the stage for what is possible at best with current technology. But what happens is down to the imagination of the developer using libsxe in order to develop their particular application/functionality. In this respect then libsxe is similar to node. No matter what you develop with node then you know it’s never going to run faster than the metrics presented in this blog. It will only ever run slower as more functionality is coded in. It’s the same with libsxe. So should libsxe always be used instead of node? No. When should libsxe be considered instead of something like node or G-WAN or NGINX? When you want to save a lot of money by reducing the number of servers, e.g. if you know that you’ll need 100 servers with libsxe and 1000 servers with node then maybe it’s more cost effective to hire expensive C programmers than cheaper Javascript programmers than buy/rent the extra 900 servers? So where can libsxe go from here? The answer is probably into the kernel which is the current bottleneck. If libsxe interfaced directly with the NIC drivers and bypassed the kernel network stack then another order of magnitude performance increase could well be possible. That’s what I should be working on… not writing HTTP servers in libsxe 🙂 So I’m dreaming of 10 libsxe servers instead of 1000 node servers 🙂

  4. Gart Says:

    For you, G-WAN processed 162,597 req/s on an i7 (4 Cores)

    For Nicolas Bonvin, G-WAN processed 143,000 req/s on an i3 (2 Cores)

    For me, G-WAN processes 140,000 req/s PER CPU CORE (on a 6-Core).

    Something is obviously wrong with your test.

    • simonhf Says:

      Hi Gart, Thanks for the message. What do you think is wrong with the test? Too many req/s or too few? I literally just downloaded G-WAN and ran it. I was trying to find a way to tell G-WAN how many cores to use but I couldn’t find a way after some simple searches. It would be nice to do some tests increasing the number of cores with each run to see how G-WAN scales. Assuming that it’s not possible to specify the number of cores for G-WAN to use then maybe a solution is to run G-WAN in a VM and specify the number of cores of the VM to use.

      • Gart Says:

        Hi Simon, To use a given count of CPU Cores use:

        sudo ./gwan -w 4

        That’s documented in the command line help: “./gwan -h”
        (this will work much better than a VM).

        On my machine (a Xeon with 6 Cores) I get:

        “sudo ./gwan -w 2” gives 450,000 req/sec.
        “sudo ./gwan -w 4” gives 630,000 req/sec.
        “sudo ./gwan -w 6” gives 810,000 req/sec.

        Don’t forget to change the weighttp command line accordingly (“-t 2”, “-t 4”). You should never use more workers than you have physical CPU Cores.

        BTW, weighttp was made by Lighttpd’s team, not by G-WAN’s team (you may want to correct that in your article).
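
On the point above about counting physical rather than logical cores: on x86 Linux, /proc/cpuinfo lists one stanza per logical CPU, and the “physical id” and “core id” fields together identify the physical core behind it. A small sketch of counting both follows; `count_cores` is a hypothetical helper, and the approach assumes those two fields are present, which is not the case on every platform or VM.

```python
def count_cores(cpuinfo_text):
    """Return (logical, physical) CPU counts from /proc/cpuinfo-style text."""
    logical = 0
    physical = set()
    package = core = None
    # The extra "" guarantees the final stanza is closed by a blank line.
    for line in cpuinfo_text.splitlines() + [""]:
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "processor":
            logical += 1
        elif key == "physical id":
            package = value
        elif key == "core id":
            core = value
        elif not line.strip():
            # A blank line ends one logical CPU's stanza.
            if package is not None and core is not None:
                physical.add((package, core))
            package = core = None
    return logical, len(physical)
```

On a Linux box this could be called as `count_cores(open("/proc/cpuinfo").read())`; with hyper-threading enabled the first number would typically be twice the second.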

      • simonhf Says:

        Thanks for the tip about the number of threads. Not sure how I over-looked that! When using the -w option then G-WAN doesn’t tell me that it’s actually using a different number of threads when it starts up… so I’m just assuming that -w is working. So here are the results on my i7:

        weighttp -n 10000000 -c 1000 -t 1 -k "http://127.0.0.1:8080/100.html" against "sudo ./gwan -w 1" gives 110,232 req/sec.
        weighttp -n 10000000 -c 1000 -t 2 -k "http://127.0.0.1:8080/100.html" against "sudo ./gwan -w 2" gives 173,047 req/sec.
        weighttp -n 10000000 -c 1000 -t 3 -k "http://127.0.0.1:8080/100.html" against "sudo ./gwan -w 3" gives 165,863 req/sec.
        weighttp -n 10000000 -c 1000 -t 4 -k "http://127.0.0.1:8080/100.html" against "sudo ./gwan -w 4" gives 170,386 req/sec.

        I am wondering about either G-WAN or weighttp. Normally the performance tests worked but one time weighttp displayed ‘progress: 100% done’ and then just hung without showing any performance info. This only happened once.

        I must agree with you that I don’t think G-WAN is performing at its best. Why? When I look at total CPU activity of all cores during the tests then at any point there is never more than 30% *total* CPU in use including both G-WAN and weighttp. So something isn’t working properly. However, when I run the same weighttp with my libsxe-based daemon then I get 90+% CPU usage. And when you look at the G-WAN figures above then the numbers aren’t scaling properly as the number of threads increases. This suggests to me that delivering G-WAN as a pre-compiled binary might be the cause of the problem. Why? Maybe G-WAN compilation is so optimized that it works very well on certain types of CPUs but not others? Which would mean that the G-WAN author should publish details about the CPU that it’s compiled and tested for.

        I’m also a bit confused about your Xeon processor test results with different numbers of threads. Your figures show that G-WAN does not scale linearly as the number of threads increases. The req/sec/thread counts are 225k, 157k, & 135k for 2, 4, & 6 threads. This suggests that there must be some sort of locking going on causing contention between the threads. But where would locking be necessary for such a simple performance test? The figures also suggest that the dream presented on the G-WAN website, that G-WAN will scale to even more cores, is not a reality 😦

        So we have two potential problems with G-WAN: (a) G-WAN’s performance seems highly tuned to particular processors but it’s supplied as a single binary executable, and (b) G-WAN doesn’t scale linearly as the number of cores increases, even with the simplest of performance tests.
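
The per-thread figures quoted in this comment are just each total divided by its worker count. A short sketch that reproduces them for both machines, using only the numbers posted in this thread (`per_worker` is a hypothetical helper name):

```python
def per_worker(results):
    """Map worker count -> req/s per worker, given {workers: total req/s}."""
    return {workers: total / workers for workers, total in results.items()}


xeon = {2: 450_000, 4: 630_000, 6: 810_000}              # Gart's 6-core Xeon
i7 = {1: 110_232, 2: 173_047, 3: 165_863, 4: 170_386}    # the i7 runs above

for name, results in (("Xeon", xeon), ("i7", i7)):
    for workers, rate in per_worker(results).items():
        print(f"{name} -w {workers}: {rate:,.0f} req/s per worker")
```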

  5. Gart Says:

    “G-WAN doesn’t tell me that it’s actually using a different number of threads”

    It tells you in the /gwan/logs/gwan.log file.

    Your results are so low compared to what everybody else is getting that this discussion becomes technically irrelevant:

    Nicolas Bonvin on an i3 CPU (2 Cores) matched your results on an i7 (4 Cores).

    According to the G-WAN forum, the program was compiled without a target CPU (no specific instruction set) so this is not the problem.

    Stating that this server does not scale (on the basis of a clearly broken test that contradicts all other tests) is not explaining anything.

    Maybe you could publish the gwan.log files generated with “-w 2” and “-w 4” to let people spot the cause of the problem?

    • simonhf Says:

      Thanks for the hint about the gwan.log file. That’s peculiar then that G-WAN seemingly arbitrarily outputs some messages to stdout/stderr and some to the log file. I had a look in the log file and there are no error or warning messages.

      FYI I don’t have access to the G-WAN forum. I think it’s only for paying customers, or?

      To clarify: I didn’t say that the server doesn’t scale. And I didn’t comment on the scaling based on *my* test results. My point is that G-WAN doesn’t scale *linearly*. That means that the fewer threads you give it, the more requests/second/thread are possible. Or put another way, the more threads you give G-WAN, the fewer requests/second/thread are possible. And this is based upon *your* results and not mine! 🙂

      • Gart Says:

        Simon,

        You wrote:

        “My point is that G-WAN doesn’t scale linearly” … “And this is based upon *your* results”.

        This chart might be more explicit for you to visualize the numbers:

        Linearly means at a constant rate (the definition of the equation of a line). And this is what this chart is showing.

        You wrote:

        “I don’t have access to the G-WAN forum. I think it’s only for paying customers, or?”

        I had no trouble accessing this forum, and Google can access it too.

        I was suggesting that you post your gwan.log files, not for you to attempt to interpret them, but for others to read them in order to help you find what was done wrong.

      • simonhf Says:

        Hi Gart, Thanks for the message and the link to the graph. Looks like you have close to exactly the same Xeon box as the author who created the graph! I find the graph linear but strange. Why isn’t the line at 45 degrees? Why isn’t 4 CPU cores double the speed of 2 CPU cores? So I guess you’re right that it is increasing linearly but what I expressed poorly is that I expected a 45 degree linear graph! Otherwise it would be way better to have 3x 2 core boxes (== 3x 450k req/sec) than 1x 6 core box (== 1x 810k req/sec), or? I actually have a 12 core Xeon box… I’ll try to find the time to re-run the test on that box in order to find out if CPU utilization is any better off the i7.

        Regarding the G-WAN forum: It doesn’t seem possible to automatically make an account to access the forum. And when you look at the messages without an account then the last message is from November 2011. It’s as if the forum was open until November 2011 and then was made closed (only to be accessed by G-WAN customers?) for all messages after that point, or?

        Regarding the G-WAN log file: There’s only a few different line types in there. Not much to interpret I’m afraid. Mostly about the number of cores and modules loaded. Almost the same stuff output to stdout/stderr but a bit more wordy.
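
The “linear but not 45 degrees” disagreement above comes down to linear versus proportional scaling. Using only the three Xeon figures posted in this thread, the points sit on a straight line with a non-zero intercept, so throughput grows at a constant rate per added core yet does not double when the core count doubles. A sketch of that arithmetic (variable and function names are mine):

```python
cores = [2, 4, 6]
rate = [450_000, 630_000, 810_000]            # req/s from the Xeon runs

# Straight line through the first and last points.
slope = (rate[-1] - rate[0]) / (cores[-1] - cores[0])
intercept = rate[0] - slope * cores[0]


def predicted(n):
    """Throughput the fitted line gives for n cores."""
    return intercept + slope * n


def efficiency(n, actual):
    """Actual throughput relative to scaling the 2-core figure proportionally."""
    proportional = rate[0] / cores[0] * n
    return actual / proportional


print(f"slope: {slope:,.0f} req/s per extra core, intercept: {intercept:,.0f} req/s")
for n, actual in zip(cores, rate):
    print(f"{n} cores: line predicts {predicted(n):,.0f}, "
          f"efficiency vs 2-core baseline {efficiency(n, actual):.0%}")
```

Three data points are too few to say anything about behaviour beyond 6 cores; the sketch only shows that both readings of these particular numbers are consistent with each other.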

  6. Gart Says:

    Hi Simon,

    You wrote:

    “you have the same Xeon box as the author who created the graph!”

    Unlike the author of the graph (who documented it), I did not use 10 rounds, hence the difference.

    But Mac Pro units do not venture into that much fantasy. I guess Apple’s success does not come only from developer sales… so C users are most probably in good company with less technical users.

    “Why isn’t the graph line at 45 degrees?”

    If you shorten the horizontal axis (or add more Cores on the axis) then you will get a 45 degree slope. The graph slope is a cosmetic variable which is not tied to the actual axis numbers.

    “I actually have a 12 core Xeon box”

    It sounds like a better place to test SMP servers. Just make sure that you are always counting physical Cores, not logical Cores (provided by hyper-threading).

    Regarding the forum, new posts have been limited after some found the need to create dozens of new accounts every day to massively publish junk posts. Sadly for all the others, a few are willing to spoil the game.

    About publishing the log files: that’s the only way for you to show that things were done correctly. If you don’t publish them then the doubt will persist that a mistake prevented the server from running normally. This is why configuration files are published in all serious server reviews.

  7. HighPerfLover Says:

    Hi Gart, you’re dodging Simon’s question by pointing to G-WAN’s benchmark. Simon analyzed the strange reqs/sec/thread figures based on your own numbers, which always decrease very steeply. If you cannot defend your numbers and just point to the official G-WAN site, it would be better not to release your numbers.

    Btw, what Simon experienced also happens for me: sometimes weighttp stops responding for a very long time when benchmarking G-WAN. Yes, G-WAN is fast, but libsxe gives the FOSS movement an alternative, and G-WAN could also benefit from FOSS by analyzing the nginx or libsxe source code … cheers.

    I agree with Simon: from now on, web servers will only deal with static resources, and for the application side (ajax / websocket) parsing the headers is an absolute waste of our money, since we control the client. Libsxe + Lua rocks!

