Hi again,
Do you have any performance benchmarks for Ice? Of course, CORBA and Ice are not the same thing, but I believe the idea and purpose are the same. So, have you made any performance comparisons with different ORBs?

Thanks!
Ivan
|
|||||
|
I just did a little simple testing on a relatively fast Windows XP machine -- an Athlon 2800+ with 1 GB of RAM.
I ran the Ice 'latency' test (in demos\ice\latency): 100000 "pings" (round-trip synchronous invocations) took 7359 ms -- roughly 13600 round-trip invocations per second.

I also tested TAO and a commercial ORB. TAO did 100000 "pings" in 9830 ms -- roughly 10170 round-trip invocations/sec. The commercial ORB did 100000 "pings" in 9050 ms -- roughly 11050 round-trip invocations/sec.

So at present Ice looks to be about 20% faster than these ORBs.
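For anyone who wants to check the arithmetic, here is a minimal sketch that turns the elapsed times above into rates and percentages. The function names are made up for illustration; only the ping counts and millisecond figures come from the post.

```cpp
#include <cassert>
#include <cstdio>

// Round-trip invocations per second for `pings` calls completed in `elapsed_ms`.
double invocations_per_second(long pings, double elapsed_ms) {
    assert(pings > 0 && elapsed_ms > 0.0);
    return pings / (elapsed_ms / 1000.0);
}

// How much less wall-clock time run A needed than run B for the same work, in percent.
double percent_less_time(double a_ms, double b_ms) {
    assert(a_ms > 0.0 && b_ms > 0.0);
    return (1.0 - a_ms / b_ms) * 100.0;
}

// How many more invocations per second run A achieved than run B, in percent.
double percent_more_throughput(double a_ms, double b_ms) {
    assert(a_ms > 0.0 && b_ms > 0.0);
    return (b_ms / a_ms - 1.0) * 100.0;
}

// Prints one line of the comparison; `name` is just a label.
void print_rate(const char* name, long pings, double elapsed_ms) {
    std::printf("%-16s %8.0f round trips/sec\n", name,
                invocations_per_second(pings, elapsed_ms));
}
```

With the figures above, Ice needed about 25% less time than TAO and about 19% less than the commercial ORB, which is the same as roughly 34% and 23% more invocations per second. "About 20% faster" therefore depends on which of the two ratios you quote.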
|
|||||
|
I think that's pretty accurate ;-)
TAO is reasonably fast as far as ORBs go, but I think people frequently confuse "real-time" with "fast." In fact, real-time doesn't necessarily mean fast at all -- it's more about predictability (which is a dodgy subject when you're dealing with networking, but that's another matter). I don't have a Ph.D. in real-time systems research, so I'm going to stop commenting on this before I get killed in a discussion on it!

IMO most people want a product that is fast and reliable more than one that is predictable, so Ice has a leg up in this regard. Great work!
|
|||||
|
comparison of TAO 1.3.1 and Ice 1.0.1
I have been interested in performance issues for many years.
Here is a quick "look/see" at Ice's performance compared to TAO. For now I consider these results preliminary. (I was trying to get this done in a hurry.) However, I am reasonably confident that I have made no gross error.

The TAO tests use the following IDL file:

    interface Account {
        typedef sequence<octet> opayload;
        void othruput(in opayload p);
    };

I vary the size of the octet sequence from 4 bytes to 64k bytes. Mean round-trip latencies from clients to server are measured. (In fact I keep complete histograms; they can be reached by following the link below, then mousing over and clicking on the vertical renderings of the histograms.)

For Ice I used "demo/Ice/hello" as a prototype and ran the same tests with the following Slice code:

    sequence<byte> seqbyte;
    class Hello {
        nonmutating void sayHello();
        idempotent void shutdown();
        void thruput(seqbyte payload);
    };

I use only "thruput" in my Ice tests; "sayHello" is ignored.

The TAO results use the svc.conf file from performance-tests/Latency/Single_Threaded. (I have used this in all my past TAO tests.) The file is shown below for the record. I don't know if Ice permits similar optimizations. When the Ice tests are running, "top" shows 2 threads in the client and many threads in the server (see the captured output below). It is likely that some optimizations are possible with Ice.

The most intriguing thing about Ice (based on reading 20 of the 758 pages of the documentation) would have to be the architectural issues; average performance is probably a wash. (In our DoD applications we tend to care about real-time issues, hence our interest in RT CORBA.)

I will post results using "long" and "struct" rather than bytes later in the week.

The attached graphic (.png) file shows the curves. If you follow the complicated link below (you need to cut and paste the entire thing; be careful of line breaks etc.) you should be able to see the small subset of results from my website that are relevant.

http://www.atl.external.lmco.com/pro...i?filter=smp.*(tao.*(1.2.2$|1.3.1$)|Ice)

The restricted set consists of TAO 1.2.2, TAO 1.3.1, and Ice 1.0.1. I include TAO 1.2.2 because the TAO 1.3.1 results I have are a bit slower than TAO 1.2.2.

http://www.atl.external.lmco.com/pro..._to_misty.html
and
http://www.atl.external.lmco.com/pro..._to_misty.html

However, the attached graphic shows the mean values from these two tests overlaid. The full website is at:

http://www.atl.external.lmco.com/projects/QoS

The "MW_Comparator" shows the entire collection of results we have (it includes other ORBs, such as Mico, ORBexpress, OpenORB, JacORB, the JDK built-in ORB, RMI, RMI-IIOP, some CCM and EJB results, some SCIOP, etc.)

Regards,
Gautam

    # svc.conf file used in TAO tests.
    # $Id: svc.conf,v 1.2 2001/08/15 19:28:42 bala Exp $
    #
    dynamic Advanced_Resource_Factory Service_Object * TAO_Strategies:_make_TAO_Advanced_Resource_Factory () "-ORBresources global -ORBReactorMaskSignals 0 -ORBInputCDRAllocator null -ORBReactorType select_st -ORBConnectionCacheLock null"
    static Server_Strategy_Factory "-ORBPOALock null -ORBAllowReactivationOfSystemids 0"
    static Client_Strategy_Factory "-ORBTransportMuxStrategy EXCLUSIVE -ORBProfileLock null -ORBClientConnectionHandler RW"

Output of "top" when Ice tests are running:

      PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
    15765 gthaker   25   0  5476 5476  4768 R    32.2  1.0   5:30 client
    15767 gthaker   15   0  5476 5476  4768 S    18.5  1.0   3:25 client
    15762 gthaker   15   0  5416 5416  4748 S     6.7  1.0   0:59 server
    15756 gthaker   15   0  5416 5416  4748 S     6.5  1.0   0:58 server
    15757 gthaker   15   0  5416 5416  4748 S     5.9  1.0   0:59 server
    15764 gthaker   15   0  5416 5416  4748 S     5.9  1.0   0:59 server
    15763 gthaker   15   0  5416 5416  4748 S     5.5  1.0   0:59 server
    15759 gthaker   15   0  5416 5416  4748 S     5.3  1.0   1:00 server
    15760 gthaker   15   0  5416 5416  4748 S     5.3  1.0   0:59 server
    15761 gthaker   15   0  5416 5416  4748 S     4.9  1.0   0:59 server
    15755 gthaker   15   0  5416 5416  4748 S     4.1  1.0   0:57 server
    15758 gthaker   15   0  5416 5416  4748 S     4.1  1.0   0:57 server
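For readers who want to try a similar measurement, the loop described in this post (vary the payload size, time each synchronous round trip, keep a histogram and a mean) might look roughly like the sketch below. It is not the code behind the posted results. The doubling of sizes between 4 bytes and 64k, the 10-microsecond bucket width, and every name here are assumptions; `call` is a placeholder for a blocking invocation such as `thruput` or `othruput`.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <map>
#include <vector>

using Payload = std::vector<unsigned char>;

struct LatencyStats {
    std::size_t samples = 0;
    double mean_us = 0.0;
    std::map<long, std::size_t> histogram;  // bucket lower bound (us) -> sample count
};

// Payload sizes from 4 bytes to 64k bytes. Doubling each step is an assumption.
std::vector<std::size_t> payload_sizes() {
    std::vector<std::size_t> sizes;
    for (std::size_t n = 4; n <= 64 * 1024; n *= 2) sizes.push_back(n);
    return sizes;
}

// Times `iterations` synchronous calls with a payload of `payload_bytes` bytes.
LatencyStats measure(const std::function<void(const Payload&)>& call,
                     std::size_t payload_bytes, std::size_t iterations,
                     long bucket_us = 10) {
    assert(iterations > 0 && bucket_us > 0);
    const Payload payload(payload_bytes, 0);
    LatencyStats stats;
    double total_us = 0.0;
    for (std::size_t i = 0; i < iterations; ++i) {
        const auto start = std::chrono::steady_clock::now();
        call(payload);  // must not return before the reply has arrived
        const auto stop = std::chrono::steady_clock::now();
        const double us =
            std::chrono::duration<double, std::micro>(stop - start).count();
        total_us += us;
        const long bucket = static_cast<long>(us / bucket_us) * bucket_us;
        ++stats.histogram[bucket];
    }
    stats.samples = iterations;
    stats.mean_us = total_us / static_cast<double>(iterations);
    return stats;
}
```

A driver would loop over `payload_sizes()` and pass a lambda that forwards the payload to the proxy under test. Keeping the whole histogram rather than only the mean makes outliers visible, which matters more for predictability than for average speed.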
|
||||||
|
Re: comparison of TAO 1.3.1 and Ice 1.0.1
Thanks for making this effort! It will be interesting to see more detailed results (and it's nice to have them produced by someone other than ourselves, so we can legitimately claim that we didn't massage the results -- not that we ever would, of course).

BTW -- I'd like to point out that we have done essentially no performance tuning for Ice so far, so there is at least some potential for speeding things up a bit more. However, to be honest, things are so simple already and the architecture is so clean that I don't expect spectacular improvements. (For spectacular improvements, we'd have to have a pretty bad architecture to start with; but, of course, the architecture is excellent already.)

Cheers,
Michi.
|
||||||
|
Thanks a lot for the performance tests. A few thoughts and comments:
I think it is important to use equivalent concurrency models in the comparison. Apparently you used a single-threaded version of TAO and compared it against a multi-threaded version of Ice. Single-threaded middleware, everything else being equal, is always faster than multi-threaded middleware when it comes to non-concurrent performance tests. That's because you don't have thread context switches, and you can also avoid mutex locks.

Even among multi-threaded concurrency models, there are differences. For example, with the Ice design you can have nested method calls, because it uses a receiver thread on the client side. If you didn't use a receiver thread on the client side, no nested calls would be possible, but the performance would be higher because there is less thread context switching.

The threads you see for Ice are as follows:

Client: The main thread, and one thread to receive responses from the server.

Server: The main thread (which is dormant after initialization, until waitForShutdown() returns), and the 10 threads from the thread pool to dispatch requests concurrently. (10 is just the default.)

The transfer of large amounts of data is, of course, unaffected by the concurrency model. As it looks, our code to handle large byte sequences is sub-optimal. That's probably because we use std::copy in our code, and naively assumed that it would use memcpy internally whenever possible. I guess we were wrong with this assumption, and we will modify the code to use memcpy wherever possible.
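To make the std::copy versus memcpy point concrete, here is a minimal sketch of two ways to append a byte sequence to a marshalling buffer, plus a crude timer. This is not Ice's actual marshalling code, and all names are hypothetical. Whether std::copy is turned into a block move depends on the standard library and on the iterator types involved, so the only reliable answer is to measure on your own toolchain.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstring>
#include <vector>

using Byte = unsigned char;

// Appends `payload` to `out` with std::copy over iterators.
void append_with_std_copy(std::vector<Byte>& out, const std::vector<Byte>& payload) {
    const std::size_t old_size = out.size();
    out.resize(old_size + payload.size());
    std::copy(payload.begin(), payload.end(),
              out.begin() + static_cast<std::ptrdiff_t>(old_size));
}

// Appends `payload` to `out` with a single memcpy over raw pointers.
void append_with_memcpy(std::vector<Byte>& out, const std::vector<Byte>& payload) {
    if (payload.empty()) return;  // avoid handing memcpy a possibly null pointer
    const std::size_t old_size = out.size();
    out.resize(old_size + payload.size());
    std::memcpy(out.data() + old_size, payload.data(), payload.size());
}

// Runs `append` on a fresh buffer `repetitions` times and returns elapsed seconds.
// The resize cost is the same for both variants, so any difference is the copy.
template <typename Append>
double seconds_for(Append append, const std::vector<Byte>& payload,
                   int repetitions, std::size_t& bytes_copied) {
    assert(repetitions > 0);
    bytes_copied = 0;
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < repetitions; ++i) {
        std::vector<Byte> buffer;
        append(buffer, payload);
        bytes_copied += buffer.size();  // use the result so the work is not discarded
    }
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}
```

Calling `seconds_for(append_with_std_copy, payload, n, copied)` and `seconds_for(append_with_memcpy, payload, n, copied)` with a 64k payload gives a first impression. If the two times are close, the library is already doing the block move for you.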
|
|||||
|
Hi,
Thanks for your comments and explanations. Ice is new to me, so it is always possible that I am not doing something correctly. I will send the code out later so it can be quickly looked over, but I started with "hello" and have kept things simple.

Also, I agree about the need to compare similar concurrency models. I am not sure whether Ice can be configured at run time like TAO can. Since the number of test combinations is very large, I tend to test "the best that an ORB can do for the simplest of tests". Basically the test measures the general "heaviness" of an ORB, and at times it shows some divergent things like large message size costs. Also, we start with something simple like this and add all types of host- and network-side interference to test QoS capabilities.

I had a couple of observations. The mapping of Slice to C++ is no doubt better than CORBA's mapping. I like the fact that the STL is used. Why doesn't the OMG do a second mapping? Because it is CORBA, the original and new mappings would even interoperate.

Finally, yes, CORBA is too complex to use. Perhaps this CCM stuff, with the right tools for assembly and deployment, will make things easier. Also, CORBA does have the feeling of "design by committee". There are too many standards, especially in the area of design, analysis, UML, etc. for real-time systems. Hopefully in time things will sort themselves out.

Gautam
|
||||||
|
I believe your modified "hello" code is correct. There is not much that can be done differently for the byte sequence test.

Ice currently has only one concurrency model, the thread pool (both for the server and the client side). We might consider other, simpler concurrency models if there is demand for this.

The differences in concurrency models can be drastic. I worked in the past both on an ultra-high-speed ORB (faster than Ice), and a regular ORB (slower than Ice). The high-speed ORB used a much simpler concurrency model, and therefore came close to raw socket speed. But there is no way to achieve the same with a more elaborate concurrency model like the one in Ice.

In practice, the simple concurrency models are of limited use. They are fine for high-speed, simple request-response systems. But as soon as you have more complex setups, with nesting and parallel processing, they are not usable anymore.

The C++ mapping in CORBA is a sad story. I don't know how it came into existence (at that time, I was not at the OMG), but I know that for whatever reasons (political, mostly), it was not possible to get the OMG members to start work on a new, improved mapping.

Regarding CCM: At present, this is the realm of research projects only. AFAIK no ORB vendor offers CCM, and I believe no ORB vendor ever will. It's sad, but I believe nobody is really interested in pushing CORBA anymore, including ORB vendors, and, even more sad, including the OMG. They prefer to work on stuff like MDA ("Model Driven Architecture"), which is IMO a complete waste of time.
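The cost of a more elaborate concurrency model can be illustrated without any network at all. The sketch below is a toy, not how Ice or any ORB is implemented. It compares a call that stays in the caller's thread with one where a separate "receiver" thread produces the reply and hands it back through a mutex and a condition variable. All names are hypothetical, and the handoff class supports only one caller at a time.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Stand-in for the remote operation: the reply is just the request.
int echo(int request) { return request; }

// Simple model: the caller does all the work in its own thread.
int invoke_direct(int request) { return echo(request); }

// Receiver-thread model: the caller blocks while another thread produces the
// reply. Each call costs at least two thread wake-ups plus mutex traffic.
class ReceiverThreadClient {
public:
    ReceiverThreadClient() : worker_([this] { run(); }) {}
    ~ReceiverThreadClient() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stop_ = true;
        }
        cv_.notify_all();
        worker_.join();
    }
    int invoke(int request) {
        std::unique_lock<std::mutex> lock(mutex_);
        request_ = request;
        has_request_ = true;
        has_reply_ = false;
        cv_.notify_all();
        cv_.wait(lock, [this] { return has_reply_; });
        return reply_;
    }

private:
    void run() {
        std::unique_lock<std::mutex> lock(mutex_);
        for (;;) {
            cv_.wait(lock, [this] { return has_request_ || stop_; });
            if (stop_) return;
            reply_ = echo(request_);
            has_request_ = false;
            has_reply_ = true;
            cv_.notify_all();
        }
    }
    std::mutex mutex_;
    std::condition_variable cv_;
    bool has_request_ = false, has_reply_ = false, stop_ = false;
    int request_ = 0, reply_ = 0;
    std::thread worker_;  // declared last so it starts after the other members exist
};

// Average microseconds per call for any callable that takes and returns an int.
template <typename Invoke>
double microseconds_per_call(Invoke invoke, int calls) {
    assert(calls > 0);
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < calls; ++i) {
        const int reply = invoke(i);
        assert(reply == i);
        (void)reply;
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::micro>(stop - start).count() / calls;
}
```

Comparing `microseconds_per_call(invoke_direct, n)` with the same measurement over `client.invoke` shows the per-call price of the handoff on a given machine. What the handoff buys, as described above, is that the calling thread is not the one stuck reading the connection, which is what makes nested calls possible.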
|
|||||
|
"struct" perf. measurements added
I had a longer version of this post typed up but Mozilla 1.3b crashed on me so I will try it again.
I added "struct" results to my previous measurements, which were based only on "octet" results. I also show the results for the same struct (see the .ice file listing below) being sent around with ORBexpress and with TAO 1.2.2. Both of these I had, as usual, configured to provide the maximum performance possible. In general this means running with reduced threading. This optimization yields about a factor of 2 (or less).

The attached graphic shows that, I believe, there is probably some low-hanging fruit in the way structs are shipped around in Ice. Ice is almost one order of magnitude slower for large message sizes. Factors of 2 are not so big a deal; a factor of 10 might be worth some attention. Probably some simple improvements might win back all of the difference.

The Ice file is:

    sequence<byte> seqbyte;
    struct structPayload {
        int intfld;      // 4 bytes
        seqbyte b8;      // need to be sure this is 8 bytes,
        float floatfld;  // 4 bytes
    };
    sequence<structPayload> seqstruct;
    class Hello {
        nonmutating void sayHello();
        idempotent void shutdown();
        void thruput(seqbyte payload);
        void structThruput(seqstruct payload);
    };

As usual, the following link will show the subset of data from my QoS website that was used to produce this graph.

http://www.atl.external.lmco.com/pro...?filter=smp/.*(Ice|ORBexpressRT|/tao/.*(octet$|struct$))

(If you take the trouble to follow this link, be sure it is used in its entirety and that it is not split across lines. It is also possible to use the main URL and follow down to "MW_Comparator".)

http://www.atl.external.lmco.com/projects/QoS/

Regards,
Gautam
|
||||||
|
Thanks for the info. We must definitely look into this.
However, for small messages, we cannot reproduce your results. We get lower latency in Ice compared to TAO for small messages. (See also CatOne's performance results.) Again, it is important to use the same concurrency models; otherwise latency comparisons are meaningless.

I'm also confused: Your first test report showed a similar latency for small messages for TAO and Ice, but your new test shows a much larger difference. Which test is right?

Finally, as for ORBexpress, either something is misconfigured with both Ice and TAO, or ORBexpress doesn't use TCP/IP. With the numbers from your graphics, ORBexpress would be more than 10 times faster in a latency test. This would be well beyond raw socket speed, meaning such speed is impossible with TCP/IP.
|
|||||
|
Marc,
First of all, I want to reiterate that all the data I have is online, so these graphs can be reproduced. What I mean is that one can regenerate different graphs showing different comparisons. There are so many different ways to look at the data. That said, I will try to address the points you make.

1) The purpose of my last post was to compare "struct" marshalling cost. So I used the TAO 1.2.2 results, for which I have both octet and struct data. I don't have TAO 1.3.1 struct results yet. TAO 1.3.1 is a bit slower than TAO 1.2.2. Thus, in the previous graph Ice 1.0.1 and TAO 1.3.1 were close at the low end. Now I show TAO 1.2.2 results, and those are a bit faster. (But as I have said, the first factor of 1.5-2 is not always that important.)

2) I don't know how you conclude from the graph that ORBexpress is more than 10 times faster than either TAO or Ice. The Y axis is indeed logscale, but there is not an entire factor of 10 difference between the curves.

From my website I generated a new plot comparing the time it takes for two processes to exchange octets of information. Fastest is shared memory (this is an SMP machine). It bypasses the network stacks, and at the low end it is as fast as two context switches. Next comes TCP/IP, after that ORBexpress, then TAO and then Ice. You can see this in the attached graphic. I hope this is clear.