  #1 (permalink)  
Old 02-20-2003
Ivan Ivan is offline
Registered User
 
 
Join Date: Feb 2003
Location: Helsinki, Finland
Posts: 15
Performance

Hi again,

Do you have any performance benchmarks for Ice?

Of course, CORBA and Ice are not the same thing, but I believe the idea and purpose are the same. So, have you made any performance comparisons with different ORBs?

Thanks!
Ivan
  #2 (permalink)  
Old 02-20-2003
marc marc is offline
ZeroC Staff
 
Name: Marc Laukien
Organization: ZeroC, Inc.
Project: The Internet Communications Engine
 
Join Date: Feb 2003
Location: Florida
Posts: 1,772
For the simple latency tests in demo/Ice/latency, I get the following for Ice for C++:

- Pentium 4, 1.8 GHz, RH8.0, optimized: 0.15ms per twoway call.

- Pentium 4, 2.4 GHz, Windows XP, optimized: 0.12ms per twoway call.

Ice for Java has roughly twice the latency of Ice for C++.

As for a comparison with CORBA ORBs, I don't think that there are any multi-purpose ORBs out there that can match this performance. Only specialized high-speed ORBs are faster, but these usually have a simpler threading model and, of course, far fewer features.

While I don't have any actual measurements for this, I believe that Ice is a lot faster than any CORBA ORB when it comes to request forwarding services, such as routers or event services. That's because Ice can forward requests as blobs, and does not have to unmarshal and remarshal Anys as in CORBA.
  #3 (permalink)  
Old 03-04-2003
CatOne CatOne is offline
Registered User
 
Name: Bill Lloyd
Organization: --
Project: --
 
Join Date: Feb 2003
Location: California
Posts: 18
I just did a little simple testing on a relatively fast Windows XP machine -- an Athlon 2800+ with 1 GB of RAM.

I ran the Ice 'latency' test (in demos\ice\latency):

100000 "pings" (round trip synchronous invocations) took 7359ms -- roughly 13600 round trip invocations per second.

I also tested TAO and another commercial ORB.

TAO did 100000 "pings" in 9830ms -- roughly 10170 round trip invocations/sec.

The commercial ORB did 100000 "pings" in 9050ms -- roughly 11050 round trip invocations/sec.

So at present Ice looks to be about 20% faster than these ORBs: it needed roughly 19% to 25% less time for the same 100000 round trips.
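The arithmetic behind these figures can be checked with a short script. This is only a sketch that reuses the three timings quoted in this post; the function and variable names are made up for illustration. It also shows that "how much faster" depends on whether you compare elapsed time or throughput.

```python
# Elapsed time in milliseconds for 100000 round trips, as reported in this post.
PINGS = 100_000
elapsed_ms = {"Ice": 7359, "TAO": 9830, "commercial ORB": 9050}


def invocations_per_second(pings: int, ms: float) -> float:
    """Round-trip invocations per second for `pings` calls taking `ms` milliseconds."""
    return pings / (ms / 1000.0)


def latency_ms(pings: int, ms: float) -> float:
    """Mean time per round trip in milliseconds."""
    return ms / pings


rates = {name: invocations_per_second(PINGS, ms) for name, ms in elapsed_ms.items()}
for name, rate in rates.items():
    per_call = latency_ms(PINGS, elapsed_ms[name])
    print(f"{name}: {rate:.0f} invocations/s, {per_call:.4f} ms per round trip")

# "Faster" can mean less elapsed time or more throughput; the two differ.
time_saved = {}
extra_throughput = {}
for other in ("TAO", "commercial ORB"):
    time_saved[other] = 1.0 - elapsed_ms["Ice"] / elapsed_ms[other]
    extra_throughput[other] = rates["Ice"] / rates[other] - 1.0
    print(
        f"Ice vs {other}: {time_saved[other]:.1%} less time, "
        f"{extra_throughput[other]:.1%} more invocations/s"
    )
```

Measured by elapsed time the gap is close to the 20% quoted above; measured by invocations per second it comes out somewhat larger, because the two ratios are reciprocals of each other.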
  #4 (permalink)  
Old 03-04-2003
marc marc is offline
ZeroC Staff
 
Thanks for the info!

So TAO might be real-time, but doesn't seem to be real-fast.
  #5 (permalink)  
Old 03-04-2003
CatOne CatOne is offline
Registered User
 
I think that's pretty accurate ;-)

TAO is reasonably fast as far as ORBs go, but I think people frequently confuse "real-time" with "fast." In fact real-time doesn't necessarily mean fast at all -- it's more about predictability (which, when you're dealing with networking, is a dodgy subject, but that's another matter). I don't have a Ph.D. in real-time systems research so I'm going to stop commenting on this matter before I get killed in a discussion on it!

IMO for most systems people want a product which is fast and reliable, more than one which is predictable, so Ice has a leg up in this regard. Great work!
  #6 (permalink)  
Old 03-05-2003
gthaker gthaker is offline
Registered User
 
Name: Gautam Thaker
Organization: Lockheed Martin Advanced Technology Labs
Project: Distributed, Real-time Systems
 
Join Date: Mar 2003
Location: New Jersey, United States
Posts: 11
comparison of TAO 1.3.1 and Ice 1.0.1

I have been interested in performance issues for many years.

Here is a quick "look/see" at Ice's performance when compared to TAO. For now I consider these preliminary. (I was trying to get this done in a hurry.) However, I am reasonably confident that I have no gross error.

TAO tests use the following IDL file:

interface Account {
    typedef sequence<octet> opayload;
    void othruput(in opayload p);
};

I vary the size of the octet sequence from 4 to 64k bytes. Mean roundtrip latencies from clients to server are measured. (In fact I keep complete histograms, and they can be reached by following the link below and mousing over and clicking on the vertical rendering of the histograms.)
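A measurement loop of this kind can be sketched as follows. This is not the harness used for these results: the power-of-two size steps, the 10-microsecond histogram buckets, and the `fake_invoke` stand-in for the remote call are all assumptions made for illustration. A real test would replace `fake_invoke` with the twoway invocation on the TAO or Ice proxy.

```python
import time
from collections import Counter
from statistics import mean


def payload_sizes(lo=4, hi=64 * 1024):
    """Payload sizes from lo to hi bytes inclusive, doubling each step (an assumption)."""
    sizes = []
    n = lo
    while n <= hi:
        sizes.append(n)
        n *= 2
    return sizes


def measure(call, payload, iterations=1000, bucket_us=10):
    """Time call(payload) repeatedly.

    Returns the mean latency in microseconds and a histogram that maps the
    lower edge of each bucket (in microseconds) to the number of samples in it.
    """
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        call(payload)
        samples.append((time.perf_counter() - start) * 1e6)
    histogram = Counter(int(s // bucket_us) * bucket_us for s in samples)
    return mean(samples), histogram


def fake_invoke(payload):
    """Hypothetical stand-in for a synchronous remote call carrying `payload`."""
    return len(payload)


results = {}
for size in payload_sizes():
    results[size] = measure(fake_invoke, bytes(size), iterations=200)

for size, (mean_us, histogram) in results.items():
    print(f"{size:6d} bytes: mean {mean_us:.2f} us, {len(histogram)} histogram bucket(s)")
```

Keeping the whole histogram rather than only the mean matters for real-time work, where the tail of the distribution is often more interesting than the average.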

For Ice I used the "demo/Ice/hello" as a prototype and did the same tests in Ice with the following Slice code:

sequence<byte> seqbyte;

class Hello
{
    nonmutating void sayHello();
    idempotent void shutdown();
    void thruput(seqbyte payload);
};

I just use "thruput" in my tests for Ice, "sayHello" is ignored.

The TAO results use the svc.conf file from performance-tests/Latency/Single_Threaded. (I have always used this in all my past TAO tests.) This svc.conf file is shown below for the record. I don't know if Ice permits similar optimizations. When the Ice tests are running, "top" shows 2 threads in the client and many threads in the server (see the captured output below). It is likely that some optimizations are possible with Ice. The most intriguing thing about Ice (based on reading 20 of the 758 pages of the documentation) would have to be its architecture; average performance is probably a wash. (In our DoD applications we tend to care about real-time issues, hence our interest in RT CORBA.)

I will post results of using "long" and "struct" rather than bytes later in the week.

The attached graphic (.png) file shows the curves. If you click on the complicated link below (you need to cut and paste the entire thing, be careful of line breaks etc.) you should be able to see the small subset of results from my website that are relevant.

http://www.atl.external.lmco.com/pro...i?filter=smp.*(tao.*(1.2.2$|1.3.1$)|Ice)

The restricted set consists of TAO 1.2.2, TAO 1.3.1, and Ice 1.0.1. I include TAO 1.2.2 because TAO 1.3.1 results I have are a bit slower than TAO 1.2.2.

http://www.atl.external.lmco.com/pro..._to_misty.html

and
http://www.atl.external.lmco.com/pro..._to_misty.html

However, the attached graphic is the mean values from these two tests overlaid.

The full website is at:

http://www.atl.external.lmco.com/projects/QoS

The "MW_Comparator" shows the entire collection of results we have (includes other ORBs, such as Mico, ORBExpress, OpenORB, JacORB, the JDK built-in ORB, RMI, RMI-IIOP, some CCM and EJB results, some SCIOP, etc.)

Regards,

Gautam

# svc.conf file used in TAO tests.

# $Id: svc.conf,v 1.2 2001/08/15 19:28:42 bala Exp $
#
dynamic Advanced_Resource_Factory Service_Object * TAO_Strategies:_make_TAO_Advanced_Resource_Factory () "-ORBresources global -ORBReactorMaskSignals 0 -ORBInputCDRAllocator null -ORBReactorType select_st -ORBConnectionCacheLock null"
static Server_Strategy_Factory "-ORBPOALock null -ORBAllowReactivationOfSystemids 0"
static Client_Strategy_Factory "-ORBTransportMuxStrategy EXCLUSIVE -ORBProfileLock null -ORBClientConnectionHandler RW"

Output of "top" when Ice tests are running:

  PID USER     PRI NI SIZE  RSS SHARE STAT %CPU %MEM TIME COMMAND
15765 gthaker   25  0 5476 5476  4768 R    32.2  1.0 5:30 client
15767 gthaker   15  0 5476 5476  4768 S    18.5  1.0 3:25 client
15762 gthaker   15  0 5416 5416  4748 S     6.7  1.0 0:59 server
15756 gthaker   15  0 5416 5416  4748 S     6.5  1.0 0:58 server
15757 gthaker   15  0 5416 5416  4748 S     5.9  1.0 0:59 server
15764 gthaker   15  0 5416 5416  4748 S     5.9  1.0 0:59 server
15763 gthaker   15  0 5416 5416  4748 S     5.5  1.0 0:59 server
15759 gthaker   15  0 5416 5416  4748 S     5.3  1.0 1:00 server
15760 gthaker   15  0 5416 5416  4748 S     5.3  1.0 0:59 server
15761 gthaker   15  0 5416 5416  4748 S     4.9  1.0 0:59 server
15755 gthaker   15  0 5416 5416  4748 S     4.1  1.0 0:57 server
15758 gthaker   15  0 5416 5416  4748 S     4.1  1.0 0:57 server
Attached Thumbnails
performance-gplot_886.png  
  #7 (permalink)  
Old 03-06-2003
michi michi is offline
ZeroC Staff
 
Name: Michi Henning
Organization: ZeroC
Project: Ice
 
Join Date: Feb 2003
Location: Brisbane, Australia
Posts: 889
Re: comparison of TAO 1.3.1 and Ice 1.0.1

Quote:
Originally posted by gthaker
I have been interested in performance issues for many years.

Here is a quick "look/see" at Ice's performance when compared to TAO. For now I consider these preliminary. (I was trying to get this done in a hurry.) However, I am reasonably confident that I have no gross error.
Hi Gautam,

Thanks for making this effort! It will be interesting to see more detailed results (and it's nice to have them produced by someone other than ourselves, so we can legitimately claim that we didn't massage the results -- not that we ever would, of course).

BTW -- I'd like to point out that we have done essentially no performance tuning for Ice so far, so there is at least some potential for speeding things up a bit more. However, to be honest, things are so simple already and the architecture is so clean that I don't expect spectacular improvements. (For spectacular improvements, we'd have to have a pretty bad architecture to start with to get the improvements from; but, of course, the architecture is excellent already.)

Cheers,

Michi.
  #8 (permalink)  
Old 03-06-2003
marc marc is offline
ZeroC Staff
 
Thanks a lot for the performance tests. A few thoughts and comments:

I think it is important to use equivalent concurrency models in the comparison. Apparently you used a single-threaded version of TAO and compared it against a multi-threaded version of Ice.

Single-threaded middleware, everything else being equal, is always faster than multi-threaded middleware when it comes to non-concurrent performance tests. That's because you don't have thread context switches, and you can also avoid mutex locks.

Even for multi-threaded concurrency models, there are differences. For example, with the Ice design, you can have nested method calls, because it uses a receiver thread for the client side. If you didn't use a receiver thread on the client side, nested calls would not be possible, but performance would be higher because there would be less thread context switching.

The threads you see for Ice are as follows:

Client: The main thread, and one thread to receive responses from the server.

Server: The main thread (which is dormant after initialization, until waitForShutdown() returns), and the 10 threads from the thread pool to dispatch requests concurrently. (10 is just the default.)
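To picture the server side, here is a minimal thread-pool sketch using only the Python standard library. It is not how Ice implements its thread pool; `dispatch` and the request numbering are hypothetical, and only the pool size of 10 is taken from the description above. It shows why a process listing reports about ten worker threads even when a single client sends requests one at a time.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

POOL_SIZE = 10  # default pool size mentioned above


def dispatch(request_id):
    """Pretend to handle one request and report which worker thread ran it."""
    return request_id, threading.current_thread().name


# The main thread only submits work; the pool threads do the dispatching.
with ThreadPoolExecutor(max_workers=POOL_SIZE, thread_name_prefix="dispatch") as pool:
    handled = list(pool.map(dispatch, range(100)))

workers_used = {name for _, name in handled}
print(f"{len(handled)} requests handled by {len(workers_used)} worker thread(s)")
```

Each hand-off from the submitting thread to a worker costs a context switch and some locking, which is the overhead a single-threaded configuration avoids in a non-concurrent latency test.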

The transfer of large amounts of data is, of course, unaffected by the concurrency model. As it looks, our code to handle large byte sequences is sub-optimal. That's probably because we use std::copy in our code, and naively assumed that it would use memcpy internally whenever possible. I guess we were wrong with this assumption, and we will modify the code to use memcpy wherever possible.
  #9 (permalink)  
Old 03-06-2003
gthaker gthaker is offline
Registered User
 
Hi,

Thanks for your comments and explanations. Ice is new to me, so it is always possible that I am not doing something correctly. I will send the code out later so it can be quickly looked over, but I started with "hello" and have kept things simple. Also, I agree about the need to compare similar concurrency models. I am not sure whether Ice can be configured at run time like TAO can.

Since the number of test combinations is very large, I tend to test "the best that an ORB can do for the simplest of tests". Basically the test measures the general "heaviness" of an ORB, and at times shows some divergent things like large message size costs. Also, we start with something simple like this and add all types of host and network side interference to test QoS capabilities.

I had a couple of observations. The mapping of Slice to C++ is no doubt better than CORBA's mapping. I like the fact that STL is used. Why doesn't the OMG do a second mapping? Because it is CORBA, the original and new mappings would even interoperate.

Finally, yes, CORBA is too complex to use. Perhaps this CCM stuff, with the right tools for assembly and deployment, will make things easier. Also, CORBA does have the feeling of "design by committee". There are too many standards, especially in the area of design, analysis, UML, etc. for real-time systems. Hopefully in time things will sort out.

Gautam
  #10 (permalink)  
Old 03-06-2003
marc marc is offline
ZeroC Staff
 
I believe your modified "hello" code is correct. There is not much that can be done differently for the byte sequence test.

Ice currently has only one concurrency model, the thread pool (both for the server and the client side). We might consider other, simpler concurrency models if there is demand for this.

The differences in concurrency models can be drastic. I worked in the past both on an ultra-high-speed ORB (faster than Ice), and a regular ORB (slower than Ice). The high-speed ORB used a much simpler concurrency model, and therefore came close to raw socket speed. But there is no way to achieve the same with a more elaborate concurrency model like the one in Ice.

In practice, the simple concurrency models are of limited use. They are fine for high-speed simple request-response systems. But as soon as you have more complex setups, with nesting and parallel processing, they are not usable anymore.

The C++ mapping in CORBA is a sad story. I don't know how it came into existence (at that time, I was not at the OMG), but I know that for whatever reasons (political, mostly), it was not possible to get the OMG members to start work on a new, improved mapping.

Regarding CCM: At present, this is the realm of research projects only. AFAIK no ORB vendor offers CCM, and I believe no ORB vendor ever will. It's sad, but I believe nobody is really interested in pushing CORBA anymore, including ORB vendors, and, even more sad, including the OMG. They prefer to work on stuff like MDA ("Model Driven Architecture"), which is IMO a complete waste of time.
  #11 (permalink)  
Old 03-14-2003
gthaker gthaker is offline
Registered User
 
"struct" perf. measurements added

I had a longer version of this post typed up but Mozilla 1.3b crashed on me so I will try it again.

I added "struct" results to my previous measurements that were just based on "octet" results. I also show the results for the same struct (see .ice file listing below) being sent around with ORBexpress and with TAO 1.2.2. Both of these last two I had, as usual, configured to provide the maximum performance possible. In general this means running with reduced threading. This optimization yields about a factor of 2 (or less).

The attached graphic shows, I believe, that there is probably some low-hanging fruit in the way structs are shipped around in Ice. Ice is almost one order of magnitude slower for large message sizes. Factors of 2 are not such a big deal, but a factor of 10 might be worth some attention. Some simple improvements would probably win back all of the difference.

The Ice file is:

sequence<byte> seqbyte;

struct structPayload {
    int intfld;     // 4 bytes
    seqbyte b8;     // need to be sure this is 8 bytes
    float floatfld; // 4 bytes
};

sequence<structPayload> seqstruct;

class Hello
{
    nonmutating void sayHello();
    idempotent void shutdown();
    void thruput(seqbyte payload);
    void structThruput(seqstruct payload);
};


As usual, the following link will show the subset of data from my QoS website that was used to produce this graph.

http://www.atl.external.lmco.com/pro...?filter=smp/.*(Ice|ORBexpressRT|/tao/.*(octet$|struct$))

(If you take the trouble to follow this link, be sure it is used in its entirety and that it is not split across lines. It is also possible to use the main URL and follow down to "MW_Comparator".)

http://www.atl.external.lmco.com/projects/QoS/

Regards,

Gautam
Attached Thumbnails
performance-gplot_1105.png  
  #12 (permalink)  
Old 03-14-2003
marc marc is offline
ZeroC Staff
 
Thanks for the performance tests.

Just to make sure, you compiled Ice with optimization, right?
  #13 (permalink)  
Old 03-14-2003
gthaker gthaker is offline
Registered User
 
Yes, in my file Make.rules I have:

OPTIMIZE = yes

Gautam
  #14 (permalink)  
Old 03-14-2003
marc marc is offline
ZeroC Staff
 
Thanks for the info. We must definitely look into this.

However, for small messages, we cannot reproduce your results. We get lower latency in Ice compared to TAO for small messages. (See also CatOne's performance results.) Again, it is important to use the same concurrency models, otherwise latency comparisons are meaningless.

I'm also confused: Your first test report showed a similar latency for small messages for TAO and Ice, but your new test shows a much larger difference. Which test is right?

Finally, as for ORBexpress, either something is misconfigured with both Ice and TAO, or ORBexpress doesn't use TCP/IP. With the numbers from your graphics, ORBexpress would be more than 10 times faster in a latency test. This would be well beyond raw socket speed, meaning such speed is impossible with TCP/IP.
  #15 (permalink)  
Old 03-14-2003
gthaker gthaker is offline
Registered User
 
Marc,

First of all, I want to reiterate that all the data I have is online, so these graphs can be reproduced. What I mean is that one can regenerate different graphs showing different comparisons. There are so many different ways to look at the data. That said, I will try to address the points you make.

1) The purpose of my last post was to compare "struct" marshalling cost, so I used the TAO 1.2.2 results, for which I have both octet and struct data. I don't have TAO 1.3.1 struct results yet. TAO 1.3.1 is a bit slower than TAO 1.2.2. Thus, in the previous graph Ice 1.0.1 and TAO 1.3.1 were close at the low end. Now I show TAO 1.2.2 results, and that is a bit faster. (But as I have said, the first factor of 1.5-2 is not always that important.)

2) I don't know how you conclude from the graph that ORBexpress is more than 10 times faster than either TAO or Ice. The Y axis is indeed log scale, but there is not an entire factor of 10 difference between the curves.

From my website I generated a new plot comparing the time it takes for two processes to exchange octets of information. Fastest is shared memory (this is an SMP machine). It bypasses the network stacks, and at the low end it is as fast as two context switches. Next comes TCP/IP, after that ORBexpress, then TAO, and then Ice. You can see this in the attached graphic.

I hope this is clear.
Attached Thumbnails
performance-gplot_1122.png  