Thread: ICE 3.4.0 / 3.4.1 bug (segmentation fault)

  1. #1
    PrzemekD, Registered User
    Name: Przemek Dlugosz
    Organization: INVENTWARE
    Project: Mainstream Project
    Join Date
    May 2010
    Posts
    7

    Hello Ice Team,

    I have probably found a bug in Ice 3.4.0 and 3.4.1 (other versions not tested). Under some circumstances (not so rare!) Ice does something very wrong and the program exits with a segmentation fault on Linux. It took me almost a week to convince myself that my own code is not responsible, so I created a simple program based on your example that also segfaults. Here are some details:
    - the program is based on the demo/minimal example
    - the program uses AMI on the client side and AMD on the server side
    - for the AMD calls I used CallQueue (http://www.zeroc.com/newsletter/issue13/qt2.zip), which in my humble opinion should also be well suited to server-side asynchronous calls
    - in its main loop the client does the following:
    + creates an Ice communicator
    + creates a Hello proxy with a low timeout (200 ms)
    + alternately sends a few thousand AMI sayHello calls and sleeps
    + invokes:
    communicator->shutdown();
    communicator->waitForShutdown();
    communicator->destroy();

    The client's main loop is repeated about 50 times. After a few iterations the client segfaults on my machine. Here are some other observations:
    - with a larger timeout value the segfault is less probable
    - with synchronous server dispatch (i.e. without AMD) the segfault is unlikely
    - the debugger shows that the error happens at ConnectionI.cpp line 591 (copy(p, p + sizeof(Int), os->b.begin() + headerSize)); I checked that os->b.begin() points to a NULL buffer. Until today I thought that some part of my code might be writing to memory where Ice objects are located, but I managed to isolate the problem

    I have put the code in the attachment. I assume the segfault is caused by a race condition or by a code path that is executed only under rare conditions, so it may not be easy to reproduce the error immediately on your side.

    Regardless of this, please check whether the code is valid and should not cause segfaults.

    Best regards
    Przemek
    Last edited by PrzemekD; 08-02-2010 at 05:23 AM.

  2. #2
    bernard, ZeroC Staff
    Name: Bernard Normier
    Organization: ZeroC, Inc.
    Project: Ice
    Join Date
    Feb 2003
    Location
    Palm Beach Gardens, FL
    Posts
    1,294
    Hi Przemek,

    I built your test with Ice 3.4.1 and Visual Studio 2008 and it ran fine - no crash. I just had to replace sleep(1) by Sleep(1000) and add the CallQueue.cpp/obj to the build.

    Maybe this is easier to reproduce on Linux. It would be helpful if you could review and improve your test case:

    - the client (which is supposed to crash)

    Do you really need to create and destroy all these communicators? Is this related or contributing to this crash? If not, you should create a single communicator, like most applications.

    You should also remove all the cookie-related code, as it doesn't seem relevant.

    Also, do you run this client with any configuration? Your test case didn't provide any configuration. By default, the client thread pool has just 1 thread, so there is not much concurrency.
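
For reference, the client thread pool size is set through configuration properties. A minimal sketch of a client configuration file that raises it; the file name config.client and the value 4 are assumptions chosen only for illustration:

```
# config.client (hypothetical file name)
# The client thread pool defaults to 1 thread; a larger pool lets
# several AMI callbacks run concurrently.
Ice.ThreadPool.Client.Size=4
Ice.ThreadPool.Client.SizeMax=4
```

Such a file could be loaded with ./client --Ice.Config=config.client, provided the client passes argc/argv to Ice::initialize.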

    - the server

    The client is totally unaware of the implementation of the server. The client has no idea if the server is using synchronous or asynchronous dispatch, or is written in C++ or Java. This implementation may only matter in terms of timing (when empty responses are sent back to the client).

    And if/when you get a crash, please capture and post the stack trace!

    Best regards,
    Bernard
    Bernard Normier
    ZeroC, Inc.

  3. #3
    PrzemekD, Registered User
    Hi Bernard,

    Sorry that I didn't provide details, here they are:

    "Maybe this is easier to reproduce on Linux"

    You may well be right: different thread timing, socket implementations etc. may matter. I didn't test it on Windows; I assumed that Ice would behave the same way on any supported OS. System resources such as CPU usage may also matter (during my tests my system was a little busy doing other things).

    "Do you really need to create and destroy all these communicators? Is this related or contributing to this crash? If not, you should create a single communicator, like most applications"

    This is not a good argument, if the bug is related to closing communicator you may not assume that Ice is allowed to crash the whole application. I also take it that you have confirmed that the code is valid and should not cause segfaults. If so, let's just look for the bug, because otherwise Ice's reliability is in serious question.
    Update: the code with one communicator also fails in the same way.

    "You should also remove all the cookie-related code, as it doesn't seem relevant."

    I'm currently testing this case and will send the results later if that helps.
    Update: the code without cookies also fails with the same stack trace.

    "Also, do you run this client with any configuration? Your test case didn't provide any configuration. By default, the client thread pool has just 1 thread, so there is not much concurrency."

    I run the client without any configuration. If I understand the Ice framework correctly, at least two threads are involved: the main thread and an Ice thread from the client thread pool. Two threads are quite enough for a race condition.

    "The client is totally unaware of the implementation of the server. The client has no idea if the server is using synchronous or asynchronous dispatch, or is written in C++ or Java. This implementation may only matter in terms of timing (when empty responses are sent back to the client)."

    I'm aware that the client knows nothing about the server implementation. I just don't know (and I'm not sure I would like to know) how the protocol works, when and whether acknowledgements are sent, etc. Maybe the client fails independently of the server implementation; let's just skip it.
    Update: the client also fails with a synchronous server.

    Here is a stack trace caught by gdb:

    (gdb) bt
    #0 memmove () at ../sysdeps/i386/i686/memmove.S:68
    #1 0x002d1f44 in std::__copy_move<false, true, std::random_access_iterator_tag>::__copy_m<unsigned char> (
    __first=0xb77e2ae8 "^\004", __last=0xb77e2aec "Tk\a\266\274\034?\220\033?\210\035?\364\217R" ,
    __result=0xe <Address 0xe out of bounds>) at /usr/include/c++/4.4/bits/stl_algobase.h:378
    #2 0x003209ea in std::__copy_move_a<false, unsigned char const*, unsigned char*> (
    __first=0xb77e2ae8 "^\004", __last=0xb77e2aec "Tk\a\266\274\034?\220\033?\210\035?\364\217R" ,
    __result=0xe <Address 0xe out of bounds>) at /usr/include/c++/4.4/bits/stl_algobase.h:397
    #3 0x0031ee8c in std::__copy_move_a2<false, unsigned char const*, unsigned char*> (
    __first=0xb77e2ae8 "^\004", __last=0xb77e2aec "Tk\a\266\274\034?\220\033?\210\035?\364\217R" ,
    __result=0xe <Address 0xe out of bounds>) at /usr/include/c++/4.4/bits/stl_algobase.h:436
    #4 0x0031cfda in std::copy<unsigned char const*, unsigned char*> (__first=0xb77e2ae8 "^\004",
    __last=0xb77e2aec "Tk\a\266\274\034?\220\033?\210\035?\364\217R" ,
    __result=0xe <Address 0xe out of bounds>) at /usr/include/c++/4.4/bits/stl_algobase.h:468
    #5 0x00310171 in Ice::ConnectionI::sendAsyncRequest (this=0xb619b778, out=..., compress=false,
    response=true) at ConnectionI.cpp:591
    #6 0x002f3a0c in IceInternal::ConnectRequestHandler::flushRequests (this=0xb61109d0)
    at ConnectRequestHandler.cpp:416
    #7 0x002f3497 in IceInternal::ConnectRequestHandler::setConnection (this=0xb61109d0, connection=...,
    compress=false) at ConnectRequestHandler.cpp:321
    #8 0x00407a9f in setConnection (this=0xb6129aa0, connection=..., compress=false) at Reference.cpp:1711
    #9 0x002fef76 in IceInternal::OutgoingConnectionFactory::ConnectCallback::setConnection (this=0xb6129e30,
    connection=..., compress=false) at ConnectionFactory.cpp:1129
    #10 0x002fd25c in IceInternal::OutgoingConnectionFactory::finishGetConnection (this=0xb61009c0,
    connectors=..., ci=..., connection=..., cb=...) at ConnectionFactory.cpp:766
    #11 0x002fe3ab in IceInternal::OutgoingConnectionFactory::ConnectCallback::connectionStartCompleted (
    this=0xb6129e30, connection=...) at ConnectionFactory.cpp:955
    #12 0x0031363a in Ice::ConnectionI::dispatch (this=0xb619b778, startCB=..., sentCBs=..., compress=0 '\000',
    requestId=0, invokeNum=0, servantManager=..., adapter=..., outAsync=..., stream=...)
    at ConnectionI.cpp:1443
    #13 0x003134a7 in Ice::ConnectionI::message (this=0xb619b778, current=...) at ConnectionI.cpp:1428
    #14 0x00446cdc in IceInternal::ThreadPool::run (this=0xb6101140, thread=...) at ThreadPool.cpp:624
    #15 0x004486ef in IceInternal::ThreadPool::EventHandlerThread::run (this=0xb6101e20) at ThreadPool.cpp:1097
    #16 0x005839f9 in startHook (arg=0xb6101e20) at Thread.cpp:413
    #17 0x005a780e in start_thread (arg=0xb77e3b70) at pthread_create.c:300
    #18 0x007c68de in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:130

    I hope this helps you a lot. I have a hunch that it has something to do with reconnection, e.g. some async request objects are stored on a list while there is no connection, but they are just copies of the original request without buffers, and that is why it fails.
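
One detail in the trace is consistent with the NULL buffer reported earlier: the destination address 0xe equals 14, the size of the Ice protocol message header, so the copy target looks like a null begin() plus headerSize. Below is a minimal self-contained sketch of that kind of write with a size guard added. It assumes a plain std::vector as the buffer; writeRequestId and kHeaderSize are hypothetical names used only for illustration, and this is not the actual Ice code.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Size of the Ice protocol message header in bytes (14 == 0xe).
const std::size_t kHeaderSize = 14;

// Hypothetical helper: copies the 4 bytes of requestId into buf right after
// the header, like the copy() call quoted from ConnectionI.cpp:591, but
// refuses to write when the buffer is too small (for example, empty).
bool writeRequestId(std::vector<unsigned char>& buf, std::int32_t requestId)
{
    if (buf.size() < kHeaderSize + sizeof(requestId))
    {
        // Writing here would touch memory the buffer does not own.
        return false;
    }
    const unsigned char* p = reinterpret_cast<const unsigned char*>(&requestId);
    std::copy(p, p + sizeof(requestId),
              buf.begin() + static_cast<std::ptrdiff_t>(kHeaderSize));
    return true;
}
```

With an empty vector the helper returns false instead of copying to an address 14 bytes past a null pointer; with a buffer of at least 18 bytes it overwrites bytes 14 to 17 and leaves the header untouched.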

    Best regards
    Przemek
    Last edited by PrzemekD; 08-04-2010 at 04:42 AM.

  4. #4
    bernard, ZeroC Staff
    Hi Przemek,

    Could you post your simplified test-case?

    "This is not a good argument, if the bug is related to closing communicator you may not assume that Ice is allowed to crash the whole application."

    It's highly desirable to create a small and simple test-case: if there is a bug in the test-case itself, it's easier to spot in a small test-case, and if there is a bug in Ice, it's helpful to remove extra code paths unrelated to the bug in question.

    Thanks,
    Bernard

  5. #5
    PrzemekD, Registered User
    Hi Bernard,

    The attachment contains a simplified test-case. Sorry for the late reply; it took me a lot of time to apply small changes and rerun the test many times.
    The attachment contains only the code for Client.cpp of the "minimal" example, i.e. one should take the "minimal" example from the current Ice distribution and replace its Client.cpp file.

    Details of my tests (they were performed on a virtual machine, VMware):
    physical machine: Intel Core 2 duo, 2.1GHz 64bit, 4GB RAM
    host OS: Windows 7 Prof.
    guest machine (vmware): 32 bit, 2 core, 2GB RAM
    guest OS (vmware): Ubuntu 2.6.31-20-generic #58-Ubuntu SMP
    gcc: v4.4.1
    Ice: 3.4.0 / 3.4.1
    server/client config for Ice: none
    CPU load: high or medium (doing other tasks)

    The test fails (segfault) with a timeout of ~250 ms when CPU load is high and ~135 ms when CPU load is medium. Please note that the segfault does not always occur; sometimes the program exits without error. You have to run the test a few times, at least 5 I think.

    In the source you will see a commented-out line; it may be important.

    My short summary:
    - the error occurs only when the client invokes AMI calls
    - the error occurs when the proxy has its timeout set to a low value (for larger values the segfault is less probable) and the proxy goes out of scope before the async calls are performed (?)
    - the error may exist only on Linux

    How else can I help? I can provide the virtual machine settings and the virtual hard disk file, although the size of the file (a few gigabytes) may make that difficult.

    Update: in the commented-out line you will see port number 10001 instead of 10000. Just ignore it and put "10000" there if you want to uncomment the line and run your own tests (port 10000 is used for other purposes on my machine, so I used "10001" in all my tests and forgot to change it back to match the "minimal" example).

    Update 2: I have just run my test on a real (not virtual) 64-bit Linux machine with 8 cores and 64 GB of RAM; the segfault occurs there in the same way. If I find some time I may try the Windows version.

    Best regards
    Przemek
    Last edited by PrzemekD; 08-05-2010 at 07:43 AM.

  6. #6
    benoit, ZeroC Staff
    Name: Benoit Foucher
    Organization: ZeroC, Inc.
    Project: Ice
    Join Date
    Feb 2003
    Location
    Rennes, France
    Posts
    2,196
    Hi,

    Thanks for the simplified test case. I'll look further into it.

    Cheers,
    Benoit.

  7. #7
    benoit, ZeroC Staff
    I was able to reproduce the problem. I believe a workaround is to disable automatic retries with --Ice.RetryIntervals=-1. I will post a patch once I have a fix. Thanks for the bug report!
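
The workaround can be applied either on the command line or through a configuration file. A sketch, assuming the client binary is named client and passes its arguments to Ice::initialize; config.client is a hypothetical file name:

```
# On the command line:
#   ./client --Ice.RetryIntervals=-1

# Or in a configuration file loaded with --Ice.Config=config.client:
Ice.RetryIntervals=-1
```

A value of -1 turns automatic retries off, so a failed invocation is reported to the caller instead of being resent.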

    Cheers,
    Benoit.
