Go Back   ZeroC Forums > Bug Reports

Reply
 
LinkBack Thread Tools Rate Thread Display Modes
  #1 (permalink)  
Old 08-12-2008
joshmoore joshmoore is offline
Registered User
 
Name: Josh Moore
Organization: Glencoe Software, Inc.
Project: OMERO, http://trac.openmicroscopy.org.uk/omero
 
Join Date: Feb 2007
Location: Germany
Posts: 99
Bus Error under Mac OX 10.4 and IcePy 3.3.0

(The following also applies to Ice 3.2.1)

We have the following setup:
  • Java server running inside of IceGrid, behind Glacier2
  • Trivial python client accessing a Glacier2 session

The client logs in and can interact with the server normally. When trying to acquire one particular servant (of many) via the session facade, the client segfaults and "Bus error" is printed.

The gdb output for the client is:
Code:
(gdb) run buserror.py
Starting program: /opt/local/bin/python buserror.py
Reading symbols for shared libraries . done

Program received signal SIGTRAP, Trace/breakpoint trap.
0x8fe01010 in __dyld__dyld_start ()
(gdb) c
Continuing.
Reading symbols for shared libraries ... done
Reading symbols for shared libraries . done
Reading symbols for shared libraries . done
Reading symbols for shared libraries . done
Reading symbols for shared libraries . done
Reading symbols for shared libraries ...... done
08/12/08 16:58:37.966 warning: deprecated property: Ice.MonitorConnections
Now

Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_PROTECTION_FAILURE at address: 0x00000000
0x007224e9 in IcePy::SyncTypedInvocation::invoke ()
(gdb) bt
#0  0x007224e9 in IcePy::SyncTypedInvocation::invoke ()
#1  0x0071b6fb in operationInvoke ()
#2  0x0027fa7e in PyEval_EvalFrame (f=0x6083f0) at Python/ceval.c:3568
#3  0x00280619 in PyEval_EvalCodeEx (co=0x13a6620, globals=0x5b14b0, locals=0x0, args=0x608cfc, argcount=1, kws=0x608d00, kwcount=0, defs=0x13f263c, defcount=1, closure=0x0) at Python/ceval.c:2741
#4  0x0027e453 in PyEval_EvalFrame (f=0x608bb0) at Python/ceval.c:3661
#5  0x00280619 in PyEval_EvalCodeEx (co=0x4fe60, globals=0x20a50, locals=0x20a50, args=0x0, argcount=0, kws=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:2741
#6  0x00280859 in PyEval_EvalCode (co=0x4fe60, globals=0x20a50, locals=0x20a50) at Python/ceval.c:484
#7  0x002a75b1 in PyRun_FileExFlags (fp=0xa000bda0, filename=0xbfffd8f3 "buserror.py", start=257, globals=0x20a50, locals=0x20a50, closeit=1, flags=0xbfffd608) at Python/pythonrun.c:1287
#8  0x002a7894 in PyRun_SimpleFileExFlags (fp=0xa000bda0, filename=0xbfffd8f3 "buserror.py", closeit=1, flags=0xbfffd608) at Python/pythonrun.c:871
#9  0x002b13dc in Py_Main (argc=1, argv=0xbfffd688) at Modules/main.c:493
#10 0x000018ee in ?? ()
#11 0x00001815 in ?? ()
(gdb)
The client code is:
Code:
import omero
c = omero.client()
s = c.createSession()
print "Now"
s.getScriptService()
The bus error is triggered at line 263 in IceInternal.ServantManager.java. (connection.sendResponse) and no exception shows up in the server.

An interesting corollary is that the log file for the server is halted (for lack of a better term) on the first bus error. I.e. our log statement "Creating servant" is printed, and afterwards nothing else even though further interactions are possible with the server.
Reply With Quote
  #2 (permalink)  
Old 08-13-2008
benoit's Avatar
benoit benoit is offline
ZeroC Staff
 
Name: Benoit Foucher
Organization: ZeroC, Inc.
Project: Ice
 
Join Date: Feb 2003
Location: Rennes, France
Posts: 1,928
Hi,

Note that we no longer support Mac OS X 10.4 for Ice 3.3.0. Which python version do you use and did you build python yourself? Did you try running the IcePy tests to see if they worked?

The best would be to try on Mac OS X 10.5 using python installed with the system and see if you can reproduce the problem.

Cheers,
Benoit.
Reply With Quote
  #3 (permalink)  
Old 08-13-2008
joshmoore joshmoore is offline
Registered User
 
Name: Josh Moore
Organization: Glencoe Software, Inc.
Project: OMERO, http://trac.openmicroscopy.org.uk/omero
 
Join Date: Feb 2007
Location: Germany
Posts: 99
Comparison to Java

Similar code in Java hangs but does not crash.

Code:
public class buserror {
    public static void main(String[] args) throws Exception {
        omero.client c = new omero.client();
        omero.api.ServiceFactoryPrx s = c.createSession();
        System.out.println("first");
        s.getQueryService();
        System.out.println("now");
        s.getScriptService();
    }
}
produces:

Code:
first
now
Sending kill -QUIT to the process returns:

Code:
Full thread dump Java HotSpot(TM) Client VM (1.5.0_13-121 mixed mode, sharing):

"DestroyJavaVM" prio=5 tid=0x00501320 nid=0x1803800 waiting on condition [0x00000000..0xb0800000]

"Ice.SelectorThread" prio=5 tid=0x00511b70 nid=0x1969000 runnable [0xb0e8d000..0xb0e8dd10]
        at sun.nio.ch.KQueueArrayWrapper.kevent0(Native Method)
        at sun.nio.ch.KQueueArrayWrapper.poll(KQueueArrayWrapper.java:118)
        at sun.nio.ch.KQueueSelectorImpl.doSelect(KQueueSelectorImpl.java:69)
        at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
        - locked <0x266200d0> (a sun.nio.ch.Util$1)
        - locked <0x266200e0> (a java.util.Collections$UnmodifiableSet)
        - locked <0x26620088> (a sun.nio.ch.KQueueSelectorImpl)
        at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
        at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:84)
        at IceInternal.Selector.select(Selector.java:139)
        at IceInternal.SelectorThread.run(SelectorThread.java:120)
        at IceInternal.SelectorThread$HelperThread.run(SelectorThread.java:273)

"Ice.ThreadPool.Client-0" prio=5 tid=0x00511860 nid=0x1968c00 runnable [0xb0e0c000..0xb0e0cd10]
        at sun.nio.ch.KQueueArrayWrapper.kevent0(Native Method)
        at sun.nio.ch.KQueueArrayWrapper.poll(KQueueArrayWrapper.java:118)
        at sun.nio.ch.KQueueSelectorImpl.doSelect(KQueueSelectorImpl.java:69)
        at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
        - locked <0x26620228> (a sun.nio.ch.Util$1)
        - locked <0x26620238> (a java.util.Collections$UnmodifiableSet)
        - locked <0x266201e0> (a sun.nio.ch.KQueueSelectorImpl)
        at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
        at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:84)
        at IceInternal.Selector.select(Selector.java:139)
        at IceInternal.ThreadPool.run(ThreadPool.java:335)
        at IceInternal.ThreadPool.access$100(ThreadPool.java:12)
        at IceInternal.ThreadPool$EventHandlerThread.run(ThreadPool.java:951)

"Ice.EndpointHostResolverThread" prio=5 tid=0x00511230 nid=0x199ba00 in Object.wait() [0xb0d8b000..0xb0d8bd10]
        at java.lang.Object.wait(Native Method)
        - waiting on <0x26620390> (a IceInternal.EndpointHostResolver)
        at java.lang.Object.wait(Object.java:474)
        at IceInternal.EndpointHostResolver.run(EndpointHostResolver.java:77)
        - locked <0x26620390> (a IceInternal.EndpointHostResolver)
        at IceInternal.EndpointHostResolver$HelperThread.run(EndpointHostResolver.java:146)

"Ice.ThreadPool.Server-0" prio=5 tid=0x005108c0 nid=0x199be00 runnable [0xb0d0a000..0xb0d0ad10]
        at sun.nio.ch.KQueueArrayWrapper.kevent0(Native Method)
        at sun.nio.ch.KQueueArrayWrapper.poll(KQueueArrayWrapper.java:118)
        at sun.nio.ch.KQueueSelectorImpl.doSelect(KQueueSelectorImpl.java:69)
        at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
        - locked <0x26620478> (a sun.nio.ch.Util$1)
        - locked <0x26620488> (a java.util.Collections$UnmodifiableSet)
        - locked <0x26620430> (a sun.nio.ch.KQueueSelectorImpl)
        at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
        at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:84)
        at IceInternal.Selector.select(Selector.java:139)
        at IceInternal.ThreadPool.run(ThreadPool.java:335)
        at IceInternal.ThreadPool.access$100(ThreadPool.java:12)
        at IceInternal.ThreadPool$EventHandlerThread.run(ThreadPool.java:951)

"Ice.Timer" prio=5 tid=0x0050fa70 nid=0x198ec00 in Object.wait() [0xb0c89000..0xb0c89d10]
        at java.lang.Object.wait(Native Method)
        - waiting on <0x26b1fd98> (a IceInternal.Timer)
        at IceInternal.Timer.run(Timer.java:217)
        - locked <0x26b1fd98> (a IceInternal.Timer)

"Low Memory Detector" daemon prio=5 tid=0x00508f60 nid=0x1816800 runnable [0x00000000..0x00000000]

"CompilerThread0" daemon prio=9 tid=0x00508560 nid=0x1816400 waiting on condition [0x00000000..0xb0b06748]

"Signal Dispatcher" daemon prio=9 tid=0x00508090 nid=0x1813200 waiting on condition [0x00000000..0x00000000]

"Finalizer" daemon prio=8 tid=0x00507830 nid=0x1811600 in Object.wait() [0xb0a04000..0xb0a04d10]
        at java.lang.Object.wait(Native Method)
        - waiting on <0x26a949e0> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:116)
        - locked <0x26a949e0> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:132)
        at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)

"Reference Handler" daemon prio=10 tid=0x00507480 nid=0x1810800 in Object.wait() [0xb0983000..0xb0983d10]
        at java.lang.Object.wait(Native Method)
        - waiting on <0x26a94a60> (a java.lang.ref.Reference$Lock)
        at java.lang.Object.wait(Object.java:474)
        at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
        - locked <0x26a94a60> (a java.lang.ref.Reference$Lock)

"VM Thread" prio=9 tid=0x00506c00 nid=0x1810400 runnable 

"VM Periodic Task Thread" prio=9 tid=0x00509b30 nid=0x1816c00 waiting on condition 

"Exception Catcher Thread" prio=10 tid=0x005015a0 nid=0x1803c00 runnable
Reply With Quote
  #4 (permalink)  
Old 08-13-2008
joshmoore joshmoore is offline
Registered User
 
Name: Josh Moore
Organization: Glencoe Software, Inc.
Project: OMERO, http://trac.openmicroscopy.org.uk/omero
 
Join Date: Feb 2007
Location: Germany
Posts: 99
Reproduced under Ubuntu/Hardy with Ice 3.2.1

Using the standard packages under hardy (currently 3.2.1) the Python example above segfaults.
Reply With Quote
  #5 (permalink)  
Old 08-13-2008
joshmoore joshmoore is offline
Registered User
 
Name: Josh Moore
Organization: Glencoe Software, Inc.
Project: OMERO, http://trac.openmicroscopy.org.uk/omero
 
Join Date: Feb 2007
Location: Germany
Posts: 99
Sorry, our posts crossed each other, Benoit.

Quote:
Originally Posted by benoit View Post
Note that we no longer support Mac OS X 10.4 for Ice 3.3.0. Which python version do you use and did you build python yourself? Did you try running the IcePy tests to see if they worked?
This is Python 2.4.5 from MacPorts. I did not try to run the tests, since I'm also using the Ice port which has always worked admirably. But I will do that now.

Quote:
The best would be to try on Mac OS X 10.5 using python installed with the system and see if you can reproduce the problem.
Unfortunately that's not an option in the immediate term. I might should add that this setup was working under 3.2.1 on 10.4 and only started to segfault when I switched the server to AMD (This particular method, however, is not AMD, making it possibly unrelated.) The switch to 3.3.0 made no difference.
Reply With Quote
  #6 (permalink)  
Old 08-13-2008
joshmoore joshmoore is offline
Registered User
 
Name: Josh Moore
Organization: Glencoe Software, Inc.
Project: OMERO, http://trac.openmicroscopy.org.uk/omero
 
Join Date: Feb 2007
Location: Germany
Posts: 99
python allTests.py passes using the 3.3.0 ports under 10.4.
Reply With Quote
  #7 (permalink)  
Old 08-13-2008
benoit's Avatar
benoit benoit is offline
ZeroC Staff
 
Name: Benoit Foucher
Organization: ZeroC, Inc.
Project: Ice
 
Join Date: Feb 2003
Location: Rennes, France
Posts: 1,928
Hi Josh,

I suspect the Java client hangs because you don't destroy the communicator (and therefore threads created by the communicator aren't terminated and the JVM hangs on exit waiting for these threads to be gone). I'm afraid it's difficult to know what could be the problem with so little information. The fact that you changed how the server is implemented shouldn't have any effects on your Python client.

Could you perhaps narrow down the crash to a small self contained test case that we could use to try to reproduce it?

Btw, are you running this code on an Intel platform? Did you try setting the DYLD_BIND_AT_LAUNCH=1 environment variable to see it made a difference (we used to experience random crashes on 10.4 without this variable set in the environment).

Cheers,
Benoit.
Reply With Quote
  #8 (permalink)  
Old 08-13-2008
joshmoore joshmoore is offline
Registered User
 
Name: Josh Moore
Organization: Glencoe Software, Inc.
Project: OMERO, http://trac.openmicroscopy.org.uk/omero
 
Join Date: Feb 2007
Location: Germany
Posts: 99
Quote:
Originally Posted by benoit View Post
I suspect the Java client hangs because you don't destroy the communicator (and therefore threads created by the communicator aren't terminated and the JVM hangs on exit waiting for these threads to be gone).
Apologies, you're of course right. In my attempt to make a minimal code sample, the shutdown was left out. The Java code does not hang.

Quote:
Originally Posted by benoit View Post
I'm afraid it's difficult to know what could be the problem with so little information. The fact that you changed how the server is implemented shouldn't have any effects on your Python client.
I definitely agree that it shouldn't make a difference, but it's so odd that other facade methods should work normally. The only other substantial interface difference is that the script service is forward declared (without ["ami","amd"] while the definition is in another file (with ["amd","ami"]). Am I grasping at straws?

Quote:
Originally Posted by benoit View Post
Could you perhaps narrow down the crash to a small self contained test case that we could use to try to reproduce it?
I'm certainly working on it, but as always, there are a terrible number of moving parts. Finding the one that has quite abruptly started causing a segfault isn't proving easy.

Quote:
Originally Posted by benoit View Post
Btw, are you running this code on an Intel platform? Did you try setting the DYLD_BIND_AT_LAUNCH=1 environment variable to see it made a difference (we used to experience random crashes on 10.4 without this variable set in the environment).
I had not tried DYLD_BIND_AT_LAUNCH, but it made no difference.

Thanks,
~J.
Reply With Quote
  #9 (permalink)  
Old 08-13-2008
joshmoore joshmoore is offline
Registered User
 
Name: Josh Moore
Organization: Glencoe Software, Inc.
Project: OMERO, http://trac.openmicroscopy.org.uk/omero
 
Join Date: Feb 2007
Location: Germany
Posts: 99
By the way, here's the Ubuntu/Hardy backtrace:
Code:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f30e04986e0 (LWP 22916)]
0x00007f30dee2281a in IcePy::OperationI::invoke (this=0x8c76c0, proxy=@0x7fffe84b8c20, args=0x7f30e0458050, pyctx=0x702580) at Operation.cpp:570
570                         Py_INCREF(ret);
(gdb) bt
#0  0x00007f30dee2281a in IcePy::OperationI::invoke (this=0x8c76c0, proxy=@0x7fffe84b8c20, args=0x7f30e0458050, pyctx=0x702580) at Operation.cpp:570
#1  0x00007f30dee1d71c in operationInvoke (self=0x7f30dd7342d0, args=0x7f30e0436a50) at Operation.cpp:206
#2  0x0000000000477146 in PyEval_EvalFrame ()
#3  0x00000000004784f5 in PyEval_EvalCodeEx ()
#4  0x000000000047780d in PyEval_EvalFrame ()
#5  0x00000000004784f5 in PyEval_EvalCodeEx ()
#6  0x0000000000478622 in PyEval_EvalCode ()
#7  0x000000000049ccca in PyRun_FileExFlags ()
#8  0x000000000049cf19 in PyRun_SimpleFileExFlags ()
#9  0x00000000004125c9 in Py_Main ()
#10 0x00007f30df6b71c4 in __libc_start_main () from /lib/libc.so.6
#11 0x0000000000411ae9 in _start ()
(gdb)
Reply With Quote
  #10 (permalink)  
Old 08-13-2008
benoit's Avatar
benoit benoit is offline
ZeroC Staff
 
Name: Benoit Foucher
Organization: ZeroC, Inc.
Project: Ice
 
Join Date: Feb 2003
Location: Rennes, France
Posts: 1,928
Hi,

It crashes when the getScriptService operation is invoked, is this correct? It looks like the crash occurs when un-marshalling the response from the server. What is the signature of this Slice operation?

Cheers,
Benoit.
Reply With Quote
  #11 (permalink)  
Old 08-13-2008
joshmoore joshmoore is offline
Registered User
 
Name: Josh Moore
Organization: Glencoe Software, Inc.
Project: OMERO, http://trac.openmicroscopy.org.uk/omero
 
Join Date: Feb 2007
Location: Germany
Posts: 99
Correct.
Code:
        interface ServiceFactory extends Glacier2::Session
        {
            ...
            IScript*   getScriptService() throws ServerError;
with the ... all of the same type:
Code:
            IAdmin*    getAdminService() throws ServerError;
            IConfig*   getConfigService() throws ServerError;
            IPixels*   getPixelsService() throws ServerError;
            IPojos*    getPojosService() throws ServerError;
            IQuery*    getQueryService() throws ServerError;
            // etc.
all of which function normally. The implementation (just for reference) is:

Code:
   public IScriptPrx getScriptService(Ice.Current current) throws ServerError {
        Ice.Identity id = getIdentity(current, SCRIPTSERVICE.value);
        Ice.Object servant = current.adapter.find(id);
        if (servant == null) {
            servant = (Ice.Object) this.context.getBean(SCRIPTSERVICE.value);
            if (servant instanceof Ice.TieBase) {
                Ice.TieBase tie = (Ice.TieBase) servant;
                Object obj = tie.ice_delegate();
                if (obj instanceof ServiceFactoryAware) {
                    ((ServiceFactoryAware) obj).setServiceFactory(this);
                }
            }
            current.adapter.add(servant, id);
        }
        Ice.ObjectPrx prx = current.adapter.createProxy(id);
        return IScriptPrxHelper.checkedCast(prx);
    }
Reply With Quote
  #12 (permalink)  
Old 08-13-2008
benoit's Avatar
benoit benoit is offline
ZeroC Staff
 
Name: Benoit Foucher
Organization: ZeroC, Inc.
Project: Ice
 
Join Date: Feb 2003
Location: Rennes, France
Posts: 1,928
Hi Josh,

I believe I was able to reproduce. The problem shows up if the client doesn't load the Slice definition for the proxy type returned by the invocation.

Do you forward declare the IScript interface in the definition of the ServiceFactory interface and don't load the definition of the IScript interface in the python client? If that's the case, I believe loading the Slice file where the IScript interface is defined should solve the issue.

Cheers,
Benoit.
Reply With Quote
  #13 (permalink)  
Old 08-13-2008
joshmoore joshmoore is offline
Registered User
 
Name: Josh Moore
Organization: Glencoe Software, Inc.
Project: OMERO, http://trac.openmicroscopy.org.uk/omero
 
Join Date: Feb 2007
Location: Germany
Posts: 99
You're right. I had wondered about this after my last post, but unfortunately hadn't been able to test it during lunch. Thanks for being faster.

Do you have an idea why the server log would freeze when the client segfaults, and if there is a possible workaround?
Reply With Quote
  #14 (permalink)  
Old 08-13-2008
benoit's Avatar
benoit benoit is offline
ZeroC Staff
 
Name: Benoit Foucher
Organization: ZeroC, Inc.
Project: Ice
 
Join Date: Feb 2003
Location: Rennes, France
Posts: 1,928
Could you detail a little more how logging works in your Java server? Is this just straight logging on the standard output? Can the server still process incoming requests after this "log freeze" occurs?

Cheers,
Benoit.
Reply With Quote
  #15 (permalink)  
Old 08-13-2008
joshmoore joshmoore is offline
Registered User
 
Name: Josh Moore
Organization: Glencoe Software, Inc.
Project: OMERO, http://trac.openmicroscopy.org.uk/omero
 
Join Date: Feb 2007
Location: Germany
Posts: 99
Hi Benoit,

The Java server logs via commons-logging/log4j to standard out which is caught by icegridnode and printed to var/log/master.out:

Code:
#IceGrid.Node.Output=var/log
#IceGrid.Node.RedirectErrToOut=1

# Defines that the standard out from this node and all
# contained servers will be saved in the following files.
# If Output is set, then output from the servers will be
# redirected _after_ Ice.Communicator creation.
Ice.StdOut=var/log/master.out
Ice.StdErr=var/log/master.err
The only other possibly related tidbit is that the Java process is started via python's os.execvpe (where icegridnode runs "python ...").

And yes, the server still behaves completely normally except for stdout. The only way to "restart" the log that I've found is restarting the server.
Reply With Quote
Reply



Currently Active Users Viewing This Thread: 1 (0 members and 1 guests)
 
Thread Tools
Display Modes Rate This Thread
Rate This Thread:

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On

Similar Threads
Thread Thread Starter Forum Replies Last Post
can icepy support p3k? fabware Help Center 1 06-02-2009 05:42 AM
"error return without exception set" when Ice.ImplicitContext not set in IcePy joshmoore Bug Reports 1 08-12-2008 09:59 AM
IcePy and shutdown alexm Help Center 6 04-20-2006 10:59 PM
about icepy.py code Help Center 2 06-12-2005 03:26 PM
IcePy on AIX? zephyr007 Help Center 1 03-11-2005 03:17 AM


All times are GMT -4. The time now is 02:20 PM.


Powered by vBulletin® Version 3.6.4
Copyright ©2000 - 2010, Jelsoft Enterprises Ltd.
Search Engine Optimization by vBSEO 3.0.0
(c) 2009 ZeroC, Inc.