This morning we experienced a situation in which ~50 threads for one of our IceBox-hosted servants were hung calling another servant hosted in IceBox in a gridded-deployment (no shared Communicator, so calls from one servant to the other hit the network). All of the hung threads had the same bottom of the stack trace as follows:
java.lang.Object.wait(Native Method)
java.lang.Object.wait(Object.java:485)
IceInternal.Outgoing.invoke(Outgoing.java:137)
com.hp.halo.HaloICE._DatabaseServicesDelM.acquireL ock(Unknown Source)
com.hp.halo.HaloICE.DatabaseServicesPrxHelper.acqu ireLock(Unknown Source)
com.hp.halo.HaloICE.DatabaseServicesPrxHelper.acqu ireLock(Unknown Source)
com.hp.halo.eventservice.EventServicesImpl2.acquir eLock(EventServicesImpl2.java:1197)
com.hp.halo.eventservice.EventServicesImpl2.termin ateEvent(EventServicesImpl2.java:623)
com.hp.halo.HaloICE._EventServicesDisp.___terminat eEvent(Unknown Source)
com.hp.halo.HaloICE._EventServicesDisp.__dispatch( Unknown Source)
IceInternal.Incoming.invoke(Incoming.java:147)
Ice.ConnectionI.invokeAll(ConnectionI.java:2249)
Ice.ConnectionI.message(ConnectionI.java:1362)
IceInternal.ThreadPool.run(ThreadPool.java:782)
IceInternal.ThreadPool.access$100(ThreadPool.java: 12)
IceInternal.ThreadPool$EventHandlerThread.run(Thre adPool.java:1242)
Stack trace:
java.lang.Object.wait(Native Method)
java.lang.Object.wait(Object.java:485)
IceInternal.Outgoing.invoke(Outgoing.java:137)
com.hp.halo.HaloICE._DatabaseServicesDelM.getEvent ById(Unknown Source)
com.hp.halo.HaloICE.DatabaseServicesPrxHelper.getE ventById(Unknown Source)
com.hp.halo.HaloICE.DatabaseServicesPrxHelper.getE ventById(Unknown Source)
com.hp.halo.eventservice.EventConfiguration.initia lizeEventConfiguration(EventConfiguration.java:200 )
com.hp.halo.eventservice.EventConfiguration.<init> (EventConfiguration.java:176)
com.hp.halo.eventservice.EventServicesImpl2.update Event(EventServicesImpl2.java:975)
com.hp.halo.HaloICE._EventServicesDisp.___updateEv ent(Unknown Source)
com.hp.halo.HaloICE._EventServicesDisp.__dispatch( Unknown Source)
IceInternal.Incoming.invoke(Incoming.java:147)
Ice.ConnectionI.invokeAll(ConnectionI.java:2249)
Ice.ConnectionI.message(ConnectionI.java:1362)
IceInternal.ThreadPool.run(ThreadPool.java:782)
IceInternal.ThreadPool.access$100(ThreadPool.java: 12)
IceInternal.ThreadPool$EventHandlerThread.run(Thre adPool.java:1242)
What was common to all ~50 of the hung threads was the following line:
IceInternal.Outgoing.invoke(Outgoing.java:137)
The offending block of Outgoing.java immediately below:
if(timedOut)
{
//
// Must be called outside the synchronization of
// this object
//
_connection.exception(new Ice.TimeoutException());
//
// We must wait until the exception set above has
// propagated to this Outgoing object.
//
synchronized(this)
{
while(_state == StateInProgress)
{
try
{
wait();
}
catch(InterruptedException ex)
{
}
}
}
}
Certain Bug #1: the _state variable is not declared as volatile. Thus when another thread calls Output.finished(...) to change _state and call notify(), the thread which is in the block below does not have to go to memory to get the updated value, but can simply reference a value cached in a register. The result is a thread hung in the loop above. Consult the java memory model for confirmation (chapter 17 of the java language specification, or Brian Goetz's Java Concurrency in Practice)
Possible Bug #2: It appears to me that the call to
_connection.exception(new Ice.TimeoutException());
will ultimately result in a call to Output.finished(). If this is the case, nothing prevents the call to Output.finished(...) from completing prior to entering the synchronized(this) block in the code snippet above, again resulting in a hung thread, as the notify() occurs prior to the wait().

Reply With Quote