
Originally Posted by
bernard
You can use an NFS or other shared drive to store your config files. Of course, each server needs its own private config file.
I am using IceGrid, so my config file includes:
Code:
IceGrid.Node.Data=db/node
And when my servers are activated, they get:
Code:
[ 12/01/08 13:20:46.753 icegridnode: Activator: activating server `Server-neuron-0-0-scale10'
path = python
pwd = /nfs/hydra.local/data1/dstn/astrometry/net/ice
uid/gid = 1907/30026
args = python SolverServer.py 10 --Ice.Config=/nfs/hydra.local/data/dstn/astrometry/net/ice/db/node/servers/Server-neuron-0-0-scale10/config/config ]
Since my servers each have unique names, IceGrid seems to create unique config files for them.

Originally Posted by
bernard
Here, it sounds like a mounting problem, a permission problem, or a problem with the way icegridnode transforms a relative path into an absolute path.
When you log onto this machine, can you read /nfs/hydra.local/data1/dstn/astrometry/net/ice/db/node/servers/Server-neuron-0-0-scale10/config/config? Can the icegridnode user read this file?
After I launch 12 icegridnodes, the db/node directory contains this:
Code:
> find db/node/
db/node/
db/node/servers
db/node/servers/Server-neuron-0-6-scale5
db/node/servers/Server-neuron-0-6-scale5/config
db/node/servers/Server-neuron-0-6-scale5/config/config
db/node/servers/Server-neuron-0-6-scale5/dbs
db/node/servers/Server-neuron-0-6-scale5/distrib
db/node/servers/Server-neuron-0-6-scale5/revision
db/node/tmp
db/node/distrib
And the rest of my servers have died because their config files didn't exist:
Code:
Ice.FileException: exception ::Ice::FileException
{
error = 2
path = /nfs/hydra.local/data1/dstn/astrometry/net/ice/db/node/servers/Server-neuron-0-0-scale10/config/config
}
It's not a permissions problem: all icegridnodes run as 'dstn', a normal user, because I don't have root access on this cluster. The directories simply don't exist at the time my server runs, which matches error = 2 (ENOENT, "No such file or directory") in the exception.
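Both of bernard's questions (does the file exist, and can the icegridnode user read it) can be checked from each host with a small helper. This is only a sketch: `check_config` is a made-up name, and it should be run as the same user that runs icegridnode.

```shell
# Hypothetical helper: report whether a server config file exists and is
# readable by the current user. Returns 2 if missing, 1 if unreadable, 0 if ok.
check_config() {
    cfg=$1
    if [ ! -e "$cfg" ]; then
        echo "missing: $cfg"
        return 2
    elif [ ! -r "$cfg" ]; then
        echo "unreadable: $cfg"
        return 1
    fi
    echo "ok: $cfg"
}
```

For example: `check_config db/node/servers/Server-neuron-0-0-scale10/config/config`. A "missing" result points at the directory vanishing rather than at permissions.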
I ran the following experiment:
Code:
./startallnodes & for ((;;)); do echo "### `date` ###" >> ls; find db/node >> ls; sleep 0.01; done
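For anyone repeating the experiment, the same polling loop can be written as a function. This is a sketch, not the exact command used above: `snapshot_dir` and its arguments are illustrative names, and the `+%M:%S.%N` format is an assumption based on how the timestamps in the log look (`%N` is a GNU date extension).

```shell
# Sketch of the polling loop as a reusable function (names are illustrative).
# Appends a timestamped listing of DIR to LOG, COUNT times, INTERVAL seconds apart.
snapshot_dir() {
    dir=$1
    log=$2
    count=$3
    interval=$4
    i=0
    while [ "$i" -lt "$count" ]; do
        # Marker line so each snapshot can be told apart later with grep
        echo "### $(date +%M:%S.%N) ###" >> "$log"
        # Keep find's warnings in the log too; they are part of the evidence
        find "$dir" >> "$log" 2>&1
        sleep "$interval"
        i=$((i + 1))
    done
}
```

Usage would look like `./startallnodes & snapshot_dir db/node ls 1000 0.01`. Fractional sleep intervals need a sleep that supports them, such as the GNU one.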
What I see is that each node's directory gets created and populated, but then disappears:
Code:
> grep "config/config\|###" ls
### 27:00.840826000 ###
db/node/servers/Server-neuron-0-11-scale5/config/config
### 27:00.949344000 ###
db/node/servers/Server-neuron-0-0-scale10/config/config
### 27:01.067322000 ###
db/node/servers/Server-neuron-0-0-scale10/config/config
### 27:01.176296000 ###
db/node/servers/Server-neuron-0-0-scale10/config/config
### 27:01.285155000 ###
db/node/servers/Server-neuron-0-1-scale5/config/config
### 27:01.395943000 ###
db/node/servers/Server-neuron-0-1-scale5/config/config
### 27:01.506709000 ###
db/node/servers/Server-neuron-0-1-scale5/config/config
### 27:01.621234000 ###
### 27:01.730697000 ###
db/node/servers/Server-neuron-0-2-scale4/config/config
### 27:01.842905000 ###
db/node/servers/Server-neuron-0-2-scale4/config/config
### 27:01.955905000 ###
db/node/servers/Server-neuron-0-3-scale3/config/config
### 27:02.066291000 ###
db/node/servers/Server-neuron-0-3-scale3/config/config
### 27:02.177472000 ###
db/node/servers/Server-neuron-0-3-scale3/config/config
### 27:02.286882000 ###
db/node/servers/Server-neuron-0-4-scale2/config/config
### 27:02.402262000 ###
db/node/servers/Server-neuron-0-4-scale2/config/config
### 27:02.523075000 ###
db/node/servers/Server-neuron-0-4-scale2/config/config
### 27:02.646938000 ###
db/node/servers/Server-neuron-0-5-scale10/config/config
### 27:02.757144000 ###
db/node/servers/Server-neuron-0-5-scale10/config/config
### 27:02.874255000 ###
db/node/servers/Server-neuron-0-5-scale10/config/config
### 27:02.984534000 ###
db/node/servers/Server-neuron-0-6-scale5/config/config
### 27:03.097887000 ###
db/node/servers/Server-neuron-0-6-scale5/config/config
### 27:03.240119000 ###
db/node/servers/Server-neuron-0-6-scale5/config/config
### 27:03.357368000 ###
db/node/servers/Server-neuron-0-7-scale4/config/config
### 27:03.469632000 ###
db/node/servers/Server-neuron-0-7-scale4/config/config
### 27:03.582018000 ###
db/node/servers/Server-neuron-0-7-scale4/config/config
### 27:03.694741000 ###
### 27:03.804037000 ###
db/node/servers/Server-neuron-0-8-scale3/config/config
### 27:03.936422000 ###
db/node/servers/Server-neuron-0-8-scale3/config/config
### 27:04.066373000 ###
db/node/servers/Server-neuron-0-8-scale3/config/config
### 27:04.182170000 ###
db/node/servers/Server-neuron-0-9-scale2/config/config
### 27:04.318162000 ###
db/node/servers/Server-neuron-0-9-scale2/config/config
### 27:04.449006000 ###
### 27:04.580340000 ###
db/node/servers/Server-neuron-0-10-scale10/config/config
### 27:04.709145000 ###
db/node/servers/Server-neuron-0-10-scale10/config/config
### 27:04.821542000 ###
### 27:04.943917000 ###
db/node/servers/Server-neuron-0-11-scale5/config/config
### 27:05.071327000 ###
db/node/servers/Server-neuron-0-11-scale5/config/config
I also get this warning from find:
Code:
find: WARNING: Hard link count is wrong for db/node/servers: this may be a bug in your filesystem driver. Automatically turning on find's -noleaf option. Earlier results may have failed to include directories that should have been searched.
So this may be an NFS directory-creation race condition bug.
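One way to test that hypothesis would be to give each icegridnode its own data directory, so that no two nodes create or remove entries under the same db/node/servers. The sketch below writes one config file per node. `write_node_config`, the node names, and the db/<name> layout are assumptions for illustration; only the IceGrid.Node.Name and IceGrid.Node.Data properties come from IceGrid.

```shell
# Sketch: write a per-node config so each icegridnode gets a private data directory.
# write_node_config and the db/<name> layout are illustrative assumptions.
write_node_config() {
    name=$1      # node name, e.g. neuron-0-0
    base=$2      # directory that holds the per-node data and config files
    mkdir -p "$base/db/$name" || return 1
    cat > "$base/$name.cfg" <<EOF
IceGrid.Node.Name=$name
IceGrid.Node.Data=$base/db/$name
EOF
    # Print the path of the generated file so the caller can pass it on
    echo "$base/$name.cfg"
}
```

The generated file only covers the two properties shown, so a real node config would still need the rest of the existing settings. If the directories stop disappearing with separate data directories, the problem is more likely the shared db/node than NFS itself.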
I suggest a warning be added to the manual!
Cheers,
dustin.