16. Troubleshooting, Trix and Tips
Some common problems and solutions.
16.1. The exax server fails to start
There are two common reasons for this.
The first is that the configuration file is wrong, or that workdirs do not exist. Make sure that the workdir(s) specified in the configuration file also exist on the file system.
The second is that there is an error in one or more job scripts. When exax starts, it tries to import all job scripts. If a script cannot be imported because it contains an error, the server fails to start. Read the output carefully and fix the script.
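As a quick pre-check before starting the server, the scripts in a directory can be parsed to find syntax errors. This is a minimal sketch, not part of exax: the function name find_broken_scripts and the idea of scanning a single directory of .py files are assumptions made for illustration. It only detects syntax errors; errors that appear when a script is actually imported (for example a missing module) still have to be read from the server output.

```python
import ast
from pathlib import Path


def find_broken_scripts(directory):
    """Return (filename, message) pairs for .py files that fail to parse.

    Hypothetical helper, not part of exax. Only syntax errors are found.
    """
    broken = []
    for path in sorted(Path(directory).glob("*.py")):
        try:
            # Parse only; the script is never executed.
            ast.parse(path.read_text(), filename=str(path))
        except SyntaxError as exc:
            broken.append((path.name, f"line {exc.lineno}: {exc.msg}"))
    return broken
```

Calling find_broken_scripts() on the directory holding the job scripts returns an empty list when every file parses.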
16.2. Urd conflict Error
When doing urd.finish(), the program exits with an urd conflict error. The reason is that the program is trying to write a new urd item to an existing key/timestamp, with contents that differ from what is already in the database. Raising an error here is considered correct behaviour. If the contents were exactly the same, there would be no error.
Solution proposals:
If the contents should be updated, add update=True to the urd.begin() call.
Remove the entry by issuing urd.truncate(timestamp) with a timestamp preceding the one to be written. This will erase all entries with timestamps larger than or equal to the one specified.
Check first whether the item exists with urd.peek(), and skip all processing for the item if it already exists.
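The last proposal, checking with urd.peek() before doing any work, could be sketched as follows. The helper name build_once and the build_jobs callable are hypothetical. The sketch assumes that urd is the object passed to the build script, that urd.peek(path, timestamp) returns something falsy when no item exists, and that urd.begin() and urd.finish() take the urd path as their first argument. Check these assumptions against the installed version before relying on it.

```python
def build_once(urd, path, timestamp, build_jobs):
    """Run build_jobs() in an urd session unless the item already exists.

    Hypothetical helper. Returns True if the work was done, else False.
    """
    # Assumption: peek() gives a falsy response when nothing is stored.
    if urd.peek(path, timestamp):
        # An item exists for this key/timestamp, so skip all processing.
        return False
    urd.begin(path, timestamp)
    build_jobs()  # hypothetical callable that builds the jobs to record
    urd.finish(path)
    return True
```

Skipping the session entirely means urd.finish() is never called for an existing key/timestamp, so no conflict can be raised.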
16.3. The build creates a new job although it already exists
check hash/code
why build?
input parameters (options, datasets, jobs)
workdirs, new, removed?
depends on something that was rebuilt
16.4. Remove items from the Urd database
The files in urd.db/ are human readable, so it is possible to edit them (or remove them), but this should not be necessary!
To overwrite an existing item, add update=True to the urd.begin() call.
To remove items, use urd.truncate().
16.5. Connecting to a remote board server
If there are several board servers running on a machine,
each occupies a port…
which port goes where
use sockets like this
16.6. What’s the thing with -LATEST?
The jobid ending with -LATEST is a pointer to the last built job in that work directory, not the last re-used job.
16.7. Urd view in board does not show correct link_results
The “results” shown in the urd item view in board reflect what was link_result’ed when the urd item was created. If the urd item is re-generated by a different build script that still uses the same jobs, the urd item is not updated.
16.8. How to abort a running job
A running job can be aborted using the ax abort command. There is no need
to restart the server.
16.9. Connect to a remote
Connecting to board or urd servers on a remote server is typically done using ssh and port forwarding. The server can listen on either a port or a socket. The benefit of using a socket is that it is unique for each running server, so that several users can work independently on the same server.
This is how to connect using a port:
To set up a server with board listening on port 1234, enter this in accelerator.conf
board listen: localhost:1234
and connect using
ssh -L 9999:localhost:1234 <remote_server>
This forwards port 1234 on the server to your local machine’s port 9999, so pointing a browser to http://localhost:9999 should display the remote board.
And here’s how to set up using a socket:
Enter this in accelerator.conf
board listen: .socket.dir/board
This will create a socket at the specified path below the project directory. Next, find the absolute path to the socket file (for example by issuing realpath .socket.dir/board), and create the ssh forwarding command like this
ssh -L 9999:/path/to/.socket.dir/board <remote_server>
Connect by pointing a browser to http://localhost:9999.
Note
In the first case, using ports, this assumes that the port is not already allocated. Similarly, if there are several users on the machine, each user needs to have a unique port number. This can get messy and difficult to maintain. It is better to use sockets, since each instance of the exax server can have its own socket file. And it is still possible to have multiple users connecting to the _same_ socket using ssh.
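The two forwarding variants differ only in what follows the local port in the -L argument. The sketch below builds the command line for either case. The function name ssh_forward_command and its parameters are made up for illustration, and it assumes that a target starting with "/" is an absolute socket path on the remote machine while anything else is a port number on the remote's localhost.

```python
import shlex


def ssh_forward_command(remote_server, target, local_port=9999):
    """Build an ssh -L command for a remote port or socket.

    Hypothetical helper. target is either a port number or the absolute
    path of a socket file on the remote machine.
    """
    target = str(target)
    if target.startswith("/"):
        # Absolute path: forward to a socket file on the remote machine.
        destination = target
    else:
        # Otherwise treat target as a port on the remote's localhost.
        destination = f"localhost:{int(target)}"
    return shlex.join(
        ["ssh", "-L", f"{local_port}:{destination}", remote_server]
    )


print(ssh_forward_command("remote_server", 1234))
print(ssh_forward_command("remote_server", "/path/to/.socket.dir/board"))
```

The function only builds a string and does not run ssh, so the result can be inspected or pasted into a terminal.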
16.10. Setting up a remote urd or board server
By default, starting the exax server (ax server) will also start a
board and an urd server. In some cases, it is better to have these
run separately, for example
the board server can keep running while the exax server is taken down for maintenance
the urd server is shared between several users
To run a separate board server, remove the board entry from accelerator.conf:
# board listen :1234
And start the board server from the same project directory using
ax board-server localhost:9999
To run a separate urd server, tell the exax server which urd it should connect to in accelerator.conf
urd: localhost:5555
In this case, it assumes there is an urd server on the same machine on port 5555.
To start the urd server, run from the project directory
ax urd-server --listen localhost:5555
Tip
Use passwords to authenticate different urd users.
Tip
To have the urd server listen to _external_ connections, i.e. exax servers running on _other_ servers, replace localhost with the IP address of the network interface that is used for the access, for example
ax urd-server --listen 10.1.2.3:5555