Running on a SLURM cluster#
Most HPC clusters support a scheduler called SLURM (Simple Linux Utility for Resource Management). The OpenQuake engine is able to interface with SLURM to make use of all the resources of the cluster.
Running OpenQuake calculations with SLURM#
Let’s consider a user with ssh access to a SLURM cluster. The only thing the user has to do after logging in is to load the openquake libraries with the command
$ module load openquake
Then running a calculation is quite trivial. The user has two choices: running the calculation on-the-fly with the command
$ srun oq engine --run job.ini
or running the calculation in a batch with the command
$ sbatch oq engine --run job.ini
In the first case the engine will log the progress of the job on the console and any errors will be clearly visible. This is the recommended approach for beginners. In the second case the progress will not be visible, but it can be extracted from the engine database with the command
$ srun oq engine --show-log -1
where -1 denotes the last submitted job. Using sbatch is recommended for users that need to send multiple calculations. The calculations might be serialized by the engine queue, depending on the configuration, but even if the jobs run sequentially, the subtasks spawned by them will run in parallel and make use of the whole cluster.
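For instance, a user who has prepared several job files (the names job1.ini, job2.ini and job3.ini below are just placeholders) could submit them all in a batch and later inspect the log of the last one:
$ sbatch oq engine --run job1.ini
$ sbatch oq engine --run job2.ini
$ sbatch oq engine --run job3.ini
$ srun oq engine --show-log -1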
NB: by default the engine will use a couple of nodes of the cluster. You may use more or fewer resources by setting the parameter concurrent_tasks. For instance, if you want to use around 600 cores you can give the command
$ srun oq engine --run job.ini -p concurrent_tasks=600
All the usual oq commands are available, but you need to prepend srun to them; for instance
$ srun oq show performance
will give information about the performance of the last submitted job.
Running out of quota#
Right now the engine stores all of its files (intermediate results and calc_XXX.hdf5 files) under the $HOME/oqdata directory. It is therefore easy to run out of quota for large calculations. Fortunately, there is an environment variable $OQ_DATADIR that can be configured to point to some other target, like a directory on a large shared disk. Such a directory must be accessible in read/write mode from all workers in the cluster. Another option is to set a shared_dir in the openquake.cfg file; then the engine will store its data under the path shared_dir/$HOME/oqdata. This option is preferable since it works transparently for all users, but only the sysadmin can set it.
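For instance, a user could point $OQ_DATADIR to a shared disk before running a calculation; the path below is only a placeholder and must be replaced with a directory that actually exists on your cluster:
$ export OQ_DATADIR=/mnt/large_shared_disk/$USER/oqdata
$ srun oq engine --run job.ini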
Installing on HPC#
This section is for the administrators of the HPC cluster. Installing the engine requires access to PyPI since the universal installer will download packages from there.
Here are the installation instructions to create modules for engine 3.18, assuming you have python3.10 installed as a module.
We recommend choosing a base path for openquake and then installing the different versions under it using the release number, in our example /apps/openquake/3.18. This will create different modules for different releases:
# module load python/3.10
# mkdir /apps/openquake
# python3.10 -m venv /apps/openquake/3.18
# source /apps/openquake/3.18/bin/activate
# pip install -U pip
# pip install -r https://github.com/gem/oq-engine/raw/engine-3.18/requirements-py310-linux64.txt
# pip install openquake.engine==3.18
Then you have to define the module file. In our cluster it is located in /apps/Modules/modulefiles/openquake/3.18; please use the appropriate location for your cluster. The content of the file should be the following:
#%Module1.0
##
proc ModulesHelp { } {
puts stderr "\tOpenQuake - loads the OpenQuake environment"
puts stderr "\n\tThis will add OpenQuake to your PATH environment variable."
}
module-whatis "loads the OpenQuake 3.18 environment"
set version 3.18
set root /apps/openquake/3.18
prepend-path LD_LIBRARY_PATH $root/lib64
prepend-path MANPATH $root/share/man
prepend-path PATH $root/bin
prepend-path PKG_CONFIG_PATH $root/lib64/pkgconfig
prepend-path XDG_DATA_DIRS $root/share
After installing the engine, the sysadmin has to edit the file
/opt/openquake/venv/openquake.cfg
and set a few parameters:
[distribution]
oq_distribute = slurm
serialize_jobs = 2
python = /apps/openquake/3.18/bin/python
[directory]
# optionally set it to something like /mnt/large_shared_disk
shared_dir =
[dbserver]
host = local
With serialize_jobs = 2 at most two jobs per user can run concurrently. You may want to increase or reduce this number. Each user will have their own database located in $HOME/oqdata/db.sqlite3. The database will be created automatically the first time the user runs a calculation (NB: in engine-3.18 it must be created manually with the command srun oq engine --upgrade-db --yes).
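Putting the pieces together, a minimal first session for a new user of engine 3.18 could look like the following (all commands are taken from the steps above):
$ module load openquake
$ srun oq engine --upgrade-db --yes
$ srun oq engine --run job.ini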
How it works internally#
The support for SLURM is implemented in the module openquake/baselib/parallel.py. The idea is to submit to SLURM a job array of tasks for each parallel phase of the calculation. For instance, a classical calculation has three phases: preclassical, classical and postclassical.
The calculation will start sequentially; when it reaches the preclassical phase, the engine will create a bash script called slurm.sh, located in the directory $HOME/oqdata/calc_XXX, where XXX is the calculation ID, which is an OpenQuake concept and has nothing to do with the SLURM ID. The slurm.sh script has the following template:
#!/bin/bash
#SBATCH --job-name={mon.operation}
#SBATCH --array=1-{mon.task_no}
#SBATCH --time=10:00:00
#SBATCH --mem-per-cpu=1G
#SBATCH --output={mon.calc_dir}/%a.out
#SBATCH --error={mon.calc_dir}/%a.err
srun {python} -m openquake.baselib.slurm {mon.calc_dir} $SLURM_ARRAY_TASK_ID
At runtime the mon. variables will be replaced with their values:
- mon.operation will be the string "preclassical"
- mon.task_no will be the total number of tasks to spawn
- mon.calc_dir will be the directory $HOME/oqdata/calc_XXX
- python will be the path to the python executable to use, as set in openquake.cfg
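As an illustration, for a hypothetical preclassical phase spawning 100 tasks in calculation 42, run by a user with home directory /home/user, the generated script could look like this (all numbers and paths are made up for the example):
#!/bin/bash
#SBATCH --job-name=preclassical
#SBATCH --array=1-100
#SBATCH --time=10:00:00
#SBATCH --mem-per-cpu=1G
#SBATCH --output=/home/user/oqdata/calc_42/%a.out
#SBATCH --error=/home/user/oqdata/calc_42/%a.err
srun /apps/openquake/3.18/bin/python -m openquake.baselib.slurm /home/user/oqdata/calc_42 $SLURM_ARRAY_TASK_ID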
System administrators may want to adapt this template. At the moment this requires modifying the engine codebase; in the future the template may be moved into the configuration.
A task in the OpenQuake engine is simply a Python function or generator taking some arguments and a monitor object (mon) and sending results to the submitter process via zmq. Internally the engine saves the input arguments for each task in pickle files located in $HOME/oqdata/calc_XXX/YYY.pik, where XXX is the calculation ID and YYY is the $SLURM_ARRAY_TASK_ID, ranging from 1 to the total number of tasks.
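For example, for a hypothetical calculation 42 with 100 tasks, the calculation directory would contain the generated script, one pickle file per task and, once the tasks start, the output and error files defined in the template (this listing is only indicative):
$ ls $HOME/oqdata/calc_42
1.pik 2.pik ... 100.pik slurm.sh 1.out 1.err ...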
The command srun {python} -m openquake.baselib.slurm {mon.calc_dir} $SLURM_ARRAY_TASK_ID in slurm.sh runs the tasks in parallel, each one reading its arguments from the corresponding input file.
Using a job array has the advantage that all tasks can be killed with a single command. This is done automatically by the engine if the user aborts the calculation or if the calculation fails due to an error.
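For instance, if SLURM assigned the (hypothetical) job ID 12345 to the array, all of its tasks can be cancelled with a single standard SLURM command:
$ scancel 12345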