(slurm)=
# Running on a SLURM cluster (experimental)

Most HPC clusters support a scheduler called SLURM (Simple Linux Utility for Resource Management). The OpenQuake engine is able to interface transparently with SLURM.

## Running OpenQuake calculations with SLURM

Let's consider a user with ssh access to a SLURM cluster. The only thing the user has to do after logging in is to load the OpenQuake libraries with the command
```
$ module load openquake
```
Then running a calculation is quite trivial. The user has two choices: run the calculation on a single node with the command
```
$ oq engine --run job.ini
```
or run the calculation on multiple nodes with a command like
```
$ oq engine --run job.ini --nodes=4
```
which will split the calculation over 4 nodes. Clearly, there are limits on the number of available nodes, so if you ask for too many nodes one of the following can happen:

1. an error "You can use at most N nodes"; N depends on the configuration chosen by your system administrator and can be inferred from the parameters in the openquake.cfg file as `max_cores / num_cores`; for instance, with `max_cores=1024` and `num_cores=128` you would have `N=8`
2. a non-starting calculation, forever waiting for resources that cannot be allocated; you can see if you are in this situation by giving the command `$ squeue -u $USER` and looking at the reason why the nodelist is not being allocated (check for `AssocMaxCpuPerJobLimit`)
3. a non-starting calculation, waiting for resources which *can* be allocated: it is just a matter of waiting; in this case the reason will be `Resources` (waiting for resources to become available) or `Priority` (queued behind a higher priority job).

If you are stuck in situation 2 you must kill the OpenQuake job and the SLURM job with the command `scancel JOBID` (the JOBID is listed by the command `$ squeue -u $USER`). If you are stuck in situation 3 for a long time, it may be better to kill the jobs (both OpenQuake and SLURM) and then relaunch the calculation, this time asking for fewer nodes.

## Running out of quota

The engine will store the calculation files in `shared_dir` and some auxiliary files in `custom_tmp`; both directories are mandatory and must be specified in the configuration file. The `shared_dir` is meant to point to the work area of the cluster and the `custom_tmp` to the scratch area of the cluster. Classical calculations will generate an .hdf5 file for each task spawned, so each calculation can spawn thousands of files. We suggest periodically purging the scratch directories of old calculations, which will have the form `custom_tmp/calc_XXX`.

## Installing on HPC

This section is for the administrators of the HPC cluster. Here are the installation instructions to create modules for engine 3.18, assuming you have Python 3.10 installed as a module. We recommend choosing a base path for OpenQuake and then installing each version under its release number, in our example /apps/openquake/3.18. This way different releases will correspond to different modules:
```
# module load python/3.10
# mkdir /apps/openquake
# python3.10 -m venv /apps/openquake/3.18
# source /apps/openquake/3.18/bin/activate
# pip install -U pip
# pip install -r https://github.com/gem/oq-engine/raw/engine-3.18/requirements-py310-linux64.txt
# pip install openquake.engine==3.18
```
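At this point it can be useful to verify that the engine was installed correctly; this optional check (not part of the official instructions) simply calls the `oq` command from the virtual environment created above:
```
# source /apps/openquake/3.18/bin/activate
# oq engine --version   # should print the engine version, e.g. 3.18.x
```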
Then you have to define the module file. In our cluster it is located in `/apps/Modules/modulefiles/openquake/3.18`; please use the appropriate location for your cluster. The content of the file should be the following:
```
#%Module1.0
##
proc ModulesHelp { } {
    puts stderr "\tOpenQuake - loads the OpenQuake environment"
    puts stderr "\n\tThis will add OpenQuake to your PATH environment variable."
}
module-whatis "loads the OpenQuake 3.18 environment"
set version 3.18
set root /apps/openquake/3.18
prepend-path LD_LIBRARY_PATH $root/lib64
prepend-path MANPATH $root/share/man
prepend-path PATH $root/bin
prepend-path PKG_CONFIG_PATH $root/lib64/pkgconfig
prepend-path XDG_DATA_DIRS $root/share
```
After installing the engine, the sysadmin has to edit the file `/opt/openquake/venv/openquake.cfg` and set a few parameters:
```
[distribution]
oq_distribute = slurm
serialize_jobs = 2
python = /apps/openquake/3.18/bin/python

[directory]
# optionally set it to something like /mnt/large_shared_disk
shared_dir =

[dbserver]
host = local
```
With `serialize_jobs = 2` at most two jobs per user can run concurrently. You may want to increase or reduce this number. Each user will have their own database located in `$HOME/oqdata/db.sqlite3`. The database will be created automatically the first time the user runs a calculation (NB: in engine-3.18 it must be created manually with the command `srun oq engine --upgrade-db --yes`).

## How it works internally

The support for SLURM is implemented in the module `openquake/baselib/parallel.py`. The idea is to submit to SLURM a job array of tasks for each parallel phase of the calculation. For instance, a classical calculation has three phases: preclassical, classical and postclassical. The calculation starts sequentially; when it reaches the preclassical phase, the engine creates a bash script called `slurm.sh`, located in the directory `$HOME/oqdata/calc_XXX`, where XXX is the calculation ID (an OpenQuake concept which has nothing to do with the SLURM ID). The `slurm.sh` script has the following template:
```bash
#!/bin/bash
#SBATCH --job-name={mon.operation}
#SBATCH --array=1-{mon.task_no}
#SBATCH --time=10:00:00
#SBATCH --mem-per-cpu=1G
#SBATCH --output={mon.calc_dir}/%a.out
#SBATCH --error={mon.calc_dir}/%a.err
srun {python} -m openquake.baselib.slurm {mon.calc_dir} $SLURM_ARRAY_TASK_ID
```
At runtime the `mon.` variables will be replaced with their values:

- `mon.operation` will be the string "preclassical"
- `mon.task_no` will be the total number of tasks to spawn
- `mon.calc_dir` will be the directory `$HOME/oqdata/calc_XXX`
- `python` will be the path to the python executable to use, as set in openquake.cfg

System administrators may want to adapt this template. At the moment this requires modifying the engine codebase; in the future the template may be moved into the configuration file.

A task in the OpenQuake engine is simply a Python function or generator taking some arguments and a monitor object (`mon`) and sending results to the submitter process via zmq. Internally the engine saves the input arguments of each task in pickle files located in `$HOME/oqdata/calc_XXX/YYY.pik`, where XXX is the calculation ID and YYY is the `$SLURM_ARRAY_TASK_ID`, going from 1 to the total number of tasks. The command `srun {python} -m openquake.baselib.slurm {mon.calc_dir} $SLURM_ARRAY_TASK_ID` in `slurm.sh` runs the tasks in parallel, each one reading its arguments from the corresponding input file. Using a job array has the advantage that all tasks can be killed with a single command. This is done automatically by the engine if the user aborts the calculation or if the calculation fails due to an error.
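To make the template concrete, here is what the rendered `slurm.sh` could look like for a hypothetical calculation 42 spawning 100 preclassical tasks, for a user with home directory `/home/user` and the `python` path set as in the openquake.cfg example above (calculation ID, number of tasks and paths are purely illustrative):
```bash
#!/bin/bash
#SBATCH --job-name=preclassical
#SBATCH --array=1-100
#SBATCH --time=10:00:00
#SBATCH --mem-per-cpu=1G
#SBATCH --output=/home/user/oqdata/calc_42/%a.out
#SBATCH --error=/home/user/oqdata/calc_42/%a.err
srun /apps/openquake/3.18/bin/python -m openquake.baselib.slurm /home/user/oqdata/calc_42 $SLURM_ARRAY_TASK_ID
```
Each of the 100 array tasks reads its arguments from the corresponding `/home/user/oqdata/calc_42/YYY.pik` file and writes its standard output and error to `YYY.out` and `YYY.err` in the same directory; the whole array can be cancelled at once by running `scancel` on the array job ID.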