# Common mistakes: bad configuration parameters

By far, the most common source of problems with the engine is the choice of parameters in the job.ini file. It is very easy to make mistakes, because users typically copy the parameters from the OpenQuake demos. However, the demos are meant to show off all the features of the engine in simple calculations; they are not meant to give good performance in large calculations.

## The quadratic parameters

In large calculations it is essential to tune a few key parameters. Here is a list of parameters, relevant for all calculators, whose effect on the computation time is quadratic:

- `maximum_distance`: the larger the `maximum_distance`, the more sources and ruptures will be considered. The effect is quadratic, i.e. a calculation with `maximum_distance=500` km could take up to 6.25 times longer than a calculation with `maximum_distance=200` km.
- `region_grid_spacing`: the hazard sites can be specified by giving a region and a grid step. The size of the computation is quadratic in the inverse grid step: a calculation with `region_grid_spacing=1` will be up to 100 times slower than one with `region_grid_spacing=10`.
- `area_source_discretization`: area sources are converted into point sources by splitting the area region into a grid of points. The `area_source_discretization` (in km) is the step of the grid. The computation time is inversely proportional to the square of the discretization step, i.e. a calculation with `area_source_discretization=5` will take up to four times longer than a calculation with `area_source_discretization=10`.
- `rupture_mesh_spacing`: fault sources are computed by converting the geometry of the fault into a mesh of points; the `rupture_mesh_spacing` is the parameter determining the size of the mesh. The computation time is quadratic in the inverse mesh spacing: using `rupture_mesh_spacing=2` instead of `rupture_mesh_spacing=5` will make your calculation up to 6.25 times slower. Be warned that the engine may complain if the `rupture_mesh_spacing` is too large.
- `complex_fault_mesh_spacing`: the same as `rupture_mesh_spacing`, but for complex fault sources. If not specified, the value of `rupture_mesh_spacing` will be used. This is a common cause of problems: if you have performance issues, you should consider using a larger `complex_fault_mesh_spacing`. For instance, if you use `rupture_mesh_spacing=2` for simple fault sources but `complex_fault_mesh_spacing=10` for complex fault sources, your computation can become up to 25 times faster than when the complex faults fall back on the mesh spacing of 2, assuming the complex fault sources dominate the computation time.
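To make the quadratic scaling concrete, here is a small Python sketch that reproduces the worst-case factors quoted in the list above. It is only back-of-the-envelope arithmetic under an assumption: the work is taken to grow with the number of sites, point sources or mesh points per unit area (or, for `maximum_distance`, with the area within the distance, i.e. sources spread uniformly). The helper names `distance_factor` and `spacing_factor` are made up for this example and are not part of the engine.

```python
# Worst-case slowdown factors for the "quadratic" parameters.
# Assumption: work grows with the number of points per unit area (or with the
# area within maximum_distance), so every factor is a linear ratio squared.

def distance_factor(new_km, old_km):
    """Slowdown when raising maximum_distance from old_km to new_km:
    the area within the distance grows with the square of the radius."""
    return (new_km / old_km) ** 2

def spacing_factor(new_step, old_step):
    """Slowdown when reducing a grid or mesh step from old_step to new_step:
    the number of points per unit area grows with (1 / step) ** 2."""
    return (old_step / new_step) ** 2

examples = {
    "maximum_distance 200 -> 500 km": distance_factor(500, 200),
    "region_grid_spacing 10 -> 1": spacing_factor(1, 10),
    "area_source_discretization 10 -> 5 km": spacing_factor(5, 10),
    "rupture_mesh_spacing 5 -> 2 km": spacing_factor(2, 5),
    "complex_fault_mesh_spacing 10 -> 2 km": spacing_factor(2, 10),
}
for change, factor in examples.items():
    print(f"{change}: up to {factor:g}x slower")
```

These are upper bounds, matching the "up to" wording of the list; reading the last line in reverse gives the 25x speedup obtained by raising `complex_fault_mesh_spacing` from 2 to 10.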

## Intensity measure types and levels

The cost of classical calculations is roughly linear in the number of intensity measure types and levels. A common mistake is to use too many levels. For instance, a configuration like the following one:

```
intensity_measure_types_and_levels = {
"PGA": logscale(0.001,4.0, 100),
"SA(0.3)": logscale(0.001,4.0, 100),
"SA(1.0)": logscale(0.001,4.0, 100)}
```

requires computing the PoEs on 300 levels (3 intensity measure types times 100 levels each). Is that really what the user wants? It may well be that using only 20 levels for each intensity measure type produces good enough results, while reducing the computation time by up to a factor of 5.
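The level count is easy to check before running anything. The Python sketch below assumes that `logscale(x_min, x_max, n)` in the `job.ini` returns `n` values evenly spaced in log space between the two endpoints, both included; the `logscale` function here is a stand-in written for this example, not the engine's own helper.

```python
import math

def logscale(x_min, x_max, n):
    """n values evenly spaced in log space from x_min to x_max, inclusive.
    Stand-in for the job.ini helper of the same name (assumed behaviour)."""
    step = math.log(x_max / x_min) / (n - 1)
    return [x_min * math.exp(i * step) for i in range(n)]

def total_levels(imtls):
    """Total number of levels over all intensity measure types."""
    return sum(len(levels) for levels in imtls.values())

imts = ["PGA", "SA(0.3)", "SA(1.0)"]
dense = {imt: logscale(0.001, 4.0, 100) for imt in imts}
coarse = {imt: logscale(0.001, 4.0, 20) for imt in imts}

print(total_levels(dense), total_levels(coarse))  # 300 60
print(total_levels(dense) / total_levels(coarse))  # 5.0
```

Since the cost is roughly linear in the number of levels, the ratio of the two totals is the expected upper bound on the speedup.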

## pointsource_distance

PointSources (and MultiPointSources and AreaSources, which are split into PointSources and are therefore effectively the same thing) are not pointwise for the engine: they actually generate ruptures with rectangular surfaces whose size is determined by the magnitude scaling relationship. The geometry and position of these rectangles depend on the hypocenter distribution and the nodal plane distribution of the point source, which are used to model the uncertainties in the hypocenter location and in the orientation of the underlying ruptures.

Is the effect of the hypocenter/nodal plane distributions relevant? Not always: in particular, if you are interested in sites that are far from the rupture, the effect is minimal. So if you have a nodal plane distribution with 20 planes and a hypocenter distribution with 5 hypocenters, the engine will consider 20 x 5 = 100 ruptures and perform 100 times more calculations than needed, since at large distances the hazard will be more or less the same for each rupture.

To avoid this performance problem there is a `pointsource_distance` parameter: you can set it in the `job.ini` as a dictionary (tectonic region type -> distance in km) or as a scalar (in that case it is converted into a dictionary `{"default": distance}` and the same distance is used for all TRTs). For sites that are farther than the `pointsource_distance` from the point source, the engine ignores the hypocenter and nodal plane distributions and considers only the first rupture in the distribution, rescaling its occurrence rate to account for the other ruptures. For closer sites, all the ruptures are considered. This approximation (we call it *rupture collapsing* because it essentially reduces the number of ruptures) can give a substantial speedup if the model is dominated by PointSources and there are several nodal planes/hypocenters in the distribution.

In some situations it also makes sense to set `pointsource_distance = 0` to completely remove the nodal plane/hypocenter distributions. For instance, the Indonesia model has 20 nodal planes for each point source; however, that model uses the so-called equivalent distance approximation, which considers the point sources to be really pointwise. In this case the contribution to the hazard is totally independent of the nodal plane, and by using `pointsource_distance = 0` one can get *exactly* the same numbers and run the model in 1 hour instead of 20 hours. Actually, starting from engine 3.3 the engine is smart enough to recognize the cases where the equivalent distance approximation is used and automatically sets `pointsource_distance = 0`.
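The collapsing rule described above can be sketched in Python for a single point source and a single magnitude. This is a much simplified illustration under stated assumptions: the `job.ini` value is parsed as a Python literal with `ast.literal_eval` (not necessarily how the engine reads it), a rupture is a hypothetical `(nodal_plane, hypocenter_depth, rate)` tuple, and the function names are invented for this example. The engine's actual implementation has changed across releases.

```python
import ast
import itertools

def normalize_psdist(value):
    """Turn a job.ini value such as "100" or "{'Active Shallow Crust': 50}"
    into a {trt: km} dict; a scalar becomes {"default": km}.
    Parsing with ast.literal_eval is an assumption made for this sketch."""
    parsed = ast.literal_eval(value)
    if isinstance(parsed, dict):
        return {trt: float(km) for trt, km in parsed.items()}
    return {"default": float(parsed)}

def ruptures_for_site(ruptures, site_dist_km, psdist_km):
    """Beyond psdist_km keep only the first rupture, carrying the total rate
    of the whole distribution (rupture collapsing); closer, keep them all."""
    if site_dist_km > psdist_km:
        first_plane, first_depth, _ = ruptures[0]
        total_rate = sum(rate for _, _, rate in ruptures)
        return [(first_plane, first_depth, total_rate)]
    return ruptures

# Hypothetical distributions: 20 nodal planes (strike, dip, rake) x 5 depths.
planes = [(strike, 90.0, 0.0) for strike in range(0, 360, 18)]
depths = [5.0, 10.0, 15.0, 20.0, 25.0]
rate = 1e-3 / (len(planes) * len(depths))
ruptures = [(p, d, rate) for p, d in itertools.product(planes, depths)]

psdist = normalize_psdist("100")["default"]
print(len(ruptures_for_site(ruptures, 50.0, psdist)))   # 100
print(len(ruptures_for_site(ruptures, 150.0, psdist)))  # 1
```

The collapsed rupture keeps the total rate of the 100 original ruptures, so the occurrence rate of the source is preserved while the number of ground motion computations drops by a factor of 100 for distant sites.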

Even if you are not using the equivalent distance approximation, the effect of the nodal plane/hypocenter distribution can be negligible: I have seen cases where setting `pointsource_distance = 0` changed the results in the hazard maps by only 0.1% and gave an order of magnitude speedup. You have to check on a case by case basis.
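A case-by-case check can be as simple as running the same job twice, with and without the approximation, and comparing the hazard maps site by site. The Python sketch below assumes both maps are already available as lists of values in the same site order; the function name and the numbers are hypothetical.

```python
# Compare two hazard maps (same sites, same order) from two runs of the same
# job, e.g. one with the full distributions and one with pointsource_distance = 0.

def max_relative_difference(reference, approx):
    """Largest |approx - reference| / |reference| over all sites.
    Sites where the reference value is zero are skipped."""
    if len(reference) != len(approx):
        raise ValueError("the two maps must cover the same sites")
    diffs = [abs(a - r) / abs(r) for r, a in zip(reference, approx) if r != 0]
    return max(diffs, default=0.0)

reference_pga = [0.20, 0.35, 0.50]      # made-up map, full distributions
collapsed_pga = [0.2002, 0.35, 0.4995]  # made-up map, pointsource_distance = 0

print(f"{max_relative_difference(reference_pga, collapsed_pga):.2%}")  # 0.10%
```

Using the maximum over all sites, rather than an average, makes the check conservative: a single badly affected site is enough to flag the approximation.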

NB: the `pointsource_distance` approximation has changed a lot across engine releases and you should not expect it to always give the same results. For instance, in engine 3.8 it was extended to take into account the fact that small magnitudes will have a smaller collapse distance: if you set `pointsource_distance=100`, the engine will collapse the ruptures beyond 100 km for the maximum magnitude, but for lower magnitudes it will use a (much) shorter collapse distance and will collapse a lot more ruptures. This is possible because, given a tectonic region type, the engine knows all the GMPEs associated with that tectonic region and can compute an upper limit for the maximum intensity generated by a rupture at any distance. It can then invert the curve and, given the magnitude and the maximum intensity, determine the collapse distance for that magnitude.
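The inversion can be sketched as follows. Everything here is an assumption made for illustration: `toy_max_intensity` is a made-up attenuation curve standing in for the upper limit computed from the GMPEs, and the collapse distance for a magnitude is taken to be the distance at which that magnitude reaches the intensity that the maximum magnitude reaches at `pointsource_distance`. Since the curve decreases with distance, the inversion can be done by bisection.

```python
import math

def toy_max_intensity(mag, dist_km):
    """Made-up upper bound of the ground motion (in g) for a rupture of
    magnitude `mag` at `dist_km`; NOT a real GMPE. Decreases with distance."""
    return math.exp(-3.5 + 1.0 * mag - 1.5 * math.log(dist_km + 10.0))

def collapse_distance(mag, max_mag, psdist_km, intensity=toy_max_intensity, tol_km=1e-6):
    """Distance beyond which ruptures of magnitude `mag` would be collapsed.

    Assumption: it is the distance at which `mag` produces the same maximum
    intensity that `max_mag` produces at `psdist_km` (requires mag <= max_mag).
    """
    target = intensity(max_mag, psdist_km)
    if intensity(mag, 0.0) <= target:
        return 0.0  # never exceeds the target, so collapse at any distance
    lo, hi = 0.0, psdist_km  # invariant: intensity(lo) > target >= intensity(hi)
    while hi - lo > tol_km:
        mid = 0.5 * (lo + hi)
        if intensity(mag, mid) > target:
            lo = mid
        else:
            hi = mid
    return hi

for mag in (7.0, 6.0, 5.0, 3.0):
    print(mag, round(collapse_distance(mag, max_mag=7.0, psdist_km=100.0), 1))
```

With these toy coefficients and `pointsource_distance=100`, a magnitude 5 rupture gets a collapse distance of about 19 km when the maximum magnitude is 7, while a magnitude 3 rupture is collapsed at essentially any distance.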

In engine 3.9 this feature has been hidden and it is no longer used by default. However, you will not get the same results as in engine 3.7, because the underlying logic has changed: before, we were just picking one nodal plane/hypocenter from the distribution; now the approximation completely neglects finite-size effects by replacing planar ruptures with point ruptures of zero length. This is why in engine 3.9 one should use larger `pointsource_distance` values than before, since the approximation is cruder. On the other hand, it collapses more ruptures than before, which makes the engine much faster for single site analysis.
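To see why a zero-length point rupture is a crude approximation at short distances, compare the distance from a site to a point with the distance to a finite rupture. This Python sketch is a flat, 2D simplification (a horizontal rupture trace of hypothetical length along the x axis, with no depth or dip), not the engine's distance calculation.

```python
import math

def point_distance(x_km, y_km):
    """Distance from the site (x, y) to a zero-length rupture at the origin."""
    return math.hypot(x_km, y_km)

def segment_distance(x_km, y_km, length_km):
    """Distance from the site to a horizontal rupture trace of the given
    length, centred at the origin and lying on the x axis (flat 2D model)."""
    half = length_km / 2.0
    closest_x = min(max(x_km, -half), half)  # nearest point of the trace
    return math.hypot(x_km - closest_x, y_km)

def relative_error(x_km, y_km, length_km):
    """How much farther the point makes the site look, relative to the trace."""
    finite = segment_distance(x_km, y_km, length_km)
    if finite == 0.0:
        return math.inf  # the site lies on the trace
    return (point_distance(x_km, y_km) - finite) / finite

for x in (20.0, 50.0, 200.0):
    print(f"{x:.0f} km along strike: error {relative_error(x, 0.0, 20.0):.1%}")
```

For a 20 km trace, a site 20 km along strike from the centre sees the point twice as far away as the rupture, while at 200 km the difference is about 5%. This is consistent with the advice to use a larger `pointsource_distance` when finite-size effects are neglected entirely.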

Starting from engine 3.9 you can set `pointsource_distance = ?` and the engine will automagically define a magnitude-dependent `pointsource_distance`. However, it is recommended that you use your own distance, because in the next version the algorithm used with `pointsource_distance = ?` may change again.

## concurrent_tasks parameter

One last parameter is worth mentioning, because of its effect on memory usage in the risk calculators and in the event based hazard calculator.

- `concurrent_tasks`: this is a parameter that you should not set, since in most cases the engine will figure out the correct value to use. However, in some cases you may be forced to set it. Typically this happens in event based calculations, when computing the ground motion fields: if you run out of memory, increasing this parameter will help, since the engine will produce smaller tasks. Another case where it may help is when computing hazard statistics with lots of sites and realizations, since by increasing this parameter each task will contain fewer sites.

Notice that if `concurrent_tasks` is too big, performance will get worse and data transfer will increase: at a certain point the calculation will run out of memory. I have seen this happen when generating tens of thousands of tasks. Again, it is best not to touch this parameter unless you know what you are doing.
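The trade-off can be illustrated with a naive splitter. The Python sketch below assumes the work is a list of sites divided into at most `concurrent_tasks` blocks of nearly equal size; the engine's real task splitting is more sophisticated, and `split_in_blocks` is a hypothetical name.

```python
import math

def split_in_blocks(items, concurrent_tasks):
    """Split items into at most `concurrent_tasks` blocks of nearly equal
    size (a simplified stand-in for how a job is divided into tasks)."""
    if not items:
        return []
    num_blocks = max(1, min(concurrent_tasks, len(items)))
    size = math.ceil(len(items) / num_blocks)
    return [items[i:i + size] for i in range(0, len(items), size)]

sites = list(range(10_000))  # hypothetical site IDs
for ct in (10, 100, 1000):
    blocks = split_in_blocks(sites, ct)
    print(ct, len(blocks), max(len(b) for b in blocks))
```

Going from 10 to 1000 tasks shrinks each block from 1000 to 10 sites, lowering the memory needed per task, but it also multiplies by 100 the number of tasks whose inputs and outputs must be transferred.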