Bad configuration parameters#

By far, the most common source of problems with the engine is the choice of parameters in the job.ini file. It is very easy to make mistakes, because users typically copy the parameters from the OpenQuake engine demos. However, the demos are meant to show off all of the features of the engine in simple calculations, they are not meant for getting performance in large calculations.

The quadratic parameters#

In large calculations, it is essential to tune a few parameters that are really important. Here is a list of parameters relevant for all calculators:

maximum_distance:

The larger the maximum_distance, the more sources and ruptures will be considered; the effect is quadratic, i.e. a calculation with maximum_distance=500 km could take up to 6.25 times more time than a calculation with maximum_distance=200 km.

region_grid_spacing:

The hazard sites can be specified by giving a region and a grid step. Clearly the size of the computation is quadratic with the inverse grid step: a calculation with region_grid_spacing=1 will be up to 100 times slower than a computation with region_grid_spacing=10.

area_source_discretization:

Area sources are converted into point sources, by splitting the area region into a grid of points. The area_source_discretization (in km) is the step of the grid. The computation time is inversely proportional to the square of the discretization step, i.e. calculation with area_source_discretization=5 will take up to four times more time than a calculation with area_source_discretization=10.

rupture_mesh_spacing:

Fault sources are computed by converting the geometry of the fault into a mesh of points; the rupture_mesh_spacing is the parameter determining the size of the mesh. The computation time is quadratic with the inverse mesh spacing. Using a rupture_mesh_spacing=2 instead of rupture_mesh_spacing=5 will make your calculation up to 6.25 times slower. Be warned that the engine may complain if the rupture_mesh_spacing is too large.

complex_fault_mesh_spacing:

The same as the rupture_mesh_spacing, but for complex fault sources. If not specified, the value of rupture_mesh_spacing will be used. This is a common cause of problems; if you have performance issue you should consider using a larger complex_fault_mesh_spacing. For instance, if you use a rupture_mesh_spacing=2 for simple fault sources but complex_fault_mesh_spacing=10 for complex fault sources, your computation can become up to 25 times faster, assuming the complex fault sources are dominating the computation time.

Maximum distance#

The engine gives users a lot of control on the maximum distance parameter. For instance, you can have a different maximum distance depending on the tectonic region, like in the following example:

maximum_distance = {'Active Shallow Crust': 200, 'Subduction': 500}

You can also have a magnitude-dependent maximum distance:

maximum_distance = [(5, 0), (6, 100), (7, 200), (8, 300)]

In this case, given a site, the engine will completely discard ruptures with magnitude below 5, keep ruptures up to 100 km for magnitudes between 5 and 6 (the maximum distance in this magnitude range will vary linearly between 0 and 100), keep ruptures up to 200 km for magnitudes between 6 and 7 (with maximum_distance increasing linearly from 100 to 200 km from magnitude 6 to magnitude 7), keep ruptures up to 300 km for magnitudes between 7 and 8 (with maximum_distance increasing linearly from 200 to 300 km from magnitude 7 to magnitude 8) and discard ruptures for magnitudes over 8.

You can have both trt-dependent and mag-dependent maximum distance:

maximum_distance = {
   'Active Shallow Crust': [(5, 0), (6, 100), (7, 200), (8, 300)],
   'Subduction': [(6.5, 300), (9, 500)]}

Given a rupture with tectonic region type trt and magnitude mag, the engine will ignore all sites over the maximum distance md(trt, mag). The precise value is given via linear interpolation of the values listed in the job.ini; you can determine the distance as follows:

from openquake.hazardlib.calc.filters import IntegrationDistance
idist = IntegrationDistance.new('[(4, 0), (6, 100), (7, 200), (8.5, 300)]')
interp = idist('TRT')
interp([4.5, 5.5, 6.5, 7.5, 8])
array([ 25.        ,  75.        , 150.        , 233.33333333,
       266.66666667])

pointsource_distance#

PointSources (and MultiPointSources and AreaSources, which are split into PointSources and therefore are effectively the same thing) are not pointwise for the engine: they actually generate ruptures with rectangular surfaces which size is determined by the magnitude scaling relationship. The geometry and position of such rectangles depends on the hypocenter distribution and the nodal plane distribution of the point source, which are used to model the uncertainties on the hypocenter location and on the orientation of the underlying ruptures.

Is the effect of the hypocenter/nodal planes distributions relevant? Not always: in particular, if you are interested in points that are far away from the rupture the effect is minimal. So if you have a nodal plane distribution with 20 planes and a hypocenter distribution with 5 hypocenters, the engine will consider 20 x 5 ruptures and perform 100 times more calculations than needed, since at large distance the hazard will be more or less the same for each rupture.

To avoid this performance problem there is a pointsource_distance parameter: you can set it in the job.ini as a dictionary (tectonic region type -> distance in km) or as a scalar (in that case it is converted into a dictionary {"default": distance} and the same distance is used for all TRTs). For sites that are more distant than the pointsource_distance plus the rupture radius from the point source, the engine creates an average rupture by taking weighted means of the parameters strike, dip, rake and depth from the nodal plane and hypocenter distributions and by rescaling the occurrence rate. For closer points, all the original ruptures are considered. This approximation (we call it rupture collapsing because it essentially reduces the number of ruptures) can give a substantial speedup if the model is dominated by PointSources and there are several nodal planes/hypocenters in the distribution. In some situations it also makes sense to set pointsource_distance = 0 to completely remove the nodal plane/hypocenter distributions. For instance the Indonesia model has 20 nodal planes for each point sources; however such model uses the so-called equivalent distance approximation which considers the point sources to be really pointwise. In this case the contribution to the hazard is totally independent from the nodal plane and by using pointsource_distance = 0 one can get exactly the same numbers and run the model in 1 hour instead of 20 hours. Actually, starting from engine 3.3 the engine is smart enough to recognize the cases where the equivalent distance approximation is used and automatically set pointsource_distance = 0.

Even if you not using the equivalent distance approximation, the effect of the nodal plane/hypocenter distribution can be negligible: I have seen cases when setting pointsource_distance = 0 changed the result in the hazard maps only by 0.1% and gained an order of magnitude of speedup. You have to check on a case by case basis.

There is a good example of use of the pointsource_distance in the MultiPointClassicalPSHA demo. Here we will just show a plot displaying the hazard curve without pointsource_distance (with ID=-2) and with pointsource_distance=200 km (with ID=-1). As you see they are nearly identical but the second calculation is ten times faster.

../../_images/mp-demo.png

The pointsource_distance is also crucial when using the point source gridding approximation: then it can be used to speedup calculations even when the nodal plane and hypocenter distributions are trivial and no speedup would be expected.

NB: the pointsource_distance approximation has changed a lot across engine releases and you should not expect it to give always the same results. In particular, in engine 3.8 it has been extended to take into account the fact that small magnitudes will have a smaller collapse distance. For instance, if you set pointsource_distance=100, the engine will collapse the ruptures over 100 km for the maximum magnitude, but for lower magnitudes the engine will consider a (much) shorter collapse distance and will collapse a lot more ruptures. This is possible because given a tectonic region type the engine knows all the GMPEs associated to that tectonic region and can compute an upper limit for the maximum intensity generated by a rupture at any distance. Then it can invert the curve and given the magnitude and the maximum intensity can determine the collapse distance for that magnitude.

In engine 3.11, contrarily to all previous releases, finite side effects are not ignored for distance sites, they are simply averaged over. This gives a better precision. In some case (i.e. the Alaska model) versions of the engine before 3.11 could give a completely wrong hazard on some sites. This is now fixed.

Note: setting pointsource_distance=0 does not completely remove finite size effects. If you want to replace point sources with points you need to also change the magnitude-scaling relationship to PointMSR. Then the area of the underlying planar ruptures will be set to 1E-4 squared km and the ruptures will effectively become points.

The linear parameters: width_of_mfd_bin and intensity levels#

The number of ruptures generated by the engine is controlled by the parameter width_of_mfd_bin; for instance if you raise it from 0.1 to 0.2 you will reduce by half the number of ruptures and double the speed of the calculation. It is a linear parameter, at least approximately. Classical calculations are also roughly linear in the number of intensity measure types and levels. A common mistake is to use too many levels. For instance a configuration like the following one:

intensity_measure_types_and_levels = {
  "PGA":  logscale(0.001,4.0, 100),
  "SA(0.3)":  logscale(0.001,4.0, 100),
  "SA(1.0)":  logscale(0.001,4.0, 100)}

requires computing the PoEs on 300 levels. Is that really what the user wants? It could very well be that using only 20 levels per each intensity measure type produces good enough results, while potentially reducing the computation time by a factor of 5.

concurrent_tasks parameter#

There is a last parameter which is worthy of mention, because of its effect on the memory occupation in the risk calculators and in the event based hazard calculator.

concurrent_tasks:

This is a parameter that you should not set, since in most cases the engine will figure out the correct value to use. However, in some cases, you may be forced to set it. Typically this happens in event based calculations, when computing the ground motion fields. If you run out of memory, increasing this parameter will help, since the engine will produce smaller tasks. Another case when it may help is when computing hazard statistics with lots of sites and realizations, since by increasing this parameter the tasks will contain less sites.

Notice that if the number of concurrent_tasks is too big the performance will get worse and the data transfer will increase: at a certain point the calculation will run out of memory. I have seen this to happen when generating tens of thousands of tasks. Again, it is best not to touch this parameter unless you know what you are doing.

Rupture sampling: how to get it wrong#

Rupture samplings is much more complex than one could expect and in many respects surprising. In the many years of existence of the engine, multiple approached were tried and you can expect the details of the rupture sampling mechanism to be different nearly at every version of the engine.

Here we will discuss some tricky points that may help you understand why different versions of the engine may give different results and also why the comparison between the engine and other software performing rupture sampling is nontrivial.

We will start with the first subtlety, the interaction between sampling and filtering. The short version is that you should first sample and then filter.

Here is the long version. Consider the following code emulating rupture sampling for poissonian ruptures:

import numpy

class FakeRupture:
    def __init__(self, mag, rate):
        self.mag = mag
        self.rate = rate

def calc_n_occ(ruptures, eff_time, seed):
    rates = numpy.array([rup.rate for rup in ruptures])
    return numpy.random.default_rng(seed).poisson(rates * eff_time)

mag_rates = [(5.0, 1e-5), (5.1, 2e-5), (5.2, 1e-5), (5.3, 2e-5),
             (5.4, 1e-5), (5.5, 2e-5), (5.6, 1e-5), (5.7, 2e-5)]
fake_ruptures = [FakeRupture(mag, rate) for mag, rate in mag_rates]
eff_time = 50 * 10_000
seed = 42

Running this code will give you the following numbers of occurrence for the 8 ruptures considered:

>> calc_n_occ(fake_ruptures, eff_time, seed)
[ 8  9  6 13  7  6  6 10]

Here we did not consider the fact that engine has a minimum_magnitude feature and it is able to discard ruptures below the minimum magnitude. But how should it work? The natural approach to follow, for performance-oriented applications, would be to first discard the low magnitudes and then perform the sampling. However, that would have effects that would be surprising for most users. Consider the following two alternative:

def calc_n_occ_after_filtering(ruptures, eff_time, seed, min_mag):
    mags = numpy.array([rup.mag for rup in ruptures])
    rates = numpy.array([rup.rate for rup in ruptures])
    return numpy.random.default_rng(seed).poisson(
        rates[mags >= min_mag] * eff_time)

def calc_n_occ_before_filtering(ruptures, eff_time, seed, min_mag):
    mags = numpy.array([rup.mag for rup in ruptures])
    rates = numpy.array([rup.rate for rup in ruptures])
    n_occ = numpy.random.default_rng(seed).poisson(rates * eff_time)
    return n_occ[mags >= min_mag]

Most users would expect that removing a little number of ruptures has a little effect; for instance, if we set min_mag = 5.1 such that only the first rupture is removed from the total 8 ruptures, we would expect a minor change. However, if we follow the filter-early approach the user would get completely different occupation numbers:

>> calc_n_occ_after_filtering(fake_ruptures, eff_time, seed, min_mag)
[13  6  9  6 13  7  6]

It is only by using the filter-late approach that the occupation numbers are consistent with the no-filtering case:

>> calc_n_occ_before_filtering(fake_ruptures, eff_time, seed, min_mag)
[ 9  6 13  7  6  6 10]

The problem with the filtering is absolutely general and not restricted only to the magnitude filtering: it is exactly the same also for distance filtering. Suppose you have a maximum_distance of 300 km and than you decide that you want to increase it to 301 km. One would expect this change to have a minor impact; instead, you may end up sampling a very different set of ruptures.

It is true that average quantities like the hazard curves obtained from the ground motion fields will converge for long enough effective time, however in practice you are always in situations were

  1. you cannot perform the calculation for a long enough effective time since it would be computationally prohibitive

  2. you are interested on quantities which are strongly sensitive to aany change, like the Maximum Probable Loss at some return period

In such situations changing the site collection (or changing the maximum distance which is akin to changing the site collection) can change the sampling of the ruptures significantly, at least for engine versions lower than 3.17.

Users wanting to compare the GMFs or the risk on different site collections should be aware of this effect; the solution is to first sample the ruptures without setting any site collection (i.e. disabling the filtering) and then perform the calculation with the site collection starting from the sampled ruptures.