Release notes v3.21#

Version 3.21 is the culmination of 4 months of work involving over 240 pull requests. It features the greatest performance improvements of the last few years and is aimed at users wanting the latest features, bug fixes and maximum performance. Users valuing stability may want to stay with the LTS release instead (currently at version 3.16.8).

The complete set of changes is listed in the changelog:

https://github.com/gem/oq-engine/blob/engine-3.21/debian/changelog

A summary is given below.

Classical calculations#

We significantly reduced the memory consumption in classical calculations: now all the models in the GEM mosaic run with less than 2 GB per core (previously we required 4 GB per core). This result was mostly achieved by reducing the precision of the PoEs from 64 to 32 bit, which required a major refactoring and working in terms of rates, since rates are much less subject to numerical errors than PoEs.
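To give a sense of the numerical advantage, here is a minimal numpy sketch (not the engine's internal API): a PoE very close to 1 becomes indistinguishable from 1 in 32 bit, while the corresponding rate remains perfectly representable.

    import numpy as np

    def to_rate(poe):
        # probability of exceedance -> occurrence rate
        return -np.log1p(-poe)

    def to_poe(rate):
        # occurrence rate -> probability of exceedance
        return -np.expm1(-rate)

    poe = 1 - 1e-12                   # a PoE very close to 1
    print(np.float32(poe))            # 1.0: the difference from 1 is lost
    print(np.float32(to_rate(poe)))   # ~27.63: fully representable in 32 bit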

The other mechanism to reduce the memory consumption was changing the implementation of CollapsedPointSources to use arrays rather than plain source objects. To give a concrete example, in the case of the USA model that reduced the memory consumption (when using the tiling strategy) from about 3 GB per core to just 1.2 GB per core.

Moreover, CollapsedPointSources have been extended to collect sources with different magnitude-scaling relationships, which also makes the engine faster; however, the change affects only a few models, with minor differences.

Thanks to the memory reduction and other optimizations, such as refining the size of the arrays when computing the means and standard deviations from the GMPEs, and speeding up the procedure that updates the probabilities of no exceedance (now 3x faster), the performance of classical calculations improved substantially. For instance, the model for EUR is nearly 2x faster than before, the model for Canada is 1.4x faster, and the model for USA is 1.25x faster on a machine with 128 cores.

We added a tiling flag in the job.ini file to make it possible to specify which version of the classical calculator to use (tiling or regular); if the flag is not specified, the engine will continue to autonomously select the most appropriate calculator, albeit with a different logic than in the past.
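For instance, assuming the flag takes a boolean value like other job.ini flags, the tiling calculator can be forced with

tiling = true

and the regular calculator with tiling = false.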

We optimized the logic tree processor so that it is now possible to compute mean rates exactly (if use_rates is set) even with millions of realizations. In particular we managed to run the EUR model with 302,990,625 realizations, but it required over 100 GB of RAM.
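The flag is set in the job.ini like any other boolean parameter:

use_rates = true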

The splitting of the sources in groups now works slightly differently and sometimes more groups are generated, but there is an additional guarantee: the sources in the same group all have the same temporal occurrence model. This allowed some internal simplifications.

The --sample-sources feature now works differently (by tiling) and is faster (measured at 4x for the USA model) and more reliable than before.

Finally, logging the progress of the calculation now works much better (previously the progress often jumped abruptly from 0 to 50% and then from 50% to 100%).

Event based/scenario calculations#

In event based calculations we were able to spectacularly reduce the memory consumption in situations with millions of sites. For instance, whereas previously a calculation with 6 million sites required over 80 GB per core and inevitably ran out of memory, now it requires less than 2 GB per core and completes correctly.

Moreover, the performance improved tremendously in the case of many GMPEs, like for EUR, where there are up to 19 GMPEs per tectonic region type. The reason is that some calculations were needlessly repeated in the GmfComputer. Now generating the GMFs in the EUR model is 13 times faster than before. In models with only 2 or 3 GMPEs the improvement is less visible, but still there.

We decided to make the parameters minimum_magnitude and minimum_intensity mandatory, except in toy calculations. The reason is that users often forget to set these parameters and as a consequence their calculations become extremely slow or run out of memory after many hours. Now they get a clear error message before the calculation even starts.
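For instance, a job.ini could contain lines like the following (the values are purely illustrative and must be chosen depending on the model):

minimum_magnitude = 4.0
minimum_intensity = 0.01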

If the custom_site_id field is missing in the site model, it is now generated automatically as a geohash, which is extremely useful when comparing and debugging GMFs.
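For readers unfamiliar with geohashes, here is a minimal pure-Python sketch of the encoding; the engine has its own implementation, this is only for illustration:

    BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

    def encode_geohash(lon, lat, length=8):
        # interleave longitude and latitude bits, then base32-encode them
        lon_rng, lat_rng = [-180.0, 180.0], [-90.0, 90.0]
        bits = []
        for i in range(length * 5):
            rng, val = (lon_rng, lon) if i % 2 == 0 else (lat_rng, lat)
            mid = (rng[0] + rng[1]) / 2.0
            if val >= mid:
                bits.append(1)
                rng[0] = mid
            else:
                bits.append(0)
                rng[1] = mid
        return ''.join(BASE32[int(''.join(map(str, bits[i:i+5])), 2)]
                       for i in range(0, length * 5, 5))

    print(encode_geohash(-5.603, 42.605, 5))  # -> ezs42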

Some users wanted to be able to access intermediate results (in particular the GMPE-computed mean and the standard deviations tau and phi) for each site, rupture, GMPE and intensity measure type. This is now possible by setting the flag

mea_tau_phi = true

in the job.ini file. Please note that setting the flag will double the disk space used and make the calculation slower, so you are advised to enable this feature only for small calculations, for debugging purposes. It works both for scenario and event based calculations.

We changed the scenario calculator to discard sites over the integration distance from the rupture, for consistency with the event based calculator and for consistency with the algorithm used in Aristotle calculations.

hazardlib#

Enrico Abcede and Francis Bernales extended the Campbell and Bozorgnia (2014) GMPEs to work with the intensity measure types IA and CAV.

Fatemeh Alishahiha contributed a couple of GMPEs relevant for Iran: Zafarani et al. (2018) and Ambraseys (2005).

Chris di Caprio contributed a bug fix for the Kuehn (2020) GMPE: the RegularGridInterpolator must use extrapolation to handle floating point edge cases.

Eric Thompson pointed out a typo in the aliases for the Kuehn et al. GMPE (“KuehnEtAl2021SInter” in place of “KuehnEtAl2020SInter”) and missing aliases for “KuehnEtAl2020SSlab”. Both have been fixed.

Kyle Smith contributed the Arias Intensity and Cumulative Absolute Velocity ground motion models from Sandikkaya and Akkar (2017).

Nicolas Schmid fixed an incorrect formula used in ShakeMap calculations (see https://github.com/gem/oq-engine/pull/9890).

Marco Pagani extended the Chiou and Youngs (2014) GMPEs to optionally include the modifications by Boore et al. (2022).

We added a helper function valid.modified_gsim to instantiate ModifiableGMPE objects with less effort, for usage in the GEESE project. You can see an example at https://docs.openquake.org/oq-engine/master/manual/user-guide/inputs/ground-motion-models-inputs.html#modifiablegmpe.
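A hypothetical usage sketch follows; the exact signature and the accepted keyword arguments are an assumption here and should be checked against the documentation linked above:

    from openquake.hazardlib import valid

    # hypothetical sketch: wrap a GMPE into a ModifiableGMPE scaling the
    # median by 20%; the keyword arguments accepted by modified_gsim are
    # an assumption, see the linked documentation for the actual API
    gsim = valid.gsim('BooreAtkinson2008')
    mod_gsim = valid.modified_gsim(
        gsim, set_scale_median_scalar=dict(scaling_factor=1.2))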

Since the beginning hazardlib has been affected by a rather annoying issue: pollution by large CSV files. CSV files are required by the GMPE tests to store the expected mean and standard deviations computed from a set of input parameters like magnitude, distance and vs30.

Unfortunately, people tended to be liberal with the size of the CSV files, including files larger than needed, and we ended up with 480 MB of expected data. The sheer amount of data slows down the download of the repository, even with a shallow clone, and makes the tests slower than needed.

We were able to cut down the CSV files by two thirds without losing any test coverage. Moreover, we added a check forbidding CSV files larger than 600 KB, to keep the problem from reappearing in the near future.

AELO project#

Work on the AELO project continued with various AELO year 3 updates. We can now run all available IMTs (before we considered only 3 IMTs) and use the ASCE7-22 parameters (see https://github.com/gem/oq-engine/pull/9648).

At the level of the Web Interface it is now possible to choose the desired asce_version between ASCE 7-16 and ASCE 7-22. The version used to perform the calculation is then displayed in the outputs page.

We began the work for supporting vs30 values different from 760 m/s by implementing the ASCE soil classes.

We added exporters for the Maximum Considered Earthquake and for the ASCE07 and ASCE41 results.

The geolocate functionality has been extended to support multipolygons, which are used to model geometries crossing the International Date Line. As a consequence, a point exactly on the boundary between two mosaic models may now be associated to a different model than before.

Aristotle project#

We continued the work on the Aristotle project.

Given a single file exposure.hdf5 containing the global exposure, the global vulnerability functions, the global taxonomy mappings and the global site model, plus a rupture file or a ShakeMap event id, an Aristotle calculation is able to compute the seismic risk anywhere in the world, apart from the oceans.

In this release, the procedure to generate the exposure.hdf5 file from the global risk mosaic has been improved and the taxonomy mappings for Canada and USA, which were missing, have been added.

Moreover, the Aristotle web interface has been much improved. It is possible for the user to download from the USGS site not only rupture files but also station files, thus taking advantage of the Conditioned GMFs calculator. Custom rupture files or station files can also be uploaded, if the user has them locally. Errors in the station files are properly reported, as are errors in the rupture geometries (not all USGS ruptures can be converted into OpenQuake ruptures yet).

When the rupture.json file is not available on the USGS page for an event, the software checks if the finite-fault rupture is available and loads the relevant information from it instead. Note that this fallback may also fail, especially in the early stages right after an event, when not enough information has been uploaded to the USGS website yet. In that case, in order to proceed, the user needs to upload the rupture data via the corresponding web form.

We improved the procedure for generating hazardlib ruptures, automatically setting the dip and strike parameters when possible. In case of errors in the geometry, a planar rupture with the given magnitude, hypocenter, dip and strike is automatically generated.

The visualization of the input form and the output tables has improved.

It is now possible to specify the parameter maximum_distance_stations, either from the command line or from the WebUI. The maximum_distance parameter was already there, but it now has a default of 100 km, to ensure faster calculations.

We added a dropdown menu so that the user can override the time_event parameter, which by default is automatically set depending on the local time of occurrence of the earthquake.

The page displaying the outputs of a calculation now includes some additional information. At the top of the page, the local time of the event is displayed, together with an indication of how much time passed between the event itself and the moment the calculation was started. This is useful, for instance, to compare results obtained before and after rupture or station data became available on the USGS site.

The plots showing the average ground motion fields and the assets have been improved. In particular, the latter now also display the boundaries of the rupture.

We added a method GsimLogicTree.to_node, useful to produce an XML representation of the gsim logic tree used by an Aristotle calculation.

Secondary perils#

Lana Todorović implemented the Nowicki Jessee et al. (2018) landslide model, which is part of the USGS Ground Failure product. It is documented here: https://docs.openquake.org/oq-engine/3.21/manual/underlying-science/secondary-perils.html#nowicki-jessee-et-al-2018

When exporting the gmf_data table, we decided to include the secondary peril values, if present. Technically they are not GMFs, just fields induced by the GMFs, but it is convenient to keep them in the gmf_data table, also for visualization purposes via the QGIS plugin.

Bug fixes#

There were various fixes to the conditioned GMFs calculator:

  • we changed the Cholesky decomposition algorithm to reduce the issue of small negative eigenvalues

  • we fixed an error when using a region_grid_spacing by extending the site collection to include the stations

  • we fixed a couple of bugs in the association of the site parameters to the sites

  • we added a uniqueness check for the station coordinates

We fixed a bug with the conditional spectrum calculator in the case of calculations with non-contributing tectonic region types, i.e. when all the sources of a given tectonic region type are outside the integration distance.

Running a scenario or event based calculation starting from a CSV file of ruptures raised a KeyError: “No ‘source_mags/*’ found in ” when exporting the ruptures. It has been fixed now.

The new Japan model was failing with an error in event based calculations due to an excessive check (a leftover from the past). It has been fixed now.

The avg_gmf output was sometimes failing in the presence of a filtered site collection. It has been fixed now.

In situations with more than ~90 realizations, exporting the realizations CSV was causing an encoding error. It has been fixed now.

In the presence of multi-fault sources far away from the sites (i.e. beyond the integration distance) an error “object has no attribute msparams” was raised. It has been fixed now.

We improved the error message displayed when the user forgets to specify the exposure file in a risk calculation.

We removed the exporter for the output disagg-stats-traditional since it was incorrect, with no easy way to fix it. We kept the exporter for disagg-rlzs-traditional, which was correct.

Using half the memory on Windows#

Recent processors have significantly more threads than in the past; however, the typical amount of RAM on consumer machines has not increased much. Moreover, a significant amount of RAM is often reserved for the GPU, and a huge amount of memory is eaten by the browser.

Therefore, if the engine kept spawning a process for each thread as in the past, it would likely run out of memory or degrade the performance to the so-called “slow mode”, with a single thread being used.

To cope with this situation, now the engine on Windows uses by default only half of the available threads. You can change this default by setting the parameter num_cores in the [distribution] section of the file openquake.cfg in your home directory. Be warned that increasing num_cores will increase the memory consumption but not necessarily the performance.
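For instance, to limit the engine to 8 threads (the value is illustrative):

[distribution]
num_cores = 8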

On non-Windows platforms nothing changes, i.e. the default is still to use all available threads. The reason is that Apple silicon does not use hyperthreading (so the engine already uses half the memory compared to processors with hyperthreading), while Linux is mostly used on servers, which typically have more memory than a consumer machine, so there is no pressing need to save memory.

HPC support via SLURM#

Since version 3.18 the engine has had experimental support for HPC clusters using the SLURM scheduler. However, the original logic required spawning a SLURM job for each engine task, meaning that we immediately ran into the limit of 300 SLURM jobs when the CEA institute graciously gave us access to their cluster.

To overcome this limitation, the approach used in engine 3.18 has been abandoned and the code completely rewritten. With the current approach there is a single SLURM job for each engine job and we can easily scale to thousands of cores by simply passing the number of desired nodes to the oq engine --run command (see https://docs.openquake.org/oq-engine/3.21/manual/getting-started/installation-instructions/slurm.html).
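For instance, assuming the nodes option described in the linked documentation, a run on 4 SLURM nodes would look like

oq engine --run job.ini --nodes 4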

Notice that supporting thousands of cores required substantial work on the calculators too.

For instance, classical calculations now make use of the custom_tmp directory specified in openquake.cfg (mandatory in SLURM mode) to store the PoEs in small temporary files, one per task. In tiling calculations that reduces the data transfer effectively to zero, while in non-tiling calculations it only reduces it, more or less depending on the calculation. This required changing the postclassical phase of the calculation to be able to read the PoEs from the custom_tmp directory and not only from the calc_XXX.hdf5 file.
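For instance, openquake.cfg could contain something like the following (the section name and the path are assumptions, see the SLURM documentation linked above; the directory must be visible to all nodes):

[directory]
custom_tmp = /shared/scratch/oqtmp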

Moreover we had to invent a mechanism to limit the data transfer in non-tiling calculations, otherwise we would still run out of memory on the master node in large HPC clusters simply by adding new nodes.

The tiling feature is now used less than before. For instance, prior to engine 3.21 the USA hazard model was executed using the tiling strategy. While this is acceptable if you have, say, 128 cores, it is extremely inefficient if you have 1024 cores or more. Thus in engine 3.21 we went to great lengths to avoid the tiling approach when it is inefficient: in particular, the USA model with 1024 cores is 5x faster without tiling.

Be warned that the more cores you have, the more you are in uncharted territory, and it is entirely possible that adding nodes will slow down your calculation. We recommend not exaggerating with the number of nodes: configurations of 8-16 nodes with 128 cores each have been tested; using more nodes comes at your own risk.

The SLURM support is still considered experimental and we know that a few issues remain: for instance, the SLURM job may not be killed automatically in case of hard out-of-memory situations and must then be killed manually, and some calculations still exhibit slow tasks. Nevertheless, it is good enough to run all the hazard models in the GEM mosaic.

Commands like oq show performance, oq workers status and oq sensitivity_analysis now work correctly in SLURM mode: the first shows the total memory used on all nodes, not only the master node; the second shows how many workers are active on each node; the third generates a script that launches a SLURM job for each combination of parameters in the sensitivity analysis.

Other oq commands#

After a major effort we were able to extend the command oq plot "sources?" to work with all kinds of sources.

We fixed oq prepare_site_model to compute z1pt0 and z2pt5 correctly for sites in Japan, i.e. by using the regionalized NGAWest2 basin effect equations.

We fixed oq reduce_sm for calculations with nonparametric sources, which was raising an error.

We fixed oq plot_avg_gmf to work remotely.

We extended oq info to work on .csv files, for pretty-printing purposes.

We improved the command oq plot_assets.

We added the commands oq plot "rupture?" and oq plot "rupture_3d?".

We added a command oq compare oqparam and we improved oq compare sitecol and oq compare assetcol.

We added a debugging command oq compare rates and an oq compare asce functionality.

We changed oq run to automatically generate the db in single user mode.

Other#

There were a few cosmetic improvements to the WebUI and we added a cookie consent functionality, for compliance with European law.

For usage in the GEESE project, we added a method OqParam.to_ini which can be used to generate an .ini file equivalent to the original input file. This is useful when the calculation parameters are dynamically generated. Currently it works only for hazard calculations.
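A minimal usage sketch, assuming an OqParam instance built from an existing job file via the get_oqparam helper in openquake.commonlib.readinput (any other way of obtaining an OqParam instance would do):

    from openquake.commonlib import readinput

    oqparam = readinput.get_oqparam('job.ini')
    # regenerate an equivalent .ini file from the parameters in memory
    with open('regenerated.ini', 'w') as f:
        f.write(oqparam.to_ini())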

As usual we improved the documentation, in particular about how to run calculations programmatically using the pair create_jobs/run_jobs (https://docs.openquake.org/oq-engine/3.21/manual/contributing/developing-with-the-engine.html#running-calculations-programmatically).