Release notes v3.25#
Version 3.25 is the culmination of 3 months of work involving over 320 pull requests. It is propedeutic to the 2026 release of the Global Hazard and Risk Models; as such, it focuses on improving performance and reducing memory usage, as well as supporting the latest science. Users valuing stability may want to stay with the LTS release instead (currently at version 3.23.4).
The complete set of changes is listed in the changelog:
https://github.com/gem/oq-engine/blob/engine-3.25/debian/changelog
A summary is given below.
Global Stochastic Event Set#
This release features a major shift in the way the Global Risk Model
is computed: instead of starting from Ground Motion Fields, we
will start from sets of ruptures (regional SES): ground motion fields will be
computed on the fly and never stored, thus avoiding inefficiencies due
to saving and reading large amounts of data. This strategy has been
available in the engine for nearly 10 years by specifying
calculation_mode = ebrisk, but now it is the default, so the
ebrisk calculator has been removed.
Since SES are so crucial now, we improved the script generating the global SES file; it now stores information about the sources, so that it is possible to determine the name of the source that generated a given rupture.
We also store the calculation parameters in an oqparam dataset,
so that it is possible to read the generated HDF5 file as a regular
datastore, although without an associated calculation ID.
We note that the ruptures are not filtered when they are generated, as they were in past versions of the engine; therefore you will see all the possible ruptures generated by the sources above the minimum magnitude, even at locations far away from the given sites. However, they will be filtered and discarded when performing the calculation.
This is convenient when running calculations on a region containing multiple countries, because all countries will see the same ruptures/events and comparisons will become possible.
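Conceptually, the on-the-fly filtering keeps a rupture only if it lies within the integration distance of at least one site. A toy sketch with flat-earth distances (illustrative only; the engine uses proper geodetic distances and spatial indexing):

```python
import math

def filter_ruptures(ruptures, sites, max_dist_km):
    """Keep only ruptures within max_dist_km of at least one site.

    ruptures: list of (rup_id, lon, lat) hypocenters; sites: list of
    (lon, lat) pairs. Distances use a flat-earth approximation, good
    enough for an illustration.
    """
    kept = []
    for rup_id, rlon, rlat in ruptures:
        for slon, slat in sites:
            dx = (rlon - slon) * 111.32 * math.cos(math.radians((rlat + slat) / 2))
            dy = (rlat - slat) * 110.57
            if math.hypot(dx, dy) <= max_dist_km:
                kept.append(rup_id)
                break  # one close site is enough to keep the rupture
    return kept
```

Faraway ruptures are simply skipped, which is why the stored SES can contain tens of millions of ruptures without slowing down a regional calculation.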
We made it possible to run multiple calculations starting from the same SES with a single command, as in the following example:
$ oq engine --run job_Laos.ini job_Brunei.ini job_Malaysia.ini \
job_Cambodia.ini job_Myanmar.ini job_Singapore.ini \
job_Philippines.ini job_Thailand.ini \
job_Timor_Leste.ini job_Indonesia.ini \
job_Vietnam.ini --hc SES_Southeast_Asia.hdf5
The risk calculations will be run in parallel if the --multi flag
is passed, sequentially otherwise.
Notice that the provisional syntax recognized in the job.ini file
rupture_model_file = SES.hdf5
has been removed in favor of using --hc, since it avoids a special
case and makes calculations starting from a SES behave like regular calculations.
Not only has --hc been extended to accept an HDF5 file instead of a
calculation ID, now it also accepts a job.ini file and in that case it
simply runs it before running the child calculation.
Hazard and risk calculations starting from a SES have been hugely optimized. The crucial achievement was an extreme optimization of the reading of the ruptures. Even if you have tens of millions of ruptures, only the ruptures around the sites of interest will be used. In previous versions of the engine, the filtering was so inefficient that the strategy was not viable, whereas now it is nearly instantaneous.
We fixed a bug in the association between event IDs and rupture IDs
in the events table that caused some confusion, even though it did not
affect the correctness of the results.
We extended the event based calculator to compute the GMFs also on custom sites and not only on the grid used when building the SES. Notice that only the sites close to the ruptures are considered, and only those are associated with the global site parameters in the SES file.
We refined the estimated computational weight for the ruptures to greatly reduce the number of slow tasks, although more work on that is expected in the future.
We reduced data transfer by reading the site collection from the datastore and transferring only the filtered site IDs.
We added a consistency check: if the effective investigation time is different from that in the parent calculation, an error is raised early.
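The check amounts to a simple guard comparing the child's effective investigation time with the one stored in the parent SES; a minimal sketch (the helper name is illustrative, not engine API):

```python
def check_investigation_time(child_eff_time, parent_eff_time):
    """Raise early if the child calculation assumes an effective
    investigation time different from the parent calculation's."""
    if child_eff_time != parent_eff_time:
        raise ValueError(
            'effective_investigation_time mismatch: child has %s, '
            'parent has %s' % (child_eff_time, parent_eff_time))
```

Failing fast here is much cheaper than discovering the mismatch after hours of computation.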
There is now a warning when trying to compute avg_gmf with too many
sites, so that users can understand why a calculation seems to hang.
Also, for simplicity, we now compute avg_gmf only in the absence of a
parent calculation.
Hazard: preclassical#
Before performing a classical PSHA calculation, the engine performs an analysis of the source models in the so-called preclassical phase. In this phase, the engine estimates the computational weight of the sources, which is crucial for the performance of the next step of the calculation. The estimate was very heuristic and failed in a few cases, resulting in slow tasks that degraded performance.
Now the computational weight is directly proportional to the number of contexts generated by each source. This is simpler and much better than before (particularly for point sources), resulting in a huge reduction in the slow-task issue.
The speedup depends very much on the model and how many cores you have: the more cores, the worse the slow-task issue can be. For instance, for the USA model we are now 2.5 times faster than in version 3.24 on a machine with 192 cores; in other models or with fewer cores, the improvement can be insignificant.
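The new weighting can be pictured as packing sources into tasks by context count; a toy sketch of the idea (the engine's actual block-splitting logic is more sophisticated):

```python
def make_tasks(sources, max_weight):
    """Group (source_id, num_contexts) pairs into tasks whose total
    weight stays below max_weight, to avoid slow tasks.

    Sources are processed heaviest-first so that a single big source
    does not end up appended to an already full task.
    """
    tasks, current, weight = [], [], 0
    for src_id, num_ctxs in sorted(sources, key=lambda s: -s[1]):
        if weight + num_ctxs > max_weight and current:
            tasks.append(current)
            current, weight = [], 0
        current.append(src_id)
        weight += num_ctxs
    if current:
        tasks.append(current)
    return tasks
```

Since the number of contexts is known after the preclassical phase, the weights are no longer heuristic guesses.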
We changed the way the CompositeSourceModel is stored and read from
the datastore: this is an implementation detail, but it has an effect on
performance because it allows the classical calculator to read the
sources directly from disk instead of transferring them in a less
efficient way.
In some models, the new preclassical phase is much faster than before because we moved the gridding of the point sources from the master node to the workers; i.e., it is parallelized and up to N times faster if you have N cores. The actual speedup may vary since the new parallelization strategy used in the preclassical phase is based on the spawning of subtasks and cannot be easily predicted.
Setting complex_fault_mesh_spacing in the job.ini is now mandatory
in calculations with complex fault sources. Before, it was optional:
when missing, the engine used the default for rupture_mesh_spacing
(5 km), which is too small, resulting in calculations up to 10–20 times
slower than necessary.
Hazard: classical#
There was a major change in the task distribution strategy also in the classical phase of the computation. In most cases, the engine generates more tasks than before, since it uses an advanced subtask strategy: if a task is taking too long to complete, it is automatically split into subtasks. There is an exception for multifault sources: they are collected in tasks without subtasks, in order to reduce the memory consumption in the distance (dparam) cache.
The task splitting is generally fully automatic, but the user can tune
it by using the split_time parameter in the job.ini. It should never
be necessary to change it, and currently it should be considered an
internal parameter: it could change or disappear in the future.
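The mechanism can be pictured as follows: a task periodically checks the elapsed time and, past the split_time threshold, hands off the remaining work as a subtask. A toy sketch with an injectable clock (illustrative, not the engine's implementation):

```python
import time

def run_with_splitting(items, process, split_time, spawn, clock=time.monotonic):
    """Process items one at a time; if the elapsed time exceeds
    split_time and work remains, pass the rest to spawn() as a
    subtask and stop. Returns the items processed locally."""
    t0 = clock()
    done = []
    for i, item in enumerate(items):
        process(item)
        done.append(item)
        if clock() - t0 > split_time and i + 1 < len(items):
            spawn(items[i + 1:])  # remaining work becomes a subtask
            break
    return done
```

The injectable clock makes the behavior easy to test deterministically.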
In order to support the USA model, we started supporting region-dependent GSIM
logic trees in version 3.24. The feature is called internally “ilabel”, since
you can enable it by adding an ilabel column in the site model file, with
an integer that is referenced in the site_labels dictionary in the
job.ini file, an example being:
site_labels = {"Cascadia": 1, "LosAngeles": 2}
The feature is documented in the section “Site-dependent logic trees” of the manual. It was experimental in version 3.24 and had a restricted range of validity, whereas now it should work in all cases, including disaggregation calculations. However, region-dependent logic trees are ignored in event-based calculations: only the default logic tree is used, and there are no plans to extend the feature.
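Assuming every value in the ilabel column must be declared in the site_labels dictionary, a consistency check could look like this (illustrative helper, not engine API; any default-label handling is ignored here):

```python
def check_site_labels(site_labels, ilabels):
    """site_labels: dict region name -> integer label, as in the
    job.ini; ilabels: the ilabel column of the site model file.
    Raise if the column contains undeclared labels."""
    known = set(site_labels.values())
    unknown = sorted(set(ilabels) - known)
    if unknown:
        raise ValueError('Unknown ilabels %s, not in site_labels %s'
                         % (unknown, site_labels))
```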
Supporting the USA model also required changing the way the engine manages cluster sources: now all cluster groups are managed together, and since there are around 400 groups, we require approximately 400 times less memory than before.
More generally, we now avoid keeping large arrays (the so-called RateMaps) in
memory when not needed. That reduced memory consumption even in the absence
of cluster sources. Thanks to that, we were able to remove the auto-tiling
functionality that was used to reduce memory usage. You can still use full
tiling, but you have to specify tiling=true in the job.ini explicitly.
We changed the algorithm for saving rates in large computations: now the
temporary files (there is one for each relevant group and IMT) are stored in
the calculation directory $OQDATA/calc_XXX and not in custom_tmp. As a
consequence, in SLURM clusters there is no need to configure a scratch
directory anymore.
Classical calculations now also work on servers without a graphical display,
since the environment variable MPLBACKEND="Agg" is automatically set when
generating PNG images.
Hazard: postclassical#
Parallelization in the postclassical phase, where hazard curves/maps and their statistics are generated, has been significantly improved by setting the number of generated tasks equal to the number of available cores. Also, the “combine pmaps” operation has been optimized, and now the entire postclassical phase, in the most common case of computing the means, is dominated purely by the time spent reading rates from disk. Computing the quantiles is significantly slower and more memory intensive since it requires computing all realizations for each site.
We now gzip the rates dataset in the datastore, thus reducing disk space usage by approximately a factor of 3 and reducing the time required to read the rates by a similar amount.
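Rate arrays contain long runs of zeros and repeated values, which is why gzip works so well on them. A quick stdlib illustration of the kind of ratio involved (toy data; the factor of 3 quoted above refers to real models):

```python
import gzip
import struct

# a toy 'rates' array: mostly zeros plus a few repeated nonzero values,
# stored as little-endian doubles, mimicking a sparse rates dataset
rates = [0.0] * 9000 + [1e-4] * 1000
raw = struct.pack('<%dd' % len(rates), *rates)
compressed = gzip.compress(raw)
ratio = len(raw) / len(compressed)
```

Because reading from disk dominates the postclassical phase, shrinking the dataset also shrinks the read time.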
A minor optimization is that we now avoid initializing the logic tree twice
when using the --hc option to re-run the postclassical phase.
When starting a postprocessing calculation with --hc, the flag use_rates
was not being honored; i.e., only the value set in the parent calculation was
considered. This has been fixed.
We added a new exporter hmaps-stats producing one file per return period and
statistic, thus avoiding nested fields in the CSV header. This is convenient
for building the Global Hazard Map.
Calculations with few sites#
We introduced the possibility of specifying in the job.ini file a siteid
parameter associated with the sites parameter, used to give short (up to
8-character) names to the coordinates. This is similar to specifying a
custom_site_id column in the site model file; indeed, the siteid ends up
inside the custom_site_id field of the site collection.
siteid strings are restricted to the URL-safe base64 alphabet, so that they
can be used in web applications. The convenience is that it is sufficient to
list the coordinates: the site parameters will be automatically associated
from the site model file or from the parameters of the parent calculation, if
any.
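The restriction on siteid strings can be expressed as a small validator: the allowed alphabet is A–Z, a–z, 0–9, '-' and '_' (URL-safe base64), with at most 8 characters. A sketch, not engine code:

```python
import re

# URL-safe base64 alphabet, 1 to 8 characters
SITEID_RE = re.compile(r'^[A-Za-z0-9_-]{1,8}$')

def valid_siteid(name):
    """True if `name` can be used as a siteid, i.e. it is short and
    safe to embed in URLs."""
    return bool(SITEID_RE.match(name))
```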
hazardlib: general#
There was some general infrastructure work on the GSIM classes at the level of
the MetaGSIM class. Now each GMPE instance has an attribute ._toml that
is automatically set and used to compute the hash of the GMPE. This means
that all GMPEs are hashable; therefore, they can be used as keys in
dictionaries and can be cached. Previously, we relied on the user remembering
to set the _toml attribute or to call the valid.gsim factory function.
We changed the implementation of interpolated tables keyed by GSIM, magnitude,
and IMT, used in subclasses of GMPETable. Before, the interpolated tables
were computed in the __init__ method; however, that caused problems because
the interpolated magnitudes were hard to pass correctly to the underlying
GMPEs (in the case of advanced GMPEs). Now the interpolation happens in the
compute method (i.e., in the workers, not in the master node), but it is
still performed only once because it is cached.
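Since GMPE instances are now hashable, this compute-once-per-worker behavior is exactly what functools caching provides. A sketch of the pattern (names hypothetical, not the engine's implementation):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def interpolated_table(gsim_name, mag, imt):
    """Stand-in for an expensive table interpolation, cached per
    (gsim, magnitude, IMT) so that repeated compute() calls in the
    same worker reuse the result."""
    interpolated_table.calls += 1  # count real computations only
    return (gsim_name, round(mag, 2), imt)

interpolated_table.calls = 0
```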
Thanks to this change, the GMPE GmpeIndirectAvgSA now works when the
underlying GMPE is a GMPETable subclass. We also fixed the edge case where
the job.ini file does not contain the IMT AvgSA.
The change also fixed a number of bugs in Conditional Ground Motion Models,
which are ModifiableGMPE instances depending on a dictionary of underlying
GMPEs keyed by Intensity Measure Type. They should now work with all kinds of
underlying GMPEs.
As a consequence of the change, we now have a general mechanism for managing GSIM class warnings that guarantees they are displayed only once, even if the GSIM is instantiated multiple times.
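Such a warn-once mechanism can be as simple as a class-level set of already-emitted messages; a sketch (not the engine's actual code):

```python
import warnings

class GsimWarningMixin:
    _warned = set()  # shared across all instances of the class

    def warn_once(self, message):
        """Emit `message` only the first time it is seen, even if the
        GSIM class is instantiated many times."""
        if message not in self._warned:
            self._warned.add(message)
            warnings.warn(message)
```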
We removed from hazardlib dozens of calls to super().__init__() since now
there is no need to call it in the vast majority of cases, resulting in less
coupling and simpler code.
We added a method CompositeSourceModel.set_msparams to properly initialize
multifault sources after reading the source models from hazardlib (this is
expensive and needed only for classical analyses, not for event-based ones).
We added a method CompositeSourceModel.get_cmakers returning a
ContextMakerSequence object that can be used to implement custom versions
of the classical PSHA calculator.
We extended CompositeSourceModel.get_sources to accept an smr index so
that advanced users can perform analysis one source model at a time.
Both new methods and the smr index are documented in the manual, in the
section “Reading the hazard sources programmatically”. Moreover, we extended
the documentation about implementing advanced GSIMs, including those using
Machine Learning models.
Internally, all the logic about the CompositeSourceModel has been moved into
a new module openquake.hazardlib.source_group.
We added a method SiteCollection.lower_res to reduce the resolution of a
site collection by using Uber’s h3 library; this is used by the engine when
prefiltering sources and ruptures.
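A simplified stand-in for the idea, snapping coordinates to a coarse square grid instead of h3 hexagonal cells (illustrative only; the point is just to obtain fewer distinct locations for prefiltering):

```python
def lower_res(lonlats, cell_deg=0.5):
    """Reduce a list of (lon, lat) pairs to the distinct centers of a
    coarse grid; the engine uses h3 hexagons, here we use squares."""
    centers = {(round(lon / cell_deg) * cell_deg,
                round(lat / cell_deg) * cell_deg)
               for lon, lat in lonlats}
    return sorted(centers)
```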
We fixed a bug in the conversion to geometric mean, which was applied to all IMTs, including unsupported ones, causing ZeroDivisionError exceptions.
Sources with a NegativeBinomialTOM temporal occurrence model (used in the
New Zealand model) are now treated differently, allowing for a
simplification; however, the user will not see any significant change in the
results or performance.
As a consequence of the simplification, the class method
ContextMaker.from_srcs(sources, sitecol) now returns a context array
instead of a list of context arrays.
Finally, after several years of deprecation, we removed the method
get_mean_and_stddevs from all GMPEs. If you still have code calling the old
method, you should replace it with the function contexts.mean_stds(rup_ctx, gsim, imt, idx), following the examples in
https://github.com/gem/oq-engine/pull/11194/changes.
hazardlib: new GMPEs and fixes#
The Abrahamson & Bhasin (2020) conditional GMPE was implemented by Lana Todorovic.
The EMME24 site model was added to the existing EMME backbone model for the Middle East by Christopher Brooks, using files shared by Abdullah Sandıkkaya, Özkan Kale and Baran Güryuva.
Christopher Brooks added the option to use a modified form of the Campbell and Bozorgnia (2014) GMM sigma model within the Kuehn et al. (2020) GMMs, as required for the 2023 USGS Alaska model. He also added the USGS Alaska bias adjustment for NGA-SUB interface GMMs.
We fixed a small bug in the Hashash et al. (2020) site term implementation within the NGA East models and regenerated the test tables (very small differences are observed and only for SA(0.4)).
We fixed an error in the GMPEs computing lateral spread displacements, i.e., Youd et al. (2002) and Zhang and Zhao (2005).
Moreover, we received several contributions from our community.
Yen Shin Chen contributed several utilities for Probabilistic Fault Displacement Hazard Analysis (PFDHA).
Ji Kun contributed a GMPE for the Azores islands.
Maoxin Wang contributed ground-motion models for Turkey, for Arias Intensity, Cumulative Absolute Velocity, and Significant Durations.
Amirhossein Mohammadi contributed the GMPE
Mohammadi2023Turkiye,
based on a Machine Learning model.
Nicholas Clemett contributed his correlation models.
Antonio Scala contributed three GMPEs for Campi Flegrei in Italy.
Risk#
For the sake of the Global Risk Model, and since it is useful in general, we
added a new output “Average Losses By Taxonomy” (avg_losses_by) aggregating
average losses. If the exposure contains a MACRO_TAXONOMY field, it also
aggregates by it, meaning that the exporter will produce two files:
avg_losses_by-taxonomy.csv and avg_losses_by-MACRO_TAXONOMY.csv.
Scenario calculations have been changed to consider only the sites with assets around the rupture (within the maximum distance), except for conditioned scenarios. Previously, the full site collection was considered, thus requiring more memory and computation time.
We extended the minimum_intensity feature to also work for secondary perils.
For instance, setting in the job.ini
minimum_intensity = {'LiqProb': .02, 'LSE': .001}
will discard liquefaction probabilities below 2% and liquefaction spatial extent below 0.1%. This makes secondary peril calculations faster and requires less disk space.
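The filtering amounts to dropping values below the per-peril threshold; an illustrative sketch with a hypothetical data layout:

```python
def apply_minimum_intensity(gmf_rows, min_intensity):
    """Discard rows whose value for a secondary-peril output falls
    below the configured threshold. gmf_rows is a list of
    (site_id, peril, value) triples and min_intensity a dict like
    {'LiqProb': .02, 'LSE': .001}; perils without a threshold are
    always kept."""
    return [(sid, peril, val) for sid, peril, val in gmf_rows
            if val >= min_intensity.get(peril, 0)]
```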
In infrastructure calculations, we turned the hard-coded parameter
max_nodes_network into a job.ini parameter with a default value of 1000.
If the site parameters are more than ASSOC_DIST=8 km away from the sites, we now raise an error instead of a warning.
For scenario_risk calculations with quantiles and a single GSIM, the
output “Aggregate Risk Statistics” was not visible. This is now fixed.
We fixed a bug in classical_risk
in the presence of nontrivial weights in the taxonomy mapping file,
reported by Lisa Jusufi
on the OpenQuake mailing list.
oq commands#
The internal command oq run has been changed to support workflow files.
These are TOML files describing multiple calculations that should be
performed together. This is essential for the 2026 Global Hazard Model and
Global Risk Models since we want to be able to run all the hazard models in
the mosaic or compute risk profiles for all the countries in the world using
a single configuration file. The format is still experimental and internal,
but it is expected to become official in the near future.
oq run also accepts a --cache flag: when set to true, calculations that
have already been performed are not repeated. To determine whether a
calculation has already been performed, the engine looks at the checksum of
the input files stored in the database. The feature is NOT enabled by default
since it is potentially dangerous: for instance, a bugfix to a GMPE is a
change in the code, not in the input files, so using --cache=true would
retrieve old (incorrect) results. Therefore, the cache must be enabled
manually only when the user knows that there have been no significant changes
to the code. This is essential for workflow ergonomics in the presence of
errors, since you can easily relaunch a workflow without having to repeat
successful calculations.
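The cache decision boils down to hashing the input files and comparing the result with the checksum stored in the database; a stdlib sketch (function name hypothetical):

```python
import hashlib

def files_checksum(paths):
    """Combined SHA-256 over the contents of the input files; if it
    matches the checksum stored for a previous calculation, the
    cached results can be reused. Paths are sorted so the result does
    not depend on the order in which files are listed."""
    h = hashlib.sha256()
    for path in sorted(paths):
        with open(path, 'rb') as f:
            h.update(f.read())
    return h.hexdigest()
```

As noted above, the checksum only sees the inputs, not the code, which is exactly why the cache is opt-in.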
We extended oq shell to accept dotted names, making it easy to call Python
modules as scripts. For example:
$ oq shell openquake.engine.global_ses --help
to generate the global Stochastic Event Set.
It is now valid to pass a “prejob.ini” file to --hazard-calculation-id,
rather than simply an integer ID. Thus, a command like
$ oq engine --run job.ini --hc prejob.ini
will perform two calculations: first the one corresponding to prejob.ini,
and then the one corresponding to job.ini, starting from the previous one.
The old syntax
$ oq engine --run prejob.ini job.ini
still works, but it is deprecated, and in the future only the explicit syntax
with --hc may be accepted.
We extended the list of calculations generated by --list-hazard-calculations
and --list-risk-calculations to show the calculation_mode as well.
The command oq purge has been extended to remove failed calculations, old
calculations, or orphan calculations (i.e., calc_XXX.hdf5 files not
referenced in the database).
Finally, oq plot has been optimized by avoiding a costly buffer around
mosaic/country geometries. We also improved the plotting of event-based
ruptures in calculations starting from an SES.hdf5 file.
WebUI#
There were fixes to the navigation bar, which was not properly displaying some elements in TOOLS_ONLY mode.
We added filtering functionality to the page displaying calculations. For instance, calling the URL https://hostname/v1/calc/list?user_name_like=%test% will return only calculations of users containing “test” in their name (internally performing a LIKE query in the database).
When exporting multiple files as a single archive, the engine was littering
the temporary directory with zip files. This is now fixed. Also, only the
specified custom_tmp directory is used, as intended.
We refactored the WebUI JavaScript and CSS code, separating the logic into multiple files and adding integration tests written in the Playwright framework.
We added an endpoint v1/calc/validate_ini to validate a local .ini file, to
be used in the context of the PAPERS project.
We added a <calc_id>/extract/exposure_by_location endpoint to extract
exposure aggregated by asset locations.
Dozens of pull requests were made in the engine codebase to support the AELO and OQImpact platforms. However, since those are private platforms, the related improvements are not listed here.
IT#
In this release, we dropped support for Python 3.10 and added support for Python 3.13. That included upgrading NumPy to version 2 and all NumPy-dependent libraries (including geospatial libraries). We upgraded pyproj to 3.7.1 to address a CRS-related crash in 3.6.1.
We removed several warnings caused by the library upgrade, although some still remain.
We now set PYTHONUTF8 mode on Windows to avoid possible encoding errors.
Database migrations are now automatically performed without requiring user confirmation.
We fixed a couple of issues in install.py. When installing with a flag like
--version=3.23 (without the patch number), we now install the latest
available patch from PyPI instead of the first patch release (3.23.0). When
installing from a branch (i.e., with --version=engine-3.23), we now extract
the requirements from that branch rather than from master.
We changed install.py to install the version-specific demos rather than the master demos.
A user reported that the LOCKDOWN option in Docker (when running docker run -e LOCKDOWN=True openquake/engine) was not honored. This has been
fixed.
Some important .txt files in the GSIMs were not distributed in the packaged
version of the engine. They are now included. The same applies to the .onnx
and .onnx.gz files required by some models. Moreover, the Python package
for the engine now includes the demos.