Release notes v3.25
===================

Version 3.25 is the culmination of 3 months of work involving over 320 pull requests. It is propedeutic to the 2026 release of the Global Hazard and Risk Models; as such, it focuses on improving performance and reducing memory usage, as well as supporting the latest science. Users valuing stability may want to stay with the LTS release instead (currently at version 3.23.4).

The complete set of changes is listed in the changelog:

https://github.com/gem/oq-engine/blob/engine-3.25/debian/changelog

A summary is given below.

Global Stochastic Event Set
---------------------------

This release features a major shift in the way the Global Risk Model is computed: instead of starting from Ground Motion Fields, we now start from sets of ruptures (regional SES); ground motion fields are computed on the fly and never stored, thus avoiding inefficiencies due to saving and reading large amounts of data. This strategy has been available in the engine for nearly 10 years by specifying `calculation_mode = ebrisk`, but now it is the default, so the `ebrisk` calculator has been removed.

Since SES are so crucial now, we improved the script generating the global SES file: we now store information about the sources, so that it is possible to determine the name of the source that generated a given rupture. We also store the calculation parameters in an `oqparam` dataset, so that it is possible to read the generated HDF5 file as a regular datastore, although without an associated calculation ID.

We note that the ruptures are no longer filtered when they are generated, as was done in past versions of the engine: you will see all the possible ruptures generated by the sources above the minimum magnitude, even at locations far away from the given sites. However, they will be filtered and discarded when performing the calculation.
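The distance-based discarding described above can be illustrated with a minimal sketch. This is not the engine's actual filtering code: `filter_ruptures`, the tuple-based rupture representation, and the use of hypocentral distance are simplifying assumptions for illustration only.

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two (lon, lat) points."""
    dlon, dlat = radians(lon2 - lon1), radians(lat2 - lat1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))

def filter_ruptures(ruptures, sites, maxdist_km):
    """Keep only the ruptures within maxdist_km of at least one site.

    Here `ruptures` is a list of (rup_id, lon, lat) hypocenters and
    `sites` a list of (lon, lat) pairs: simplified stand-ins for the
    engine's real data structures.
    """
    return [rup for rup in ruptures
            if any(haversine_km(rup[1], rup[2], lon, lat) <= maxdist_km
                   for lon, lat in sites)]

# A nearby rupture is kept, a far-away one is discarded at calculation time
ruptures = [(1, 13.4, 42.3), (2, 100.0, 10.0)]
sites = [(13.5, 42.4)]
print(filter_ruptures(ruptures, sites, maxdist_km=300))  # → [(1, 13.4, 42.3)]
```

The point of the design is that the SES file stays complete and reusable, while each calculation pays only for the ruptures near its own sites.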
This is convenient when running calculations on a region containing multiple countries, because all countries will see the same ruptures/events and comparisons become possible. We made it possible to run multiple calculations starting from the same SES with a single command, as in the following example:

```bash
$ oq engine --run job_Laos.ini job_Brunei.ini job_Malaysia.ini \
    job_Cambodia.ini job_Myanmar.ini job_Singapore.ini \
    job_Philippines.ini job_Thailand.ini \
    job_Timor_Leste.ini job_Indonesia.ini \
    job_Vietnam.ini --hc SES_Southeast_Asia.hdf5
```

The risk calculations will be run in parallel if the `--multi` flag is passed, sequentially otherwise.

Notice that the provisional syntax recognized in the `job.ini` file

```
rupture_model_file = SES.hdf5
```

has been removed in favor of `--hc`, since that avoids a special case and turns calculations starting from a SES into regular calculations. Not only has `--hc` been extended to accept an HDF5 file instead of a calculation ID, it now also accepts a `job.ini` file; in that case it simply runs it before running the child calculation.

Hazard and risk calculations starting from a SES have been hugely optimized. The crucial achievement was an extreme optimization of the reading of the ruptures. Even if you have tens of millions of ruptures, only the ruptures around the sites of interest will be used. In previous versions of the engine, the filtering was so inefficient that the strategy was not viable, whereas now it is nearly instantaneous.

We fixed a bug in the association between event IDs and rupture IDs in the `events` table that caused some confusion, even though it did not affect the correctness of the results.

We extended the event based calculator to compute the GMFs also on custom sites and not only on the grid used when building the SES. Notice that only the sites close to the ruptures are considered, and only those are associated with the global site parameters in the SES file.
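The parallel-versus-sequential dispatch controlled by `--multi` can be sketched as follows. This is a conceptual sketch only: `run_job` is a hypothetical placeholder for the real engine call, and the thread pool stands in for the engine's actual process-level parallelism.

```python
from concurrent.futures import ThreadPoolExecutor

def run_job(job_ini, hc):
    # placeholder for the real engine call; here we just report what would run
    return f"ran {job_ini} from {hc}"

def run_all(job_inis, hc, multi=False):
    """Run the child calculations in parallel if multi is set, else sequentially."""
    if multi:
        with ThreadPoolExecutor() as pool:
            return list(pool.map(lambda ini: run_job(ini, hc), job_inis))
    return [run_job(ini, hc) for ini in job_inis]

print(run_all(['job_Laos.ini', 'job_Vietnam.ini'],
              'SES_Southeast_Asia.hdf5', multi=True))
```

Sequential execution keeps memory usage predictable on small machines; `--multi` trades memory for wall-clock time when the jobs are independent, as country-level jobs sharing one SES are.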
We refined the estimated computational weight of the ruptures to greatly reduce the number of slow tasks, although more work on that is expected in the future. We reduced data transfer by reading the site collection from the datastore and transferring only the filtered site IDs.

We added a consistency check: if the effective investigation time is different from that in the parent calculation, an error is raised early. There is now a warning when trying to compute `avg_gmf` with too many sites, so that users understand why a calculation seems to hang. Also, for simplicity, we now compute `avg_gmf` only in the absence of a parent calculation.

## Hazard: preclassical

Before performing a classical PSHA calculation, the engine performs an analysis of the source models in the so-called *preclassical phase*. In this phase, the engine estimates the computational weight of the sources, which is crucial for the performance of the next step of the calculation. The estimate was very heuristic and failed in a few cases, resulting in slow tasks that degraded performance. Now the computational weight is directly proportional to the number of contexts generated by each source. This is simpler and much better than before (particularly for point sources), resulting in a huge reduction of the slow-task issue.

The speedup depends very much on the model and on how many cores you have: the more cores, the worse the slow-task issue can be. For instance, for the USA model we are now 2.5 times faster than in version 3.24 on a machine with 192 cores; in other models or with fewer cores, the improvement can be insignificant.

We changed the way the `CompositeSourceModel` is stored and read from the datastore: this is an implementation detail, but it has an effect on performance because it allows the classical calculator to read the sources directly from disk instead of transferring them in a less efficient way.
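Why good weights matter can be illustrated with a greedy load-balancing sketch: given per-source weights (proportional, in the new scheme, to the number of contexts each source generates), sources are assigned to the currently lightest task, so no task ends up much heavier than the others. The function and data below are hypothetical, not the engine's actual scheduler.

```python
import heapq

def make_tasks(source_weights, ntasks):
    """Greedily assign sources to ntasks, always filling the lightest task.

    `source_weights` maps source IDs to a weight proportional to the number
    of contexts each source generates.
    """
    heap = [(0, i, []) for i in range(ntasks)]  # (total weight, task id, sources)
    heapq.heapify(heap)
    for src, w in sorted(source_weights.items(), key=lambda kv: -kv[1]):
        tot, i, srcs = heapq.heappop(heap)
        srcs.append(src)
        heapq.heappush(heap, (tot + w, i, srcs))
    return [(tot, srcs) for tot, _, srcs in sorted(heap)]

weights = {'src_a': 10, 'src_b': 7, 'src_c': 5, 'src_d': 2}
for tot, srcs in make_tasks(weights, 2):
    print(tot, srcs)  # both tasks end up with total weight 12
```

With a bad weight estimate, one task silently receives most of the real work and becomes the slow task that keeps all other cores idle at the end of the calculation; the more cores, the larger the wasted fraction, which is why the gain grows with the core count.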
In some models, the new preclassical phase is much faster than before because we moved the gridding of the point sources from the master node to the workers; i.e., it is parallelized and up to N times faster if you have N cores. The actual speedup may vary since the new parallelization strategy used in the preclassical phase is based on the spawning of subtasks and cannot be easily predicted.

Setting `complex_fault_mesh_spacing` in the job.ini is now mandatory in calculations with complex fault sources. Before, it was optional: when missing, the engine used the default for `rupture_mesh_spacing` (5 km), which is too small, resulting in calculations up to 10–20 times slower than necessary.

## Hazard: classical

There was a major change in the task distribution strategy also in the classical phase of the computation. In most cases, the engine generates more tasks than before, since it uses an advanced subtask strategy: if a task is taking too long to complete, it is automatically split into subtasks. There is an exception for multifault sources: they are collected in tasks without subtasks, in order to reduce the memory consumption in the distance (dparam) cache. The task splitting is generally fully automatic, but the user can tune it by using the `split_time` parameter in the job.ini. It should never be necessary to change it, and currently it should be considered an internal parameter: it could change or disappear in the future.

In order to support the USA model, we started supporting region-dependent GSIM logic trees in version 3.24. The feature is called internally "ilabel", since you can enable it by adding an `ilabel` column in the site model file, with an integer that is referenced in the `site_labels` dictionary in the `job.ini` file, an example being:

```
site_labels = {"Cascadia": 1, "LosAngeles": 2}
```

The feature is documented in the section "Site-dependent logic trees" of the manual.
It was experimental in version 3.24 and had a restricted range of validity, whereas now it should work in all cases, including disaggregation calculations. However, region-dependent logic trees are ignored in event-based calculations: only the default logic tree is used, and there are no plans to extend the feature.

Supporting the USA model also required changing the way the engine manages cluster sources: now all cluster groups are managed together, and since there are around 400 groups, we require approximately 400 times less memory than before. More generally, we now avoid keeping large arrays (the so-called RateMaps) in memory when not needed. That reduced memory consumption even in the absence of cluster sources. Thanks to that, we were able to remove the auto-tiling functionality that was used to reduce memory usage. You can still use full tiling, but you have to specify `tiling=true` in the job.ini explicitly.

We changed the algorithm for saving rates in large computations: now the temporary files (there is one for each relevant group and IMT) are stored in the calculation directory `$OQDATA/calc_XXX` and not in `custom_tmp`. As a consequence, in SLURM clusters there is no need to configure a scratch directory anymore.

Classical calculations now also work on servers without a graphical display, since the environment variable `MPLBACKEND="Agg"` is automatically set when generating PNG images.

---

## Hazard: postclassical

Parallelization in the postclassical phase, where hazard curves/maps and their statistics are generated, has been significantly improved by setting the number of generated tasks equal to the number of available cores. Also, the "combine pmaps" operation has been optimized, and now the entire postclassical phase, in the most common case of computing the means, is dominated purely by the time spent reading rates from disk.
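Making the number of tasks equal to the number of cores amounts to partitioning the sites into nearly equal chunks, one per core. The sketch below shows one way to do that; it is an illustration with a hypothetical `split_sites` helper, not the engine's implementation.

```python
def split_sites(site_ids, ncores):
    """Partition the sites into ncores nearly equal chunks, one task per core."""
    k, r = divmod(len(site_ids), ncores)
    chunks, start = [], 0
    for i in range(ncores):
        size = k + (1 if i < r else 0)  # the first r chunks take one extra site
        chunks.append(site_ids[start:start + size])
        start += size
    return [c for c in chunks if c]

print([len(c) for c in split_sites(list(range(10)), 4)])  # → [3, 3, 2, 2]
```

One task per core keeps every core busy exactly once, with no scheduling overhead from excess tasks and no idle cores from too few.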
Computing the quantiles is significantly slower and more memory intensive since it requires computing all realizations for each site.

We now gzip the rates dataset in the datastore, thus reducing disk space usage by approximately a factor of 3 and reducing the time required to read the rates by a similar amount. A minor optimization is that we now avoid initializing the logic tree twice when using the `--hc` option to re-run the postclassical phase.

When starting a postprocessing calculation with `--hc`, the flag `use_rates` was not being honored; i.e., only the value set in the parent calculation was considered. This has been fixed.

We added a new exporter `hmaps-stats` producing one file per return period and statistic, thus avoiding nested fields in the CSV header. This is convenient for building the Global Hazard Map.

---

## Calculations with few sites

We introduced the possibility of specifying in the `job.ini` file a `siteid` parameter associated with the `sites` parameter, used to give short (up to 8-character) names to the coordinates. This is similar to specifying a `custom_site_id` column in the site model file; indeed, the `siteid` ends up inside the `custom_site_id` field of the site collection. `siteid` strings are restricted to the URL-safe base64 alphabet, so that they can be used in web applications. The convenience is that it is sufficient to list the coordinates, and the site parameters will be automatically associated from the site model file or from the parameters of the parent calculation, if any.

---

## hazardlib: general

There was some general infrastructure work on the GSIM classes at the level of the `MetaGSIM` class. Now each GMPE instance has an attribute `._toml` that is automatically set and used to compute the hash of the GMPE. This means that *all* GMPEs are hashable; therefore, they can be used as keys in dictionaries and can be cached.
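The hashing scheme can be sketched as follows. This is a drastic simplification for illustration: the class below is not the engine's `MetaGSIM` machinery, and the `_toml` string here is just the class name plus parameters rather than a real TOML representation.

```python
class GMPE:
    """Sketch: GMPEs hashable via an automatically set `_toml` attribute."""
    def __init__(self, **params):
        # build the TOML-like representation from the class and parameters
        self._toml = '[%s]\n%s' % (
            self.__class__.__name__,
            '\n'.join('%s = %r' % it for it in sorted(params.items())))

    def __eq__(self, other):
        return self._toml == other._toml

    def __hash__(self):
        # hash derived from the canonical representation, not the identity
        return hash(self._toml)

class AtkinsonBoore2006(GMPE):  # stand-in subclass for the example
    pass

# equal parameters give equal hashes, so GMPEs can be dictionary keys
gsim1 = AtkinsonBoore2006(scale_factor=1.2)
gsim2 = AtkinsonBoore2006(scale_factor=1.2)
cache = {gsim1: 'computed result'}
print(cache[gsim2])  # → computed result, found via the shared _toml hash
```

Deriving the hash from a canonical text representation (rather than object identity) is what lets two independently instantiated but equivalent GMPEs share cached results.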
Previously, we relied on the user remembering to set the `_toml` attribute or to call the `valid.gsim` factory function.

We changed the implementation of interpolated tables keyed by GSIM, magnitude, and IMT, used in subclasses of `GMPETable`. Before, the interpolated tables were computed in the `__init__` method; however, that caused problems because the interpolated magnitudes were hard to pass correctly to the underlying GMPEs (in the case of advanced GMPEs). Now the interpolation happens in the `compute` method (i.e., in the workers, not in the master node), but it is still performed only once because it is cached. Thanks to this change, the GMPE `GmpeIndirectAvgSA` now works when the underlying GMPE is a `GMPETable` subclass. We also fixed the edge case where the job.ini file does not contain the IMT `AvgSA`.

The change also fixed a number of bugs in Conditional Ground Motion Models, which are `ModifiableGMPE` instances depending on a dictionary of underlying GMPEs keyed by Intensity Measure Type. They should now work with all kinds of underlying GMPEs. As a consequence of the change, we now have a general mechanism for managing GSIM class warnings that guarantees they are displayed only once, even if the GSIM is instantiated multiple times.

We removed from hazardlib dozens of calls to `super().__init__()`, since there is now no need to call it in the vast majority of cases, resulting in less coupling and simpler code.

We added a method `CompositeSourceModel.set_msparams` to properly initialize multifault sources after reading the source models from hazardlib (this is expensive and needed only for classical analyses, not for event-based ones). We added a method `CompositeSourceModel.get_cmakers` returning a `ContextMakerSequence` object that can be used to implement custom versions of the classical PSHA calculator. We extended `CompositeSourceModel.get_sources` to accept an `smr` index so that advanced users can perform analysis one source model at a time.
Both new methods and the `smr` index are documented in the manual, in the section "Reading the hazard sources programmatically". Moreover, we extended the documentation about implementing advanced GSIMs, including those using Machine Learning models. Internally, all the logic about the `CompositeSourceModel` has been moved into a new module `openquake.hazardlib.source_group`.

We added a method `SiteCollection.lower_res` to reduce the resolution of a site collection by using Uber's `h3` library; this is used by the engine when prefiltering sources and ruptures.

We fixed a bug in the conversion to geometric mean, which was applied to all IMTs, including unsupported ones, causing `ZeroDivisionError` exceptions.

Sources with a `NegativeBinomialTOM` temporal occurrence model (used in the New Zealand model) are now treated differently, allowing for a simplification; however, the user will not see any significant change in the results or performance. As a consequence of the simplification, the class method `ContextMaker.from_srcs(sources, sitecol)` now returns a context array instead of a list of context arrays.

Finally, after several years of deprecation, we removed the method `get_mean_and_stddevs` from all GMPEs. If you still have code calling the old method, you should replace it with the function `contexts.mean_stds(rup_ctx, gsim, imt, idx)` following the examples in https://github.com/gem/oq-engine/pull/11194/changes.

---

## hazardlib: new GMPEs and fixes

The Abrahamson & Bhasin (2020) conditional GMPE [was implemented](https://github.com/gem/oq-engine/pull/10896) by [Lana Todorovic](https://github.com/LanaTodorovic93). The EMME24 site model [was added](https://github.com/gem/oq-engine/pull/11200) to the existing EMME backbone model for the Middle East by [Christopher Brooks](https://github.com/CB-quakemodel), using files shared by Abdullah Sandıkkaya, Özkan Kale and Baran Güryuva.
[Christopher Brooks](https://github.com/CB-quakemodel) [added the option](https://github.com/gem/oq-engine/pull/11116) to use a modified form of the Campbell and Bozorgnia (2014) GMM sigma model within the Kuehn et al. (2020) GMMs, as required for the 2023 USGS Alaska model. He also [added the USGS Alaska bias adjustment](https://github.com/gem/oq-engine/pull/11103) for NGA-SUB interface GMMs.

We [fixed a small bug](https://github.com/gem/oq-engine/pull/11197) in the Hashash et al. (2020) site term implementation within the NGA East models and regenerated the test tables (very small differences are observed, and only for SA(0.4)). We [fixed an error](https://github.com/gem/oq-engine/pull/10929) in the GMPEs computing lateral spread displacements, i.e., Youd et al. (2002) and Zhang and Zhao (2005).

Moreover, we received several contributions from our community. [Yen Shin Chen](https://github.com/vup1120) contributed several utilities for Probabilistic Fault Displacement Hazard Analysis (PFDHA). [Ji Kun](https://github.com/JIKUN1990) contributed a [GMPE for the Azores islands](https://github.com/gem/oq-engine/pull/11009). [Maoxin Wang](https://github.com/MaoxinWang) contributed [ground-motion models for Turkey](https://github.com/gem/oq-engine/pull/11062) for Arias Intensity, Cumulative Absolute Velocity, and Significant Durations. [Amirhossein Mohammadi](https://github.com/amirxdbx) contributed the GMPE [`Mohammadi2023Turkiye`](https://github.com/gem/oq-engine/pull/11006), based on a Machine Learning model. [Nicholas Clemett](https://github.com/nc-hsu) contributed his [correlation models](https://github.com/gem/oq-engine/pull/11168). [Antonio Scala](https://github.com/antonio-scala) contributed [three GMPEs for Campi Flegrei](https://github.com/gem/oq-engine/pull/11084) in Italy.

---

## Risk

For the sake of the Global Risk Model, and since it is useful in general, we added a new output "Average Losses By Taxonomy" (`avg_losses_by`) aggregating average losses.
If the exposure contains a `MACRO_TAXONOMY` field, it also aggregates by it, meaning that the exporter will produce two files: `avg_losses_by_-taxonomy.csv` and `avg_losses_by_-MACRO_TAXONOMY.csv`.

Scenario calculations have been changed to consider only the sites with assets around the rupture (within the maximum distance), except for conditioned scenarios. Previously, the full site collection was considered, thus requiring more memory and computation time.

We extended the `minimum_intensity` feature to also work for secondary perils. For instance, setting in the job.ini

```
minimum_intensity = {'LiqProb': .02, 'LSE': .001}
```

will discard liquefaction probabilities below 2% and liquefaction spatial extents below 0.1%. This makes secondary peril calculations faster and requires less disk space.

In infrastructure calculations, we turned the hard-coded parameter `max_nodes_network` into a job.ini parameter with a default value of 1000.

If the points in the site model are more distant than the association distance (ASSOC_DIST=8 km) from the sites, we now raise an error instead of a warning.

For `scenario_risk` calculations with quantiles and a single GSIM, the output "Aggregate Risk Statistics" was not visible. This is now fixed. We [fixed a bug](https://github.com/gem/oq-engine/pull/10982) in `classical_risk` in the presence of nontrivial weights in the taxonomy mapping file, [reported by Lisa Jusufi](https://groups.google.com/g/openquake-users/c/-bKfDupdtqE/m/AmmbHVbRBAAJ) on the OpenQuake mailing list.

---

## oq commands

The internal command `oq run` has been changed to support workflow files. These are TOML files describing multiple calculations that should be performed together. This is essential for the 2026 Global Hazard Model and Global Risk Models, since we want to be able to run all the hazard models in the mosaic or compute risk profiles for all the countries in the world using a single configuration file.
The format is still experimental and internal, but it is expected to become official in the near future.

`oq run` also accepts a `--cache` flag: when set to true, calculations that have already been performed are not repeated. To determine whether a calculation has already been performed, the engine looks at the checksum of the input files stored in the database. The feature is NOT enabled by default since it is potentially dangerous: for instance, a bugfix to a GMPE is a change in the code, not in the input files, so using `--cache=true` would retrieve old (incorrect) results. Therefore, the cache must be enabled manually, and only when the user knows that there have been no significant changes to the code. This is essential for workflow ergonomics in the presence of errors, since you can easily relaunch a workflow without having to repeat successful calculations.

We extended `oq shell` to accept dotted names, making it easy to call Python modules as scripts. For example, you can run

```bash
$ oq shell openquake.engine.global_ses --help
```

to see how to generate the global Stochastic Event Set.

It is now valid to pass a "prejob.ini" file to `--hazard-calculation-id`, rather than simply an integer ID. Thus, a command like

```bash
$ oq engine --run job.ini --hc prejob.ini
```

will perform two calculations: first the one corresponding to `prejob.ini`, and then the one corresponding to `job.ini`, starting from the previous one. The old syntax

```bash
$ oq engine --run prejob.ini job.ini
```

still works, but it is deprecated, and in the future only the explicit syntax with `--hc` may be accepted.

We extended the list of calculations generated by `--list-hazard-calculations` and `--list-risk-calculations` to show the `calculation_mode` as well. The command `oq purge` has been extended to remove failed calculations, old calculations, or orphan calculations (i.e., `calc_XXX.hdf5` files not referenced in the database).
Finally, `oq plot` has been optimized by avoiding a costly buffer around mosaic/country geometries. We also improved the plotting of event-based ruptures in calculations starting from an SES.hdf5 file.

---

## WebUI

There were fixes to the navigation bar, which was not properly displaying some elements in TOOLS_ONLY mode.

We added filtering functionality to the page displaying calculations. For instance, calling the URL https://hostname/v1/calc/list?user_name_like=%test% will return only the calculations of users containing "test" in their name (internally performing a LIKE query in the database).

When exporting multiple files as a single archive, the engine was littering the temporary directory with zip files. This is now fixed. Also, only the specified `custom_tmp` directory is used, as intended.

We refactored the WebUI JavaScript and CSS code, separating the logic into multiple files and adding integration tests written in the Playwright framework. We added an endpoint `v1/calc/validate_ini` to validate a local .ini file, to be used in the context of the PAPERS project. We added a `/extract/exposure_by_location` endpoint to extract exposure aggregated by asset locations.

Dozens of pull requests were made in the engine codebase to support the AELO and OQImpact platforms. However, since those are private platforms, the related improvements are not listed here.

---

## IT

In this release, we dropped support for Python 3.10 and added support for Python 3.13. That included upgrading NumPy to version 2 and all NumPy-dependent libraries (including geospatial libraries). We upgraded pyproj to 3.7.1 to address a CRS-related crash in 3.6.1. We removed several warnings caused by the library upgrades, although some still remain.

We now set `PYTHONUTF8` mode on Windows to avoid possible encoding errors. Database migrations are now automatically performed without requiring user confirmation.

We fixed a couple of issues in install.py.
When installing with a flag like `--version=3.23` (without the patch number), we now install the latest available patch from PyPI instead of the first patch release (3.23.0). When installing from a branch (i.e., with `--version=engine-3.23`), we now extract the requirements from that branch rather than from master. We also changed install.py to install the version-specific demos rather than the master demos.

A user reported that the LOCKDOWN option in Docker (when running `docker run -e LOCKDOWN=True openquake/engine`) was not honored. This has been fixed.

Some important `.txt` files in the GSIMs were not distributed in the packaged version of the engine. They are now included. The same applies to the `.onnx` and `.onnx.gz` files required by some models. Moreover, the Python package for the engine now includes the demos.
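The patch-selection behavior of install.py can be sketched as: among the available releases, pick the highest patch matching the requested minor version. The helper below is a hypothetical illustration, not the actual install.py code.

```python
def latest_patch(requested, available):
    """Pick the most recent patch release matching a version like '3.23'."""
    matching = [v for v in available
                if v == requested or v.startswith(requested + '.')]
    # compare numerically, so that a hypothetical 3.23.10 ranks above 3.23.9
    return max(matching, key=lambda v: [int(x) for x in v.split('.')])

print(latest_patch('3.23', ['3.22.0', '3.23.0', '3.23.1', '3.23.2', '3.24.0']))
# → 3.23.2
```

Comparing the dot-separated components as integers (rather than as strings) is what makes double-digit patch numbers sort correctly.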