Release notes v3.25
===================

Version 3.25 is the culmination of 3 months of work involving over 320 pull requests. It is propedeutic to the 2026 release of the Global Hazard and Risk Models; as such, it focuses on improving performance and reducing memory usage, as well as supporting the latest science. Users valuing stability may want to stay with the LTS release instead (currently at version 3.23.4).

The complete set of changes is listed in the changelog:

https://github.com/gem/oq-engine/blob/engine-3.25/debian/changelog

A summary is given below.

Global Stochastic Event Set
---------------------------

This release features a major shift in the way the Global Risk Model is computed: instead of starting from Ground Motion Fields, we now start from sets of ruptures (regional SES); ground motion fields are computed on the fly and never stored, thus avoiding inefficiencies due to saving and reading large amounts of data. This strategy has been available in the engine for nearly 10 years by specifying `calculation_mode = ebrisk`, but now it is the default, so the `ebrisk` calculator has been removed.

Since SES are so crucial now, we improved the script generating the global SES file: we now store information about the sources, so that it is possible to determine the name of the source that generated a given rupture. We also store the calculation parameters in an `oqparam` dataset, so that it is possible to read the generated HDF5 file as a regular datastore, although without an associated calculation ID.

We note that the ruptures are no longer filtered when they are generated, as was done in past versions of the engine: you will see all the possible ruptures generated by the sources above the minimum magnitude, even at locations far away from the given sites. However, they will be filtered and discarded when performing the calculation.
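The distance-based discarding described above can be illustrated with a minimal sketch. This is not the engine's actual filtering code: `filter_ruptures`, the tuple-based rupture representation, and the use of hypocentral distance are simplifying assumptions for illustration only.

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two (lon, lat) points."""
    dlon, dlat = radians(lon2 - lon1), radians(lat2 - lat1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))

def filter_ruptures(ruptures, sites, maxdist_km):
    """Keep only the ruptures within maxdist_km of at least one site.

    Here `ruptures` is a list of (rup_id, lon, lat) hypocenters and
    `sites` a list of (lon, lat) pairs: simplified stand-ins for the
    engine's real data structures.
    """
    return [rup for rup in ruptures
            if any(haversine_km(rup[1], rup[2], lon, lat) <= maxdist_km
                   for lon, lat in sites)]

# A nearby rupture is kept, a far-away one is discarded at calculation time
ruptures = [(1, 13.4, 42.3), (2, 100.0, 10.0)]
sites = [(13.5, 42.4)]
print(filter_ruptures(ruptures, sites, maxdist_km=300))  # → [(1, 13.4, 42.3)]
```

The point of the design is that the SES file stays complete and reusable, while each calculation pays only for the ruptures near its own sites.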
This is convenient when running calculations on a region containing multiple countries, because all countries will see the same ruptures/events and comparisons become possible. We made it possible to run multiple calculations starting from the same SES with a single command, as in the following example:

```bash
$ oq engine --run job_Laos.ini job_Brunei.ini job_Malaysia.ini \
    job_Cambodia.ini job_Myanmar.ini job_Singapore.ini \
    job_Philippines.ini job_Thailand.ini \
    job_Timor_Leste.ini job_Indonesia.ini \
    job_Vietnam.ini --hc SES_Southeast_Asia.hdf5
```

The risk calculations will be run in parallel if the `--multi` flag is passed, sequentially otherwise.

Notice that the provisional syntax recognized in the `job.ini` file

```
rupture_model_file = SES.hdf5
```

has been removed in favor of `--hc`, since that avoids a special case and turns calculations starting from a SES into regular calculations. Not only has `--hc` been extended to accept an HDF5 file instead of a calculation ID, it now also accepts a `job.ini` file; in that case it simply runs it before running the child calculation.

Hazard and risk calculations starting from a SES have been hugely optimized. The crucial achievement was an extreme optimization of the reading of the ruptures. Even if you have tens of millions of ruptures, only the ruptures around the sites of interest will be used. In previous versions of the engine, the filtering was so inefficient that the strategy was not viable, whereas now it is nearly instantaneous.

We fixed a bug in the association between event IDs and rupture IDs in the `events` table that caused some confusion, even though it did not affect the correctness of the results.

We extended the event based calculator to compute the GMFs also on custom sites and not only on the grid used when building the SES. Notice that only the sites close to the ruptures are considered, and only those are associated with the global site parameters in the SES file.
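The parallel-versus-sequential dispatch controlled by `--multi` can be sketched as follows. This is a conceptual sketch only: `run_job` is a hypothetical placeholder for the real engine call, and the thread pool stands in for the engine's actual process-level parallelism.

```python
from concurrent.futures import ThreadPoolExecutor

def run_job(job_ini, hc):
    # placeholder for the real engine call; here we just report what would run
    return f"ran {job_ini} from {hc}"

def run_all(job_inis, hc, multi=False):
    """Run the child calculations in parallel if multi is set, else sequentially."""
    if multi:
        with ThreadPoolExecutor() as pool:
            return list(pool.map(lambda ini: run_job(ini, hc), job_inis))
    return [run_job(ini, hc) for ini in job_inis]

print(run_all(['job_Laos.ini', 'job_Vietnam.ini'],
              'SES_Southeast_Asia.hdf5', multi=True))
```

Sequential execution keeps memory usage predictable on small machines; `--multi` trades memory for wall-clock time when the jobs are independent, as country-level jobs sharing one SES are.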
We refined the estimated computational weight of the ruptures to greatly reduce the number of slow tasks, although more work on that is expected in the future. We reduced data transfer by reading the site collection from the datastore and transferring only the filtered site IDs.

We added a consistency check: if the effective investigation time is different from that in the parent calculation, an error is raised early. There is now a warning when trying to compute `avg_gmf` with too many sites, so that users understand why a calculation seems to hang. Also, for simplicity, we now compute `avg_gmf` only in the absence of a parent calculation.

## Hazard: preclassical

Before performing a classical PSHA calculation, the engine performs an analysis of the source models in the so-called *preclassical phase*. In this phase, the engine estimates the computational weight of the sources, which is crucial for the performance of the next step of the calculation. The estimate was very heuristic and failed in a few cases, resulting in slow tasks that degraded performance. Now the computational weight is directly proportional to the number of contexts generated by each source. This is simpler and much better than before (particularly for point sources), resulting in a huge reduction of the slow-task issue.

The speedup depends very much on the model and on how many cores you have: the more cores, the worse the slow-task issue can be. For instance, for the USA model we are now 2.5 times faster than in version 3.24 on a machine with 192 cores; in other models or with fewer cores, the improvement can be insignificant.

We changed the way the `CompositeSourceModel` is stored and read from the datastore: this is an implementation detail, but it has an effect on performance because it allows the classical calculator to read the sources directly from disk instead of transferring them in a less efficient way.
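Why good weights matter can be illustrated with a greedy load-balancing sketch: given per-source weights (proportional, in the new scheme, to the number of contexts each source generates), sources are assigned to the currently lightest task, so no task ends up much heavier than the others. The function and data below are hypothetical, not the engine's actual scheduler.

```python
import heapq

def make_tasks(source_weights, ntasks):
    """Greedily assign sources to ntasks, always filling the lightest task.

    `source_weights` maps source IDs to a weight proportional to the number
    of contexts each source generates.
    """
    heap = [(0, i, []) for i in range(ntasks)]  # (total weight, task id, sources)
    heapq.heapify(heap)
    for src, w in sorted(source_weights.items(), key=lambda kv: -kv[1]):
        tot, i, srcs = heapq.heappop(heap)
        srcs.append(src)
        heapq.heappush(heap, (tot + w, i, srcs))
    return [(tot, srcs) for tot, _, srcs in sorted(heap)]

weights = {'src_a': 10, 'src_b': 7, 'src_c': 5, 'src_d': 2}
for tot, srcs in make_tasks(weights, 2):
    print(tot, srcs)  # both tasks end up with total weight 12
```

With a bad weight estimate, one task silently receives most of the real work and becomes the slow task that keeps all other cores idle at the end of the calculation; the more cores, the larger the wasted fraction, which is why the gain grows with the core count.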
In some models, the new preclassical phase is much faster than before because we moved the gridding of the point sources from the master node to the workers; i.e., it is parallelized and up to N times faster if you have N cores. The actual speedup may vary since the new parallelization strategy used in the preclassical phase is based on the spawning of subtasks and cannot be easily predicted.

Setting `complex_fault_mesh_spacing` in the job.ini is now mandatory in calculations with complex fault sources. Before, it was optional: when missing, the engine used the default for `rupture_mesh_spacing` (5 km), which is too small, resulting in calculations up to 10–20 times slower than necessary.

## Hazard: classical

There was a major change in the task distribution strategy also in the classical phase of the computation. In most cases, the engine generates more tasks than before, since it uses an advanced subtask strategy: if a task is taking too long to complete, it is automatically split into subtasks. There is an exception for multifault sources: they are collected in tasks without subtasks, in order to reduce the memory consumption in the distance (dparam) cache. The task splitting is generally fully automatic, but the user can tune it by using the `split_time` parameter in the job.ini. It should never be necessary to change it, and currently it should be considered an internal parameter: it could change or disappear in the future.

In order to support the USA model, we started supporting region-dependent GSIM logic trees in version 3.24. The feature is called internally "ilabel", since you can enable it by adding an `ilabel` column in the site model file, with an integer that is referenced in the `site_labels` dictionary in the `job.ini` file, an example being:

```
site_labels = {"Cascadia": 1, "LosAngeles": 2}
```

The feature is documented in the section "Site-dependent logic trees" of the manual.
It was experimental in version 3.24 and had a restricted range of validity, whereas now it should work in all cases, including disaggregation calculations. However, region-dependent logic trees are ignored in event-based calculations: only the default logic tree is used, and there are no plans to extend the feature.

Supporting the USA model also required changing the way the engine manages cluster sources: now all cluster groups are managed together, and since there are around 400 groups, we require approximately 400 times less memory than before. More generally, we now avoid keeping large arrays (the so-called RateMaps) in memory when not needed. That reduced memory consumption even in the absence of cluster sources. Thanks to that, we were able to remove the auto-tiling functionality that was used to reduce memory usage. You can still use full tiling, but you have to specify `tiling=true` in the job.ini explicitly.

We changed the algorithm for saving rates in large computations: now the temporary files (there is one for each relevant group and IMT) are stored in the calculation directory `$OQDATA/calc_XXX` and not in `custom_tmp`. As a consequence, in SLURM clusters there is no need to configure a scratch directory anymore.

Classical calculations now also work on servers without a graphical display, since the environment variable `MPLBACKEND="Agg"` is automatically set when generating PNG images.

---

## Hazard: postclassical

Parallelization in the postclassical phase, where hazard curves/maps and their statistics are generated, has been significantly improved by setting the number of generated tasks equal to the number of available cores. Also, the "combine pmaps" operation has been optimized, and now the entire postclassical phase, in the most common case of computing the means, is dominated purely by the time spent reading rates from disk.
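Making the number of tasks equal to the number of cores amounts to partitioning the sites into nearly equal chunks, one per core. The sketch below shows one way to do that; it is an illustration with a hypothetical `split_sites` helper, not the engine's implementation.

```python
def split_sites(site_ids, ncores):
    """Partition the sites into ncores nearly equal chunks, one task per core."""
    k, r = divmod(len(site_ids), ncores)
    chunks, start = [], 0
    for i in range(ncores):
        size = k + (1 if i < r else 0)  # the first r chunks take one extra site
        chunks.append(site_ids[start:start + size])
        start += size
    return [c for c in chunks if c]

print([len(c) for c in split_sites(list(range(10)), 4)])  # → [3, 3, 2, 2]
```

One task per core keeps every core busy exactly once, with no scheduling overhead from excess tasks and no idle cores from too few.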
Computing the quantiles is significantly slower and more memory intensive since it requires computing all realizations for each site.

We now gzip the rates dataset in the datastore, thus reducing disk space usage by approximately a factor of 3 and reducing the time required to read the rates by a similar amount. A minor optimization is that we now avoid initializing the logic tree twice when using the `--hc` option to re-run the postclassical phase.

When starting a postprocessing calculation with `--hc`, the flag `use_rates` was not being honored; i.e., only the value set in the parent calculation was considered. This has been fixed.

We added a new exporter `hmaps-stats` producing one file per return period and statistic, thus avoiding nested fields in the CSV header. This is convenient for building the Global Hazard Map.

---

## Calculations with few sites

We introduced the possibility of specifying in the `job.ini` file a `siteid` parameter associated with the `sites` parameter, used to give short (up to 8-character) names to the coordinates. This is similar to specifying a `custom_site_id` column in the site model file; indeed, the `siteid` ends up inside the `custom_site_id` field of the site collection. `siteid` strings are restricted to the URL-safe base64 alphabet, so that they can be used in web applications. The convenience is that it is sufficient to list the coordinates, and the site parameters will be automatically associated from the site model file or from the parameters of the parent calculation, if any.

---

## hazardlib: general

There was some general infrastructure work on the GSIM classes at the level of the `MetaGSIM` class. Now each GMPE instance has an attribute `._toml` that is automatically set and used to compute the hash of the GMPE. This means that *all* GMPEs are hashable; therefore, they can be used as keys in dictionaries and can be cached.
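The hashing scheme can be sketched as follows. This is a drastic simplification for illustration: the class below is not the engine's `MetaGSIM` machinery, and the `_toml` string here is just the class name plus parameters rather than a real TOML representation.

```python
class GMPE:
    """Sketch: GMPEs hashable via an automatically set `_toml` attribute."""
    def __init__(self, **params):
        # build the TOML-like representation from the class and parameters
        self._toml = '[%s]\n%s' % (
            self.__class__.__name__,
            '\n'.join('%s = %r' % it for it in sorted(params.items())))

    def __eq__(self, other):
        return self._toml == other._toml

    def __hash__(self):
        # hash derived from the canonical representation, not the identity
        return hash(self._toml)

class AtkinsonBoore2006(GMPE):  # stand-in subclass for the example
    pass

# equal parameters give equal hashes, so GMPEs can be dictionary keys
gsim1 = AtkinsonBoore2006(scale_factor=1.2)
gsim2 = AtkinsonBoore2006(scale_factor=1.2)
cache = {gsim1: 'computed result'}
print(cache[gsim2])  # → computed result, found via the shared _toml hash
```

Deriving the hash from a canonical text representation (rather than object identity) is what lets two independently instantiated but equivalent GMPEs share cached results.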
Previously, we relied on the user remembering to set the `_toml` attribute or to call the `valid.gsim` factory function.

We changed the implementation of interpolated tables keyed by GSIM, magnitude, and IMT, used in subclasses of `GMPETable`. Before, the interpolated tables were computed in the `__init__` method; however, that caused problems because the interpolated magnitudes were hard to pass correctly to the underlying GMPEs (in the case of advanced GMPEs). Now the interpolation happens in the `compute` method (i.e., in the workers, not in the master node), but it is still performed only once because it is cached. Thanks to this change, the GMPE `GmpeIndirectAvgSA` now works when the underlying GMPE is a `GMPETable` subclass. We also fixed the edge case where the job.ini file does not contain the IMT `AvgSA`.

The change also fixed a number of bugs in Conditional Ground Motion Models, which are `ModifiableGMPE` instances depending on a dictionary of underlying GMPEs keyed by Intensity Measure Type. They should now work with all kinds of underlying GMPEs. As a consequence of the change, we now have a general mechanism for managing GSIM class warnings that guarantees they are displayed only once, even if the GSIM is instantiated multiple times.

We removed from hazardlib dozens of calls to `super().__init__()`, since there is now no need to call it in the vast majority of cases, resulting in less coupling and simpler code.

We added a method `CompositeSourceModel.set_msparams` to properly initialize multifault sources after reading the source models from hazardlib (this is expensive and needed only for classical analyses, not for event-based ones). We added a method `CompositeSourceModel.get_cmakers` returning a `ContextMakerSequence` object that can be used to implement custom versions of the classical PSHA calculator. We extended `CompositeSourceModel.get_sources` to accept an `smr` index so that advanced users can perform analysis one source model at a time.
Both new methods and the `smr` index are documented in the manual, in the section "Reading the hazard sources programmatically". Moreover, we extended the documentation about implementing advanced GSIMs, including those using Machine Learning models. Internally, all the logic about the `CompositeSourceModel` has been moved into a new module `openquake.hazardlib.source_group`.

We added a method `SiteCollection.lower_res` to reduce the resolution of a site collection by using Uber's `h3` library; this is used by the engine when prefiltering sources and ruptures.

We fixed a bug in the conversion to geometric mean, which was applied to all IMTs, including unsupported ones, causing `ZeroDivisionError` exceptions.

Sources with a `NegativeBinomialTOM` temporal occurrence model (used in the New Zealand model) are now treated differently, allowing for a simplification; however, the user will not see any significant change in the results or performance. As a consequence of the simplification, the class method `ContextMaker.from_srcs(sources, sitecol)` now returns a context array instead of a list of context arrays.

Finally, after several years of deprecation, we removed the method `get_mean_and_stddevs` from all GMPEs. If you still have code calling the old method, you should replace it with the function `contexts.mean_stds(rup_ctx, gsim, imt, idx)` following the examples in https://github.com/gem/oq-engine/pull/11194/changes.

---

## hazardlib: new GMPEs and fixes

The Abrahamson & Bhasin (2020) conditional GMPE [was implemented](https://github.com/gem/oq-engine/pull/10896) by [Lana Todorovic](https://github.com/LanaTodorovic93). The EMME24 site model [was added](https://github.com/gem/oq-engine/pull/11200) to the existing EMME backbone model for the Middle East by [Christopher Brooks](https://github.com/CB-quakemodel), using files shared by Abdullah Sandıkkaya, Özkan Kale and Baran Güryuva.
[Christopher Brooks](https://github.com/CB-quakemodel) [added the option](https://github.com/gem/oq-engine/pull/11116) to use a modified form of the Campbell and Bozorgnia (2014) GMM sigma model within the Kuehn et al. (2020) GMMs, as required for the 2023 USGS Alaska model. He also [added the USGS Alaska bias adjustment](https://github.com/gem/oq-engine/pull/11103) for NGA-SUB interface GMMs.

We [fixed a small bug](https://github.com/gem/oq-engine/pull/11197) in the Hashash et al. (2020) site term implementation within the NGA East models and regenerated the test tables (very small differences are observed, and only for SA(0.4)). We [fixed an error](https://github.com/gem/oq-engine/pull/10929) in the GMPEs computing lateral spread displacements, i.e., Youd et al. (2002) and Zhang and Zhao (2005).

Moreover, we received several contributions from our community. [Yen Shin Chen](https://github.com/vup1120) contributed several utilities for Probabilistic Fault Displacement Hazard Analysis (PFDHA). [Ji Kun](https://github.com/JIKUN1990) contributed a [GMPE for the Azores islands](https://github.com/gem/oq-engine/pull/11009). [Maoxin Wang](https://github.com/MaoxinWang) contributed [ground-motion models for Turkey](https://github.com/gem/oq-engine/pull/11062) for Arias Intensity, Cumulative Absolute Velocity, and Significant Durations. [Amirhossein Mohammadi](https://github.com/amirxdbx) contributed the GMPE [`Mohammadi2023Turkiye`](https://github.com/gem/oq-engine/pull/11006), based on a Machine Learning model. [Nicholas Clemett](https://github.com/nc-hsu) contributed his [correlation models](https://github.com/gem/oq-engine/pull/11168). [Antonio Scala](https://github.com/antonio-scala) contributed [three GMPEs for Campi Flegrei](https://github.com/gem/oq-engine/pull/11084) in Italy.

---

## Risk

For the sake of the Global Risk Model, and since it is useful in general, we added a new output "Average Losses By Taxonomy" (`avg_losses_by`) aggregating average losses.
If the exposure contains a `MACRO_TAXONOMY` field, it also aggregates by it, meaning that the exporter will produce two files: `avg_losses_by_-taxonomy.csv` and `avg_losses_by_-MACRO_TAXONOMY.csv`.

Scenario calculations have been changed to consider only the sites with assets around the rupture (within the maximum distance), except for conditioned scenarios. Previously, the full site collection was considered, thus requiring more memory and computation time.

We extended the `minimum_intensity` feature to also work for secondary perils. For instance, setting in the job.ini

```
minimum_intensity = {'LiqProb': .02, 'LSE': .001}
```

will discard liquefaction probabilities below 2% and liquefaction spatial extents below 0.1%. This makes secondary peril calculations faster and requires less disk space.

In infrastructure calculations, we turned the hard-coded parameter `max_nodes_network` into a job.ini parameter with a default value of 1000.

If the points in the site model are more distant than the association distance (ASSOC_DIST=8 km) from the sites, we now raise an error instead of a warning.

For `scenario_risk` calculations with quantiles and a single GSIM, the output "Aggregate Risk Statistics" was not visible. This is now fixed. We [fixed a bug](https://github.com/gem/oq-engine/pull/10982) in `classical_risk` in the presence of nontrivial weights in the taxonomy mapping file, [reported by Lisa Jusufi](https://groups.google.com/g/openquake-users/c/-bKfDupdtqE/m/AmmbHVbRBAAJ) on the OpenQuake mailing list.

---

## oq commands

The internal command `oq run` has been changed to support workflow files. These are TOML files describing multiple calculations that should be performed together. This is essential for the 2026 Global Hazard Model and Global Risk Models, since we want to be able to run all the hazard models in the mosaic or compute risk profiles for all the countries in the world using a single configuration file.
The format is still experimental and internal, but it is expected to become official in the near future.

`oq run` also accepts a `--cache` flag: when set to true, calculations that have already been performed are not repeated. To determine whether a calculation has already been performed, the engine looks at the checksum of the input files stored in the database. The feature is NOT enabled by default since it is potentially dangerous: for instance, a bugfix to a GMPE is a change in the code, not in the input files, so using `--cache=true` would retrieve old (incorrect) results. Therefore, the cache must be enabled manually, and only when the user knows that there have been no significant changes to the code. This is essential for workflow ergonomics in the presence of errors, since you can easily relaunch a workflow without having to repeat successful calculations.

We extended `oq shell` to accept dotted names, making it easy to call Python modules as scripts. For example, you can run

```bash
$ oq shell openquake.engine.global_ses --help
```

to see how to generate the global Stochastic Event Set.

It is now valid to pass a "prejob.ini" file to `--hazard-calculation-id`, rather than simply an integer ID. Thus, a command like

```bash
$ oq engine --run job.ini --hc prejob.ini
```

will perform two calculations: first the one corresponding to `prejob.ini`, and then the one corresponding to `job.ini`, starting from the previous one. The old syntax

```bash
$ oq engine --run prejob.ini job.ini
```

still works, but it is deprecated, and in the future only the explicit syntax with `--hc` may be accepted.

We extended the list of calculations generated by `--list-hazard-calculations` and `--list-risk-calculations` to show the `calculation_mode` as well. The command `oq purge` has been extended to remove failed calculations, old calculations, or orphan calculations (i.e., `calc_XXX.hdf5` files not referenced in the database).
Finally, `oq plot` has been optimized by avoiding a costly buffer around mosaic/country geometries. We also improved the plotting of event-based ruptures in calculations starting from an SES.hdf5 file.

---

## WebUI

There were fixes to the navigation bar, which was not properly displaying some elements in TOOLS_ONLY mode.

We added filtering functionality to the page displaying calculations. For instance, calling the URL https://hostname/v1/calc/list?user_name_like=%test% will return only the calculations of users containing "test" in their name (internally performing a LIKE query in the database).

When exporting multiple files as a single archive, the engine was littering the temporary directory with zip files. This is now fixed. Also, only the specified `custom_tmp` directory is used, as intended.

We refactored the WebUI JavaScript and CSS code, separating the logic into multiple files and adding integration tests written in the Playwright framework. We added an endpoint `v1/calc/validate_ini` to validate a local .ini file, to be used in the context of the PAPERS project. We added a `/extract/exposure_by_location` endpoint to extract exposure aggregated by asset locations.

Dozens of pull requests were made in the engine codebase to support the AELO and OQImpact platforms. However, since those are private platforms, the related improvements are not listed here.

---

## IT

In this release, we dropped support for Python 3.10 and added support for Python 3.13. That included upgrading NumPy to version 2 and all NumPy-dependent libraries (including geospatial libraries). We upgraded pyproj to 3.7.1 to address a CRS-related crash in 3.6.1. We removed several warnings caused by the library upgrades, although some still remain.

We now set `PYTHONUTF8` mode on Windows to avoid possible encoding errors. Database migrations are now automatically performed without requiring user confirmation.

We fixed a couple of issues in install.py.
When installing with a flag like `--version=3.23` (without the patch number), we now install the latest available patch from PyPI instead of the first patch release (3.23.0). When installing from a branch (i.e., with `--version=engine-3.23`), we now extract the requirements from that branch rather than from master. We also changed install.py to install the version-specific demos rather than the master demos.

A user reported that the LOCKDOWN option in Docker (when running `docker run -e LOCKDOWN=True openquake/engine`) was not honored. This has been fixed.

Some important `.txt` files in the GSIMs were not distributed in the packaged version of the engine. They are now included. The same applies to the `.onnx` and `.onnx.gz` files required by some models. Moreover, the Python package for the engine now includes the demos.
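The patch-selection behavior of install.py can be sketched as: among the available releases, pick the highest patch matching the requested minor version. The helper below is a hypothetical illustration, not the actual install.py code.

```python
def latest_patch(requested, available):
    """Pick the most recent patch release matching a version like '3.23'."""
    matching = [v for v in available
                if v == requested or v.startswith(requested + '.')]
    # compare numerically, so that a hypothetical 3.23.10 ranks above 3.23.9
    return max(matching, key=lambda v: [int(x) for x in v.split('.')])

print(latest_patch('3.23', ['3.22.0', '3.23.0', '3.23.1', '3.23.2', '3.24.0']))
# → 3.23.2
```

Comparing the dot-separated components as integers (rather than as strings) is what makes double-digit patch numbers sort correctly.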