Release notes v3.24#

Version 3.24 is the culmination of 9 months of work involving around 510 pull requests. It is aimed at users wanting the latest features, bug fixes and maximum performance. Users valuing stability may want to stay with the LTS release instead (currently at version 3.23.3).

The complete set of changes is listed in the changelog:

https://github.com/gem/oq-engine/blob/engine-3.24/debian/changelog

The major changes are in multifault sources (relevant for the USA, China, Japan and New Zealand models), in secondary perils (i.e. liquefaction and landslides) and in the script generating the global stochastic event set from the GEM hazard models. Moreover, we optimized the memory consumption in scenario calculations (both conditioned and not) and in large event based risk calculations, which were running out of memory due to the large data transfer in the average losses.

A detailed summary is given below.

Hazard sources#

We enhanced multifault sources. It is now possible to specify their composition in faults by associating a tag to each fault.

This is used when disaggregating the hazard by source. The mechanism works by splitting the multifault source into faults before starting the calculation. Then in the source_info dataset you will see not the original multifault source (say “MF0”), but one multifault source per tag, such as “MF0@tag1” and “MF0@tag2”. You can find an example in classical/case_75.

We reduced the memory consumption in multifault sources by splitting them into blocks of ruptures, with the block size determined by the openquake.cfg parameter config.memory.max_multi_fault_ruptures.
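As a minimal sketch, the dotted name presumably maps to a key in the [memory] section of openquake.cfg; the value shown is purely illustrative, not a recommended setting:

[memory]
# illustrative value, not a recommendation
max_multi_fault_ruptures = 100000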

We added checks for missing hdf5 files when reading nonparametric and multifault ruptures, so that the user gets clear error messages.

We added a parsing check in pointlike sources: the depths in the hypodepth distribution must lie between the upper_seismogenic_depth and the lower_seismogenic_depth.

We improved the error message for invalid polygons, so that the user can see which area source causes the infamous error “polygon perimeter intersects itself”.

We extended the source_id feature: for instance, setting source_id=NewMadrid in the job.ini will keep all the sources whose ID starts with the string “NewMadrid”, instead of keeping only the source with ID exactly “NewMadrid”.

Classical calculations#

There was a substantial amount of work to support the USA (2023) model. While the model is not supported yet (it will require engine v3.25), all of the required GMMs have been added to hazardlib.

Moreover, there were multiple improvements in the management of clusters of sources (like the NewMadrid cluster), which is now much more efficient in terms of computation time, memory consumption and data transfer. Even estimating the weight of cluster sources in the preclassical phase is much faster. Most of all, the disk space consumption has been reduced by one order of magnitude for the USA model.

We fixed the “OverflowError: Python integer out of bounds for uint32” happening when saving the source_info table due to the size of the model.

Far away point sources (as determined by the preclassical phase) are not transferred anymore, thus improving performance and reducing data transfer and slow tasks.

We optimized the preclassical phase in the case of point sources when using the ps_grid_spacing approximation. We also changed the logic of the approximation slightly, so the numbers will differ slightly from previous versions. This will help towards implementing additional optimizations in future engine releases.

We removed the parameter reference_backarc since it was not used.

Event based calculations#

We added a parameter max_sites_correl, with default 1000, to stop users from trying to compute correlated GMFs with too many sites, which results in serious out-of-memory situations. The parameter does not affect scenario calculations, only event based ones.
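If you really need correlated GMFs on more sites, the limit can presumably be raised in the job.ini along these lines (the value is purely illustrative):

# illustrative value; the default is 1000
max_sites_correl = 2000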

We made the job.ini parameters minimum_intensity and minimum_magnitude mandatory for event based calculations with more sites than max_sites_disagg (default 10). This solves the issue of users forgetting to set such parameters and running calculations much larger than needed.
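A large event based job.ini would therefore contain thresholds such as the following (the values are purely illustrative and must be tuned to the model and the risk functions):

minimum_intensity = 0.05
minimum_magnitude = 4.5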

We introduced a parameter config.memory.gmf_data_rows to make it impossible to compute hazard curves from excessively large datasets: this happened when the user forgot to remove hazard_curves_from_gmfs, which is meant to be used only for a few sites.
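Again, the dotted name presumably corresponds to a key in the [memory] section of openquake.cfg; a sketch with a purely illustrative value:

[memory]
# illustrative value, not a recommendation
gmf_data_rows = 100000000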

We exposed the output avg_gmf, which before was internal. Moreover, we extended it to consider secondary IMTs. Notice that avg_gmf is computed and stored only in small calculations. The controlling parameter in the job.ini file is called gmf_max_gb, with a default value of 0.1, meaning that avg_gmf is computed and stored only if gmf_data contains less than 102.4 MB of data. You can raise the parameter, but then the calculation will become slower.
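For example, allowing avg_gmf for up to half a gigabyte of gmf_data would mean raising the limit in the job.ini as follows (0.5 corresponds to 512 MB, following the same convention as the 0.1 default):

gmf_max_gb = 0.5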

We improved the script global_ses.py, which is able to generate a global (or partial) Stochastic Event Set starting from the hazard models in the GEM mosaic. In particular, it is now possible to specify which models to consider and to pass various parameters. Moreover, we documented it and it is no longer considered experimental.

We also worked on avoiding out-of-memory errors in calculations starting from the global SES; the performance, however, is still not ideal and further work will be required. Moreover, you cannot run risk calculations directly from the global SES without storing intermediate ground motion fields first.

Scenario calculations#

We reduced the memory consumption in scenario calculations by discarding the sites outside the integration distance from the rupture.

We further reduced the memory consumption in conditioned GMF calculations by using shared memory to store the distance matrices.

We fixed the issue of small negative damages due to rounding errors in some scenario damage calculations.

hazardlib#

We slightly optimized the generation of context arrays for point sources.

We worked on the Australian GMMs (NGAEastAUS2023GMPE), introducing a way to manage variable vs30 values, which is essential for risk calculations.

We adjusted the basin terms in many GMMs for the USA model to determine the basin parameters from the vs30 via empirical relationships. This feature is enabled only if the z1pt0 and z2pt5 parameters in the site model are set to the sentinel value -999.

We added a BSSA14SiteTerm GMPE for use in the USA model.

Our users at USGS reported a bug with aliases, such that valid.gsim("[AbrahamsonGulerce2020SInterAlaska]") was returning the generic GMPE and not the regional specialization for Alaska. This is fixed now.

We fixed modified_gsim to work with the NGAEast GMPEs.

We fixed the site term for Abrahamson and Gulerce (2020) to be exactly consistent with the paper.

We fixed the BA08 site term: the rjb distance was not computed when needed by the underlying GMPE, thus resulting in an error.

We fixed the issue of NaNs coming from get_middle_point() for kite surfaces by discarding NaNs before computing the middle point.

We extended ComplexFaultSource.iter_ruptures to create ruptures for a range of aspect ratios.

We added a classmethod from_moment to the TaperedGRMFD Magnitude Frequency Distribution.

In the HMTK we extended the get_completeness_counts function to optionally return an empty array (the default is to raise an error). Moreover, we extended the serialise_to_nrml method to accept a mesh spacing parameter, as requested by Francis Jenner Bernales.

We implemented the Taherian et al. (2024) GMM for Western Iberia, both for inland and offshore scenarios. This is the first GMM in the engine using machine learning techniques, via the onnxruntime library. That library is currently optional: if it is missing, you can run everything else except this GMM.

Graeme Weatherill contributed an implementation of the MacedoEtAl2019SInter/SSlab conditional GMPE.

Baran Güryuva contributed the EMME24 backbone GMMs for crustal earthquakes.

Guillermo Aldama-Bustos contributed the Douglas et al. (2024) GMM.

Valeria Cascone added get_width and get_length methods to the Strasser et al. (2010) MSR and then to the Thingbaijam et al. (2017) MSR.

Chiara Mascandola added a class LanzanoEtAl2019_RJB_OMO_NESS2 to the Lanzano et al. (2019) set of GMPEs and submitted implementations of the Ambraseys (1996) and Sabetta-Pugliese (1996) GMPEs. She also submitted the Ramadan (2023) GMPEs.

We changed contexts.get_cmakers to return a ContextMakerSequence instance, so that it is possible to compute the RateMap generated by a CompositeSourceModel with a single call. Then we removed the function hazard_curve.calc_hazard_curve since it is now redundant.

logic trees#

We improved the library for generating logic tree files (hazardlib.lt.build). In particular, we used it to generate the logic tree for the Drouet calculation for France using 4 blocks of 100 branches each, thus avoiding the limit of 183 branches per branchset.

We added a new uncertainty abMaxMagAbsolute to change three parameters of the Gutenberg-Richter MFD at the same time (aValue, bValue and maxMag).

We added a new uncertainty areaSourceGeometryAbsolute and then two seismogenic depth uncertainties (upper and lower) to pointlike sources.

We extended the check on duplicated branches to the source model logic tree.

We extended the check on number_of_logic_tree_samples to all calculators and not only to event_based.

We made the realization string representation regular, i.e. the first branch of the source model logic tree now uses a single-letter abbreviation, like all the other branches.

We fixed two small bugs in oq show rlz: it was not working for a couple of tests (case_12 and case_68).

If the source model logic tree is trivial (i.e. with a single branch), it is possible to not specify it at all and just set the source_model_file parameter.
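In that case the job.ini only needs to point to the source model and the logic tree file can be omitted entirely (the file name below is just a placeholder):

source_model_file = source_model.xml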

Finally, we included in the engine repository a private _unc package containing a prototype implementation of the POINT (Propagation Of epIstemic uNcerTainty) methodology for managing correlated uncertainties. This is not ready for general usage yet.

Risk inputs#

We added two new loss types injured and affectedpop with their associated vulnerability functions. They are being used by the OQImpact platform.

We extended the exposure reader to accept .csv.gz files, thus saving disk space now that the exposure files are getting large.

We made the USA exposure parsing 10x faster by fixing get_taxidx, which was performing redundant operations.

Missing columns in the exposure.csv files - missing with respect to the fields declared in the exposure.xml file - are now automatically filled with ‘No_tag’.

We added an experimental option to aggregate the exposure: multiple assets on the same hazard site and with the same taxonomy are treated as a single aggregated asset. The feature is meant to be used in the global risk model as a performance hack.

We saved a bit of memory when reading the asset collection in the workers in event based risk calculations, by reading the assets by taxonomy and by setting max_assets_chunk. We also managed to increase the reading speed substantially (6x in an example for Pakistan).

Infrastructure risk calculations for large networks could easily run out of memory since the efficiency loss (EFL) calculation is quadratic in the number of nodes. Now we just raise an error if the number of nodes exceeds max_nodes_network, for the moment hard coded to 1000.

Calculations starting from ShakeMaps required a pre_job.ini file and then the usual job.ini file. Now everything can be done with a single file, making the user experience simpler and less error prone. The two-files approach is still supported (it is useful, for instance, in sensitivity analysis, when you want to perform multiple calculations without downloading the same ShakeMap multiple times) but it is not mandatory anymore.

We removed the obsolete parameter site_effects from the job.ini of scenario-from-ShakeMap calculations. It was meant to manage amplification effects, but such effects are already included in modern ShakeMaps.

Risk outputs#

We fixed a bug in post_risk when computing the quantiles stored in aggrisk_quantiles.

We reduced substantially the data transfer in avg_losses (4x for the Philippines) by performing a partial aggregation in the workers.

We changed the risk exporters to export longitude and latitude of the assets at 5 digits, consistently with the site collection exporter.

We added a method to aggregate the asset collection over a list of geometries, for internal usage.

We added an extractor for damages-stats, similar to the extractor for damages-rlzs, since it was not implemented yet.

There is now an exposure extractor, which however is not accessible from the WebUI for security reasons (the exposure can contain sensitive data).

There is also an exporter for the vulnerability functions and one for the job.zip file: the latter, however, is still experimental and known not to work in some cases.

We added an extractor for losses_by_location, which aggregates the assets at the same location (i.e. not necessarily at the hazard sites).

We are gzipping the avg_losses datasets, thus reducing the disk space occupation by a factor of 2-3. This is important in large scenario calculations.

The extractor for aggregated risk and exposure was not supporting multiple tags, i.e. the semicolon syntax as in aggregate_by = NAME_1, OCCUPANCY; NAME_1; OCCUPANCY. This is now fixed. Moreover, the exporter aggexp_tags exports a file for each tag combination.

Secondary perils#

The secondary peril implementation is still experimental and this release contains a great deal of changes (see https://github.com/gem/oq-engine/pull/10254) plus a new feature: it is possible to generate the equivalent of hazard curves and maps for secondary IMTs.

Many new models were added, many were fixed, some model names were changed. Calculations using the old names will fail with a clear error message suggesting to the user the new names.

In the case of landslides we renamed the job.ini parameter crit_accel_threshold to accel_ratio_threshold. Moreover, we now discard displacement values below 1E-4 to avoid wasting disk space.
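In practice the rename just means updating the parameter name in the job.ini; the value below is a placeholder, not a recommendation:

# old name before v3.24: crit_accel_threshold
accel_ratio_threshold = 0.1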

We fixed the unit of measure of the slope parameter (m/m) when computing the landslide peril.

There were two important fixes to the TodorovicSilva2022NonParametric model. First, we solved the NameError “Missing liquefaction:LiqProb in gmf_data” that happened in some cases. Second, we solved a much trickier issue: an overparallelization caused by the underlying machine learning library, which could completely hang large calculations.

We changed the multi_risk calculator to be more similar to a scenario_risk calculator with secondary perils; in particular the special-case syntax

multi_peril_csv = {'ASH': "ash_fall.csv", 'LAHAR': "lahar-geom.csv", 'LAVA': "lava-geom.csv", 'PYRO': "pyro-geom.csv"}

has been replaced with

secondary_perils = Volcanic
sec_peril_params = [{"ASH": "ash_fall.csv", "LAHAR": "lahar-geom.csv", "LAVA": "lava-geom.csv", "PYRO": "pyro-geom.csv"}]

The plan for the future is to replace the multi_risk calculator with a regular scenario_risk.

We changed the storage of secondary intensity measure types (IMTs) inside gmf_data to keep the secondary peril class as a prefix in the column name: this makes it possible to manage multiple perils within a single computation.

Each secondary peril subclass now has a peril attribute pointing to landslide or liquefaction.

If the risk functions contain IMTs which are not in the set of secondary IMTs, a warning is printed, since it means that such risk functions will be ignored. This is a protection against typos.

If there are secondary IMTs not covered by the risk functions, an error is raised, since it is impossible to compute the secondary peril.

We added a demo called InfrastructureMultiPeril to showcase a Multi-Peril risk calculator using a small infrastructure network with four nodes and four edges.

Bug fixes#

In the past, users could mistakenly set max_sites_disagg to be (much) larger than the total number of sites, thus causing too many postclassical tasks to be generated, with terrible performance. Now the engine takes the total number of sites into account when determining the number of postclassical tasks to generate.

Extracting the realization specific disaggregation outputs in traditional format, with a call like get("disagg?kind=TRT_Mag_Dist_Eps&imt=SA(0.075)&spec=rlzs-traditional&poe_id=0&site_id=0") failed with a numpy error. This is now fixed.

We fixed getters.get_ebruptures to work also for scenario calculations.

Some corporate users have no permission to write to their Windows temporary directory, so they could not run the demos. We worked around this issue. You can still export by using the oq engine --export-outputs command and passing it the path to a writeable directory.

hcurves-rlzs and hmaps-rlzs were created in single site calculations even if individual-rlzs was not set. This is now fixed.

oq commands#

The command oq plot rupture, to plot the geometry of a rupture, has been slightly improved, and we added a command oq plot build_rupture to plot planar ruptures (see oq plot examples for examples of usage).

We added a command oq plot_h3 accepting a list of h3 codes and visualizing the hexagons with country borders.

We extended oq plot_assets to also plot the stations if available.

We added a command oq info apply_uncertainty to display the recognized uncertainties in the logic tree.

We removed the --config-file option from the oq engine command since there is the OQ_CONFIG_FILE environment variable for the rare cases when it is needed.

We removed the --reuse-input flag since it can be replaced with a cache enabled via the config.dbserver.cache flag. The cache mechanism is still experimental and disabled by default.
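Following the same config.<section>.<key> convention used elsewhere for openquake.cfg, enabling the experimental cache would look roughly like this (the exact value format is an assumption):

[dbserver]
# assumed boolean flag; the cache is disabled by default
cache = true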

We added a utility utils/find_exposure <dirname> to find the largest exposure in a directory.

For single site calculations with disagg_by_src it is possible to visualize the most relevant sources with the command oq show relevant_sources:<IMT>.

We extended oq engine --delete-calculation to remove both calc_XXX.hdf5 and calc_XXX_tmp.hdf5 when deleting a job.

We extended oq importcalc calc_hdf5 to also import calculation files outside oqdata by moving them inside oqdata first.

WebUI#

We added an endpoint /v1/calc/:calc_id/exposure_by_mmi to extract the exposure_by_mmi output (in JSON format) for risk calculations from ShakeMaps.

We added an endpoint /v1/calc/:calc_id/impact to extract the impact results (in JSON format) for scenario risk calculations.

We added an endpoint /v1/calc/jobs_from_inis to determine which job.ini files have already been run (in that case we return the associated job_id) or not (in that case we return 0). This was required by the PAPERS project.

We added a private endpoint /v1/calc/run_ini to run calculations starting from a remote .ini file, assuming the client knows its full path name, again for the PAPERS project.

We made the hidden outputs (i.e. exposure and job_zip) visible and downloadable only for users of level 2 or when the WebUI authentication is disabled.

We made it possible to select multiple input files (and not only a single zip file) when running a job via the WebUI.

The WebUI no longer shows the jobs that have the relevant flag set to false. This is a way to hide jobs without removing them.

IT#

In this release we removed the pyshp dependency (since we have fiona) and we added Uber's h3 package (version 3.7.7). This is the last release supporting Python 3.10 and numpy 1.26: in the next release we will upgrade all numpy-dependent libraries (including the geospatial libraries) and we will require at least Python 3.11.

We changed the parallelization parameters to try to ensure at least one GB of RAM per thread, possibly at the cost of using fewer threads than available. For instance, if your machine has 22 threads and 16 GB of RAM, only 16 threads will be used by default.

It is still possible, as always, to specify the num_cores parameter in the openquake.cfg file to explicitly control the number of processes in the process pool. Users on a laptop with limited memory are advised to close the browser and run their calculations from the command line.
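For instance, limiting the engine to 8 worker processes would look like this (assuming num_cores sits in the [distribution] section, as in recent openquake.cfg files):

[distribution]
num_cores = 8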

AELO and OQImpact#

Hundreds of pull requests were made in the engine codebase to support the AELO and OQImpact platforms. However, since those are private platforms, the related improvements will not be listed here.