Release notes v3.16
Version 3.16 is the second Long Term Support (LTS) release of the engine, replacing version 3.11 after a gap of two years. It is the result of five months of work involving over 320 pull requests, featuring major optimizations and new features.
The complete list of changes is available in the changelog:
https://github.com/gem/oq-engine/blob/engine-3.16/debian/changelog
A summary is given below.
Memory optimizations in classical PSHA
Later this year GEM will release an updated version of the Global Hazard Mosaic using a finer grid (with ~3 times more sites) and more intensity measure types and levels (7x25 instead of 6x20). Therefore the new models will be roughly 5 times more computationally intensive, requiring 5 times more memory and disk space.
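As a rough check of the "roughly 5 times" figure, the growth factor can be estimated from the numbers quoted above. This is only a back-of-the-envelope sketch: the variable names are illustrative, and the assumption that the cost scales linearly with sites x IMTs x levels is ours.

```python
# Back-of-the-envelope estimate; assumes the cost grows linearly with
# (number of sites) x (number of IMTs) x (number of levels).
site_factor = 3.0          # ~3 times more sites
old_imtls = 6 * 20         # 6 IMTs x 20 levels
new_imtls = 7 * 25         # 7 IMTs x 25 levels

growth = site_factor * new_imtls / old_imtls
print(f"estimated growth factor: {growth:.2f}")
```

Under these assumptions the estimate comes out a little above 4, which is consistent with the rounded figure of 5 used in these notes.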
Special care was taken to reduce the memory consumption of classical calculations. For instance, the ESHM20 model for Europe, which previously ran on our server with 512 GB of RAM, would require over 2 TB with version 3.15 and would simply run out of memory.
With version 3.16 the engine automatically splits the sites into tiles to keep the memory below a limit of around 2 GB per core and runs the tiles in parallel. For even larger calculations that would not be enough, since the logic of the classical calculators requires keeping a huge array of PoEs (~160 GB for the updated European model) on the master node, which would run out of memory. In that case the engine runs the tiles sequentially: for instance, by splitting into 4 tiles, only 40 GB would be required on the master node for Europe.
All that is done automatically: previously the user had to painfully determine the right max_sites_per_tile parameter, which is not needed anymore. There is instead a parameter pmap_max_gb with a default value of 1, which can be used to control the memory used on the workers, but regular users will never have to touch it.
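The memory arithmetic behind the sequential tiling described above can be sketched as follows. The helper names are hypothetical and this is not the engine's actual tiling algorithm; it only reproduces the 160 GB / 4 tiles = 40 GB example.

```python
import math

def num_tiles(total_gb, max_gb_per_tile):
    """Smallest number of tiles such that each tile fits in the limit."""
    return max(1, math.ceil(total_gb / max_gb_per_tile))

def gb_per_tile(total_gb, tiles):
    """Memory needed on the master node when the tiles run one at a time."""
    return total_gb / tiles

poes_gb = 160  # size of the PoEs array quoted above for the European model
print(gb_per_tile(poes_gb, 4))   # memory per tile with 4 tiles
print(num_tiles(poes_gb, 40))    # tiles needed to stay within 40 GB
```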
We improved the point source grid approximation to keep the runtime of the models reasonable. On top of a performance improvement (up to a 2x speedup while keeping the same precision) we fixed a memory issue with models containing point sources with very large magnitudes (such practice is arguably incorrect, but common in some of the models of the GEM mosaic). In such situations, the magnitude-scaling relationship can produce rupture lengths of thousands of kilometers, causing the point source gridding approximation to keep a huge amount of data in memory and thus sending the system out of memory. We have solved the problem by limiting the rupture radius to the integration distance; moreover the engine prints a warning when point sources with magnitude >= 8 are found, so that new models can avoid the issue altogether.
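The two safeguards just described (capping the rupture radius and warning on large magnitudes) can be illustrated with a minimal sketch. The function names and the example values are hypothetical; the engine's real implementation lives in its point source machinery and may differ in detail.

```python
import logging

def effective_radius(rupture_radius_km, integration_distance_km):
    """Cap the rupture radius at the integration distance."""
    return min(rupture_radius_km, integration_distance_km)

def check_magnitudes(mags, threshold=8.0):
    """Log a warning for point-source magnitudes >= threshold and return them."""
    large = [mag for mag in mags if mag >= threshold]
    if large:
        logging.warning("Found point sources with magnitude >= %s: %s",
                        threshold, large)
    return large

# A rupture radius of thousands of km is cut down to the integration distance
print(effective_radius(3000.0, 300.0))
print(check_magnitudes([6.5, 8.0, 8.7]))
```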
Finally, we note that for small calculations the improvements will be less visible or even not visible at all, depending on the parameters and the optimizations used.
Other improvements in classical PSHA and disaggregation
We optimized the parsing of the sources in XML format (with a 35x speedup for the Alaska model) since for some models it was the dominating factor in single-site calculations.
We optimized the preclassical phase of a calculation in the presence of large complex fault sources (this was affecting the South America model and others) since slow tasks were the dominating factor in single-site calculations.
We substantially improved the preclassical runtime in calculations with multifault sources (relevant for the UCERF model and others) by splitting the sources upfront.
As usual the source weighting algorithm has been refined to reduce the slow tasks in the classical phase of the calculation.
The postclassical calculator has been optimized to reduce the PoEs reading time, which could become substantial in extra-large calculations, especially on clusters with a large number of cores. This has become relevant now that the calculations are becoming 5 times larger; for instance for the European model the number of PoEs to read will increase from ~32 GB to ~160 GB.
As part of the future Global Hazard Model update the models have changed to convert the Intensity Measure Component of the GMPEs to “Geometric Mean” when possible; when not possible, a warning is printed. To enable this feature you just need to add a line
horiz_comp_to_geom_mean = true
to your job.ini file.
In the context of the METIS project we added a way to include the aftershock contribution to the hazard by reading a file of corrected occurrence rates for the ruptures of a model.
The disagg_by_src feature has been changed to store only the mean PoEs across the realizations for models using sampling. The user can customize what happens by setting the flag collect_rlzs (“true” means store the mean, “false” means store all realizations). Thanks to this change, large models like EUR no longer run out of memory when using the flag disagg_by_src=true.
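To see why storing the mean saves memory, consider the following minimal sketch. It assumes equally weighted realizations (as is the case with sampling) and uses plain lists instead of the engine's arrays; the function name is hypothetical.

```python
def mean_poes(poes_by_rlz):
    """
    Collapse a list of PoE curves (one list per realization, all of the
    same length) into a single mean curve, assuming equal weights.
    """
    num_rlzs = len(poes_by_rlz)
    return [sum(column) / num_rlzs for column in zip(*poes_by_rlz)]

# Two realizations with two levels each are reduced to a single curve
curves = [[0.10, 0.20], [0.30, 0.40]]
print(mean_poes(curves))
```

With R realizations the storage goes from R curves down to one, which is where the memory saving comes from.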
There was a major optimization in the disaggregation calculator (we measured speedups of over 76 times in the disaggregation part) obtained by replacing the scipy.stats.truncnorm.sf function with our own function. Also, our own truncnorm_sf function has been simplified and truncation levels close to zero (<= 1E-9) are now treated as zero.
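For readers unfamiliar with the quantity involved, here is a standalone sketch of the survival function of a standard normal distribution truncated symmetrically at +/- the truncation level. It is not the engine's truncnorm_sf (whose signature is different); the function name is hypothetical and the convention used in the degenerate case (sf = P(X > x)) is an assumption of this example.

```python
import math

def norm_cdf(x):
    """CDF of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def truncnorm_sf_sketch(x, truncation_level):
    """
    Survival function of a standard normal truncated to
    [-truncation_level, +truncation_level].
    """
    if truncation_level <= 1e-9:
        # treated as zero: all the probability mass sits at x = 0
        return 1.0 if x < 0 else 0.0
    if x <= -truncation_level:
        return 1.0
    if x >= truncation_level:
        return 0.0
    phi_t = norm_cdf(truncation_level)
    return (phi_t - norm_cdf(x)) / (2.0 * phi_t - 1.0)

print(truncnorm_sf_sketch(0.0, 3.0))    # by symmetry, half of the mass
print(truncnorm_sf_sketch(1.0, 1e-10))  # degenerate case
```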
We added the possibility to define the edges of the bins explicitly, which is useful when comparing disaggregations coming from different calculations.
When using num_disagg_rlzs=0 the engine was logging the nonsensical message "Total output size: 0 B". Now you get the correct output size.
In the context of the new release of the Global Hazard Mosaic we are adding some features to make the engine aware of the mosaic, like a utility to determine which mosaic model to use given a longitude and latitude.
The header of the UHS files changed slightly, using fewer digits, to make the tests run across different platforms in spite of minor numeric differences.
Additions to hazardlib
Prajakta Jadhav and Dharma Wijewickreme contributed a new GMPE for Zhang and Zhao (2005) (see https://github.com/gem/oq-engine/pull/7766).
Trevor Allen contributed some enhancements to Australian GMPEs (see https://github.com/gem/oq-engine/pull/8205).
Guillaume Daniel contributed a bug-fix to the HMTK, in the function used for the Stepp (1972) completeness analysis (see https://github.com/gem/oq-engine/pull/8127).
Graeme Weatherill contributed some aliases for the GSIMs used in the ESHM20 model for Europe.
C. Bruce Worden asked to change the AbrahamsonEtAl2014 GMPE to extrapolate the Vs30 so that it could be used with the official J-SHIS Vs30 model of Japan (see https://github.com/gem/oq-engine/pull/8171).
We implemented the correction of Lanzano et al. (2019) as described in Lanzano et al. (2022).
We introduced a new GMPE KothaEtAl2020regional where the site-specific (delta_c3 and its standard error) and source-specific (delta_l2l and its standard error) values are automatically selected. Since the procedure is slow it is meant to be used solely for single-site calculations. It is also likely to change in the future.
Since a few versions ago, the engine has the ability to modify the magnitude frequency distribution from the logic tree with code like the following:
<uncertaintyModel>
<faultActivityData slipRate="20.0" rigidity="32" />
</uncertaintyModel>
Now it is also possible to specify a constant_term; before it was hard-coded to 9.1.
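As background, 9.1 is the additive constant of the standard relation between moment magnitude and scalar seismic moment in N*m, log10(M0) = 1.5 * Mw + 9.1. The sketch below assumes that constant_term plays this role; the function names are hypothetical and this is not the engine's own code.

```python
import math

def mag_to_moment(mag, constant_term=9.1):
    """Scalar seismic moment (N*m) from moment magnitude."""
    return 10.0 ** (1.5 * mag + constant_term)

def moment_to_mag(moment, constant_term=9.1):
    """Inverse relation: moment magnitude from scalar seismic moment."""
    return (math.log10(moment) - constant_term) / 1.5

moment = mag_to_moment(7.0)
print(moment, moment_to_mag(moment))
```

A different constant_term shifts the moment associated with each magnitude by a constant factor, and hence changes the rates obtained when balancing a moment rate derived from slip rate and rigidity.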
ModifiableGMPEs with underlying GMPETables were not receiving a single magnitude when calling the compute method, thus resulting in an error. This has been fixed.
The AtkinsonBoore2006 GMPE was giving an error when used with stress drop adjustment, a regression caused by the vectorization work performed months ago. This has been fixed.
Source groups with sources producing mutually exclusive ruptures have been extended to include the concept of grp_probability (before it was hard-coded to 1); this is relevant for the new Japan model.
Risk
We have two major new features, which for the moment are to be considered experimental: ground motion fields conditioned on station data, as discussed in https://github.com/gem/oq-engine/issues/8317, and reinsurance calculations, as described in https://github.com/gem/oq-engine/issues/7886.
We welcome users who want to try the new features, with the understanding that usage and implementation details may change in future versions of the engine. We also welcome feedback on these experimental features.
We optimized the rupture sampling for MultiFaultSources and now the UCERF model is usable, even if still slow in the sampling part.
We implemented a major optimization in event_based_risk starting from precomputed ground motion fields. As a matter of fact, it is now possible to compute GMFs at continental scale, producing hundreds of gigabytes of data, and then run risk calculations country-by-country without running out of memory. This case was previously intractable.
As part of this work, we removed the limit of 4 billion rows for the gmf_data table and we added a parameter max_gmvs_per_task in the job.ini that can be used to regulate the memory consumption (the default is 1,000,000).
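The idea of bounding the memory by limiting the number of ground motion values per task can be sketched as follows. The function is hypothetical and the engine's actual task-splitting logic may differ; only the default of 1,000,000 comes from the text above.

```python
def split_in_tasks(num_rows, max_gmvs_per_task=1_000_000):
    """
    Yield (start, stop) row ranges of the gmf_data table such that no
    task has to read more than max_gmvs_per_task rows.
    """
    for start in range(0, num_rows, max_gmvs_per_task):
        yield start, min(start + max_gmvs_per_task, num_rows)

# 2.5 million rows are processed in three tasks
for start, stop in split_in_tasks(2_500_000):
    print(start, stop)
```

A smaller value means more tasks, each using less memory.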
We added a parameter max_aggregations in the job.ini: its purpose is to make it possible to increase the maximum number of risk aggregations, which was previously hard-coded to 65,536. The default is now 100,000 aggregations.
As a convenience, we changed the risk calculators to reuse the hazard exposure when running with the --hc flag: before the exposure had to be read every time, even if it was already saved in the hazard datastore, which was annoying and slow for large exposures with millions of assets.
We added an early consistency check on the taxonomy mapping in case of consequences, so that now you get a clear error before starting the calculation and not a confusing error in the middle of it.
For large exposures and many realizations the engine now raises an early error forcing the user to set the parameter collect_rlzs. This is preferable to running out of memory in the middle of a computation.
Finally, we changed the logic in the calculation of loss curves and averages in classical_risk/classical_bcr calculations, by taking into consideration the risk_investigation_time parameter (see https://github.com/gem/oq-engine/pull/8046). As a consequence, the numbers generated are slightly different from before. We now also raise an error when a loss curve is computed starting from a flat hazard curve, since in that case numeric errors make the results unreliable. The solution is to reduce the hazard investigation time to have a less flat curve.
Bug fixes and new checks
We fixed a long-standing bug, introduced in engine 3.9, that affected the USA model, specifically the area around the New Madrid cluster, producing incorrect hazard curves and maps.
We fixed a bug in models using the reqv feature to collapse point sources: all point sources were collapsed, and not only the ones with the tectonic region types specified in the reqv dictionary.
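The intended behavior can be illustrated with a small sketch in which sources are plain dictionaries and reqv maps tectonic region types to lookup tables. All names and values here are hypothetical; the point is only that the selection must be restricted to the keys of the reqv dictionary.

```python
def sources_to_collapse(sources, reqv):
    """Select only the sources whose tectonic region type appears in reqv."""
    return [src for src in sources if src["trt"] in reqv]

# Hypothetical inputs: only one tectonic region type is listed in reqv
reqv = {"Active Shallow Crust": "lookup_asc.hdf5"}
sources = [
    {"id": "p1", "trt": "Active Shallow Crust"},
    {"id": "p2", "trt": "Stable Continental"},
]
print([src["id"] for src in sources_to_collapse(sources, reqv)])
```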
We raised the recursion limit to work around an error "maximum recursion depth exceeded while pickling an object" happening in classical calculations with extra-large logic trees.
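For reference, the recursion limit of the Python interpreter can be inspected and raised with the standard sys module. The helper below is a hypothetical sketch and the value 2000 is only an example; it is not the value used by the engine.

```python
import sys

def raise_recursion_limit(new_limit):
    """Raise (never lower) the recursion limit and return the old value."""
    old_limit = sys.getrecursionlimit()
    if new_limit > old_limit:
        sys.setrecursionlimit(new_limit)
    return old_limit

previous = raise_recursion_limit(2000)  # example value, an assumption
print(previous, sys.getrecursionlimit())
```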
We fixed a bug breaking the fullreport.rst output for NGAEast GMPEs.
min_mag and max_mag were not honored when using a magnitude-dependent maximum distance: this is now fixed.
We fixed a bug when running a classical calculation starting from a preclassical one, appearing only in the case of tabular GMPEs, like in the Canada model.
We fixed a bug such that using the --hc option caused the site model of the child calculation to be associated incorrectly.
We removed the conditional spectrum calculator, which was giving incorrect results. You should use a later version of the engine (>=3.17) if you want it to work reliably.
We fixed a bug in the classical_risk calculator, where the avg_losses output was not stored and therefore not exportable (see https://github.com/gem/oq-engine/pull/8267).
We fixed a bug in vulnerability functions using the Beta distribution: the case of zero coefficients of variation was not treated correctly (see https://github.com/gem/oq-engine/pull/8060).
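To see why a zero coefficient of variation needs special treatment, note that the usual conversion from mean and standard deviation to the Beta shape parameters divides by the variance. The following is a minimal sketch under that assumption, with a hypothetical function name; it is not the code of the pull request above.

```python
import random

def sample_loss_ratio(mean, cov, rng=random):
    """
    Sample a loss ratio from a Beta distribution with the given mean and
    coefficient of variation; with cov = 0 the result is deterministic.
    """
    if cov == 0 or mean <= 0 or mean >= 1:
        # degenerate cases: no variability, return the mean itself
        return mean
    variance = (cov * mean) ** 2
    alpha = ((1 - mean) / variance - 1 / mean) * mean ** 2
    beta = alpha * (1 / mean - 1)
    if alpha <= 0 or beta <= 0:
        raise ValueError("cov too large for a Beta distribution")
    return rng.betavariate(alpha, beta)

print(sample_loss_ratio(0.3, 0.0))                     # no sampling needed
print(sample_loss_ratio(0.3, 0.5, random.Random(42)))  # random draw in [0, 1]
```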
We fixed a few bugs affecting the visualization of risk curves and aggregated risk via the QGIS plugin.
oq commands
For years the engine has had a command oq nrml_to to convert source models in NRML format to CSV or geopackage format, but we were missing a command oq nrml_from to convert back to NRML. This has finally been implemented, so it is now possible to read a source model, convert it into .gpkg, modify it with QGIS and convert it back to NRML, a feature users have wanted for years.
However, not all source models are convertible since not all source typologies are convertible, nor are there plans to make them convertible in the future. For instance multi-fault sources have an efficient HDF5 storage and it would make little sense to convert them into .gpkg, because they are so large that they would simply send QGIS out of memory, not to mention the fact that it would be impossible to edit millions of surfaces by hand. The feature is instead very useful for area sources, simple fault sources and complex fault sources, which are fully supported.
We fixed a small bug in the command oq shakemap2gmfs that was not accepting fractional truncation levels, only integer ones.
We added a command oq purge failed to remove old calculations that ended up in status ‘failed’; it can be run periodically to save disk space.
We added a command oq workers debug to test the correctness of a cluster installation.
IT
There were a couple of major changes in the zmq distribution mechanism in cluster environments. The first change was to move the task queue to the worker nodes: as a consequence, calculations that before were running out of memory on the master node now run without issues. The second change was to implement a partial load balancing of the tasks, resulting in huge improvements in calculations affected by slow tasks.
At user request, we added to the WebUI the ability to specify a non-standard prefix path by setting the environment variable WEBUI_PATHPREFIX. This is documented here: https://github.com/gem/oq-engine/blob/engine-3.16/docker/README.md
We fully removed the celery support, which had been deprecated for 5 years.
We removed support for Python <= 3.7 and added support for Python 3.10 on all platforms (Linux, Windows, macOS).
We added support for the geospatial library fiona on all platforms.
We produced RPM and debian packages, as well as an .exe installer for Windows.
The universal installer has grown a --venv option so that you can choose where to create the virtual environment (the default is still $HOME/openquake).
We revamped the docs site and now both the regular manual and the advanced manual are versioned (see https://docs.openquake.org/oq-engine/master/).
We now distribute a test calculation https://downloads.openquake.org/jobs/performance.zip which can be used to measure the performance of a server. It runs in ~30 minutes on a recent MacBook with the M1 processor or a recent 18-core Xeon processor.