Release notes v3.5#

This is a major release featuring a new multi_risk calculator, several improvements to the hazard and risk calculators, and a few bug fixes. Nearly 140 pull requests were merged. For the complete list of changes, see the changelog.

Hazard calculators#

  1. There was a big improvement in the case of extra-large source models. We now use approximately 4 times less memory than before: for instance, we went down from 100 GB of RAM required to run Australia to only 25 GB. This makes it possible to run large calculations on memory-constrained machines. The change also substantially reduces the data transfer of the sources. It was made possible by an advance in the parallelization strategy introduced in engine 3.4 (subtasks), which also substantially reduced the problem of slow tasks.

  2. We reduced the data transfer when computing the hazard curves and statistics, sometimes with spectacular results (like reducing the calculation time from 42 minutes to 42 seconds). Usually, however, the computation of the curves and statistics does not dominate the total time, so you may not see any significant difference.

  3. We optimized the checks performed on the source model logic tree, things like making sure that applyToSources refers to sources that actually exist. Again, even if the gain was spectacular (from 15 minutes to 15 seconds in the case of Australia), you will likely see no difference, because those checks do not dominate the total computation time unless you are using the preclassical calculator.

  4. We restored the traditional sampling logic used in the engine until 18 months ago. This makes the implementation of some features easier, since all the realizations have the same weight, at the cost of making other features, deemed less important, more difficult. In practice, it means that you will get slightly different numbers in calculations with sampling of the logic tree, but such differences are akin to a change of the random_seed, i.e. not relevant.
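The practical consequence of equal-weight realizations can be shown with a small standalone sketch (hypothetical numbers, not the engine's actual code): under sampling, the mean hazard curve is a plain average over the realizations, with no per-branch weights.

```python
# Hypothetical probabilities of exceedance for four sampled
# realizations at three intensity levels.
curves = [
    [0.90, 0.50, 0.10],
    [0.80, 0.40, 0.05],
    [0.85, 0.45, 0.08],
    [0.95, 0.55, 0.12],
]

# With the restored sampling logic every realization carries the
# same weight 1/R, so the mean curve is a simple column average.
R = len(curves)
mean_curve = [sum(col) / R for col in zip(*curves)]
print(mean_curve)  # approximately [0.875, 0.475, 0.0875]
```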

  5. The event based calculator has long had a debugging flag compare_with_classical=true, which can be used to compare the produced hazard curves with the classical hazard curves. Since it was a debugging flag meant for internal usage, it was missing some pieces; in particular, it was not possible to export the generated curves in the usual way. This has been fixed.
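For reference, the flag goes in the job.ini of the event based calculation; a minimal hedged sketch (most required parameters omitted):

```ini
[general]
calculation_mode = event_based
; debugging flag: also compute classical hazard curves for comparison
compare_with_classical = true
```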

  6. Since release 3.4 the GMF exporter has exported a file sig_eps.csv containing, for each event ID, the associated inter-event standard deviation and the residual. There is now an additional column rlzi with the associated realization index.

Risk calculators#

  1. A major refactoring of all risk calculators was performed, with significant benefits both in terms of reduced complexity and improved speed. In particular, we saw a 2x speedup in some event based risk calculations (in the risk part, not the hazard part).

  2. In order to make the refactoring possible we had to change the classical_risk calculator, which used different loss ratios for different taxonomies. Now the calculator uses the same loss ratios for all taxonomies. As a consequence, you may see slight differences in the generated loss curves.

  3. We changed the order in which the events are stored, with an effect on event based risk calculations with coefficients of variation. The change is akin to a change of seed, i.e. not relevant. Moreover, the epsilons are now stored and not transferred via rabbitmq, thus making the calculator simpler and more efficient.

  4. Thanks to the change in the epsilons, the ebrisk calculator is now able to manage vulnerability functions with coefficients of variation, which means that it is getting close to becoming a full replacement for the event based risk calculator. Also, some export bugs in ebrisk were fixed.

  5. event_based_risk calculations with zero coefficients of variation (i.e. with no epsilons) have been optimized in the same way as the ebrisk calculator. This makes a difference if you have multiple assets of the same taxonomy on the same hazard site; otherwise the performance is the same as before.

  6. The way the risk models are stored internally has changed significantly, to make possible (in the future) a unification of the scenario_risk and scenario_damage calculators.

  7. We changed the scenario_damage calculator to silently accept single-event calculations (previously a warning was raised): in this case we do not compute and do not export the standard deviations (previously they were exported as NaNs).

  8. A new flag modal_damage_state has been added to the scenario_damage calculator. If it is set, instead of exporting for each asset the number of buildings in each damage state, the engine will export for each asset the damage state containing the most buildings. This is a new and still experimental feature.
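A hedged job.ini sketch for the scenario_damage case (most required parameters omitted):

```ini
[general]
calculation_mode = scenario_damage
; experimental: export, for each asset, only the damage state
; containing the most buildings
modal_damage_state = true
```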

  9. A new experimental calculator called multi_risk has been introduced in the context of the CRAVE project (Collaborative Risk Assessment for Volcanoes and Earthquakes). It allows computing losses and fatalities for volcanic perils associated with ash fall, lava, lahar and pyroclastic flow, but it is designed to be extensible to other kinds of peril.


hazardlib#

We had five external contributions to hazardlib.

  1. Michal Kolaj provided tabular GMPEs for Canada.

  2. Graeme Weatherill provided the Kotha et al. (2019) shallow crustal GMPE and added a few adjustments to the BC Hydro GMPE to allow the user to calibrate the anelastic attenuation coefficient (theta 6) and the statistical uncertainty (sigma mu), for use in the SERA project.

  3. Guillaume Daniel updated the Berge-Thierry (2003) GSIMs and added several alternatives for use with Mw.

  4. Chris van Houtte from New Zealand added a new class for Christchurch GSIMs of the Bradley (2013b) kind.

  5. Giovanni Lanzano from INGV contributed the Lanzano and Luzi (2019) GMPE for volcanic zones in Italy.

The job queue#

The engine now has a job queue. For the moment the feature is experimental and disabled by default, but it will likely become the standard in the future. To enable the queue, set the serialize_jobs flag in the openquake.cfg file:

serialize_jobs = true

When the queue is on, calculations are serialized: if N users try to run calculations concurrently, only one calculation will run (the first submitted) and the other N-1 will wait. As soon as a calculation ends, the next one in the queue will start, preserving the submission order. The queue is very simple and has no concept of priority, but it is extremely useful in the case of large calculations. It solves the problem of a large calculation exhausting the cluster memory and killing the calculations of other users.

As a side effect of the work on the queue system, various things have been improved. In particular, importing the engine as a library will no longer change the way the SIGTERM, SIGINT and SIGHUP signals are managed, an ugly side effect of previous releases of the engine.

Bug fixes#

  1. There was a bug in the management of the disaggregation variable iml_disagg: now the IMTs are correctly normalized. Without the fix, using “SA(0.7)” in iml_disagg and “SA(0.70)” in the vulnerability functions (or vice versa) would have raised an error.
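The normalization issue can be illustrated with a small standalone sketch (not the engine's actual code): the period inside SA(...) must be compared as a number, not as a string, so that equivalent spellings map to the same canonical form.

```python
import re

def normalize_imt(imt):
    """Illustrative sketch: normalize spectral-acceleration strings
    so that 'SA(0.7)' and 'SA(0.70)' become the same canonical form."""
    match = re.fullmatch(r"SA\(([0-9.]+)\)", imt)
    if match:
        # float() parses '0.70' and '0.7' to the same number,
        # so the canonical string drops spurious trailing zeros
        return "SA(%s)" % float(match.group(1))
    return imt  # PGA, PGV, etc. pass through unchanged

print(normalize_imt("SA(0.70)"))  # prints: SA(0.7)
```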

  2. When reading the site model in CSV format (a recently introduced feature), the field names were not ordered and vs30measured was read as a float, not as a boolean. This caused an error, which has been fixed.
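The pitfall can be illustrated with a small standalone sketch (hypothetical field values and names; not the engine's actual reader): the vs30measured flag has to be converted to a boolean explicitly.

```python
import csv
import io

# Hypothetical site-model CSV fragment; the real file has more fields.
data = """lon,lat,vs30,vs30measured
9.15,45.18,760.0,1
9.20,45.20,400.0,0
"""

sites = []
for row in csv.DictReader(io.StringIO(data)):
    sites.append({
        "lon": float(row["lon"]),
        "lat": float(row["lat"]),
        "vs30": float(row["vs30"]),
        # read the flag as a boolean: going through int() ensures that
        # "0" becomes False (note that bool("0") would be True, and
        # reading the field as a float was the source of the bug)
        "vs30measured": bool(int(row["vs30measured"])),
    })

print(sites[0]["vs30measured"], sites[1]["vs30measured"])  # prints: True False
```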

  3. There was a bug while exporting the hazard curves in the case of GMPETables (i.e. for the Canada model). It has been fixed.

  4. There was also a bug in the GMF export when the GMFs originally came from a CSV file, a regression introduced in engine 3.4. It has been fixed.

  5. We were losing line number information when parsing invalid gsim_logic_tree.xml files: now the error message contains the right line number.

  6. There was a bug when using applyToSources together with applyToBranches in source model logic trees with multiple source models. It has been fixed, and now it is actually required to specify applyToBranches in such situations.

  7. It was impossible to export individual hazard curves from an event based calculation, since only the statistical hazard curves were stored. The issue has been fixed.

oq commands#

  1. We extended the command oq prepare_site_model to work with sites.csv files and not only exposures. This makes it possible to prepare the site model files for hazard calculations when the exposure is not known.

  2. We extended the command oq restore to download from URLs and to change the calculation owner: as a consequence, we have now an official mechanism to distribute engine calculations as a zip archive of HDF5 files.


Deprecations#

  1. Source model logic trees with more than one branching level are deprecated and raise a warning: they will raise an error in the future.

  2. Windows 7 is deprecated as a platform for running the engine, since it is reaching its End-of-Life.


Removals#

  1. We removed the dependency on rtree. Now it is a bit easier to install the engine on Python 3.7, where the rtree wheel is not available.

  2. We slightly changed the web API used by the QGIS plugin and removed the endpoints aggregate_by/curves_by_tag and aggregate_by/losses_by_tag. They were experimental and we now have a better way to perform the aggregations.

  3. We removed the ‘gsims’ column from the realizations output since it was buggy (the names of the GSIMs could be truncated for long lists) and not particularly useful.

  4. We removed from the engine the ability to compute insured losses. This feature has been deprecated for a long time and was also buggy for scenario_risk. Users wanting the feature back should talk with us.

  5. We removed the parallel reading of the exposure introduced in engine v3.4, since it was found to be buggy when used with the ebrisk calculator. It may return in the future.