Release notes v3.8#
This is a major release featuring several optimizations, new features and bug fixes: around 285 pull requests were merged.
For the complete list of changes, see the changelog: https://github.com/gem/oq-engine/blob/engine-3.8/debian/changelog
Major optimizations#
Classical calculations dominated by point sources with non-trivial nodal plane / hypocenter distributions and using the pointsource_distance approximation (see https://github.com/gem/oq-engine/blob/engine-3.8/doc/adv-manual/common-mistakes.rst#pointsource_distance) have been optimized. The improvement applies to several models; in particular the once extra-slow Canada and Australia national hazard calculations are now 2-3× faster.
Large ebrisk calculations have been substantially optimized in memory occupation, particularly when generating aggregate loss curves for a large number of tags. The gain can be of orders of magnitude. This was made possible by storing the (partial) asset loss table (see https://github.com/gem/oq-engine/blob/engine-3.8/doc/adv-manual/risk.rst#the-asset-loss-table) and by computing the loss curves in post-processing, as sketched below.
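As an illustration of what computing the loss curves in post-processing means, here is a minimal Python sketch (the function name and signature are ours, not an engine API): the per-event aggregate losses are sorted, assigned empirical return periods and interpolated at the requested periods.

import numpy

def losses_by_return_period(event_losses, eff_time, return_periods):
    # event_losses: aggregate loss per event; eff_time: effective
    # investigation time of the stochastic event set (in years)
    losses = numpy.sort(event_losses)[::-1]                 # largest loss first
    periods = eff_time / numpy.arange(1, len(losses) + 1)   # empirical return periods
    # interpolate in log-period space at the requested return periods
    return numpy.interp(numpy.log(return_periods),
                        numpy.log(periods[::-1]), losses[::-1])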
In general the memory occupation of the engine in calculations with subtasks (i.e. classical and ebrisk) has gone down: we discovered that more data than needed was accidentally transferred, and fixing the issue resulted in a substantial improvement (e.g. from 120 GB to 30 GB of RAM used in the master node for the Australia calculation).
The task distribution has been substantially changed and improved. We are back to an algorithm that makes the number of generated tasks deterministic, while in engine 3.6 and 3.7 the number depended on the load on the server. The difficult part was to make this possible without incurring the slow-task penalty, and we actually managed to improve on that front too. Moreover, differently from engine 3.7, the engine now tries to submit all tasks upfront, unless the parameter num_cores is set in the job.ini file.
Other optimizations#
We reduced the data transfer in classical and event based calculations by reading the site collection instead of transferring it, thus fixing a (minor) regression temporarily introduced in engine 3.7.
We were parsing the gsim logic tree file (and the underlying files, if any) multiple times; this has been fixed.
We changed the seed algorithm used when sampling the source models to avoid using more GMPEs than needed in some cases. Due to this change, if there is more than one source model you could get different numbers with respect to engine 3.7, but this is a non-issue. We also changed the logic tree sampling algorithm by ordering the branchsets lexicographically by tectonic region type. This can also affect the numbers but, again, it is akin to a change of seed and therefore a non-issue.
We reduced the memory footprint in event based and ebrisk calculations by reading one rupture at a time in the worker nodes, instead of a block of ruptures. Moreover the rupture prefiltering mechanism has been greatly optimized by using a KDTree approach and by removing the need to prefilter the ruptures twice.
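The idea behind the KDTree approach can be sketched as follows; this is a simplified illustration using scipy, not the engine's actual filtering code, which works on geographic coordinates and full rupture geometries rather than just hypocenters.

import numpy
from scipy.spatial import cKDTree

# site_xyz: (N, 3) array of cartesian site coordinates in km
site_xyz = numpy.random.uniform(0, 300, size=(1000, 3))
tree = cKDTree(site_xyz)  # the tree is built once and reused for every rupture

def close_site_ids(hypo_xyz, max_dist_km):
    # return the indices of the sites within max_dist_km of the hypocenter
    return tree.query_ball_point(hypo_xyz, max_dist_km)

print(len(close_site_ids([150.0, 150.0, 10.0], 50.0)))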
We optimized the computation of PointSources in the case where only the rjb distance is required, since then the hypocenter distribution can be collapsed.
Some models (like the Canada 2015 model) may have duplicated values in the nodal plane or hypocenter distributions, causing the calculation to become slower. This has been fixed: now the engine automatically regularizes the nodal plane and hypocenter distributions and prints a warning.
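The regularization amounts to merging duplicated values and summing their probabilities; here is a minimal sketch of the idea (our own illustrative code, not the engine's internal function).

from collections import defaultdict

def regularize(pmf):
    # pmf: list of (probability, value) pairs, possibly with duplicated values
    acc = defaultdict(float)
    for prob, value in pmf:
        acc[value] += prob
    return [(prob, value) for value, prob in acc.items()]

# two identical hypocenter depths are merged into a single one:
print(regularize([(0.5, 10.0), (0.25, 10.0), (0.25, 20.0)]))
# [(0.75, 10.0), (0.25, 20.0)]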
New features#
We extended the framework to compute consequences from scenario damage calculations. Any kind of consequence can be computed by adding a custom plugin function, although at the moment the only plugin function included in the engine is the one computing economic losses. We will add more plugins in the near future.
We changed the format of the consequence file, which can now be a simple CSV file with the coefficients for each damage state, loss type, tag name and plugin function (see https://github.com/gem/oq-engine/blob/engine-3.8/doc/adv-manual/risk-features.rst for an explanation).
There is a new experimental calculator event_based_damage which is able to compute damage probabilities starting from an event based hazard calculation and a set of fragility curves. Essentially, it is a generalization of the scenario_damage calculator to the case of multiple ruptures. The calculator is also able to compute extended consequences from the damage probabilities. The work is in progress and the calculator will likely be extended and improved in the future; if you are interested in becoming beta testers, please let us know.
There is an experimental way to serialize ruptures in text files (oq extract rupture/<rup_id> <calc_id>), but it has been left undocumented on purpose since it will likely change. Scenario calculators can read a rupture exported in the new format, so it is already possible to run an event based calculation, extract interesting ruptures and then perform scenarios on those. The challenge is to make this work across different versions of the engine, and this is why it is still in experimental status. Beta testers are welcome.
We extended the disaggregation calculator so that it can work with multiple realizations at once, provided the user specifies the rlz_index or the num_rlzs_disagg parameters in the job.ini. This is useful in order to assess the variability of the disaggregation results depending on the chosen realizations. Beta testers are welcome.
We revised the logic to manage GMPEs depending on an external file - like the commonly used GMPETable, which depends on an .hdf5 file - and introduced a naming convention: GMPE arguments ending in _file or _table are now considered path names relative to the gsim_logic_tree file and are automatically converted into absolute path names. Moreover the GMPE logic tree is properly saved in the datastore, including the external files, so that a calculation can be copied to a different machine without losing information (before, this was implemented in a hackish way working only for .hdf5 tables).
Finally we added a mechanism to override the job.ini parameters from the command line; here is an example overriding the export_dir parameter:
$ oq engine --run job.ini --exports csv --param export_dir=/my/output/dir
Work on hazardlib#
G. Weatherill added a configurable nonergodic option to the BC Hydro and SERA GMPEs. Moreover he revised the SERA BCHydro epistemic GMPEs, updated the SERA craton GMPE coefficients, added a fix to the Kotha 2019 GMPE, and implemented the Abrahamson et al. (2018) “BC Hydro Update” GMPE.
M. Pagani added some of the Morikawa-Fujiwara GMPEs for Japan. Moreover he added an alternative way of controlling how the OQ Engine models hypocenters in distributed seismicity (see https://github.com/gem/oq-engine/pull/5209). In order to use this option, the user must set shift_hypo = true in the .ini file.
R. Gee fixed the CampbellBozorgnia2003NSHMP2007 GMPE, which was missing the line DEFINED_FOR_REFERENCE_VELOCITY = 760. This affected the latest Alaska model, which could not be run.
M. Simionato introduced a geometric average GMPE called AvgGMPE which is able to compute the geometric average of its underlying GMPEs, with the proper weights. It can be used to collapse the GMPE logic tree, for users wanting to do so. This can reduce the size of the generated GMFs by orders of magnitude, depending on the size of the GMPE logic tree.
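Since hazardlib GMPEs return means in log space, the weighted geometric average of the ground motions reduces to a weighted arithmetic average of the log-means; here is the underlying arithmetic as a sketch (not the actual AvgGMPE implementation).

import numpy

def geometric_average(log_means, weights):
    # log_means: array of shape (num_gsims, num_sites) with the means in
    # log space, as returned by the underlying GMPEs; weights sum to 1
    return numpy.asarray(weights) @ numpy.asarray(log_means)

print(geometric_average([[-1.0, -2.0], [-3.0, -4.0]], [0.4, 0.6]))
# the result is the log of the weighted geometric mean of the ground motions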
Finally, we cached the method CoeffsTable.__getitem__ to avoid excessive interpolation. The issue was visible if the CoeffsTable was instantiated inside the method get_mean_and_stddevs, which is a very bad idea performance-wise. Now we have added a check to forbid such instantiation. The right place to instantiate the CoeffsTable is inside the __init__ method of the GMPE.
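The usual pattern in hazardlib GMPE modules is to build the table once, typically as a class-level or module-level attribute; a minimal sketch (the coefficients below are made up for illustration):

from openquake.hazardlib.gsim.base import CoeffsTable

# Good: the table is parsed once, when the module is imported; do NOT
# rebuild it inside get_mean_and_stddevs, which is called for every rupture
COEFFS = CoeffsTable(sa_damping=5, table="""\
IMT    a      b
PGA    0.10   1.20
0.50   0.15   1.10
""")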
Changes in the inputs#
In the job.ini file we recommend using the names mean and quantiles instead of the old names mean_hazard_curves and quantile_hazard_curves. The old names still work and will always work for backward compatibility, but the new names are better, since the calculation of statistics also applies to outputs other than the hazard curves.
In the job.ini we changed the syntax for the RepiEquivalent feature; see
https://github.com/gem/oq-engine/blob/engine-3.8/doc/adv-manual/equivalent-distance-approximation.rst#equivalent-epicenter-distance-approximation
The old syntax still works but raises a deprecation warning.
The GMF importer in CSV format has been extended, and it can import files with more IMTs than the ones used in the calculation: they are simply imported and then ignored, without raising an exception.
Changes in the outputs#
We harmonized the headers in the CSV files exported from the engine. In particular we renamed rlzi -> rlz_id, ordinal -> rlz_id, asset -> asset_id, id -> asset_id, rupid -> rup_id, id -> event_id in various files (there were plenty of such inconsistencies for historical reasons).
We fixed the order when exporting the ruptures in CSV: this is very useful when comparing the results of two calculations with different parameters. The order is now by rupture ID, rup_id.
We fixed the CSV exporter for the ruptures, since the boundary information was truncated at 100 characters. Actually, we completely rewrote it: the information is now extracted via the /extract/rupture_info endpoint in a very efficient way, because the WKT is gzipped, with a reduction in the data transfer of up to 100x.
The QGIS plugin has been updated to use the new API, which has been extended to accept a min_mag parameter to filter out the small magnitudes and substantially reduce the download time.
We deprecated the XML exporter for the ruptures, since it is extremely verbose and inefficient. In the future we will replace it with a better solution. For the moment you can still use it or, better, you can use the /extract/rupture_info binary API.
In ebrisk calculations we have clearly split the outputs aggregated by tag from the total outputs: the new names are ‘Aggregate Losses/Aggregate Loss Curves’ and ‘Total Losses/Total Loss Curves’ respectively. This avoids a lot of confusion, because in the past the total loss curves were also called aggregate. The total outputs are computed from the event loss table, while the aggregate outputs are computed from the asset loss table. We have also removed the outputs agg_maps-rlzs and agg_maps-stats that were only accidentally exported in engine 3.7.
Bug fixes#
We fixed the ShakeMap download to support again ShakeMaps in zipped format, a regression accidentally introduced in engine 3.7.1.
We fixed an issue with quotes in the exposure tags.
We fixed a long-standing issue with NaNs in scenario_damage calculations: the cause was a missing noDamageLimit in the fragility files. Now a missing noDamageLimit is treated as zero.
We fixed a SWMR bug in the disaggregation calculator, sometimes happening with large calculations and producing mysterious read errors.
The GMFs were stored even with ground_motion_fields=false in the case of hazard_curves_from_gmfs=true. This has been fixed.
The removal of calculations, which was not working since it tried to delete obsolete, non-existing files, has now been fixed. We also fixed the command oq reset when used with a stopped DbServer.
We hard-coded the distance used in the filtering to rrup, to simplify the logic and to avoid an error in disaggregation with GMPEs not using rrup (the distance was not saved but was needed, thus causing a failure).
We fixed a bug in ebrisk with aggregate_by when building the rup_loss_table.
One of our users discovered a RecursionError when pickling Python objects in a disaggregation calculation:
pickle.dumps(obj, pickle.HIGHEST_PROTOCOL)
RecursionError: maximum recursion depth exceeded while calling a Python object
We temporarily fixed this by raising the recursion limit, but let us know if you get the same error (it should appear only in extra-large calculations).
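The workaround amounts to raising the interpreter's recursion limit before pickling; here is a self-contained toy reproduction (the nesting depth and the limit are illustrative values, not the engine's actual settings):

import sys, pickle

obj = None
for _ in range(1200):   # build an object nested beyond the default limit of 1000
    obj = [obj]

sys.setrecursionlimit(5000)  # raise the limit so the pickling succeeds
data = pickle.dumps(obj, pickle.HIGHEST_PROTOCOL)
print(len(data))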
The error message for pickling over the 4 GB limit is now sent back to the controller node instead of appearing only in the worker logs.
New checks#
We relaxed a check that was too strict on the minimum_intensity parameter of the previous calculation.
We relaxed the check on IMT-dependent weights: now, in the case of sampling, the IMT-dependent weights are ignored and the engine prints a warning but does not stop.
We added a check on acceptable input keys in the job.ini to protect against misspellings like esposure_file instead of exposure_file.
We added a warning against implicit hazard levels extracted from the risk functions. In the future, specifying the hazard levels explicitly will become mandatory.
We added a check for duplicated sites in the site model. This typically happens when the user supplies more than 5 digits for the geographic coordinates, while the engine truncates to 5 digits (1 meter resolution): then sites that look different become duplicated. In turn, this may cause wrong asset associations and produce completely bogus numbers in the final results.
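For example, two sites that look distinct at 6 digits collapse into the same point after rounding to 5 digits (a purely illustrative snippet):

import numpy

lons = [13.400001, 13.400004]   # distinct at 6 digits...
lats = [42.350002, 42.350003]
rounded = set(zip(numpy.round(lons, 5), numpy.round(lats, 5)))
print(rounded)   # a single point: the site model now contains a duplicated site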
There is now an error “You are specifying grid and sites at the same time: which one do you want?” to force users to be explicit with their input files. Moreover, setting both hazard_curves.csv and site_model.csv is an error and it is correctly flagged as such.
On the other hand, specifying both sites and site models at the same time is now valid again and the “sites overdetermined” check has been removed.
Trying to read a GMFs file in XML format, a feature which had been removed long ago, now raises a clear error message.
There is a new upper limit on the size of event based calculations, to stop people from trying to run impossibly large calculations. You will get an error if the number of sites times the number of events is larger than the parameter max_potential_gmfs, which has a default value of 2E11.
We improved the error message when using precomputed GMFs in scenarios with event_id not starting from zero.
We improved the error message for empty risk EB calculations: now you will get “There are no GMFs available: perhaps you set ground_motion_fields=False or a large minimum_intensity”.
WebUI and WebAPI#
As usual, we worked on the WebAPI to better support the QGIS plugin; in particular it is now possible to extract a specific ground motion field by specifying the ID of the event generating it and by calling the URL /v1/calc/ID/extract/gmf_data?event_id=XXX.
It is also possible to extract the total number of events with a call to /v1/calc/ID/extract/num_events.
The call to /v1/calc/ID/status now correctly returns an HTTP error 404 for non-existing calculations.
We now store the WKT representation of each source geometry and we added an endpoint /extract/source_info?sm_id=0 to get that information.
We added an endpoint /v1/calc/validate_zip for use in the Input Preparation Toolkit, to validate the input files.
When using access control in the WebUI we changed the default ACL_ON = True to False, thus making it possible to export results from calculations run by other users, if the calculation ID is known.
Finally, Django has been upgraded to version 2.2.
IT#
The zmq mechanism, which has been in experimental stage for years, has finally been promoted to production ready: both of our clusters use it. The celery/rabbitmq distribution mechanism is not deprecated yet, but eventually it will be, because zmq is a superior alternative, using less memory and being more efficient, as well as having no dependency on Erlang.
The engine distribution now includes pandas, a feature much requested by our users. There is also some support for converting datasets in the datastore into pandas DataFrames. The goal is to make it easy to postprocess the engine results with pandas. For instance the portfolio loss curves in event based risk calculations are now computed with pandas.
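As a sketch of what this enables: the calculation datastore is an HDF5 file (normally under ~/oqdata) and its structured datasets can be turned into DataFrames. The file name and dataset name below are only examples, and the exact helper APIs provided by the engine may differ.

import h5py
import pandas

with h5py.File("calc_42.hdf5", "r") as ds:       # datastore of calculation 42
    events = pandas.DataFrame(ds["events"][()])  # structured dataset -> DataFrame
print(events.head())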
We introduced support for RHEL and CentOS 8, which is also used for our Docker images.
We improved the Windows distribution, which can now also be used to migrate from a nightly release to a development installation.
Alberto Chiusole made the point that the engine should report the number of available cores, not the number of real cores, since they are not necessarily the same in a container. We fixed that on Linux by using the cpu_affinity function (not available on macOS).
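The distinction can be seen with psutil (the engine may use a different call internally, but the idea is the same): inside a container with a restricted cpuset the two numbers differ.

import psutil

available = len(psutil.Process().cpu_affinity())  # cores usable by this process (Linux/Windows)
total = psutil.cpu_count()                        # all cores visible on the machine
print(available, total)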
Some internal commands (in particular oq show performance, oq show task_info and oq show job_info) have been fixed, changed or enhanced, so that they now return more correct information. After two years we are finally back to a situation where they can be called on a running calculation, thanks to the usage of the SWMR mode in the HDF5 libraries.