Release notes v3.6#

This release introduces several major new features (including completely revised disaggregation, automatic optimization of duplicated sources, a fast exposure importer and taxonomy mapping) along with many improvements, new checks and bug fixes. Nearly 200 pull requests were merged. For the complete list of changes, see the changelog: https://github.com/gem/oq-engine/blob/engine-3.6/debian/changelog

Disaggregation#

The most relevant development on the hazard side was the work on the disaggregation calculator. We substantially changed the business logic. In previous versions of the engine we disaggregated for all possible realizations in order to compute disaggregation statistics; in this version we gave up on statistics. Instead, we disaggregate only for a specific realization. The realization can be specified by the user with the rlz_index parameter in the job.ini file, or it can be determined automatically by the engine as the realization closest to the mean curve for the given disaggregation site.
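As a rough illustration of the automatic choice, the sketch below picks the realization whose hazard curve is closest to the weighted mean curve at one site. The function name closest_to_mean and the use of a squared Euclidean distance are assumptions made for the example; the notes do not say which distance the engine actually uses.

```python
def closest_to_mean(curves, weights):
    """Return the index of the realization closest to the weighted mean curve.

    curves: one list of probabilities of exceedance per realization
    weights: one weight per realization (assumed to sum to 1)
    """
    num_levels = len(curves[0])
    # weighted mean curve across the realizations
    mean = [sum(w * c[i] for c, w in zip(curves, weights))
            for i in range(num_levels)]
    # squared Euclidean distance: an assumption, the engine may use another metric
    dists = [sum((c[i] - mean[i]) ** 2 for i in range(num_levels))
             for c in curves]
    return dists.index(min(dists))


# three made-up realizations with two intensity levels each
curves = [[0.10, 0.010], [0.20, 0.020], [0.40, 0.050]]
weights = [0.3, 0.4, 0.3]
rlz_index = closest_to_mean(curves, weights)
```

With these made-up numbers the second realization (index 1) is selected, and only that realization would then be disaggregated.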

Moreover, now the disaggregation calculation works like a post-calculator (i.e. with the --hc option) and it is able to reuse information computed in its parent calculation: the net effect is that it is always faster than the corresponding classical calculation while in the past it was several times slower. We also fixed a couple of performance bugs: there was a slow operation truncnorm.cdf in an inner loop and ruptures outside the integration distance were not discarded. Finally, we changed the file names of the disaggregation outputs.

For models with thousands of realizations, the disaggregation can easily be thousands of times faster than before.

Classical PSHA calculator#

The engine is now smart enough to recognize duplicated sources appearing in different branches of the composite source model and to avoid redundant computations. Because this optimization is always on, the flag optimize_same_id_sources has been removed, as it is no longer needed. There are several models in the hazard mosaic with duplicated sources and the new optimization has a significant impact on those. Moreover, the demo LogicTreeCase2ClassicalPSHA has become an order of magnitude faster than before thanks to the reduction of the duplicated sources.
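The idea behind the optimization can be sketched as a cache keyed by source ID. This is only an illustration: hazard_by_branch and the compute callback are hypothetical names, and the sketch assumes that two sources with the same ID are truly identical, which a real implementation would have to verify.

```python
def hazard_by_branch(branches, compute):
    """Compute each distinct source once and reuse the result across branches.

    branches: dict branch_id -> list of (source_id, source) pairs
    compute: function performing the (expensive) calculation for one source
    """
    cache = {}  # source_id -> result, shared by all branches
    result = {}
    for branch_id, sources in branches.items():
        contributions = []
        for src_id, src in sources:
            if src_id not in cache:  # first time this source is seen
                cache[src_id] = compute(src)
            contributions.append(cache[src_id])
        result[branch_id] = contributions
    return result


calls = []


def fake_compute(src):
    # stand-in for the real calculation; records how often it is invoked
    calls.append(src)
    return len(src)


branches = {"b1": [("A", "area"), ("B", "fault")],
            "b2": [("A", "area"), ("C", "pt")]}
res = hazard_by_branch(branches, fake_compute)
```

Source "A" appears in both branches but is computed only once, so fake_compute is called three times instead of four.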

There was a big improvement in the computation of the statistical hazard curves, which is now not only faster but also uses a lot less memory than before. The trick was to consider one site at a time, instead of a block of sites. As a consequence it is now possible to consider tens of thousands of realizations for hundreds of thousands of sites without requiring terabytes of RAM. Moreover, the data transfer has been reduced by storing some auxiliary information in the datastore and reading it from the workers instead of transferring it via celery/rabbitmq.
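A minimal sketch of the "one site at a time" approach follows. The reader function read_site_curves is hypothetical and stands for whatever loads the curves of a single site from storage; only the mean is computed here, but quantiles could be handled in the same loop.

```python
def mean_curves(read_site_curves, num_sites, weights):
    """Yield the weighted mean hazard curve of each site, one site at a time.

    read_site_curves(site_id) is assumed to return a list of curves,
    one per realization, for that site only.
    """
    for site_id in range(num_sites):
        curves = read_site_curves(site_id)  # only one site is in memory
        num_levels = len(curves[0])
        yield [sum(w * c[i] for c, w in zip(curves, weights))
               for i in range(num_levels)]


# made-up storage: site_id -> curves for two realizations
store = {0: [[0.25, 0.125], [0.75, 0.375]],
         1: [[0.5, 0.25], [0.5, 0.25]]}
weights = [0.5, 0.5]
means = list(mean_curves(store.__getitem__, num_sites=2, weights=weights))
```

Because the function is a generator, the memory needed at any moment is proportional to the number of realizations, not to the number of sites times the number of realizations.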

There was a substantial change in the way the tasks are distributed for a classical calculation. The engine has acquired the ability to estimate the runtime of each task and, if the estimated time exceeds a task_duration parameter, to split the task into subtasks that run in less than task_duration seconds. The user can set the task_duration manually in the job.ini or leave it empty; in that case the engine will figure out a reasonable value for it.

The approach is not perfect since there are non-splittable sources, so there is a minimum size for a given subtask and sometimes subtasks taking much longer than the task_duration parameter may still appear. However, the new approach is a drastic improvement and the situation was never better than it is now.
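The splitting logic can be pictured with a simple greedy grouping. This is a sketch under stated assumptions, not the engine's algorithm: split_in_subtasks is a hypothetical name, and the per-source time estimates are taken as given.

```python
def split_in_subtasks(sources, task_duration):
    """Group sources so that each subtask stays within task_duration if possible.

    sources: list of (source_id, estimated_seconds) pairs
    A single non-splittable source slower than task_duration still
    ends up alone in its own, overlong, subtask.
    """
    subtasks, current, total = [], [], 0.0
    for src_id, seconds in sources:
        # close the current subtask if adding this source would exceed the limit
        if current and total + seconds > task_duration:
            subtasks.append(current)
            current, total = [], 0.0
        current.append(src_id)
        total += seconds
    if current:
        subtasks.append(current)
    return subtasks


sources = [("a", 30), ("b", 40), ("c", 50), ("d", 200), ("e", 10)]
subtasks = split_in_subtasks(sources, task_duration=100)
```

Here source "d" is estimated at 200 seconds and cannot be split, so its subtask exceeds the 100 second target, which is the limitation described above.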

We added a check on sources with a suspiciously large spatial extent (more than 5,000 km) so that a warning is printed. Usually this means that there was a bug in the generation of the source model.

We added a check on sources with suspicious hypo-depths and nodal plane distributions (i.e. distributions with duplicated values) since they make the calculation slower.
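To make the notion of "duplicated values" concrete, here is a small sketch of such a check. The representation of a distribution as a list of (weight, value) pairs and the function name duplicated_values are assumptions for the example.

```python
from collections import Counter


def duplicated_values(distribution):
    """Return the values appearing more than once in a discrete distribution.

    distribution: list of (weight, value) pairs, e.g. hypocentral depths in km
    """
    counts = Counter(value for _weight, value in distribution)
    return sorted(value for value, n in counts.items() if n > 1)


# the depth 10.0 is listed twice: the two entries could be merged
# into a single one with weight 0.75, halving the work for that depth
hypo_depths = [(0.5, 10.0), (0.25, 10.0), (0.25, 20.0)]
suspicious = duplicated_values(hypo_depths)
```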

In extra-large models, saving some debugging information (e.g. the number of sites affected by each source) was exceedingly slow, so now the information is stored only if there are fewer than 100,000 relevant sources.

Logic trees#

There was a tricky bug with the minimum_distance feature in the presence of multiple GSIMs in a logic tree branchset. Now each GSIM keeps its own minimum distance; before, they were all getting the same minimum distance, causing wrong results to be computed. Fortunately the minimum_distance feature is rarely used (and only for internal purposes) so the bug is minor. The feature is documented here: https://github.com/gem/oq-engine/blob/engine-3.6/doc/adv-manual/special-features.rst#gmpe-logic-trees-with-minimum_distance

We implemented zero weights for intensity measure types that should be discarded in the GSIM logic tree. You can see the relevant documentation here: https://github.com/gem/oq-engine/blob/engine-3.6/doc/adv-manual/special-features.rst#gmpe-logic-trees-with-weighted-imts

We implemented risk logic trees, a.k.a. the taxonomy mapping feature. The idea is that users can map the taxonomy strings in their exposure to one or more vulnerability/fragility functions, with corresponding weights for each function assignment, to take into account the epistemic uncertainty in the exposure ⟷ vulnerability domain. The feature is documented here: https://github.com/gem/oq-engine/blob/engine-3.6/doc/adv-manual/risk-features.rst
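The concept can be illustrated with a toy mapping from taxonomy strings to weighted vulnerability functions. Everything in the sketch is hypothetical (the taxonomy strings, the function names, the linear functions), and combining the functions as a weighted average of loss ratios is an assumption; the linked documentation describes what the engine really does.

```python
def weighted_loss_ratio(taxonomy, gmv, mapping, vfuncs):
    """Combine the loss ratios of all functions mapped to a taxonomy."""
    pairs = mapping[taxonomy]
    total = sum(weight for _name, weight in pairs)
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"weights for {taxonomy!r} sum to {total}, not 1")
    return sum(weight * vfuncs[name](gmv) for name, weight in pairs)


# hypothetical mapping: taxonomy -> [(vulnerability function ID, weight), ...]
mapping = {
    "RC-LowRise": [("vf_rc_a", 0.6), ("vf_rc_b", 0.4)],
    "Wood": [("vf_wood", 1.0)],
}
# toy vulnerability functions: ground motion value -> mean loss ratio
vfuncs = {
    "vf_rc_a": lambda gmv: min(1.0, 0.5 * gmv),
    "vf_rc_b": lambda gmv: min(1.0, 0.25 * gmv),
    "vf_wood": lambda gmv: min(1.0, 0.1 * gmv),
}
loss_ratio = weighted_loss_ratio("RC-LowRise", 0.8, mapping, vfuncs)
```

For a ground motion value of 0.8 the two functions give 0.4 and 0.2, so the combined loss ratio is 0.6 * 0.4 + 0.4 * 0.2 = 0.32.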

A big conceptual change (but with no impact on the user) was the simplification of the source model logic tree XML file. Before, it was necessary to specify a logicTreeBranchingLevel node that was not used internally; now that node is optional. Old files will keep working, as long as the logicTreeBranchingLevel contains only a single subnode. The case of multiple subnodes is now correctly flagged as an error. Thanks to the change, source model logic trees, GSIM logic trees, and risk logic trees are now stored in the same way internally.

Lastly, we fixed a bug in source model logic trees with the options applyToSources and applyToBranches on; in some cases a spurious error about the source not being in the source model was raised, even if the source was actually there.

Event based hazard#

We introduced a parameter max_sites_per_gmf in the job.ini (only for event_based calculations that are trying to store ground motion fields), with a default of 65,536 sites. A user trying to run an event_based calculation that has ground_motion_fields = true, with more than the number of sites permitted by max_sites_per_gmf will now get an early validation error instead of running out of memory after several hours of calculations. The max_sites_per_gmf limit can be raised beyond the default of 65,536 sites, at the user’s own responsibility.

We also added a limit of 2**32 events in event based calculations: this is a hard limit that cannot be raised. If your calculation produces more than 4 billion events, it will need to be split into smaller calculations. Calculations involving billions of ruptures would likely never work anyway, because they would eventually run out of memory.
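The two limits amount to an early validation step, which could look like the sketch below. The function validate_event_based and its error messages are hypothetical; only the parameter names max_sites_per_gmf and ground_motion_fields, the default of 65,536 and the limit of 2**32 come from the notes.

```python
MAX_EVENTS = 2 ** 32  # hard limit, cannot be raised


def validate_event_based(num_sites, num_events, ground_motion_fields,
                         max_sites_per_gmf=65536):
    """Fail early instead of running out of memory hours later."""
    if ground_motion_fields and num_sites > max_sites_per_gmf:
        raise ValueError(
            f"{num_sites} sites exceed max_sites_per_gmf={max_sites_per_gmf}; "
            "raise the limit in the job.ini at your own risk")
    if num_events > MAX_EVENTS:
        raise ValueError(
            f"{num_events} events exceed the limit of 2**32; "
            "split the calculation into smaller ones")


try:
    validate_event_based(num_sites=100_000, num_events=1000,
                         ground_motion_fields=True)
except ValueError as exc:
    error = str(exc)
```

The same call with ground_motion_fields=False passes, since the site limit only applies when ground motion fields are stored.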

We added a check for missing intensity_measure_types: this avoids cryptic errors in the middle of the computation of the ground motion fields.

We optimized the rupture sampling procedure for point sources (which includes multi point sources and area sources). The improvement can be quite significant: for instance, the generation of ruptures for a large multipoint source for Colombia became 30x faster using 12x less memory.

We changed the way ruptures are stored internally: the code field in the ruptures dataset is now a unique checksum depending on the kind of rupture. Before, it was an incremental number depending on the order of the Python module imports, which made debugging difficult.

The rupture CSV exporter has been enhanced, and now it exports the rupture surface boundaries as 3D multipolygons instead of 2D multipolygons.

We fixed a small bug in the rupture XML exporter, which was failing if the user did not specify the hazard sites.

We added the ability to generate hazard curves without storing the GMFs, simply by setting the flags

  hazard_curves_from_gmfs = true
  ground_motion_fields = false

This is useful when one is interested in the hazard curves generated by an event_based calculation but not in the ground motion fields themselves. Not storing the GMFs reduces the data transfer and the memory occupation.
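To show why the GMFs need not be kept, the sketch below builds a hazard curve for one site from the ground motion values as they are produced, retaining only exceedance counts. It assumes a Poissonian conversion from rate to probability and an effective time eff_time covered by the stochastic event sets; the function name hazard_curve and both assumptions are illustrative, not taken from the engine.

```python
import math


def hazard_curve(gmvs, imls, eff_time, investigation_time):
    """Probabilities of exceedance at one site from simulated ground motions.

    gmvs: ground motion values at the site, one per event
    imls: intensity measure levels of the curve
    eff_time: total simulated time in years (assumption of this sketch)
    """
    poes = []
    for iml in imls:
        num_exceeding = sum(1 for gmv in gmvs if gmv >= iml)
        rate = num_exceeding / eff_time  # annual rate of exceedance
        poes.append(1 - math.exp(-rate * investigation_time))
    return poes


# three made-up events over 1000 simulated years, 50 year investigation time
poes = hazard_curve([0.1, 0.3, 0.5], imls=[0.2, 1.0],
                    eff_time=1000, investigation_time=50)
```

Two of the three values exceed 0.2, giving a probability of about 0.095, while no value exceeds 1.0 and the corresponding probability is zero.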

In engine 3.5 we changed the gmf_data CSV exporter to export a file sitemodel.csv instead of the file sitemesh.csv. That change has been reverted because it was generating confusion. The right way to export the site model information for the most recently completed calculation - which works for all calculators, not only for event based - is to use the command oq show sitecol > sitecol.csv

Importing GMFs from CSV has been enhanced and no longer requires the field rlzi: previously, this was a required field, but it was always assumed to contain the value 0. On the other hand, the GMF exporter to CSV no longer exports the field rlzi, because it is redundant: the association between events and realizations can be found in the events table and it is exported in the file sigma_epsilon.csv.

In the sigma_epsilon.csv file, we renamed the field eid to event_id in order to avoid confusion with the naming used in the gmf_data.csv file (event_id is the 64 bit event ID in the events table in the datastore, eid is the 32 bit index to the event ID record).

Event based risk#

There was a huge refactoring of all risk calculators. As a consequence the event_based_risk calculator has become simpler and faster than before (twice as fast in some cases).

In the ebrisk calculator it is now possible to aggregate by asset_id and therefore to produce individual loss curves and maps for each asset. Needless to mention, this is only viable for exposures of manageable size.

There was some work to make the losses_by_event exporter for the ebrisk calculator more similar to the ones for event_based_risk and for scenario_risk.

We fixed a bug in the agg_curves-rlzs and agg_curves-stats outputs in ebrisk: they were missing the units compared to the same outputs coming from the event_based_risk calculator. This was breaking the QGIS plugin.

We changed the agglosses exporter in scenario_risk calculations, by adding a column with the realization index.

We fixed the agg_curves exporter for event based risk, which was broken if the exposure was imported in the parent calculation and not in the child calculation.

We fixed a bug in the exporter of the aggregate loss curves coming from an ebrisk calculation: now the loss ratios are computed correctly even in the presence of occupants. Before, the exporter was writing incorrect loss ratios to the output file.

Hazardlib#

Graeme Weatherill (@g-weatherill) contributed a finite rupture option to the Germany-adjusted Cauzzi and Derras GMPEs. Moreover, he contributed the Tromans et al. (2019) adjustable GMPE, used for a nuclear power plant in the UK.

Chris van Houtte (@cvanhoutte) contributed the Van Houtte et al. (2018) Significant duration model for New Zealand.

Robin Gee (@rcgee) fixed a bug in the GMPE Sharma (2009): there was a key error if the intensity measure level specified in the job.ini included periods that required interpolation.

Marco Pagani (@mmpagani) discovered a bug in calc_hazard_curves which was failing with a cryptic AttributeError: 'NoneType' object has no attribute 'within_bbox' when used in parallel mode. It has been fixed.

Risk#

The CSV importer for the exposure has been optimized. Before, for legacy reasons, the importer was converting the CSV records into node objects similar to the ones coming from the XML importer and then it was reusing the XML logic. Now we are doing the opposite: the XML importer is producing records and reusing the logic of the CSV importer. Thanks to this change, for large CSV exposures the new importer is 4-5 times faster and uses over 10 times less memory than before.

The engine has long had the ability to reduce the hazard site collection (which can be large, think of a fine grid) to only the locations where there are assets. This feature has been optimized in this release, to a spectacular extent in some cases: we measured a speedup from 2h to 0.1s for Canada.

We changed how zipped exposures are managed by the engine. In version 3.5 a zipped exposure was expected to contain an XML file with the same name as the archive, apart from the extension. Because of that, the job.ini file had to contain a line exposure_file = <exposure_path>.xml while now it requires a line exposure_file = <exposure_path>.zip, which is clearer. The change was requested by the risk team in the context of the CRAVE project because it simplifies the unzipping of the exposure. Unzipping will overwrite files of the same name already present, but a warning will be printed and the original files will not be lost: they are renamed with a .bak extension.
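The backup behaviour described above can be reproduced with the standard zipfile module, as in the following sketch. The function unzip_with_backup is a hypothetical helper written for illustration and is not the engine's own code.

```python
import os
import tempfile
import zipfile


def unzip_with_backup(archive, dest_dir):
    """Extract an archive, renaming to .bak any file that would be overwritten."""
    backups = []
    with zipfile.ZipFile(archive) as z:
        for name in z.namelist():
            target = os.path.join(dest_dir, name)
            if os.path.isfile(target):
                print(f"WARNING: {target} exists, renaming it to {target}.bak")
                os.replace(target, target + ".bak")
                backups.append(target + ".bak")
        z.extractall(dest_dir)
    return backups


# demonstration in a temporary directory with made-up file contents
with tempfile.TemporaryDirectory() as tmp:
    archive = os.path.join(tmp, "exposure.zip")
    with zipfile.ZipFile(archive, "w") as z:
        z.writestr("exposure.xml", "<new/>")
    existing = os.path.join(tmp, "exposure.xml")
    with open(existing, "w") as f:
        f.write("<old/>")
    backups = unzip_with_backup(archive, tmp)
    with open(existing) as f:
        new_text = f.read()
    with open(existing + ".bak") as f:
        old_text = f.read()
```

After the call, exposure.xml holds the content from the archive and exposure.xml.bak holds the previous content.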

We added a consistency check between statistics for calculations leveraging the --hc option, because some users were making mistakes like trying to compute means in the child calculation without having them in the parent calculation. Now one gets a clear error message.

We fixed a bug in classical_damage from CSV with discrete fragility functions: for hazard intensity measure levels different from the fragility levels, the engine was giving incorrect results.

Vulnerability functions with the beta distribution must satisfy some consistency requirements if the coefficients of variation are nonzero. Unfortunately the consistency check was missing and it was possible to accept invalid functions, which then raised an error in the middle of the computation. Now the error will be raised much earlier, when instantiating the vulnerability functions. See #4841 for more details.
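The notes do not spell out the requirements, but a standard one for a beta distribution on [0, 1] with mean m and standard deviation s is s**2 < m * (1 - m), otherwise the shape parameters are not positive. The sketch below checks that condition for a mean loss ratio and a coefficient of variation; beta_params is a hypothetical helper and the engine's actual check (see #4841) may differ.

```python
def beta_params(mean, cov):
    """Return the (alpha, beta) shape parameters or raise if inconsistent.

    mean: mean loss ratio, strictly between 0 and 1
    cov: coefficient of variation, strictly positive
    """
    if not 0 < mean < 1:
        raise ValueError(f"mean loss ratio {mean} must be in (0, 1)")
    if cov <= 0:
        raise ValueError(f"coefficient of variation {cov} must be positive")
    variance = (cov * mean) ** 2
    if variance >= mean * (1 - mean):
        raise ValueError(
            f"variance {variance} too large for mean {mean}: "
            "no beta distribution exists")
    nu = mean * (1 - mean) / variance - 1  # positive thanks to the check above
    return mean * nu, (1 - mean) * nu


alpha, beta = beta_params(mean=0.5, cov=0.5)
```

A mean of 0.5 with a coefficient of variation of 0.5 is acceptable, while a mean of 0.9 with the same coefficient is rejected, because the variance 0.2025 exceeds 0.9 * 0.1 = 0.09.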

Hyeuk Ryu (@dynaryu) discovered a bug in the agg_loss-curves outputs for the event_based_risk calculators, which has been fixed.

Finally there were some improvements to the multi_risk calculator in the context of the CRAVE project. In particular now the engine is able to manage the geometries of volcanic perils like lava, lahar and pyroclastic flow and it is also able to manage other binary perils without requiring the introduction of new intensity measure types.

General changes#

The CSV exporters have been enhanced: now there is an additional line before the header, starting with a # character, containing some metadata, like the date when the file was generated, the version of the engine that generated it, and some relevant parameters, like the investigation time in the case of the hazard outputs. In the future we may add even more metadata and extend the approach to other outputs.
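Scripts that read the exported files need to skip the new first line. A minimal sketch with the standard csv module follows; the helper read_csv_with_metadata and the content of the sample metadata line are made up, since the exact layout of that line is not described here.

```python
import csv


def read_csv_with_metadata(text):
    """Split an exported CSV into its metadata line and its data rows."""
    lines = text.splitlines()
    metadata = ""
    if lines and lines[0].startswith("#"):
        metadata = lines[0].lstrip("#").strip()
        lines = lines[1:]  # the real header is now the first line
    rows = list(csv.DictReader(lines))
    return metadata, rows


# made-up file content: the metadata format is hypothetical
sample = (
    "# hypothetical metadata: engine 3.6, investigation_time=50.0\n"
    "lon,lat,poe\n"
    "10.0,45.0,0.02\n"
)
metadata, rows = read_csv_with_metadata(sample)
```

Files written by older versions, without the leading # line, are read unchanged by the same function.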

Before release 3.3, the engine had the ability to associate site model parameters to hazard sites on a grid. This feature was sometimes buggy and was removed, and users were advised to use the command oq prepare_site_model instead. oq prepare_site_model is able to produce a site_model.csv file with sites on the grid and it performs the associations explicitly, once and for all.

In this release, we restored the ability to perform the association directly in the engine. This is less efficient than using oq prepare_site_model, since the same associations will be recomputed during each run. It is still useful for people wanting to experiment with the grid spacing: they can run several calculations and when they are happy with the grid spacing, run oq prepare_site_model and fix the site model once and for all with the preferred grid spacing.

We fixed a performance regression in the ucerf_classical calculator, due to a change of logic in engine 3.5, which was trying to filter thousands of sources in the controller node instead of in the workers, thus becoming extra-slow.

We decided to change the realizations.csv output for scenario calculations, by replacing the branch_path field with the GSIM representation. This is more informative for the users and more convenient for the QGIS plugin too.

IT#

The job queue first introduced in engine-3.5 is now enabled by default. This means that only one job can run at a given time for a given engine instance.

The progress report has been improved: previously, in large classical calculations, the progress started to be printed too late, even days after the start of the calculation.

We improved the oq abort command to remove submitted jobs too.

Deleting a calculation in the engine has always been tricky in the case of multiple users. In this release we fixed several issues and now a user can delete all of her calculations with the command oq reset. The engine will look inside the database and correctly remove the calculations of the user, including all the relevant .hdf5 files.

We improved the oq plot command by adding several new kinds of plot. They are still for internal use only (i.e. introspection and debugging).

We extended the command oq db to run generic queries for the openquake user. Other users can only run SELECT queries.

We fixed a bug in oq webui start, which was not supplying the --noreload argument (the reload functionality of the Django development server interferes with SIGCHLD and causes zombies).

We fixed another bug with the --hc functionality in a multi-user situation, due to the fact that the engine was searching for the datastore of the parent calculation in the wrong directory.

There is now a better error message if the shared directory is not mounted.

Source models can now be serialized in TOML format, which is useful for debugging purposes.

Libraries and Python 3.7#

In this release we updated some of our libraries (numpy from version 1.14 to 1.16 and scipy from version 1.0.1 to 1.3.0) to make it possible to use the engine with Python 3.7. We actually have a cluster using Python 3.7 in production.

In the future we may distribute installers for Windows and macOS based on Python 3.7, but for the moment Python 3.6 is still the only officially supported version and we do not plan to abandon Python 3.6 any time soon.

We raised the minimum version for h5py from 2.8.0 to 2.9.0, fixed some compatibility issues with Django 2.1 and 2.2 and fixed several Python 3.7 deprecation warnings. Finally, we removed the external dependency on the mock module, since it has been included in the standard library since Python 3.3.
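For code that still imports the external package, the change is a one-line switch to the standard library. The example below is a generic illustration; the describe function and the fake job object are hypothetical and unrelated to the engine's test suite.

```python
# before: import mock  (external package)
from unittest import mock  # standard library since Python 3.3


def describe(job):
    """Hypothetical function under test: format a job summary."""
    return "job %d: %s" % (job.id, job.status)


# a Mock configured with the attributes the function needs
fake_job = mock.Mock(id=42, status="complete")
summary = describe(fake_job)
```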

Deprecations/removals#

For years the engine has been able to import ground motion fields and hazard curves in CSV format and NRML format, with the NRML format deprecated. Now finally the NRML importers have been removed.

There was an old deprecated GMF exporter in NRML format for scenario calculations. It has finally been removed. You should use the CSV exporter that has been available for years instead.

We deprecated the XML disaggregation exporters in favor of the CSV exporters.

We removed the long time deprecated agg_loss_table exporter since now all the needed information is in the losses_by_event exporter.

We officially switched the testing framework from nosetests to pytest.