Release notes v3.10#

This is a major release featuring several optimizations, new features, and bug fixes. Over 300 pull requests were merged.

For the complete list of changes, see the changelog: https://github.com/gem/oq-engine/blob/engine-3.10/debian/changelog

Here are the highlights.

Disaggregation#

There was a substantial amount of work on optimizing the disaggregation calculator in the case of many sites and/or many realizations. It now requires a lot less memory and is significantly more performant than before (at least 3 times better than engine 3.9 in test calculations for Colombia and the Pacific Islands). However, for single-site and single-realization calculations you will not see much of a difference.

The distribution mechanism and rupture storage have been changed (ruptures are now distributed and stored by magnitude), thus reducing both the data transfer and the time required to read the ruptures. The effect is spectacular (orders of magnitude) in large calculations.

The way disaggregation outputs are stored internally has changed completely, resulting in speedups of up to orders of magnitude in the saving phase. We are also logging much less when saving the disaggregation outputs. There was another huge speedup in the extraction routines for all outputs, including the disagg_layer output which is used by the QGIS plugin. That output has been extended to include IML information and more.

The disagg_by_src functionality, temporarily removed in engine 3.9, has been re-implemented and there is a new extractor for it. However, there are a couple of constraints: in order to use it, you must convert the point sources of your model (if any) into multipoint sources; moreover, the size of the disagg_by_src matrix (num_sites x num_rlzs x num_imts x num_levels x num_sources) must be under 4 GB.
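For reference, the limit is easy to check in advance. Here is a minimal sketch in Python, where the dimensions are made up and the 4 bytes per value is an assumption about the internal float storage:

    # hypothetical dimensions of the disagg_by_src matrix
    num_sites, num_rlzs, num_imts, num_levels, num_sources = 10, 100, 4, 20, 500
    nbytes = num_sites * num_rlzs * num_imts * num_levels * num_sources * 4
    print('disagg_by_src matrix: %.3f GB' % (nbytes / 1024 ** 3))  # must be < 4 GB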

If disaggregation by epsilon is not required and the parameter num_epsilon_bins is incorrectly set to a value larger than 1, the engine now automatically corrects the parameter (and warns the user), thus avoiding a lot of wasted calculation time.

We fixed a bug in the case of zero hazard: having zero hazard for a single return period caused the disaggregation to be zero for all return periods.

We fixed a subtle bug for magnitudes on the edge of a magnitude bin, which was causing small differences in the disaggregation by magnitude outputs.

We changed the algorithm for the Lon_Lat bins, to make sure that the number of bins is homogeneous across sites, thus changing the numbers for the Lon_Lat disaggregation.

Setting both rlz_index and num_rlzs_disagg is now an error, since the two options are mutually exclusive.

Setting max_sites_disagg to a value lower than the total number of sites is now flagged as an early error (previously, the error was raised in the middle of the computation).

We fixed the IML interpolation to be consistent with the algorithm used for the hazard maps: the change may cause slight differences in the numbers in all kinds of disaggregation, except the ones using the parameter iml_disagg. Moreover, the parameter poes_disagg in the job.ini file is now deprecated: you can simply use the poes parameter already used for the hazard maps.
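In practice the migration is a one-line change in the job.ini file; a sketch, with illustrative probability values:

    # deprecated since engine 3.10:
    # poes_disagg = 0.002105 0.000404
    # use instead the same parameter driving the hazard maps:
    poes = 0.002105 0.000404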

Multi-rupture scenarios#

At the end of a year-long effort, we have finally unified the scenario and event based calculators and introduced a full CSV format for the ruptures, i.e. a format that also contains the geometries. The full CSV format is documented here. It should not be confused with the old CSV format, which is missing the geometry information and therefore cannot be used to start scenario calculations. The new format is for the moment experimental and not guaranteed to stay the same in future releases. It completely supersedes the experimental TOML format for the ruptures that was introduced some time ago and has now been removed.

It is now possible to run a slow event based calculation, export the ruptures in the full CSV format, trim the ruptures according to some criterion (for instance, keep only the ruptures that dominate the risk and discard the irrelevant ones) and then perform a fast scenario calculation starting from the reduced ruptures. This is of huge importance for any risk analyst.
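A possible workflow is sketched below; the export invocation and the rupture_model parameter name are assumptions here, so check the documentation for the exact spelling:

    $ oq engine --run event_based.ini   # slow event based calculation
    $ oq export ruptures <calc_id>      # dump the ruptures in the full CSV format
    # ... trim the exported CSV, keeping only the relevant ruptures ...
    $ oq engine --run scenario.ini      # with rupture_model pointing to the trimmed CSV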

Multi-rupture scenarios can also be used to study the uncertainties involved in a single-rupture scenario calculation. For instance, you can change the parameters of the base rupture, generate a set of ruptures and then run a single multi-rupture scenario calculation. This is much more convenient than running multiple independent calculations, as was necessary in the past.

Risk calculators#

A much requested feature from our users was the ability to compute statistics for scenario_risk and scenario_damage calculations in the presence of multiple GMPEs. That is now possible and actually enabled by default, as a consequence of the event_based/scenario unification.

We optimized the scenario_damage/event_based_damage calculators for the case of many sites by returning one big output per task instead of thousands of small outputs. The effect is significant when there are thousands of sites.

We reduced the time required to read the GMFs and we avoided sending multiple copies of the risk model to the workers, resulting in a good speedup for calculations with a large risk model.

Risk calculators are now logging the mean portfolio loss, for user convenience.

We added an exporter for the src_loss_table output giving the cumulative loss generated by each hazard source.

We made more hazard and risk outputs readable by pandas. Since the GMFs are readable with pandas, we have removed the GMF npz exporter, which was slow and inconvenient.
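For instance, assuming a completed calculation, the GMFs can be read into a DataFrame along the following lines; this is a minimal sketch and the dataset name 'gmf_data' is an internal detail that may change:

    from openquake.baselib import datastore

    dstore = datastore.read(1234)  # ID of a completed calculation (hypothetical)
    gmf_df = dstore.read_df('gmf_data', 'sid')  # GMFs as a pandas DataFrame
    print(gmf_df.head())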

We made the scenario damage and classical damage outputs consistent with the event based risk outputs by removing the (unused) stddev field. See https://github.com/gem/oq-engine/issues/5802.

The event based damage calculator, introduced experimentally in release 3.9, is now official. We fixed a bug in the number of buildings in the no_damage state, which was incorrect. We fixed a bug in the calculation of the average damage, which is now correctly computed by dividing by the investigation time. We restored the ability to perform damage calculations with fractional asset numbers, by using the same algorithm as in engine 3.8 and previous versions. Finally, we added an official demo for the event based damage calculator and we documented it in the engine manual.

Other new features/improvements#

The logic tree sampling logic has been revised and the seed algorithm changed; therefore, in engine 3.10 you should not expect to get the same branches sampled as in engine 3.9. The change was needed to implement four new sampling methods, which are documented here: https://docs.openquake.org/oq-engine/advanced/sampling.html
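The method is selected via a job.ini parameter; a minimal sketch, with the valid values listed in the linked page:

    number_of_logic_tree_samples = 1000
    sampling_method = early_weights  # one of the four documented methods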

The new sampling methods will receive more work and experimentation in the future.

It is possible to introduce a custom_site_id unique integer parameter in the site model. The most common use case for this feature is to associate each site with a ZIP code when computing the hazard on a city.
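For instance, a site model CSV with the new field could look like the following sketch (columns and values are illustrative):

    custom_site_id,lon,lat,vs30
    20121,-74.08,4.60,600.0
    20122,-74.10,4.62,420.0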

The magnitude-dependent maximum distance feature has been restored and it is documented here: https://docs.openquake.org/oq-engine/advanced/common-mistakes.html#maximum-distance
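In the job.ini the feature is expressed as a list of (magnitude, distance) pairs; the values below are illustrative, see the linked page for the authoritative syntax:

    maximum_distance = [(5.0, 100), (6.0, 150), (7.0, 200), (8.0, 300)]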

The rupture-collapsing feature now also works for nonparametric ruptures. It is still experimental and disabled by default.

The engine was parsing the source_model_logic_tree and the source models multiple times: this has been fixed.

There was additional work on the amplification of hazard curves with the convolution method and on the amplification of GMFs. Various bugs have been fixed, and we added an experimental and slow implementation of the kernel method for computing PSHA using amplification functions. Such features are still experimental.

We included in the engine some code to compute earthquake-induced secondary perils, like liquefaction and landslides. This part is still in an early stage of development and will be extended/improved in the future.

hazardlib/HMTK#

F. J. Bernales contributed the Gulerce et al. (2017), Stewart et al. (2016) and Bozorgnia & Campbell (2016) GMPEs.

G. Weatherill contributed the heteroskedastic standard deviation model for Kotha et al. (2020). He also fixed a bug in mixture models in logic trees with multiple GMPEs and he added PGV coefficients to USGS CEUS GMPE tables, where possible.

M. Pagani contributed a ModifiableGMPE class which is able to modify the inter_event parameter of a GMPE. He also contributed an implementation for the Zalachoris & Rathje (2019) GMPE. Finally, he fixed a bug in the ParseNDKtoGCMT parser in the HMTK and ported the method serialise_to_hmtk_csv implemented in the corresponding class of the catalogue toolkit.

M. Simionato fixed a subtle bug that caused a syntax error when reading stored GSIMs coming from subclasses of GMPE redefining __str__ (signalled by our Canadian users).

The sourcewriter has been improved. It now calls check_complex_fault when serializing a complex fault source, so that incorrect sources can no longer be serialized, as was happening before. Moreover, it automatically serializes nonparametric griddedRuptures in a companion HDF5 file with the same name as the XML file. The source reader can read the companion file, but it also works with the previous approach, where everything was stored in XML. Since griddedRuptures are big arrays, the serialization/deserialization in HDF5 format is much more efficient than in XML format.

Bugs#

We fixed a serious numeric issue affecting classical calculations with nonparametric ruptures. The effect was particularly strong around Bucaramanga in Colombia (see https://github.com/gem/oq-engine/issues/5901). The hazard curves were completely off and the disaggregation was producing negative probabilities.

In some situations - for logic trees using applyToSources and sampling - there was a bug preventing the logic tree from being deserialized from its HDF5 representation in the datastore. That affected the calculation of the hazard curves and maps.

We fixed an issue with noDamageLimit not being honored in continuous fragility functions.

We fixed an h5py bug in large ShakeMap calculations, manifesting as the random error 'events' not found.

We fixed an issue with the minimum_magnitude parameter not being honored in UCERF calculations.

We fixed a bug occurring when a rupture happens more than 65535 times, which affected scenario calculations with a large number of simulations. Similarly, the fields year and ses_id in the events table have been extended to support 4-byte integers instead of 2-byte integers.

We fixed a bug affecting Windows users preparing CSV exposures with Excel, which can introduce a spurious Byte Order Mark (BOM).

We fixed a bug when saving arrays with string fields in npz format.

New checks and warnings#

The number of intensity measure levels must now be the same across intensity measure types, otherwise an error is raised. In the past there was simply a deprecation warning.
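For instance, a configuration like the following sketch is accepted, while per-IMT lists of different lengths now raise an error (values are illustrative):

    intensity_measure_types_and_levels = {
        "PGA": [0.05, 0.1, 0.2, 0.4],
        "SA(0.5)": [0.05, 0.1, 0.2, 0.4]}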

We added a warning about suspiciously large complex fault sources, so that the user can fix the complex_fault_mesh_spacing parameter.

We added a warning about numeric errors in the total losses for the ebrisk and event_based_risk calculators, like “Inconsistent total losses for structural, rlz-96: 54535864.0 != 54840944.0”.

If a source model contains different sources with the same ID, a warning is now printed. In the future this will likely become an error and the source model will have to be fixed with the command oq renumber_sm. Then the source ID will become a true ID and it will become possible to implement source-specific logic trees.

We added a parameter max_num_loss_curves with a default of 10,000. Without this limit it would be easy for the user to accidentally generate millions of aggregate loss curves, causing the calculation to run out of memory. You can raise the limit if you really need to.

At user request, we raised the limit on the asset ID length from 20 to 50 characters.

We added a check for missing intensity_measure_types in event based calculations; previously, such a calculation would complete successfully but without computing any ground motion fields.

oq commands#

oq plot has been improved for disaggregation outputs, making it possible to plot all kinds of outputs and also to compare disaggregation plots.

We added a command oq plot vs30? to plot the vs30 field of the site collection.

We added a command oq nrml_to to convert source models into CSV and/or geopackage format. This is useful to plot source models with QGIS and it is meant to replace the command oq to_shapefile which has many limitations.
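Usage is along these lines (the exact calling convention is an assumption, see oq nrml_to --help):

    $ oq nrml_to csv source_model.xml   # convert to CSV
    $ oq nrml_to gpkg source_model.xml  # convert to a geopackage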

We added a command oq recompute_losses <calc_id> <aggregate_by>: this is useful if you want to aggregate an ebrisk calculation with respect to different tags and you do not want to repeat the whole calculation.

We fixed a 32 bit/64 bit bug in oq prepare_site_model affecting the case when the file sites.csv is the same as the vs30.csv file.

The semantics of the command oq engine --run --reuse_hazard has been changed: before, it found an existing hazard calculation (if any) from the checksum of the input files and used it; now it just reuses the cached source model, if present. The change was necessary since the checksum of the calculation often changes between versions; moreover, reusing the GMFs generated with a previous version of the engine is often inconsistent, i.e. it will not give the same results since the details of the seed generation may change.

The oq engine command has been enhanced and it is now possible to run multiple jobs in parallel as follows:

$ oq engine --multi --run job1.ini job2.ini ... jobN.ini

This is useful if you have many small calculations to run (for instance many scenarios). Without the --multi flag the calculations would be run sequentially, as in previous versions of the engine.

WebUI/WebAPI/QGIS plugin#

When running a classical calculation with the parameter poes specified, the engine now produces hazard maps in .png format that are visible through the WebUI by clicking the button on the bottom left side called “Show hazard map”. This is meant for debugging purposes only; the recommended way to visualize the hazard maps is still to use the QGIS plugin.

There is a new API for storing JSON information inside .npz files, which is being used to store metadata. The QGIS plugin has been adapted to take advantage of the new API and the command oq extract has been changed to save in .npz format and not in .hdf5 format.
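The idea can be illustrated with plain numpy; this is a minimal sketch of the concept, not the engine's actual API:

    import json
    import numpy

    meta = {'investigation_time': 50, 'imt': 'PGA'}  # hypothetical metadata
    numpy.savez('output.npz', values=numpy.zeros(10), json=json.dumps(meta))
    with numpy.load('output.npz') as npz:
        print(json.loads(str(npz['json'])))  # the metadata round-trips as JSON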

We changed the call to /extract/events to return only the relevant events, i.e. the events producing nonzero GMFs, in case a minimum_intensity is specified.

The /extract/sitecol API has been extended so that it is possible to extract selected columns of the site collection, for instance /extract/sitecol?param=custom_site_id, if a custom_site_id is defined.

Packaging#

We removed the dependency on PyYAML and added a dependency on GDAL.