Release notes v3.12
===================

The 3.12 release is the result of 6 months of work involving around 550
pull requests and touching all aspects of the engine. Most of the work
went into the optimization and enhancement of the risk calculators -
notably the event based risk and damage calculators - and into a full
rewrite of the GMPE library, described in this post:
https://groups.google.com/g/openquake-users/c/Tj5t1rJ7MX0/m/BHLrOPt6AQAJ

The complete list of changes is given in the changelog:
https://github.com/gem/oq-engine/blob/engine-3.12/debian/changelog

New risk features
-----------------

### Flexible field names in exposure CSV files

Before engine 3.12 the header names of the columns in a CSV exposure
were hard-coded, therefore one had to manually rename the columns of any
pre-existing exposure file to strictly match the names required by the
engine. That often meant maintaining two exposure formats and required
the exposure files for the engine to be regenerated after any change in
the original ones. Now a user can simply specify, in the exposure.xml
metadata file, a mapping from the custom column names in their exposure
files to the names required by the engine. An example is given here:
https://github.com/gem/oq-engine/blob/engine-3.12/openquake/qa_tests_data/scenario/case_16/Example_Exposure.xml

### Other exposure-related improvements

- The reading of CSV exposures has been optimized by using pandas.
- We added an option to ignore encoding errors by skipping any offending
  characters. The reason is that often exposures are a patchwork of CSV
  files of unknown encoding. `ignore_encoding_errors = true` should be
  used only as a last resort: if possible, you should convert all of
  your exposures into UTF-8, the only encoding supported by the engine.
  However, in case there are a few bad characters in a description or
  geographic region, it is better to have a misspelling in the
  description or region name when exporting the results rather than
  having the entire calculation fail.

### ShakeMaps enhancements

There was a nontrivial amount of work on the ShakeMaps module, mostly
contributed by [Nicolas Schmid](https://github.com/schmidni) of ETH
Zurich. Now it is possible to read a ShakeMap from a custom URL (or
local file) and not only from the USGS site. It is also possible to read
ShakeMaps in shapefile format, as well as many other formats (.xml,
.zip, .npy). Moreover, we now support the MMI intensity measure type if
spatial and cross correlation are disabled. The advanced manual has been
updated with the new features, see
https://docs.openquake.org/oq-engine/advanced/risk-features.html#scenarios-from-shakemaps

### Risk calculations starting from GMFs in HDF5

It is now possible to run risk calculations starting from GMFs in HDF5
format by setting the option `gmfs_file = gmf-data.hdf5`, but that
involves some limitations. The HDF5 format of the GMFs is meant to be
stable across engine versions; however, the details of the logic tree
implementation change in every release. Therefore, the approach can only
work if risk calculations starting from GMFs in HDF5 format are
restricted to see a single realization, obtained by collecting together
all realizations. This is equivalent to using the new option
`collect_rlzs = true`. Notice that the results will be meaningful and
will correspond to mean results *only if* all the realizations
originally had the same weight, which is the case if the original hazard
calculation was using sampling of the logic tree.
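
To see why equal weights matter, here is a minimal numpy sketch (the
loss numbers are made up for the example and do not come from the
engine): pooling all the realizations together reproduces the weighted
mean only when the weights are equal, as happens with logic tree
sampling:

```python
import numpy

# Hypothetical losses per realization (rows) and event (columns)
losses = numpy.array([[10., 12., 8.],
                      [14., 10., 6.]])
equal = numpy.array([.5, .5])    # sampling: every realization weighs 1/R
unequal = numpy.array([.8, .2])  # full enumeration: weights can differ

pooled = losses.mean(axis=0)  # what collecting the realizations computes
print(numpy.allclose(pooled, equal @ losses))    # True
print(numpy.allclose(pooled, unequal @ losses))  # False
```
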
Event based risk calculators
----------------------------

- The work on unifying the `scenario_risk`, `event_based_risk` and
  `ebrisk` calculators - started in engine 3.11 - has finally been
  completed. Thanks to this work, the `ebrisk` calculator is now
  deprecated. You should use the `event_based_risk` calculator instead,
  since it is more efficient than `ebrisk` ever was. The trick was to
  change `event_based_risk` to use the same distribution mechanism as
  `ebrisk` (i.e. distribution by ruptures, not by site).

- The management of random numbers in risk calculations has been
  changed. Previously, it was impossible to run even medium-sized event
  based risk calculations if the vulnerability functions had non-zero
  coefficients of variation, because all the time was spent reading a
  huge matrix of epsilons which could be hundreds of GB in size. Now the
  epsilons matrix has disappeared, since the corresponding random
  numbers (governed by the `master_seed` parameter) are generated
  dynamically by using modern numpy features (i.e. the
  `numpy.random.Philox` random number generator); see the sketch at the
  end of this section. However, running extra-large event based risk
  calculations may still be impossible unless you set
  `ignore_master_seed = true`, which effectively turns off the
  generation of the epsilons.

- The work on the risk random numbers allowed us to fix some
  long-standing bugs in calculations with vulnerability functions using
  the Beta distribution (dist="BT"). In particular, the results are now
  independent from the number of spawned tasks, and are the same both
  for `ebrisk` and `event_based_risk`. Before engine 3.12 we were not
  able to ensure 100% replicability of results from risk calculations
  using the Beta distribution in the vulnerability functions.

- A lot of work went into estimating and saving the variance of the
  losses due to the coefficients of variation; therefore, it is possible
  to set `ignore_master_seed = true` for performance reasons and still
  have an indication of the uncertainty. Such information is only
  available by reading the event loss table with pandas and is not
  exposed as a CSV file, due to the sheer amount of data involved.

- Aggregation by tag has been optimized - it now utilizes all available
  cores and a lot less memory - and is documented in the advanced
  manual, see the section
  https://docs.openquake.org/oq-engine/advanced/risk.html#aggregating-by-multiple-tags

- Moreover, it is now possible to set the parameter

  `collect_rlzs = true`

  in the `job.ini` file. That makes the risk part of an event based risk
  calculation even faster and more memory efficient than before, at the
  price of losing information about the specific realizations.

  For continental scale calculations, setting `collect_rlzs = true` can
  make the difference between being able to run a calculation and being
  unable to do so, due to memory or computational constraints.

- Finally, the scenario calculators starting from ruptures in CSV format
  have been extended to work with multiple TRTs.
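
Regarding the epsilon generation mentioned above, here is a hedged
sketch of the idea (the seeding scheme below is made up for the example
and is not the engine's actual one): epsilons are produced on the fly
from the master seed instead of being stored in a gigantic matrix.

```python
import numpy

master_seed = 42  # corresponds to the `master_seed` parameter

def epsilons_for(event_id, num_assets):
    # One Philox stream per event: fully reproducible without storing
    # a (num_assets x num_events) matrix of epsilons
    rng = numpy.random.Generator(
        numpy.random.Philox(key=master_seed + event_id))
    return rng.normal(size=num_assets)

eps = epsilons_for(7, 5)  # same seed and event give the same epsilons
```
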
Event based damage calculator
-----------------------------

We introduced an experimental `event_based_damage` calculator in the
engine in v3.8. Now it has been rewritten, optimized, and extended, so
that it is possible to use it for very large calculations and to compute
generic consequences based on the damage tables, not only economic
losses. In order to be efficient, some features had to be sacrificed; in
particular, it is not possible to compute consequences for each
individual realization: we can only compute means. This is equivalent to
using `collect_rlzs = true` in `event_based_risk`. To be completely
correct, it is actually possible to compute the consequences for a
specific realization, but it is inconvenient, since you have to manually
change the logic tree until there is only the desired realization.

If you want to see an example of usage of the calculator, you should
look at the EventBasedDamage demo:
https://github.com/gem/oq-engine/tree/engine-3.12/demos/risk/EventBasedDamage

The new EventBasedDamage also works in the presence of a taxonomy
mapping file, a feature that was missing in the past.

Currently the following generic consequences are supported: "losses",
"collapsed", "injured", "fatalities", "homeless". Since the mechanism to
add a new consequence is now quite simple, more are expected to be
supported in the future. You can print the updated list of available
consequences with the command `$ oq info consequences`.

New optimizations in the hazard calculators
-------------------------------------------

- We improved the rupture weighting algorithm, thus removing some
  dramatically slow tasks in event based calculations.
- We also saved some preprocessing time by weighting the heavy sources
  in parallel in event based calculations.
- We saved memory when generating PoEs in classical calculations and we
  mitigated the slow task issue when using the point source gridding
  approximation.
- We also did some experiments with the optimizing compiler numba and we
  were able to significantly speed up some parts of the engine - we
  measured a 54x speedup when computing the mean hazard curves - but
  sadly not the real bottlenecks. `numba` is not a dependency of the
  engine and everything works without it; the plan is to keep it that
  way. A sketch of the kind of kernel that benefits is given below.
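
For instance, here is a minimal sketch (not the engine's actual code) of
a numba-compiled kernel computing mean hazard curves across
realizations, which transparently falls back to pure Python when numba
is not installed:

```python
import numpy

try:
    from numba import njit
except ImportError:          # numba is optional: fall back to pure Python
    njit = lambda func: func

@njit
def mean_curves(curves, weights):
    # curves: (num_rlzs, num_sites, num_levels); weights: (num_rlzs,)
    out = numpy.zeros((curves.shape[1], curves.shape[2]))
    for r in range(len(weights)):
        out += weights[r] * curves[r]
    return out

curves = numpy.random.random((100, 50, 20))
weights = numpy.full(100, 1 / 100)
print(mean_curves(curves, weights).shape)  # (50, 20)
```
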
### Refactoring of the GMPE library

The GMPE library has been completely rewritten and the API for
implementing new GMPEs has changed significantly. That means that if you
have written a GMPE with the old API, it will not work anymore once you
upgrade to engine 3.12: you will have to rewrite it. This is quite
simple if the GMPE is simple, but it can be quite difficult if the GMPE
depends on other GMPEs in complex ways. Should you encounter any
compatibility issue, please contact us.

On the bright side, if you are just using library GMPEs by calling the
method `.get_mean_and_stddevs`, your application should work exactly as
before. We tried very hard to keep backward compatibility as much as
possible.

As part of the refactoring, 15 GMPEs of the SHARE model have been
vectorized and are now a lot faster than before (up to 200x in single
site situations). The rest of the GMPEs have not been vectorized, so
they are as slow as before. The good news is that with the new API it is
easy to vectorize a GMPE, and more are expected to be vectorized in the
future.

Due to the refactoring work, many things have changed internally; they
are listed here for completeness' sake:

- `hazardlib.const.TRT` is now an Enum class
- `hazardlib.imt.PGA`, SA and all the other constructors are now factory
  functions and not classes
- there is a limit of 12 characters on IMT names
- multiple inheritance in GMPE hierarchies has been forbidden
- defining methods different from `__init__` and `compute` in GMPE
  classes is now an error
- a naming convention on GMPE classes has been enforced: attributes
  starting with COEFF must be instances of the CoeffsTable class, which
  has been refactored too

### Other updates to hazardlib

There was also a lot of activity not related to the refactoring:

- we fixed a bug in the Abrahamson et al. (2014) GMPE and updated the
  verification tables
- we added a `hazardlib.cross_correlation` module to compute the
  correlation between different intensity measure types
- we implemented MultiFaultSources, a new typology of sources to be used
  in UCERF-like models
- we improved the precision of site amplification with the convolution
  method
- we added more epistemic uncertainties to the logic tree module
- we fixed a few bugs in KiteSurfaces (having to do with NaN values)
- we added a classmethod `PlanarSurface.from_hypocenter` (see the sketch
  at the end of this section)
- we updated the parameter DEFINED_FOR_REFERENCE_VELOCITY in a few GMPEs
- we optimized the SiteCollection class and now unneeded parameters are
  not stored anymore; an example could be the parameter
  `reference_depth_to_2pt5km_per_sec` in a calculation with GMPEs that
  do not require it; the same holds for the `reference_siteclass` and
  `reference_backarc` parameters
- we changed how the SiteCollection is stored, so that it can be read
  with pandas
- we changed the signature of the functions
  `calc.hazard_curve.classical` and `calc.stochastic.sample_ruptures`
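
As an illustration of the new classmethod, here is a hedged sketch of
building a planar rupture surface around a hypocenter (the argument
order and values are indicative; check the hazardlib docstring for the
exact signature):

```python
from openquake.hazardlib.geo import Point
from openquake.hazardlib.geo.surface import PlanarSurface
from openquake.hazardlib.scalerel import WC1994

# Build a planar surface centered on the hypocenter, with dimensions
# inferred from a magnitude scaling relationship
surface = PlanarSurface.from_hypocenter(
    Point(10.0, 45.0, 10.0),  # hypocenter: lon, lat, depth in km
    WC1994(),                 # Wells & Coppersmith (1994) scaling
    6.0,                      # magnitude
    1.5,                      # rupture aspect ratio
    0.0, 90.0, 0.0)           # strike, dip, rake
```
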
### New GMPEs

Finally, many new GMPEs have been contributed.

- [Graeme Weatherill](https://github.com/g-weatherill) has contributed
  the Abrahamson & Gulerce (2020) NGA Subduction Model as well as the
  Ameri (2014) Rjb GMPE. Moreover, he has updated the Atkinson (2015)
  GMPE in accordance with the indications of the original author. He
  also updated the KothaEtAl2020ESHMSlopeGeology GMPE, following changes
  to the underlying geology dataset.
- [Nico Kuehn](https://github.com/nikuehn) and
  [Graeme Weatherill](https://github.com/g-weatherill) have contributed
  the NGA Subduction ground motion model of Si, Midorikawa and Kishida
  (2020) as well as the Kuehn et al. (2020) NGA Subduction Model.
- [Stanley Sayson](https://github.com/stansays) contributed the Gulerce
  and Abrahamson (2011) GMPE for the vertical-to-horizontal (V/H) ratio
  model derived using ground motions from the PEER NGA-West1 Project.
  Moreover, he contributed the SBSA15b GMPE by Stewart et al. (2016) for
  the vertical-to-horizontal (V/H) ratio of ground motions from the PEER
  NGA-West2 Project, as well as the GMPE by Bozorgnia & Campbell (2016)
  for the vertical-to-horizontal (V/H) ratio. Finally, he fixed a few
  bugs in the Campbell and Bozorgnia (2014) GMPE.
- Chung-Han Chan and Jia-Cian Gao have contributed a couple of GMPEs for
  the Taiwan 2020 hazard model (TEM): Lin2011foot and Lin2011hanging.
- [Pablo Heresi](https://github.com/pheresi) has contributed the Idini
  (2017) GMPE.
- [Claudia Mascandola](https://github.com/mascandola) has contributed
  the Lanzano et al. (2020) GMPE and the Sgobba et al. (2020) GMPE.
- [Laurentiu Danciu](https://github.com/danciul) has contributed the
  Boore (2020) GMPE.

Bugfixes
--------

- We fixed a few bugs in the CSV exporters. First, the encoding was not
  specified, thus causing issues when exporting exposure data on systems
  with a non-UTF8 locale (affecting a Chinese user). Second, the CSV
  exporters on Windows were not producing the right line ending.
  Finally, we fixed some CSV exporters that were not generating the
  usual pre-header line with the metadata of the calculation, such as
  the date and the engine version.
- There was a bug in scenario damage calculations, happening (rarely) in
  situations with very few events and causing an IndexError in the
  middle of the calculation. This is fixed now.
- We fixed a subtle bug in risk calculations with a nontrivial taxonomy
  mapping: loss curves could fail to be computed due to spurious
  duplicated event IDs in the event loss table.
- We fixed a bug in the serialization of the gsim logic tree in the
  datastore, preventing a correct deserialization due to missing
  branchset attributes.
- We fixed a bug in the logic tree processing: the option
  `applyToSources` did not work with multipoint sources, by not
  modifying the parameters and by producing the same contribution for
  each branch.
- We fixed the function `baselib.hdf5.dumps` that was generating invalid
  JSON for Windows pathnames, thus breaking the QGIS plugin on Windows.
- We fixed a bug in the management of GMPE aliases; now the dictionary
  returned by `get_available_gsims()` also contains the aliases. That
  means that the Input Preparation Toolkit (IPT) now works with GMPE
  aliases too.

New checks and warnings
-----------------------

- Years ago we restricted the `asset_correlation` parameter to be 0 or
  1. Setting an unsupported value now raises a clear error early in the
  calculation and not in the middle of it, also for scenario risk
  calculations.
- We added a check to forbid `-Inf` in the sources. This happened to
  people generating the source model automatically, where the XML file
  could contain something like (illustrative snippet)

  ```xml
  <occurRates>-INF 0.0015 0.0008</occurRates>
  ```
- We added an early check to discover situations in which the user
  mistakenly uses fragility functions in place of vulnerability
  functions or vice versa.
- We added a warning if extreme ground motion values (larger than 10g)
  are generated by the engine. This may happen for sites extremely close
  to a fault.
- The warning about discardable tectonic region types now appears in all
  calculations, not only in classical calculations.
- A warning is now printed if the loss curves appear to be numerically
  unstable.
- Setting a too large `area_source_discretization` parameter was
  breaking the engine with an ugly error; now you get a clear error
  message.

oq commands
-----------

- The command `oq checksum source_model_logic_tree.xml` was broken,
  raising a TypeError. It is fixed now.
- The command `oq workers kill` (to be used on a Linux cluster) now
  calls `killall` and kills all processes of the user `openquake`,
  including possible zombies left by an out-of-memory crash, so it is
  much better than before.
- We added a new kind of plot (`oq plot uhs_cluster`) displaying similar
  uniform hazard spectra from different realizations clustered together.
- We added a new command `oq export disagg_traditional` to export the
  disaggregation outputs in traditional format (i.e. Bazzurro and
  Cornell 1999), where the probabilities sum up to 1; see the sketch
  after this list.
- The command `oq --version` now gives the git hash if the engine was
  installed with the universal installer using the `--version=master`
  option.
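
As a toy illustration of the traditional format (the numbers below are
made up), the disaggregation contributions are simply normalized so that
the bin probabilities sum up to 1:

```python
import numpy

# Hypothetical disaggregation contributions over three magnitude bins
probs = numpy.array([0.020, 0.050, 0.010])

# Traditional (Bazzurro and Cornell 1999) representation: normalize so
# that the bins sum up to 1
traditional = probs / probs.sum()
print(traditional)        # [0.25  0.625 0.125]
print(traditional.sum())  # 1.0
```
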
Other changes
-------------

- As is often the case with a new release, the inner format of the
  datastore has changed in several places. In particular, the event loss
  table has been renamed from `losses_by_event` to `risk_by_event`,
  since this table can now also be populated by the event based damage
  calculator, with consequences other than economic losses (see the
  sketch at the end of these notes).
- The XML exporter for the ruptures, deprecated years ago, has finally
  been removed. You should use the CSV exporter instead.
- The experimental feature `pointsource_distance=?` has been removed. It
  was complicating the engine without giving a significant benefit.
- The special feature `minimum_distance`
  (https://docs.openquake.org/oq-engine/advanced/special-features.html#the-minimum-distance-parameter)
  now works with a single parameter in the `job.ini` which is used for
  all GMPEs. This is simpler and more consistent than the previous
  approach, which required changing the gsim logic tree XML file by
  adding an attribute to each GMPE.
- For single-site classical calculations the engine now automatically
  stores the individual hazard curves for each realization.
- The hazard curve and UHS exporters now export the `custom_site_id`
  parameter, if defined.
- We improved the universal installer, especially on Windows.
- We upgraded Django to release 3.2.6.
- We updated the documentation (including the API docs) and the demos.
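
As a hedged sketch of how the renamed table can be inspected with pandas
(the exact datastore API may differ slightly across engine versions, and
the `-1` calculation id is assumed to mean the latest local
calculation):

```python
# Read the renamed event loss table as a pandas DataFrame
from openquake.commonlib.datastore import read

dstore = read(-1)  # assumed: -1 opens the latest local calculation
df = dstore.read_df('risk_by_event')
print(df.head())
```
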