Release notes v3.11
===================

This is a major release featuring several optimizations, new features, and bug fixes. Nearly 400 pull requests were merged.

For the complete list of changes, see the changelog: https://github.com/gem/oq-engine/blob/engine-3.11/debian/changelog

Here are the highlights.

## New features

The classical PSHA calculator has a brand new optimization called *point source gridding*, based on the idea of using a coarse grid of point sources for distant sites and a fine grid for close sites. The feature is still experimental and not enabled by default, but the first results are very encouraging: up to a 3× improvement in the runtimes of a few large continental models, without loss of accuracy in the results. The point source gridding optimization is documented here: https://docs.openquake.org/oq-engine/advanced/point-source-gridding.html and you are invited to try it.

There is also a new syntax to perform sensitivity analysis, i.e. to run multiple calculations with different values of one (or more) parameters. This can be used to test the sensitivity to the parameters used in the point source gridding approximation, but in general it works for any global parameter. An example to assess the sensitivity to the integration distance is the following:

```
sensitivity_analysis = {'maximum_distance': [200, 300]}
```

Finally, the engine can now automatically download and run calculations from a URL containing a .zip archive. For instance:

```
$ oq engine --run "https://github.com/gem/oq-engine/blob/engine-3.11/openquake/server/tests/data/classical.zip?raw=true"
```

## Hazard calculators

The memory occupation of classical PSHA calculations has been reduced significantly; for instance, models that used to require over 100 GB of RAM on the master node now run with less than 32 GB on the master node.

We also optimized the calculation of the probability of exceedance by carefully generating arrays with a size smaller than the CPU cache (this can give a speedup of a factor of 2 or 3).

A time-honored performance hack in event based calculations with full enumeration of the logic tree has finally been removed: now the number of generated ruptures is consistent with the case of sampling of the logic tree. This removes a source of confusion that was always present before. Event based calculations involving full enumeration of the logic tree are likely to require additional runtime now, but the calculations are manageable, whereas previously they might not have run to completion.

The scenario and event based calculators have been fully unified within the engine internals. As a consequence, the parameter controlling the rupture seeds is now always `ses_seed`. Before, it was `ses_seed` in event based but `random_seed` in scenario, a potential source of confusion for users, since `random_seed` was also used for the logic-tree sampling. Please read the [FAQ section on the seeds](https://github.com/gem/oq-engine/blob/engine-3.11/doc/faq-risk.md#what-implications-do-random_seed-ses_seed-and-master_seed-have-on-my-calculation) used by the engine for more details.

We changed the algorithm generating the rupture seeds, so the engine will not produce the same GMFs as before, but they will still be statistically equivalent.

We refined the `minimum_intensity` approximation, making it more precise: ground motion values below the specified threshold are replaced with zeros and not stored *if and only if* they are below the threshold for *all* intensity measure types.

We improved the task distribution in event based calculations, because sometimes it was producing too many tasks and sometimes sending the tasks to the workers was excessively slow.

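The refined `minimum_intensity` rule described above can be illustrated with a minimal sketch. This is not the engine's implementation: the record layout (one dictionary of values per site/event pair) and the function name `filter_gmvs` are hypothetical, chosen only to show that a record is discarded when it is below the threshold for *all* intensity measure types.

```python
def filter_gmvs(records, min_iml):
    """
    Keep a ground motion record unless it is below the threshold
    for ALL intensity measure types (hypothetical helper, for
    illustration only).

    records: list of dicts {imt: ground motion value}
    min_iml: dict {imt: minimum intensity threshold}
    """
    kept = []
    for rec in records:
        # a single IMT reaching its threshold is enough to keep the record
        if any(rec[imt] >= min_iml[imt] for imt in min_iml):
            kept.append(rec)
    return kept


min_iml = {'PGA': 0.05, 'SA(1.0)': 0.02}
records = [
    {'PGA': 0.01, 'SA(1.0)': 0.01},  # below for all IMTs: not stored
    {'PGA': 0.01, 'SA(1.0)': 0.03},  # above for one IMT: stored unchanged
    {'PGA': 0.10, 'SA(1.0)': 0.08},  # above for all IMTs: stored
]
kept = filter_gmvs(records, min_iml)
```

In this sketch the second record is kept with its small PGA value intact, because its SA(1.0) value reaches the threshold.
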
We also changed the task distribution in scenario calculations: now we parallelize by `number_of_ground_motion_fields` if there are more than 10 sites. This greatly improved the performance in cases with many thousands of sites.

The `scenario` and `event_based` calculators (including `ebrisk`) now generate and store as a pandas-friendly dataset the (geometrically) averaged GMF on the events. This is useful for plotting and debugging purposes.

We reduced the data transfer due to the GMPEs (in particular the Kotha GMPEs): in some cases, this can make a huge difference (we saw a 10x reduction in the newest model for Europe), while for most models you may not see any noticeable difference.

The preclassical calculator has been made faster and improved to determine the source weights more reliably, thus reducing the slow task issue in classical calculations. Now all sources are split and prefiltered with a KDTree in the preclassical phase.

We changed the semantics of the `pointsource_distance` approximation: before it was ignoring finite size effects, now it is just averaging them, so it is much more precise than before.

For calculations with a few sites we now store the classical ruptures in a single pandas-friendly dataset, including information about the generating sources.

We worked on improving the UCERF calculator, doing some minor optimizations, but a lot more could be done to improve its performance.

## Risk calculators

The `scenario_risk` and `event_based_risk` calculators have been unified, as well as the `scenario_damage` and `event_based_damage` calculators, so do not worry if, when running a `scenario_risk` calculation, the progress log says that you are running an `event_based_risk` calculation. The core calculation logic used for the calculations within the engine is the same now.

We added the ability to compute aggregate losses to the scenario calculators and aggregate loss curves to the `event_based_risk` calculator.
Notice, however, that they are still less efficient than the `ebrisk` calculator, which should be the preferred calculator when attempting to compute aggregated loss curves.

We optimized the case of many tags, so that it is now possible to aggregate by asset ID or by site ID by setting one of the following in the job.ini file:

```ini
aggregate_by = id  # compute loss curves for each asset
aggregate_by = site_id  # compute loss curves aggregated by site
```

This works up to many thousands of assets/sites; previously it was simply impossible due to memory issues.

There was a huge speedup in large `ebrisk` calculations due to the removal of zero losses (we measured a 7x speedup in test runs for NRCan).

The risk model (which comprises fragility functions, vulnerability functions, consequence models, and taxonomy mapping tables) is now stored in a pandas-friendly way; this reduces the saving time by two orders of magnitude in calculations with many thousands of vulnerability/fragility functions.

The `scenario_damage` calculator is more efficient than before and it stores the damage distributions in a pandas-friendly way. It also stores a dataset `avg_portfolio_damage` useful for comparison purposes.

The CSV exporters have been updated to use pandas, thus improving the performance. Moreover, various exporters have been changed in order to unify the aggregate losses outputs between the `ebrisk`, `event_based_risk` and `scenario_risk` calculators. The most notable change is that the exporter for the loss curves aggregated by tag now also exports the total loss curve (in the same file). Here is an example: https://github.com/gem/oq-engine/blob/engine-3.11/openquake/qa_tests_data/event_based_risk/case_6c/expected/agg_curves_eb.csv

## Logic trees

We made the engine smarter in the presence of different sources with the same ID, which are unavoidable in the presence of logic trees changing the source parameters. Now internally the engine uses a unique ID.
For instance, in the case of two different sources with ID "A", the engine will generate two IDs: "A;0" and "A;1". The information about the sources is now stored in a pandas-friendly dataset `source_info` with a unique index `source_id`.

We changed the internal storage of the PoEs in classical calculations to allow a substantial optimization of performance and memory occupation: this improvement is visible only in calculations with particularly complex logic trees. While at it, we fixed a bug in the sampling logic affecting some models in engine 3.10.

We changed the string representation of the realizations to make it more compact (before it was practically impossible to print out the full names of the realizations for some models, because the strings were too long).

We added a check on valid branch ID names: only letters, digits and the characters "-", "_" and "." are accepted.

We added a new type of uncertainty for the seismic sources called `TruncatedGRFromSlipAbsolute`. That required adding a classmethod `TruncatedGRMFD.from_slip_rate` and updating the I/O routines to recognize the `slip_rate` parameter.

## hazardlib/HMTK

We introduced a new MFD parameter `slipRate` and implemented a new GMPE `AvgPoeGMPE` performing averages on the probabilities of exceedance: this is an alternative approach to the `AvgGMPE`, which performs geometric averages on the GMFs.

The `AvgGMPE`, introduced over a year ago, has been extended to work also for scenario and event based calculations and has been documented here: https://docs.openquake.org/oq-engine/advanced/mean-ground-motion-field.html

The `ModifiableGMPE` was enhanced with the new methods `set_scale_median_scalar`, `set_scale_median_vector`, `set_scale_total_sigma_scalar`, `set_scale_total_sigma_vector`, `set_fixed_total_sigma`, `set_total_std_as_tau_plus_delta` and `add_delta_std_to_total_std`.

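To make the difference between the two averaging approaches mentioned above more concrete, here is a minimal sketch that does not use hazardlib at all. It assumes a lognormal ground motion model with a single standard deviation shared by all GMPEs (a simplification made only for this illustration); the function names are hypothetical. The first function averages the log-means (i.e. takes a weighted geometric mean of the median ground motions) and then computes one probability of exceedance, while the second computes one probability of exceedance per GMPE and averages those.

```python
import math


def poe(log_iml, log_mean, sigma):
    """Probability that ln(ground motion) exceeds log_iml,
    for a normal distribution in log space."""
    z = (log_iml - log_mean) / sigma
    return 0.5 * math.erfc(z / math.sqrt(2))


def poe_of_averaged_gmpe(log_iml, log_means, weights, sigma):
    """Average the log-means first (weighted geometric mean of the
    medians), then compute a single PoE."""
    avg_log_mean = sum(w * m for w, m in zip(weights, log_means))
    return poe(log_iml, avg_log_mean, sigma)


def averaged_poe(log_iml, log_means, weights, sigma):
    """Compute one PoE per GMPE, then take the weighted average."""
    return sum(w * poe(log_iml, m, sigma)
               for w, m in zip(weights, log_means))


# two hypothetical GMPEs with median ground motions of 0.1 g and 0.4 g
log_means = [math.log(0.1), math.log(0.4)]
weights = [0.5, 0.5]
sigma = 0.6  # assumed common standard deviation in log space
log_iml = math.log(0.4)

p_geo = poe_of_averaged_gmpe(log_iml, log_means, weights, sigma)
p_avg = averaged_poe(log_iml, log_means, weights, sigma)
```

With these inputs the two approaches give clearly different probabilities, which is why the choice between them matters when collapsing several GMPEs into one.
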
[Richard Styron](https://github.com/cossatot) introduced a tapered Gutenberg-Richter MFD, closely following its implementation in the USGS NSHMP-HAZ code.

[Marco Pagani](https://github.com/mmpagani) introduced a new distance called `closest_point` and a method to create a `TruncatedGRMFD` from a value of scalar seismic moment. He also introduced a `KiteSurface` class and a `KiteFaultSource` class, which at the moment are still considered experimental.

[Viktor Polak](https://github.com/viktor76525) contributed the Parker et al. (2020) GMPE. He also contributed the Hassani and Atkinson (2020) GMPE and added a new site parameter `fpeak`. Finally, he contributed the Chao et al. (2020) and Phung et al. (2020) GMPEs.

[Laurentiu Danciu](https://github.com/danciul) and [Athanasios Papadopoulos](https://github.com/thpap) contributed several intensity prediction equations for use in the Swiss National Earthquake Risk Model. The new IPEs refer to models obtained from the ECOS (2009), Faccioli and Cauzzi (2006), Bindi et al. (2011), and Baumont et al. (2018) studies. They also extended the `ModifiableGMPE` class to allow amplification of the intensity of the parent IPE based on a new `amplfactor` site parameter.

[Graeme Weatherill](https://github.com/g-weatherill) made some updates to the GMPEs used in the European Seismic Hazard Model 2020 (ESHM20).

[Claudia Mascandola](https://github.com/mascandola) contributed the Lanzano et al. (2019) GMPE and the NI15 regional GMPE by Lanzano et al. (2016).

We changed the `SourceWriter` not to save the `area_source_discretization` on each source when writing the XML files; otherwise the same parameter in the `job.ini` file would be ignored, which is normally undesirable.

## Bugs

A regression entered the `classical_risk` and `classical_damage` calculators in engine 3.10, causing an increase of the data transfer in hazard curves. That was killing the performance in the case of calculations with many thousands of sites.
Fixed after a report by the [EUCENTRE](https://www.eucentre.it/).

The exporters for the hazard maps and UHS were exporting zeros in the case of `individual_curves = true`. Fixed after the report by Jian Ma (https://groups.google.com/g/openquake-users/c/43flYFzOMoo/m/tpYFqv1pBAAJ).

In the presence of an unknown parameter in the `job.ini` file (typically because of a misspelling) the log was disappearing; this has been fixed.

The boolean fields `vs30measured` and `backarc` were not cast correctly when read from a CSV field (the engine was reading the zeros as true values). Fixed after the report by Peter Pažák (https://groups.google.com/u/0/g/openquake-users/c/-8Abgea_Pu8/m/IHM0o68rDgAJ).

We fixed a wrong check raising a `ValueError` incorrectly in the case of multi-exposures with multiple cost types.

We fixed a bug in the calculation of average losses in `scenario_risk` computations: events with zero losses were incorrectly discarded.

Now `ignore_covs = 0` effectively sets *all* the coefficients of variation in the input vulnerability functions to zero, even when using the Beta distribution, which was not the case in previous versions.

## New checks and warnings

We removed some annoying warnings in `classical_damage` calculations in the case of hazard curves with PoEs == 1.

The engine logs a warning in case of a suspiciously large seed dependency in event-based/scenario calculations.

The engine raises an early error if the parameter `soil_intensities` is set with an amplification method which is not "convolution".

The engine raises an early error in case of zero probabilities in the hypocenter distribution or the nodal plane distribution in the XML source files.

We added a check on the vulnerability functions with the Beta distribution: the mean loss ratios cannot contain zeros unless the corresponding coefficients of variation are zeros too.

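The rule enforced by the Beta distribution check can be sketched as follows. This is only an illustration of the condition stated above, not the engine's code: the function name and the error message are hypothetical.

```python
def check_beta_vulnerability(mean_loss_ratios, covs):
    """
    Raise a ValueError if a zero mean loss ratio is paired with a
    nonzero coefficient of variation (hypothetical helper, for
    illustration only).
    """
    for idx, (mean, cov) in enumerate(zip(mean_loss_ratios, covs)):
        if mean == 0 and cov != 0:
            raise ValueError(
                'Zero mean loss ratio with nonzero coefficient of '
                'variation %s at position %d' % (cov, idx))


# valid: the zero mean loss ratio has a zero coefficient of variation
check_beta_vulnerability([0.0, 0.1, 0.5], [0.0, 0.3, 0.2])
```

Calling the same function with `[0.0, 0.1]` and `[0.2, 0.3]` would raise the error, since the first mean loss ratio is zero while its coefficient of variation is not.
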
Now we perform the disaggregation checks before starting the classical part of the calculation, so that the user gets an early error in case of wrong parameters.

The engine warns the user if it discovers a situation with zero losses corresponding to non-zero GMFs.

We now accept vulnerability functions for taxonomies missing in the exposure: such functions are just ignored. This is useful since it means that a vulnerability model file prepared for a full exposure can be used on a reduced exposure missing some taxonomy strings.

## oq commands

We replaced the command `oq workers inspect` with `oq workers status`.

We renamed `oq recompute_losses` as `oq reaggregate` and made it work properly.

We enhanced the command `oq compare` and extended it also to the `avg_gmf` outputs.

We improved and fixed a few `oq plot` subcommands.

We enhanced `oq plot sources` to plot point sources and to manage the international date line.

We fixed a bug in `oq prepare_site_model` when `sites.csv` is the same as the `vs30.csv` file and there is a grid spacing parameter.

The command `oq nrml_to` has been documented.

## WebUI/WebAPI/QGIS plugin

If authentication is off, the WebUI now shows the calculations of all users and not only the calculations of the current user.

We improved the submission of calculations to the WebAPI: now they can be run on a full cluster, `serialize_jobs` is honored and the log level is configurable with a variable `log_level` in the file `openquake.cfg`.

We updated the QGIS plugin to reflect the changes in the engine outputs.

## Other

The flag `--reuse-hazard` has been replaced by a flag `--reuse-input` that allows the user to reuse only source models and exposures. This is safer than trying to reuse the GMFs, which should be done with the `--hc` option instead.

The `num_cores` parameter has been moved from the `job.ini` file to the `openquake.cfg` file and now it works as expected.

There was a lot of work on secondary perils, both on the hazard and on the risk side, but the feature is still not ready for prime time.

## Packaging

We now have a universal installer working on Linux, Windows and Mac (see https://github.com/gem/oq-engine/blob/engine-3.11/doc/installing/universal.md).

The universal installer is now the only supported way to install the engine on Mac and generic Linux systems. It works by using a pre-installed Python, which can be Python 3.6.x, 3.7.x or 3.8.x. Python 3.9 is not supported yet; if you have an older Python (≤3.5) you must install a newer, supported version of Python and only then proceed with installing the engine.

For Debian-based systems the universal installer works just fine, but we also provide packages that include their own Python (version 3.8).

For RedHat-based systems we also provide packages that include their own Python (version 3.6). Notice that, due to RedHat's change of policy about the CentOS operating system, it is not clear if we will keep supporting it with the packages, but the universal installer will work.

We upgraded h5py to version 2.10 (for performance improvements) and shapely to version 1.7.1 (to unofficially support macOS Big Sur).

Notice that *macOS Big Sur is still not officially supported*, since we cannot reliably run tests for the engine on Big Sur, given that GitHub's Continuous Integration system does not support it yet. But we know of several users for whom the engine works on Big Sur, via the universal installer.

The latest generation of Macs based on the Apple M1 CPU architecture, i.e., the MacBook Air (M1, 2020), Mac mini (M1, 2020), and the MacBook Pro (13-inch, M1, 2020), are not officially supported.