Release notes v3.17

Version 3.17 is the culmination of 5 months of work involving around 470 pull requests. It is aimed at users who want the latest features and maximum performance. Users valuing stability may want to stay with the LTS release instead (currently at version 3.16.4).

The complete set of changes is listed in the changelog:

https://github.com/gem/oq-engine/blob/engine-3.17/debian/changelog

A summary is given below.

Disaggregation calculator

The disaggregation calculator has been deeply revised, by changing the underlying algorithms and by making it possible to compute the average disaggregation directly. For the first time it is possible to compute the mean disaggregation for large models like Europe (with 10,000 realizations) without running out of memory and in a surprisingly small amount of time (minutes on a cluster). The game changer was computing the mean disaggregation in terms of rates rather than probabilities: since rates are additive, the mean calculation can be parallelized cleanly without the need to keep all realizations in memory. This is why the mean is now the only statistic supported in disaggregation and you cannot compute the quantiles. Computing the standard deviation with a memory-efficient algorithm is possible, so that could be added in the future.
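To make the additivity argument concrete, here is a minimal sketch (in numpy, not the engine's actual implementation) of how mean disaggregation can be accumulated in terms of rates; the array shapes and names are purely illustrative, while the rate/probability conversion is the standard Poissonian one:

    import numpy as np

    investigation_time = 1.0
    # illustrative PoEs for 3 realizations over a tiny (mag, dist) disaggregation matrix
    poes = np.random.default_rng(42).uniform(0, .1, size=(3, 4, 5))
    weights = np.array([.5, .3, .2])  # logic tree weights, summing to 1

    # convert probabilities of exceedance into annual rates: r = -ln(1 - poe) / T
    rates = -np.log(1 - poes) / investigation_time

    # rates are additive, so the weighted mean can be accumulated one realization
    # (or one task) at a time, without keeping all realizations in memory
    mean_rates = np.einsum('r,rmd->md', weights, rates)

    # convert back to probabilities of exceedance at the end
    mean_poes = 1 - np.exp(-mean_rates * investigation_time)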

The changes have touched all kinds of disaggregation, including epsilon_star disaggregation, disaggregation over multiple sites and disaggregation by source. The disaggregation outputs are stored in a different way and the exporters have been updated and optimized. The CSV outputs have an additional column iml. We renamed the “by TRT” disaggregation outputs and added a new output TRT_Mag. Models with mutually exclusive sources (e.g. Japan) can now be disaggregated, as well as models using the NegativeBinomialTOM temporal occurrence model (e.g. the New Zealand model).

The disaggregation using the poes_disagg parameter has been unified with the disaggregation based on the iml_disagg parameter. We fixed a few bugs (for instance the disaggregation sometimes failed when using the NGAEast GMPEs) and changed the binning algorithm for LonLat bins. Since the algorithms have changed, you will get (slightly) different results. Also, the default behavior is now to compute the mean disaggregation, while before it was to compute the disaggregation only for the realization closest to the mean hazard curve.

Moreover, we substantially changed the disagg_by_src feature. The pandas DataFrame stored under the name disagg_by_src has been replaced by an ArrayWrapper called mean_rates_by_src and the documentation has been changed accordingly. The new structure contains less information than before (only the means) and in a different form (rates instead of PoEs); however, it is enough for the purpose of finding the most relevant sources and it can actually be stored for all hazard models in the mosaic, since it requires far less storage (around 10,000 times less for the EUR model).

The AELO project required the ability to store disaggregation results by source. This is why we added a new output mean_disagg_by_src with its own CSV exporter. The AELO project also required a deep refactoring of the logic tree processor, to implement the ability to reduce a full logic tree to a specific source. That was hard and time-consuming and may require further work in the future.

Finally, we changed the task distribution to reduce data transfer in disaggregation and ensure optimal performance in all cases.

Classical PSHA

The major new feature in Classical PSHA is the ability to define custom post-processors as documented in https://docs.openquake.org/oq-engine/advanced/3.17/classical_PSHA.html#the-post-processing-framework-and-vector-valued-psha-calculations

We took advantage of this feature to reimplement the conditional spectrum calculator as a post-processor. Vector PSHA is also implemented as a post-processor, as are the AELO disaggregations by source. Users can easily implement custom post-processors by following the examples in https://github.com/gem/oq-engine/tree/engine-3.17/openquake/calculators/postproc
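For orientation, a post-processor is essentially a Python module exposing an entry point that operates on the datastore of the parent classical calculation. The sketch below is only illustrative: the function name, the dataset names and the way the module is referenced from the job.ini should be checked against the examples in the linked directory and in the documentation above.

    # my_postproc.py -- hypothetical post-processor module
    def main(dstore):
        # read the mean hazard curves computed by the parent classical calculation;
        # 'hcurves-stats' is assumed here as the dataset name
        # (shape sites x stats x IMTs x levels)
        poes = dstore['hcurves-stats'][:]
        # compute some derived quantity (a site-wise maximum is used as a placeholder)
        derived = poes.max(axis=(2, 3))
        # store it back in the datastore so that it can be inspected or exported later
        dstore['my_postproc_output'] = derived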

Apart from post-processors, we kept working on the calculator to make sure that even the largest models in the GEM mosaic can run with a limited amount of RAM. We also changed the internal _poes storage to reduce the disk space usage.

It is now possible to set both the sites parameter and the site_model_file parameter at the same time; previously this raised an error, while now the vs30 parameters of the closest site model points are used for the specified sites and the depth site parameters (z1pt0 and z2pt5) are recomputed. The change was required in the context of the AELO project.

Finally, we needed a new feature to support the 2018 USA model. In that model some multipoint sources use the equivalent distance approximation, while others do not. To manage such situations we added a new parameter in the job.ini, reqv_ignore_sources, which is a list of IDs for the sources that should not use the equivalent distance approximation.

Event Based Hazard

The mechanism to generate ruptures in event based calculations has been changed, so different ruptures will be sampled with respect to the past. There is not only a change in the random seed generation algorithm but also a substantial change of approach: starting from engine 3.17, ruptures are first sampled and then filtered, which is the opposite order with respect to previous versions. The previous order was more efficient, but it made it impossible to define a meaningful rupture ID. The problem is documented in detail in https://docs.openquake.org/oq-engine/advanced/3.17/event_based.html#rupture-sampling-how-to-get-it-wrong

Here we will just note that previous versions of the engine had a sequential rupture ID, exported in the file ruptures.csv, but it was not usable, because changing the minimum magnitude or the maximum distance even slightly would make the IDs refer to totally different ruptures. Moreover, there was no easy way to export a rupture given the rupture ID. All this has changed now, and the rupture ID does not depend on the details of the filtering anymore; as a consequence it is finally possible to disaggregate the GMFs (and the risk) by rupture, a much desired feature.

The command to extract a single rupture is as simple as

oq extract ruptures?rup_id=XXX

and there is an associated HTTP API.

The new rupture ID is a 60-bit integer, with the first 30 bits referring to the index of the source generating the rupture and the second 30 bits referring to the index of the rupture inside the source. Sources containing more than 2^30=1,073,741,824 ruptures are now rejected. Notice that this is not a limitation in practice, since you can always split a large source. Also, in the entire GEM mosaic there is not a single source with more than 2^30 ruptures, so we did not break any model.
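In other words, the rupture ID is a bit-packing of two indices. The following sketch shows the idea, assuming the source index occupies the high bits (as the description above suggests); the helper names are hypothetical and the exact convention should be checked in the engine code:

    # illustrative packing/unpacking of the 60-bit rupture ID described above
    def pack_rup_id(source_idx, rup_idx):
        assert 0 <= source_idx < 2**30 and 0 <= rup_idx < 2**30
        return (source_idx << 30) | rup_idx

    def unpack_rup_id(rup_id):
        return rup_id >> 30, rup_id & (2**30 - 1)

    rup_id = pack_rup_id(12, 345)
    assert unpack_rup_id(rup_id) == (12, 345)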

We fixed the sampling of the ruptures both in the case of mutex ruptures (relevant for the USA model around the New Madrid cluster) and in the case of mutex sources (relevant for the Japan model). The sampling for mutex sources was extended to include the grp_probability parameter, thus making it possible for the first time to sample the Japan model correctly.

There was some work to improve the performance of multiFault sampling by splitting the sources and thus parallelizing the sampling: the improvement was spectacular in the USA model (over 100x on a machine with 120 cores). Moreover, we strongly optimized the generation of events from the ruptures, which now can be dozens of times faster than before.

We improved the view extreme_gmvs to display the largest ground motion values, and we added a new parameter extreme_gmv (a dictionary IMT->threshold) to discard the largest GMVs on demand. This is a feature to use with care and is not recommended: in the presence of extreme GMVs one should fix the GMPE instead.

Finally, we added an output “Annual Frequency of Events”, which can be exported as a file event_based_mfd.csv with fields (mag, freq) and is meant to be compared with the measured magnitude-frequency distribution.

Additions to hazardlib

The most important addition to hazardlib was the porting of the GMPEs required to run the Canada SHM6 model. The porting from the GMPEs used in engine 3.11 was long and difficult, since there are many complex GMPEs. In the process we discovered and fixed several subtle bugs. The support for the Canada SHM6 model is still considered experimental and you should report any suspicious discrepancy with respect to a calculation performed with engine 3.11.

Moreover, there was work to support the release of the GEM 2023 Global Hazard Mosaic. All the models were computed by converting the GMPEs with nontrivial horizontal components to use the geometric average. That involved specifying horiz_comp_to_geom_mean = true in the job.ini file and required fixing a few GMPEs by adding the missing DEFINED_FOR_INTENSITY_MEASURE_COMPONENT class attribute.

Apart from that, we added a parameter infer_occur_rates to speed up rupture sampling in the presence of multiFault sources. If you know that the probs_occur in the multiFault sources are actually Poissonian (which is the common case), then you can set infer_occur_rates=true and get an order of magnitude speedup in the sampling procedure. This feature should be considered experimental at the moment and is disabled by default.
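As a rough illustration of what inferring the occurrence rates means: for a Poissonian rupture, the probability of zero occurrences determines the rate. The sketch below is not the engine's actual code, just the underlying relation:

    import numpy as np

    # for a Poisson process with rate lam over the investigation time T,
    # P(0 occurrences) = exp(-lam * T), hence lam = -ln(P(0)) / T
    def infer_rate(probs_occur, investigation_time=1.0):
        return -np.log(probs_occur[0]) / investigation_time

    probs = [0.95122942, 0.04756147, 0.00118904]  # approximately Poissonian
    rate = infer_rate(probs)  # ~0.05 events per year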

Marco Pagani added the ability to pass the kappa0 parameter to the Lanzano 2019 GMPE.

As usual, some features were contributed by our users.

Chris DiCaprio from New Zealand added to the sourcewriter the ability to add a prefix to MultiFaultSource IDs.

Julián Santiago Montejo contributed the GMPE Arteta et al. (2023) and a bug fix for the GMPE Arteta et al. (2021).

Graeme Weatherill contributed a few bug fixes to various GMPEs, already backported to engine 3.16 (https://github.com/gem/oq-engine/pull/8778).

WebUI

In the context of the AELO project, we extended the WebUI to make it possible to run single-site classical calculations, including disaggregation by Mag_Dist_Eps and disaggregation by relevant sources, for any site in the world. The calculator automatically determines the model to use from the coordinates of the site and recomputes the site parameters starting from the user-provided vs30. The performance has been tuned so much that a machine with only 16 GB of RAM and 4 cores is enough to run such calculations.

The WebUI was extended to log information on login/logout so that tools like fail2ban can be used to detect and stop denial-of-service attacks. We also worked on the password-reset facility.

We also fixed the method /v1/ini_defaults, which returns a JSON with the defaults used in the job.ini file. The issue was that it was returning NaNs for some site parameters, i.e. not valid JSON. The solution was to simply omit such parameters.

We introduced a READ_ONLY mode for the WebUI for situations when it is advisable to remove the ability to post calculations (for instance when the machine where the WebUI runs does not have enough resources to spawn calculations).

Risk

We have a few new features in the risk calculators too.

The first one is the introduction of three new loss types and vulnerability functions: area, number and residents. It is enough to specify the corresponding vulnerability functions in the job.ini file and the outputs relative to the new loss types will appear as new columns in the loss output CSV files.

The area and number loss types are associated with the corresponding fields in the exposure, while the residents loss type is associated with the field avg_occupants in the exposure; if missing, it is automatically computed as the mean of the night, day and transit occupants.
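For instance, with a hypothetical exposure fragment carrying only the standard occupancy fields, the fallback described above amounts to a simple row-wise mean (illustrative pandas sketch, not the engine's exposure reader):

    import pandas as pd

    # hypothetical exposure fragment with the standard occupancy fields
    assets = pd.DataFrame({'night': [10., 4.], 'day': [25., 6.], 'transit': [15., 5.]})

    # when avg_occupants is missing, it is taken as the mean of the three
    assets['avg_occupants'] = assets[['night', 'day', 'transit']].mean(axis=1)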

The second important new feature is disaggregation by rupture. The command to give is simply $ oq show risk_by_rup:structural <calc_id>, which will return a DataFrame keyed by the rupture ID and with fields including the total loss generated by the rupture and the parameters of the rupture (magnitude, number of occurrences and hypocenter):

                        loss   mag  n_occ         lon        lat         dep
rup_id
52076478464016  3.091906e+12  6.60      9  128.401993  27.496000   68.199997
52037823758352  2.223281e+12  6.60      9  128.501999  27.596001   70.000000

We changed the (re)insurance calculator to read the deductible field from the exposure: this gives a speedup of multiple orders of magnitude in cases where you have thousands of assets all with the same limit and different deductibles. This was the common case for our sponsor SURA, so we switched from a slow and memory-consuming calculation aggregating 6 million different policies to an ultra-fast calculation aggregating only 6 policies.

At user request, we extended the total_losses feature to include business_interruption and we improved the error handling for reinsurance calculations.

There was also a major performance improvement when reading the exposure. The issue was that we could not read the USA exposure (22 million assets) without running out of memory; now we can, and the exposure processor is three times faster. We also improved the event based risk calculator to use very little memory on the workers even in the presence of a huge exposure.

There was a huge amount of work on the conditioned GMF calculator, for which we included a comprehensive verification test suite (https://github.com/gem/oq-engine/pull/8542) based on the tests for the USGS ShakeMap code (https://usgs.github.io/shakemap/manual4_0/tg_verification.html). We also changed the calculator to compute the GMFs only on the hazard sites and not on the seismic stations.

At user request, we changed the behavior of scenario_risk calculations to just print a warning, rather than raising an error, when the hazard is so small that it produces no losses. This is consistent with the behavior of scenario_damage calculations.

Finally, we added an environment variable OQ_SAMPLE_ASSETS which is useful when debugging calculations; for instance, you can set OQ_SAMPLE_ASSETS=.001 to sample one asset in a thousand, making the calculation much faster and easier to debug.

Bug fixes and new checks

We reimplemented the minimum_magnitude filter in terms of the mag-dependent-distance filter, which can also be used to implement a maximum_magnitude filter. The change eliminated a bunch of bugs and corner cases that kept creeping in.

The engine was rounding longitude and latitude to 5 digits everywhere, except when generating a grid from a region; this has been fixed, and we have full consistency now.

The truncation_level parameter is now mandatory in classical and event based calculations. Before, if the user forgot to set it, a default value of 99 was used, which was most likely not what the user wanted.

If the same parameter is present in different sections of the job.ini file, now the engine raises an error.

If the user disables the statistics by mistake and does not specify individual_rlzs=true, they now get an early error; before, the engine was computing the individual realizations (which was an error).

If the exposure contains empty asset IDs, now the engine raises a clear error.

If the site model file does not contain a required site parameter, and the parameter is not set in the job.ini either, now a clear error is raised instead of incorrectly using a NaN value.

The New Zealand model uses the CScalingMSR magnitude-scaling relationship. That caused a surprising error when using the ps_grid_spacing approximation, due to the way the names of the CollapsedPointSources were generated. This is now fixed, since the name no longer contains the name of the MSR.

It is now possible to use a percent character in the job.ini file without interfering with the ConfigParser interpolation feature.

The GMF importer was failing to import GMFs in CSV format with a custom_site_id field: this has been fixed.

In case of errors, the engine was printing the same traceback two or even three times: this has been fixed.

oq commands

We finally removed the command oq celery, since celery has been deprecated for over 5 years. We also removed the obsolete references to celery and rabbitmq from the engine documentation.

If a calculation ID is already taken in the database, now oq importcalc raises a clear error before importing the HDF5 file.

We enhanced the command oq sample which is now able to sample MultiPointSources too.

We extended the command oq shakemap2gmfs: on demand it can amplify the generated GMFs depending on the ShakeMap vs30 or on the site model vs30. See the --help message for more.

We refined the plotting command oq plot event_based_mfd? to plot the magnitude-frequency distribution of the events in an event based calculation.

We updated the command oq plot uhs_cluster?k=XXX to cluster together uniform hazard spectra with similar or identical hazard.

We added a new command oq compare med_gmv <imt> which is able to compare the median GMVs between two calculations: this is useful when debugging different versions of the engine.

We added a new command oq reduce_smlt which is able to reduce a source model logic tree file and the associated source model files to a single source; this is useful when debugging AELO calculations.

We introduced a new family of commands oq mosaic with the following two subcommands:

  1. oq mosaic run_site to run a classical calculation on the given longitude,latitude site.

  2. oq mosaic sample_gmfs to sample the ruptures of a given mosaic model.

Other

We extended the sensitivity_analysis feature to work also on file parameters: for instance, you can provide multiple different logic tree files to study the sensitivity to the logic tree.

There was a lot of work on documentation, both on the manuals (aggregate_by, ideductible and maximum_distance are now fully documented) and on the FAQs (how to configure multiple engine installations). We also improved the installation instructions.

The full report output has been revised and now a compact representation for the involved GMPEs is used when needed, to avoid unreadable extra-long lines.