Release notes v3.24
===================

Version 3.24 is the culmination of 9 months of work involving around 510 pull requests. It is aimed at users wanting the latest features, bug fixes and maximum performance. Users valuing stability may want to stay with the LTS release instead (currently at version 3.23.3). The complete set of changes is listed in the changelog:

https://github.com/gem/oq-engine/blob/engine-3.24/debian/changelog

The major changes are in multifault sources (relevant for the USA, China, Japan and New Zealand models), in secondary perils (i.e. liquefaction and landslides) and in the script generating the global stochastic event set from the GEM hazard models. Moreover, we optimized the memory consumption in scenario calculations (both conditioned and not) and in large event based risk calculations, which were running out of memory due to the large data transfer in the average losses. A detailed summary is given below.

Hazard sources
---------------

We enhanced multifault sources. It is now possible to specify their composition in faults, by tagging each fault in the source model file. This is used when disaggregating the hazard by source. The mechanism works by splitting the multifault source into faults before starting the calculation. Then in the `source_info` dataset you will see not the original multifault source (say "MF0") but the tagged multifault sources "MF0@tag1" and "MF0@tag2". You can find an example in classical/case_75.

We reduced the memory consumption in multifault sources by splitting them in blocks of ruptures, with size determined by the openquake.cfg parameter `config.memory.max_multi_fault_ruptures`.

We added checks for missing hdf5 files when reading nonparametric and multifault ruptures, so that the user gets clear error messages.

We added a parsing check for pointlike sources: the depths in the hypocentral depth distribution must be within the `upper_seismogenic_depth` and the `lower_seismogenic_depth`.

We improved the error message for invalid polygons, so that the user can see which area source causes the infamous error "polygon perimeter intersects itself".

We extended the `source_id` feature. For instance, setting `source_id=NewMadrid` in the job.ini will keep all the sources starting with the string "NewMadrid" instead of keeping only the source with ID "NewMadrid".

Classical calculations
------------------------

There was a substantial amount of work to support the USA (2023) model. While the model is not supported yet - it will require engine v3.25 - all of the required GMMs have been added to hazardlib. Moreover, there were multiple improvements in the management of clusters of sources (like the NewMadrid cluster), which are now a lot more efficient computationally, in terms of memory consumption and in terms of data transfer. Even estimating the weight of cluster sources in the preclassical phase is much faster. Most of all, the disk space consumption has been reduced by one order of magnitude for the USA model.

We fixed the "OverflowError: Python integer out of bounds for uint32" happening when saving the `source_info` table, due to the size of the model.

Far away point sources (as determined by the preclassical phase) are not transferred anymore, thus improving performance and reducing data transfer and slow tasks.

We optimized the preclassical phase in the case of point sources when using the ps_grid_spacing approximation. We also slightly changed the logic of the approximation, so the numbers are slightly different. This will help towards implementing additional optimizations in future engine releases.
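The `source_info` dataset mentioned above can be inspected directly from the datastore; the following is a minimal sketch, assuming a completed calculation (the `-1` argument is a shortcut for the latest calculation in oqdata):

```python
# Minimal sketch: inspect the source_info dataset of the most recent calculation
from openquake.commonlib import datastore

dstore = datastore.read(-1)       # -1 means the latest calculation in oqdata
info = dstore['source_info'][:]   # structured array with one row per source
print(info['source_id'])          # for tagged multifault sources you will see
                                  # entries like b'MF0@tag1', b'MF0@tag2'
dstore.close()
```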
We removed the parameter `reference_backarc` since it was not used.

Event based calculations
------------------------

We added a parameter `max_sites_correl` with default 1000 to stop users trying to compute correlated GMFs with too many sites, resulting in serious out of memory situations. The parameter does not affect scenario calculations, only event based ones.

We made the job.ini parameters `minimum_intensity` and `minimum_magnitude` mandatory in the case of event based calculations with more than `max_sites_disagg=10` sites. This solves the issue of users forgetting to set such parameters and running calculations much larger than needed.

We introduced a parameter `config.memory.gmf_data_rows` to make it impossible to compute hazard curves from excessively large datasets: this happened when the user forgot to remove `hazard_curves_from_gmfs`, which is meant to be used only for a few sites.

We exposed the output `avg_gmf`, which was previously internal. Moreover, we extended it to consider secondary IMTs. Notice that `avg_gmf` is computed and stored only in small calculations. The controlling parameter in the `job.ini` file is called `gmf_max_gb`, with a default value of 0.1, meaning that `avg_gmf` is computed and stored only if `gmf_data` contains less than 102.4 MB of data. You can raise the parameter, but then the calculation will become slower.

We improved the script `global_ses.py`, which is able to generate a global (or partial) Stochastic Event Set starting from the hazard models in the GEM mosaic. In particular, it is now possible to pass the models to consider and various parameters. Moreover, we documented it. It is not considered experimental anymore. We also worked at avoiding out of memory errors in calculations starting from the global SES; the performance, however, is still not ideal and further work will be required. Moreover, you cannot run risk calculations directly from the global SES without storing intermediate ground motion fields first.

Scenario calculations
----------------------

We reduced the memory consumption in scenario calculations by discarding the sites outside the integration distance from the rupture.

We further reduced the memory consumption in conditioned GMFs calculations by using shared memory to keep the distance matrices.

We fixed the issue of small negative damages due to rounding errors in some scenario damage calculations.

hazardlib
----------

We slightly optimized the generation of context arrays for point sources.

We worked on the Australian GMMs (NGAEastAUS2023GMPE), introducing a way to manage variable vs30 values, which is essential for risk calculations.

We adjusted the basin terms in many GMMs for the USA model to determine the basin parameters from the vs30 via empirical relationships. This feature is enabled only if the `z1pt0` and `z2pt5` parameters in the site model are set to the sentinel value -999.

We added a BSSA14SiteTerm GMPE for use in the USA model.

Our users at USGS reported a bug with aliases, such that `valid.gsim("[AbrahamsonGulerce2020SInterAlaska]")` was returning the generic GMPE and not the regional specialization for Alaska. This is fixed now (see the snippet below).

We fixed `modified_gsim` to work with the NGAEast GMPEs.

We fixed the site term for Abrahamson and Gulerce (2020) to be exactly consistent with the paper.

We fixed the BA08 site term: the rjb distance was not computed when needed by the underlying GMPE, thus resulting in an error.
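The alias fix mentioned above can be verified with a minimal snippet like the following (assuming the engine is installed):

```python
# Minimal check: instantiating the alias should now return the regional
# specialization for Alaska, not the generic GMPE
from openquake.hazardlib import valid

gsim = valid.gsim("[AbrahamsonGulerce2020SInterAlaska]")
print(gsim)
```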
We fixed the issue of NaNs coming from `get_middle_point()` for kite surfaces by discarding NaNs before computing the middle point.

We extended `ComplexFaultSource.iter_ruptures` to create ruptures for a range of aspect ratios.

We added a classmethod `from_moment` to the TaperedGRMFD Magnitude Frequency Distribution.

In the HMTK we extended the `get_completeness_counts` function to optionally return an empty array (the default is to raise an error). Moreover, we extended the `serialise_to_nrml` method to accept a mesh spacing parameter, as requested by Francis Jenner Bernales.

We implemented the Taherian et al. (2024) GMM for Western Iberia, both for inland and offshore scenarios. This is the first GMM in the engine using Machine Learning techniques, via the onnxruntime library. The library is currently optional: if it is missing you can run everything else except this GMM.

Graeme Weatherill contributed an implementation of the MacedoEtAl2019SInter/SSlab conditional GMPE.

Baran Güryuva contributed the EMME24 backbone GMMs for crustal earthquakes.

Guillermo Aldama-Bustos contributed the Douglas et al. (2024) GMM.

Valeria Cascone added `get_width` and `get_length` methods to the Strasser et al. (2010) MSR and then to the Thingbaijam et al. (2017) MSR.

Chiara Mascandola added a class LanzanoEtAl2019_RJB_OMO_NESS2 to the Lanzano et al. (2019) set of GMPEs and submitted an implementation of the Ambraseys (1996) and Sabetta-Pugliese (1996) GMPEs. She also submitted the Ramadan (2023) GMPEs.

We changed `contexts.get_cmakers` to return a `ContextMakerSequence` instance, so that it is possible to compute the RateMap generated by a CompositeSourceModel with a single call. Then we removed the function `hazard_curve.calc_hazard_curve` since it is now redundant.

logic trees
------------

We improved the library to generate logic tree files (`hazardlib.lt.build`). In particular, we used it to generate the logic tree for the Drouet calculation for France using 4 blocks of 100 branches each, thus avoiding the limit of 183 branches per branchset.

We added a new uncertainty `abMaxMagAbsolute` to change three parameters of the Gutenberg-Richter MFD at the same time (`aValue`, `bValue` and `maxMag`).

We added a new uncertainty `areaSourceGeometryAbsolute` and two seismogenic depth uncertainties (upper and lower) for pointlike sources.

We extended the check on duplicated branches to the source model logic tree.

We extended the check on number_of_logic_tree_samples to all calculators and not only to event_based.

We made the realization string representation more regular, i.e. the first branch of the source model logic tree now uses a single-letter abbreviation, like all the other branches.

We fixed two small bugs in `oq show rlz`: it was not working for a couple of tests (case_12 and case_68).

If the source model logic tree is trivial (i.e. with a single branch) it is possible to not specify it and just set the `source_model_file` parameter.

Finally, we included in the engine repository a private `_unc` package containing a prototype implementation of the POINT (Propagation Of epIstemic uNcerTainty) methodology for managing correlated uncertainties. This is not ready for general usage yet.

Risk inputs
-------------

We added two new loss types, `injured` and `affectedpop`, with their associated vulnerability functions. They are being used by the OQImpact platform.

We extended the exposure reader to accept .csv.gz files, thus saving disk space now that the exposure files are getting large.
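In practice, an existing exposure CSV can simply be compressed and then referenced with the .csv.gz extension; here is a minimal sketch using only the Python standard library (the file name is a placeholder):

```python
# Minimal sketch: compress an exposure CSV so that it can be referenced
# as a .csv.gz file; "Exposure_Res.csv" is a placeholder name
import gzip
import shutil

with open("Exposure_Res.csv", "rb") as src:
    with gzip.open("Exposure_Res.csv.gz", "wb") as dst:
        shutil.copyfileobj(src, dst)
```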
We made the USA exposure parsing 10x faster by fixing `get_taxidx`, which was performing redundant operations.

Missing columns in the exposure.csv files - missing with respect to the fields declared in the exposure.xml file - are now automatically filled with 'No_tag'.

We added an experimental option to aggregate the exposure: multiple assets on the same hazard site and with the same taxonomy are treated as a single aggregated asset. The feature is meant to be used in the global risk model as a performance hack.

We saved a bit of memory when reading the asset collection in the workers in event based risk calculations, by reading the assets by taxonomy and by setting `max_assets_chunk`. We also managed to increase the reading speed substantially (6x in an example for Pakistan).

Infrastructure risk calculations for large networks could easily run out of memory, since the efficiency loss (EFL) calculation is quadratic in the number of nodes. Now we simply raise an error if the number of nodes exceeds `max_nodes_network`, for the moment hard-coded to 1000.

Calculations starting from ShakeMaps required a pre_job.ini file and then the usual job.ini file. Now everything can be done with a single file, thus making the user experience simpler and less error prone. The two-file approach is still supported (it is useful for instance in sensitivity analysis, when you want to perform multiple calculations without downloading the same ShakeMap multiple times) but it is not mandatory anymore.

We removed the obsolete parameter `site_effects` from the job.ini of scenario-from-ShakeMap calculations. It was meant to manage amplification effects, but such effects are already included in modern ShakeMaps.

Risk outputs
------------

We fixed a bug in `post_risk` when computing the quantiles stored in `aggrisk_quantiles`.

We substantially reduced the data transfer in `avg_losses` (4x for the Philippines) by performing a partial aggregation in the workers.

We changed the risk exporters to export the longitude and latitude of the assets at 5 digits, consistently with the site collection exporter.

We added a method to aggregate the asset collection over a list of geometries, for internal usage.

We added an extractor for `damages-stats`, similar to the existing extractor for `damages-rlzs`, since it was not implemented yet.

There is now an exposure extractor, which however is not accessible from the WebUI for security reasons (the exposure can contain sensitive data). There is also an exporter for the vulnerability functions and one for the job.zip file: the latter, however, is still experimental and known not to work in some cases.

We added an extractor for `losses_by_location`, which aggregates the assets at the same location (i.e. not necessarily at the hazard sites).

We are gzipping the avg_losses datasets, thus reducing the disk space occupation by 2-3 times. This is important in large scenario calculations.

The extractor for aggregated risk and exposure was not supporting multiple tags, i.e. the semicolon syntax as in `aggregate_by = NAME_1, OCCUPANCY; NAME_1; OCCUPANCY`. This is now fixed. Moreover, the exporter `aggexp_tags` exports a file for each tag combination.
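The new extractors can also be called programmatically through the usual `Extractor` API; the following is a minimal sketch for `losses_by_location`, where the calculation id is a placeholder and the query string may accept additional parameters:

```python
# Minimal sketch: extract the new losses_by_location output from a
# completed risk calculation; 1234 is a placeholder calculation id
from openquake.calculators.extract import Extractor

extractor = Extractor(1234)
losses = extractor.get('losses_by_location')
print(losses)
```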
Secondary perils
----------------

The secondary peril implementation is still experimental and this release contains a great deal of changes (see https://github.com/gem/oq-engine/pull/10254) plus a new feature: it is possible to generate the equivalent of hazard curves and maps for secondary IMTs.

Many new models were added, many were fixed, and some model names were changed. Calculations using the old names will fail with a clear error message suggesting the new names to the user.

In the case of landslides, we renamed the `job.ini` parameter `crit_accel_threshold` to `accel_ratio_threshold`. Moreover, we now discard displacement values below 1E-4 to avoid wasting disk space. We also fixed the unit of measure for the slope parameter (m/m) when computing the landslide peril.

There were two important fixes to the TodorovicSilva2022NonParametric model: first, we solved the `NameError: Missing liquefaction:LiqProb in gmf_data` that happened in some cases. Second, we solved a much trickier issue, an overparallelization caused by the underlying machine learning library that could completely hang large calculations.

We changed the `multi_risk` calculator to be more similar to a scenario_risk calculator with secondary perils; in particular the special-case syntax

    multi_peril_csv = {'ASH': "ash_fall.csv", 'LAHAR': "lahar-geom.csv", 'LAVA': "lava-geom.csv", 'PYRO': "pyro-geom.csv"}

has been replaced with

    secondary_perils = Volcanic
    sec_peril_params = [{"ASH": "ash_fall.csv", "LAHAR": "lahar-geom.csv", "LAVA": "lava-geom.csv", "PYRO": "pyro-geom.csv"}]

The plan for the future is to replace the `multi_risk` calculator with a regular `scenario_risk`.

We changed the storage of secondary intensity measure types (IMTs) inside `gmf_data` to keep the secondary peril class as a prefix in the column name: this makes it possible to manage multiple perils within a single computation. Each secondary peril subclass now has a `peril` attribute pointing to `landslide` or `liquefaction`.

If the risk functions contain IMTs which are not in the set of secondary IMTs, a warning is printed, since it means that such risk functions will be ignored. This is a protection against typos. If there are secondary IMTs not covered by the risk functions, an error is raised, since it is impossible to compute the secondary peril.

We added a demo called InfrastructureMultiPeril to showcase a multi-peril risk calculation on a small infrastructure network with four nodes and four edges.

Bug fixes
---------

In the past, users could mistakenly set `max_sites_disagg` to be (much) larger than the total number of sites, thus causing too many postclassical tasks to be generated, with terrible performance. Now the engine takes the total number of sites into consideration when determining the number of postclassical tasks to generate.

Extracting the realization-specific disaggregation outputs in traditional format, with a call like `get("disagg?kind=TRT_Mag_Dist_Eps&imt=SA(0.075)&spec=rlzs-traditional&poe_id=0&site_id=0")`, failed with a numpy error. This is now fixed.

We fixed `getters.get_ebruptures` to work also for scenario calculations.

Some corporate users have no permission to write on their Windows temporary directory and therefore could not run the demos. We worked around this issue. You can still export by using the `oq engine --export-outputs` command and passing to it the path to a writeable directory.

`hcurves-rlzs` and `hmaps-rlzs` were created in single-site calculations even if `individual-rlzs` was not set. This is now fixed.

oq commands
-----------

The command `oq plot rupture` to plot the geometry of a rupture has been slightly improved and we added a command `oq plot build_rupture` to plot planar ruptures (see `oq plot examples` for examples of usage).

We added a command `oq plot_h3` accepting a list of h3 codes and visualizing the hexagons with country borders.
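Since `oq plot_h3` expects h3 codes, a code for a given location can be computed with the `h3` package added in this release (see the IT section below); this is a minimal sketch using the h3 v3 API, with arbitrary coordinates and resolution:

```python
# Minimal sketch: compute an h3 hexagon code to be passed to `oq plot_h3`;
# the coordinates (latitude, longitude) and the resolution are arbitrary
import h3

code = h3.geo_to_h3(41.9, 12.5, 5)
print(code)  # a 15-character hexadecimal h3 index
```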
We extended `oq plot_assets` to also plot the stations, if available.

We added a command `oq info apply_uncertainty` to display the recognized uncertainties in the logic tree.

We removed the `--config-file` option from the `oq engine` command, since there is the OQ_CONFIG_FILE environment variable for the rare cases when it is needed.

We removed the `--reuse-input` flag, since it can be replaced with a cache enabled via the `config.dbserver.cache` flag. The cache mechanism is still experimental and disabled by default.

We added a utility `utils/find_exposure` to find the largest exposure in a directory.

For single site calculations with disagg_by_src it is possible to visualize the most relevant sources with the command `oq show relevant_sources:`.

We extended `oq engine --delete-calculation` to remove both `calc_XXX.hdf5` and `calc_XXX_tmp.hdf5` when deleting a job.

We extended `oq importcalc calc_hdf5` to also import calculation files outside oqdata, by moving them inside oqdata first.

WebUI
-----

We added an endpoint `/v1/calc/:calc_id/exposure_by_mmi` to extract the `exposure_by_mmi` output (in JSON format) for risk calculations from ShakeMaps.

We added an endpoint `/v1/calc/:calc_id/impact` to extract the impact results (in JSON format) for scenario risk calculations.

We added an endpoint `/v1/calc/jobs_from_inis` to determine which job.ini files have already been run (in that case we return the associated job_id) or not (in that case we return 0). This was required by the PAPERS project.

We added a private endpoint `/v1/calc/run_ini` to run calculations starting from a remote .ini file, assuming the client knows its full path name, again for the PAPERS project.

We made the hidden outputs (i.e. exposure and job_zip) visible and downloadable only for users of level 2 or when the WebUI authentication is disabled.

We made it possible to select multiple input files (and not only a single zip file) when running a job via the WebUI.

We no longer show in the WebUI the jobs that have the `relevant` flag set to false. This is a way to hide jobs without removing them.

IT
---

In this release we removed the `pyshp` dependency (since we have `fiona`) and we added Uber's `h3` package (version 3.7.7). This is the last release supporting Python 3.10 and numpy 1.26: in the next release we will upgrade all numpy-dependent libraries (including the geospatial libraries) and we will require at least Python 3.11.

We changed the parallelization parameters to try to ensure at least one GB of RAM per thread, possibly at the cost of using fewer threads than available. For instance, if your machine has 22 threads and 16 GB of RAM, only 16 threads will be used by default. It is still possible, as always, to specify the `num_cores` parameter in the `openquake.cfg` file to explicitly control the number of processes in the process pool. Users on a laptop with limited memory are advised to close the browser and run their calculations from the command line.

AELO and OQImpact
-----------------

Hundreds of pull requests were made in the engine codebase to support the AELO and OQImpact platforms. However, since those are private platforms, the related improvements will not be listed here.