Release notes v3.13
===================

The release 3.13 is the result of 4 months of work involving over
350 pull requests and touching all aspects of the engine. In particular,
this is the minimal release needed to run the new [2020 Euro-Mediterranean 
Seismic Hazard Model (ESHM20)](http://hazard.efehr.org/en/Documentation/specific-hazard-models/europe/eshm2020-overview/) 
and the new [GEM China model](https://www.globalquakemodel.org/GEMNews/china-earthquake-loss-model-2021).

The complete list of changes is available in the changelog:

https://github.com/gem/oq-engine/blob/engine-3.13/debian/changelog

## New features

We have a brand new conditional spectrum calculator documented here:

https://docs.openquake.org/oq-engine/advanced/conditional-spectrum.html

For the moment, it should still be considered experimental; we welcome
help from people wanting to test it. This functionality was implemented
as part of the activities GEM is performing within the METIS project
(https://metis-h2020.eu).

The event_based_damage calculator introduced in version 3.12 has been
revised, unified with the scenario_damage calculator and documented here:

https://docs.openquake.org/oq-engine/advanced/event-based-damage.html

In particular it is possible to work with discrete damage distributions
by setting the parameter `discrete_damage_distribution=true` (the
default is false, which is much more performant). Moreover, a few bugs
were fixed.

We now support four different strategies for computing risk profiles,
documented here:

https://github.com/gem/oq-engine/blob/engine-3.13/doc/adv-manual/risk-profiles.rst

Global site model parameters (i.e. the site model parameters defined
in the job.ini file) are now used if a column is missing in the site
model file, instead of raising an error. Only the site model
parameters required by the GMPEs are imported in the datastore;
additional columns in the site model file are ignored (previously, they
were imported even when not needed).

Finally, the engine now supports cross correlation (i.e. inter-period correlation) when
generating the ground motion fields. Two cross correlation models are
supported at the moment (FullCrossCorrelation and GodaAtkinson2009);
more will likely be added in the future. The feature is documented here:

https://docs.openquake.org/oq-engine/advanced/correlation.html

## Improvements to the classical calculator

In order to support the ESHM20 model, which is too large to run on
most hardware, we had to implement a tiling functionality, so that
a classical calculation can be split into chunks of sites (tiles) that
are then run sequentially. This is slower than the usual strategy of
running everything in a single tile and should be used only as a last
resort, in out-of-memory situations. Also, you should use a small
number of tiles to reduce the performance penalty. For instance,
for a calculation with 100,000 sites, setting `max_sites_per_tile=50000`
in the job.ini file will produce two tiles and halve the memory
consumption.
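
The relation between the number of sites and the number of tiles can be sketched as follows. This is only an illustration of the arithmetic: the helper names `count_tiles` and `split_sites` are hypothetical, and the engine may assign sites to tiles differently (contiguous index ranges are an assumption made here).

```python
import math


def count_tiles(num_sites, max_sites_per_tile):
    """Number of tiles needed so that no tile exceeds max_sites_per_tile."""
    if max_sites_per_tile <= 0:
        raise ValueError("max_sites_per_tile must be positive")
    return math.ceil(num_sites / max_sites_per_tile)


def split_sites(num_sites, max_sites_per_tile):
    """Return one (start, stop) range of site indices per tile.

    Contiguous ranges are an assumption of this sketch, not
    necessarily what the engine does internally.
    """
    return [(start, min(start + max_sites_per_tile, num_sites))
            for start in range(0, num_sites, max_sites_per_tile)]


# The example from the text: 100,000 sites, max_sites_per_tile=50000
print(count_tiles(100_000, 50_000))
print(split_sites(100_000, 50_000))
```

Since the tiles run sequentially, choosing the largest `max_sites_per_tile` that still fits in memory keeps the number of tiles, and therefore the slowdown, as small as possible.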

In order to avoid slow tasks in tiling calculations that would completely
kill the performance, an enormous effort was dedicated to improving the task
distribution, which is now noticeably better than before, in particular
in the presence of CollapsedPointSources and MultiFaultSources. Slow tasks
are still possible in corner cases, but they can be mitigated
by tweaking the parameters `time_per_task` and `outs_per_task`.

The internals of the datastore have changed significantly, in particular
for the `source_info` dataset; there is now an additional `source_data`
dataset. They are used internally to understand the content of the tasks
in terms of sources and the calculation time per source.

The memory occupation has been significantly reduced, even without tiling,
and calculations that were previously running out of memory are now
progressing smoothly. The reduction depends on the model: for instance the
OpenQuake implementation of the USGS model for the United States now
requires half the memory it did before, while the NRCan model for Canada
requires the same memory as before. Some small models could possibly require
more memory than before. The memory reduction is particularly noticeable in
models with few sites and in disaggregation calculations.

The disk space occupation has been significantly reduced by changing
the way the PoEs are stored (zero values are no longer stored). This was
necessary for the ESHM20 model, where the required disk space was
reduced from 256 GB to 110 GB per calculation. Changing the storage
also had a positive effect on performance, since the calculation is
no longer stuck writing the data. Moreover, thanks to the new
storage format, it is much simpler to perform manual post-processing of the
PoEs by using pandas.
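
The idea of not storing zero values can be illustrated with a minimal sketch. It does not reproduce the actual layout of the datastore: the row format `(site_id, level_id, poe)` and the function names are assumptions chosen for illustration only.

```python
def to_sparse(poes):
    """Flatten a sites x levels table into (site_id, level_id, poe) rows,
    skipping the zero values."""
    return [(sid, lvl, poe)
            for sid, row in enumerate(poes)
            for lvl, poe in enumerate(row)
            if poe != 0.0]


def to_dense(rows, num_sites, num_levels):
    """Rebuild the full table; missing entries are implicitly zero."""
    dense = [[0.0] * num_levels for _ in range(num_sites)]
    for sid, lvl, poe in rows:
        dense[sid][lvl] = poe
    return dense


# Three sites, four intensity levels; most PoEs are zero
poes = [[0.9, 0.2, 0.0, 0.0],
        [0.0, 0.0, 0.0, 0.0],
        [0.5, 0.0, 0.0, 0.0]]
rows = to_sparse(poes)
print(len(rows), "values stored instead of", 3 * 4)
```

A long table of this kind maps naturally onto a pandas DataFrame with one column per field, which is why filtering and grouping the PoEs by hand becomes simpler.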

We changed the approach used when computing the hazard curves and the
statistics: it is now several times faster and uses far less memory.
The improvement has come both from the change in storage (there is less data
to read) and the usage of numba in `get_slices`. The effect is dramatic
in the ESHM20 model, much less so in smaller models.

The documentation of the point source gridding approximation has been
improved, with a section about how to make the Canada model 26x faster:

https://docs.openquake.org/oq-engine/advanced/point-source-gridding.html

We improved the precision of the `pointsource_distance` approximation:
this means that you can now use smaller values for the `pointsource_distance`
parameter and get better performance without losing precision.

Finally, the preclassical phase of the classical calculator has been
moved to its own independent calculator. That means that one can
run a preclassical analysis only once and then run different classical
calculations starting from the same sources by using the `--hc`
option. This is also useful when debugging, in case the preclassical
phase is expensive, i.e. for large models.

## Changes to the disaggregation calculator

There was a big change in the CSV exporters, with the goal of reducing
the proliferation of outputs. For instance a calculation with N=2
sites, M=10 intensity measure types, P=5 poes_disagg and R=100
realizations and 7 kinds of disaggregation, previously would generate
N * M * P * R * 7 = 70,000 files. Now it only generates N * 7 = 14
files. The trick was to add a column for the IMT, a column for the PoE
and a column for the realization index. People parsing the CSV outputs will
have to update their scripts.
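
The file counts quoted above follow from simple arithmetic, sketched here with a hypothetical helper `disagg_file_counts` (not part of the engine):

```python
def disagg_file_counts(n_sites, n_imts, n_poes, n_rlzs, n_kinds):
    """Return (files before, files after, combinations per file).

    Before: one file per (site, IMT, PoE, realization, kind).
    After: one file per (site, kind); IMT, PoE and realization
    become columns inside each file.
    """
    before = n_sites * n_imts * n_poes * n_rlzs * n_kinds
    after = n_sites * n_kinds
    combos_per_file = n_imts * n_poes * n_rlzs
    return before, after, combos_per_file


# N=2 sites, M=10 IMTs, P=5 poes_disagg, R=100 realizations, 7 kinds
before, after, combos = disagg_file_counts(2, 10, 5, 100, 7)
print(before, after, combos)  # 70000 14 5000
```

Each of the 14 files now covers the 5,000 combinations of IMT, PoE and realization that were previously spread over separate files, so a script has to filter on those three columns to recover a single old output.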

In the single site case, there is now a warning when a realization
does not contribute to the disaggregation. This helps to identify
pathological situations.

We fixed a bug in the `Mag_Lon_Lat` exporter: the order of
the columns was wrong and the fields mag, lon, lat were actually
containing the values of lon, lat, mag.

We added a command `oq info disagg` to print all 7 kinds of supported
disaggregation outputs.

We added a FAQ about how to compute mean disaggregation outputs, since
many users asked; see

https://github.com/gem/oq-engine/blob/engine-3.13/doc/faq-hazard.md#how-can-i-compute-mean-disaggregation-outputs

Finally, we removed the long-deprecated XML exporters.

## Improvements to the event based calculator

We significantly reduced the slow tasks affecting event based calculations,
both for hazard and risk.

We added a check for missing GSIMs in the job_risk.ini file, to
avoid confusing error messages in the middle of the computation.

We extended the mag-dependent filtering to event based calculations; before
it was honored only in classical calculations.

The `custom_site_id` is now exported also by the GMF exporters.

When using the `--hc` option the engine was using the site collection
of the parent calculation and ignoring the site collection of the
child calculation: this is now fixed.

There is a new parameter `gmf_max_gb`, with default value 0.1,
which is used to decide when to store the `avg_gmf` dataset.

We improved the documentation on the rupture sampling mechanism:

https://docs.openquake.org/oq-engine/advanced/rupture-sampling.html

## Logic trees

We improved the support for source specific logic trees (i.e. logic
trees with an `applyToSources` for each source) and documented it
here:

https://docs.openquake.org/oq-engine/advanced/sslt.html

The branch IDs in the gsim logic tree file are now ignored and
the user can skip them altogether. The change was implemented
to fix a subtle bug causing incorrect branch paths to be listed
in the output "Realizations" in case of duplicated branch IDs.

We changed the string representation of logic tree paths and 
added the commands `oq show branches` and `oq show rlz:<no>`
to allow the user to switch easily from the branch path to the
corresponding source model, source parameters and GMPE. They
are documented here:

https://docs.openquake.org/oq-engine/advanced/logic-trees.html

Due to the changes above, the numbering of the realizations can be
different between engine 3.13 and previous versions, and calculations
using sampling can produce slightly different averages since different
realizations may be chosen internally. This is akin to a change of
random seed, i.e. it is not a physically significant change.
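
The analogy with a change of random seed can be illustrated with a toy sampler. This is a sketch under stated assumptions: the branch labels and weights are invented, and `sample_realizations` is a hypothetical helper based on the standard library, not the engine's sampling algorithm.

```python
import random


def sample_realizations(branches, num_samples, seed):
    """Draw branch labels with probability proportional to their weights."""
    rng = random.Random(seed)
    labels = [label for label, _ in branches]
    weights = [weight for _, weight in branches]
    return rng.choices(labels, weights=weights, k=num_samples)


def frequency(samples, label):
    """Fraction of the samples equal to the given label."""
    return samples.count(label) / len(samples)


branches = [("gmpe_A", 0.5), ("gmpe_B", 0.3), ("gmpe_C", 0.2)]
renumbered = list(reversed(branches))  # same branches, different order

# Same seed, different internal ordering: individual picks may differ
print(sample_realizations(branches, 10, seed=42))
print(sample_realizations(renumbered, 10, seed=42))

# With many samples the frequencies approach the weights in both cases
for tree in (branches, renumbered):
    print(frequency(sample_realizations(tree, 20_000, seed=42), "gmpe_A"))
```

The individual samples depend on the ordering, but the sampled frequencies, and hence the averages built from them, stay close to the branch weights either way.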

There is now a limit on the number of branches per branchset in the
logic tree (94 branches).

We changed the experimental feature `collapse_gsim_logic_tree` to use 
the class `AvgPoeGMPE` instead of `AvgGMPE`: in this way it is possible to
compute exactly the average mean curves in the case of full enumeration,
even when the number of realizations is too large to use the traditional
approach.

We fixed the `extendModel` feature that was not working in presence
of multiple files per source model logic tree branch.

Running a calculation with full enumeration and more than 15,000
realizations now raises an error unless the user raises the
parameter `max_potential_paths` in the job.ini file. This is useful
to avoid out-of-memory issues at the end of the calculation, in
the postclassical phase.
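
To see how quickly full enumeration grows, consider a logic tree in which every branchset applies to every path, so that the number of realizations is the product of the branch counts. The sketch below makes that assumption explicitly; the function names are hypothetical and the real check in the engine may count paths differently.

```python
import math


def num_potential_paths(branches_per_branchset):
    """Product of the branch counts, assuming independent branchsets."""
    return math.prod(branches_per_branchset)


def check_paths(branches_per_branchset, max_potential_paths=15_000):
    """Raise early if full enumeration would exceed the limit.

    The default of 15,000 is the threshold quoted in the text.
    """
    n = num_potential_paths(branches_per_branchset)
    if n > max_potential_paths:
        raise ValueError(
            f"{n} potential paths exceed max_potential_paths="
            f"{max_potential_paths}")
    return n


print(check_paths([3, 4, 5]))  # 60 realizations, well under the limit
try:
    check_paths([30, 30, 30])  # 27,000 realizations
except ValueError as exc:
    print(exc)
```

Three branchsets with 30 branches each are already enough to cross the threshold, which is why sampling or logic tree collapsing is often preferable to raising the limit.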

## hazardlib

As usual, many new GMPEs were contributed:

- [Miguel Leonardo-Suárez](https://github.com/mleonardos)
  contributed the GMPEs Arroyo et al. (2010) and
  Jaimes et al. (2020).

- [Chiara Mascandola](https://github.com/mascandola)
  contributed a new GMPE LanzanoEtAl2019_RJB_OMO
  and fixed a bug in the SgobbaEtAl2020 GMPE. She also contributed
  new variations of Lanzano & Luzi (2019), Skarlatoudis et al. (2013),
  and BCHydro GMPEs.

- [Giuseppina Tusa](https://github.com/gtus23) contributed the Tusa-Langer-Azzaro (2019) GMPE.

- Thanks to EDF we added support for the EAS, FAS and DRVT intensity
  measure types, used in the new GMPEs Bora et al. (2019) and Bayless &
  Abrahamson (2018).

- We added the GMPEs of Bahrampouri et al. (2021) for the Arias intensity
  measure type.

- The `backarc` site parameter used to have only two possible values, 0
  for "fore arc" and 1 for "back arc"; now the value 2 for "along arc"
  is accepted too. The change was made to support the GMPE Manea (2021),
  contributed by [Elena Manea](https://github.com/ElenaManea) and 
  [Laurentiu Danciu](https://github.com/danciul).

A few bugs were also fixed:

- We fixed a bug in `recompute_mmax` (returning square meters instead
  of square kilometers) affecting logic trees changing the slipRate.

- We fixed a compatibility bug with the SMTK causing the error
  `'RuptureContext' object has no attribute 'occurrence_rate'`
  when running the SMTK tests.

- We fixed an array<->scalar bug in the GMPE Abrahamson-Gulerce (2020).

- Several essential bug fixes and improvements were made on the new
  MultiFaultSource and KiteSurface classes, needed for the GEM China model
  and others.

- [Riccardo Zaccarelli](https://github.com/rizac) found a typo in the 
  intensity measure type RSD that was fixed.

There were a few other changes and new features:

- We changed `hazardlib.valid.gsim` to return a correctly instantiated
  GSIM or to fail. Before - for GMPETable subclasses - it was
  returning a partially initialized GSIM to be post-initialized later
  on. Thanks to [Bruce Worden](https://github.com/cbworden) for
  pointing this out.

- We changed the API of `get_mean_stds`: there is no need to specify the
  standard deviation anymore, since it always returns all three standard
  deviations ($\sigma, \tau, \phi$) on top of the mean ($\mu$). We improved its
  documentation in the advanced manual.

- We added a warning when the mesh size of MultiFaultSources and
  NonParametricSources is over one million points: the goal is to warn
  the user against building sources that are then too big to compute.

- We added a new method `add_between_within_stds` to the modifiable
  GMPE to compute spatially correlated ground-motion fields
  even when the initial GMPE only provides the total standard
  deviation. To be used with care.

- We added a check to the `geo.Line` class to ensure that every Line object
  has at least two points.

- We improved the error message in ShakeMap calculations failing due to the
  correlation matrix not being positive definite; the message now points to
  the manual: https://docs.openquake.org/oq-engine/advanced/shakemaps.html#correlation

- [Yen-Shin Chen](https://github.com/vup1120) contributed a new scaling 
  relationship Thingbaijam et al. (2017) for strike-slip.

- [Manuela Villani](https://github.com/orgs/gem/people/ManuelaVillani)
  contributed a new method `horiz_comp_to_geom_mean`
  to the ModifiableGMPE class to convert ground-motion between
  different representations of the horizontal component.
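
As background for the items on `get_mean_stds` and `add_between_within_stds` above: the total standard deviation is conventionally related to the between-event ($\tau$) and within-event ($\phi$) components by $\sigma^2 = \tau^2 + \phi^2$. The sketch below splits a total $\sigma$ given an assumed ratio $\phi/\tau$; both the function `split_total_std` and its `within_between_ratio` argument are hypothetical, and the formula used by the engine is not claimed to be the same.

```python
import math


def split_total_std(sigma, within_between_ratio):
    """Split a total std into (tau, phi) such that
    sigma**2 == tau**2 + phi**2 and phi / tau == within_between_ratio.
    """
    if within_between_ratio <= 0:
        raise ValueError("the ratio must be positive")
    tau = sigma / math.sqrt(1.0 + within_between_ratio ** 2)
    phi = within_between_ratio * tau
    return tau, phi


# Illustrative values only
tau, phi = split_total_std(0.6, 1.5)
print(round(tau, 4), round(phi, 4), round(math.hypot(tau, phi), 4))
```

Because the ratio is an external assumption rather than something the original GMPE provides, results obtained this way should be handled with the same care recommended above.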

Finally there was some refactoring and 15 classes in the module
`hazardlib.gsim.boore_2014` have been replaced with a single parametric GMPE,
so they cannot be called directly anymore, but rather via the
`valid.gsim` factory function, which is the recommended way for all GMPEs.

## Risk fixes and improvements

We renamed the field "conversion" to "risk_id" in the header of the
taxonomy mapping file. The old name is still valid.

We fixed a bug in presence of continuous fragility functions with
`minIML` equal to `noDamageLimit`, causing damages where there should
have been no damage.

We fixed a bug in event based risk calculations with non-trivial taxonomy
mapping producing NaNs in the event loss table.

We added a "reaggregate_by" feature in event based risk calculations,
documented here:

https://github.com/gem/oq-engine/blob/engine-3.13/doc/adv-manual/risk-profiles.rst

We improved the error checking when reading the taxonomy mapping file.

We added a check for invalid consequence files, to get an early error
and not an error in the middle of the risk calculation. Also,
specifying consequence files without fragility files now raises an early
error.

We added a check for missing investigation_time in classical risk
calculations.

We fixed a serious performance bug when using `ignore_master_seed=true`
that caused a 60x slowdown in the event based risk calculation for China.

We implemented secondary perils in the risk side. This is still an
experimental feature.

When using the `--hc` option there is now a check making sure that the
intensity measure levels are consistent between child calculation
and parent calculation.

We fixed a bug when using `aggregate_by = site_id` in presence of
a parent calculation.

The aggregation of losses in event based risk calculations has been
optimized even more, and unified with the aggregation in event based
calculations.

The damage outputs have been unified with the risk outputs and now we
have only two possibilities, both for risk and damage calculations: an
`aggrisk` output and an `aggcurves` output. Both are pandas-friendly
to help post-processing of the results. As a consequence, the
exported CSV files are also more similar between risk and damage outputs.

Consequence models in XML format have been deprecated: you should use
solely the CSV format.

## Other fixes and changes

We renamed the parameter `individual_curves` to `individual_rlzs`
since it now applies not only to hazard curves but to all kinds of outputs.
The old name is still valid as an alias.

The `--log-file` option in the command `oq engine` was not honored:
this is now fixed.

We fixed running a calculation starting from a parent calculation owned
by a different user.

Mixed list-scalar maximum distances (for instance
`maximum_distance = {'TRT_A': 100, 'TRT_B': [[4, 100], [8, 200]]}`)
were previously invalid: this has been fixed.
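
One way to picture a mixed specification is to normalize every entry to a list of (magnitude, distance) pairs and then look up the distance for a given magnitude. The sketch below is an illustration only: the magnitude bounds, the linear interpolation and the function names are assumptions, not the engine's implementation.

```python
MIN_MAG, MAX_MAG = 1.0, 10.0  # hypothetical bounds used for scalar entries


def normalize(maximum_distance):
    """Turn scalars into constant (mag, dist) lists; sort explicit lists."""
    out = {}
    for trt, value in maximum_distance.items():
        if isinstance(value, (int, float)):
            out[trt] = [(MIN_MAG, float(value)), (MAX_MAG, float(value))]
        else:
            out[trt] = sorted((float(m), float(d)) for m, d in value)
    return out


def distance_for(pairs, mag):
    """Distance at a magnitude, linearly interpolated and clamped at the ends."""
    if mag <= pairs[0][0]:
        return pairs[0][1]
    if mag >= pairs[-1][0]:
        return pairs[-1][1]
    for (m0, d0), (m1, d1) in zip(pairs, pairs[1:]):
        if m0 <= mag <= m1:
            return d0 + (d1 - d0) * (mag - m0) / (m1 - m0)


norm = normalize({'TRT_A': 100, 'TRT_B': [[4, 100], [8, 200]]})
print(distance_for(norm['TRT_A'], 6.0))  # scalar entry: constant distance
print(distance_for(norm['TRT_B'], 6.0))  # halfway between 100 and 200
```

Normalizing first means the lookup code does not need to distinguish between scalar and list entries.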

The `custom_site_id` site model parameter is now honored in all
calculators, not only in classical calculators. Moreover we
changed it from being a 32-bit integer to a 6-character ASCII string
and we documented it:

https://docs.openquake.org/oq-engine/advanced/special-features.html
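
Before running a calculation, the identifiers in a site model file can be pre-checked with a few lines of Python. This is a sketch with a hypothetical helper, assuming that identifiers of up to 6 ASCII characters are acceptable; the linked documentation is the authority on the exact rules.

```python
def validate_custom_site_id(value, max_len=6):
    """Return the identifier encoded as ASCII bytes, or raise ValueError.

    Assumes identifiers are non-empty, ASCII-only and at most
    max_len characters long.
    """
    if not value.isascii():
        raise ValueError(f"{value!r} contains non-ASCII characters")
    if not 0 < len(value) <= max_len:
        raise ValueError(f"{value!r} must have 1 to {max_len} characters")
    return value.encode("ascii")


for candidate in ["abc123", "toolong", "s\u00e91"]:
    try:
        print(candidate, "->", validate_custom_site_id(candidate))
    except ValueError as exc:
        print("invalid:", exc)
```

Integer identifiers from older site models can still be used after converting them to strings, provided they fit in the available characters.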

`DataStore.read_df` now returns strings and not bytes for fields stored
as bytes.

`readinput.get_composite_source_model(oqparam, branchID)`
now accepts a `branchID` parameter; this is useful for Hamlet.

`datastore.read` now accepts a flag `read_parent`; by default it is
True, but it can be set to False to avoid reading the parent of a calculation
(if any). This is useful in postprocessing scripts.

`get_oqparam` now does not instantiate a SourceModelLogicTree object
and thus avoids parsing the entire source model every time. For the
Australia model that reduces the instantiation time from 47 seconds to
0.7 seconds. That makes it possible to assess the size of a source model
very quickly.

## oq commands

The logic behind the command `oq check_input job_haz.ini job_risk.ini`
has changed: now the risk files are checked first, so errors are
discovered immediately and not after a slow check of the hazard files.

The command `oq prepare_site_model` has been extended to accept vs30 files
in .csv.gz or .hdf5 format. We also added a utility `vs30tohdf5` to
convert the global vs30 .geotiff map provided by the USGS
(downloadable from https://earthquake.usgs.gov/data/vs30/)
into a compressed HDF5 file suitable for use with `oq prepare_site_model`.

The command `oq prepare_site_model` has also been extended to work for exposures
containing assets distant from all site parameters: in that case a warning
is printed. Previously, the site model could not be generated.

We added a command `oq compare uhs` to compare uniform hazard spectra between
two or more calculations, similar to the old commands
`oq compare hcurves` and `oq compare hmaps`.

We fixed a small bug in `oq zip job_haz.ini -r job_risk.ini`: now it works
even for empty oqdata directories.

We fixed `oq plot uhs` in presence of non-SA intensity measure types.

We added a command `oq plot source_data`.

We added a command `oq info cfg` to show the location of the
engine configuration file.

We extended the command `oq plot avg_gmf` so it can plot the differences
between two average GMFs.

## IT and WebUI changes

We upgraded a few libraries: numpy to version 1.20, scipy to version
1.7, h5py to version 3.1 and GDAL to version 3.3.3. This allowed us
to support Python 3.9.  Python 3.7 and 3.8 are also supported, while
Python 3.6 is deprecated since it has reached its end of life and it
is not supported by python.org.  The engine will probably stop working
with Python 3.6 in the next release.  Python 3.10 is not officially
supported yet.

We removed the dependency on python-prctl, which was not needed anymore.

We removed all upper bounds on the dependencies to make it possible
for people to install the engine with newer libraries. Thanks to that
the engine now unofficially works on macOS with an M1 processor by
using Python 3.9 and the latest libraries. However, that requires
manual tweaking and it is not officially supported by GEM. We cannot
support the M1 processor until GitHub provides support for automatic
testing on this platform.

Some infrastructure work for dynamically deploying the engine on a
Kubernetes cluster has been started, and the WebUI was changed to
support such an environment. In particular now invalid input files
raise an error *before* spawning a new machine. Also, small calculations
are recognized and executed on the master machine, without spawning
anything.

The WebUI has been changed to store the input files in the
temporary directory determined by the `custom_tmp` parameter in the
file `openquake.cfg`. Moreover we added a parameter `mosaic_dir`
in `openquake.cfg` that allows the engine to read (big) files from
a predefined global directory, so that we can avoid uploading huge files
to the WebUI.

We also extended the WebUI to run sensitivity analysis calculations.

An annoying warning about not having the latest engine release has
been removed.

We extended the systemd services to work on multiple Linux distributions
and not only on Ubuntu.

We extended and improved the installation script `install.py` in
various ways. For instance now it is mandatory to pass the kind of
installation to perform. Optionally, it is also possible to pass a
port for the DBServer.

When using `install.py --version <branch>`
the latest commit of a branch is downloaded and stored so that
`oq engine --version` prints the git hash.

We removed the `multi_user` flag from the file `openquake.cfg`:
now an installation is automatically considered to be a multi-user
installation if the engine was installed with root permissions. Multi-user
installations are meant for Linux servers only.

We removed Celery from the requirements files, since it has been deprecated
and unused for years.

We added experimental support for ipyparallel and for Ray.