Release notes v3.9
==================

This is a major release featuring several new optimizations, features and bug
fixes. Over 320 pull requests were merged.

For the complete list of changes, see the changelog:
https://github.com/gem/oq-engine/blob/engine-3.9/debian/changelog

## Changes in behavior

There are a couple of important changes in engine 3.9 that *must* be signaled:

1. there is no automatic reduction of the GMPE logic tree anymore: potentially,
   this can cause the generation of redundant outputs
2. the `pointsource_distance` approximation now replaces planar ruptures
   with pointlike ruptures: this can produce differences in the hazard curves

In both cases the engine raises warnings asking the user to take
action if problems are identified. Both changes were motivated by the request of
making the engine less magic. They are fully documented here

https://github.com/gem/oq-engine/blob/engine-3.9/doc/adv-manual/effective-realizations.html

and here:

https://github.com/gem/oq-engine/blob/engine-3.9/doc/adv-manual/common-mistakes.rst#pointsource_distance

## Logic trees

Most of the work on this release went into a deep refactoring of the
logic tree code. From the user perspective the most notable change is
the time needed to parse and manage the source models, which has been
substantially reduced. This is particularly visible in the case of the
complex logic trees used for site specific analysis (we are talking
about orders of magnitude speedups). For continental scale
calculations the speedup is very significant when running in preclassical
mode or for single site calculations, while it is not noticeable - 
compared to the total runtime - in the other cases.

The basic logic tree classes, as well as the code to
manage the uncertainties, have been moved into hazardlib. The change
makes it possible for a power user to introduce new custom
uncertainties with a little Python coding, whereas previously,
adding new uncertainties was extremely difficult, even for a core
developer. Users with an interest on such topics should
contact us and we can give some guidance.

The removal of the *automatic* reduction of the GMPE logic tree
feature allowed substantial simplifications and made it possible to
infer in advance the size of a calculation, thus giving early warnings
in the case of calculations too big to run. It is possible to
reduce the GMPE logic tree *manually*, by discarding irrelevant
tectonic region types (a TRT is irrelevant if there are no sources for
that TRT within the integration distance). The engine will tell you
automatically which are the irrelevant TRTs, even without running a
full calculation.

There is a new (and much requested) feature, the ability to add sources
to a source model as a special kind of uncertainty. The feature is
called `extendModel` and is documented here:

https://github.com/gem/oq-engine/blob/engine-3.9/doc/adv-manual/special-features.rst#extendmodel

A substantial amount of work made it possible collapse logic trees
programmatically. The feature is implemented but not exposed to the
final users (yet).

Even if the engine does not offer any built-in way to plot logic trees, an
example of how you can do it yourself by using the
[ete3](http://etetoolkit.org/) library has been added in
https://github.com/gem/oq-engine/blob/engine-3.9/utils/plot_lt

## New optimizations

There are several new optimizations and improvements.

The most impressive optimization is the enhancement of the point
source collapsing mechanism for *site-specific classical
calculations*.  This can easily give an order of magnitude speedup for
calculations dominated by point sources, i.e. most calculations. The
price to pay is a small reduction in precision, as discussed here:

https://github.com/gem/oq-engine/blob/engine-3.9/doc/adv-manual/site-specific.rst

There is a new demo (in demos/hazard/MultiPointClassicalPSHA) to
demonstrate the feature. For the moment, this feature should be regarded as
experimental and it is *not enabled* by default, unless you set some
parameters in the `job.ini`.

Classical calculations with few sites (meaning fewer than the
parameter `max_sites_disagg`, which has a default value of 10) 
have been optimized too. Not only they are faster, but they require 
less disk space to store the rupture information, since we are now 
compressing the relevant datasets. The change made disaggregation 
calculations faster and more efficient, with a reduced data transfer 
and a lower memory consumption.

Calculations with many sites have not been optimized per se,
but since the task distribution has been improved, avoiding corner
cases where the engine was producing too many tasks or not enough
tasks, it is likely that they will be faster than before. The changes
in the task distribution affect the classical, the disaggregation,
the event based and the ebrisk calculators.

The data transfer in ruptures has been reduced in the event based
and ebrisk calculator, thus saving memory in large calculations.

There were *huge* improvements in the calculation of aggregate loss
curves with the `event_based_risk` and `ebrisk` calculators. Now
they can be computed without the need to store intermediate
asset loss tables (one per each tag combination) and therefore the
required storage space has dropped drastically.

The UCERF calculators have been unified with the regular calculators:
the calculators `ucerf_classical` and `ucerf_hazard` are no more,
just use the regular `classical` and `event_based` calculators; now
they can manage UCERF calculations too. Since the task distribution
has improved, now classical UCERF calculations are a bit faster than
before (say 10-20% faster).

## New features

The disaggregation calculator can now compute the mean disaggregation,
if multiple realizations are specified by the user in the `job.ini`. This
is useful to assess the stability of the disaggregation results.

The ebrisk calculator accepts a new parameter called
`minimum_asset_loss`: by specifying it, losses below the threshold are
discarded in the computation of the aggregate loss curves. This does
not improve the speed of the calculation much, but saves a substantial
amount of memory. Notice that in the calculation of
average losses and total losses the parameter
`minimum_asset_loss` is ignored and losses are not discarded: the
results are exact. It is only the aggregate loss curves that
are approximated. The parameter is experimental and it is there for
testing purposes.

There is a new stochastic `event_based_damage` calculator, which for the moment
should be considered experimental. Specifications for this calculator
are listed in this issue: https://github.com/gem/oq-engine/issues/5339.
The `event_based_damage` calculator allows for the computation of
aggregated damage statistics for a distributed portfolio of assets starting 
from a stochastic event set, with an approach similar to the
`event_based_risk` calculator. Similar to the `scenario_damage`
calculator, the `event_based_damage` calculator also includes 
the ability to compute probabilistic consequences (such as
direct economic costs of repairing the damaged buildings, 
estimates of casualties, displaced households, shelter requirements, 
loss of use of essential facilities, amount of debris generated etc.),
given the appropriate input consequence models. If you
are interested in beta-testing this new calculator, we welcome you to
write to engine.support@openquake.org. 

In order to support the `event_based_damage` calculator, the
`scenario_damage` calculator has been updated. If the field `number`
in the exposure is an integer for all assets, the `scenario_damage` calculator
will employ a damage state sampling algorithm to assign a specific
damage state for every building of every asset. Previously, the 
`scenario_damage` calculator was simply multiplying the probabilities
of occurrence for the different damage states for an asset (gleaned from the
fragility model) by the `number` of buildings to get the expected number of
buildings in each damage state for the scenario. The old behavior is 
retained for exposures that contain non-integral values in the `number` field
for any asset.

Finally, there was work on a couple of new experimental features:

- amplification of hazard curves
- amplification of ground motion fields

These features are not documented yet, because they are not ready.
We will add information in due course.

## hazardlib

[Graeme Weatherill](https://github.com/g-weatherill) extended hazardlib 
so that it is possible to compute Gaussian Mixture Models in the standard deviation
(see https://github.com/gem/oq-engine/pull/5688).

Graeme also implemented Forearc/Backarc Taper in the SERA BC Hydro Model
(see https://github.com/gem/oq-engine/pull/5479), and updated the
Kotha et al SERA GMPE (https://github.com/gem/oq-engine/pull/5475)
and the Pitilakis et al. Site Amplification Model
(https://github.com/gem/oq-engine/pull/5732).

[Nick Horspool](https://github.com/nickhorspool) discovered a typo 
in the coefficient table of the GMPE of Youngs et al (1997) that was 
[fixed](https://github.com/gem/oq-engine/pull/5700).

The INGV contributed three new GMPEs with scaled coefficients, Cauzzi (2014)
scaled, Bindi (2014) scaled and Bindi (2011) scaled
(https://github.com/gem/oq-engine/pull/5682).

[Kendra Johnson](https://github.com/kejohnso)
added the new scaling relationships Allen and Hayes (2017)
(see https://github.com/gem/oq-engine/pull/5535).

[Kris Vanneste](https://github.com/krisvanneste)
discovered a bug in the function `calc_hazard_curves` that was not working 
correctly in the presence of multiple tectonic region types. It has been fixed.

The AvgGMPE class was saved incorrectly in the datastore, causing issues
with the ``--hc`` option. It has been fixed. Moreover now it can be used
with a correlation model if all the underlying GMPEs can be used with a
correlation model.

## Outputs

The exporter for the `events` table has been changed. It exports
two new columns: `ses_id`, i.e. the stochastic event set ID, which is an integer
from 1 up to `ses_per_logic_tree_paths`, and `year`, the year in which
the event happened, an integer from 1 up to `investigation_time`.

The header of the exported file `sigma_epsilon_XXX.csv` has changed,
to indicate  that the values correspond to inter event sigma.

`.rst` has been added to the list of default formats: this means that
now the `.rst` report of a calculation can be exported directly.

The `dmg_by_asset` exporter for `modal_damage_state=true` was buggy, causing
a stddev column to be exported incorrectly. It has been fixed.

There were a few bugs in the `tot_losses` and `tot_curves` exporters in
event based risk calculations which have been fixed (a wrong header and
an inconsistency with the sum of the aggregate losses by tag).

When computing loss curves for small periods the engine was producing NaNs
if there were not enough events to produce reliable numbers. Such NaNs have
been replaced with zeros because the reason for having not enough events
was discarding the small losses.

There was an ordering bug in the exporter of the asset loss curves, causing
the curves to be associated to the wrong asset IDs, in some cases. It has
been fixed.

If `aggregate_by` was missing or empty, ebrisk calculations were exporting
empty aggregate curves files. Now nothing is exported, as it should be.

We fixed a bug with quotes when exporting CSV outputs.

## Bug fixes

We fixed an encoding issue on Windows, so that the calculation descriptions
where incorrectly displayed on the WebUI for UTF8 characters.

We fixed a memory issue in calculations using the `nrcan15_site_term`
GMPE: unnecessary deep copies of large arrays were made and large
calculations could fail with an out of memory error.

[Avinash Singh](https://github.com/AvinashSingh786) pointed out that the
`bin_width parameter` was not passed to
`openquake.hmtk.faults.mtkActiveFaultModel.build_fault_model` in
the Hazard Modellers Toolkit. 
[Graeme Weatherill](https://github.com/g-weatherill) fixed the issue
(https://github.com/gem/oq-engine/pull/5567).

There was a bug when converting USGS ShakeMap files into numpy arrays, since
the wrong formula was used. Fortunately the effect on the risk is small.

The zip exporter for the input files was incorrectly flattening the tree
structure: it has been fixed.

There was a BOM bug (Byte Order Mark: a nonprintable character added by
Excel to CSV files) that was breaking the engine when reading CSV exposures:
it has been fixed.

The procedure parsing exposure files has been fixed and now
`Exposure.read(fnames).assets` returns a list of `Asset` objects
suitable for a line-by-line database importer.

The extract API for extracting ruptures was affected by an ordering bug,
causing the QGIS plugin to display the ruptures incorrectly in same
cases.

We fixed a type error in the command `oq engine --run job.ini --param`.

## New checks

We added a limit on the maximum data transfer in disaggregation, to avoid
running out of memory in large calculations.

We added a limit of 1,000 sources when `disagg_by_src=true`, to avoid
disastrous performance effects.

Setting a negative number of cores in the `openquake.cfg` file, different
from -1, it is now an error.

If the GSIM logic tree file is missing a TRT, a clear error is raised
early.

A source with multiple `complexFaultGeometry` nodes is now invalid, while
before all the nodes except the first were silently discarded.

Instead of silently truncating inputs, now `baselib.hdf5.read_csv`
(used for reading all CSV files in the engine) raises an error when
a string field exceeds its expected size.

Instead of magically inferring the intensity measure levels
from the vulnerability functions, now the engine raises a clear error
suggesting to the user the levels to use.

Case-similar field names in the exposure are now an error: for instance
a header like `id,lon,lat,taxonomy,number,ID,structural` would be an
error since `id` and `ID` are the same field apart from the case.

There is a clear error when instantiating `hazardlib.geo.mesh.Mesh` with
arrays of incorrect shape.

There is a clear error message if the enlarged bounding box of the
sources does not intersect the sites, which is common in case of
mistakes like inverting longitude with latitude or using the exposure
for the wrong country.

## Warnings

Now we raise a warning when there is a different number of levels per IMT.
This helps finding accidental inconsistencies. In the future the warning
could be turned into an error.

We are logging an error message when the bounding box
of a source exceeds half the globe, which is usually a mistake.

We added a warning on classical calculations too big to be run,
based on the product (number of sites) x (number of levels) x 
(max number of gsims) x (max source multiplicity).

We improved the error message for duplicated sites, as well as the
error message for duplicated nodal planes.

We improved the error message in case of CSV exposures with wrong headers.

## oq commands

`oq check_input` was enhanced to accept multiple files. Moreover, it
checks complex fault geometries and prints an error if it discovers
issues, such as the error "get_mean_inclination_and_azimuth() requires
next mesh row to be not shallower than the previous one". Finally,
when applied to exposures, `oq check_input` warns about assets with field
number >= 65536.

`oq reduce_sm` has been parallelized, so it is much faster when there
are multiple files in the source model.

`oq reduce` has been renamed as `oq sample`, to avoid any confusion with
`oq reduce_sm`.

`oq info` has been fixed to work on a zmq cluster, thus avoiding the
dreaded "zmq not started" error. Moreover, `oq info source_model_logic_tree.xml`
now works correctly even for source models in format NRML/0.4. Finally,
the commands `oq info --<what>` have been changed to `oq info <what>` with
`<what>` one of "calculators", "gsims", "imts", "views", "exports",
"extracts", "parameters", "sources", "mfds".

`oq compare -s` has been enhanced to accept a file name with the control sites,
i.e. the sites where to perform the comparison, as a list of site IDs.

`oq run` has now an option `--calc-id`: this is useful when starting a bunch
of calculations in parallel, to avoid race conditions on the datastores.

`oq postzip` sends a zip file to the WebUI and start a calculation;
it also works on a `job.ini` file, by first zipping the associated files.

`oq plot sources?` now works with all kind of sources, except UCERF sources.
For nonparametric sources it is a lot faster than it was, since now it tries
to display only the convex hull of the underlying ruptures. It also has new
features, such as the ability to specify the sources to plot, an upper limit
on the number of sources and the kind of sources to plot.

`oq plot disagg?` has been fixed (there was an annoying
`ValueError: too many values to unpack (expected 1)` when specifying the
poe_id parameters).

`oq plot` accepts a flag `-l, --local` meaning that the local engine server
should be used instead of completely bypassing the server. This is useful
when debugging the web API.

`oq workerpool` immediately starts an oq-zworkerpool process.

## Other

As always there was a lot of documentation work on the 
[advanced manual](https://github.com/gem/oq-engine/tree/engine-3.9/doc/adv-manual)
and on the [Risk FAQ page](https://github.com/gem/oq-engine/blob/engine-3.9/doc/faq-risk.md). 
We also improved the docs about the parallelization 
features of the engine (i.e. openquake.baselib.parallel).

We added a demo for nonparametric sources, one for multipoint sources
and we extended the event based hazard demo to use sources of two different
tectonic region types.

In production installations, if the zmq distribution mode is enabled,
the zmq workers are now started when the DbServer starts. This makes
configuration errors (if any) immediately visible in the DbServer logs.

The configuration file `openquake.cfg` has been cleaned up, by removing
a couple of obsolete parameters.

The module `openquake.hmtk.mapping` has been removed. The reason is that it
depended on the library basemap, which has been abandoned years ago by its
authors and it is basically impossible to install on some platforms, notably
macOS.

The usage of .yml files in the HMTK has been deprecated. In the next release
they will be replaced with .toml files.

There was a lot of activity to make the engine work with Python 3.8
and the latest versions of the scientific libraries. Currently the
engine works perfectly with Python 3.6, 3.7 and 3.8; internally we are
using Python 3.7 for production and Python 3.8 for testing. 
The Linux packages that we are distributing are still using 
Python 3.6, but in the next version of the engine we will fully 
switch to Python 3.8.

The QGIS plugin can now interact with an engine server using a
version of Python with a different pickle protocol, like Python 3.8.