Release notes v3.6
==================

This release introduces several major new features (including completely revised disaggregation, automatic optimization of duplicated sources, a fast exposure importer and taxonomy mapping) and lots of improvements, new checks and bug fixes. Nearly 200 pull requests were merged. For the complete list of changes, see the changelog: https://github.com/gem/oq-engine/blob/engine-3.6/debian/changelog

Disaggregation
--------------

The most relevant development on the hazard side was the work on the disaggregation calculator. We substantially changed the business logic. While in previous versions of the engine we were disaggregating for all possible realizations in order to compute disaggregation statistics, in this version we gave up on statistics. Instead, we disaggregate only for a specific realization. The realization can be specified by the user with the `rlz_index` parameter in the job.ini file, or it can be determined automatically by the engine as the realization closest to the mean curve for the given disaggregation site.

Moreover, the disaggregation calculation now works like a post-calculator (i.e. with the ``--hc`` option) and it is able to reuse information computed in its parent calculation: the net effect is that it is always faster than the corresponding classical calculation, while in the past it was several times slower.

We also fixed a couple of performance bugs: there was a slow operation ``truncnorm.cdf`` in an inner loop and ruptures outside the integration distance were not discarded. Finally, we changed the file names of the disaggregation outputs.

For models with thousands of realizations, the disaggregation can easily be thousands of times faster than before.

Classical PSHA calculator
-------------------------

The engine is now smart enough to recognize duplicated sources appearing in different branches of the composite source model and to avoid redundant computations.
Because this optimization is always on, the flag `optimize_same_id_sources` has been removed, since it no longer has any effect. There are several models in the hazard mosaic with duplicated sources and the new optimization has a significant impact on those. Moreover, the demo `LogicTreeCase2ClassicalPSHA` has become an order of magnitude faster than before thanks to the reduction of the duplicated sources.

There was a big improvement in the computation of the statistical hazard curves, which is now not only faster but also uses a lot less memory than before. The trick was to consider one site at a time, instead of a block of sites. As a consequence it is now possible to consider tens of thousands of realizations for hundreds of thousands of sites without requiring terabytes of RAM. Moreover, the data transfer has been reduced by storing some auxiliary information in the datastore and reading it from the workers instead of transferring it via celery/rabbitmq.

There was a substantial change in the way the tasks are distributed for a classical calculation. The engine has acquired the ability to estimate the runtime of each task and, if the estimated time exceeds a `task_duration` parameter, the engine is able to split the task into subtasks that run in less than `task_duration` seconds. The user can set the `task_duration` manually in the `job.ini`, or leave it empty; in that case the engine will figure out a reasonable value for it. The approach is not perfect: since there are non-splittable sources, there is a minimum size for a given subtask, and subtasks taking much longer than the `task_duration` parameter may still appear. Nevertheless, the new approach is a drastic improvement over all previous versions.

We added a check on sources with a suspiciously large spatial extent (more than 5,000 km) so that a warning is printed. Usually this means that there was a bug in the generation of the source model.
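
As a rough illustration of the `task_duration` splitting described earlier in this section, the sketch below groups per-source runtime estimates into subtasks that stay within a time budget. It is not the engine's implementation: the function name `split_by_duration` and the greedy grouping strategy are assumptions made only for this example. It does show why a non-splittable source can still produce a subtask longer than `task_duration`.

```python
def split_by_duration(estimated_times, task_duration):
    """Greedily group runtime estimates (in seconds) into subtasks.

    Hypothetical helper, not part of the engine API. A subtask is closed
    as soon as adding the next source would exceed `task_duration`.
    A single source that is longer than the budget cannot be split,
    so it ends up alone in an oversized subtask.
    """
    subtasks, current, total = [], [], 0.0
    for seconds in estimated_times:
        if current and total + seconds > task_duration:
            subtasks.append(current)
            current, total = [], 0.0
        current.append(seconds)
        total += seconds
    if current:
        subtasks.append(current)
    return subtasks


if __name__ == "__main__":
    # The source estimated at 20 s exceeds the 10 s budget on its own
    for subtask in split_by_duration([3, 4, 5, 20, 1], task_duration=10):
        print(subtask, sum(subtask))
```

In this toy model the only way to shrink the oversized subtask is to make the source itself splittable, which matches the limitation mentioned above.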
We added a check on sources with suspicious hypo-depths and nodal plane distributions (i.e. distributions with duplicated values) since they make the calculation slower.

In extra-large models saving some debugging information (e.g. the number of sites affected by each source) was exceedingly slow, so now the information is stored only if there are fewer than 100,000 relevant sources.

Logic trees
-----------

There was a tricky bug with the `minimum_distance` feature in the presence of multiple GSIMs in a logic tree branchset. Now each GSIM keeps its own minimum distance; before, they were all getting the same minimum distance, causing wrong results to be computed. Fortunately the `minimum_distance` feature is rarely used (and only for internal purposes) so the bug is minor. The feature is documented here: https://github.com/gem/oq-engine/blob/engine-3.6/doc/adv-manual/special-features.rst#gmpe-logic-trees-with-minimum_distance

We implemented zero weights for intensity measure types that should be discarded in the GSIM logic tree. You can see the relevant documentation here: https://github.com/gem/oq-engine/blob/engine-3.6/doc/adv-manual/special-features.rst#gmpe-logic-trees-with-weighted-imts

We implemented risk logic trees, a.k.a. the *taxonomy mapping* feature. The idea is that users can map the taxonomy strings in their exposure to one or more vulnerability/fragility functions, with corresponding weights for each function assignment, to take into account the epistemic uncertainty in the exposure ⟷ vulnerability domain. The feature is documented here: https://github.com/gem/oq-engine/blob/engine-3.6/doc/adv-manual/risk-features.rst

A big conceptual change (but with no impact on the user) was the simplification of the source model logic tree XML file. Before, it was necessary to specify a `logicTreeBranchingLevel` node that was not used internally; now that node is optional.
Old files will keep working, as long as the `logicTreeBranchingLevel` contains only a single subnode. The case of multiple subnodes is now correctly flagged as an error. Thanks to the change, source model logic trees, GSIM logic trees and risk logic trees are now stored in the same way internally.

Lastly, we fixed a bug in source model logic trees with the options `applyToSources` and `applyToBranches` on; in some cases a spurious error was raised about the source not being in the source model, even if it actually was.

Event based hazard
------------------

We introduced a parameter `max_sites_per_gmf` in the job.ini (only for `event_based` calculations that are trying to store ground motion fields), with a default of 65,536 sites. A user trying to run an `event_based` calculation that has `ground_motion_fields = true` with more sites than permitted by `max_sites_per_gmf` will now get an early validation error instead of running out of memory after several hours of calculations. The `max_sites_per_gmf` limit can be raised beyond the default of 65,536 sites, at the user's own responsibility.

We also added a limit of ``2**32`` events in event based calculations: this is a hard limit that cannot be raised. If your calculation produces more than 4 billion events, it will need to be split into smaller calculations. Calculations involving billions of ruptures would likely never work anyway, because they would eventually run out of memory.

We added a check for missing ``intensity_measure_types``: this avoids cryptic errors in the middle of the computation of the ground motion fields.

We optimized the rupture sampling procedure for point sources (which includes multi point sources and area sources). The improvement can be quite significant: for instance, the generation of ruptures for a large multipoint source for Colombia became 30x faster using 12x less memory.
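
The two limits introduced earlier in this section (`max_sites_per_gmf` and the ``2**32`` cap on events) amount to a pre-flight check. The sketch below illustrates that logic under stated assumptions: the function `check_event_based_limits` and its error messages are hypothetical and do not reproduce the engine's own validation code; only the two numeric limits come from these notes.

```python
MAX_EVENTS = 2 ** 32                # hard limit, cannot be raised
DEFAULT_MAX_SITES_PER_GMF = 65_536  # default, can be raised in the job.ini


def check_event_based_limits(num_sites, num_events,
                             ground_motion_fields=True,
                             max_sites_per_gmf=DEFAULT_MAX_SITES_PER_GMF):
    """Fail early instead of running out of memory hours later.

    Hypothetical helper written for illustration only.
    """
    # The site limit only matters when the GMFs are going to be stored
    if ground_motion_fields and num_sites > max_sites_per_gmf:
        raise ValueError(
            f"{num_sites} sites exceed max_sites_per_gmf={max_sites_per_gmf}")
    # The event limit applies regardless of the other settings
    if num_events > MAX_EVENTS:
        raise ValueError(
            f"{num_events} events exceed the hard limit of {MAX_EVENTS}; "
            "split the calculation into smaller ones")


if __name__ == "__main__":
    check_event_based_limits(num_sites=50_000, num_events=10 ** 6)
    # Raising the site limit is allowed, at the user's own responsibility
    check_event_based_limits(100_000, 10 ** 6, max_sites_per_gmf=200_000)
```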
We changed the way ruptures are stored internally: the `code` field in the `ruptures` dataset is now a unique checksum depending on the kind of rupture. Before, it was an incremental number depending on the order of the Python module imports, which made debugging difficult.

The rupture CSV exporter has been enhanced, and now it exports the rupture surface boundaries as 3D multipolygons instead of 2D multipolygons.

We fixed a small bug in the rupture XML exporter, which was failing if the user did not specify the hazard sites.

We added the ability to generate hazard curves without storing the GMFs, simply by setting the flags

```
hazard_curves_from_gmfs = true
ground_motion_fields = false
```

This is useful when one is interested in the hazard curves generated by an `event_based` calculation but not in the ground motion fields themselves. Not storing the GMFs reduces the data transfer and the memory usage.

In engine 3.5 we changed the `gmf_data` CSV exporter to export a file ``sitemodel.csv`` instead of the file ``sitemesh.csv``. That change has been reverted because it was generating confusion. The right way to export the site model information for the most recently completed calculation, which works for all calculators and not only for event based, is to use the command ``oq show sitecol > sitecol.csv``

Importing GMFs from CSV has been enhanced and it no longer requires the field `rlzi`: previously, this was a required field, but it was assumed to always contain the value `0`. On the other hand, the GMF exporter to CSV no longer exports the field `rlzi`, because it is redundant: the association between events and realizations can be found in the events table and it is exported in the file `sigma_epsilon.csv`.
In the `sigma_epsilon.csv` file, we renamed the field `eid` to `event_id` in order to avoid confusion with the naming used in the `gmf_data.csv` file (`event_id` is the 64-bit event ID in the `events` table in the datastore, `eid` is the 32-bit index to the event ID record).

Event based risk
----------------

There was a huge refactoring of all risk calculators. As a consequence the `event_based_risk` calculator has become simpler and faster than before (twice as fast in some cases).

In the `ebrisk` calculator it is now possible to aggregate by `asset_id` and therefore to produce individual loss curves and maps for each asset. Needless to say, this is only viable for exposures of manageable size.

There was some work to make the ``losses_by_event`` exporter for the ``ebrisk`` calculator more similar to the ones for ``event_based_risk`` and for ``scenario_risk``.

We fixed a bug in the `agg_curves-rlzs` and `agg_curves-stats` outputs in ``ebrisk``: they were missing the ``units`` compared to the same outputs coming from the ``event_based_risk`` calculator. This was breaking the QGIS plugin.

We changed the ``agglosses`` exporter in ``scenario_risk`` calculations, by adding a column with the realization index.

The `agg_curves` exporter for event based risk was broken if the exposure was imported in the parent calculation and not in the child calculation; this has been fixed.

We fixed a bug in the exporter of the aggregate loss curves coming from an ``ebrisk`` calculation: now the loss ratios are computed correctly even in the presence of occupants. Before, the exporter was writing incorrect loss ratios to the output file.

Hazardlib
---------

Graeme Weatherill (@g-weatherill) contributed a finite rupture option to the Germany-adjusted Cauzzi and Derras GMPEs. Moreover, he contributed the Tromans et al. (2019) adjustable GMPE, used for a nuclear power plant in the UK.

Chris van Houtte (@cvanhoutte) contributed the Van Houtte et al. (2018) significant duration model for New Zealand.
Robin Gee (@rcgee) fixed a bug in the GMPE Sharma (2009): there was a key error if the intensity measure level specified in the job.ini included periods that required interpolation.

Marco Pagani (@mmpagani) discovered a bug in `calc_hazard_curves` which was failing with a cryptic `AttributeError: 'NoneType' object has no attribute 'within_bbox'` when used in parallel mode. It has been fixed.

Risk
----

The CSV importer for the exposure has been optimized. Before, for legacy reasons, the importer was converting the CSV records into node objects similar to the ones coming from the XML importer and then it was reusing the XML logic. Now we are doing the opposite: the XML importer is producing records and reusing the logic of the CSV importer. Thanks to this change, for large CSV exposures the new importer is 4-5 times faster and uses over 10 times less memory than before.

The engine has long had the ability to reduce the hazard site collection (which can be large, think of a fine grid) to only the locations where there are assets. This feature has been optimized in this release, to a spectacular extent in some cases: we measured a speedup from 2h to 0.1s for Canada.

We changed how zipped exposures are managed by the engine. In version 3.5 a zipped exposure was expected to contain an XML file with the same name as the archive, apart from the extension. Because of that the `job.ini` file had to contain a line ``exposure_file = .xml`` while now it requires a line ``exposure_file = .zip``, which is clearer. The change was requested by the risk team in the context of the CRAVE project because it simplifies the unzipping of the exposure. Unzipping will overwrite files of the same name already present, but a warning will be printed and the original files will not be lost: they are renamed with a ``.bak`` extension.
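
The backup behaviour described above can be sketched with the standard `zipfile` module. This is only an illustration of the idea and not the engine's code: the function name `unzip_with_backup` and the exact wording of the warning are assumptions.

```python
import logging
import zipfile
from pathlib import Path


def unzip_with_backup(archive, target_dir):
    """Extract `archive` into `target_dir` without losing existing files.

    Hypothetical helper: any file that would be overwritten is first
    renamed to `<name>.bak` and a warning is logged. Returns the list of
    backup paths that were created.
    """
    target = Path(target_dir)
    backups = []
    with zipfile.ZipFile(archive) as zf:
        for name in zf.namelist():
            dest = target / name
            if dest.is_file():
                backup = dest.with_name(dest.name + ".bak")
                dest.replace(backup)  # keep the original under a new name
                logging.warning("%s already exists, renamed to %s",
                                dest, backup)
                backups.append(backup)
        zf.extractall(target)
    return backups
```

Calling `unzip_with_backup("exposure.zip", ".")` in a directory that already holds a file with the same name as one in the archive would leave both the new file and a `.bak` copy of the old one.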
We added a consistency check between statistics for calculations leveraging the ``--hc`` option, because some users were making mistakes like trying to compute means in the child calculation without having them in the parent calculation. Now one gets a clear error message.

We fixed a bug in ``classical_damage`` from CSV with discrete fragility functions: for hazard intensity measure levels different from the fragility levels, the engine was giving incorrect results.

Vulnerability functions with the beta distribution must satisfy some consistency requirements if the coefficients of variation are nonzero. Unfortunately the consistency check was missing and it was possible to accept invalid functions, raising an error in the middle of the computation. Now the error will be raised much earlier, at the time of instantiating the vulnerability functions. See #4841 for more details.

Hyeuk Ryu (@dynaryu) discovered a bug in the `agg_loss-curves` outputs for the `event_based_risk` calculators, which has been fixed.

Finally, there were some improvements to the ``multi_risk`` calculator in the context of the CRAVE project. In particular, the engine is now able to manage the geometries of volcanic perils like lava, lahar and pyroclastic flow, and it is also able to manage other binary perils without requiring the introduction of new intensity measure types.

General changes
---------------

The CSV exporters have been enhanced: now there is an additional line before the header, starting with a ``#`` character, containing some metadata, like the date when the file was generated, the version of the engine that generated it, and some relevant parameters, like the investigation time in the case of the hazard outputs. In the future we may add even more metadata and extend the approach to other outputs.

Before release 3.3, the engine had the ability to associate site model parameters to hazard sites on a grid.
This feature was sometimes buggy and was removed, with users being advised to use the command `oq prepare_site_model` instead. `oq prepare_site_model` is able to produce a `site_model.csv` file with sites on the grid and it performs the associations explicitly, once and for all. In this release, we restored the ability to perform the association directly in the engine. This is less efficient than using `oq prepare_site_model`, since the same associations will be recomputed during each run. It is still useful for people wanting to experiment with the grid spacing: they can run several calculations and, when they are happy with the grid spacing, run `oq prepare_site_model` and fix the site model once and for all with the preferred grid spacing.

We fixed a performance regression in the `ucerf_classical` calculator, due to a change of logic in engine 3.5, which was trying to filter thousands of sources in the controller node instead of in the workers, thus becoming extra-slow.

We decided to change the `realizations.csv` output for scenario calculations, by replacing the `branch_path` field with the GSIM representation. This is more informative for the users and more convenient for the QGIS plugin too.

IT
--

The job queue first introduced in engine-3.5 is now enabled by default. This means that only one job can run at a given time for a given engine instance.

The progress report has been improved: before, in large classical calculations the progress started to be printed too late, even days after the start of the calculation.

We improved the `oq abort` command to remove submitted jobs too.

Deleting a calculation in the engine has always been tricky in the case of multiple users. In this release we fixed several issues and now a user can delete all of their calculations with the command `oq reset`. The engine will look inside the database and correctly remove the calculations of the user, including all the relevant .hdf5 files.
We improved the `oq plot` command by adding several new kinds of plot. They are still for internal use only (i.e. introspection and debugging).

We extended the command `oq db` to run generic queries for the `openquake` user. Other users can only run ``SELECT`` queries.

There was a bug in `oq webui start` not supplying the `--noreload` argument that has been fixed (the reload functionality of the Django development server interferes with SIGCHLD and causes zombies).

We fixed another bug with the `--hc` functionality in a multi-user situation, due to the fact that the engine was searching for the datastore of the parent calculation in the wrong directory.

There is now a better error message if the shared directory is not mounted.

Source models can now be serialized in TOML format, which is useful for debugging purposes.

Libraries and Python 3.7
------------------------

In this release we updated some of our libraries (numpy from version 1.14 to 1.16 and scipy from version 1.0.1 to 1.3.0) to make it possible to use the engine with Python 3.7. We actually have a cluster using Python 3.7 in production. In the future we may distribute installers for Windows and macOS based on Python 3.7, but for the moment Python 3.6 is still the only officially supported version and we do not plan to abandon Python 3.6 any time soon.

We raised the minimum version for h5py from 2.8.0 to 2.9.0, fixed some compatibility issues with Django 2.1 and 2.2 and fixed several Python 3.7 deprecation warnings.

Finally, we removed the external dependency on the mock module, since it has been included in the standard library since Python 3.3.

Deprecations/removals
---------------------

For years the engine has been able to import ground motion fields and hazard curves in CSV format and NRML format, with the NRML format deprecated. Now the NRML importers have finally been removed.

There was an old deprecated GMF exporter in NRML format for scenario calculations. It has finally been removed.
You should use the CSV exporter that has been available for years instead.

We deprecated the XML disaggregation exporters in favor of the CSV exporters.

We removed the long-deprecated `agg_loss_table` exporter, since all the needed information is now in the `losses_by_event` exporter.

We officially switched the testing framework from nosetests to pytest.