Release notes v3.1

This release features several improvements to the engine calculators and a few important bug fixes. Over 150 issues were closed. For the complete list of changes, please see the changelog: https://github.com/gem/oq-engine/blob/engine-3.1/debian/changelog. Here is a summary.

Bug fixes

We discovered a performance regression in engine 2.9 and 3.0. The regression affects large hazard calculations and makes computing the statistics within the main calculation slow at best and impossible at worst, because of data transfer issues and out-of-memory errors. With engine 2.9 or 3.0 you can still compute the statistics, even for large calculations, but only in post-processing, i.e. with the --hc option. In engine 3.1 this workaround is no longer needed: computing the statistics within the main calculation is now as fast as in post-processing, and much faster and more memory-efficient than it was in engine 2.8 and previous versions. The price to pay for the gain was a change in the way the PoEs are stored and in the extraction routines. As a consequence, engine 3.1 cannot export hazard curves generated by previous versions of the engine.

We fixed a long-standing performance bug in scenario calculations: the filtering of the sites according to the integration distance was done too late. Now it is done before the calculation even starts. If you have 1 million sites and only 1 thousand of them are within the integration distance, the remaining 999,000 sites are discarded up front. This makes it convenient to run scenarios with large exposures, because only the sites and assets close to the rupture are considered, even without the user specifying a region constraint in the job.ini file.
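
To give an idea of the kind of filtering involved, here is a self-contained sketch based on a haversine approximation; it is only an illustration, not the engine's actual implementation, and the function name and arguments are hypothetical.

```python
import numpy as np

def filter_sites(site_lons, site_lats, rup_lon, rup_lat, integration_distance):
    """Return the indices of the sites within `integration_distance` (km)
    of the point (rup_lon, rup_lat), using a haversine approximation."""
    R = 6371.0  # mean Earth radius in km
    lons, lats = np.radians(site_lons), np.radians(site_lats)
    rlon, rlat = np.radians(rup_lon), np.radians(rup_lat)
    dlon, dlat = lons - rlon, lats - rlat
    a = np.sin(dlat / 2) ** 2 + np.cos(lats) * np.cos(rlat) * np.sin(dlon / 2) ** 2
    dist = 2 * R * np.arcsin(np.sqrt(a))  # great-circle distances in km
    return np.where(dist <= integration_distance)[0]

# with 1 million random sites only the ones close to the rupture survive
lons = np.random.uniform(-10, 10, 1000000)
lats = np.random.uniform(-10, 10, 1000000)
close = filter_sites(lons, lats, 0.0, 0.0, 100)  # keep sites within 100 km
```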

The user Parisa discovered on the mailing list a regression in the GMF importer from XML: it was not working anymore unless a sites.csv file was specified. This has been fixed. The XML importer is still deprecated and the recommendation is to import GMFs in .csv format and with an explicit sites.csv file.

Our Canadian users discovered a bug in the hazard curves computed from nonparametric sources, which caused the curves to saturate at high intensities. The source of the bug turned out to be a numerical rounding issue, which we fixed.

We discovered a bug affecting the classical UCERF calculator, which caused the computation to hang without a clear reason. It was fixed by simplifying the distribution mechanism.

We discovered a rare bug when reading ruptures from the datastore. Due to numerical errors in the conversion of the geometries from 64-bit to 32-bit floats, ruptures barely within the integration distance could be read back as being outside it, thus causing an error. This is a very rare case, but it can happen, especially in calculations with millions of ruptures. We fixed the issue by discarding such borderline ruptures.

We discovered a subtle bug in event based risk calculations, happening only when storing the GMFs and when using vulnerability functions with PMF: the results depended on the order of the tasks storing the ground motion fields. This has been fixed by specifying the order in which the GMFs are read (they are read by event ID). In general this makes it easier to debug risk calculations that read the GMFs from the datastore.

We discovered a tricky bug affecting calculations with sites and/or sources crossing the international date line. The bug had been there for years. In some cases some sources could be incorrectly discarded, thus producing a lower hazard than the correct one. Fixing the bug involved completely discarding our old prefiltering mechanism.

Improvements to the filtering/prefiltering mechanisms

We began improving the source prefiltering mechanism in engine 3.0 and we kept working on it. Now the prefiltering is performed in parallel and it is so efficient that we can prefilter millions of sources in a matter of minutes. Moreover, the prefiltering is done only on the controller node and is no longer duplicated on the worker nodes. This change improved the performance of the calculators but also increased the data transfer, so we changed the engine to read the site collection directly from the datastore, thus saving gigabytes of data transfer. In cluster situations the new approach requires setting up a shared directory, otherwise the data transfer will be substantial. Thanks to the new prefiltering it was possible to remove the tiling mechanism from the classical calculator, which was very complex and fragile, and still be faster than before. As a consequence, the data transfer is lower now than it ever was. Also, having removed the old prefiltering mechanism, a very tricky bug over two years old has disappeared.

While the prefiltering involves sources and is performed before the calculation starts, the filtering involves ruptures and is performed while the calculation runs. The filtering works by computing a lot of distances from each rupture to each hazard site, and it can be slow. In engine 3.1 we completely reworked the filtering: now the default distance used for filtering is the so-called rrup distance, which is nothing else than the Euclidean distance. Such a distance can be computed with the scipy.spatial.distance.cdist function, which is extremely efficient: in some cases we measured speedups of over an order of magnitude with respect to the previous approach.
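
As an illustration of the idea (a simplified sketch with made-up coordinates, not the engine's code), assuming the rupture mesh and the sites are expressed in the same Cartesian coordinates in kilometres, the rrup-like distance of each site is the minimum Euclidean distance to the points of the rupture mesh:

```python
import numpy as np
from scipy.spatial.distance import cdist

# hypothetical Cartesian coordinates (km): rupture mesh points and hazard sites
rupture_mesh = np.array([[0.0, 0.0, 5.0], [1.0, 0.0, 7.0], [2.0, 0.0, 9.0]])
sites = np.array([[10.0, 3.0, 0.0], [50.0, -20.0, 0.0]])

# matrix of Euclidean distances, with shape (n_sites, n_mesh_points)
dists = cdist(sites, rupture_mesh)

# closest distance from each site to the rupture surface
rrup = dists.min(axis=1)
print(rrup)
```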

The user can specify the distance used in the filtering by setting the parameter filter_distance in the job.ini file. For instance, filter_distance=rjb restores the Joyner-Boore distance used in previous versions of the engine (actually not exactly the same, because now even the rjb distance calls scipy.spatial.distance.cdist internally). Due to the changes both to the default filter distance and to the low-level function used to compute the distances, the numbers produced by engine 3.1 will be slightly different from the numbers generated by previous versions. The differences, however, are minor, and the current approach is actually more accurate than the previous one.

A refactoring of the site collection object made the computation of distances faster, not only for ruptures with complex geometries but also for ruptures coming from point sources. The net effect can be very significant if your computation is dominated by distance calculations. In practice this happens if you have a single GMPE; for instance, in an old model for South America with a single realization the performance of the full computation (i.e. distances + GMPE) nearly doubled. In complex calculations with dozens of GMPEs the dominating factor is the time spent in the GMPE calculations, so you will not see any significant speedup due to the improvements in the filtering mechanism, even if you should still see some speedup due to the prefiltering improvements.

Changes to hazardlib

This release involved a substantial refactoring of hazardlib, mostly needed for the improvements in the filtering and prefiltering procedures.

There were several changes to the Surface classes. In particular, PlanarSurfaces no longer require the rupture_mesh_spacing to be instantiated (it was not used anyway). Moreover, the class hierarchy has been simplified and one method has been removed. The ContextMaker.make_contexts method now returns two values instead of three, and ruptures now have more attributes than they used to have. On the other hand, the attribute .source_typology was removed from Rupture classes, since it was not used. A .polygon property was added to each source class, to make the plotting of sources easier.

We fixed the serialization of griddedRuptures in the datastore.

We added to hazardlib.geo.utils an association function assoc(objects, sitecol, assoc_dist, mode) based on scipy.spatial.cKDTree, which the engine uses to associate the assets to the hazard sites, to associate the site parameters to the hazard sites, or to associate a ShakeMap to an exposure.
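
The underlying idea can be sketched as follows; this is a simplified illustration of nearest-neighbour association within a maximum distance, with hypothetical names and flat Cartesian coordinates, not the actual implementation of assoc:

```python
import numpy as np
from scipy.spatial import cKDTree

def associate(asset_coords, site_coords, assoc_dist):
    """For each asset return the index of the closest site if it is within
    `assoc_dist`, otherwise -1 (i.e. the asset is discarded)."""
    tree = cKDTree(site_coords)
    dists, idxs = tree.query(asset_coords, k=1)
    return np.where(dists <= assoc_dist, idxs, -1)

sites = np.array([[0.0, 0.0], [10.0, 10.0], [20.0, 0.0]])
assets = np.array([[0.5, 0.1], [9.0, 11.0], [100.0, 100.0]])
print(associate(assets, sites, assoc_dist=5.0))  # -> [ 0  1 -1]
```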

The signature of stochastic_event_set was slightly changed by removing the redundant sites parameter, and the API of the correlation module was slightly changed too.

The most visible change is probably in the Probability Mass Function class. In order to instantiate a PMF object, a list of pairs (probability, object) is required. In the past the probability had to be expressed as a Python Decimal instance, while now a simple float can be used. Before, the sum of the probabilities was checked to be exactly equal to 1; now it must be equal to 1 up to a tolerance of 1E-12. Since the Decimals were converted into floats at the end anyway, it felt easier and more honest to work with floats from the beginning.
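
For instance, assuming the usual import path openquake.hazardlib.pmf and taking a hypocentral depth distribution as an example, a PMF can now be built with plain floats:

```python
from openquake.hazardlib.pmf import PMF

# probabilities as plain floats; they must sum to 1 within a 1E-12 tolerance
hypo_depth_dist = PMF([(0.3, 5.0), (0.5, 10.0), (0.2, 15.0)])

# there is no longer any need for decimal.Decimal, i.e.
# PMF([(Decimal('0.3'), 5.0), ...]) can simply become the line above
```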

It should be noted that, in spite of the major refactoring (over 1,000 lines of Python code were removed), the changes to client code are very minor: the HMTK was unaffected and in the SMTK only two lines had to be fixed. So the change should not be a problem for users of hazardlib. If you run into problems, please ask on our mailing list.

Finally, we should mention that a new GMPE was added, a version of the Yu et al. (2013) GMPE supporting Mw, contributed by M. Pagani and Changlong Li.

Risk

We unified the region and region_constraint parameters of the job.ini. Old calculations with a region_constraint parameter will keep working, but you will see a deprecation warning telling you to replace region_constraint with region, which is what the engine is doing internally now.

We kept improving our risk exporters. In particular, if the exposure has tags (a concept introduced in engine 2.9), the exported CSV files (such as the average losses) now also contain the tag information. This makes it possible to aggregate the outputs by tag. Also, the risk demos have been updated so that the exposures have tags.
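
As an example of what the tag columns enable, the exported losses can be aggregated by tag with a few lines of pandas; the file name and the column names below ('avg_losses.csv', 'occupancy', 'structural') are hypothetical and depend on the actual export:

```python
import pandas as pd

# hypothetical export: an average losses CSV including a tag column 'occupancy'
df = pd.read_csv('avg_losses.csv')

# total structural loss per tag value
print(df.groupby('occupancy')['structural'].sum())
```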

We added two new scenario outputs, dmg_by_event and losses_by_event, and we slightly changed the format of the realizations output. We also added an XML exporter for the site model: this is useful to check how the site model parameters were associated to the hazard sites. The work on the risk outputs has been reflected in the QGIS plugin.

Experimental integration with USGS ShakeMaps

Engine 3.1 features a new experimental integration with USGS ShakeMaps. Essentially, when running a scenario risk or scenario damage calculation, you can specify in the job.ini the ID of a USGS ShakeMap. The engine will automatically

  • download the ShakeMap (the grid.xml and uncertainty.xml files)

  • associate the GMFs from the ShakeMap to the exposure sites

  • consider only the IMTs required by the risk model

  • create a distribution of GMFs with the right standard deviation, i.e. the standard deviation provided by the ShakeMap

  • perform the desired risk calculation, also considering the cross-correlation effect for multiple IMTs.

The feature is highly experimental and provided only for the purpose of beta testing by external users.
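
To give an idea of the fourth step in the list above, here is a deliberately simplified sketch that ignores the spatial and cross-IMT correlation handled by the engine; the function and variable names are hypothetical. Given the median ground motions and the standard deviations read from the ShakeMap, a set of GMFs can be sampled from a lognormal distribution:

```python
import numpy as np

def sample_gmfs(median_gm, stddev_ln, num_events, seed=42):
    """Sample `num_events` ground motion values per site from a lognormal
    distribution defined by the medians and the log-standard deviations."""
    rng = np.random.RandomState(seed)
    eps = rng.standard_normal((len(median_gm), num_events))
    return np.exp(np.log(median_gm)[:, None] + stddev_ln[:, None] * eps)

median_pga = np.array([0.25, 0.10, 0.05])  # hypothetical ShakeMap medians in g
sigma_ln = np.array([0.6, 0.6, 0.7])       # hypothetical log-standard deviations
gmfs = sample_gmfs(median_pga, sigma_ln, num_events=100)
print(gmfs.shape)  # (3, 100): 3 sites x 100 sampled events
```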

WebUI

There was a bug with the groups in the WebUI, a feature used only if authentication is enabled, which is not the default. Due to the bug, users of a given group could not see calculations belonging to their group unless they were admin users. This has been fixed.

When a job dies abruptly, its status in the WebUI can remain is_running. Now restarting the WebUI, or just the DbServer, fixes the status of jobs that are not really running. Also, a job that did not start because there were no live celery nodes is now properly marked as failed.

The input files used to perform a calculation are now automatically zipped, saved in the datastore, and made accessible to the WebUI and the QGIS plugin. This is very useful if you need to repeat a calculation performed by another user on a different machine.

New checks and warnings

The engine has always had a check between the provided GMPEs and the GMF correlation model, since not all GMPEs support correlation. However, the check was performed in the middle of the computation. Now the check is done before the calculation starts, and the error message is clearer.

We added a warning for logic trees with more than 10,000 realizations, suggesting that the user sample the logic tree, since otherwise the calculation will likely run out of memory.

We now warn the user when the source model contains duplicated source IDs, suggesting to set the optimize_same_id_sources flag if the sources are really duplicated (the IDs could be duplicated without the sources being so).
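
The distinction matters because two sources can share the same ID while being different. A hypothetical sketch of such a check, assuming each source object has a source_id attribute and a representative repr (the engine's own check is stricter), could look like this:

```python
def check_duplicated_ids(sources):
    """Group sources by ID and tell apart true duplicates (identical sources)
    from mere ID clashes between different sources."""
    by_id = {}
    for src in sources:
        by_id.setdefault(src.source_id, []).append(src)
    true_duplicates, clashes = [], []
    for source_id, group in by_id.items():
        if len(group) > 1:
            # cheap identity test based on the textual representation
            if len({repr(s) for s in group}) == 1:
                true_duplicates.append(source_id)
            else:
                clashes.append(source_id)
    return true_duplicates, clashes
```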

For scenario calculations, we added a check that the number of realizations must equal the number of distinct GMPEs.

We added a check against duplicated GMPEs in the logic tree.

We added an early check on the Python version, which must be greater than or equal to 3.5.

oq commands

We added two new commands, which are useful in a cluster environment:

  • oq celery status prints the number of active tasks

  • oq celery inspect prints some information on the active tasks

We fixed a bug in oq db get_executing_jobs, which now returns only the jobs that are actually running. This required fixing the engine, which was not storing the PIDs.

We fixed the commands oq to_shapefile and oq from_shapefile by adding “YoungsCoppersmithMFD” and “arbitraryMFD” to the list of serializable magnitude-frequency distributions.

Packaging

For the first time we provide packages for Ubuntu 18.04 (Bionic) and Fedora 28. Old versions of Ubuntu (16.04 and 14.04) are still supported, and we keep providing packages for Red Hat distributions, as well as installers for Windows and macOS, docker containers, virtual machines and more.

We also updated our Python dependencies; in particular, our packages now include numpy 1.14.2, scipy 1.0.1, h5py 2.8, shapely 1.6.4, rtree 0.8.3, pyzmq 17.0.0, psutil 5.4.3 and Django 2.0.4.