Extracting data from calculations

The engine has a relatively large set of predefined outputs, that you can get in various formats, like CSV, XML or HDF5. They are all documented in the manual and they are the recommended way of interacting with the engine, if you are not tech-savvy.

However, sometimes you must be tech-savvy: for instance if you want to post-process hundreds of GB of ground motion fields produced by an event based calculation, you should not use the CSV output, at least if you care about efficiency. To manage this case (huge amounts of data) there a specific solution, which is also able to manage the case of data lacking a predefined exporter: the Extractor API.

There are actually two different kind of extractors: the simple Extractor, which is meant to manage large data sets (say > 100 MB) and the WebExtractor, which is able to interact with the WebAPI and to extract data from a remote machine. The WebExtractor is nice, but cannot be used for large amount of data for various reasons; in particular, unless your Internet connection is ultra-fast, downloading GBs of data will probably send the web request in timeout, causing it to fail. Even if there is no timeout, the WebAPI will block, everything will be slow, the memory occupation and disk space will go up, and at certain moment something will fail.

The WebExtractor is meant for small to medium outputs, things like the mean hazard maps - an hazard map containing 100,000 points and 3 PoEs requires only 1.1 MB of data at 4 bytes per point. Mean hazard curves or mean average losses in risk calculation are still small enough for the WebExtractor. But if you want to extract all of the realizations you must go with the simple Extractor: in that case your postprocessing script must run in the remote machine, since it requires direct access to the datastore.

Here is an example of usage of the Extractor to retrieve mean hazard curves:

>>> from openquake.calculators.extract import Extractor
>>> calc_id = 42  # for example
>>> extractor = Extractor(calc_id)
>>> obj = extractor.get('hcurves?kind=mean&imt=PGA')  # returns an ArrayWrapper
>>> obj.array.shape  # an example with 10,000 sites and 20 levels per PGA
(10000, 20)
>>> extractor.close()

Here is an example of using the WebExtractor to retrieve hazard maps. Here we assumes that there is available in a remote machine where there is a WebAPI server running, a.k.a. the Engine Server. The first thing to is to set up the credentials to access the WebAPI. There are two cases:

  1. you have a production installation of the engine in /opt
  2. you have a development installation of the engine in a virtualenv

In both case you need to create a file called openquake.cfg with the following format:

[webapi]
server = http(s)://the-url-of-the-server(:port)
username = my-username
password = my-password

username and password can be left empty if the authentication is not enabled in the server, which is the recommended way, if the server is in your own secure LAN. Otherwise you must set the right credentials. The difference between case 1 and case 2 is in where to put the openquake.cfg file: if you have a production installation, put it in your $HOME, if you have a development installation, put it in your virtualenv directory.

The usage then is the same as the regular extractor:

>>> from openquake.calculators.extract import WebExtractor
>>> extractor = WebExtractor(calc_id)
>>> obj = extractor.get('hmaps?kind=mean&imt=PGA')  # returns an ArrayWrapper
>>> obj.array.shape  # an example with 10,000 sites and 4 PoEs
(10000, 4)
>>> extractor.close()

If you do not want to put your credentials in the openquake.cfg file, you can do so, but then you need to pass them explicitly to the WebExtractor:

>>> extractor = WebExtractor(calc_id, server, username, password)

Plotting

The (Web)Extractor is used in the oq plot command: by configuring openquake.cfg it is possible to plot things like hazard curves, hazard maps and uniform hazard spectra for remote (or local) calculations. Here are three examples of use:

$ oq plot 'hcurves?kind=mean&imt=PGA&site_id=0' <calc_id>
$ oq plot 'hmaps?kind=mean&imt=PGA' <calc_id>
$ oq plot 'uhs?kind=mean&site_id=0' <calc_id>

The site_id is optional; if missing only the first site (site_id=0) will be plotted. If you want to plot all the realizations you can do:

$ oq plot 'hcurves?kind=rlzs&imt=PGA' <calc_id>

If you want to plot all statistics you can do:

$ oq plot 'hcurves?kind=stats&imt=PGA' <calc_id>

It is also possible to combine plots. For instance if you want to plot all realizations and also the mean the command to give is:

$ oq plot 'hcurves?kind=rlzs&kind=mean&imt=PGA' <calc_id>

If you want to plot the mean and the median the command is:

$ oq plot 'hcurves?kind=quantile-0.5&kind=mean&imt=PGA' <calc_id>

assuming the median (i.e. quantile-0.5) is available in the calculation. If you want to compare say rlz-0 with rlz-2 and rlz-5 you can just just say so:

$ oq plot 'hcurves?kind=rlz-0&kind=rlz-2&kind=rlz-5&imt=PGA' <calc_id>

You can combine as many kinds of curves as you want. Clearly if your are specifying a kind that is not available you will get an error.

Extracting ruptures

Here is an example for the event based demo:

$ cd oq-engine/demos/hazard/EventBasedPSHA/
$ oq engine --run job.ini
$ oq shell
IPython shell with a global object "o"
In [1]: from openquake.calculators.extract import Extractor
In [2]: extractor = Extractor(calc_id=-1)
In [3]: aw = extractor.get('rupture_info?min_mag=5')
In [4]: aw
Out[4]: <ArrayWrapper(1511,)>
In [5]: aw.array
Out[5]:
array([(   0, 1, 5.05, 0.08456118,  0.15503392, 5., b'Active Shallow Crust', 0.0000000e+00, 90.      , 0.),
       (   1, 1, 5.05, 0.08456119,  0.15503392, 5., b'Active Shallow Crust', 4.4999969e+01, 90.      , 0.),
       (   2, 1, 5.05, 0.08456118,  0.15503392, 5., b'Active Shallow Crust', 3.5999997e+02, 49.999985, 0.),
       ...,
       (1508, 2, 6.15, 0.26448786, -0.7442877 , 5., b'Active Shallow Crust', 0.0000000e+00, 90.      , 0.),
       (1509, 1, 6.15, 0.26448786, -0.74428767, 5., b'Active Shallow Crust', 2.2499924e+02, 50.000004, 0.),
       (1510, 1, 6.85, 0.26448786, -0.74428767, 5., b'Active Shallow Crust', 4.9094699e-04, 50.000046, 0.)],
      dtype=[('rup_id', '<u4'), ('multiplicity', '<u2'), ('mag', '<f4'), ('centroid_lon', '<f4'),
             ('centroid_lat', '<f4'), ('centroid_depth', '<f4'), ('trt', 'S50'), ('strike', '<f4'),
             ('dip', '<f4'), ('rake', '<f4')])
In [6]: extractor.close()