# Reading outputs with pandas¶

If you are a scientist familiar with Pandas, you will be happy to know that it is possible to process the engine outputs with it. Here we will give an example involving hazard curves.

Suppose you ran the AreaSourceClassicalPSHA demo, with calculation ID=42; then you can process the hazard curves as follows:

```
>>> from openquake.baselib.datastore import read
>>> dstore = read(42)
>>> df = dstore.read_df('hcurves-stats', index='lvl',
... sel=dict(imt='PGA', stat='mean', site_id=0))
site_id stat imt value
lvl
0 0 b'mean' b'PGA' 0.999982
1 0 b'mean' b'PGA' 0.999949
2 0 b'mean' b'PGA' 0.999850
3 0 b'mean' b'PGA' 0.999545
4 0 b'mean' b'PGA' 0.998634
.. ... ... ... ...
44 0 b'mean' b'PGA' 0.000000
```

The dictionary `dict(imt='PGA', stat='mean', site_id=0)`

is used to select
subsets of the entire dataset: in this case hazard curves for mean PGA for
the first site.

If you do not like pandas, or for some reason you prefer plain numpy arrays,
you can get a slice of hazard curves by using the `.sel`

method:

```
>>> arr = dstore.sel('hcurves-stats', imt='PGA', stat='mean', site_id=0)
>>> arr.shape # (num_sites, num_stats, num_imts, num_levels)
(1, 1, 1, 45)
```

Notice that the .sel method does not reduce the number of dimensions of the original array (4 in this case), it just reduces the number of elements. It was inspired by a similar functionality in xarray.

## Example: how many events per magnitude?¶

When analyzing an event based calculation, users are often interested in
checking the magnitude-frequency distribution, i.e. to count how many
events of a given magnitude present in the stochastic event set for
a fixed investigation time and a fixed `ses_per_logic_tree_path`

.
You can do that with code like the following:

```
def print_events_by_mag(calc_id):
# open the DataStore for the current calculation
dstore = datastore.read(calc_id)
# read the events table as a Pandas dataset indexed by the event ID
events = dstore.read_df('events', 'id')
# find the magnitude of each event by looking at the 'ruptures' table
events['mag'] = dstore['ruptures']['mag'][events['rup_id']]
# group the events by magnitude
for mag, grp in events.groupby(['mag']):
print(mag, len(grp)) # number of events per group
```

If you want to now the number of events per realization and per stochastic
event set you can just refine the groupby clause, using the list
`['mag', 'rlz_id', 'ses_id']`

instead of simply `['mag']`

.

Given an event, it is trivial to extract the ground motion field
generated by that event, if it has been stored (warning: events
producing zero ground motion are not stored). It is enough to read
the `gmf_data`

table indexed by event ID, i.e. the `eid`

field:

```
>>> eid = 20 # consider event with ID 20
>>> gmf_data = dstore.read_df('gmf_data', index='eid') # engine>3.11
>>> gmf_data.loc[eid]
sid gmv_0
eid
20 93 0.113241
20 102 0.114756
20 121 0.242828
20 142 0.111506
```

The `gmv_0`

refers to the first IMT; here I have shown an example with a
single IMT, in presence of multiple IMTs you would see multiple columns
`gmv_0, gmv_1, gmv_2, ...`

. The `sid`

column refers to the site ID.

As a following step, you can compute the mean hazard curves at each site from the ground motion values by using the function gmvs_to_poes, available since engine 3.10:

```
>>> from openquake.commonlib.calc import gmvs_to_poes
>>> gmf_data = dstore.read_df('gmf_data', index='sid')
>>> df = gmf_data.loc[0] # first site
>>> gmvs = [df[col].to_numpy() for col in df.columns
... if col.startswith('gmv_')] # list of M arrays
>>> oq = dstore['oqparam'] # calculation parameters
>>> poes = gmvs_to_poes(gmvs, oq.imtls, oq.ses_per_logic_tree_path)
```

This will return an array of shape (M, L) where M is the number of intensity measure types and L the number of levels per IMT.