Event Based Risk Outputs#
The asset loss table#
When performing an event based risk calculation the engine keeps in memory a table with the losses for each asset and each event, for each loss type. It is usually impossible to fully store such a table, because it is extremely large; for instance, for 1 million assets, 1 million events, 2 loss types and 4 bytes per loss ~8 TB of disk space would be required. It is true that many events will produce zero losses because of the maximum_distance and minimum_intensity parameters, but still, the asset loss table is prohibitively large and for many years could not be stored. In engine 3.8 we made a breakthrough: we decided to store a partial asset loss table, obtained by discarding small losses, by leveraging on the fact that loss curves for long enough return periods are dominated by extreme events, i.e. there is no point in saving all the small losses.
To that aim, the engine honours a parameter called
minimum_asset_loss
which determines how many losses are discarded
when storing the asset loss table. The rule is simple: losses below
minimum_asset_loss
are discarded. By choosing the threshold
properly in an ideal world
the vast majority of the losses would be discarded, thus making the asset loss table storable;
the loss curves would still be nearly identical to the ones without discarding any loss, except for small return periods.
It is the job of the user to verify if 1 and 2 are true in the real world.
He can assess that by playing with the minimum_asset_loss
in a small
calculation, finding a good value for it, and then extending it to the large
calculation. Clearly, it is a matter of compromise: by sacrificing precision
it is possible to reduce enormously the size of the stored asset loss table
and to make an impossible calculation possible.
Starting from engine 3.11 the asset loss table is stored if the user specifies
aggregate_by = id
in the job.ini file. In large calculations it is extremely easy to run out of
memory or make the calculation extremely slow, so we recommend
not to store the asset loss table. The functionality is there for the sole
purpose of debugging small calculations, for instance, to see the effect
of the minimum_asset_loss
approximation at the asset level.
For large calculations usually one is interested in the aggregate loss
table, which contains the losses per event and per aggregation tag (or
multi-tag). For instance, the tag occupancy
has the three values
“Residential”, “Industrial” and “Commercial” and by setting
aggregate_by = occupancy
the engine will store a pandas DataFrame called risk_by_event
with a
field agg_id
with 4 possible value: 0 for “Residential”, 1 for
“Industrial”, 2 for “Commercial” and 3 for the full aggregation.
NB: if the parameter aggregate_by
is not specified, the engine will
still compute the aggregate loss table but then the agg_id
field will
have a single value of 0 corresponding to the total portfolio losses.
The Probable Maximum Loss (PML) and the loss curves#
Given an effective investigation time and a return period,
the engine is able to compute a PML for each
aggregation tag. It does so by using the function
openquake.risklib.scientific.losses_by_period
which takes as input
an array of cumulative losses associated with the aggregation tag, a
list of the return periods, and the effective investigation time. If
there is a single return period the function returns the PML; if there are
multiple return periods it returns the loss curve. The two concepts
are essentially the same thing, since a loss curve is just an array of
PMLs, one for each return period.
For instance:
>>> from openquake.risklib.scientific import losses_by_period
>>> losses = [3, 2, 3.5, 4, 3, 23, 11, 2, 1, 4, 5, 7, 8, 9, 13, 0]
>>> [PML_500y] = losses_by_period(losses, [500], eff_time=1000)
>>> PML_500y
13.0
computes the Probably Maximum Loss at 500 years for the given losses
with an effective investigation time of 1000 years. The algorithm works
by ordering the losses (suppose there are E losses, E > 1) generating E time
periods eff_time/E, eff_time/(E-1), ... eff_time/1
, and log-interpolating
the loss at the return period. Of course this works only if the condition
eff_time/E < return_period < eff_time
is respected.
In this example there are E=16 losses, so the return period must be in the range 62.5 .. 1000 years. If the return period is too small the PML will be zero
>>> losses_by_period(losses, [50], eff_time=1000)
array([0.])
while if the return period is outside the investigation range, we will refuse the temptation to extrapolate and we will return NaN instead:
>>> losses_by_period(losses, [1500], eff_time=1000)
array([nan])
The rules above are the reason why you will see zeros or NaNs in the loss curves generated by the engine sometimes, especially when there are too few events (the valid range will be small and some return periods may slip outside the range).
Aggregate loss curves#
In some cases the computation of the PML is particularly simple and
you can do it by hand: this happens when the ratio
eff_time/return_period
is an integer. Consider for instance a case with
eff_time=10,000
years and return_period=2,000
years;
suppose there are the following 10 losses aggregating the commercial
and residential buildings of an exposure:
>>> import numpy as np
>>> losses_COM = np.array([123, 0, 400, 0, 1500, 200, 350, 0, 700, 600])
>>> losses_RES = np.array([0, 800, 200, 0, 500, 1200, 250, 600, 300, 150])
The loss curve associate the highest loss to 10,000 years, the second highest to 10,000 / 2 years, the third highest to 10,000 / 3 years, the fourth highest to 10,000 / 4 years, the fifth highest to 10,000 / 5 years and so on until the lowest loss is associated to 10,000 / 10 years. Since the return period is 2,000 = 10,000 / 5 to compute the MPL it is enough to take the fifth loss ordered in descending order:
>>> MPL_COM = [1500, 700, 600, 400, 350, 200, 123, 0, 0, 0][4] = 350
>>> MPL_RES = [1200, 800, 600, 500, 300, 250, 200, 150, 0, 0][4] = 300
Given this algorithm, it is clear why the MPL cannot be additive, i.e. MPL(COM + RES) != MPL(COM) + MPL(RES): doing the sums before or after the ordering of the losses is different. In this example by taking the fifth loss of the sorted sums
>>> sorted(losses_COM + losses_RES, reverse=True)
[2000, 1400, 1000, 800, 750, 600, 600, 600, 123, 0]
one gets MPL(COM + RES) = 750
which is different from
MPL(RES) + MPL(COM) = 350 + 300 = 650
.
The engine is able to compute aggregate loss curves correctly, i.e.
by doing the sums before the ordering phase. In order to perform
aggregations, you need to set the aggregate_by
parameter in the
job.ini
by specifying tags over which you wish to perform the
aggregation. Your exposure must contain the specified tags
for each asset. We have an example for Nepal in our event based risk
demo. The exposure for this demo contains various tags and in
particular a geographic tag called NAME_1 with values “Mid-Western”,
“Far-Western”, “West”, “East”, “Central”, and the job_eb.ini
file
defines
aggregate_by = NAME_1
When running the calculation you will see something like this:
Calculation 1 finished correctly in 17 seconds
id | name
9 | Aggregate Event Losses
1 | Aggregate Loss Curves
2 | Aggregate Loss Curves Statistics
3 | Aggregate Losses
4 | Aggregate Losses Statistics
5 | Average Asset Losses Statistics
11 | Earthquake Ruptures
6 | Events
7 | Full Report
8 | Input Files
10 | Realizations
12 | Total Loss Curves
13 | Total Loss Curves Statistics
14 | Total Losses
15 | Total Losses Statistics
Exporting the Aggregate Loss Curves Statistics output will give you the mean and quantile loss curves in a format like the following one:
annual_frequency_of_exceedence,return_period,loss_type,loss_value,loss_ratio
5.00000E-01,2,nonstructural,0.00000E+00,0.00000E+00
5.00000E-01,2,structural,0.00000E+00,0.00000E+00
2.00000E-01,5,nonstructural,0.00000E+00,0.00000E+00
2.00000E-01,5,structural,0.00000E+00,0.00000E+00
1.00000E-01,10,nonstructural,0.00000E+00,0.00000E+00
1.00000E-01,10,structural,0.00000E+00,0.00000E+00
5.00000E-02,20,nonstructural,0.00000E+00,0.00000E+00
5.00000E-02,20,structural,0.00000E+00,0.00000E+00
2.00000E-02,50,nonstructural,0.00000E+00,0.00000E+00
2.00000E-02,50,structural,0.00000E+00,0.00000E+00
1.00000E-02,100,nonstructural,0.00000E+00,0.00000E+00
1.00000E-02,100,structural,0.00000E+00,0.00000E+00
5.00000E-03,200,nonstructural,1.35279E+05,1.26664E-06
5.00000E-03,200,structural,2.36901E+05,9.02027E-03
2.00000E-03,500,nonstructural,1.74918E+06,1.63779E-05
2.00000E-03,500,structural,2.99670E+06,1.14103E-01
1.00000E-03,1000,nonstructural,6.92401E+06,6.48308E-05
1.00000E-03,1000,structural,1.15148E+07,4.38439E-01
If you do not set the aggregate_by
parameter
you will still be able to compute the total loss curve
(for the entire portfolio of assets), and the total average losses.