# PSHA with the OpenQuake engine#

This chapter describes the mathematical framework for PSHA implemented by the OpenQuake engine. More comprehensive descriptions of Probabilistic Seismic Hazard Analysis can be found for example in McGuire (2004) and USNRC (2012).

## Basic concepts#

Two main assumptions are at the base of all PSHA calculators included in the OpenQuake engine:

• seismicity in a region is described by a collection of independent seismic sources (i.e. the occurrence of an earthquake rupture in a source does not affect the probability of earthquake occurrence in the other sources)

• each source generates independent earthquake ruptures (i.e. the occurrence of an earth- quake rupture in a source does not affect the probability of occurrence of the other potential earthquake ruptures in the same source)

The Classical, Event-Based, and Disaggregation analysis requires the definition of two main components: the seismic source model, that is a collection of seismic sources describing the seismic activity in a region of interest, and the ground motion model, that is a mathematical relationship defining the probability distribution of a ground motion parameter at a site given the occurrence of an earthquake rupture.

The design of a seismic source model involves the specification of a number of sources whose main parameters are the geometry, constraining the earthquake rupture locations, and the magnitude-frequency distribution, defining the average annual occurrence rate over a magnitude range. A seismic source model (SSM) can be therefore defined as a set of I seismic sources (Src):

$SSM = \{Src_1, Src_2, ..., Src_I\}$

Chapter 3 provides a detailed description of the different source typologies supported by the OpenQuake engine. However, independently of the typology, in a PSHA each source undergoes a discretization process which effectively generates a number of distinct earthquake ruptures. A generic i-th source defines therefore a set of J earthquake ruptures:

$Src_i = \{Rup_{i1} ,Rup_{i2}, ..., Rup_{iJ}\}$

## Classical PSHA#

The classical PSHA analysis allows calculating the probabilities of exceeding, at least once in a given time span, and at a given site, a set of ground motion parameter levels considering all possible earthquake ruptures defined in a seismic source model. Such a list of probability values is usually referred to as hazard curve.

We indicate with $$P(X \geq x|T)$$ the probability that a ground-motion parameter X exceeds, at least once in a time span $$T$$ , a level $$x$$. $$P(X \geq x|T)$$ can be computed as 1 minus the probability that none of the sources is causing a ground motion exceedance. By assuming independent sources, the probability that none of the sources is causing an exceedance is equal to the product of the probabilities that each source does not cause an exceedance, that is:

$\begin{split}P(X \geq x|T) &= 1 - P_{src1}(X<x|T) \times P_{src2}(X<x|T) \times ... \times P_{srcI}(X<x|T)\\ &= 1 - \prod^{I}_{i=1}{P_{src_i}(X<x|T)}\end{split}$

where $$P_{src_i}(X<x|T)$$ is the probability that the i-th source is not causing an exceedance and $$I$$ is the total number of sources in the source model.

By further assuming each source generates independent earthquake ruptures, we can compute $$P_{src_i}(X<x|T)$$ as the product of the probabilities that each rupture does not cause an exceedance, that is:

$\begin{split}P_{src_i}(X<x|T) &= P_{rup_{i1}}(X<x|T) \times P_{rup_{i2}}(X<x|T) \times ... \times P_{rup_{ij}}(X<x|T)\\ &= \prod^{J_i}_{j=1}{P_{rup_{ij}}(X<x|T)}\end{split}$

where $$P_{rup_{ij}}(X<x|T)$$ is the probability that the j-th rupture in the i-th source is not causing an exceedance and $$J_i$$ is the total number of ruptures generated by the i-th source.

Intuitively, the fact that a rupture does not cause any exceedance in a given time span T can be due to the fact that the rupture does not occur at all or that the rupture occurs once but without causing an exceedance, or that the rupture occurs twice but both times without causing an exceedance, and so on. Given that all these events are mutually exclusive, by using the total probability theorem we can write:

$\begin{split}P_{rup_{ij}}(X<x|T) &= P_{rup_{ij}}(n=0|T)+P_{rup_{ij}}(n=1|T) \times P(X<x|rup_{ij}) + P_{rup_{ij}}(n=2|T) \times P(X<x|rup_{ij})^2 +...\\ &= \sum^{∞}_{k=0}{P_{rup_{ij}}(k|T) \times P(X<x|rup_{ij})^k}\end{split}$

where $$P_{rup_{ij}}(k|T)$$ is the probability that the j-th rupture in the i-th source is occurring $$k$$ times in time span $$T$$ and $$P(X<x|rup_{ij})$$ is the conditional probability that parameter $$X$$ is not exceeding level $$x$$ given an occurrence of $$rup_{ij}$$.

Relying therefore on the assumptions of independent sources and independent earthquake ruptures generated by each source, we can compute the probability of at least one ground motion exceedance as:

$\begin{split}P(X \geq x|T) &= 1-\prod^{I}_{i=1}\prod^{J_i}_{j=1}{P_{rup_{ij}}(X<x|T)}\\ &= 1-\prod^{I}_{i=1}\prod^{J_i}_{j=1}\sum^{∞}_{k=0}{P_{rup_{ij}}(k|T) \times P(X<x|rup_{ij})^k}\end{split}$

It is worth noticing how equation 2.6 allows the calculation of hazard curves for any temporal occurrence model. More precisely, only the probability of the number of rupture occurrences $$P_{rup_{ij}}(k|T)$$ depends on the temporal occurrence model, while the procedure for calculating the final probability of ground motion exceedance remains the same. This also means that is possible to define in the same source model sources with different temporal occurrence models (for instance time-dependent and time-independent sources).

### Poissonian source model#

If we now assume a source model to consist only of Poissonian sources, we can write $$P_{rup_{ij}}(k|T)$$ for every rupture as:

$P_{rup_{ij}}(k|T) = e^{-\nu_{ij}T}\frac{(\nu_{ij}T)^k}{k!}$

where $$nu_{ij}$$ is the average annual occurrence rate for the j-th rupture in the i-th source. We can then place equation above and thus write:

$\begin{split}P_{rup_{ij}}(X<x|T) &= \sum^{∞}_{k=0}{e^{-\nu_{ij}T}\frac{(\nu_{ij}T)^k}{k!}} \times P(X<x|rup_{ij})^k\\ &= e^{-\nu_{ij}T} \sum^{∞}_{k=0}{\frac{(\nu_{ij}T \times P(X<x|rup_{ij}))^k}{k!}}\end{split}$

Making use of the property:

$e^x = \sum^{∞}_{k=0}{\frac{x^k}{k!}}$

we can rewrite 2.8 as:

$\begin{split}P_{rup_{ij}}(X<x|T) &= e^{-\nu_{ij}T}e^{\nu_{ij}T \times P(X<x|rup_{ij})}\\ &= e^{-\nu_{ij}T \times (1-P(X<x|rup_{ij}))}\\ &= e^{-\nu_{ij}T \times P(X \geq x|rup_{ij})}\end{split}$

By now recognizing that, according to the Poissonian distribution, the probability of at least one occurrence (that is one or more) in a time span $$T$$ of $$rup_{ij}$$ is:

$P_{rup_{ij}}(n \geq 1|T) = 1-e^{-\nu_{ij}T}$

we can write equation 2.10 as:

$P_{rup_{ij}}(X<x|T) = (1-P_{rup_{ij}}(n \geq 1|T))^{P(X \geq x|rup_{ij})}$

By placing equation 2.12 in 2.6, we can write:

$P(X \geq x|T) = 1-\prod^{I}_{i=1}\prod^{J_i}_{j=1}(1-P_{rup_{ij}}(n \geq 1|T))^{P(X \geq x|rup_{ij})}$

Equation 2.13 is used by the OpenQuake engine for the calculation of hazard curves when performing Classical PSHA with a Poissonian source model. To our knowledge, this equation has been first proposed by Field et al. (2003), derived from the traditional rate-based formulation converted in terms of probabilities (their equation A8). Instead, we derive it from the assumptions of a source model consisting of independent sources, independent earthquake ruptures generated by each source, and ruptures obeying to a Poissonian temporal occurrence model.

### Equivalence with the rate-based equation#

It is worth noticing how equation 2.13 is equivalent to the more traditional rate-based hazard equation (McGuire, 1995). Indeed, by assuming ground motion occurrence to follow a Poissonian distribution in time, and indicating with n the mean annual rate of exceeding a ground motion level x, we can write:

$P(X \geq x|T) = 1-e^{-\nu T}$

We can also rewrite 2.13 as:

$\begin{split}P(X \geq x|T) &= 1-\prod^{I}_{i=1}\prod^{J_i}_{j=1}(1-P_{rup_{ij}}(n \geq 1|T))^{P(X \geq x|rup_{ij})}\\ &= 1-\prod^{I}_{i=1}\prod^{J_i}_{j=1}e^{-\nu_{ij}T \times P(X \geq x|rup_{ij})}\\ &= 1-e^{-\sum^{I}_{i=1}\sum^{J_i}_{i=1}\nu_{ij}T \times P(X \geq x|rup_{ij})}\end{split}$

The equivalence between equations and 2.15 is possible if and only if:

$\nu = \sum^{I}_{i=1}\sum^{J_i}_{i=1}\nu_{ij} \times P(X \geq x|rup_{ij})$

Assuming now, for the sake of simplicity, that a rupture is completely characterized by magnitude and distance from a site, we can write the rate of occurrence of the j-th rupture as:

$\nu_{ij} = \nu_{i} \times f_{i}(m,r)$

where $$nu_{i}$$ is the total occurrence rate for the i-th source, and $$f_i(m,r)$$ is the probability, for the i-th source, of generating a rupture of magnitude $$m$$ and distance $$r$$. By placing 2.17 in 2.16 and by replacing the discrete summation over ruptures with a continuous integral over magnitude and distance, we can write:

$\nu = \sum^{I}_{i=1} \nu_i \int\int f_i(m,r)P(X \geq x|m,r)dmdr$

which is the traditional equation for calculating ground motion exceedance rates (McGuire, 1995).

## Event-based PSHA#

The goal of an Event-based PSHA is to simulate seismicity in a region as described by a source model and to simulate ground shaking on a set of locations accordingly with a ground motion model. In both cases, simulation involves a Monte Carlo (i.e. random) sampling procedure.

Seismicity is simulated by generating a stochastic event set (also known as synthetic catalog) for a given time span $$T$$. For each rupture generated by a source, the number of occurrences in a time span $$T$$ is simulated by sampling the corresponding probability distribution as given by $$P_{rup}(k|T)$$. A stochastic event set is therefore a sample of the full population of ruptures as defined by a seismic source model. Each rupture is present zero, one or more times, depending on its probability. Symbolically, we can define a stochastic event set $$SES$$ as:

$SES(T) = \{k \times rup,\ k\ \sim\ P_{rup}(k|T)\ \forall\ rup\ in\ Src\ \forall\ Src\ in\ SSM\}$

where $$k$$, the number of occurrences, is a random sample of $$P_{rup}(k|T)$$, and $$k \times rup$$ means that rupture $$rup$$ is repeated $$k$$ times in the stochastic event set.

Given an earthquake rupture, the simulation of ground shaking values on a set of locations $$x=(x_1,x_2,.....,x_N)$$ forms a ground motion field. In a Event-based PSHA, for each rupture in a stochastic event set, the ground motion field is obtained by sampling the probability distribution defined by the ground motion model. As described in Chapter 4, the ground motion distribution at a site is assumed to be a Normal distribution. The aleatory variability is described in terms of an inter-event (also known as between-events) standard deviation ($$\tau$$) and intra-event (also known as within-event) standard deviation ($$\sigma$$). The simulation of a ground motion field is therefore the result of the summation of three terms, the logarithmic mean of the ground motion distribution:

$\mu = (\mu_1,\mu_2,...,\mu_N)$

the inter-event variability:

$\eta = (\eta,\eta,...,\eta),\ where\ \eta \sim N(0,\tau)$

and the intra-event variability:

$\epsilon = (\epsilon_1,\epsilon_2,...,\epsilon_N)\ \sim N(0,\Sigma)$

where:

$\begin{split}\Sigma = \left [ \begin{matrix} \sigma_1^2 & \sigma_1\sigma_2\rho_{12} & ... & \sigma_1\sigma_N\rho_{1N} \\ \sigma_2\sigma_1\rho_{21} & \sigma_2^2 & ... & \sigma_2\sigma_N\rho_{2N} \\ ... & ... & ... & ... \\ \sigma_N\sigma_1\rho_{N1} & \sigma_N\sigma_2\rho_{N2} & ... & \sigma_N^2 \end{matrix}\right]\end{split}$

It is worth noticing how the inter-event variability, uniform for all sites given an earthquake rupture, is drawn from an univariate normal distribution of mean 0 and standard deviation t, while the intra-event variability is a random sample of a multivariate normal distribution of mean $$0$$ and covariance matrix $$\Sigma$$. $$\Sigma$$ is a diagonal matrix when considering no correlation in the intra-event variability, while it has non-zero off-diagonal elements when considering a correlation model ($$\rho$$) for the intra-event aleatory variability.

### Rupture sampling: how does it work?#

In this section we explain how the sampling of ruptures in event based calculations works, at least for the case of Poissonian sources. As an example, consider the following point source:

>>> from openquake.hazardlib import nrml
>>> src = nrml.get('''\
... <pointSource id="1" name="Point Source"
...             tectonicRegion="Active Shallow Crust">
...    <pointGeometry>
...        <gml:Point><gml:pos>179.5 0</gml:pos></gml:Point>
...        <upperSeismoDepth>0</upperSeismoDepth>
...        <lowerSeismoDepth>10</lowerSeismoDepth>
...    </pointGeometry>
...    <magScaleRel>WC1994</magScaleRel>
...    <ruptAspectRatio>1.5</ruptAspectRatio>
...    <truncGutenbergRichterMFD aValue="3" bValue="1" minMag="5" maxMag="7"/>
...    <nodalPlaneDist>
...        <nodalPlane dip="30" probability="1" strike="45" rake="90" />
...    </nodalPlaneDist>
...    <hypoDepthDist>
...        <hypoDepth depth="4" probability="1"/>
...    </hypoDepthDist>
...</pointSource>''', investigation_time=1, width_of_mfd_bin=1.0)


The source here is particularly simple, with only one seismogenic depth and one nodal plane. It generates two ruptures, because with a width_of_mfd_bin of 1 there are only two magnitudes in the range from 5 to 7:

>>> [(mag1, rate1), (mag2, rate2)] = src.get_annual_occurrence_rates()
>>> mag1
5.5
>>> mag2
6.5


The occurrence rates are respectively 0.009 and 0.0009. So, if we set the number of stochastic event sets to 1,000,000:

>>> num_ses = 1_000_000


we would expect the first rupture (the one with magnitude 5.5) to occur around 9,000 times and the second rupture (the one with magnitude 6.5) to occur around 900 times. Clearly the exact numbers will depend on the stochastic seed; if we set:

>>> np.random.seed(42)


then we will have (for investigation_time = 1):

>>> np.random.poisson(rate1 * num_ses * 1)
8966
>>> np.random.poisson(rate2 * num_ses * 1)
921


These are the number of occurrences of each rupture in the effective investigation time, i.e. the investigation time multiplied by the number of stochastic event sets and the number of realizations (here we assumed 1 realization).

The total number of events generated by the source will be number_of_events = sum(n_occ for each rupture) i.e. 8,966 + 921 = 9,887, with ~91% of the events associated to the first rupture and ~9% of the events associated to the second rupture.

Since the details of the seed algorithm can change with updates to the the engine, if you run an event based calculation with the same parameters with different versions of the engine, you may not get exactly the same number of events, but something close given a reasonably long effective investigation time. After running the calculation, inside the datastore, in the ruptures dataset you will find the two ruptures, their occurrence rates and their integer number of occurrences (n_occ). If the effective investigation time is large enough then the relation n_occ ~ occurrence_rate * eff_investigation_time will hold. If the effective investigation time is not large enough, or the occurrence rate is extremely small, then you should expect to see larger differences between the expected number of occurrences and n_occ, as well as a strong seed dependency.

It is important to notice than in order to determine the effective investigation time, the engine takes into account also the ground motion logic tree and the correct formula to use is eff_investigation_time = investigation_time * num_ses * num_rlzs where num_rlzs is the number of realizations in the ground motion logic tree.

Just to be concrete, if you run a calculation with the same parameters as described before, but with two GMPEs instead of one (and number_of_logic_tree_samples = 0), then the total number of paths admitted by the logic tree will be 2 and you should expect to get about twice the number of occurrences for each rupture. Users wanting to know the nitty-gritty details should look at the code, inside hazardlib/source/base.py, in the method src.sample_ruptures(eff_num_ses, ses_seed).

### Calculation of hazard curves from ground motion fields#

The ground motion fields simulated for each rupture in a stochastic event set can be used to compute hazard curves. Indeed, indicating with $$T_0$$ the duration associated with a stochastic event set, and with $$K$$ the number of ground motion fields (and associated ruptures) simulated in time $$T_0$$, we can compute the rate of exceedance of a ground motion level $$x$$ at a site as (Ebel and Kafka, 1999):

$\nu = \frac{\sum^{K}_{k=1}H(x_k-x)}{T_0}$

where $$H$$ is the Heaviside function and $$x_k$$ is the ground motion parameter value at the considered site associated with the k-th ground motion field. The exceedance rate obtained from equation 2.24 can then be used to compute the probability of at least one occurrence in any time span $$T$$ , accordingly with the Poissonian distribution, using equation 2.14. This approach is possible when the source model from which the stochastic event set is generated is Poissonian.

As the stochastic event set duration $$T_0$$ increases, equation 2.24 provides an increasingly more accurate estimate of the actual rate of exceedance. A larger $$T_0$$ can be achieved not only by simulating a single stochastic event set with longer duration, but also by simulating multiple stochastic event sets. These can then be joined together to form a stochastic event which has a large enough duration to provide a stable estimate of the rates of ground motion exceedance.

## Disaggregation#

The disaggregation analysis allows investigating how the different earthquake ruptures defined in a source model contribute to the probability of exceeding a certain ground motion level $$x$$ at a given site. Given the very large number of earthquake ruptures associated with a source model, contributions cannot be investigated on a rupture by rupture basis but a classification scheme is used instead. Ruptures are classified in terms of the following parameters:

• magnitude ($$M$$)

• distance to rupture surface-projection (Joyner-Boore distance) ($$r_{jb}$$)

• longitude and latitude of rupture surface-projection closest point ($$\lambda$$ , $$\phi$$)

• tectonic region type ($$TRT$$)

For each earthquake rupture, the associated conditional probability of ground motion exceedance – $$P(X \geq x|T,rup)$$ – is computed for different $$\epsilon$$ bins, where $$\epsilon$$ is the difference, in terms of number of total standard deviations, between $$x$$ and the mean ground motion $$\mu$$ as predicted by the ground motion model, that is:

$\epsilon = \frac{x-\mu}{\sigma_{total}}$

The disaggregation in terms of e allows investigating how the different regions of the ground motion distributions contribute to the probability of exceedance.

The rupture parameters ($$M$$, $$r_{jb}$$, $$\lambda$$, $$\phi$$ , $$TRT$$) together with the $$\epsilon$$ parameter effectively create a 6-dimensional model space which, discretized into a number of bins, is used to classify the probability of exceedance for different combination of rupture parameters and $$\epsilon$$ values.

For a given model space bin $$m$$ = ($$M$$, $$r_{jb}$$, $$\lambda$$, $$\phi$$ , $$TRT$$) the probability of exceeding level $$x$$ at least once in a time span $$T$$ is computed using equation 2.6, that is:

$\begin{split}P(X>x|T,m) = 1- \prod^{I}_{i=1}\prod^{J_i}_{j=1} \begin{cases} P_{rup_{ij}}(X<x|T)\ & \text {if } rup_{ij}\ \in\ m \\ 1\ & \text {if } rup_{ij}\ \notin\ m \end{cases}\end{split}$

In other words, if a rupture belongs to the considered bin, then the probability of not causing a ground motion exceedance is computed according to equation 2.6, otherwise the probability is 1 (that is, given that the rupture does not belong to the bin it can never cause a ground motion exceedance).

### Disaggregation histograms#

Disaggregation values as given by equation 2.26 can be aggregated in order to investigate earthquake rupture contributions over a reduced model space. The following disaggregation histograms are provided by the OpenQuake engine.

Magnitude disaggregation:

$P(X>x|T,M) = 1 - \prod_{r_{jb}}\prod_{\lambda}\prod_{\phi}\prod_{TRT}\prod_{\epsilon}(1-P(X>x|T,m))$

Distance disaggregation:

$P(X>x|T,r_{jb}) = 1 - \prod_{M}\prod_{\lambda}\prod_{\phi}\prod_{TRT}\prod_{\epsilon}(1-P(X>x|T,m))$

Tectonic region type disaggregation:

$P(X>x|T,TRT) = 1 - \prod_{M}\prod_{r_{jb}}\prod_{\lambda}\prod_{\phi}\prod_{\epsilon}(1-P(X>x|T,m))$

Magnitude-Distance disaggregation:

$P(X>x|T,M,r_{jb}) = 1 - \prod_{\lambda}\prod_{\phi}\prod_{TRT}\prod_{\epsilon}(1-P(X>x|T,m))$

Magnitude-Distance-Epsilon disaggregation:

$P(X>x|T,M,r_{jb},\epsilon) = 1 - \prod_{\lambda}\prod_{\phi}\prod_{TRT}(1-P(X>x|T,m))$

Longitude-Latitude disaggregation:

$P(X>x|T,\lambda,\phi) = 1 - \prod_{M}\prod_{r_{jb}}\prod_{TRT}\prod_{\epsilon}(1-P(X>x|T,m))$

Longitude-Latitude-Magnitude disaggregation:

$P(X>x|T,\lambda,\phi,M) = 1 - \prod_{r_{jb}}\prod_{TRT}\prod_{\epsilon}(1-P(X>x|T,m))$

Longitude-Latitude-Tectonic Region Type disaggregation:

$P(X>x|T,\lambda,\phi,TRT) = 1 - \prod_{M}\prod_{r_{jb}}\prod_{\epsilon}(1-P(X>x|T,m))$

All the above equations are based on the assumption that earthquake ruptures in different bins are independent, therefore probabilities can be aggregated by using the multiplication rule for independent events. The probability of a ground motion exceedance over a reduced model space is computed simply as 1 minus the probabilty of non-exceedance over the remaining model space dimensions.

### Comparison between OpenQuake engine disaggregation and traditional disaggregation#

The traditional disaggregation analysis as commonly known in literature (e.g. Bazzurro and Cornell, 1999) differs from the one provided by the OpenQuake engine. Indeed, a disaggregation analysis tipically provides the conditional probability of observing an earthquake scenario of given properties (magnitude, distance, epsilon, …) given that a ground motion exceedance is occurred, which can be written (following the same notation used in this chapter) as:

$P(m|X>x)$

On the contrary, the OpenQuake engine (as described in equation 2.26) provides the conditional probability of observing at least one ground motion exceedance in a time span $$T$$ given the occurrence of earthquake ruptures of given properties $$m$$, that is:

$P(X>x|T,m)$

The probabilities given in equations 2.35 and 2.36 are clearly different. Indeed, for different $$m$$, values given by equation 2.35 must sum up to 1, while this is not the case for equation 2.36. For the former equation different $$m$$ represent mutually exclusive events, while for the latter they represent independent events.

When considering a Poissonian source model it is possible however to derive equation 2.35 from equation 2.36. Indeed, indicating with $$\nu_m$$ the rate of ground motion exceedance $$(X>x)$$ associated with earthquake ruptures of properties $$m$$ and with $$\nu$$ the rate of ground motion exceedance associated with all earthquake ruptures, we can write equation 2.35 as:

$P(m|X>x)=\frac{\nu_m}{\nu}$

By solving the Poissonian equation 2.14 for the rate of exceedance, we can write $$\nu_m$$ as:

$\nu_m=-\frac{\ln(1-P(X>x|T,m))}{T}$

$$\nu$$ can be obtained using the same equation above but considering $$P(X>x|T)$$ instead of $$P(X>x|T,m)$$, where $$P(X>x|T)$$ is obtained by aggregating, using the multiplicative rule, the probabilities over the different $$m$$, that is:

$P(X>x|T)=1-\prod_{m}(1-P(X>x|T,m))$

By computing $$\nu_m$$ and $$\nu$$ from $$P(X>x|T,m)$$ it is hence possible to obtain the more traditional disaggregation results as given in equation 2.35.