PSHA with the OpenQuake engine ============================== This chapter describes the mathematical framework for PSHA implemented by the OpenQuake engine. More comprehensive descriptions of Probabilistic Seismic Hazard Analysis can be found for example in McGuire (2004) and USNRC (2012). Basic concepts -------------- Two main assumptions are at the base of all PSHA calculators included in the OpenQuake engine: - seismicity in a region is described by a collection of *independent seismic sources* (i.e. the occurrence of an earthquake rupture in a source does not affect the probability of earthquake occurrence in the other sources) - each source generates *independent earthquake ruptures* (i.e. the occurrence of an earth- quake rupture in a source does not affect the probability of occurrence of the other potential earthquake ruptures in the same source) The Classical, Event-Based, and Disaggregation analysis requires the definition of two main components: the *seismic source model*, that is a collection of seismic sources describing the seismic activity in a region of interest, and the *ground motion model*, that is a mathematical relationship defining the probability distribution of a ground motion parameter at a site given the occurrence of an earthquake rupture. The design of a seismic source model involves the specification of a number of sources whose main parameters are the geometry, constraining the earthquake rupture locations, and the *magnitude-frequency distribution*, defining the average annual occurrence rate over a magnitude range. A seismic source model (*SSM*) can be therefore defined as a set of *I* seismic sources (*Src*): .. math:: SSM = \{Src_1, Src_2, ..., Src_I\} Chapter 3 provides a detailed description of the different source typologies supported by the OpenQuake engine. However, independently of the typology, in a PSHA each source undergoes a discretization process which effectively generates a number of distinct earthquake ruptures. A generic *i*-th source defines therefore a set of *J* earthquake ruptures: .. math:: Src_i = \{Rup_{i1} ,Rup_{i2}, ..., Rup_{iJ}\} Classical PSHA -------------- The classical PSHA analysis allows calculating the probabilities of exceeding, at least once in a given time span, and at a given site, a set of ground motion parameter levels considering all possible earthquake ruptures defined in a seismic source model. Such a list of probability values is usually referred to as *hazard curve*. We indicate with :math:`P(X \geq x|T)` the probability that a ground-motion parameter *X* exceeds, at least once in a time span :math:`T` , a level :math:`x`. :math:`P(X \geq x|T)` can be computed as 1 minus the probability that none of the sources is causing a ground motion exceedance. By assuming *independent sources*, the probability that none of the sources is causing an exceedance is equal to the product of the probabilities that each source does not cause an exceedance, that is: .. math:: P(X \geq x|T) &= 1 - P_{src1}(X>> from openquake.hazardlib import nrml >>> src = nrml.get('''\ ... ... ... 179.5 0 ... 0 ... 10 ... ... WC1994 ... 1.5 ... ... ... ... ... ... ... ...''', investigation_time=1, width_of_mfd_bin=1.0) The source here is particularly simple, with only one seismogenic depth and one nodal plane. It generates two ruptures, because with a ``width_of_mfd_bin`` of 1 there are only two magnitudes in the range from 5 to 7:: >>> [(mag1, rate1), (mag2, rate2)] = src.get_annual_occurrence_rates() >>> mag1 5.5 >>> mag2 6.5 The occurrence rates are respectively 0.009 and 0.0009. So, if we set the number of stochastic event sets to 1,000,000:: >>> num_ses = 1_000_000 we would expect the first rupture (the one with magnitude 5.5) to occur around 9,000 times and the second rupture (the one with magnitude 6.5) to occur around 900 times. Clearly the exact numbers will depend on the stochastic seed; if we set:: >>> np.random.seed(42) then we will have (for ``investigation_time = 1``):: >>> np.random.poisson(rate1 * num_ses * 1) 8966 >>> np.random.poisson(rate2 * num_ses * 1) 921 These are the number of occurrences of each rupture in the effective investigation time, i.e. the investigation time multiplied by the number of stochastic event sets and the number of realizations (here we assumed 1 realization). The total number of events generated by the source will be ``number_of_events = sum(n_occ for each rupture)`` i.e. 8,966 + 921 = 9,887, with ~91% of the events associated to the first rupture and ~9% of the events associated to the second rupture. Since the details of the seed algorithm can change with updates to the the engine, if you run an event based calculation with the same parameters with different versions of the engine, you may not get exactly the same number of events, but something close given a reasonably long effective investigation time. After running the calculation, inside the datastore, in the ``ruptures`` dataset you will find the two ruptures, their occurrence rates and their integer number of occurrences (``n_occ``). If the effective investigation time is large enough then the relation ``n_occ ~ occurrence_rate * eff_investigation_time`` will hold. If the effective investigation time is not large enough, or the occurrence rate is extremely small, then you should expect to see larger differences between the expected number of occurrences and ``n_occ``, as well as a strong seed dependency. It is important to notice than in order to determine the effective investigation time, the engine takes into account also the ground motion logic tree and the correct formula to use is ``eff_investigation_time = investigation_time * num_ses * num_rlzs`` where ``num_rlzs`` is the number of realizations in the ground motion logic tree. Just to be concrete, if you run a calculation with the same parameters as described before, but with two GMPEs instead of one (and ``number_of_logic_tree_samples = 0``), then the total number of paths admitted by the logic tree will be 2 and you should expect to get about twice the number of occurrences for each rupture. Users wanting to know the nitty-gritty details should look at the code, inside hazardlib/source/base.py, in the method ``src.sample_ruptures(eff_num_ses, ses_seed)``. Calculation of hazard curves from ground motion fields ****************************************************** The ground motion fields simulated for each rupture in a stochastic event set can be used to compute hazard curves. Indeed, indicating with :math:`T_0` the duration associated with a stochastic event set, and with :math:`K` the number of ground motion fields (and associated ruptures) simulated in time :math:`T_0`, we can compute the rate of exceedance of a ground motion level :math:`x` at a site as (Ebel and Kafka, 1999): .. math:: \nu = \frac{\sum^{K}_{k=1}H(x_k-x)}{T_0} where :math:`H` is the Heaviside function and :math:`x_k` is the ground motion parameter value at the considered site associated with the *k*-th ground motion field. The exceedance rate obtained from equation 2.24 can then be used to compute the probability of at least one occurrence in any time span :math:`T` , accordingly with the Poissonian distribution, using equation 2.14. This approach is possible when the source model from which the stochastic event set is generated is Poissonian. As the stochastic event set duration :math:`T_0` increases, equation 2.24 provides an increasingly more accurate estimate of the actual rate of exceedance. A larger :math:`T_0` can be achieved not only by simulating a single stochastic event set with longer duration, but also by simulating multiple stochastic event sets. These can then be joined together to form a stochastic event which has a large enough duration to provide a stable estimate of the rates of ground motion exceedance. Disaggregation -------------- The disaggregation analysis allows investigating how the different earthquake ruptures defined in a source model contribute to the probability of exceeding a certain ground motion level :math:`x` at a given site. Given the very large number of earthquake ruptures associated with a source model, contributions cannot be investigated on a rupture by rupture basis but a classification scheme is used instead. Ruptures are classified in terms of the following parameters: - magnitude (:math:`M`) - distance to rupture surface-projection (Joyner-Boore distance) (:math:`r_{jb}`) - longitude and latitude of rupture surface-projection closest point (:math:`\lambda` , :math:`\phi`) - tectonic region type (:math:`TRT`) For each earthquake rupture, the associated conditional probability of ground motion exceedance – :math:`P(X \geq x|T,rup)` – is computed for different :math:`\epsilon` bins, where :math:`\epsilon` is the difference, in terms of number of total standard deviations, between :math:`x` and the mean ground motion :math:`\mu` as predicted by the ground motion model, that is: .. math:: \epsilon = \frac{x-\mu}{\sigma_{total}} The disaggregation in terms of *e* allows investigating how the different regions of the ground motion distributions contribute to the probability of exceedance. The rupture parameters (:math:`M`, :math:`r_{jb}`, :math:`\lambda`, :math:`\phi` , :math:`TRT`) together with the :math:`\epsilon` parameter effectively create a 6-dimensional model space which, discretized into a number of bins, is used to classify the probability of exceedance for different combination of rupture parameters and :math:`\epsilon` values. For a given model space bin :math:`m` = (:math:`M`, :math:`r_{jb}`, :math:`\lambda`, :math:`\phi` , :math:`TRT`) the probability of exceeding level :math:`x` at least once in a time span :math:`T` is computed using equation 2.6, that is: .. math:: P(X>x|T,m) = 1- \prod^{I}_{i=1}\prod^{J_i}_{j=1} \begin{cases} P_{rup_{ij}}(Xx|T,M) = 1 - \prod_{r_{jb}}\prod_{\lambda}\prod_{\phi}\prod_{TRT}\prod_{\epsilon}(1-P(X>x|T,m)) Distance disaggregation: .. math:: P(X>x|T,r_{jb}) = 1 - \prod_{M}\prod_{\lambda}\prod_{\phi}\prod_{TRT}\prod_{\epsilon}(1-P(X>x|T,m)) Tectonic region type disaggregation: .. math:: P(X>x|T,TRT) = 1 - \prod_{M}\prod_{r_{jb}}\prod_{\lambda}\prod_{\phi}\prod_{\epsilon}(1-P(X>x|T,m)) Magnitude-Distance disaggregation: .. math:: P(X>x|T,M,r_{jb}) = 1 - \prod_{\lambda}\prod_{\phi}\prod_{TRT}\prod_{\epsilon}(1-P(X>x|T,m)) Magnitude-Distance-Epsilon disaggregation: .. math:: P(X>x|T,M,r_{jb},\epsilon) = 1 - \prod_{\lambda}\prod_{\phi}\prod_{TRT}(1-P(X>x|T,m)) Longitude-Latitude disaggregation: .. math:: P(X>x|T,\lambda,\phi) = 1 - \prod_{M}\prod_{r_{jb}}\prod_{TRT}\prod_{\epsilon}(1-P(X>x|T,m)) Longitude-Latitude-Magnitude disaggregation: .. math:: P(X>x|T,\lambda,\phi,M) = 1 - \prod_{r_{jb}}\prod_{TRT}\prod_{\epsilon}(1-P(X>x|T,m)) Longitude-Latitude-Tectonic Region Type disaggregation: .. math:: P(X>x|T,\lambda,\phi,TRT) = 1 - \prod_{M}\prod_{r_{jb}}\prod_{\epsilon}(1-P(X>x|T,m)) All the above equations are based on the assumption that earthquake ruptures in different bins are independent, therefore probabilities can be aggregated by using the multiplication rule for independent events. The probability of a ground motion exceedance over a reduced model space is computed simply as 1 minus the probabilty of non-exceedance over the remaining model space dimensions. Comparison between OpenQuake engine disaggregation and traditional disaggregation ********************************************************************************* The traditional disaggregation analysis as commonly known in literature (e.g. Bazzurro and Cornell, 1999) differs from the one provided by the OpenQuake engine. Indeed, a disaggregation analysis tipically provides the conditional probability of observing an earthquake scenario of given properties (magnitude, distance, epsilon, ...) given that a ground motion exceedance is occurred, which can be written (following the same notation used in this chapter) as: .. math:: P(m|X>x) On the contrary, the OpenQuake engine (as described in equation 2.26) provides the conditional probability of observing at least one ground motion exceedance in a time span :math:`T` given the occurrence of earthquake ruptures of given properties :math:`m`, that is: .. math:: P(X>x|T,m) The probabilities given in equations 2.35 and 2.36 are clearly different. Indeed, for different :math:`m`, values given by equation 2.35 must sum up to 1, while this is not the case for equation 2.36. For the former equation different :math:`m` represent mutually exclusive events, while for the latter they represent independent events. When considering a Poissonian source model it is possible however to derive equation 2.35 from equation 2.36. Indeed, indicating with :math:`\nu_m` the rate of ground motion exceedance :math:`(X>x)` associated with earthquake ruptures of properties :math:`m` and with :math:`\nu` the rate of ground motion exceedance associated with all earthquake ruptures, we can write equation 2.35 as: .. math:: P(m|X>x)=\frac{\nu_m}{\nu} By solving the Poissonian equation 2.14 for the rate of exceedance, we can write :math:`\nu_m` as: .. math:: \nu_m=-\frac{\ln(1-P(X>x|T,m))}{T} :math:`\nu` can be obtained using the same equation above but considering :math:`P(X>x|T)` instead of :math:`P(X>x|T,m)`, where :math:`P(X>x|T)` is obtained by aggregating, using the multiplicative rule, the probabilities over the different :math:`m`, that is: .. math:: P(X>x|T)=1-\prod_{m}(1-P(X>x|T,m)) By computing :math:`\nu_m` and :math:`\nu` from :math:`P(X>x|T,m)` it is hence possible to obtain the more traditional disaggregation results as given in equation 2.35.