Prospective Evaluation of Multiplicative Hybrid Earthquake Forecast Models for California

Author(s):  
Jose A. Bayona
William Savran
Maximilian Werner
David A. Rhoades

Developing testable seismicity models is essential for robust seismic hazard assessments and for quantifying the predictive skill of posited hypotheses about seismogenesis. On this premise, the Regional Earthquake Likelihood Models (RELM) group designed a joint forecasting experiment, with associated models, data, and tests, to evaluate earthquake predictability in California over a five-year period. Participating RELM forecast models were based on a range of geophysical datasets, including earthquake catalogs, interseismic strain rates, and geologic fault slip rates. After five years of prospective evaluation, the RELM experiment found that the smoothed-seismicity (HKJ) model by Helmstetter et al. (2007) was the most informative. The diversity of competing forecast hypotheses in RELM made it attractive to combine multiple models into forecasts potentially more informative than HKJ. Thus, Rhoades et al. (2014) created multiplicative hybrid models that involve the HKJ model as a baseline and one or more conjugate models. Specifically, the authors fitted two parameters for each conjugate model and an overall normalizing constant to optimize each hybrid model. Information gain scores per earthquake were then computed using a corrected Akaike Information Criterion that penalizes the number of fitted parameters. According to retrospective analyses, some hybrid models showed significant information gains over the HKJ forecast despite the penalty. Here, we assess in a prospective setting the predictive skills of 16 hybrid and 6 original RELM forecasts, using a suite of tests from the Collaboratory for the Study of Earthquake Predictability (CSEP). The evaluation dataset contains 40 M≥4.95 events recorded within the California CSEP testing region from 1 January 2011 to 31 December 2020, including the 2016 Mw 5.6, 5.6, and 5.5 Hawthorne earthquake swarm, and the Mw 6.4 foreshock and Mw 7.1 mainshock of the 2019 Ridgecrest sequence. We evaluate the consistency between the observed and the expected number, spatial, likelihood, and magnitude distributions of earthquakes, and compare the performance of each forecast to that of HKJ. Our prospective test results show that none of the hybrid models is significantly more informative than the HKJ baseline forecast. These results are mainly due to the occurrence of the 2016 Hawthorne earthquake cluster and of four events from the 2019 Ridgecrest sequence in two forecast bins. These clusters of seismicity are exceptionally unlikely under all models and are insufficiently captured by the Poisson distribution assumed by the likelihood functions of the tests. Therefore, we are currently examining alternative likelihood functions that reduce the sensitivity of the evaluations to clustering, and that could help clarify whether the discrepancies between prospective and retrospective test results for multiplicative hybrid forecasts stem from limitations of the tests or from the methods used to create the hybrid models.
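To make the testing machinery concrete, the sketch below illustrates two ingredients of such an evaluation under simplifying assumptions: a two-sided Poisson consistency test on the total event count (the N-test) and the sample information gain per earthquake of one gridded rate forecast over a baseline such as HKJ. This is a minimal NumPy/SciPy sketch, not a production implementation such as pyCSEP; the array layout and function names are illustrative.

```python
import numpy as np
from scipy.stats import poisson

def poisson_n_test(expected_total, n_observed):
    """Two-sided Poisson consistency (N-) test on the total event count.
    Returns P(N >= n_observed) and P(N <= n_observed); the forecast is
    rejected if either probability is very small."""
    delta_1 = poisson.sf(n_observed - 1, expected_total)  # P(N >= n_obs)
    delta_2 = poisson.cdf(n_observed, expected_total)     # P(N <= n_obs)
    return delta_1, delta_2

def info_gain_per_eq(rate_a, rate_b, event_bins):
    """Sample information gain per earthquake of forecast A over baseline B
    for gridded Poisson rate forecasts (cf. Rhoades et al., 2011).
    rate_a, rate_b : expected event counts per space-magnitude bin
    event_bins     : indices of the bins that hosted the observed events"""
    n = len(event_bins)
    log_ratio = np.log(rate_a[event_bins] / rate_b[event_bins])
    return (log_ratio.sum() - (rate_a.sum() - rate_b.sum())) / n
```

In this notation, a hybrid forecast would simply supply its own rate array as rate_a, with the HKJ rates as rate_b.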

2012
Vol 2 (1)
pp. 2
Author(s):
Christine Smyth
Masumi Yamada
Jim Mori

The Collaboratory for the Study of Earthquake Predictability (CSEP) is a global project aimed at testing earthquake forecast models in a fair environment. Various metrics are currently used to evaluate the submitted forecasts. However, CSEP still lacks easily understandable metrics with which to rank the overall performance of the forecast models. In this research, we adapt a well-known and respected metric from another statistical field, bioinformatics, to make it suitable for evaluating earthquake forecasts such as those submitted to the CSEP initiative. The metric, originally called a "gene-set enrichment score", is based on a Kolmogorov-Smirnov statistic. Our modified metric assesses whether, over a certain time period, the forecast values at locations where earthquakes occurred are significantly increased relative to the values at locations where no earthquakes occurred. Permutation testing allows a significance value to be placed on the score. Unlike the metrics currently employed by CSEP, the score makes no assumption about the distribution of earthquake occurrence, nor does it require an arbitrary reference forecast. We apply the modified metric to simulated data and real forecast data to show that it is a powerful and robust technique, capable of ranking competing earthquake forecasts.
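A minimal sketch of the flavor of such a metric, not the authors' exact enrichment construction: a one-sided two-sample Kolmogorov-Smirnov statistic contrasts forecast values in cells that hosted earthquakes against cells that did not, and a label-permutation test supplies the significance value. The function names and the cell-wise representation of the forecast are assumptions.

```python
import numpy as np

def signed_ks(hits, misses):
    """One-sided two-sample KS statistic: large and positive when the
    forecast values in 'hits' sit above those in 'misses'."""
    grid = np.sort(np.concatenate([hits, misses]))
    f_hit = np.searchsorted(np.sort(hits), grid, side="right") / hits.size
    f_miss = np.searchsorted(np.sort(misses), grid, side="right") / misses.size
    return np.max(f_miss - f_hit)

def enrichment_test(forecast, event_mask, n_perm=10_000, seed=0):
    """Permutation p-value for the enrichment of forecast values in
    cells that hosted earthquakes (event_mask == True)."""
    rng = np.random.default_rng(seed)
    observed = signed_ks(forecast[event_mask], forecast[~event_mask])
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(event_mask)  # shuffle event labels over cells
        count += signed_ks(forecast[perm], forecast[~perm]) >= observed
    return observed, (count + 1) / (n_perm + 1)
```

Because both the statistic and its permutation reference are built from the forecast's own values, no occurrence distribution or reference forecast needs to be assumed, which is the property the abstract highlights.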


2017
Vol 59 (6)
Author(s):
Matteo Taroni
Warner Marzocchi
Pamela Roselli

The quantitative assessment of the performance of earthquake prediction and/or forecast models is essential for evaluating their applicability for risk-reduction purposes. Here we assess the earthquake prediction performance of the CN model applied to the Italian territory. This model has been widely publicized in the Italian news media, but a careful assessment of its prediction performance is still lacking. In this paper we evaluate the results obtained so far by the CN algorithm in Italy, adopting testing procedures that are widely used or under development in the Collaboratory for the Study of Earthquake Predictability (CSEP) network. Our results show that the CN prediction performance is comparable to that of a stationary Poisson model; that is, CN predictions do no better than what may be expected from random chance.
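As a hypothetical illustration of what "no better than random chance" means for an alarm-based algorithm such as CN: under a stationary Poisson reference, the number of target events falling inside alarms is binomial, with success probability equal to the fraction of the space-time volume covered by alarms. The sketch below computes the corresponding chance probability; it is an assumed stand-in for standard alarm-based skill tests, not necessarily the exact procedure adopted by the authors.

```python
from scipy.stats import binom

def random_chance_pvalue(n_events, n_hits, alarm_fraction):
    """P-value against a stationary Poisson reference: the probability of
    predicting at least n_hits of n_events purely by chance, when alarms
    cover a fraction alarm_fraction of the space-time volume."""
    return binom.sf(n_hits - 1, n_events, alarm_fraction)  # P(H >= n_hits)
```

For example, random_chance_pvalue(10, 7, 0.6) asks how surprising 7 hits out of 10 events would be when alarms already cover 60% of the space-time volume; a large value means the predictions add nothing beyond chance.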


Entropy
2020
Vol 22 (2)
pp. 258
Author(s):  
Zhihang Xu
Qifeng Liao

Optimal experimental design (OED) is of great significance for efficient Bayesian inversion. A popular class of OED methods is based on maximizing the expected information gain (EIG), which typically involves expensive likelihood evaluations. To reduce the computational cost, in this work a novel double-loop Bayesian Monte Carlo (DLBMC) method is developed to compute the EIG efficiently, and a Bayesian optimization (BO) strategy is proposed to obtain its maximizer using only a small number of samples. For Bayesian Monte Carlo posed on uniform and normal distributions, our analysis provides explicit expressions for the mean estimates and bounds on their variances. The accuracy and efficiency of our DLBMC and BO-based optimal design are validated and demonstrated with numerical experiments.
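For orientation, here is the generic nested (double-loop) Monte Carlo structure that estimators like DLBMC accelerate, applied to a toy linear-Gaussian experiment with a known closed-form EIG for checking. This is a plain nested sampler, not the paper's Bayesian Monte Carlo estimator; the model, parameter names, and sample sizes are illustrative assumptions.

```python
import numpy as np
from scipy.special import logsumexp

def nested_mc_eig(design, n_outer=2000, n_inner=2000, sigma=0.5, seed=0):
    """Nested (double-loop) Monte Carlo estimate of the expected information
    gain for a toy linear-Gaussian experiment
        y = design * theta + noise,  theta ~ N(0, 1),  noise ~ N(0, sigma^2).
    Closed form for checking: EIG = 0.5 * log(1 + design**2 / sigma**2)."""
    rng = np.random.default_rng(seed)
    theta = rng.standard_normal(n_outer)                # outer prior draws
    y = design * theta + sigma * rng.standard_normal(n_outer)
    # log-likelihood of each y under the theta that generated it
    # (Gaussian constants cancel between the two terms below)
    log_lik = -0.5 * ((y - design * theta) / sigma) ** 2
    # inner loop: marginal log p(y | design) from fresh prior draws
    theta_in = rng.standard_normal(n_inner)
    resid = (y[:, None] - design * theta_in[None, :]) / sigma
    log_marg = logsumexp(-0.5 * resid ** 2, axis=1) - np.log(n_inner)
    return float(np.mean(log_lik - log_marg))           # EIG estimate
```

Maximizing this estimate over design, whether by a BO strategy as in the paper or by a simple grid search, completes the OED loop; the inner loop over fresh prior draws is exactly the expensive marginal-likelihood evaluation whose cost the paper targets.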



2021
Vol 12
Author(s):  
Zhanshan (Sam) Ma

Using 2,733 longitudinal vaginal microbiome samples (representing local microbial communities) from 79 individuals (representing metacommunities) in healthy, bacterial vaginosis (BV), and pregnant states, we assess and interpret the relative importance of stochastic forces (e.g., stochastic drift in bacterial demography and stochastic dispersal) versus deterministic selection (e.g., host genome and host physiology) in shaping the dynamics of human vaginal microbiome (HVM) diversity, through an integrated analysis with multi-site neutral (MSN) and niche-neutral hybrid (NNH) modeling. It was found that, when the traditional "default" P-value threshold of 0.05 was specified, neutral drift was predominant (≥50% of metacommunities indistinguishable from the MSN prediction), while niche differentiation was moderate (<20% of metacommunities consistent with the NNH prediction). The study also analyzed two challenging uncertainties in testing neutral and/or niche-neutral hybrid models: the lack of full model specificity (non-unique fits of the same datasets to multiple models with potentially different mechanistic assumptions) and the lack of definite rules for setting the P-value threshold (denoted the Pt-value in this article) in testing a null hypothesis (model). Indeed, the two uncertainties can be interdependent, which further complicates the statistical inferences. To deal with these uncertainties, MSN/NNH test results are presented for a series of P-value thresholds ranging from 0.05 to 0.95. Furthermore, the influence of the threshold setting on model specificity, and the effects of a woman's health status on the neutrality level of the HVM, were examined. It was found that as the P-value threshold increased from 0.05 to 0.95, the overlapping (non-unique) fitting of MSN and NNH decreased from 29.1% to 1.3%, whereas the specificity (unique fit to the data) of the MSN model remained between 55.7% and 82.3%. Also, with a rising P-value threshold, the difference between the healthy and BV groups became significant. These findings suggest that a traditional single P-value threshold (such as the de facto standard of 0.05) may be insufficient for testing neutral and/or niche-neutral hybrid models.
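The threshold-sweep bookkeeping described above can be made concrete with a short sketch. The per-metacommunity goodness-of-fit P-values (p_msn, p_nnh) are hypothetical inputs, since the actual MSN/NNH fitting relies on neutral-theory sampling machinery beyond this snippet; a model is retained (not rejected) when its P-value exceeds the threshold.

```python
import numpy as np

def threshold_sweep(p_msn, p_nnh, thresholds=None):
    """Classify each metacommunity's fit across a range of P-value thresholds.
    p_msn, p_nnh : per-metacommunity goodness-of-fit P-values for the MSN
    and NNH models (hypothetical inputs for this sketch)."""
    if thresholds is None:
        thresholds = np.round(np.arange(0.05, 1.00, 0.05), 2)
    rows = []
    for t in thresholds:
        msn_ok, nnh_ok = p_msn > t, p_nnh > t
        rows.append({
            "threshold": float(t),
            "msn_only": float(np.mean(msn_ok & ~nnh_ok)),  # unique MSN fit
            "nnh_only": float(np.mean(~msn_ok & nnh_ok)),  # unique NNH fit
            "overlap":  float(np.mean(msn_ok & nnh_ok)),   # non-unique fit
            "neither":  float(np.mean(~msn_ok & ~nnh_ok)),
        })
    return rows
```

Raising the threshold makes retention harder for both models, so the overlap fraction shrinks, which is the pattern (29.1% down to 1.3%) reported above.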

