Ranking earthquake forecasts: On the use of proper scoring rules to discriminate forecasts

Author(s):  
Francesco Serafini ◽  
Mark Naylor ◽  
Finn Lindgren ◽  
Maximilian Werner

<p>Recent years have seen growth in the diversity of probabilistic earthquake forecasts, as well as their first operational applications. This growing use demands a deeper look at our ability to rank forecast performance within a transparent and unified framework. Programs such as the Collaboratory for the Study of Earthquake Predictability (CSEP) have been at the forefront of this effort. Scores are quantitative measures of how well a dataset can be explained by a candidate forecast, and they allow forecasts to be ranked. A positively oriented score is said to be proper when, on average, the highest score is achieved by the model closest to the data-generating one. Different meanings of "closest" lead to different proper scoring rules. Here, we prove that the Parimutuel Gambling score, used to evaluate the results of the 2009 Italy CSEP experiment, is generally not proper, and that even in the special case where it is proper, it can still be used improperly. We show in detail the possible consequences of using this score for forecast evaluation. Moreover, we show that other well-established scores can be applied to existing studies to calculate new rankings with no requirement for extra information. We extend the analysis to show how much data are required, in principle, to distinguish candidate forecasts, and therefore how likely it is that a preference for one forecast can be expressed. This introduces the possibility of survey design with regard to the duration and spatial discretisation of earthquake forecasts. Our findings may contribute to more rigorous statements about the ability to distinguish between the predictive skills of candidate forecasts, in addition to simple rankings.</p>
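The definition of a proper score given in the abstract above can be illustrated with a minimal sketch for a single binary event (for example, "at least one earthquake in a space-time bin"). It compares the logarithmic score, which is known to be proper, with the linear score, a textbook example of an improper score. The linear score is used here only as a stand-in: this is not the Parimutuel Gambling score analysed in the paper, and the function names, the value of `p_true`, and the grid of candidate forecasts are all assumptions made for illustration.

```python
import math


def log_score(p, x):
    """Logarithmic score of forecast probability p for binary outcome x (1 or 0)."""
    return math.log(p) if x == 1 else math.log(1 - p)


def linear_score(p, x):
    """Linear score: the probability the forecast assigned to the observed outcome."""
    return p if x == 1 else 1 - p


def expected_score(score, p_true, p_forecast):
    """Average score of p_forecast when the event truly occurs with probability p_true."""
    return p_true * score(p_forecast, 1) + (1 - p_true) * score(p_forecast, 0)


def best_forecast(score, p_true, grid):
    """Candidate forecast on the grid with the highest expected score."""
    return max(grid, key=lambda p: expected_score(score, p_true, p))


# Hypothetical setup: true event probability 0.3, candidate forecasts 0.01 .. 0.99.
grid = [i / 100 for i in range(1, 100)]
p_true = 0.3

best_log = best_forecast(log_score, p_true, grid)
best_linear = best_forecast(linear_score, p_true, grid)

print("log score is maximised at   ", best_log)
print("linear score is maximised at", best_linear)
```

Under these assumptions, the expected logarithmic score peaks at the data-generating probability, whereas the expected linear score peaks at the edge of the grid: it rewards the most extreme forecast on the more likely outcome rather than the true one. This is the kind of behaviour that makes an improper score unsafe for ranking forecasts.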

Author(s):  
Bruno de Finetti ◽  
Maria Carla Galavotti ◽  
Hykel Hosni ◽  
Alberto Mura

2018 ◽  
Vol 6 (3-4) ◽  
pp. 343-376 ◽  
Author(s):  
Arthur Carvalho ◽  
Stanko Dimitrov ◽  
Kate Larson

2015 ◽  
Vol 10 (2) ◽  
pp. 479-499 ◽  
Author(s):  
A. Philip Dawid ◽  
Monica Musio

2009 ◽  
Vol 76 (4) ◽  
pp. 1461-1489 ◽  
Author(s):  
THEO OFFERMAN ◽  
JOEP SONNEMANS ◽  
GIJS VAN DE KUILEN ◽  
PETER P. WAKKER

2013 ◽  
Vol 31 (4_suppl) ◽  
pp. 427-427 ◽  
Author(s):  
Ulrich Robert Mansmann ◽  
Ute Sartorius ◽  
Ruediger Paul Laubender ◽  
Clemens Albrecht Giessen ◽  
Regina Esser ◽  
...  

427 Background: The extent of tumor shrinkage in patients (pts) receiving chemotherapy +/- monoclonal antibodies has prognostic value for PFS and OS. "Deepness of response (DpR)" is a new efficacy outcome measure that could explain the impact of tumor shrinkage on long-term survival outcome. If shrinkage takes place, DpR is the percentage of tumor shrinkage observed at the nadir compared to baseline. DpR is 0 for no change and negative if the tumor load increases. Longest diameter (LD) based on RECIST or a calculated tumor volume (ASCO GI 2012 #635) can quantify the tumor load at distinct time points. A joint model was presented (ASCO GI 2012 #580, ASCO 2012 #3603) which allows us to relate DpR to individual post-progression survival (PPS) time. Methods: Based on the data from 2 randomized trials (CRYSTAL, n=1198; OPUS, n=337), 4 treatment regimens (FOLFIRI +/- cetuximab and FOLFOX4 +/- cetuximab) were studied. A joint model was used to quantify individual changes in tumor size over time and to relate these changes to PFS and OS. Relationships between baseline tumor load and DpR and PPS were studied. Proper scoring rules were used to assess whether the LD-based or the volume-based approach allowed a better prediction of individual prognosis. Results: Results are reported for the CRYSTAL study using LD-based measures for 663 pts with KRAS wild-type tumors and imaging data. The 348 pts treated with FOLFIRI alone had a mean DpR of 35.52% (interquartile range [IR]: 12.09%, 59.86%) and a minimum DpR of -80%. The 315 pts treated with FOLFIRI + cetuximab had a mean DpR of 50.07% (IR: 22.87%, 79.55%) and a minimum DpR of -49%. The DpR was significantly different between the 2 groups (p<0.00001). Individual DpR is a significant prognostic factor for PPS time in both the LD-based (p=0.0023) and volume-based (p=0.0003) models. Proper scoring rules provided evidence of a more precise estimation of individual PPS time based on volume algorithm-measured DpR. Results of the OPUS study will be presented. Conclusions: Our results emphasize the value of the variable DpR as a new efficacy outcome measure for clinical trials. The tumor-shrinking capacity of cetuximab was shown to be associated with its ability to prolong PPS.


METRON ◽  
2014 ◽  
Vol 72 (2) ◽  
pp. 169-183 ◽  
Author(s):  
Alexander Philip Dawid ◽  
Monica Musio
