Evaluating epidemic forecasts in an interval format

2021 · Vol 17 (2) · pp. e1008618
Author(s): Johannes Bracher, Evan L. Ray, Tilmann Gneiting, Nicholas G. Reich

For practical reasons, many forecasts of case, hospitalization, and death counts in the context of the current Coronavirus Disease 2019 (COVID-19) pandemic are issued in the form of central predictive intervals at various levels. This is also the case for the forecasts collected in the COVID-19 Forecast Hub (https://covid19forecasthub.org/). Forecast evaluation metrics like the logarithmic score, which has been applied in several infectious disease forecasting challenges, are then not available, as they require full predictive distributions. This article provides an overview of how established methods for the evaluation of quantile and interval forecasts can be applied to epidemic forecasts in this format. Specifically, we discuss the computation and interpretation of the weighted interval score (WIS), a proper score that approximates the continuous ranked probability score (CRPS). It can be interpreted as a generalization of the absolute error to probabilistic forecasts and allows for a decomposition into a measure of sharpness and penalties for over- and underprediction.
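As a minimal sketch of that computation (function names and the example intervals are ours, not the Hub's reference implementation), the WIS for a single observation is a weighted combination of the absolute error of the predictive median and the interval scores of the K central intervals:

```python
def interval_score(y, lower, upper, alpha):
    """Interval score IS_alpha for a central (1 - alpha) prediction
    interval [lower, upper]: interval width (sharpness) plus scaled
    penalties when the observation y falls outside the interval."""
    sharpness = upper - lower
    overprediction = (2 / alpha) * max(lower - y, 0.0)   # y below the interval
    underprediction = (2 / alpha) * max(y - upper, 0.0)  # y above the interval
    return sharpness + overprediction + underprediction

def weighted_interval_score(y, median, lowers, uppers, alphas):
    """WIS as defined in Bracher et al. (2021): weight w0 = 1/2 on the
    median's absolute error and w_k = alpha_k / 2 on each interval."""
    total = 0.5 * abs(y - median)
    for lower, upper, alpha in zip(lowers, uppers, alphas):
        total += (alpha / 2) * interval_score(y, lower, upper, alpha)
    return total / (len(alphas) + 0.5)

# Hypothetical forecast: 50%, 80%, and 95% central intervals around a median of 100
print(weighted_interval_score(y=112, median=100,
                              lowers=[90, 80, 70], uppers=[110, 120, 130],
                              alphas=[0.5, 0.2, 0.05]))  # ~5.29
```

As the number of interval levels grows, this quantile-based score approaches the CRPS of the full predictive distribution.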

2012 · Vol 140 (6) · pp. 2005-2017
Author(s): Julian Tödter, Bodo Ahrens

Abstract The Brier score (BS) and its generalizations to the multicategory ranked probability score (RPS) and to the continuous ranked probability score (CRPS) are the prominent verification measures for probabilistic forecasts. In particular, their decompositions into measures quantifying the reliability, resolution, and uncertainty of the forecasts are attractive. Information theory sets up the natural framework for forecast verification. Recently, it has been shown that the BS is a second-order approximation of the information-based ignorance score (IGN), which also contains easily interpretable components and can likewise be generalized to a ranked version (RIGN). Here, the IGN, its generalizations, and their decompositions are systematically discussed in analogy to the variants of the BS. Additionally, a continuous ranked IGN (CRIGN) is introduced in analogy to the CRPS. The applicability and usefulness of the conceptually appealing CRIGN are illustrated, together with an algorithm to evaluate its components (reliability, resolution, and uncertainty) for ensemble-generated forecasts.
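For binary events, the relation between the two base scores can be made concrete; the snippet below (our illustration of the standard definitions, not the paper's decomposition algorithm) evaluates both on the same set of forecasts:

```python
import numpy as np

def brier_score(p, o):
    """Brier score: squared error between the forecast probability p of
    the event and the binary outcome o (1 if the event occurred)."""
    return (p - o) ** 2

def ignorance_score(p, o):
    """Ignorance score: negative log2 of the probability assigned to the
    outcome that materialized; the BS is its second-order approximation."""
    return -np.log2(np.where(o == 1, p, 1 - p))

p = np.array([0.9, 0.6, 0.2, 0.7])  # hypothetical forecast probabilities
o = np.array([1, 0, 0, 1])          # corresponding binary outcomes
print(brier_score(p, o).mean(), ignorance_score(p, o).mean())
```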


2021 · Vol 0 (0)
Author(s): Edward Wheatcroft

Abstract A scoring rule is a function of a probabilistic forecast and a corresponding outcome, used to evaluate forecast performance. There is some debate as to which scoring rules are most appropriate for evaluating forecasts of sporting events. This paper focuses on forecasts of the outcomes of football matches. The ranked probability score (RPS) is often recommended since it is ‘sensitive to distance’, that is, it takes into account the ordering of the outcomes (a home win is ‘closer’ to a draw than it is to an away win). In this paper, this reasoning is disputed on the basis that it adds nothing in terms of the usual aims of using scoring rules. A local scoring rule is one that takes only the probability placed on the observed outcome into consideration. Two simulation experiments are carried out to compare the performance of the RPS, which is non-local and sensitive to distance, the Brier score, which is non-local and insensitive to distance, and the Ignorance score, which is local and insensitive to distance. The Ignorance score outperforms both the RPS and the Brier score, casting doubt on the value of non-locality and sensitivity to distance as properties of scoring rules in this context.
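To make the three properties concrete, the sketch below (hypothetical probabilities; standard textbook definitions rather than the paper's simulation code) scores a single match forecast under each rule, for the ordered outcomes home win, draw, away win:

```python
import numpy as np

def ranked_probability_score(p, outcome):
    """RPS for m ordered categories: mean squared difference between the
    cumulative forecast and outcome distributions (sensitive to distance)."""
    o = np.zeros_like(p)
    o[outcome] = 1.0
    return np.sum((np.cumsum(p) - np.cumsum(o)) ** 2) / (len(p) - 1)

def brier_score(p, outcome):
    """Multicategory Brier score: squared error summed over categories
    (non-local, but insensitive to distance)."""
    o = np.zeros_like(p)
    o[outcome] = 1.0
    return np.sum((p - o) ** 2)

def ignorance_score(p, outcome):
    """Ignorance score: depends only on the probability placed on the
    observed outcome (local)."""
    return -np.log2(p[outcome])

# Forecast probabilities for (home win, draw, away win); a draw materializes.
p = np.array([0.5, 0.3, 0.2])
for score in (ranked_probability_score, brier_score, ignorance_score):
    print(score.__name__, score(p, outcome=1))
```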


2020 · Vol 101 · pp. 374
Author(s): T. Sell, L. Warmbrod, M. Trotochaud, S. Ravi, E. Martin, ...

2021
Author(s): Jonas Bhend, Jean-Christophe Orain, Vera Schönenberger, Christoph Spirig, Lionel Moret, ...

Verification is a core activity in weather forecasting. Insights from verification are used for monitoring, for reporting, to support and motivate development of the forecasting system, and to allow users to maximize forecast value. Given the broad range of applications for which verification provides valuable input, the range of questions one would like to answer can be very large. Static analyses and summary verification results are often insufficient to cover this range. To this end, we developed an interactive verification platform at MeteoSwiss that allows users to inspect verification results from a wide range of angles and find answers to their specific questions.

We present the technical setup used to achieve a flexible yet performant interactive platform, along with two prototype applications: monitoring of direct model output from operational NWP systems, and understanding the capabilities and limitations of our pre-operational postprocessing. We also present two innovations that illustrate the user-oriented approach to comparative verification adopted as part of the platform. To facilitate the comparison of a broad range of forecasts issued with varying update frequency, we rely on the concept of time of verification to collocate the most recent available forecasts at the time of day at which the forecasts are used. In addition, we offer a matrix selection to more flexibly select forecast sources and scores for comparison. This allows us, for example, to compare the mean absolute error (MAE) of deterministic forecasts with the MAE and continuous ranked probability score of probabilistic forecasts, illustrating the benefit of using probabilistic forecasts.
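Such a comparison is meaningful because the CRPS reduces to the absolute error when the forecast collapses to a single value. A minimal sketch of the two metrics (ours, not MeteoSwiss platform code):

```python
import numpy as np

def mae(forecast, obs):
    """Mean absolute error of a deterministic forecast."""
    return np.mean(np.abs(forecast - obs))

def crps_ensemble(members, obs):
    """Sample CRPS of an ensemble forecast for a single observation:
    E|X - y| - 0.5 * E|X - X'| with X, X' drawn from the ensemble.
    Reduces to the absolute error when the ensemble has one member."""
    members = np.asarray(members, dtype=float)
    term1 = np.mean(np.abs(members - obs))
    term2 = 0.5 * np.mean(np.abs(members[:, None] - members[None, :]))
    return term1 - term2

obs = 12.0
print(mae(np.array([10.0]), obs))                    # deterministic forecast
print(crps_ensemble([9.0, 11.0, 12.5, 14.0], obs))   # ensemble forecast
```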


2015 · pp. 373-378
Author(s): Seyedeh Atefeh Mohammadi, Morteza Rahmani, Majid Azadi

2010 · Vol 14 (11) · pp. 2303-2317
Author(s): J. A. Velázquez, F. Anctil, C. Perrin

Abstract. This work investigates the added value of ensembles constructed from seventeen lumped hydrological models against their simple average counterparts. It is hypothesized that the full set of model outputs carries more information than their single aggregated predictor. For all 1061 available catchments, results showed that the mean continuous ranked probability score of the ensemble simulations was better (lower) than the mean absolute error of the aggregated simulations, confirming the added value of retaining all components of the model outputs. Reliability of the simulation ensembles, as assessed by rank histograms and reliability plots, is achieved for about 30% of the catchments. Despite this imperfection, the ensemble simulations were shown to have better skill than the deterministic simulations at discriminating between events and non-events, as confirmed by relative operating characteristic scores, especially for larger streamflows. Based on a genetic algorithm search optimizing the continuous ranked probability score, seven to ten models are deemed sufficient to construct ensembles with improved performance. In fact, many model subsets were found to improve on the performance of the reference ensemble, so it is not essential to implement all seventeen lumped hydrological models. The gain in performance of the optimized subsets is accompanied by some improvement of the ensemble reliability in most cases. Nonetheless, a calibration of the predictive distribution is still needed for many catchments.
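As a sketch of how the rank-histogram reliability check works (our illustration; the member count echoes the study's seventeen models), one counts, for each case, how many ensemble members fall below the observation; a roughly flat histogram indicates a reliable ensemble:

```python
import numpy as np

def rank_histogram(ensembles, observations):
    """Rank histogram (Talagrand diagram): for each case, the rank of the
    observation within the ordered ensemble; flat counts suggest that the
    observation is statistically indistinguishable from the members."""
    n_members = ensembles.shape[1]
    ranks = np.sum(ensembles < observations[:, None], axis=1)
    return np.bincount(ranks, minlength=n_members + 1)

rng = np.random.default_rng(0)
ens = rng.normal(size=(1000, 17))   # 17 "models", as in the study
obs = rng.normal(size=1000)         # observations from the same distribution
print(rank_histogram(ens, obs))     # approximately flat counts
```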

