Evaluating epidemic forecasts in an interval format

2021 · Vol 17 (2) · pp. e1008618
Author(s): Johannes Bracher, Evan L. Ray, Tilmann Gneiting, Nicholas G. Reich

For practical reasons, many forecasts of case, hospitalization, and death counts in the context of the current Coronavirus Disease 2019 (COVID-19) pandemic are issued in the form of central predictive intervals at various levels. This is also the case for the forecasts collected in the COVID-19 Forecast Hub (https://covid19forecasthub.org/). Forecast evaluation metrics like the logarithmic score, which has been applied in several infectious disease forecasting challenges, are then not available, as they require full predictive distributions. This article provides an overview of how established methods for the evaluation of quantile and interval forecasts can be applied to epidemic forecasts in this format. Specifically, we discuss the computation and interpretation of the weighted interval score (WIS), a proper score that approximates the continuous ranked probability score (CRPS). It can be interpreted as a generalization of the absolute error to probabilistic forecasts and allows for a decomposition into a measure of sharpness and penalties for over- and underprediction.
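As a minimal sketch of that computation (function names and the example intervals are ours, not the Hub's reference implementation), the WIS for a single observation is a weighted combination of the absolute error of the predictive median and the interval scores of the K central intervals:

```python
def interval_score(y, lower, upper, alpha):
    """Interval score IS_alpha for a central (1 - alpha) prediction
    interval [lower, upper]: interval width (sharpness) plus scaled
    penalties when the observation y falls outside the interval."""
    sharpness = upper - lower
    overprediction = (2 / alpha) * max(lower - y, 0.0)   # y below the interval
    underprediction = (2 / alpha) * max(y - upper, 0.0)  # y above the interval
    return sharpness + overprediction + underprediction

def weighted_interval_score(y, median, lowers, uppers, alphas):
    """WIS as defined in Bracher et al. (2021): weight w0 = 1/2 on the
    median's absolute error and w_k = alpha_k / 2 on each interval."""
    total = 0.5 * abs(y - median)
    for lower, upper, alpha in zip(lowers, uppers, alphas):
        total += (alpha / 2) * interval_score(y, lower, upper, alpha)
    return total / (len(alphas) + 0.5)

# Hypothetical forecast: 50%, 80%, and 95% central intervals around a median of 100
print(weighted_interval_score(y=112, median=100,
                              lowers=[90, 80, 70], uppers=[110, 120, 130],
                              alphas=[0.5, 0.2, 0.05]))  # ~5.29
```

As the number of interval levels grows, this quantile-based score approaches the CRPS of the full predictive distribution.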

2012 · Vol 140 (6) · pp. 2005-2017
Author(s): Julian Tödter, Bodo Ahrens

Abstract The Brier score (BS) and its generalizations to the multicategory ranked probability score (RPS) and to the continuous ranked probability score (CRPS) are the prominent verification measures for probabilistic forecasts. In particular, their decompositions into measures quantifying the reliability, resolution, and uncertainty of the forecasts are attractive. Information theory sets up the natural framework for forecast verification. Recently, it has been shown that the BS is a second-order approximation of the information-based ignorance score (IGN), which also contains easily interpretable components and can likewise be generalized to a ranked version (RIGN). Here, the IGN, its generalizations, and their decompositions are systematically discussed in analogy to the variants of the BS. Additionally, a continuous ranked IGN (CRIGN) is introduced in analogy to the CRPS. The applicability and usefulness of the conceptually appealing CRIGN are illustrated, together with an algorithm to evaluate its components (reliability, resolution, and uncertainty) for ensemble-generated forecasts.
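For binary events, the relation between the two base scores can be made concrete; the snippet below (our illustration of the standard definitions, not the paper's decomposition algorithm) evaluates both on the same set of forecasts:

```python
import numpy as np

def brier_score(p, o):
    """Brier score: squared error between the forecast probability p of
    the event and the binary outcome o (1 if the event occurred)."""
    return (p - o) ** 2

def ignorance_score(p, o):
    """Ignorance score: negative log2 of the probability assigned to the
    outcome that materialized; the BS is its second-order approximation."""
    return -np.log2(np.where(o == 1, p, 1 - p))

p = np.array([0.9, 0.6, 0.2, 0.7])  # hypothetical forecast probabilities
o = np.array([1, 0, 0, 1])          # corresponding binary outcomes
print(brier_score(p, o).mean(), ignorance_score(p, o).mean())
```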


2021 · Vol 0 (0)
Author(s): Edward Wheatcroft

Abstract A scoring rule is a function of a probabilistic forecast and a corresponding outcome, used to evaluate forecast performance. There is some debate as to which scoring rules are most appropriate for evaluating forecasts of sporting events. This paper focuses on forecasts of the outcomes of football matches. The ranked probability score (RPS) is often recommended since it is ‘sensitive to distance’, that is, it takes into account the ordering of the outcomes (a home win is ‘closer’ to a draw than it is to an away win). In this paper, this reasoning is disputed on the basis that it adds nothing in terms of the usual aims of using scoring rules. A local scoring rule is one that takes only the probability placed on the observed outcome into consideration. Two simulation experiments are carried out to compare the performance of the RPS, which is non-local and sensitive to distance, the Brier score, which is non-local and insensitive to distance, and the Ignorance score, which is local and insensitive to distance. The Ignorance score outperforms both the RPS and the Brier score, casting doubt on the value of non-locality and sensitivity to distance as properties of scoring rules in this context.
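To make the three properties concrete, the sketch below (hypothetical probabilities; standard textbook definitions rather than the paper's simulation code) scores a single match forecast under each rule, for the ordered outcomes home win, draw, away win:

```python
import numpy as np

def ranked_probability_score(p, outcome):
    """RPS for m ordered categories: mean squared difference between the
    cumulative forecast and outcome distributions (sensitive to distance)."""
    o = np.zeros_like(p)
    o[outcome] = 1.0
    return np.sum((np.cumsum(p) - np.cumsum(o)) ** 2) / (len(p) - 1)

def brier_score(p, outcome):
    """Multicategory Brier score: squared error summed over categories
    (non-local, but insensitive to distance)."""
    o = np.zeros_like(p)
    o[outcome] = 1.0
    return np.sum((p - o) ** 2)

def ignorance_score(p, outcome):
    """Ignorance score: depends only on the probability placed on the
    observed outcome (local)."""
    return -np.log2(p[outcome])

# Forecast probabilities for (home win, draw, away win); a draw materializes.
p = np.array([0.5, 0.3, 0.2])
for score in (ranked_probability_score, brier_score, ignorance_score):
    print(score.__name__, score(p, outcome=1))
```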


2020 · Vol 101 · pp. 374
Author(s): T. Sell, L. Warmbrod, M. Trotochaud, S. Ravi, E. Martin, ...

2021
Author(s): Jonas Bhend, Jean-Christophe Orain, Vera Schönenberger, Christoph Spirig, Lionel Moret, ...

Verification is a core activity in weather forecasting. Insights from verification are used for monitoring, for reporting, to support and motivate development of the forecasting system, and to allow users to maximize forecast value. Given the broad range of applications for which verification provides valuable input, the range of questions one would like to answer can be very large. Static analyses and summary verification results are often insufficient to cover this range. To this end, we developed an interactive verification platform at MeteoSwiss that allows users to inspect verification results from a wide range of angles and find answers to their specific questions.

We present the technical setup used to achieve a flexible yet performant interactive platform, along with two prototype applications: monitoring of direct model output from operational NWP systems, and understanding the capabilities and limitations of our pre-operational postprocessing. We also present two innovations that illustrate the user-oriented approach to comparative verification adopted as part of the platform. To facilitate the comparison of a broad range of forecasts issued with varying update frequency, we rely on the concept of time of verification to collocate the most recent available forecasts at the time of day at which the forecasts are used. In addition, we offer a matrix selection to more flexibly select forecast sources and scores for comparison. This allows us, for example, to compare the mean absolute error (MAE) of deterministic forecasts with the MAE and continuous ranked probability score of probabilistic forecasts, illustrating the benefit of using probabilistic forecasts.
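Such a comparison is meaningful because the CRPS reduces to the absolute error when the forecast collapses to a single value. A minimal sketch of the two metrics (ours, not MeteoSwiss platform code):

```python
import numpy as np

def mae(forecast, obs):
    """Mean absolute error of a deterministic forecast."""
    return np.mean(np.abs(forecast - obs))

def crps_ensemble(members, obs):
    """Sample CRPS of an ensemble forecast for a single observation:
    E|X - y| - 0.5 * E|X - X'| with X, X' drawn from the ensemble.
    Reduces to the absolute error when the ensemble has one member."""
    members = np.asarray(members, dtype=float)
    term1 = np.mean(np.abs(members - obs))
    term2 = 0.5 * np.mean(np.abs(members[:, None] - members[None, :]))
    return term1 - term2

obs = 12.0
print(mae(np.array([10.0]), obs))                    # deterministic forecast
print(crps_ensemble([9.0, 11.0, 12.5, 14.0], obs))   # ensemble forecast
```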


2015 · pp. 373-378
Author(s): Seyedeh Atefeh Mohammadi, Morteza Rahmani, Majid Azadi

2010 · Vol 14 (11) · pp. 2303-2317
Author(s): J. A. Velázquez, F. Anctil, C. Perrin

Abstract. This work investigates the added value of ensembles constructed from seventeen lumped hydrological models against their simple average counterparts. It is hypothesized that the full set of model outputs carries more information than their single aggregated predictor. For all 1061 available catchments, results showed that the mean continuous ranked probability score of the ensemble simulations was better (lower) than the mean absolute error of the aggregated simulations, confirming the added value of retaining all components of the model outputs. Reliability of the simulation ensembles, as assessed by rank histograms and reliability plots, is achieved for about 30% of the catchments. Despite this imperfection, the ensemble simulations were shown to have better skill than the deterministic simulations at discriminating between events and non-events, as confirmed by relative operating characteristic scores, especially for larger streamflows. Based on a genetic algorithm search optimizing the continuous ranked probability score, seven to ten models are deemed sufficient to construct ensembles with improved performance. In fact, many model subsets were found to improve on the performance of the reference ensemble, so it is not essential to implement all seventeen lumped hydrological models. The gain in performance of the optimized subsets is accompanied by some improvement of the ensemble reliability in most cases. Nonetheless, a calibration of the predictive distribution is still needed for many catchments.
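As a sketch of how the rank-histogram reliability check works (our illustration; the member count echoes the study's seventeen models), one counts, for each case, how many ensemble members fall below the observation; a roughly flat histogram indicates a reliable ensemble:

```python
import numpy as np

def rank_histogram(ensembles, observations):
    """Rank histogram (Talagrand diagram): for each case, the rank of the
    observation within the ordered ensemble; flat counts suggest that the
    observation is statistically indistinguishable from the members."""
    n_members = ensembles.shape[1]
    ranks = np.sum(ensembles < observations[:, None], axis=1)
    return np.bincount(ranks, minlength=n_members + 1)

rng = np.random.default_rng(0)
ens = rng.normal(size=(1000, 17))   # 17 "models", as in the study
obs = rng.normal(size=1000)         # observations from the same distribution
print(rank_histogram(ens, obs))     # approximately flat counts
```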

