A construction principle for proper scoring rules

2021 · Vol 8 (24) · pp. 297-301
Author(s):  
Jonas Brehmer

Proper scoring rules enable decision-theoretically principled comparisons of probabilistic forecasts. New scoring rules can be constructed by identifying the predictive distribution with an element of a parametric family and then applying a known scoring rule. We introduce a condition which ensures propriety in this construction and thereby obtain novel proper scoring rules.
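The construction described above can be sketched numerically: identify a predictive sample with an element of a parametric family (here a Gaussian, via its first two moments) and then apply a known scoring rule (here the logarithmic score). The function names and the choice of family are illustrative, not taken from the paper; whether the resulting rule is proper depends on the condition the author introduces.

```python
import math

def gaussian_logscore(mu, sigma, y):
    """Negative log density of N(mu, sigma^2) at outcome y (lower is better)."""
    return 0.5 * math.log(2 * math.pi * sigma**2) + (y - mu)**2 / (2 * sigma**2)

def constructed_score(sample, y):
    """Identify a predictive sample with a Gaussian through its first two
    moments, then apply the logarithmic score to that Gaussian."""
    n = len(sample)
    mu = sum(sample) / n
    var = sum((x - mu)**2 for x in sample) / n
    return gaussian_logscore(mu, math.sqrt(var), y)
```

The score rewards outcomes that are plausible under the fitted Gaussian; the construction principle concerns when this two-step recipe remains proper.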

2015 · Vol 143 (4) · pp. 1321-1334
Author(s):  
Michael Scheuerer
Thomas M. Hamill

Abstract Proper scoring rules provide a theoretically principled framework for the quantitative assessment of the predictive performance of probabilistic forecasts. While a wide selection of such scoring rules for univariate quantities exists, there are only a few scoring rules for multivariate quantities, and many of them require that forecasts be given in the form of a probability density function. The energy score, a multivariate generalization of the continuous ranked probability score, is the only commonly used score that is applicable in the important case of ensemble forecasts, where the multivariate predictive distribution is represented by a finite sample. Unfortunately, its ability to detect incorrectly specified correlations between the components of the multivariate quantity is somewhat limited. In this paper the authors present an alternative class of proper scoring rules based on the geostatistical concept of variograms. The sensitivity of these variogram-based scoring rules to incorrectly predicted means, variances, and correlations is studied in a number of examples with simulated observations and forecasts; they are shown to be distinctly more discriminative with respect to the correlation structure. This conclusion is confirmed in a case study with postprocessed wind speed forecasts at five wind park locations in Colorado.
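Both scores discussed above operate directly on a finite ensemble. The sketch below shows standard textbook forms of the energy score and a variogram score of order p; the function names and the choice of unit variogram weights are my own, not code from the paper.

```python
import math

def energy_score(ensemble, y):
    """Energy score for an ensemble of multivariate members (lower is better):
    mean distance to the observation minus half the mean pairwise distance."""
    m = len(ensemble)
    dist = lambda a, b: math.sqrt(sum((ai - bi)**2 for ai, bi in zip(a, b)))
    term1 = sum(dist(x, y) for x in ensemble) / m
    term2 = sum(dist(xi, xj) for xi in ensemble for xj in ensemble) / (2 * m * m)
    return term1 - term2

def variogram_score(ensemble, y, p=0.5):
    """Variogram score of order p with unit weights: compares observed and
    ensemble-mean pairwise variograms across all component pairs."""
    m, d = len(ensemble), len(y)
    score = 0.0
    for i in range(d):
        for j in range(d):
            vy = abs(y[i] - y[j]) ** p
            vf = sum(abs(x[i] - x[j]) ** p for x in ensemble) / m
            score += (vy - vf) ** 2
    return score
```

Because the variogram score is built from pairwise differences of components, it responds to misspecified correlation structure more sharply than the energy score, which is the paper's central point.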


2021 · Vol 0 (0)
Author(s):  
Edward Wheatcroft

Abstract A scoring rule is a function of a probabilistic forecast and a corresponding outcome used to evaluate forecast performance. There is some debate as to which scoring rules are most appropriate for evaluating forecasts of sporting events. This paper focuses on forecasts of the outcomes of football matches. The ranked probability score (RPS) is often recommended since it is ‘sensitive to distance’, that is, it takes into account the ordering in the outcomes (a home win is ‘closer’ to a draw than it is to an away win). In this paper, this reasoning is disputed on the basis that it adds nothing in terms of the usual aims of using scoring rules. A local scoring rule is one that only takes the probability placed on the outcome into consideration. Two simulation experiments are carried out to compare the performance of the RPS, which is non-local and sensitive to distance, the Brier score, which is non-local and insensitive to distance, and the Ignorance score, which is local and insensitive to distance. The Ignorance score outperforms both the RPS and the Brier score, casting doubt on the value of non-locality and sensitivity to distance as properties of scoring rules in this context.
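The three scores compared in the experiments can be written down in a few lines for an ordered home/draw/away forecast. These are standard textbook formulations, not code from the paper; outcomes are encoded 0 = home win, 1 = draw, 2 = away win.

```python
import math

def rps(probs, outcome):
    """Ranked probability score: squared differences of cumulative
    probabilities, so it is non-local and sensitive to distance."""
    K = len(probs)
    cum_p, cum_o, s = 0.0, 0.0, 0.0
    for k in range(K - 1):
        cum_p += probs[k]
        cum_o += 1.0 if outcome == k else 0.0
        s += (cum_p - cum_o) ** 2
    return s / (K - 1)

def brier(probs, outcome):
    """Multi-category Brier score: non-local but insensitive to distance."""
    return sum((p - (1.0 if k == outcome else 0.0)) ** 2
               for k, p in enumerate(probs))

def ignorance(probs, outcome):
    """Ignorance score in bits: local, uses only the probability
    placed on the realized outcome."""
    return -math.log2(probs[outcome])
```

Note that permuting the probabilities of the non-realized outcomes changes the RPS but leaves the Ignorance score untouched, which is exactly the locality distinction the paper examines.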


2020 · Vol 17 (2) · pp. 115-133
Author(s):  
Zachary J. Smith
J. Eric Bickel

In this paper, we develop strictly proper scoring rules that may be used to evaluate the accuracy of a sequence of probabilistic forecasts. In practice, when forecasts are submitted for multiple uncertainties, competing forecasts are ranked by their cumulative or average score. Alternatively, one could score the implied joint distributions. We demonstrate that these measures of forecast accuracy disagree under some commonly used rules. Furthermore, and most importantly, we show that forecast rankings can depend on the selected scoring procedure. In other words, under some scoring rules, the relative ranking of probabilistic forecasts does not depend solely on the information content of those forecasts and the observed outcome. Instead, the relative ranking of forecasts is a function of the process by which those forecasts are evaluated. As an alternative, we describe additive and strongly additive strictly proper scoring rules, which have the property that the score for the joint distribution is equal to a sum of scores for the associated marginal and conditional distributions. We give methods for constructing additive rules and demonstrate that the logarithmic score is the only strongly additive rule. Finally, we connect the additive properties of scoring rules with analogous properties for a general class of entropy measures.
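The strong additivity attributed to the logarithmic score above can be checked numerically on a toy joint distribution (the probabilities below are made up for illustration): scoring the joint directly gives the same value as summing the scores of the marginal and the conditional.

```python
import math

# Hypothetical joint forecast over two binary uncertainties.
joint = {(0, 0): 0.4, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.3}
outcome = (0, 1)

# Logarithmic score of the joint distribution at the observed outcome.
log_joint = -math.log(joint[outcome])

# Marginal of the first variable and conditional of the second given it.
p_x = sum(p for (x, _), p in joint.items() if x == outcome[0])
p_y_given_x = joint[outcome] / p_x
log_factored = -math.log(p_x) - math.log(p_y_given_x)
```

Because -log(p(x, y)) = -log(p(x)) - log(p(y | x)) identically, the two quantities agree for any joint forecast, which is the sense in which the logarithmic score is strongly additive; the paper shows it is the only rule with this property.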


Author(s):  
Hailiang Du

Abstract The evaluation of probabilistic forecasts plays a central role in the interpretation and use of forecast systems and in their development. Probabilistic scores (scoring rules) provide statistical measures to assess the quality of probabilistic forecasts. Often, many probabilistic forecast systems are available, while evaluations of their performance are not standardized, with different scoring rules being used to measure different aspects of forecast performance. Even when the discussion is restricted to strictly proper scoring rules, considerable variability remains between them; indeed, strictly proper scoring rules need not rank competing forecast systems in the same order when none of these systems is perfect. The locality property is explored to further distinguish scoring rules. The nonlocal strictly proper scoring rules considered are shown to have a property that can produce “unfortunate” evaluations. In particular, the fact that the continuous ranked probability score prefers outcomes close to the median of the forecast distribution, regardless of the probability mass assigned to values at or near the median, raises concerns about its use. The only local strictly proper scoring rule, the logarithmic score, has direct interpretations in terms of probabilities and bits of information. The nonlocal strictly proper scoring rules, on the other hand, lack a meaningful direct interpretation for decision support. The logarithmic score is also shown to be invariant under smooth transformations of the forecast variable, whereas the nonlocal strictly proper scoring rules considered may change their preferences under such transformations. It is therefore suggested that the logarithmic score always be included in the evaluation of probabilistic forecasts.
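The invariance claim can be illustrated numerically. Under a smooth transform such as z = exp(y), every forecast density picks up the same Jacobian factor 1/z at the observation, so the logarithmic score of every forecast shifts by the same forecast-independent constant and rankings are unchanged. The two Gaussian forecasts below are arbitrary examples of my own.

```python
import math

def normal_pdf(y, mu, sigma):
    return math.exp(-(y - mu)**2 / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))

y = 0.7               # observation in the original coordinate
z = math.exp(y)       # the same observation after the transform z = exp(y)

def logscore_original(mu, sigma):
    return -math.log(normal_pdf(y, mu, sigma))

def logscore_transformed(mu, sigma):
    # Density of Z = exp(Y) via the change-of-variables formula.
    dens = normal_pdf(math.log(z), mu, sigma) / z
    return -math.log(dens)

# Score differences between two competing forecasts, in each coordinate.
d_orig = logscore_original(0.0, 1.0) - logscore_original(1.0, 2.0)
d_trans = logscore_transformed(0.0, 1.0) - logscore_transformed(1.0, 2.0)
```

The constant log(z) cancels in the difference, so the logarithmic score prefers the same forecast in either coordinate; nonlocal scores such as the CRPS do not enjoy this cancellation.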


2020
Author(s):  
Nico Keilman

Abstract Statisticians have developed scoring rules for evaluating probabilistic forecasts against observations. However, there are very few applications in the literature on population forecasting. A scoring rule measures the distance between the predictive distribution and its outcome. We review scoring rules that reward accuracy (the outcome is close to the expectation of the distribution) and sharpness (the distribution has low variance, which makes it difficult to hit the target). We evaluate probabilistic population forecasts for France, the Netherlands, and Norway. Forecasts for total population size for the Netherlands and for Norway performed quite well. The error in the jump-off population caused a bad score for the French forecast. We evaluate the age and sex composition predicted for the year 2010. The predictions for the Netherlands received the best scores, except for the oldest old. The age pattern for the Norwegian score reflects the under-prediction of immigration after the enlargement of the European Union in 2005. JEL codes: C15, C44, J11.


2021
Author(s):  
Christian Basteck

Abstract We characterize voting procedures according to the social choice correspondence they implement when voters cast ballots strategically, applying iteratively undominated strategies. In elections with three candidates, the Borda Rule is the unique positional scoring rule that satisfies unanimity (U) (i.e., elects a candidate whenever it is unanimously preferred) and is majoritarian after eliminating a worst candidate (MEW) (i.e., if there is a unanimously disliked candidate, the majority-preferred among the other two is elected). In a larger class of rules, Approval Voting is characterized by a single axiom that implies both U and MEW but is weaker than Condorcet-consistency (CON): it is the only direct mechanism scoring rule that is majoritarian after eliminating a Pareto-dominated candidate (MEPD) (i.e., if there is a Pareto-dominated candidate, the majority-preferred among the other two is elected); among all finite scoring rules that satisfy MEPD, Approval Voting is the most decisive. However, it fails a desirable monotonicity property: a candidate that is elected for some preference profile may lose the election once she gains further in popularity. In contrast, the Borda Rule is the unique direct mechanism scoring rule that satisfies U, MEW, and monotonicity (MON). There exists no direct mechanism scoring rule that satisfies both MEPD and MON, and no finite scoring rule satisfying CON.
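As a reminder of how a positional scoring rule works, here is a minimal Borda count; the ballots and helper name are invented for illustration. With K candidates, each ballot awards K-1 points to its top choice, K-2 to the next, and so on down to 0 for its last choice.

```python
def borda_winner(ballots):
    """Borda rule over ranked ballots (tuples, best first).
    Ties are broken alphabetically for determinism."""
    scores = {}
    for ranking in ballots:
        K = len(ranking)
        for pos, cand in enumerate(ranking):
            scores[cand] = scores.get(cand, 0) + (K - 1 - pos)
    return max(sorted(scores), key=lambda c: scores[c])
```

This is sincere (non-strategic) scoring; the paper's characterization concerns what such rules implement once voters respond strategically with iteratively undominated ballots.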

