Probabilistic Contingency Tables: An Improvement to Verify Probability Forecasts

Abstract The 45th Weather Squadron (45 WS) records daily rain and lightning probabilistic forecasts and the associated binary event outcomes. Subsequently, they evaluate forecast performance and determine necessary adjustments with an established verification process. For deterministic outcomes, weather forecast analysis typically utilizes a traditional contingency table (TCT) for verification; however, the 45 WS uses an alternative tool, the probabilistic contingency table (PCT). Using the TCT for verification requires a threshold, typically at 50%, to dichotomize probabilistic forecasts. The PCT maintains the valuable information in probabilities and verifies the true forecasts being reported. Simulated forecasts and outcomes as well as 2015–18 45 WS data are utilized to compare forecast performance metrics produced from the TCT and PCT to determine which verification tool better reflects the quality of forecasts. Comparisons of frequency bias and other statistical metrics computed from both dichotomized and continuous forecasts reveal misrepresentative performance metrics from the TCT as well as a loss of information necessary for verification. PCT bias better reflects forecast verification in contrast to that of TCT bias, which suggests suboptimal forecasts when in fact the forecasts are accurate.

Download Full-text

Kullback–Leibler Divergence as a Forecast Skill Score with Classic Reliability–Resolution–Uncertainty Decomposition

Monthly Weather Review ◽

10.1175/2010mwr3229.1 ◽

2010 ◽

Vol 138 (9) ◽

pp. 3387-3399 ◽

Cited By ~ 47

Author(s):

Steven V. Weijs ◽

Ronald van Nooijen ◽

Nick van de Giesen

Keyword(s):

Relative Entropy ◽

Skill Score ◽

Diagnostic Information ◽

Brier Score ◽

Forecast Verification ◽

Theoretical Concepts ◽

Probabilistic Forecasts ◽

Leibler Divergence ◽

Forecast Skill Score

Abstract This paper presents a score that can be used for evaluating probabilistic forecasts of multicategory events. The score is a reinterpretation of the logarithmic score or ignorance score, now formulated as the relative entropy or Kullback–Leibler divergence of the forecast distribution from the observation distribution. Using the information–theoretical concepts of entropy and relative entropy, a decomposition into three components is presented, analogous to the classic decomposition of the Brier score. The information–theoretical twins of the components uncertainty, resolution, and reliability provide diagnostic information about the quality of forecasts. The overall score measures the information conveyed by the forecast. As was shown recently, information theory provides a sound framework for forecast verification. The new decomposition, which has proven to be very useful for the Brier score and is widely used, can help acceptance of the logarithmic score in meteorology.

Download Full-text

Beyond Strictly Proper Scoring Rules: The Importance of Being Local

Weather and Forecasting ◽

10.1175/waf-d-19-0205.1 ◽

2021 ◽

Author(s):

Hailiang Du

Keyword(s):

Scoring Rules ◽

Forecast Performance ◽

Locality Property ◽

Proper Scoring Rules ◽

Statistical Measures ◽

Probabilistic Forecasts ◽

Direct Interpretation ◽

Probability Mass ◽

Smooth Transformation

AbstractThe evaluation of probabilistic forecasts plays a central role both in the interpretation and in the use of forecast systems and their development. Probabilistic scores (scoring rules) provide statistical measures to assess the quality of probabilistic forecasts. Often, many probabilistic forecast systems are available while evaluations of their performance are not standardized, with different scoring rules being used to measure different aspects of forecast performance. Even when the discussion is restricted to strictly proper scoring rules, there remains considerable variability between them; indeed strictly proper scoring rules need not rank competing forecast systems in the same order when none of these systems are perfect. The locality property is explored to further distinguish scoring rules. The nonlocal strictly proper scoring rules considered are shown to have a property that can produce “unfortunate” evaluations. Particularly the fact that Continuous Rank Probability Score prefers the outcome close to the median of the forecast distribution regardless the probability mass assigned to the value at/near the median raises concern to its use. The only local strictly proper scoring rules, the logarithmic score, has direct interpretations in terms of probabilities and bits of information. The nonlocal strictly proper scoring rules, on the other hand, lack meaningful direct interpretation for decision support. The logarithmic score is also shown to be invariant under smooth transformation of the forecast variable, while the nonlocal strictly proper scoring rules considered may, however, change their preferences due to the transformation. It is therefore suggested that the logarithmic score always be included in the evaluation of probabilistic forecasts.

Download Full-text

Classification of methods for generating pseudo-CT from MRI images for MRI-alone RT

Medical Technologies Journal ◽

10.26415/2572-004x-vol1iss3p54-54 ◽

2017 ◽

Vol 1 (3) ◽

pp. 54

Author(s):

BOUKELLOUZ Wafa ◽

MOUSSAOUI Abdelouahab

Keyword(s):

Performance Metrics ◽

Radiation Treatment ◽

Hybrid Methods ◽

Dose Calculation ◽

Absolute Error ◽

Ct Images ◽

General Technique ◽

Therapy Field ◽

The Brain

Background: Since the last decades, research have been oriented towards an MRI-alone radiation treatment planning (RTP), where MRI is used as the primary modality for imaging, delineation and dose calculation by assigning to it the needed electron density (ED) information. The idea is to create a computed tomography (CT) image or so-called pseudo-CT from MRI data. In this paper, we review and classify methods for creating pseudo-CT images from MRI data. Each class of methods is explained and a group of works in the literature is presented in detail with statistical performance. We discuss the advantages, drawbacks and limitations of each class of methods. Methods: We classified most recent works in deriving a pseudo-CT from MR images into four classes: segmentation-based, intensity-based, atlas-based and hybrid methods. We based the classification on the general technique applied in the approach. Results: Most of research focused on the brain and the pelvis regions. The mean absolute error (MAE) ranged from 80 HU to 137 HU and from 36.4 HU to 74 HU for the brain and pelvis, respectively. In addition, an interest in the Dixon MR sequence is increasing since it has the advantage of producing multiple contrast images with a single acquisition. Conclusion: Radiation therapy field is emerging towards the generalization of MRI-only RT thanks to the advances in techniques for generation of pseudo-CT images. However, a benchmark is needed to set in common performance metrics to assess the quality of the generated pseudo-CT and judge on the efficiency of a certain method.

Download Full-text

FUNDAMENTAL AND APPLIED SCIENTIFIC RESEARCH: сб.науч.тр./ OEAPS Inc.; редкол.: Флора Бертран (отв.ред.) [и др.]. - Берлин, Германия : OEAPS Inc., 2018. - 58 С.

10.31219/osf.io/hj4yf ◽

2018 ◽

Author(s):

Мария Григорьевна Алпатова ◽

Мария Игоревна Щеглова ◽

Elmira Kalybaevna Adil’bekova ◽

Nuradin Alibaev ◽

Arunas Svitojus

Keyword(s):

Service Sector ◽

Applied Research ◽

Point Of View ◽

Scientific Quality ◽

Final Decision ◽

Main Criterion ◽

The Public ◽

Peer Reviewers ◽

Verification Process

The conference is a major international forum for analyzing and discussing trends and approaches in research in the field of basic science and applied research. We provide a platform for discussions on innovative, theoretical and empirical research. The form of the conference: in absentia, without specifying the form in the collection of articles. Working languages: Russian, English Doctors and candidates of science, scientists, specialists of various profiles and directions, applicants for academic degrees, teachers, graduate students, undergraduates, students are invited to participate in the conference. There is one blind verification process in the journal. All articles will be initially evaluated by the editor for compliance with the journal. Manuscripts that are considered appropriate are then usually sent to at least two independent peer reviewers to assess the scientific quality of the article. The editor is responsible for the final decision on whether to accept or reject the article. The editor's decision is final. The main criterion used in assessing the manuscript submitted to the journal is: uniqueness or innovation in the work from the point of view of the methodology being developed and / or its application to a problem of particular importance in the public sector or service sector and / or the setting in which the efforts, for example, in the developing region of the world. That is, the very model / methodology, application and context of problems, at least one of them must be unique and important. Additional criteria considered in the consideration of the submitted document are its accuracy, organization / presentation (ie logical flow) and recording quality.

Download Full-text

FUNDAMENTAL AND APPLIED SCIENTIFIC RESEARCH April 2019: сб.науч.тр./ OEAPS Inc.; редкол.: Флора Бертран (отв.ред.) [и др.]. - Берлин, Германия : OEAPS Inc., 2019. - 98 C.

10.31219/osf.io/qmrey ◽

2019 ◽

Author(s):

Изабелла Станиславовна Чибисова ◽

Диана Ильгизаровна Шарипова ◽

Альфия Галиевна Зулькарнаева ◽

Ксения Александровна Дулова ◽

Садег Амирзадеган ◽

...

Keyword(s):

Service Sector ◽

Applied Research ◽

Point Of View ◽

Scientific Quality ◽

Final Decision ◽

Main Criterion ◽

The Public ◽

Peer Reviewers ◽

Verification Process

Download Full-text

Performance Evaluation of IMERG GPM Products during Tropical Storm Imelda

Atmosphere ◽

10.3390/atmos12060687 ◽

2021 ◽

Vol 12 (6) ◽

pp. 687

Author(s):

Salman Sakib ◽

Dawit Ghebreyesus ◽

Hatim O. Sharif

Keyword(s):

Stage Iv ◽

Tropical Storm ◽

Average Correlation ◽

Acceptable Range ◽

Precipitation Estimates ◽

Overall Performance ◽

Early Late ◽

Statistical Metrics ◽

Better Than

Tropical Storm Imelda struck the southeast coastal regions of Texas from 17–19 September, 2019, and delivered precipitation above 500 mm over about 6000 km2. The performance of the three IMERG (Early-, Late-, and Final-run) GPM satellite-based precipitation products was evaluated against Stage-IV radar precipitation estimates. Basic and probabilistic statistical metrics, such as CC, RSME, RBIAS, POD, FAR, CSI, and PSS were employed to assess the performance of the IMERG products. The products captured the event adequately, with a fairly high POD value of 0.9. The best product (Early-run) showed an average correlation coefficient of 0.60. The algorithm used to produce the Final-run improved the quality of the data by removing systematic errors that occurred in the near-real-time products. Less than 5 mm RMSE error was experienced in over three-quarters (ranging from 73% to 76%) of the area by all three IMERG products in estimating the Tropical Storm Imelda. The Early-run product showed a much better RBIAS relatively to the Final-run product. The overall performance was poor, as areas with an acceptable range of RBIAS (i.e., between −10% and 10%) in all the three IMERG products were only 16% to 17% of the total area. Overall, the Early-run product was found to be better than Late- and Final-run.

Download Full-text

Evaluating probabilistic forecasts of football matches: the case against the ranked probability score

Journal of Quantitative Analysis in Sports ◽

10.1515/jqas-2019-0089 ◽

2021 ◽

Vol 0 (0) ◽

Author(s):

Edward Wheatcroft

Keyword(s):

Scoring Rules ◽

Brier Score ◽

Sporting Events ◽

Forecast Performance ◽

Scoring Rule ◽

Probability Score ◽

Probabilistic Forecasts ◽

Non Local ◽

Evaluating Forecasts ◽

Non Locality

Abstract A scoring rule is a function of a probabilistic forecast and a corresponding outcome used to evaluate forecast performance. There is some debate as to which scoring rules are most appropriate for evaluating forecasts of sporting events. This paper focuses on forecasts of the outcomes of football matches. The ranked probability score (RPS) is often recommended since it is ‘sensitive to distance’, that is it takes into account the ordering in the outcomes (a home win is ‘closer’ to a draw than it is to an away win). In this paper, this reasoning is disputed on the basis that it adds nothing in terms of the usual aims of using scoring rules. A local scoring rule is one that only takes the probability placed on the outcome into consideration. Two simulation experiments are carried out to compare the performance of the RPS, which is non-local and sensitive to distance, the Brier score, which is non-local and insensitive to distance, and the Ignorance score, which is local and insensitive to distance. The Ignorance score outperforms both the RPS and the Brier score, casting doubt on the value of non-locality and sensitivity to distance as properties of scoring rules in this context.

Download Full-text

Cold stress in the harvest period: effects on tobacco leaf quality and curing characteristics

BMC Plant Biology ◽

10.1186/s12870-021-02895-w ◽

2021 ◽

Vol 21 (1) ◽

Author(s):

Yan Li ◽

Ke Ren ◽

Mengyang Hu ◽

Xian He ◽

Kaiyuan Gu ◽

...

Keyword(s):

Cold Stress ◽

High Altitude ◽

Weather Forecast ◽

Production Quality ◽

Chemical Components ◽

Leaf Quality ◽

Tobacco Leaves ◽

Tobacco Leaf ◽

Antioxidant Enzyme System

Abstract Background Weather change in high-altitude areas subjects mature tobacco (Nicotiana tabacum L.) to cold stress, which damages tobacco leaf yield and quality. A brupt diurnal temperature differences (the daily temperature dropping more than 20 °C) along with rainfall in tobacco-growing areas at an altitude above 2450 m, caused cold stress to field-grown tobacco. Results After the flue-cured tobacco suffered cold stress in the field, the surface color of tobacco leaves changed and obvious large browning areas were appeared, and the curing availability was extremely poor. Further research found the quality of fresh tobacco leaves, the content of key chemical components, and the production quality were greatly reduced by cold stress. We hypothesize that cold stress in high altitude environments destroyed the antioxidant enzyme system of mature flue-cured tobacco. Therefore, the quality of fresh tobacco leaves, the content of key chemical components, and the production quality were greatly reduced by cold stress. Conclusion This study confirmed that cold stress in high-altitude tobacco areas was the main reason for the browning of tobacco leaves during the tobacco curing process. This adverse environment seriously damaged the quality of tobacco leaves, but can be mitigated by pay attention to the weather forecast and pick tobacco leaves in advance.

Download Full-text

Statistical metrics for assessing the quality of wind power scenarios for stochastic unit commitment

Wind Energy ◽

10.1002/we.1872 ◽

2015 ◽

Vol 19 (5) ◽

pp. 873-893 ◽

Cited By ~ 13

Author(s):

Didem Sari ◽

Youngrok Lee ◽

Sarah Ryan ◽

David Woodruff

Keyword(s):

Wind Power ◽

Unit Commitment ◽

Stochastic Unit Commitment ◽

Statistical Metrics

Download Full-text

Intercomparison of Spatial Forecast Verification Methods: Identifying Skillful Spatial Scales Using the Fractions Skill Score

Weather and Forecasting ◽

10.1175/2009waf2222260.1 ◽

2010 ◽

Vol 25 (1) ◽

pp. 343-354 ◽

Cited By ~ 100

Author(s):

Marion Mittermaier ◽

Nigel Roberts

Keyword(s):

Spatial Scales ◽

Wrf Model ◽

Skill Score ◽

Area Ratio ◽

Forecast Verification ◽

Care Needs ◽

Forecast Performance ◽

Verification Methods ◽

Upper Level ◽

Formed Part

Abstract The fractions skill score (FSS) was one of the measures that formed part of the Intercomparison of Spatial Forecast Verification Methods project. The FSS was used to assess a common dataset that consisted of real and perturbed Weather Research and Forecasting (WRF) model precipitation forecasts, as well as geometric cases. These datasets are all based on the NCEP 240 grid, which translates to approximately 4-km resolution over the contiguous United States. The geometric cases showed that the FSS can provide a truthful assessment of displacement errors and forecast skill. In addition, the FSS can be used to determine the scale at which an acceptable level of skill is reached and this usage is perhaps more helpful than interpreting the actual FSS value. This spatial-scale approach is becoming more popular for monitoring operational forecast performance. The study also shows how the FSS responds to forecast bias. A more biased forecast always gives lower FSS values at large scales and usually at smaller scales. It is possible, however, for a more biased forecast to give a higher score at smaller scales, when additional rain overlaps the observed rain. However, given a sufficiently large sample of forecasts, a more biased forecast system will score lower. The use of percentile thresholds can remove the impacts of the bias. When the proportion of the domain that is “wet” (the wet-area ratio) is small, subtle differences introduced through near-threshold misses can lead to large changes in FSS magnitude in individual cases (primarily because the bias is changed). Reliable statistics for small wet-area ratios require a larger sample of forecasts. Care needs to be taken in the choice of verification domain. For high-resolution models, the domain should be large enough to encompass the length scale of the typical mesoscale forcing (e.g., upper-level troughs or squall lines). If the domain is too large, the wet-area ratios will always be small. If the domain is too small, fluctuations in the wet-area ratio can be large and larger spatial errors may be missed. The FSS is a good measure of the spatial accuracy of precipitation forecasts. Different methods are needed to determine other patterns of behavior.

Download Full-text