On the ROC Area of Ensemble Forecasts for Rare Events

Author(s):  
Zied Ben Bouallegue ◽  
David S. Richardson

The relative operating characteristic (ROC) curve is a popular diagnostic tool in forecast verification, with the area under the ROC curve (AUC) used as a verification metric measuring the discrimination ability of a forecast. Along with calibration, discrimination is deemed a fundamental attribute of probabilistic forecasts. In particular, in ensemble forecast verification, AUC provides a basis for comparing the potential predictive skill of competing forecasts. While this approach is straightforward when dealing with forecasts of common events (e.g. probability of precipitation), the AUC interpretation can turn out to be oversimplistic or misleading when focusing on rare events (e.g. precipitation exceeding some warning criterion). How should we interpret the AUC of ensemble forecasts when focusing on rare events? How can changes in the way probability forecasts are derived from the ensemble forecast affect AUC results? How can we detect a genuine improvement in terms of predictive skill? Based on verification experiments, a critical eye is cast on the AUC interpretation to answer these questions. As well as the traditional trapezoidal approximation and the well-known bi-normal fitting model, we discuss a new approach which embraces the concept of imprecise probabilities and relies on the subdivision of the lowest ensemble probability category.
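The traditional trapezoidal approximation mentioned above can be illustrated with a minimal sketch. It assumes that the forecast probability is simply the fraction of ensemble members exceeding the event threshold and that a warning is issued when at least k of M members agree; the function names and the toy data are hypothetical and are not taken from the paper. For a rare event most cases fall into the lowest category (no member forecasts the event), so the last trapezoid, which joins the k = 1 point to (1, 1), covers most of the area.

```python
def ensemble_probabilities(members, threshold):
    """Fraction of members exceeding the event threshold, one value per case."""
    return [sum(m > threshold for m in case) / len(case) for case in members]


def roc_points(probs, observed, n_members):
    """(false alarm rate, hit rate) pairs for 'warn if at least k members agree'."""
    n_event = sum(1 for o in observed if o)
    n_nonevent = len(observed) - n_event
    points = []
    for k in range(n_members + 1, -1, -1):  # strictest rule first, ends at (1, 1)
        level = k / n_members - 1e-12       # small tolerance for float rounding
        hits = sum(1 for p, o in zip(probs, observed) if p >= level and o)
        false_alarms = sum(1 for p, o in zip(probs, observed) if p >= level and not o)
        points.append((false_alarms / n_nonevent, hits / n_event))
    return points


def trapezoidal_auc(points):
    """Area under the piecewise-linear curve through the ROC points."""
    area = 0.0
    for (f0, h0), (f1, h1) in zip(points, points[1:]):
        area += (f1 - f0) * (h0 + h1) / 2.0
    return area


# Toy data (hypothetical): 4 cases, 2-member ensemble, event = value > 1.0
members = [[5.0, 6.0], [0.0, 0.0], [5.0, 0.0], [0.0, 0.0]]
observed = [True, False, True, False]
probs = ensemble_probabilities(members, threshold=1.0)
auc = trapezoidal_auc(roc_points(probs, observed, n_members=2))
no_skill_auc = trapezoidal_auc(roc_points([0.0] * 4, observed, n_members=2))
```

In this toy case the ensemble separates events from non-events perfectly, while a forecast that places every case in the lowest category yields a single straight segment from (0, 0) to (1, 1).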

2007 ◽  
Vol 135 (4) ◽  
pp. 1424-1438 ◽  
Author(s):  
Andrew R. Lawrence ◽  
James A. Hansen

Abstract An ensemble-based data assimilation approach is used to transform old ensemble forecast perturbations with more recent observations for the purpose of inexpensively increasing ensemble size. The impact of the transformations is propagated forward in time over the ensemble’s forecast period without rerunning any models, and these transformed ensemble forecast perturbations can be combined with the most recent ensemble forecast to sensibly increase forecast ensemble sizes. Because the transform takes place in perturbation space, the transformed perturbations must be centered on the ensemble mean from the most recent forecasts. Thus, the benefit of the approach is in terms of improved ensemble statistics rather than improvements in the mean. Larger ensemble forecasts can be used for numerous purposes, including probabilistic forecasting, targeted observations, and the provision of boundary conditions to limited-area models. This transformed lagged ensemble forecasting approach is explored and is shown to give positive results in the context of a simple chaotic model. By incorporating a suitable perturbation inflation factor, the technique was found to generate forecast ensembles whose skill was statistically comparable to that of ensembles produced by adding nonlinear model integrations. Implications for ensemble forecasts generated by numerical weather prediction models are briefly discussed, including multimodel ensemble forecasting.


2020 ◽  
Author(s):  
Χαρά Λιάσκου

Prediction of the difficult airway is based on measurements of anatomical characteristics of the neck and face, with the aim of avoiding complications during airway management. The purpose of this thesis is to study specific anatomical characteristics of the neck and preoperative tests, to assess their value in predicting the laryngoscopic view according to Cormack-Lehane (Cormack-Lehane Grade, CLG), and to build reliable multivariate models for predicting difficult laryngoscopy from variables that either do or do not require patient cooperation. The study includes 1134 patients with a body mass index below 35 kg/m² and aged over 18 years. All patients gave written consent to participate in the study. The following predictors were assessed preoperatively: thyromental distance (TMD), sternomental distance (SMD), ratio of height to thyromental distance (RHTMD), neck circumference (NC), ratio of neck circumference to thyromental distance (NC/TMD), hyomental distance at maximum head extension (HMDe), hyomental distance in the neutral head position (HMDn), hyomental distance ratio (HMDR), Mallampati classification (MLC), upper lip bite test (ULBT), mouth opening (MO) and head extension (HE). Difficult laryngoscopy was defined as Cormack-Lehane grade 3 or 4 and was recorded by an anaesthesiologist who was unaware of the results of the preoperative tests performed by the researcher. Sensitivity, specificity, and positive and negative predictive values were calculated for all variables. Optimal cut-off points were then calculated for each variable from the receiver operating characteristic curves relating sensitivity and specificity (ROC curves).
Finally, a multivariate analysis with logistic regression was performed, including all predictors with the new cut-off points, to build predictive models of difficult laryngoscopy. The incidence of difficult laryngoscopy (CLG 3 and 4) in the sample of the present study was 10.5%. Among the individual variables, HE had the highest sensitivity (78.5%) and negative predictive value (96%). The new cut-off points found are: TMD ≤ 7 cm, SMD ≤ 17 cm, RHTMD > 21.25, NC > 38 cm, NC/TMD > 4.94, HMDn ≤ 4.5 cm, HMDe ≤ 5.5 cm, HMDR > 1.2, MO ≤ 3.8 cm, HE ≤ 35°. All variables except HE had optimal cut-off points that differed significantly between the sexes. The cut-off points of SMD, HMDn, HMDe and HMDR differed between the sexes to a greater degree, which increased the predictive value of SMD in men (sensitivity 81.4%, negative predictive value 95.7%) and of HMDe, HMDn and HMDR in women (sensitivity 84.6%, negative predictive value 97%; sensitivity 84.6%, negative predictive value 97%; and sensitivity 88.5%, negative predictive value 97.5%, respectively). The multivariate analysis with the new cut-off points produced a model that includes five variables: MLC, ULBT, HE, HMDe and NC/TMD. This model shows high predictive value [χ²(5) = 109.12, p < 0.001, AUC = 0.86, p < 0.001]. The sensitivity, specificity and negative predictive value of the model were 82.3%, 74.8% and 97.4%, respectively. In addition, a second predictive model with two variables that do not require patient cooperation (NC/TMD and HMDe) was found to have good predictive value [ROC area under curve: 0.77, p < 0.001, χ²(2) = 63.5, p < 0.001], with sensitivity, specificity and negative predictive value of 75.2%, 70.8% and 96.2%, respectively.
The present study was carried out on a sample of the Greek population, and further studies are therefore recommended on the possible application of the models to other populations with different morphological characteristics.
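The quantities reported in this abstract (sensitivity, specificity, predictive values and ROC-based cut-off points) can be sketched as follows. The abstract does not state which criterion was used to pick the optimal cut-off, so the use of Youden's J (sensitivity + specificity - 1) is an assumption made here for illustration, as are the function names and the toy measurements.

```python
def _ratio(numerator, denominator):
    """Safe division: returns NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def diagnostic_metrics(test_positive, difficult):
    """Sensitivity, specificity, PPV and NPV of a binary bedside test."""
    tp = fp = tn = fn = 0
    for flagged, truth in zip(test_positive, difficult):
        if flagged and truth:
            tp += 1
        elif flagged and not truth:
            fp += 1
        elif not flagged and truth:
            fn += 1
        else:
            tn += 1
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
    }


def best_cutoff(values, difficult, candidates):
    """Candidate cut-off with the largest Youden's J (assumed criterion).

    A measurement at or below the cut-off flags a difficult laryngoscopy,
    matching the '<=' form of the distance cut-offs.
    """
    def youden_j(cutoff):
        m = diagnostic_metrics([v <= cutoff for v in values], difficult)
        return m["sensitivity"] + m["specificity"] - 1.0
    return max(candidates, key=youden_j)


# Toy data (hypothetical distances in cm, not from the study)
distances = [6.0, 6.5, 7.0, 7.5, 8.0, 8.5]
difficult = [True, True, True, False, False, False]
cutoff = best_cutoff(distances, difficult, candidates=[6.5, 7.0, 7.5])
metrics = diagnostic_metrics([d <= cutoff for d in distances], difficult)
```

For ratios with a '>' cut-off, the comparison inside `best_cutoff` would be reversed.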


2019 ◽  
Vol 147 (8) ◽  
pp. 2997-3023 ◽  
Author(s):  
Craig S. Schwartz

Abstract Two sets of global, 132-h (5.5-day), 10-member ensemble forecasts were produced with the Model for Prediction Across Scales (MPAS) for 35 cases in April and May 2017. One MPAS ensemble had a quasi-uniform 15-km mesh while the other employed a variable-resolution mesh with 3-km cell spacing over the conterminous United States (CONUS) that smoothly relaxed to 15 km over the rest of the globe. Precipitation forecasts from both MPAS ensembles were objectively verified over the central and eastern CONUS to assess the potential benefits of configuring MPAS with a 3-km mesh refinement region for medium-range forecasts. In addition, forecasts from NCEP’s operational Global Ensemble Forecast System were evaluated and served as a baseline against which to compare the experimental MPAS ensembles. The 3-km MPAS ensemble most faithfully reproduced the observed diurnal cycle of precipitation throughout the 132-h forecasts and had superior precipitation skill and reliability over the first 48 h. However, after 48 h, the three ensembles had more similar spread, reliability, and skill, and differences between probabilistic precipitation forecasts derived from the 3- and 15-km MPAS ensembles were typically statistically insignificant. Nonetheless, despite fewer benefits of increased resolution for spatial placement after 48 h, 3-km ensemble members explicitly provided potentially valuable guidance regarding convective mode throughout the 132-h forecasts while the other ensembles did not. Collectively, these results suggest both strengths and limitations of medium-range high-resolution ensemble forecasts and reveal pathways for future investigations to improve understanding of high-resolution global ensembles with variable-resolution meshes.


2003 ◽  
Vol 10 (6) ◽  
pp. 463-468 ◽  
Author(s):  
G. Pellerin ◽  
L. Lefaivre ◽  
P. Houtekamer ◽  
C. Girard

Abstract. Ensemble forecasts have been run operationally since February 1998 at the Canadian Meteorological Centre, with outputs up to ten days. The ensemble size was increased from eight to sixteen members in August 1999. The method of producing the perturbed analyses consists of running independent assimilation cycles that use perturbed sets of observations and are driven by eight different models, which differ mainly in their physical parameterizations. Perturbed analyses are doubled by taking opposite pairs. A multi-model approach is then used to obtain the forecasts. The ensemble output has been used to generate several products. In view of increasing computing facilities, the ensemble prediction system horizontal resolution was increased to TL149 in June 2001. Heights at 500 hPa and mean sea-level pressure maps are regularly used. Charts of precipitation with the probability of precipitation being above various thresholds are also produced at each run. The probabilistic forecast of the 24-h accumulated precipitation has shown skill as demonstrated by the relative operating characteristic (ROC). Verifications of the ensemble forecasts will be presented.


2014 ◽  
Vol 142 (6) ◽  
pp. 2198-2219 ◽  
Author(s):  
Jeffrey D. Duda ◽  
Xuguang Wang ◽  
Fanyou Kong ◽  
Ming Xue

Abstract Two approaches to accounting for errors in quantitative precipitation forecasts (QPFs) due to uncertainty in the microphysics (MP) parameterization in a convection-allowing ensemble are examined. They are mixed MP (MMP), composed mostly of double-moment schemes, and perturbed parameters within the Weather Research and Forecasting single-moment 6-class (WSM6) MP scheme (PPMP). Thirty-five cases of real-time storm-scale ensemble forecasts produced by the Center for Analysis and Prediction of Storms during the NOAA Hazardous Weather Testbed 2011 Spring Experiment were examined. The MMP ensemble had better fractions Brier scores (FBSs) for most lead times and thresholds, but the PPMP ensemble had better relative operating characteristic (ROC) scores for higher precipitation thresholds. The pooled ensemble formed by randomly drawing five members from the MMP and PPMP ensembles was no more skillful than the more accurate of the MMP and PPMP ensembles. Significant positive impact was found when the two were combined to form a larger ensemble. The QPF and the systematic behaviors of derived microphysical variables were also examined. The skill of the QPF among different members depended on the thresholds, verification metrics, and forecast lead times. The profiles of microphysics variables from the double-moment schemes contained more variation in the vertical than those from the single-moment members. Among the double-moment schemes, WDM6 produced the smallest raindrops and very large number concentrations. Among the PPMP members, the behaviors were found to be consistent with the prescribed intercept parameters. The perturbed intercept parameters used in the PPMP ensemble fell within the range of values retrieved from the double-moment schemes.


2019 ◽  
Vol 34 (6) ◽  
pp. 1955-1964
Author(s):  
Adam J. Clark

Abstract This study compares ensemble precipitation forecasts from 10-member, 3-km grid-spacing, CONUS domain single- and multicore ensembles that were a part of the 2016 Community Leveraged Unified Ensemble (CLUE) that was run for the 2016 NOAA Hazardous Weather Testbed Spring Forecasting Experiment. The main results are that a 10-member ARW ensemble was significantly more skillful than a 10-member NMMB ensemble, and a 10-member MIX ensemble (5 ARW and 5 NMMB members) performed about the same as the 10-member ARW ensemble. Skill was measured by area under the relative operating characteristic curve (AUC) and fractions skill score (FSS). Rank histograms in the ARW ensemble were flatter than those in the NMMB ensemble, indicating that the envelope of ensemble members better encompassed observations (i.e., better reliability) in the ARW. Rank histograms in the MIX ensemble were similar to those in the ARW ensemble. In the context of NOAA’s plans for a Unified Forecast System featuring a CAM ensemble with a single core, the results are positive and indicate that it should be possible to develop a single-core system that performs as well as or better than the current operational CAM ensemble, which is known as the High-Resolution Ensemble Forecast System (HREF). However, as new modeling applications are developed and incremental changes that move HREF toward a single-core system are made possible, more thorough testing and evaluation should be conducted.
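The rank histogram used above as a reliability diagnostic can be computed as in the following minimal sketch. It counts, for each case, how many ensemble members fall below the observation; ties between members and observations are ignored here, which is a simplifying assumption (precipitation data with many zeros would need a tie-breaking rule). The function name and the toy values are hypothetical.

```python
def rank_histogram(members, observations):
    """Counts of the observation's rank (0..M) within each M-member ensemble."""
    n_members = len(members[0])
    counts = [0] * (n_members + 1)
    for case, obs in zip(members, observations):
        rank = sum(1 for m in case if m < obs)  # members strictly below the observation
        counts[rank] += 1
    return counts


# Toy data (hypothetical): the same 3-member ensemble verified against 4 observations
members = [[1.0, 2.0, 3.0]] * 4
observations = [0.5, 1.5, 2.5, 3.5]
histogram = rank_histogram(members, observations)
```

A flat histogram, as in this toy case, means the observation is equally likely to fall in any rank; a U shape indicates that observations often fall outside the ensemble envelope.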


2008 ◽  
Vol 136 (3) ◽  
pp. 1054-1074 ◽  
Author(s):  
Tomislava Vukicevic ◽  
Isidora Jankov ◽  
John McGinley

Abstract In the current study, a technique that offers a way to evaluate ensemble forecast uncertainties produced either by initial conditions or different model versions, or both, is presented. The technique consists of first diagnosing the performance of the forecast ensemble and then optimizing the ensemble forecast using results of the diagnosis. The technique is based on the explicit evaluation of probabilities that are associated with the Gaussian stochastic representation of the weather analysis and forecast. It combines an ensemble technique for evaluating the analysis error covariance and the standard Monte Carlo approach for computing samples from a known Gaussian distribution. The technique was demonstrated in a tutorial manner on two relatively simple examples to illustrate the impact of ensemble characteristics including ensemble size, various observation strategies, and configurations including different model versions and varying initial conditions. In addition, the authors assessed improvements in the consensus forecasts gained by optimal weighting of the ensemble members based on time-varying, prior-probabilistic skill measures. The results with different observation configurations indicate that, as observations become denser, there is a need for larger-sized ensembles and/or more accuracy among individual members for the ensemble forecast to exhibit prediction skill. The main conclusions relative to ensembles built up with different physics configurations were, first, that almost all members typically exhibited some skill at some point in the model run, suggesting that all should be retained to acquire the best consensus forecast; and, second, that the normalized probability metric can be used to determine what sets of weights or physics configurations are performing best. 
A comparison of forecasts derived from a simple ensemble mean to forecasts from a mean developed from variably weighting the ensemble members based on prior performance by the probabilistic measure showed that the latter had substantially reduced mean absolute error. The study also indicates that a weighting scheme that utilized more prior cycles showed additional reduction in forecast error.
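The contrast between a simple ensemble mean and a performance-weighted mean can be sketched as follows. The study weights members with a probabilistic skill measure; this sketch swaps in a simpler stand-in, weights inversely proportional to each member's mean absolute error over prior cycles, purely as an assumption for illustration. All names and numbers are hypothetical.

```python
def inverse_error_weights(prior_abs_errors):
    """Normalized weights from prior_abs_errors[cycle][member] (assumed scheme)."""
    n_cycles = len(prior_abs_errors)
    n_members = len(prior_abs_errors[0])
    mean_error = [
        sum(cycle[m] for cycle in prior_abs_errors) / n_cycles
        for m in range(n_members)
    ]
    inverse = [1.0 / (e + 1e-9) for e in mean_error]  # small constant avoids division by zero
    total = sum(inverse)
    return [w / total for w in inverse]


def weighted_ensemble_mean(members, weights):
    """Weighted average of one forecast value per member."""
    return sum(w * x for w, x in zip(weights, members))


# Toy data (hypothetical): member 0 has been three times more accurate than member 1
weights = inverse_error_weights([[1.0, 3.0], [1.0, 3.0]])
forecast = weighted_ensemble_mean([10.0, 20.0], weights)
simple_mean = sum([10.0, 20.0]) / 2
```

Passing more prior cycles to `inverse_error_weights` corresponds to the abstract's remark about weighting schemes that use more prior cycles.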


2014 ◽  
Vol 15 (3) ◽  
pp. 1152-1165 ◽  
Author(s):  
Di Tian ◽  
Christopher J. Martinez

Abstract NOAA’s second-generation retrospective forecast (reforecast) dataset was created using the currently operational Global Ensemble Forecast System (GEFS). It has the potential to accurately forecast daily reference evapotranspiration ETo and can be useful for water management. This study was conducted to evaluate daily ETo forecasts using the GEFS reforecasts in the southeastern United States (SEUS) and to incorporate the ETo forecasts into irrigation scheduling to explore the usefulness of the forecasts for water management. ETo was estimated using the Penman–Monteith equation, and ensemble forecasts were downscaled and bias corrected using a forecast analog approach. The overall forecast skill was evaluated using the linear error in probability space skill score, and the forecast in five categories (terciles and 10th and 90th percentiles) was evaluated using the Brier skill score, relative operating characteristic, and reliability diagrams. Irrigation scheduling was evaluated by water deficit WD forecasts, which were determined based on the agricultural reference index for drought (ARID) model driven by the GEFS-based ETo forecasts. All forecast skill was generally positive up to lead day 7 throughout the year, with higher skill in cooler months compared to warmer months. The GEFS reforecast improved ETo forecast skill for all lead days over the SEUS compared to the first-generation reforecast. The WD forecasts driven by the ETo forecasts showed higher accuracy and less uncertainty than the forecasts driven by climatology, indicating their usefulness for irrigation scheduling, hydrological forecasting, and water demand forecasting in the SEUS.
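The Brier skill score used for the categorical forecasts can be illustrated with a short sketch. The abstract does not specify the reference forecast, so the use of the sample base rate (climatological frequency of the event in the verification sample) is an assumption; function names and data are hypothetical.

```python
def brier_score(probs, observed):
    """Mean squared difference between forecast probability and outcome (0 or 1)."""
    return sum((p - (1.0 if o else 0.0)) ** 2 for p, o in zip(probs, observed)) / len(probs)


def brier_skill_score(probs, observed):
    """1 - BS / BS_reference, with the sample base rate as reference (assumed).

    Undefined when every case has the same outcome, because the reference score is zero.
    """
    base_rate = sum(1 for o in observed if o) / len(observed)
    reference = brier_score([base_rate] * len(observed), observed)
    return 1.0 - brier_score(probs, observed) / reference


# Toy data (hypothetical): one event in four cases
observed = [True, False, False, False]
perfect = brier_skill_score([1.0, 0.0, 0.0, 0.0], observed)
climatology = brier_skill_score([0.25] * 4, observed)
```

Positive values indicate that the forecast beats the reference, which is the sense in which the abstract reports skill as "generally positive".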


Science ◽  
2012 ◽  
Vol 335 (6064) ◽  
pp. 76-79 ◽  
Author(s):  
Daniela Matei ◽  
Johanna Baehr ◽  
Johann H. Jungclaus ◽  
Helmuth Haak ◽  
Wolfgang A. Müller ◽  
...  

Attempts to predict changes in Atlantic Meridional Overturning Circulation (AMOC) have yielded little success to date. Here, we demonstrate predictability for monthly mean AMOC strength at 26.5°N for up to 4 years in advance. This AMOC predictive skill arises predominantly from the basin-wide upper-mid-ocean geostrophic transport, which in turn can be predicted because we have skill in predicting the upper-ocean zonal density difference. Ensemble forecasts initialized between January 2008 and January 2011 indicate a stable AMOC at 26.5°N until at least 2014, despite a brief wind-induced weakening in 2010. Because AMOC influences many aspects of climate, our results establish AMOC as an important potential carrier of climate predictability.

