Beyond the Basics: Evaluating Model-Based Precipitation Forecasts Using Traditional, Spatial, and Object-Based Methods

2014 ◽  
Vol 29 (6) ◽  
pp. 1451-1472 ◽  
Author(s):  
Jamie K. Wolff ◽  
Michelle Harrold ◽  
Tressa Fowler ◽  
John Halley Gotway ◽  
Louisa Nance ◽  
...  

Abstract While traditional verification methods are commonly used to assess numerical model quantitative precipitation forecasts (QPFs) using a grid-to-grid approach, they generally offer little diagnostic information or reasoning behind the computed statistic. On the other hand, advanced spatial verification techniques, such as neighborhood and object-based methods, can provide more meaningful insight into differences between forecast and observed features in terms of skill with spatial scale, coverage area, displacement, orientation, and intensity. To demonstrate the utility of applying advanced verification techniques to mid- and coarse-resolution models, the Developmental Testbed Center (DTC) applied several traditional metrics and spatial verification techniques to QPFs provided by the Global Forecast System (GFS) and the operational North American Mesoscale Model (NAM). Along with the frequency bias and the bias-adjusted Gilbert skill score (GSS), both the fractions skill score (FSS) and the Method for Object-Based Diagnostic Evaluation (MODE) were utilized for this study, with careful consideration given to how these methods were applied and how the results were interpreted. By illustrating the types of forecast attributes appropriate to assess with the spatial verification techniques, this paper provides examples of how to obtain advanced diagnostic information to help identify which aspects of the forecast are or are not performing well.
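
To make the traditional metrics concrete, here is a minimal Python sketch of computing frequency bias and the (unadjusted) GSS from a 2 x 2 contingency table. The synthetic fields and the 12.7-mm threshold are placeholders for illustration, not the DTC's actual configuration:

```python
import numpy as np

def contingency_counts(fcst, obs, threshold):
    """Count hits, misses, false alarms, and correct negatives
    for a given exceedance threshold on matching grids."""
    f = fcst >= threshold
    o = obs >= threshold
    hits = np.sum(f & o)
    misses = np.sum(~f & o)
    false_alarms = np.sum(f & ~o)
    correct_negatives = np.sum(~f & ~o)
    return hits, misses, false_alarms, correct_negatives

def frequency_bias(hits, misses, false_alarms):
    """Ratio of forecast to observed event frequency (1 = unbiased)."""
    return (hits + false_alarms) / (hits + misses)

def gilbert_skill_score(hits, misses, false_alarms, correct_negatives):
    """GSS (equitable threat score): the threat score adjusted for
    hits expected by random chance."""
    total = hits + misses + false_alarms + correct_negatives
    hits_random = (hits + misses) * (hits + false_alarms) / total
    return (hits - hits_random) / (hits + misses + false_alarms - hits_random)

# Illustrative use with synthetic precipitation fields (mm).
rng = np.random.default_rng(0)
fcst = rng.gamma(0.4, 8.0, size=(100, 100))
obs = rng.gamma(0.4, 8.0, size=(100, 100))
h, m, fa, cn = contingency_counts(fcst, obs, threshold=12.7)
print(frequency_bias(h, m, fa), gilbert_skill_score(h, m, fa, cn))
```

The bias adjustment used in the paper further corrects the hit count for the influence of frequency bias; the plain GSS above is the quantity that adjustment starts from.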

2016 ◽  
Vol 31 (3) ◽  
pp. 713-735 ◽  
Author(s):  
Patrick S. Skinner ◽  
Louis J. Wicker ◽  
Dustan M. Wheatley ◽  
Kent H. Knopfmeier

Abstract Two spatial verification methods are applied to ensemble forecasts of low-level rotation in supercells: a four-dimensional, object-based matching algorithm and the displacement and amplitude score (DAS) based on optical flow. Ensemble forecasts of low-level rotation produced using the National Severe Storms Laboratory (NSSL) Experimental Warn-on-Forecast System are verified against WSR-88D single-Doppler azimuthal wind shear values interpolated to the model grid. Verification techniques are demonstrated using four 60-min forecasts issued at 15-min intervals in the hour preceding development of the 20 May 2013 Moore, Oklahoma, tornado and compared to results from two additional forecasts of tornadic supercells occurring during the springs of 2013 and 2014. The object-based verification technique and displacement component of DAS are found to reproduce subjectively determined forecast characteristics in successive forecasts for the 20 May 2013 event, as well as to discriminate in subjective forecast quality between different events. Ensemble-mean, object-based measures quantify spatial and temporal displacement, as well as storm motion biases in predicted low-level rotation in a manner consistent with subjective interpretation. Neither method produces useful measures of the intensity of low-level rotation, owing to deficiencies in the verification dataset and forecast resolution.
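
The matching algorithm used in the paper is four-dimensional; as a simplified two-dimensional sketch of the displacement idea, the code below labels contiguous rotation objects above an assumed threshold and reports, for each observed object, the distance to the nearest forecast object centroid. The threshold, grid spacing, and field names are illustrative assumptions, not the NSSL configuration:

```python
import numpy as np
from scipy import ndimage

def rotation_objects(field, threshold):
    """Label contiguous regions exceeding a rotation threshold
    and return their centroids as (row, col) pairs."""
    labels, n = ndimage.label(field >= threshold)
    return ndimage.center_of_mass(field, labels, range(1, n + 1))

def centroid_displacements(fcst, obs, threshold, grid_km=3.0):
    """For each observed object, the distance (km) to the nearest
    forecast object centroid: a simple displacement diagnostic.
    grid_km is an assumed grid spacing."""
    fc = np.array(rotation_objects(fcst, threshold))
    ob = np.array(rotation_objects(obs, threshold))
    if fc.size == 0 or ob.size == 0:
        return []
    return [grid_km * np.min(np.hypot(*(fc - o).T)) for o in ob]
```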


2021 ◽  
Vol 36 (1) ◽  
pp. 3-19
Author(s):  
Burkely T. Gallo ◽  
Jamie K. Wolff ◽  
Adam J. Clark ◽  
Israel Jirak ◽  
Lindsay R. Blank ◽  
...  

Abstract Verification methods for convection-allowing models (CAMs) should consider the finescale spatial and temporal detail provided by CAMs, and including both neighborhood and object-based methods can account for displaced features that may still provide useful information. This work explores both contingency-table-based and object-based verification techniques as they relate to forecasts of severe convection. Two key fields in severe weather forecasting are investigated: updraft helicity (UH) and simulated composite reflectivity. UH is used to generate severe weather probabilities called surrogate severe fields, which have two tunable parameters: the UH threshold and the smoothing level. Probabilities computed using the UH threshold and smoothing level that give the best area under the receiver operating characteristic curve result in very high probabilities, while optimizing the parameters based on the reliability component of the Brier score results in much lower probabilities. Subjective ratings from participants in the 2018 NOAA Hazardous Weather Testbed Spring Forecasting Experiment (SFE) provide a complementary evaluation source. This work compares the verification methodologies in the context of three CAMs using the Finite-Volume Cubed-Sphere Dynamical Core (FV3), which will be the foundation of the U.S. Unified Forecast System (UFS). Three agencies ran FV3-based CAMs during the five-week 2018 SFE. These FV3-based CAMs are verified alongside a current operational CAM, the High-Resolution Rapid Refresh version 3 (HRRRv3). The HRRR is planned to eventually use the FV3 dynamical core as part of the UFS; as such, evaluations relative to current HRRR configurations are imperative for maintaining high forecast quality and informing future implementation decisions.
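
The surrogate severe procedure lends itself to a short sketch. Below is a hedged Python illustration of the general approach (threshold the UH field, spread exceedances over a neighborhood, then smooth with a Gaussian kernel); the threshold, box size, and smoothing sigma are placeholder values standing in for the two tunable parameters the abstract describes:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter

def surrogate_severe(uh, uh_threshold=75.0, sigma_gridpts=15, box=27):
    """Convert a gridded UH field (m^2 s^-2) into surrogate severe
    probabilities: flag exceedances, let any exceedance count for a
    surrounding box (box=27 points is roughly 80 km on a ~3-km grid,
    an assumed spacing), then apply Gaussian smoothing."""
    exceed = (uh >= uh_threshold).astype(float)
    boxed = maximum_filter(exceed, size=box)   # spread to the neighborhood
    return gaussian_filter(boxed, sigma=sigma_gridpts)
```

Sweeping uh_threshold and sigma_gridpts and scoring the resulting probabilities against observed reports is what produces the contrast the abstract draws between ROC-optimal and reliability-optimal parameter settings.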


2021 ◽  
Vol 4 ◽  
pp. 30-49
Author(s):  
A.Yu. Bundel ◽  
A.V. Muraviev ◽  
E.D. Olkhovaya ◽  
...  

State-of-the-art high-resolution NWP models simulate mesoscale systems in great detail, with large amplitudes and sharp gradients in the fields of weather variables. Higher resolution leads to growth of spatial and temporal errors and to the well-known double-penalty problem. To address this problem, spatial verification methods have been developed over the last two decades; they forgive moderate errors (especially in position) while still crediting the useful skill of a high-resolution model. The paper presents the updated classification of spatial verification methods, briefly describes the main methods, and gives an overview of the international projects for intercomparison of these methods. Special attention is given to the application of the spatial approach to ensemble forecasting. Popular software packages are considered. Russian translations are proposed for the relevant English terms.
Keywords: high-resolution models, verification, double penalty, spatial methods, ensemble forecasting, object-based methods
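
To see why the double penalty arises, consider a minimal synthetic example: a forecast that reproduces an observed feature exactly but displaces it by a few grid points is penalized twice at the grid scale, once as a miss and once as a false alarm, so a categorical score such as the critical success index (CSI) drops to zero:

```python
import numpy as np

obs = np.zeros((20, 20))
obs[8:12, 8:12] = 1                    # observed 4 x 4 feature
fcst = np.roll(obs, shift=5, axis=1)   # identical feature, shifted east

hits = np.sum((fcst == 1) & (obs == 1))
misses = np.sum((fcst == 0) & (obs == 1))
false_alarms = np.sum((fcst == 1) & (obs == 0))
csi = hits / (hits + misses + false_alarms)
print(hits, misses, false_alarms, csi)  # 0 16 16 0.0: doubly penalized
```

A spatial method that tolerates the displacement would still credit this forecast, which is precisely the motivation for the methods classified in the paper.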


2020 ◽  
Author(s):  
Marion Mittermaier ◽  
Rachel North ◽  
Christine Pequignet ◽  
Jan Maksymczuk

HiVE is a CMEMS-funded collaboration between the atmospheric Numerical Weather Prediction (NWP) verification community and the ocean community within the Met Office, aimed at demonstrating the use of spatial verification methods, originally developed for the evaluation of high-resolution NWP forecasts, on CMEMS ocean model forecast products. Spatial verification methods provide more scale-appropriate ways to assess the characteristics and accuracy of km-scale forecasts, where the detail looks realistic but may not be in the right place at the right time. As a result, coarser-resolution forecasts can verify better (e.g., with a lower root-mean-square error) than higher-resolution ones: the smoothness of the coarser forecast is rewarded, even though the higher-resolution forecast may be better. The project utilised an open-source code library known as the Model Evaluation Tools (MET), developed at the US National Center for Atmospheric Research (NCAR).

This project saw, for the first time, the application of spatial verification methods to sub-10 km resolution ocean model forecasts. The project consisted of two parts. Part 1 is described in the companion poster to this one. Part 2 describes the skill of CMEMS products for forecasting events or features of interest, such as algal blooms.

The Method for Object-based Diagnostic Evaluation (MODE) and its time-dimension version, MODE Time Domain (MTD), were applied to daily mean chlorophyll forecasts for the European North West Shelf from the FOAM-ERSEM model on the AMM7 grid. The forecasts are produced from a "cold start", i.e. with no data assimilation of biological variables. Here the entire 2019 algal bloom season was analysed to understand intensity and spatial (area) biases as well as location and timing errors. Forecasts were compared to the CMEMS daily cloud-free (L4) multi-sensor chlorophyll-a product.

Large differences were found between forecast and observed concentrations of chlorophyll, so a quantile-mapping approach for removing the bias was necessary before analysing the spatial properties of the forecast. Despite this, the model still produces areas of chlorophyll that are too large compared to the observed ones. The model often produces areas of enhanced chlorophyll in approximately the right locations, but the forecast and observed areas are rarely collocated and/or overlapping. Finally, the temporal analysis shows that the model struggled to capture the onset of the season (being close to a month too late), but once the model picked up the signal there was better correspondence between the observed and forecast chlorophyll peaks for the remainder of the season. There was very little variation in forecast performance with lead time, suggesting that chlorophyll is a very slowly varying quantity.

Comparison with an analysis that included assimilation of observed chlorophyll shows that it is much closer to the observed L4 product than the analysis without biological assimilation. It must be concluded that if the forecast were started from a DA analysis that included chlorophyll, it would lead to forecasts with less bias, and possibly better detection of the onset of the bloom.
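
The bias-removal step can be sketched as a generic empirical quantile mapping; this illustrates the general technique, not the project's exact implementation:

```python
import numpy as np

def quantile_map(fcst, fcst_clim, obs_clim):
    """Empirical quantile mapping: replace each forecast value with the
    observed-climatology value at the same cumulative probability, so the
    mapped forecast shares the observed distribution."""
    fcst_sorted = np.sort(fcst_clim.ravel())
    obs_sorted = np.sort(obs_clim.ravel())
    # Cumulative probability of each forecast value under the
    # forecast climatology...
    probs = np.searchsorted(fcst_sorted, fcst.ravel()) / fcst_sorted.size
    # ...mapped to the same quantile of the observed climatology.
    mapped = np.quantile(obs_sorted, np.clip(probs, 0.0, 1.0))
    return mapped.reshape(fcst.shape)
```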


2013 ◽  
Vol 28 (1) ◽  
pp. 119-138 ◽  
Author(s):  
Derek R. Stratman ◽  
Michael C. Coniglio ◽  
Steven E. Koch ◽  
Ming Xue

Abstract This study uses both traditional and newer verification methods to evaluate two 4-km grid-spacing Weather Research and Forecasting Model (WRF) forecasts: a “cold start” forecast that uses the 12-km North American Mesoscale Model (NAM) analysis and forecast cycle to derive the initial and boundary conditions (C0) and a “hot start” forecast that adds radar data into the initial conditions using a three-dimensional variational data assimilation (3DVAR)/cloud analysis technique (CN). These forecasts were evaluated as part of the 2009 and 2010 NOAA Hazardous Weather Testbed (HWT) Spring Forecasting Experiments. The Spring Forecasting Experiment participants noted that the skill of CN’s explicit forecasts of convection estimated by some traditional objective metrics often seemed large compared to the subjectively determined skill. The Gilbert skill score (GSS) reveals that CN scores higher than C0 at lower thresholds, likely because CN has a higher frequency bias than C0, but the difference is negligible at higher thresholds, where CN’s and C0’s frequency biases are similar. This suggests that if traditional skill scores are used to quantify convective forecasts, then higher (>35 dBZ) reflectivity thresholds should be used to be consistent with experts’ subjective assessments of the lack of forecast skill for individual convective cells. The spatial verification methods show that both CN and C0 generally have little to no skill at scales <8–12Δx starting at forecast hour 1, but CN has more skill at larger spatial scales (40–320 km) than C0 for the majority of the forecast period. This indicates that the hot start provides little to no benefit for forecasts of convective cells, but that it has some benefit for larger mesoscale precipitation systems.
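
A hedged sketch of the threshold sweep underlying such a comparison (the field names and threshold list are illustrative, and it assumes observed events exist at every threshold supplied):

```python
import numpy as np

def gss_and_bias(fcst_dbz, obs_dbz, thresholds):
    """Frequency bias and GSS as functions of reflectivity threshold,
    to expose how a high-bias forecast inflates GSS at low thresholds."""
    results = []
    total = obs_dbz.size
    for t in thresholds:
        f, o = fcst_dbz >= t, obs_dbz >= t
        hits = np.sum(f & o)
        misses = np.sum(~f & o)
        fa = np.sum(f & ~o)
        h_rand = (hits + misses) * (hits + fa) / total
        gss = (hits - h_rand) / (hits + misses + fa - h_rand)
        results.append((t, (hits + fa) / (hits + misses), gss))
    return results

# e.g. gss_and_bias(cn_refl, obs_refl, [20, 25, 30, 35, 40])
```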


Author(s):  
Pierre-Loïc Garoche

The verification of control system software is critical to a host of technologies and industries, from aeronautics and medical technology to the cars we drive. The failure of controller software can cost people their lives. This book provides control engineers and computer scientists with an introduction to the formal techniques for analyzing and verifying this important class of software. Too often, control engineers are unaware of the issues surrounding the verification of software, while computer scientists tend to be unfamiliar with the specificities of controller software. The book provides a unified approach that is geared to graduate students in both fields, covering formal verification methods as well as the design and verification of controllers. It presents a wealth of new verification techniques for performing exhaustive analysis of controller software. These include new means to compute nonlinear invariants, the use of convex optimization tools, and methods for dealing with numerical imprecisions such as floating point computations occurring in the analyzed software. As the autonomy of critical systems continues to increase—as evidenced by autonomous cars, drones, and satellites and landers—the numerical functions in these systems are growing ever more advanced. The techniques presented here are essential to support the formal analysis of the controller software being used in these new and emerging technologies.


2010 ◽  
Vol 25 (1) ◽  
pp. 343-354 ◽  
Author(s):  
Marion Mittermaier ◽  
Nigel Roberts

Abstract The fractions skill score (FSS) was one of the measures that formed part of the Intercomparison of Spatial Forecast Verification Methods project. The FSS was used to assess a common dataset that consisted of real and perturbed Weather Research and Forecasting (WRF) model precipitation forecasts, as well as geometric cases. These datasets are all based on the NCEP 240 grid, which translates to approximately 4-km resolution over the contiguous United States. The geometric cases showed that the FSS can provide a truthful assessment of displacement errors and forecast skill. In addition, the FSS can be used to determine the scale at which an acceptable level of skill is reached and this usage is perhaps more helpful than interpreting the actual FSS value. This spatial-scale approach is becoming more popular for monitoring operational forecast performance. The study also shows how the FSS responds to forecast bias. A more biased forecast always gives lower FSS values at large scales and usually at smaller scales. It is possible, however, for a more biased forecast to give a higher score at smaller scales, when additional rain overlaps the observed rain. However, given a sufficiently large sample of forecasts, a more biased forecast system will score lower. The use of percentile thresholds can remove the impacts of the bias. When the proportion of the domain that is “wet” (the wet-area ratio) is small, subtle differences introduced through near-threshold misses can lead to large changes in FSS magnitude in individual cases (primarily because the bias is changed). Reliable statistics for small wet-area ratios require a larger sample of forecasts. Care needs to be taken in the choice of verification domain. For high-resolution models, the domain should be large enough to encompass the length scale of the typical mesoscale forcing (e.g., upper-level troughs or squall lines). If the domain is too large, the wet-area ratios will always be small. If the domain is too small, fluctuations in the wet-area ratio can be large and larger spatial errors may be missed. The FSS is a good measure of the spatial accuracy of precipitation forecasts. Different methods are needed to determine other patterns of behavior.
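
For reference, the FSS discussed throughout the abstract can be sketched in a few lines from its standard definition: fraction fields are built by neighborhood averaging of binary exceedances, and the score compares their mean squared difference to the largest MSE the two fraction fields could produce. The square uniform_filter window is one common neighborhood choice:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def fss(fcst, obs, threshold, scale):
    """Fractions skill score at one neighborhood scale (in grid points).
    FSS = 1 - MSE(fractions) / MSE_reference."""
    f_frac = uniform_filter((fcst >= threshold).astype(float), size=scale)
    o_frac = uniform_filter((obs >= threshold).astype(float), size=scale)
    mse = np.mean((f_frac - o_frac) ** 2)
    mse_ref = np.mean(f_frac ** 2) + np.mean(o_frac ** 2)
    return 1.0 - mse / mse_ref if mse_ref > 0 else np.nan
```

Replacing the fixed threshold with a percentile of each field (e.g., np.percentile(fcst, 95)) gives the percentile-threshold variant the abstract notes can remove the impact of bias.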


Author(s):  
Jeffrey D. Duda ◽  
David D. Turner

Abstract The Method for Object-based Diagnostic Evaluation (MODE) is used to perform an object-based verification of approximately 1400 forecasts of composite reflectivity from the operational HRRR from April to September 2019. In this study, MODE is configured to prioritize deep, moist convective storm cells typical of those that produce severe weather across the central and eastern US during the warm season. In particular, attributes related to distance and size are given the greatest attribute weights for computing interest in MODE.

The HRRR tends to over-forecast all objects, but substantially over-forecasts both small objects at low reflectivity thresholds and large objects at high reflectivity thresholds. The HRRR tends to either under-forecast objects in the southern and central plains or has a correct frequency bias there, whereas it over-forecasts objects across the southern and eastern US. Attribute comparisons reveal the inability of the HRRR to fully resolve convective-scale features, as well as the impact of data assimilation and the loss of skill during the initial hours of the forecasts.

Scalar metrics are defined and computed based on MODE output, chiefly relying on the interest value. The object-based threat score (OTS), in particular, reveals performance of the HRRR forecasts similar to that indicated by the Heidke skill score, but with differing magnitudes, suggesting value in adopting an object-based approach to forecast verification. The typical distance between centroids of matched objects is also analyzed and shows gradual degradation with increasing forecast length.
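
A simplified sketch of the object-based pipeline described above: identify objects, compute a toy interest value from centroid distance and size similarity (the two attributes the study weights most heavily; the specific weights and distance scale below are placeholders), and aggregate matched-pair interest into an OTS-style score. This follows the general form of the OTS rather than the study's exact configuration:

```python
import numpy as np
from scipy import ndimage

def objects(field, threshold):
    """Return (centroid, area) for each contiguous object above threshold."""
    mask = field >= threshold
    labels, n = ndimage.label(mask)
    cents = ndimage.center_of_mass(field, labels, range(1, n + 1))
    areas = ndimage.sum(mask, labels, range(1, n + 1))
    return list(zip(cents, areas))

def interest(c1, a1, c2, a2, dist_scale=40.0):
    """Toy interest: high when centroids are close and areas are similar.
    The real MODE interest combines many attributes; the 0.7/0.3 weights
    and the distance scale (grid points) are placeholder choices."""
    d = np.hypot(c1[0] - c2[0], c1[1] - c2[1])
    dist_term = max(0.0, 1.0 - d / dist_scale)
    size_term = min(a1, a2) / max(a1, a2)
    return 0.7 * dist_term + 0.3 * size_term

def object_threat_score(fcst, obs, threshold):
    """Simplified OTS: interest-weighted area of greedily matched pairs,
    normalized by the total object area in both fields."""
    fo, oo = objects(fcst, threshold), objects(obs, threshold)
    total = sum(a for _, a in fo) + sum(a for _, a in oo)
    if total == 0:
        return np.nan
    pairs = sorted(((interest(cf, af, co, ao), i, j)
                    for i, (cf, af) in enumerate(fo)
                    for j, (co, ao) in enumerate(oo)), reverse=True)
    used_f, used_o, score = set(), set(), 0.0
    for ival, i, j in pairs:            # greedy one-to-one matching
        if i not in used_f and j not in used_o:
            used_f.add(i)
            used_o.add(j)
            score += ival * (fo[i][1] + oo[j][1])
    return score / total
```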


Author(s):  
Ali Fawzi Najm Al-Shammari ◽  
Adolfo Villafiorita

A large amount of research has been conducted to improve the public verifiability of e-voting systems. One of the challenges is ensuring that different and apparently contradictory requirements are met: anonymity and representation, vote secrecy and verifiability. System robustness against attacks adds further complexity. This chapter summarizes some of the known vote verification techniques and highlights the pros and cons of each technique. It also reviews how different verification technologies cover different phases of the voting process and evaluates how these techniques satisfy e-voting requirements.


2009 ◽  
Vol 24 (6) ◽  
pp. 1498-1510 ◽  
Author(s):  
Elizabeth E. Ebert

Abstract High-resolution forecasts may be quite useful even when they do not match the observations exactly. Neighborhood verification is a strategy for evaluating the “closeness” of the forecast to the observations within space–time neighborhoods rather than at the grid scale. Various properties of the forecast within a neighborhood can be assessed for similarity to the observations, including the mean value, fractional coverage, occurrence of a forecast event sufficiently near an observed event, and so on. By varying the sizes of the neighborhoods, it is possible to determine the scales for which the forecast has sufficient skill for a particular application. Several neighborhood verification methods have been proposed in the literature in the last decade. This paper examines four such methods in detail for idealized and real high-resolution precipitation forecasts, highlighting what can be learned from each of the methods. When applied to idealized and real precipitation forecasts from the Spatial Verification Methods Intercomparison Project, all four methods showed improved forecast performance for neighborhood sizes larger than grid scale, with the optimal scale for each method varying as a function of rainfall intensity.
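
One way to operationalize "the scales for which the forecast has sufficient skill" is to sweep neighborhood sizes and report the smallest one at which a neighborhood score reaches a target. The sketch below does this for the FSS with the commonly used target of 0.5 + f/2 (f being the observed base rate); both the score and the target are assumptions drawn from the neighborhood-verification literature, not a prescription from this paper:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def smallest_skillful_scale(fcst, obs, threshold, scales):
    """Return the smallest neighborhood size (grid points) at which the
    FSS reaches the target 0.5 + f/2, with f the observed base rate."""
    f_bin = (fcst >= threshold).astype(float)
    o_bin = (obs >= threshold).astype(float)
    target = 0.5 + o_bin.mean() / 2.0
    for n in sorted(scales):
        f_frac = uniform_filter(f_bin, size=n)
        o_frac = uniform_filter(o_bin, size=n)
        mse = np.mean((f_frac - o_frac) ** 2)
        ref = np.mean(f_frac ** 2) + np.mean(o_frac ** 2)
        if ref > 0 and 1.0 - mse / ref >= target:
            return n
    return None  # no skillful scale within the tested range
```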

