Intercomparison of Spatial Forecast Verification Methods: Identifying Skillful Spatial Scales Using the Fractions Skill Score

2010 ◽  
Vol 25 (1) ◽  
pp. 343-354 ◽  
Author(s):  
Marion Mittermaier ◽  
Nigel Roberts

Abstract The fractions skill score (FSS) was one of the measures that formed part of the Intercomparison of Spatial Forecast Verification Methods project. The FSS was used to assess a common dataset that consisted of real and perturbed Weather Research and Forecasting (WRF) model precipitation forecasts, as well as geometric cases. These datasets are all based on the NCEP 240 grid, which translates to approximately 4-km resolution over the contiguous United States. The geometric cases showed that the FSS can provide a truthful assessment of displacement errors and forecast skill. In addition, the FSS can be used to determine the scale at which an acceptable level of skill is reached and this usage is perhaps more helpful than interpreting the actual FSS value. This spatial-scale approach is becoming more popular for monitoring operational forecast performance. The study also shows how the FSS responds to forecast bias. A more biased forecast always gives lower FSS values at large scales and usually at smaller scales. It is possible, however, for a more biased forecast to give a higher score at smaller scales, when additional rain overlaps the observed rain. However, given a sufficiently large sample of forecasts, a more biased forecast system will score lower. The use of percentile thresholds can remove the impacts of the bias. When the proportion of the domain that is “wet” (the wet-area ratio) is small, subtle differences introduced through near-threshold misses can lead to large changes in FSS magnitude in individual cases (primarily because the bias is changed). Reliable statistics for small wet-area ratios require a larger sample of forecasts. Care needs to be taken in the choice of verification domain. For high-resolution models, the domain should be large enough to encompass the length scale of the typical mesoscale forcing (e.g., upper-level troughs or squall lines). If the domain is too large, the wet-area ratios will always be small. If the domain is too small, fluctuations in the wet-area ratio can be large and larger spatial errors may be missed. The FSS is a good measure of the spatial accuracy of precipitation forecasts. Different methods are needed to determine other patterns of behavior.
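The scale-selection usage described above can be made concrete. Below is a minimal sketch of the FSS and of scanning neighborhood sizes for the smallest scale at which a target score is reached, assuming NumPy and SciPy are available; the threshold, neighborhood sizes, placeholder fields, and the 0.5 + f/2 target (a value often used in the literature, with f the observed wet-area fraction) are illustrative assumptions, not settings from the paper.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def fss(fcst, obs, threshold, n):
    """Fractions skill score for one neighborhood length n (in grid points)."""
    f_bin = (fcst >= threshold).astype(float)
    o_bin = (obs >= threshold).astype(float)
    # Neighborhood fractions: box average of the binary exceedance fields.
    pf = uniform_filter(f_bin, size=n, mode="constant")
    po = uniform_filter(o_bin, size=n, mode="constant")
    fbs = np.mean((pf - po) ** 2)                    # fractions Brier score
    fbs_worst = np.mean(pf ** 2) + np.mean(po ** 2)  # no-overlap reference
    return 1.0 - fbs / fbs_worst if fbs_worst > 0 else np.nan

# Placeholder fields standing in for forecast and observed precipitation (mm).
rng = np.random.default_rng(0)
fcst = rng.gamma(0.5, 2.0, size=(128, 128))
obs = rng.gamma(0.5, 2.0, size=(128, 128))

# Scan neighborhood sizes for the smallest scale reaching a target score,
# here 0.5 + f/2 with f the observed wet-area fraction.
f_wet = np.mean(obs >= 1.0)
skillful_scales = [n for n in (1, 5, 11, 21, 41, 81)
                   if fss(fcst, obs, 1.0, n) >= 0.5 + f_wet / 2]
```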

2008 ◽  
Vol 136 (5) ◽  
pp. 1747-1757 ◽  
Author(s):  
Eric Gilleland ◽  
Thomas C. M. Lee ◽  
John Halley Gotway ◽  
R. G. Bullock ◽  
Barbara G. Brown

Abstract An important focus of research in the forecast verification community is the development of alternative verification approaches for quantitative precipitation forecasts, as well as for other spatial forecasts. The need for information that is meaningful in an operational context and the importance of capturing the specific sources of forecast error at varying spatial scales are two primary motivating factors. In this paper, features of precipitation as identified by a convolution threshold technique are merged within fields and matched across fields in an automatic and computationally efficient manner using Baddeley’s metric for binary images. The method is carried out on 100 test cases, and 4 representative cases are shown in detail. Results of merging and matching objects are generally positive in that they are consistent with how a subjective observer might merge and match features. The results further suggest that the Baddeley metric may be useful as a computationally efficient summary metric giving information about location, shape, and size differences of individual features, which could be employed for other spatial forecast verification methods.
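For readers unfamiliar with it, Baddeley's metric can be sketched compactly. The following is a minimal illustration for two binary feature masks, assuming SciPy's Euclidean distance transform; the order p and the concave cutoff are illustrative parameters, and the convolution-threshold feature identification and the merging and matching logic built on top of the metric in the paper are not reproduced here.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def baddeley_delta(mask_a, mask_b, p=2, cutoff=None):
    """Baddeley image metric between two binary feature masks of equal shape.

    Both masks are assumed to contain at least one feature pixel.
    """
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    # Distance from every pixel to the nearest feature pixel of each mask
    # (pixels inside a feature get distance zero).
    d_a = distance_transform_edt(~a)
    d_b = distance_transform_edt(~b)
    if cutoff is not None:                  # concave cutoff w(t) = min(t, c)
        d_a = np.minimum(d_a, cutoff)
        d_b = np.minimum(d_b, cutoff)
    return float(np.mean(np.abs(d_a - d_b) ** p) ** (1.0 / p))
```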


2010 ◽  
Vol 138 (9) ◽  
pp. 3418-3433 ◽  
Author(s):  
Tanja Weusthoff ◽  
Felix Ament ◽  
Marco Arpagaus ◽  
Mathias W. Rotach

Abstract High-resolution numerical weather prediction (NWP) models produce more detailed precipitation structures, but the real benefit probably lies in the more realistic statistics gained at higher resolution rather than in the information at any specific grid point. By evaluating three model pairs, each consisting of a high-resolution NWP system that resolves convection explicitly and its lower-resolution driving model with parameterized convection, on different spatial scales and for different thresholds, this paper addresses the question of whether high-resolution models really perform better than their driving lower-resolution counterparts. The model pairs are evaluated by means of two fuzzy verification methods—upscaling (UP) and the fractions skill score (FSS)—for the 6 months of the D-PHASE Operations Period and in highly complex terrain. Observations are provided by the Swiss radar composite, and the evaluation is restricted to the area covered by the Swiss radar stations. The high-resolution models outperform or equal their respective lower-resolution driving models. The differences between the models are significant and robust against small changes in the verification settings. An evaluation based on individual months shows that the high-resolution models give better results particularly for convective, more localized precipitation events.
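For orientation, the upscaling (UP) method amounts to averaging both the forecast and the observed fields onto progressively coarser grids and applying an ordinary categorical score at each scale. The sketch below, assuming NumPy, illustrates this with block averaging and the equitable threat score; the averaging factors and the 1-mm threshold are illustrative choices, not the settings used in the study.

```python
import numpy as np

def block_average(field, factor):
    """Average a 2D field over non-overlapping factor x factor blocks."""
    ny, nx = field.shape
    ny, nx = ny - ny % factor, nx - nx % factor            # trim to a multiple
    blocks = field[:ny, :nx].reshape(ny // factor, factor, nx // factor, factor)
    return blocks.mean(axis=(1, 3))

def ets(fcst, obs, threshold):
    """Equitable threat score from the 2 x 2 contingency table at one threshold."""
    f, o = fcst >= threshold, obs >= threshold
    hits = np.sum(f & o)
    misses = np.sum(~f & o)
    false_alarms = np.sum(f & ~o)
    hits_random = (hits + misses) * (hits + false_alarms) / f.size
    denom = hits + misses + false_alarms - hits_random
    return (hits - hits_random) / denom if denom != 0 else np.nan

# Score each progressively coarser grid separately, e.g. for a 1-mm threshold:
# scores = {k: ets(block_average(fcst, k), block_average(obs, k), 1.0)
#           for k in (1, 2, 4, 8, 16)}
```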


2010 ◽  
Vol 25 (1) ◽  
pp. 113-143 ◽  
Author(s):  
B. Casati

Abstract The intensity-scale verification technique introduced in 2004 by Casati, Ross, and Stephenson is revisited and improved. Recalibration is no longer performed, and the intensity-scale skill score for biased forecasts is evaluated. Energy and its percentages are introduced in order to assess the bias on different scales and to characterize the overall scale structure of the precipitation fields. Aggregation of the intensity-scale statistics for multiple cases is performed, and confidence intervals are provided by bootstrapping. Four different approaches for addressing the dyadic domain constraints are illustrated and critically compared. The intensity-scale verification is applied to the case studies of the Intercomparison of Spatial Forecast Verification Methods Project. The geometric and synthetically perturbed cases show that the intensity-scale verification statistics are sensitive to displacement and bias errors. The intensity-scale skill score assesses the skill for different precipitation intensities and on different spatial scales, separately. The spatial scales of the error are attributed to both the size of the features and their displacement. The energy percentages allow one to objectively analyze the scale structure of the fields and to understand the intensity-scale relationship. Aggregated statistics for the Storm Prediction Center/National Severe Storms Laboratory (SPC/NSSL) 2005 Spring Program case studies show no significant differences among the models’ skill; however, the 4-km simulations of the NCEP version of the Weather Research and Forecasting (WRF) model (WRF4 NCEP) overforecast to a greater extent than the 2- and 4-km simulations of the NCAR version of the WRF (WRF2 and WRF4 NCAR). For the aggregated multiple cases, the different approaches addressing the dyadic domain constraints lead to similar results. On the other hand, for a single case, tiling provides the most robust and reliable approach, since it smooths the effects of the discrete wavelet support and does not alter the original precipitation fields.
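The wavelet machinery behind these statistics can be sketched briefly. In the illustration below, assuming PyWavelets and NumPy, the binary forecast and observation fields are split into additive Haar components on dyadic scales, and the mean-square error and "energy" of each scale are computed; the placeholder fields, threshold, and number of levels are assumptions for illustration, and the paper's treatment of non-dyadic domains (padding, cropping, or tiling) and its no-skill reference for the skill score are not reproduced.

```python
import numpy as np
import pywt  # PyWavelets

def scale_components(field, levels):
    """Split a field into additive components, one per dyadic Haar scale."""
    coeffs = pywt.wavedec2(field, "haar", level=levels)
    components = []
    for j in range(len(coeffs)):
        kept = []
        for i, c in enumerate(coeffs):
            if i == j:
                kept.append(c)                  # keep this scale
            elif i == 0:
                kept.append(np.zeros_like(c))   # zero the approximation
            else:
                kept.append(tuple(np.zeros_like(a) for a in c))  # zero details
        components.append(pywt.waverec2(kept, "haar"))
    return components  # coarsest (large-scale mean) component first

# Placeholder precipitation fields on a dyadic 64 x 64 grid.
rng = np.random.default_rng(0)
fcst = rng.gamma(0.5, 2.0, size=(64, 64))
obs = rng.gamma(0.5, 2.0, size=(64, 64))

threshold = 1.0                                 # mm, illustrative
f_comp = scale_components((fcst >= threshold).astype(float), levels=6)
o_comp = scale_components((obs >= threshold).astype(float), levels=6)

# Error and "energy" of each scale component; energy percentages describe the
# scale structure of the field, as discussed in the abstract.
mse_by_scale = [np.mean((fj - oj) ** 2) for fj, oj in zip(f_comp, o_comp)]
energy_by_scale = [np.mean(fj ** 2) for fj in f_comp]
energy_pct = np.array(energy_by_scale) / np.sum(energy_by_scale)
```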


2017 ◽  
Vol 32 (2) ◽  
pp. 733-741 ◽  
Author(s):  
Craig S. Schwartz

Abstract As high-resolution numerical weather prediction models are now commonplace, “neighborhood” verification metrics are regularly employed to evaluate forecast quality. These neighborhood approaches relax the requirement that perfect forecasts must match observations at the grid scale, in contrast to traditional point-by-point verification methods. One recently proposed metric, the neighborhood equitable threat score, is calculated from 2 × 2 contingency tables that are populated within a neighborhood framework. However, the literature suggests three subtly different methods of populating neighborhood-based contingency tables. Thus, this work compares and contrasts these three variants and shows they yield statistically significantly different conclusions regarding forecast performance, illustrating that neighborhood-based contingency tables should be constructed carefully and transparently. Furthermore, this paper shows how two of the methods use inconsistent event definitions and suggests that a “neighborhood maximum” approach be used to fill neighborhood-based contingency tables.
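To make the idea concrete, the sketch below fills a neighborhood-based 2 × 2 contingency table using a maximum filter on the forecast, so that a forecast "yes" at a grid point means the threshold is exceeded somewhere in the surrounding n × n box, while the observed event is defined at the grid point itself. This is only one plausible variant written under stated assumptions; the paper compares three subtly different constructions and recommends a specific one, so the event definitions below should not be read as the paper's exact recipe.

```python
import numpy as np
from scipy.ndimage import maximum_filter

def neighborhood_contingency(fcst, obs, threshold, n):
    """2 x 2 contingency counts with a neighborhood-maximum forecast event."""
    # Forecast event: threshold exceeded anywhere in the n x n neighborhood.
    fcst_event = maximum_filter(fcst, size=n, mode="nearest") >= threshold
    obs_event = obs >= threshold            # grid-scale observed event
    hits = np.sum(fcst_event & obs_event)
    misses = np.sum(~fcst_event & obs_event)
    false_alarms = np.sum(fcst_event & ~obs_event)
    correct_negatives = np.sum(~fcst_event & ~obs_event)
    return hits, misses, false_alarms, correct_negatives
```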


2007 ◽  
Vol 135 (9) ◽  
pp. 3052-3069 ◽  
Author(s):  
B. Casati ◽  
L. J. Wilson

Abstract A new scale decomposition of the Brier score for the verification of probabilistic forecasts defined on a spatial domain is introduced. The technique is illustrated on the Canadian Meteorological Centre (CMC) lightning probability forecasts. Probability forecasts of lightning occurrence in 3-h time windows and 24-km spatial resolution are verified against lightning observations from the North American Lightning Detection Network (NALDN) on a domain encompassing Canada and the northern United States. Verification is performed for lightning occurrences exceeding two different thresholds, to assess forecast performance for both modest and intense lightning activity. Observation and forecast fields are decomposed into the sum of components on different spatial scales by performing a discrete 2D Haar wavelet decomposition. Wavelets, rather than Fourier transforms, were chosen because they are locally defined, and therefore more suitable for representing discontinuous spatial fields characterized by the presence of a few sparse nonzero values, such as lightning. Verification at different spatial scales is performed by evaluating the Brier score and Brier skill score for each spatial-scale component. Reliability and resolution are also evaluated on different scales. Moreover, the bias on different scales is assessed, along with the ability of the forecasts to reproduce the observed scale structure.
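The decomposition rests on the orthogonality of the Haar scale components; schematically (the notation here is illustrative, not the paper's),

$$
\mathrm{BS} \;=\; \frac{1}{N}\sum_{k=1}^{N}\bigl(p_k - o_k\bigr)^2
\;=\; \sum_{j}\mathrm{BS}_j ,
\qquad
\mathrm{BS}_j \;=\; \frac{1}{N}\sum_{k=1}^{N}\bigl(p_{j,k} - o_{j,k}\bigr)^2 ,
$$

where p_{j,k} and o_{j,k} are the scale-j Haar components (including the large-scale mean) of the probability forecast and the binary observation fields at grid point k. The cross terms vanish because components on different scales are orthogonal, which is what allows the Brier score, and a corresponding Brier skill score, to be evaluated scale by scale.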


2013 ◽  
Vol 28 (1) ◽  
pp. 119-138 ◽  
Author(s):  
Derek R. Stratman ◽  
Michael C. Coniglio ◽  
Steven E. Koch ◽  
Ming Xue

Abstract This study uses both traditional and newer verification methods to evaluate two 4-km grid-spacing Weather Research and Forecasting Model (WRF) forecasts: a “cold start” forecast that uses the 12-km North American Mesoscale Model (NAM) analysis and forecast cycle to derive the initial and boundary conditions (C0) and a “hot start” forecast that adds radar data into the initial conditions using a three-dimensional variational data assimilation (3DVAR)/cloud analysis technique (CN). These forecasts were evaluated as part of the 2009 and 2010 NOAA Hazardous Weather Testbed (HWT) Spring Forecasting Experiments. The Spring Forecasting Experiment participants noted that the skill of CN’s explicit forecasts of convection, as estimated by some traditional objective metrics, often seemed large compared to the subjectively determined skill. The Gilbert skill score (GSS) shows CN scoring higher than C0 at lower thresholds, likely because CN has a higher frequency bias than C0, but the difference is negligible at higher thresholds, where CN’s and C0’s frequency biases are similar. This suggests that if traditional skill scores are used to quantify convective forecasts, then higher (>35 dBZ) reflectivity thresholds should be used to be consistent with experts’ subjective assessments of the lack of forecast skill for individual convective cells. The spatial verification methods show that both CN and C0 generally have little to no skill at scales <8–12Δx starting at forecast hour 1, but CN has more skill at larger spatial scales (40–320 km) than C0 for the majority of the forecasting period. This indicates that the hot start provides little to no benefit for forecasts of convective cells, but that it has some benefit for larger mesoscale precipitation systems.
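For reference, the GSS (identical to the equitable threat score) and the frequency bias it interacts with can be written, in standard contingency-table notation,

$$
\mathrm{GSS} \;=\; \frac{H - H_{\mathrm{rand}}}{H + M + F - H_{\mathrm{rand}}},
\qquad
H_{\mathrm{rand}} \;=\; \frac{(H+M)(H+F)}{N},
\qquad
B \;=\; \frac{H+F}{H+M},
$$

where H, M, and F are hits, misses, and false alarms for a given reflectivity threshold, N is the number of grid points, and B is the frequency bias. At low thresholds an overforecasting bias (B > 1) enlarges the forecast area and can raise H enough to inflate the GSS, which is consistent with the behavior of CN noted above.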


2012 ◽  
Vol 69 (11) ◽  
pp. 3350-3371 ◽  
Author(s):  
Christopher Melhauser ◽  
Fuqing Zhang

Abstract This study explores both the practical and intrinsic predictability of severe convective weather at the mesoscales using convection-permitting ensemble simulations of a squall line and bow echo event during the Bow Echo and Mesoscale Convective Vortex (MCV) Experiment (BAMEX) on 9–10 June 2003. Although most ensemble members—initialized, using an ensemble Kalman filter, with realistic initial-condition uncertainties smaller than those of the NCEP Global Forecast System Final Analysis (GFS FNL)—forecast broad areas of severe convection, there is a large variability of forecast performance among different members, highlighting the limit of practical predictability. In general, the best-performing members tend to have a stronger upper-level trough and associated surface low, producing a more conducive environment for strong long-lived squall lines and bow echoes, once triggered. The divergence in development arises from a combination of a dislocation of the upper-level trough and surface low, with correspondingly marginal environmental differences between developing and nondeveloping members, and differences in cold pool evolution produced by deep convection prior to squall line formation. To further explore the intrinsic predictability of the storm, a sequence of sensitivity experiments was performed with the initial condition differences decreased to nearly an order of magnitude smaller than typical analysis and observation errors. The ensemble forecast and additional sensitivity experiments demonstrate that this storm has limited practical predictability, which may be further improved with more accurate initial conditions. However, it is possible that the true storm could be near the point of bifurcation, where predictability is intrinsically limited. The limits of both practical and intrinsic predictability highlight the need for probabilistic and ensemble forecasts for severe weather prediction.


Atmosphere ◽  
2020 ◽  
Vol 11 (11) ◽  
pp. 1141
Author(s):  
Steven Greco ◽  
George D. Emmitt ◽  
Alice DuVivier ◽  
Keith Hines ◽  
Michael Kavaya

During October–November 2014 and May 2015, NASA sponsored and conducted a pair of airborne campaigns called Polar Winds to investigate atmospheric circulations, particularly in the boundary layer, over the Arctic using NASA’s Doppler Aerosol WiNd (DAWN) lidar. A description of the campaigns, the DAWN instrument, wind retrieval methods, and data processing is provided. During the campaigns, the DAWN instrument faced backscatter sensitivity issues in the low-aerosol conditions that were fairly frequent in the 2–6 km altitude range. However, when DAWN was able to make measurements, comparisons with dropsondes show good agreement and very low bias, supporting the use of an airborne Doppler wind lidar such as DAWN, which can provide profiles with high velocity precision, ~65 m vertical resolution, and horizontal spacing as fine as 3–7 km. Case study analyses of a Greenland tip jet, barrier winds, and an upper-level jet are presented and show how, despite the sensitivity issues, DAWN data can be confidently used in diagnostic studies of dynamic features in the Arctic. Comparisons with both an operational and a research Weather Research and Forecasting (WRF) model for these events also show the potential for utilization in model validation. The sensitivity issues of the DAWN laser have since been corrected.


2020 ◽  
Vol 2020 ◽  
pp. 1-17
Author(s):  
Lin Liu ◽  
Chunze Lin ◽  
Yongqing Bai ◽  
Dengxin He

Microphysics parameterization becomes increasingly important as the model grid spacing decreases toward convection-resolving scales. Using observations from a field campaign for Mei-Yu rainfall in China, four bulk cloud microphysics schemes in the Weather Research and Forecasting (WRF) model were evaluated with respect to their ability to simulate precipitation, structure, and cloud microphysical properties over convective and stratiform regimes. These are the Thompson (THOM), Morrison graupel/hail (MOR_G/H), Stony Brook University (SBU_YLIN), and WRF double-moment six-class graupel/hail (WDM6_G/H) schemes. All schemes were able to predict the rain band but underestimated the total precipitation by 23%–35%. This is mainly attributed to the underestimation of stratiform precipitation and overestimation of convective rain. For the vertical distribution of radar reflectivity, many problems remain, such as lower reflectivity values aloft in both convective and stratiform regions and higher reflectivity values at midlevels. Each bulk scheme has its advantages and shortcomings for different cloud regimes. Overall, the discrepancies between model output and observations mostly exist in the midlevel to upper level, which results from the inability of the model to accurately represent the particle size distribution, ice processes, and storm dynamics. Further observations from major field campaigns and more detailed evaluation are still necessary.


2019 ◽  
Vol 147 (3) ◽  
pp. 971-985 ◽  
Author(s):  
Sang-Hun Park ◽  
Joseph B. Klemp ◽  
Jung-Hoon Kim

Abstract Although a terrain-following vertical coordinate is well suited for the application of surface boundary conditions, it is well known that the influences of the terrain on the coordinate surfaces can contribute to increased numerical errors, particularly over steep topography. To reduce these errors, a hybrid sigma–pressure coordinate is formulated in the Weather Research and Forecasting (WRF) Model, and its effects are illustrated for both an idealized test case and a real-data forecast for upper-level turbulence. The idealized test case confirms that with the basic sigma coordinate, significant upper-level disturbances can be produced due to numerical errors that arise as the advection of strong horizontal flow is computed along coordinate surfaces that are perturbed by smaller-scale terrain influences. With the hybrid coordinate, this artificial noise is largely eliminated as the mid- and upper-level coordinate surfaces correspond much more closely to constant pressure surfaces. In real-data simulations for upper-level turbulence forecasting, the WRF Model using the basic sigma coordinate tends to overpredict the strength of upper-air turbulence over mountainous regions because of numerical errors arising as a strong upper-level jet is advected along irregular coordinate surfaces. With the hybrid coordinate, these errors are reduced, resulting in an improved forecast of upper-level turbulence. Analysis of kinetic energy spectra for these simulations confirms that artificial amplitudes in the smaller scales at upper levels that arise with the basic sigma coordinate are effectively removed when the hybrid coordinate is used.
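As a sketch of the formulation (following the general form of the WRF hybrid coordinate; the exact transition function used in the paper should be taken from the paper itself), the dry hydrostatic pressure on a coordinate surface can be written

$$
p_d(\eta) \;=\; B(\eta)\,\bigl(p_s - p_t\bigr) \;+\; \bigl[\eta - B(\eta)\bigr]\,\bigl(p_0 - p_t\bigr) \;+\; p_t ,
$$

where p_s and p_t are the surface and model-top pressures, p_0 is a reference pressure, and B(η) is a smooth weighting that equals η at the surface and tends to zero aloft. Setting B(η) = η everywhere recovers the basic terrain-following sigma coordinate, while letting B(η) approach zero at mid- and upper levels makes the coordinate surfaces approach constant-pressure surfaces, which is the behavior described above.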

