Use of Multiple Verification Methods to Evaluate Forecasts of Convection from Hot- and Cold-Start Convection-Allowing Models

Abstract This study uses both traditional and newer verification methods to evaluate two 4-km grid-spacing Weather Research and Forecasting Model (WRF) forecasts: a “cold start” forecast that uses the 12-km North American Mesoscale Model (NAM) analysis and forecast cycle to derive the initial and boundary conditions (C0) and a “hot start” forecast that adds radar data into the initial conditions using a three-dimensional variational data assimilation (3DVAR)/cloud analysis technique (CN). These forecasts were evaluated as part of the 2009 and 2010 NOAA Hazardous Weather Test Bed (HWT) Spring Forecasting Experiments. The Spring Forecasting Experiment participants noted that the skill of CN’s explicit forecasts of convection estimated by some traditional objective metrics often seemed large compared to the subjectively determined skill. The Gilbert skill score (GSS) reveals that CN scores higher than C0 at lower thresholds, likely because CN has higher frequency biases than C0, but the difference is negligible at higher thresholds, where CN’s and C0’s frequency biases are similar. This suggests that if traditional skill scores are used to quantify convective forecasts, then higher (>35 dBZ) reflectivity thresholds should be used to be consistent with experts’ subjective assessments of the lack of forecast skill for individual convective cells. The spatial verification methods show that both CN and C0 generally have little to no skill at scales <8–12Δx starting at forecast hour 1, but CN has more skill at larger spatial scales (40–320 km) than C0 for the majority of the forecasting period. This indicates that the hot start provides little to no benefit for forecasts of convective cells, but that it has some benefit for larger mesoscale precipitation systems.
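The two traditional metrics named in this abstract, frequency bias and the Gilbert skill score, are both computed from a 2 × 2 contingency table at a chosen threshold. The following is a minimal sketch using the standard textbook definitions, not the authors' code; the function names and the use of NumPy arrays for the reflectivity fields are assumptions made for illustration.

```python
import numpy as np

def contingency(forecast, observed, threshold):
    """Count hits, false alarms, misses and correct negatives at a threshold."""
    f = np.asarray(forecast) >= threshold
    o = np.asarray(observed) >= threshold
    hits = int(np.sum(f & o))
    false_alarms = int(np.sum(f & ~o))
    misses = int(np.sum(~f & o))
    correct_negatives = int(np.sum(~f & ~o))
    return hits, false_alarms, misses, correct_negatives

def frequency_bias(hits, false_alarms, misses):
    """Ratio of forecast event count to observed event count (1 = unbiased)."""
    return (hits + false_alarms) / (hits + misses)

def gilbert_skill_score(hits, false_alarms, misses, correct_negatives):
    """Threat score corrected for the hits expected by random chance."""
    n = hits + false_alarms + misses + correct_negatives
    hits_random = (hits + false_alarms) * (hits + misses) / n
    return (hits - hits_random) / (hits + false_alarms + misses - hits_random)
```

Neither function guards against a zero denominator (for example, a threshold that is never observed), so a practical version would need to handle empty categories.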

Intercomparison of Spatial Forecast Verification Methods: Identifying Skillful Spatial Scales Using the Fractions Skill Score

Weather and Forecasting ◽

10.1175/2009waf2222260.1 ◽

2010 ◽

Vol 25 (1) ◽

pp. 343-354 ◽

Cited By ~ 100

Author(s):

Marion Mittermaier ◽

Nigel Roberts

Keyword(s):

Spatial Scales ◽

Wrf Model ◽

Skill Score ◽

Area Ratio ◽

Forecast Verification ◽

Care Needs ◽

Forecast Performance ◽

Verification Methods ◽

Upper Level ◽

Formed Part

Abstract The fractions skill score (FSS) was one of the measures that formed part of the Intercomparison of Spatial Forecast Verification Methods project. The FSS was used to assess a common dataset that consisted of real and perturbed Weather Research and Forecasting (WRF) model precipitation forecasts, as well as geometric cases. These datasets are all based on the NCEP 240 grid, which translates to approximately 4-km resolution over the contiguous United States. The geometric cases showed that the FSS can provide a truthful assessment of displacement errors and forecast skill. In addition, the FSS can be used to determine the scale at which an acceptable level of skill is reached and this usage is perhaps more helpful than interpreting the actual FSS value. This spatial-scale approach is becoming more popular for monitoring operational forecast performance. The study also shows how the FSS responds to forecast bias. A more biased forecast always gives lower FSS values at large scales and usually at smaller scales. It is possible, however, for a more biased forecast to give a higher score at smaller scales, when additional rain overlaps the observed rain. However, given a sufficiently large sample of forecasts, a more biased forecast system will score lower. The use of percentile thresholds can remove the impacts of the bias. When the proportion of the domain that is “wet” (the wet-area ratio) is small, subtle differences introduced through near-threshold misses can lead to large changes in FSS magnitude in individual cases (primarily because the bias is changed). Reliable statistics for small wet-area ratios require a larger sample of forecasts. Care needs to be taken in the choice of verification domain. For high-resolution models, the domain should be large enough to encompass the length scale of the typical mesoscale forcing (e.g., upper-level troughs or squall lines). If the domain is too large, the wet-area ratios will always be small. If the domain is too small, fluctuations in the wet-area ratio can be large and larger spatial errors may be missed. The FSS is a good measure of the spatial accuracy of precipitation forecasts. Different methods are needed to determine other patterns of behavior.
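To make the scale dependence described in this abstract concrete, the sketch below computes the FSS in its commonly published form, FSS = 1 − MSE(P_f, P_o) / (mean(P_f²) + mean(P_o²)), where P_f and P_o are the fractions of grid points exceeding a threshold within an n × n neighborhood. It is an illustrative implementation only: the function names, the square window, and the choice to treat points outside the domain as dry are assumptions, not details taken from the paper.

```python
import numpy as np

def neighborhood_fraction(binary, n):
    """Fraction of 'wet' points in the n x n window centred on each point.

    Assumes n is odd; points outside the domain are counted as dry.
    """
    assert n % 2 == 1, "window size must be odd"
    r = n // 2
    padded = np.pad(binary.astype(float), r, mode="constant")
    # Summed-area table with a leading row and column of zeros.
    sat = np.pad(padded, ((1, 0), (1, 0)), mode="constant")
    sat = sat.cumsum(axis=0).cumsum(axis=1)
    rows, cols = binary.shape
    window_sum = (sat[n:n + rows, n:n + cols] - sat[:rows, n:n + cols]
                  - sat[n:n + rows, :cols] + sat[:rows, :cols])
    return window_sum / (n * n)

def fractions_skill_score(forecast, observed, threshold, n):
    """FSS for one forecast/observation pair at one threshold and scale."""
    pf = neighborhood_fraction(np.asarray(forecast) >= threshold, n)
    po = neighborhood_fraction(np.asarray(observed) >= threshold, n)
    mse = np.mean((pf - po) ** 2)
    reference = np.mean(pf ** 2) + np.mean(po ** 2)
    return 1.0 - mse / reference if reference > 0 else float("nan")
```

For a single wet point displaced by two grid lengths on an 11 × 11 grid, this sketch gives an FSS of 0 at n = 1 and 0.6 at n = 5, which illustrates how skill emerges once the neighborhood is larger than the displacement.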

Assessing the Benefits of Convection-Permitting Models by Neighborhood Verification: Examples from MAP D-PHASE

Monthly Weather Review ◽

10.1175/2010mwr3380.1 ◽

2010 ◽

Vol 138 (9) ◽

pp. 3418-3433 ◽

Cited By ~ 86

Author(s):

Tanja Weusthoff ◽

Felix Ament ◽

Marco Arpagaus ◽

Mathias W. Rotach

Keyword(s):

High Resolution ◽

Spatial Scales ◽

Weather Prediction ◽

Grid Point ◽

Skill Score ◽

Verification Methods ◽

Driving Model ◽

High Resolution Models ◽

Precipitation Events ◽

Better Than

Abstract High-resolution numerical weather prediction (NWP) models produce more detailed precipitation structures, but the real benefit is probably the more realistic statistics gained with the higher resolution and not the information on the specific grid point. By evaluating three model pairs, each consisting of a high-resolution NWP system resolving convection explicitly and its low-resolution driving model with parameterized convection, on different spatial scales and for different thresholds, this paper addresses the question of whether high-resolution models really perform better than their driving lower-resolution counterparts. The model pairs are evaluated by means of two fuzzy verification methods, upscaling (UP) and fractions skill score (FSS), for the 6 months of the D-PHASE Operations Period and in a highly complex terrain. Observations are provided by the Swiss radar composite and the evaluation is restricted to the area covered by the Swiss radar stations. The high-resolution models outperform or equal the performance of their respective lower-resolution driving models. The differences between the models are significant and robust against small changes in the verification settings. An evaluation based on individual months shows that high-resolution models give better results, particularly with regard to convective, more localized precipitation events.
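The upscaling method mentioned here averages forecast and observed fields onto progressively coarser grids before applying a categorical score. The averaging step can be sketched as below; the function name, the block-mean formulation, and the trimming of edge points that do not fill a complete block are assumptions for illustration and may differ from the configuration used in the study.

```python
import numpy as np

def upscale(field, factor):
    """Average a 2-D field over non-overlapping factor x factor blocks.

    Rows and columns that do not fill a complete block are dropped.
    """
    field = np.asarray(field, dtype=float)
    rows = field.shape[0] - field.shape[0] % factor
    cols = field.shape[1] - field.shape[1] % factor
    trimmed = field[:rows, :cols]
    blocks = trimmed.reshape(rows // factor, factor, cols // factor, factor)
    return blocks.mean(axis=(1, 3))
```

After both fields are upscaled with the same factor, events can be defined by thresholding the coarse fields and scored with any contingency-table metric.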

Beyond the Basics: Evaluating Model-Based Precipitation Forecasts Using Traditional, Spatial, and Object-Based Methods

Weather and Forecasting ◽

10.1175/waf-d-13-00135.1 ◽

2014 ◽

Vol 29 (6) ◽

pp. 1451-1472 ◽

Cited By ~ 48

Author(s):

Jamie K. Wolff ◽

Michelle Harrold ◽

Tressa Fowler ◽

John Halley Gotway ◽

Louisa Nance ◽

...

Keyword(s):

Mesoscale Model ◽

Skill Score ◽

Careful Consideration ◽

Diagnostic Information ◽

Coverage Area ◽

Object Based ◽

Grid Approach ◽

Verification Methods ◽

Verification Techniques ◽

Spatial Verification

Abstract While traditional verification methods are commonly used to assess numerical model quantitative precipitation forecasts (QPFs) using a grid-to-grid approach, they generally offer little diagnostic information or reasoning behind the computed statistic. On the other hand, advanced spatial verification techniques, such as neighborhood and object-based methods, can provide more meaningful insight into differences between forecast and observed features in terms of skill with spatial scale, coverage area, displacement, orientation, and intensity. To demonstrate the utility of applying advanced verification techniques to mid- and coarse-resolution models, the Developmental Testbed Center (DTC) applied several traditional metrics and spatial verification techniques to QPFs provided by the Global Forecast System (GFS) and operational North American Mesoscale Model (NAM). Along with frequency bias and Gilbert skill score (GSS) adjusted for bias, both the fractions skill score (FSS) and Method for Object-Based Diagnostic Evaluation (MODE) were utilized for this study with careful consideration given to how these methods were applied and how the results were interpreted. By illustrating the types of forecast attributes appropriate to assess with the spatial verification techniques, this paper provides examples of how to obtain advanced diagnostic information to help identify what aspects of the forecast are or are not performing well.

Multiconvective Parameterizations as a Multimodel Proxy for Seasonal Climate Studies

Journal of Climate ◽

10.1175/jcli3448.1 ◽

2005 ◽

Vol 18 (15) ◽

pp. 2963-2978 ◽

Cited By ~ 8

Author(s):

T. E. LaRow ◽

S. D. Cocke ◽

D. W. Shin

Keyword(s):

Initial Conditions ◽

Coupled Model ◽

Skill Score ◽

Model Ensemble ◽

Single Model ◽

Multimodel Ensemble ◽

Temperature And Precipitation ◽

Skill Scores ◽

Climate Studies ◽

Start Dates

Abstract A six-member multicoupled model ensemble is created by using six state-of-the-art deep atmospheric convective schemes. The six convective schemes are used inside a single model and make up the ensemble. This six-member ensemble is compared against a multianalysis ensemble, which is created by varying the initial start dates of the atmospheric component of the coupled model. Both ensembles were integrated for seven months (November–May) over a 12-yr period from 1987 to 1998. Examination of the sea surface temperature and precipitation shows that, while deterministic skill scores are slightly better for the multicoupled model ensemble, the probabilistic skill scores favor the multimodel approach. Combining the two ensembles to create a larger ensemble size increases the probabilistic skill score compared to the multimodel. This altered-physics approach to creating a multimodel ensemble is seen as an easy way for small modeling centers to generate ensembles with better reliability than by only varying the initial conditions.

New Developments of the Intensity-Scale Technique within the Spatial Verification Methods Intercomparison Project

Weather and Forecasting ◽

10.1175/2009waf2222257.1 ◽

2010 ◽

Vol 25 (1) ◽

pp. 113-143 ◽

Cited By ~ 35

Author(s):

B. Casati

Keyword(s):

Case Studies ◽

Single Case ◽

Spatial Scales ◽

Forecast Model ◽

Skill Score ◽

Scale Structure ◽

Discrete Wavelet ◽

Intensity Scale ◽

Verification Methods ◽

Domain Constraints

Abstract The intensity-scale verification technique introduced in 2004 by Casati, Ross, and Stephenson is revisited and improved. Recalibration is no longer performed, and the intensity-scale skill score for biased forecasts is evaluated. Energy and its percentages are introduced in order to assess the bias on different scales and to characterize the overall scale structure of the precipitation fields. Aggregation of the intensity-scale statistics for multiple cases is performed, and confidence intervals are provided by bootstrapping. Four different approaches for addressing the dyadic domain constraints are illustrated and critically compared. The intensity-scale verification is applied to the case studies of the Intercomparison of Spatial Forecast Verification Methods Project. The geometric and synthetically perturbed cases show that the intensity-scale verification statistics are sensitive to displacement and bias errors. The intensity-scale skill score assesses the skill for different precipitation intensities and on different spatial scales, separately. The spatial scales of the error are attributed to both the size of the features and their displacement. The energy percentages allow one to objectively analyze the scale structure of the fields and to understand the intensity-scale relationship. Aggregated statistics for the Storm Prediction Center/National Severe Storms Laboratory (SPC/NSSL) 2005 Spring Program case studies show no significant differences among the models’ skill; however, the 4-km simulations of the NCEP version of the Weather Research and Forecast model (WRF4 NCEP) overforecast to a greater extent than the 2- and 4-km simulations of the NCAR version of the WRF (WRF2 and WRF4 NCAR). For the aggregated multiple cases, the different approaches addressing the dyadic domain constraints lead to similar results. On the other hand, for a single case, tiling provides the most robust and reliable approach, since it smooths the effects of the discrete wavelet support and does not alter the original precipitation fields.

Lessons learned after two years of operational high-resolution (1,5 km) WRF simulations in Catalonia and a plan to increase their skill

10.5194/ems2021-236 ◽

2021 ◽

Author(s):

Jordi Mercader Carbó ◽

Manel Bravo Blanco ◽

Jordi Moré Pratdesaba ◽

Abdelmalik Sairouni Afif

Keyword(s):

Wind Speed ◽

Daily Precipitation ◽

Mesoscale Model ◽

Initial Conditions ◽

Skill Score ◽

Lessons Learned ◽

Grid Spacing ◽

Slight Improvement ◽

Model Topography ◽

Score Table

The WRF-ARW has been the flagship mesoscale model in the Meteorological Service of Catalonia (SMC) since 2012. Several operational runs are performed daily (initialised at 00 and 12 UTC), using both the GFS and the IFS model for initial and boundary conditions, to account for uncertainties in the synoptic evolution. To provide more accurate forecasts to end-users, a convection-allowing simulation with a grid spacing of 1.5 km was added to the operational chain, starting in the summer of 2019. However, the verification results show that the improvement over its mother domain (a 3 km simulation with parameterised convection) is irregular because it does not happen for all the variables. For instance, the 2 m temperature forecasts are more reliable for the highest resolution domain but the wind speed at 10 m has a comparable skill. Regarding the precipitation, there is a very slight improvement only for high daily precipitation rates (50 or 80 mm) during some seasons; nevertheless, the results are worse in forecasting the occurrence of precipitation (that is, when considering low daily precipitation quantities). The comparison of the verification results among different model configurations (with various resolutions and initial conditions) can be easily performed by using a skill score table. This table and its design will also be presented in this session. These results help to conceive strategies to enhance the skill of the 1.5 km simulations for the variables that prove to be least accurate. For instance, we evaluate to what extent using alternative static fields (changing the model topography or the land category) improves the forecasts of temperature, humidity or wind near the surface. Furthermore, the sensitivity of precipitation forecasts to several physics schemes is tested, seeking an enhancement of their skill.

A Comparative Verification of High-Resolution Precipitation Forecasts Using Model Output Statistics

Monthly Weather Review ◽

10.1175/mwr-d-16-0256.1 ◽

2017 ◽

Vol 145 (10) ◽

pp. 4037-4054 ◽

Cited By ~ 5

Author(s):

Emiel van der Plas ◽

Maurice Schmeits ◽

Nicolien Hooijman ◽

Kees Kok

Keyword(s):

High Resolution ◽

Weather Prediction ◽

Skill Score ◽

Model Output ◽

Probability Forecast ◽

Maximum Information ◽

Verification Methods ◽

Skill Scores ◽

Nwp Model ◽

Model Output Statistics

Verification of localized events such as precipitation has become even more challenging with the advent of high-resolution mesoscale numerical weather prediction (NWP). The realism of a forecast suggests that it should compare well against precipitation radar imagery with similar resolution, both spatially and temporally. Spatial verification methods solve some of the representativity issues that point verification gives rise to. In this paper, a verification strategy based on model output statistics (MOS) is applied that aims to address both double-penalty and resolution effects that are inherent to comparisons of NWP models with different resolutions. Using predictors based on spatial precipitation patterns around a set of stations, an extended logistic regression (ELR) equation is deduced, leading to a probability forecast distribution of precipitation for each NWP model, analysis, and lead time. The ELR equations are derived for predictands based on areal-calibrated radar precipitation and SYNOP observations. The aim is to extract maximum information from a series of precipitation forecasts, like a trained forecaster would. The method is applied to the nonhydrostatic model Harmonie-AROME (2.5-km resolution), HIRLAM (11-km resolution), and the ECMWF model (16-km resolution), overall yielding similar Brier skill scores for the three postprocessed models, but somewhat larger differences for individual lead times. In addition, the fractions skill score is computed using the three deterministic forecasts, showing slightly higher skill for the Harmonie-AROME model. In other words, despite the realism of Harmonie-AROME precipitation forecasts, they only perform similarly or somewhat better than precipitation forecasts from the two lower-resolution models, at least in the Netherlands.
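The Brier skill score used to compare the postprocessed probability forecasts measures the improvement of the Brier score over a reference forecast. The following is a minimal sketch of the standard definition, BSS = 1 − BS / BS_ref; the function names and the default choice of the sample climatological frequency as the reference are assumptions, since the abstract does not state which reference the authors used.

```python
import numpy as np

def brier_score(prob, outcome):
    """Mean squared difference between forecast probability and 0/1 outcome."""
    prob = np.asarray(prob, dtype=float)
    outcome = np.asarray(outcome, dtype=float)
    return float(np.mean((prob - outcome) ** 2))

def brier_skill_score(prob, outcome, reference_prob=None):
    """Skill relative to a reference forecast (1 = perfect, 0 = no better).

    If no reference is given, the sample event frequency is used
    (an assumption made for this sketch).
    """
    outcome = np.asarray(outcome, dtype=float)
    if reference_prob is None:
        reference_prob = np.full(outcome.shape, outcome.mean())
    bs_ref = brier_score(reference_prob, outcome)
    return 1.0 - brier_score(prob, outcome) / bs_ref
```

The score is undefined when the reference Brier score is zero, for example when the event never occurs in the sample and the climatological reference is used.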

On the generation of internal waves by river plumes in subcritical initial conditions

Scientific Reports ◽

10.1038/s41598-021-81464-5 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

R. Mendes ◽

J. C. B. da Silva ◽

J. M. Magalhaes ◽

B. St-Denis ◽

D. Bourgault ◽

...

Keyword(s):

Numerical Modeling ◽

Internal Waves ◽

Coastal Waters ◽

Initial Conditions ◽

Spatial Scales ◽

River Plumes ◽

Wide Range ◽

Near Shore ◽

Generation Mechanisms ◽

Douro River

Abstract Internal waves (IWs) in the ocean span a wide range of time and spatial scales and are now acknowledged as important sources of turbulence and mixing, with the largest observed waves reaching 200 m in amplitude and vertical velocities close to 0.5 m s⁻¹. Their origin is mostly tidal, but an increasing number of non-tidal generation mechanisms have also been observed. For instance, river plumes provide horizontally propagating density fronts, which were observed to generate IWs when transitioning from supercritical to subcritical flow. In this study, satellite imagery and autonomous underwater measurements are combined with numerical modeling to investigate IW generation from an initial subcritical density front originating at the Douro River plume (western Iberian coast). These unprecedented results may have important implications for near-shore dynamics, since they suggest that rivers of moderate flow may play an important role in IW generation between fresh riverine and coastal waters.

Scalable co-optimization of morphology and control in embodied machines

Journal of The Royal Society Interface ◽

10.1098/rsif.2017.0937 ◽

2018 ◽

Vol 15 (143) ◽

pp. 20170937 ◽

Cited By ~ 10

Author(s):

Nick Cheney ◽

Josh Bongard ◽

Vytas SunSpiral ◽

Hod Lipson

Keyword(s):

Embodied Cognition ◽

Initial Conditions ◽

Control Policy ◽

Sensorimotor Control ◽

Body Plan ◽

The Body ◽

Test Bed ◽

Local Optima ◽

Close Coupling ◽

And Control

Evolution sculpts both the body plans and nervous systems of agents together over time. By contrast, in artificial intelligence and robotics, a robot's body plan is usually designed by hand, and control policies are then optimized for that fixed design. The task of simultaneously co-optimizing the morphology and controller of an embodied robot has remained a challenge. In psychology, the theory of embodied cognition posits that behaviour arises from a close coupling between body plan and sensorimotor control, which suggests why co-optimizing these two subsystems is so difficult: most evolutionary changes to morphology tend to adversely impact sensorimotor control, leading to an overall decrease in behavioural performance. Here, we further examine this hypothesis and demonstrate a technique for ‘morphological innovation protection’, which temporarily reduces selection pressure on recently morphologically changed individuals, thus enabling evolution some time to ‘readapt’ to the new morphology with subsequent control policy mutations. We show the potential for this method to avoid local optima and converge to similar highly fit morphologies across widely varying initial conditions, while sustaining fitness improvements further into optimization. While this technique is admittedly only the first of many steps that must be taken to achieve scalable optimization of embodied machines, we hope that theoretical insight into the cause of evolutionary stagnation in current methods will help to enable the automation of robot design and behavioural training—while simultaneously providing a test bed to investigate the theory of embodied cognition.

Using Fractal Downscaling of Satellite Precipitation Products for Hydrometeorological Applications

Journal of Atmospheric and Oceanic Technology ◽

10.1175/2009jtecha1219.1 ◽

2010 ◽

Vol 27 (3) ◽

pp. 409-427 ◽

Cited By ~ 41

Author(s):

Kun Tao ◽

Ana P. Barros

Keyword(s):

Spatial Resolution ◽

Stage Iv ◽

Tropical Rainfall Measuring Mission ◽

Skill Score ◽

Probability Of Detection ◽

Grid Spacing ◽

Satellite Precipitation ◽

Target Field ◽

Skill Scores ◽

Central United States

Abstract The objective of spatial downscaling strategies is to increase the information content of coarse datasets at smaller scales. In the case of quantitative precipitation estimation (QPE) for hydrological applications, the goal is to close the scale gap between the spatial resolution of coarse datasets (e.g., gridded satellite precipitation products at resolution L × L) and the high resolution (l × l; L ≫ l) necessary to capture the spatial features that determine spatial variability of water flows and water stores in the landscape. In essence, the downscaling process consists of weaving subgrid-scale heterogeneity over a desired range of wavelengths in the original field. The defining question is, which properties, statistical and otherwise, of the target field (the known observable at the desired spatial resolution) should be matched, with the caveat that downscaling methods be as general as possible and therefore ideally without case-specific constraints and/or calibration requirements? Here, the attention is focused on two simple fractal downscaling methods using iterated function systems (IFS) and fractal Brownian surfaces (FBS) that meet this requirement. The two methods were applied to disaggregate spatially 27 summertime convective storms in the central United States during 2007 at three consecutive times (1800, 2100, and 0000 UTC, thus 81 fields overall) from the Tropical Rainfall Measuring Mission (TRMM) version 6 (V6) 3B42 precipitation product (∼25-km grid spacing) to the same resolution as the NCEP stage IV products (∼4-km grid spacing). Results from bilinear interpolation are used as the control. A fundamental distinction between IFS and FBS is that the latter implies a distribution of downscaled fields and thus an ensemble solution, whereas the former provides a single solution. The downscaling effectiveness is assessed using fractal measures (the spectral exponent β, fractal dimension D, Hurst coefficient H, and roughness amplitude R) and traditional operational skill scores [false alarm rate (FR), probability of detection (PD), threat score (TS), and Heidke skill score (HSS)], as well as bias and the root-mean-square error (RMSE). The results show that both IFS and FBS fractal interpolation perform well with regard to operational skill scores, and they meet the additional requirement of generating structurally consistent fields. Furthermore, confidence intervals can be directly generated from the FBS ensemble. The results were used to diagnose errors relevant for hydrometeorological applications, in particular a spatial displacement with characteristic length of at least 50 km (2500 km²) in the location of peak rainfall intensities for the cases studied.
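The operational scores listed in this abstract all derive from the same four contingency-table counts. The sketch below uses the standard textbook formulas and hypothetical function names. One labeled assumption: "false alarm rate" is implemented here as false alarms divided by all observed non-events, whereas some authors use the same name for the false alarm ratio (false alarms divided by all forecast events); the abstract does not say which the paper intends.

```python
def probability_of_detection(hits, misses):
    """Fraction of observed events that were forecast."""
    return hits / (hits + misses)

def false_alarm_rate(false_alarms, correct_negatives):
    """Fraction of observed non-events forecast as events (assumed meaning)."""
    return false_alarms / (false_alarms + correct_negatives)

def threat_score(hits, false_alarms, misses):
    """Hits divided by all forecast or observed events."""
    return hits / (hits + false_alarms + misses)

def heidke_skill_score(hits, false_alarms, misses, correct_negatives):
    """Fraction correct, adjusted for the fraction correct by chance."""
    a, b, c, d = hits, false_alarms, misses, correct_negatives
    return 2.0 * (a * d - b * c) / ((a + c) * (c + d) + (a + b) * (b + d))
```

With 20 hits, 30 false alarms, 10 misses, and 40 correct negatives, these functions return a threat score of 1/3 and a Heidke skill score of 0.2.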
