Model Output Statistics (MOS) applied to CAMS O3 forecasts: trade-offs between continuous and categorical skill scores

2021
Author(s):
Hervé Petetin
Dene Bowdalo
Pierre-Antoine Bretonnière
Marc Guevara
Oriol Jorba
...

Abstract. Air quality (AQ) forecasting systems are usually built upon physics-based numerical models that are affected by a number of uncertainty sources. In order to reduce forecast errors, first and foremost the bias, they are often coupled with Model Output Statistics (MOS) modules. MOS methods are statistical techniques used to correct raw forecasts at surface monitoring station locations, where AQ observations are available. In this study, we investigate to what extent AQ forecasts can be improved using a variety of MOS methods, including persistence (PERS), moving average (MA), quantile mapping (QM), Kalman filter (KF), analogs (AN), and gradient boosting machine (GBM). We apply our analysis to the Copernicus Atmosphere Monitoring Service (CAMS) regional ensemble median O3 forecasts over the Iberian Peninsula during 2018–2019. A key aspect of our study is the evaluation, which is performed using a very comprehensive set of continuous and categorical metrics at various time scales (hourly to daily), for different lead times (1 to 4 days), and with different meteorological input data (forecast vs. reanalysis). Our results show that O3 forecasts can be substantially improved using such MOS corrections and that this improvement goes well beyond the correction of the systematic bias. Although this improvement typically holds at all lead times, some MOS methods appear more adversely affected by increasing lead time. For MOS methods that rely on meteorological information, the deterioration caused by using IFS forecasts instead of the ERA5 reanalysis is minor, which paves the way for their use in operational MOS applications. Importantly, our results also clearly show the trade-offs between continuous and categorical skills and their dependence on the MOS method. The most sophisticated MOS methods better reproduce O3 mixing ratios overall, with the lowest errors and the highest correlations.
However, they are not necessarily the best at predicting the highest O3 episodes, for which simpler MOS methods can give better results. Although the complex impact of MOS methods on the distribution and variability of raw forecasts can only be comprehended through an extended set of complementary statistical metrics, our study shows that optimally implementing MOS in AQ forecast systems crucially requires selecting the appropriate skill score to be optimized for the forecast application of interest.
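The moving-average (MA) correction listed above can be illustrated with a short Python sketch. It assumes that MA subtracts from each raw forecast the mean forecast-minus-observation bias over a trailing window of previous days; the window length `window` and the names `ma_correct` and `rmse` are hypothetical, and the configuration used in the study is not reproduced here.

```python
import numpy as np

def ma_correct(raw, obs, window=3):
    """Moving-average MOS sketch: subtract the mean bias of the previous `window` days.

    Only past days enter the bias estimate, mimicking operational use; the first
    day has no history and is returned uncorrected.
    """
    raw = np.asarray(raw, dtype=float)
    obs = np.asarray(obs, dtype=float)
    bias = raw - obs                      # forecast-minus-observation error per day
    corrected = raw.copy()
    for t in range(1, len(raw)):
        start = max(0, t - window)        # trailing window covers days [start, t)
        corrected[t] = raw[t] - bias[start:t].mean()
    return corrected

def rmse(pred, obs):
    """Root mean square error between two equal-length series."""
    diff = np.asarray(pred, dtype=float) - np.asarray(obs, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))
```

With made-up raw forecasts [50, 55, 60, 58, 62] and observations [40, 46, 49, 48, 53], the corrected series is [50, 45, 50.5, 48, 52] and the RMSE drops from about 9.8 to about 4.6. A full evaluation would also check categorical scores for threshold exceedances, since the abstract stresses that gains in continuous skill need not carry over to the highest episodes.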

2017
Vol 145 (10)
pp. 4037-4054
Author(s):
Emiel van der Plas
Maurice Schmeits
Nicolien Hooijman
Kees Kok

Verification of localized events such as precipitation has become even more challenging with the advent of high-resolution mesoscale numerical weather prediction (NWP). The realism of a forecast suggests that it should compare well against precipitation radar imagery of similar spatial and temporal resolution. Spatial verification methods solve some of the representativeness issues that arise in point verification. In this paper, a verification strategy based on model output statistics (MOS) is applied that aims to address both the double-penalty and resolution effects inherent in comparisons of NWP models with different resolutions. Using predictors based on spatial precipitation patterns around a set of stations, an extended logistic regression (ELR) equation is derived, yielding a probability forecast distribution of precipitation for each NWP model, analysis, and lead time. The ELR equations are derived for predictands based on areal-calibrated radar precipitation and SYNOP observations. The aim is to extract maximum information from a series of precipitation forecasts, as a trained forecaster would. The method is applied to the nonhydrostatic Harmonie-AROME model (2.5-km resolution), the HIRLAM model (11-km resolution), and the ECMWF model (16-km resolution), overall yielding similar Brier skill scores for the three postprocessed models but somewhat larger differences for individual lead times. In addition, the fractions skill score is computed from the three deterministic forecasts, showing slightly higher skill for the Harmonie-AROME model. In other words, despite the realism of Harmonie-AROME precipitation forecasts, they perform only similarly to or somewhat better than precipitation forecasts from the two lower-resolution models, at least in the Netherlands.
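The fractions skill score mentioned above compares, within square neighborhoods, the fraction of grid points exceeding a precipitation threshold in the forecast and in the observations. The sketch below assumes both fields are on a common grid, an odd neighborhood width `n`, and zero padding at the domain edges; these choices and the function names are illustrative and not necessarily those made in the paper.

```python
import numpy as np

def neighborhood_fraction(binary, n):
    """Fraction of exceedances in an n x n window centred on each point (zero-padded edges)."""
    pad = n // 2
    padded = np.pad(binary.astype(float), pad, mode="constant")
    rows, cols = binary.shape
    out = np.empty((rows, cols))
    for i in range(rows):
        for j in range(cols):
            # Window rows i..i+n-1 in padded coordinates are centred on original row i
            out[i, j] = padded[i:i + n, j:j + n].mean()
    return out

def fss(forecast, observed, threshold, n):
    """Fractions skill score: 1 - MSE(Pf, Po) / (mean(Pf^2) + mean(Po^2))."""
    pf = neighborhood_fraction(np.asarray(forecast) >= threshold, n)
    po = neighborhood_fraction(np.asarray(observed) >= threshold, n)
    mse = np.mean((pf - po) ** 2)
    mse_ref = np.mean(pf ** 2) + np.mean(po ** 2)
    return float(1.0 - mse / mse_ref) if mse_ref > 0 else float("nan")
```

For a single rain cell displaced by one grid point on a 5 x 5 grid, the score is 0 at grid scale (n = 1) and 2/3 with a 3 x 3 neighborhood, which shows how neighborhood verification softens the double penalty.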


2017
Vol 14
pp. 123-129
Author(s):
Alfonso Ferrone
Daniele Mastrangelo
Piero Malguzzi

Abstract. The 2-m temperature anomalies from the reforecasts of the CNR-ISAC and ECMWF monthly prediction systems have been combined in a multimodel super-ensemble. Tercile probability predictions from the multimodel have been constructed using direct model output (DMO) and model output statistics (MOS), such as logistic regression and nonhomogeneous Gaussian regression, for the 1990–2010 winter seasons. Verification against ERA-Interim reanalyses indicates that logistic regression gives the best results in terms of ranked probability skill scores (RPSS) and reliability diagrams for low to medium forecast probabilities. It is also argued that logistic regression would not yield further improvements if a larger dataset were used.
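The ranked probability skill score used above can be illustrated for tercile forecasts. The sketch assumes forecast probabilities for the ordered (below, near, above normal) categories, a single observed category per forecast, and a climatological reference of 1/3 per tercile; the function names are hypothetical. Some definitions divide the RPS by the number of categories minus one, which cancels in the skill score.

```python
import numpy as np

def rps(probs, obs_category):
    """Ranked probability score of one forecast over ordered categories."""
    probs = np.asarray(probs, dtype=float)
    obs = np.zeros_like(probs)
    obs[obs_category] = 1.0
    # Sum of squared differences between cumulative forecast and observed distributions
    return float(np.sum((np.cumsum(probs) - np.cumsum(obs)) ** 2))

def rpss(forecasts, observed_categories):
    """RPSS of tercile forecasts relative to climatology (1/3, 1/3, 1/3)."""
    clim = np.full(3, 1.0 / 3.0)
    rps_fc = np.mean([rps(p, k) for p, k in zip(forecasts, observed_categories)])
    rps_cl = np.mean([rps(clim, k) for k in observed_categories])
    return float(1.0 - rps_fc / rps_cl)
```

For example, the forecast (0.6, 0.3, 0.1) with an observed below-normal outcome gives RPS = 0.17 against 5/9 for climatology, so its RPSS is about 0.69.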


2007
Vol 4 (1)
pp. 189-212
Author(s):
M. Tonani
N. Pinardi
C. Fratianni
S. Dobricic

Abstract. This paper describes a first comprehensive evaluation of the quality of the ten-day ocean forecasts produced by the Mediterranean ocean Forecasting System (MFS). Ten-day forecasts are produced once a week: the forecast starts on Tuesday at noon and the prediction is released on Wednesday morning, with less than 24 h delay. In this work we consider 22 ten-day forecasts produced from 16 August 2005 to 10 January 2006. All statistical scores were computed for the whole Mediterranean basin and for each of the 13 regions into which the Mediterranean Sea has been subdivided. The forecast evaluation is given here in terms of root mean square (rms) values. The main skill score is computed as the root mean square of the difference between forecast and analysis (FA) and between forecast and persistence (FP), where persistence is defined as the daily mean analysis for the first day of the forecast. A second skill score (SSP) is defined as the ratio between the rms of FA and FP, giving the percentage of accuracy of the forecast with respect to persistence (Murphy 1993). The rms of FA is always lower than that of FP, and the FP rms error is twice the FA rms error. It is found that in the surface layers the error growth is controlled mainly by atmospheric forcing inaccuracies, while at depth the forecast errors could be due to adjustments of the data assimilation scheme to the data insertion procedure. The predictability limit of our ocean forecast appears to be 5–6 days, linked to atmospheric forcing inaccuracies and to the availability of data for assimilation.
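The rms comparison against persistence can be sketched as follows, under an explicit assumption: FP is read here as the rms difference between the persistence forecast and the verifying analysis, and SSP as the plain ratio rms(FA)/rms(FP), so a value below 1 means the forecast beats persistence. The function names are hypothetical, and the paper's exact definitions may differ.

```python
import numpy as np

def rms(a, b):
    """Root mean square difference between two fields of the same shape."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

def persistence_scores(forecast, analysis, persistence):
    """Return (rms_fa, rms_fp, ssp) for fields valid at the same time.

    Assumption: FP compares the persistence forecast with the verifying analysis;
    SSP is taken as the ratio rms_fa / rms_fp.
    """
    rms_fa = rms(forecast, analysis)
    rms_fp = rms(persistence, analysis)
    return rms_fa, rms_fp, rms_fa / rms_fp
```

With made-up values where every forecast error is 0.5 and every persistence error is 1.0, the ratio is 0.5, the situation in which the FP rms error is twice the FA rms error.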


2020
Vol 148 (2)
pp. 499-521
Author(s):
Rochelle P. Worsnop
Michael Scheuerer
Thomas M. Hamill

Abstract Probabilistic fire-weather forecasts provide pertinent information for assessing the behavior and danger of current or potential fires. Operational fire-weather guidance is provided for lead times shorter than seven days, with most products providing only day 1–3 outlooks. Extended-range forecasts can aid decisions regarding the placement of in- and out-of-state resources, prescribed burns, and overall preparedness levels. We demonstrate how ensemble model output statistics and ensemble copula coupling (ECC) postprocessing methods can be used to provide locally calibrated and spatially coherent probabilistic forecasts of the hot–dry–windy index (and its components). The univariate postprocessing fits a truncated normal distribution to data transformed with a flexible selection of power exponents. Forecast scenarios are generated via the ECC-Q variant, which maintains their spatial and temporal coherence by reordering samples from the univariate distributions according to the ranks of the raw ensemble. A total of 20 years of ECMWF reforecasts and ERA-Interim reanalysis data over the continental United States (CONUS) are used. Forecast skill is quantified with the continuous ranked probability score, using raw and climatological forecasts as benchmarks. Results show that postprocessing is beneficial during all seasons over CONUS out to two weeks. Forecast skill relative to climatological forecasts depends on the atmospheric variable, season, location, and lead time, with winter (summer) generally providing the most (least) skill at the longest lead times. Additional improvements in forecast skill can be achieved by aggregating forecast days. These postprocessed forecasts are illustrated for a past fire event.
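The ECC-Q step described above can be sketched as follows: draw equally spaced quantiles, at levels i/(m+1), from each calibrated univariate distribution and reorder them so that they inherit the rank order of the raw ensemble at each location or lead time. The sketch assumes the calibrated distribution is available as a quantile function; a normal distribution from the Python standard library (`statistics.NormalDist`) stands in for the power-transformed truncated normal of the study, purely for illustration, and `ecc_q` is a hypothetical name.

```python
import numpy as np
from statistics import NormalDist

def ecc_q(raw_ensemble, quantile_functions):
    """Ensemble copula coupling with equidistant quantiles (ECC-Q).

    raw_ensemble: array-like (m, d) of m raw members at d locations/lead times.
    quantile_functions: d callables mapping a probability to a value of the
    calibrated marginal distribution at that location/lead time.
    Returns an (m, d) array of calibrated members with the raw rank structure.
    """
    raw = np.asarray(raw_ensemble, dtype=float)
    m, d = raw.shape
    levels = np.arange(1, m + 1) / (m + 1)        # equidistant probability levels
    out = np.empty_like(raw)
    for j in range(d):
        q = np.sort([quantile_functions[j](p) for p in levels])
        ranks = np.argsort(np.argsort(raw[:, j], kind="stable"))  # rank of each raw member
        out[:, j] = q[ranks]                       # k-th ranked member gets k-th quantile
    return out
```

For instance, `ecc_q([[3, 10], [1, 30], [2, 20]], [NormalDist(0, 1).inv_cdf, NormalDist(5, 2).inv_cdf])` gives the second member the lowest calibrated value at the first location, just as it had the lowest raw value there. Reordering by raw ranks is what carries the spatial and temporal dependence of the raw ensemble into the calibrated scenarios.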


Author(s):
Chanh Kieu
Cole Evans
Yi Jin
James D. Doyle
Hao Jin
...

Abstract This study examines the dependence of tropical cyclone (TC) intensity forecast errors on track forecast errors in the Coupled Ocean/Atmosphere Mesoscale Prediction System for Tropical Cyclones (COAMPS-TC) model. Using real-time forecasts and retrospective experiments during 2015–2018, verification of TC intensity errors conditioned on different 5-day track error thresholds shows that reducing the 5-day track errors by 50–70% can help reduce the absolute intensity errors by 18–20% in the 2018 version of the COAMPS-TC model. Such impacts of track errors on TC intensity errors are most persistent at 4–5-day lead times in all three major ocean basins, indicating that global models exert significant control on the forecast skill of the COAMPS-TC model. Interestingly, however, lowering the 5-day track errors below 80 nm does not reduce TC absolute intensity errors further. Instead, the 4–5-day intensity errors appear to saturate at around 10–12 kt for cases with small track errors, suggesting the existence of some inherent intensity errors in regional models. Additional idealized simulations under a perfect-model scenario reveal that the COAMPS-TC model possesses an intrinsic intensity variation at the TC mature stage in the range of 4–5 kt, regardless of the large-scale environment. Such intrinsic intensity variability in the COAMPS-TC model highlights the importance of potentially chaotic TC dynamics, rather than model deficiencies, in determining TC intensity errors at 4–5-day lead times. These results indicate a fundamental limit to the improvement of TC intensity forecasts by numerical models that should be considered in future model development and evaluation.


2013
Vol 28 (2)
pp. 353-367
Author(s):
Hui Yu
Peiyan Chen
Qingqing Li
Bi Tang

Abstract Forecasts of tropical cyclone (TC) intensity from six operational models (three global models and three regional models) during 2010 and 2011 are assessed to study the current capability of model guidance in the western North Pacific. The evaluation is performed on both Vmax and Pmin from several aspects, including the relative error, skill assessment, category score, and the hit rate of trend, among others. It is encouraging to see that the models have some skill in predicting TC intensity: two of them are better than a statistical baseline in Vmax at several lead times, and three of them show some skill in intensity change. With dissipated cases included, all the models show skill in category and trend forecasting at lead times longer than about 24 h. The model forecast errors are found to be significantly correlated with the initial error and the observed initial intensity. A statistical calibration scheme for model forecasts is proposed based on this relationship, and it is more effective for Pmin than for Vmax. The statistically calibrated model forecasts are important in setting up a skillful multimodel consensus, for either the mean or the statistically weighted mean. The Vmax forecasts converted from the calibrated Pmin consensus using a statistical wind–pressure relationship show significant skill over the baseline, and a skillful scheme is also proposed to deal with the delay of model forecasts in operations.


2020
Vol 35 (3)
pp. 841-856
Author(s):
William E. Lewis
Christopher Rozoff
Stefano Alessandrini
Luca Delle Monache

Abstract The performance of the Hurricane Weather Research and Forecasting (HWRF) Model Rapid Intensification Analog Ensemble (RI-AnEn) is evaluated for real-time forecasts made during the National Oceanic and Atmospheric Administration (NOAA)’s 2018 Hurricane Forecast Improvement Program (HFIP) demonstration. Using a variety of assessment tools (Brier skill score, reliability diagrams, ROC curves, ROC skill scores), RI-AnEn is shown to perform competitively compared to both the deterministic HWRF and current operational probabilistic RI forecast aids. The assessment is extended to include forecasts from the 2017 HFIP demonstration and shows that RI-AnEn is the only model with significant RI forecast skill at all lead times in the Atlantic and eastern Pacific basins. Though RI-AnEn is overconfident in its RI forecasts, it is generally well calibrated for all lead times. Furthermore, significance testing indicates that for the 2017–18 Atlantic and eastern Pacific sample, RI-AnEn is more skillful than HWRF at all lead times and better than most of the other probabilistic guidance at 48 and 72 h. ROC curves reveal that RI-AnEn offers a good combination of sensitivity and specificity, performing comparably to SHIPS-RII at all lead times in both basins. With respect to specific high-impact cases from the 2018 Atlantic season, the performance of RI-AnEn ranges from excellent (Hurricane Michael) to poor (Hurricane Florence). The multiyear assessment and the results for two high-impact case studies from 2018 indicate that, while promising, RI-AnEn requires further work to refine its performance and to accurately situate its effectiveness relative to other RI forecast aids.
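The Brier skill score used in this assessment compares the mean squared error of probability forecasts of a binary event (here, whether rapid intensification occurs) with that of a reference forecast. The sketch below assumes a constant base-rate (climatological frequency) reference; the reference actually used in the study may differ, and the function names are hypothetical.

```python
import numpy as np

def brier_score(prob, outcome):
    """Mean squared difference between forecast probabilities and 0/1 outcomes."""
    prob = np.asarray(prob, dtype=float)
    outcome = np.asarray(outcome, dtype=float)
    return float(np.mean((prob - outcome) ** 2))

def brier_skill_score(prob, outcome, base_rate=None):
    """BSS = 1 - BS / BS_ref with a constant base-rate reference forecast.

    If base_rate is None, the sample frequency of the event is used; the
    reference is undefined when the event never or always occurs.
    """
    outcome = np.asarray(outcome, dtype=float)
    if base_rate is None:
        base_rate = float(outcome.mean())
    bs_ref = brier_score(np.full(outcome.shape, base_rate), outcome)
    return 1.0 - brier_score(prob, outcome) / bs_ref
```

For four made-up cases with probabilities (0.8, 0.1, 0.3, 0.2) and outcomes (1, 0, 0, 0), BS = 0.045 against 0.1875 for the base rate 0.25, giving BSS = 0.76.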


2019
Vol 58 (8)
pp. 1709-1723
Author(s):
Dian Nur Ratri
Kirien Whan
Maurice Schmeits

Abstract Dynamical seasonal forecasts are afflicted with biases, including seasonal ensemble precipitation forecasts from the new ECMWF seasonal forecast system 5 (SEAS5). In this study, biases have been corrected using empirical quantile mapping (EQM) bias correction (BC). We bias correct SEAS5 24-h rainfall accumulations at seven monthly lead times over the period 1981–2010 in Java, Indonesia. For the observations, we have used a new high-resolution (0.25°) land-only gridded rainfall dataset [Southeast Asia observations (SA-OBS)]. A comparative verification of both raw and bias-corrected reforecasts is performed using several verification metrics. In this verification, the daily rainfall data were aggregated to monthly accumulated rainfall. We focus on July, August, and September because these are agriculturally important months; if the rainfall accumulation exceeds 100 mm, farmers may decide to grow a third rice crop. For these months, the first two monthly lead times show improved and mostly positive continuous ranked probability skill scores after BC. According to the Brier skill score (BSS), the BC reforecasts improve upon the raw reforecasts for the lower precipitation thresholds at the 1-month lead time. Reliability diagrams show that the BC reforecasts have good reliability for events exceeding the agriculturally relevant 100-mm threshold. A cost/loss analysis, comparing the potential economic value of the raw and BC reforecasts for this same threshold, shows that the value of the BC reforecasts is larger than that of the raw ones, and that the BC reforecasts have value for a wider range of users at 1- to 7-month lead times.
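Empirical quantile mapping, as applied above, replaces each forecast value by the observed value at the same non-exceedance probability in a training climatology. The sketch assumes a single training sample of reforecasts and observations, linear interpolation between empirical quantiles, and constant extrapolation beyond the training range; the study's handling of dry days, lead times, and calendar months is not reproduced, and `eqm_correct` is a hypothetical name.

```python
import numpy as np

def eqm_correct(fcst_train, obs_train, fcst_new, n_quantiles=101):
    """Map new forecast values onto the observed climatology via empirical quantiles.

    Note: np.interp expects increasing forecast quantiles; heavily tied samples
    (for example many dry days at 0 mm) would need separate handling, not done here.
    """
    probs = np.linspace(0.0, 1.0, n_quantiles)
    fq = np.quantile(np.asarray(fcst_train, dtype=float), probs)  # forecast climatology
    oq = np.quantile(np.asarray(obs_train, dtype=float), probs)   # observed climatology
    # np.interp clamps values outside [fq[0], fq[-1]] to the end points
    return np.interp(np.asarray(fcst_new, dtype=float), fq, oq)
```

If a made-up model systematically doubles the observed amounts, mapping the values 40 and 100 returns about 20 and 50, and any value above the training maximum is capped at the largest observed value.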


2021
Vol 56
pp. 89-96
Author(s):
Aheli Das
Somnath Baidya Roy

Abstract. This study evaluates subseasonal to seasonal scale (S2S) forecasts of meteorological variables relevant for the renewable energy (RE) sector of India from six ocean-atmosphere coupled models: ECMWF SEAS5, DWD GCFS 2.0, Météo-France's System 6, NCEP CFSv2, UKMO GloSea5 GC2-LI, and CMCC SPS3. The variables include 10 m wind speed, incoming solar radiation, 2 m temperature, and 2 m relative humidity because they are critical for estimating the supply and demand of renewable energy. The study is conducted over seven homogeneous regions of India for 1994–2016. The target months are April and May, when the electricity demand is the highest, and June–September, when the renewable resources outstrip the demand. The evaluation is done by comparing the forecasts at 1-, 2-, 3-, 4-, and 5-month lead times with the ERA5 reanalysis spatially averaged over each region. The fair continuous ranked probability skill score (FCRPSS) is used to quantitatively assess the forecast skill. Results show that incoming surface solar radiation predictions are the best, while 2 m relative humidity predictions are the worst. Overall, SEAS5 is the best-performing model for all variables, for all target months in all regions at all lead times, while GCFS 2.0 performs the worst. Predictability is higher over the southern regions of the country than over the northern and north-eastern parts. Overall, the quality of the raw S2S forecasts from numerical models over India is poor. These forecasts require calibration for further skill improvement before being deployed for applications in the RE sector.
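The fair CRPS behind the FCRPSS can be sketched for an m-member ensemble x_1, ..., x_m and verifying value y with the usual fair estimator, (1/m) Σ_i |x_i - y| - 1/(2m(m-1)) Σ_i Σ_j |x_i - x_j|, and the skill score as 1 - FCRPS/FCRPS_ref averaged over cases. The reference forecast (for example a climatological ensemble) is an assumption, since the abstract does not name it, and the function names are hypothetical.

```python
import numpy as np

def fair_crps(ensemble, obs):
    """Fair CRPS of one m-member ensemble (m >= 2) against a scalar observation."""
    x = np.asarray(ensemble, dtype=float)
    m = x.size
    if m < 2:
        raise ValueError("the fair CRPS needs at least two members")
    mean_abs_error = np.mean(np.abs(x - obs))
    # Sum of |x_i - x_j| over all ordered member pairs (i, j)
    pair_sum = np.abs(x[:, None] - x[None, :]).sum()
    return float(mean_abs_error - pair_sum / (2.0 * m * (m - 1)))

def fcrpss(fc_ensembles, ref_ensembles, observations):
    """Skill score 1 - mean FCRPS(forecast) / mean FCRPS(reference)."""
    fc = np.mean([fair_crps(e, y) for e, y in zip(fc_ensembles, observations)])
    ref = np.mean([fair_crps(e, y) for e, y in zip(ref_ensembles, observations)])
    return float(1.0 - fc / ref)
```

For a made-up case with forecast members (1, 2, 3), reference members (5, 5, 5), and observation 3, the forecast scores 1/3 and the reference 2, so FCRPSS = 5/6. The pairwise term is what makes the score "fair": it removes the penalty that small ensembles would otherwise incur relative to the underlying distribution.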


2018
Vol 146 (6)
pp. 1763-1784
Author(s):
Juliana Dias
Maria Gehne
George N. Kiladis
Naoko Sakaeda
Peter Bechtold
...

Despite decades of research on the role of moist convective processes in large-scale tropical dynamics, tropical forecast skill in operational models is still deficient when compared to the extratropics, even at short lead times. Here we compare tropical and Northern Hemisphere (NH) forecast skill for quantitative precipitation forecasts (QPFs) in the NCEP Global Forecast System (GFS) and ECMWF Integrated Forecast System (IFS) during January 2015–March 2016. Results reveal that, in general, initial conditions are reasonably well estimated in both forecast systems, as indicated by relatively good skill scores for the 6–24-h forecasts. However, overall, tropical QPFs in both systems are not considered useful by typical metrics much beyond 4 days. To quantify the relationship between QPF and dynamical skill, space–time spectra and coherence of rainfall and divergence fields are calculated. It is shown that while tropical variability is too weak in both models, the IFS is more skillful in propagating tropical waves for longer lead times. In agreement with past studies demonstrating that extratropical skill is partially drawn from the tropics, a comparison of daily skill in the tropics versus the NH suggests that in both models NH forecast skill at lead times beyond day 3 is enhanced by tropical skill in the first couple of days. Consistent with previous work, this study indicates that the differences in physics used in each system, in particular how moist convective processes are coupled to the large-scale flow through these parameterizations, appear to be a major source of tropical forecast errors.

