On reliability analysis of multi-categorical forecasts

Abstract. Reliability analysis of probabilistic forecasts, in particular through the rank histogram or Talagrand diagram, is revisited. Two shortcomings are pointed out: Firstly, a uniform rank histogram is but a necessary condition for reliability. Secondly, if the forecast is assumed to be reliable, an indication is needed of how far a histogram is expected to deviate from uniformity merely due to randomness. Concerning the first shortcoming, it is suggested that forecasts be grouped or stratified along suitable criteria, and that reliability be analyzed individually for each forecast stratum. A reliable forecast should have uniform histograms for all individual forecast strata, not only for all forecasts as a whole. As to the second shortcoming, instead of the observed frequencies, the probability of the observed frequency is plotted, providing an indication of the likelihood of the result under the hypothesis that the forecast is reliable. Furthermore, a Goodness-Of-Fit statistic is discussed which is essentially the reliability term of the Ignorance score. The discussed tools are applied to medium range forecasts for 2 m-temperature anomalies at several locations and lead times. The forecasts are stratified along the expected ranked probability score. Those forecasts which feature a high expected score turn out to be particularly unreliable.
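The two ideas in this abstract can be illustrated with a minimal sketch. It assumes ensemble forecasts with K members (so K + 1 rank bins), ignores ties, and reads "the probability of the observed frequency" as the binomial probability mass of each bin count under uniformity. The function names are hypothetical and the paper itself may use a different (for example cumulative) variant.

```python
import math
from typing import List, Sequence


def rank_of_observation(ensemble: Sequence[float], obs: float) -> int:
    """Number of ensemble members strictly below the observation (0..K)."""
    return sum(1 for member in ensemble if member < obs)


def rank_histogram(ensembles: Sequence[Sequence[float]],
                   observations: Sequence[float]) -> List[int]:
    """Count how often the observation falls into each of the K + 1 ranks."""
    counts = [0] * (len(ensembles[0]) + 1)
    for ens, obs in zip(ensembles, observations):
        counts[rank_of_observation(ens, obs)] += 1
    return counts


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)


def count_probabilities(counts: Sequence[int]) -> List[float]:
    """Probability of each bin count if every rank is equally likely.

    Assumption: under reliability each bin count is Binomial(n, 1 / bins).
    """
    n = sum(counts)
    p = 1.0 / len(counts)
    return [binomial_pmf(k, n, p) for k in counts]


# Toy data: four forecasts with three members each.
ensembles = [[1.0, 2.0, 3.0]] * 4
observations = [0.5, 1.5, 2.5, 3.5]
counts = rank_histogram(ensembles, observations)
probabilities = count_probabilities(counts)
```

To stratify, the same functions could be applied separately to each subset of forecasts (for example, grouped by a forecast-derived criterion), giving one histogram per stratum.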


Using Artificial Neural Networks for Generating Probabilistic Subseasonal Precipitation Forecasts over California

Monthly Weather Review ◽

10.1175/mwr-d-20-0096.1 ◽

2020 ◽

Vol 148 (8) ◽

pp. 3489-3506

Author(s):

Michael Scheuerer ◽

Matthew B. Switanek ◽

Rochelle P. Worsnop ◽

Thomas M. Hamill

Keyword(s):

Neural Network ◽

Large Scale ◽

Signal To Noise Ratio ◽

Weather Prediction ◽

Forecast Skill ◽

Lead Times ◽

Probabilistic Forecasts ◽

Medium Range ◽

Artificial Neural ◽

Artificial Neural Network Ann

Abstract Forecast skill of numerical weather prediction (NWP) models for precipitation accumulations over California is rather limited at subseasonal time scales, and the low signal-to-noise ratio makes it challenging to extract information that provides reliable probabilistic forecasts. A statistical postprocessing framework is proposed that uses an artificial neural network (ANN) to establish relationships between NWP ensemble forecast and gridded observed 7-day precipitation accumulations, and to model the increase or decrease of the probabilities for different precipitation categories relative to their climatological frequencies. Adding predictors with geographic information and location-specific normalization of forecast information permits the use of a single ANN for the entire forecast domain and thus reduces the risk of overfitting. In addition, a convolutional neural network (CNN) framework is proposed that extends the basic ANN and takes images of large-scale predictors as inputs that inform local increase or decrease of precipitation probabilities relative to climatology. Both methods are demonstrated with ECMWF ensemble reforecasts over California for lead times up to 4 weeks. They compare favorably with a state-of-the-art postprocessing technique developed for medium-range ensemble precipitation forecasts, and their forecast skill relative to climatology is positive everywhere within the domain. The magnitude of skill, however, is low for week-3 and week-4, and suggests that additional sources of predictability need to be explored.


Forecasting the Daily Maximal and Minimal Temperatures from Radiosonde Measurements Using Neural Networks

Applied Sciences ◽

10.3390/app112210852 ◽

2021 ◽

Vol 11 (22) ◽

pp. 10852

Author(s):

Gregor Skok ◽

Doruntina Hoxha ◽

Žiga Zaplotnik

Keyword(s):

Neural Networks ◽

Lead Time ◽

Forecast Error ◽

Lead Times ◽

Profile Measurement ◽

Profile Data ◽

Forecast Lead Time ◽

Medium Range ◽

Range Forecasts

This study investigates the potential of direct prediction of daily extremes of temperature at 2 m from a vertical profile measurement using neural networks (NNs). The analysis is based on 3800 daily profiles measured in the period 2004–2019. Various setups of dense sequential NNs are trained to predict the daily extremes at different lead times ranging from 0 to 500 days into the future. The short- to medium-range forecasts rely mainly on the profile data from the lowest layer—mostly on the temperature in the lowest 1 km. For the long-range forecasts (e.g., 100 days), the NN relies on the data from the whole troposphere. The error increases with forecast lead time, but at the same time, it exhibits periodic behavior for long lead times. The NN forecast beats the persistence forecast but becomes worse than the climatological forecast on day two or three. The forecast slightly improves when the previous-day measurements of temperature extremes are added as a predictor. The best forecast is obtained when the climatological value is added as well, with the biggest improvement in the long-term range where the error is constrained to the climatological forecast error.


Spatial Bias in Medium-Range Forecasts of Heavy Precipitation in the Sacramento River Basin: Implications for Water Management

Journal of Hydrometeorology ◽

10.1175/jhm-d-19-0226.1 ◽

2020 ◽

Vol 21 (7) ◽

pp. 1405-1423

Author(s):

Zachary P. Brodeur ◽

Scott Steinschneider

Keyword(s):

Forecast Error ◽

Heavy Precipitation ◽

Sacramento River ◽

Precipitation Forecast ◽

Lead Times ◽

Synoptic Scale ◽

Basin Scale ◽

Medium Range ◽

Precipitation Events ◽

Range Forecasts

Abstract Forecasts of heavy precipitation delivered by atmospheric rivers (ARs) are becoming increasingly important for both flood control and water supply management in reservoirs across California. This study examines the hypothesis that medium-range forecasts of heavy precipitation at the basin scale exhibit recurrent spatial biases that are driven by mesoscale and synoptic-scale features of associated AR events. This hypothesis is tested for heavy precipitation events in the Sacramento River basin using 36 years of NCEP medium-range reforecasts from 1984 to 2019. For each event we cluster precipitation forecast error across western North America for lead times ranging from 1 to 15 days. Integrated vapor transport (IVT), 500-hPa geopotential heights, and landfall characteristics of ARs are composited across clusters and lead times to diagnose the causes of precipitation forecast biases. We investigate the temporal evolution of forecast error to characterize its persistence across lead times, and explore the accuracy of forecasted IVT anomalies across different domains of the North American west coast during heavy precipitation events in the Sacramento basin. Our results identify recurrent spatial patterns of precipitation forecast error consistent with errors of forecasted synoptic-scale features, especially at long (5–15 days) leads. Moreover, we find evidence that forecasts of AR landfalls well outside of the latitudinal bounds of the Sacramento basin precede heavy precipitation events within the basin. These results suggest the potential for using medium-range forecasts of large-scale climate features across the Pacific–North American sector, rather than just local forecasts of basin-scale precipitation, when designing forecast-informed reservoir operations.


Evaluating probabilistic forecasts of football matches: the case against the ranked probability score

Journal of Quantitative Analysis in Sports ◽

10.1515/jqas-2019-0089 ◽

2021 ◽

Vol 0 (0) ◽

Author(s):

Edward Wheatcroft

Keyword(s):

Scoring Rules ◽

Brier Score ◽

Sporting Events ◽

Forecast Performance ◽

Scoring Rule ◽

Probability Score ◽

Probabilistic Forecasts ◽

Non Local ◽

Evaluating Forecasts ◽

Non Locality

Abstract A scoring rule is a function of a probabilistic forecast and a corresponding outcome used to evaluate forecast performance. There is some debate as to which scoring rules are most appropriate for evaluating forecasts of sporting events. This paper focuses on forecasts of the outcomes of football matches. The ranked probability score (RPS) is often recommended since it is ‘sensitive to distance’, that is, it takes into account the ordering in the outcomes (a home win is ‘closer’ to a draw than it is to an away win). In this paper, this reasoning is disputed on the basis that it adds nothing in terms of the usual aims of using scoring rules. A local scoring rule is one that only takes the probability placed on the outcome into consideration. Two simulation experiments are carried out to compare the performance of the RPS, which is non-local and sensitive to distance, the Brier score, which is non-local and insensitive to distance, and the Ignorance score, which is local and insensitive to distance. The Ignorance score outperforms both the RPS and the Brier score, casting doubt on the value of non-locality and sensitivity to distance as properties of scoring rules in this context.
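The three scoring rules compared in this abstract can be written down in a few lines. The sketch below assumes three ordered outcomes indexed 0 (home win), 1 (draw), 2 (away win), and uses common conventions (RPS divided by the number of categories minus one, unhalved Brier score, base-2 logarithm for Ignorance). Other normalizations exist, and the function names are illustrative only.

```python
import math
from typing import Sequence


def brier_score(probs: Sequence[float], outcome: int) -> float:
    """Sum of squared differences between forecast and outcome indicator."""
    return sum((p - (1.0 if i == outcome else 0.0)) ** 2
               for i, p in enumerate(probs))


def ranked_probability_score(probs: Sequence[float], outcome: int) -> float:
    """Squared differences of cumulative forecast and cumulative outcome."""
    cum_forecast = 0.0
    cum_outcome = 0.0
    total = 0.0
    # The final cumulative term is always (1 - 1)^2 = 0, so it is skipped.
    for i, p in enumerate(probs[:-1]):
        cum_forecast += p
        cum_outcome += 1.0 if i == outcome else 0.0
        total += (cum_forecast - cum_outcome) ** 2
    return total / (len(probs) - 1)


def ignorance_score(probs: Sequence[float], outcome: int) -> float:
    """Negative log2 of the probability placed on the observed outcome."""
    return -math.log2(probs[outcome])


# Two forecasts that place the same probability on a home win
# but distribute the remainder differently.
forecast_a = [0.5, 0.3, 0.2]
forecast_b = [0.5, 0.1, 0.4]
home_win = 0
```

With a home win as the outcome, the Ignorance score is identical for both forecasts (locality), whereas the RPS and the Brier score differ because they also depend on the probabilities placed on the outcomes that did not occur.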


Probabilistic Forecasts Using Analogs in the Idealized Lorenz96 Setting

Monthly Weather Review ◽

10.1175/2010mwr3542.1 ◽

2011 ◽

Vol 139 (6) ◽

pp. 1960-1971 ◽

Cited By ~ 12

Author(s):

Jakob W. Messner ◽

Georg J. Mayr

Keyword(s):

Logistic Regression ◽

Systematic Errors ◽

Lead Times ◽

Model Output ◽

Weather Forecasts ◽

Probabilistic Forecasts ◽

Direct Model ◽

Nwp Model

Abstract Three methods to make probabilistic weather forecasts by using analogs are presented and tested. The basic idea of these methods is that finding similar NWP model forecasts to the current one in an archive of past forecasts and taking the corresponding analyses as prediction should remove all systematic errors of the model. Furthermore, this statistical postprocessing can convert NWP forecasts to forecasts for point locations and easily turn deterministic forecasts into probabilistic ones. These methods are tested in the idealized Lorenz96 system and compared to a benchmark bracket formed by ensemble relative frequencies from direct model output and logistic regression. The analog methods excel at longer lead times.
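The basic analog idea described here can be sketched generically. The abstract does not specify the three methods, so the following is not one of them: it is a minimal nearest-neighbour illustration that assumes scalar forecasts, an absolute-difference similarity measure, and a hypothetical function name.

```python
from typing import Sequence


def analog_exceedance_probability(archive_forecasts: Sequence[float],
                                  archive_analyses: Sequence[float],
                                  current_forecast: float,
                                  n_analogs: int,
                                  threshold: float) -> float:
    """Probability that the verifying value exceeds a threshold.

    Picks the n_analogs past forecasts closest to the current forecast
    and treats their verifying analyses as an ensemble.
    """
    # Sort archive indices by similarity to the current forecast.
    order = sorted(range(len(archive_forecasts)),
                   key=lambda i: abs(archive_forecasts[i] - current_forecast))
    nearest = order[:n_analogs]
    # Relative frequency of threshold exceedance among the analog analyses.
    exceedances = sum(1 for i in nearest if archive_analyses[i] > threshold)
    return exceedances / n_analogs


# Toy archive: past forecasts and the analyses that verified them.
past_forecasts = [0.0, 1.0, 2.0, 3.0, 4.0]
past_analyses = [0.5, 1.5, 1.0, 3.5, 4.5]
probability = analog_exceedance_probability(
    past_forecasts, past_analyses, current_forecast=2.1,
    n_analogs=3, threshold=1.2)
```

Because the analog ensemble is built from analyses rather than from model output, any systematic offset between past forecasts and what actually verified is carried into the prediction automatically.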


A multivariate approach to generate synthetic short‐to‐medium range hydro‐meteorological forecasts across locations, variables, and lead times

Water Resources Research ◽

10.1029/2020wr029453 ◽

2021 ◽

Author(s):

Zachary P. Brodeur ◽

Scott Steinschneider

Keyword(s):

Lead Times ◽

Multivariate Approach ◽

Medium Range


Verification of Ensemble-Based Uncertainty Circles around Tropical Cyclone Track Forecasts

Weather and Forecasting ◽

10.1175/waf-d-11-00007.1 ◽

2011 ◽

Vol 26 (5) ◽

pp. 664-676 ◽

Cited By ~ 21

Author(s):

Thierry Dupont ◽

Matthieu Plu ◽

Philippe Caroff ◽

Ghislain Faure

Keyword(s):

Tropical Cyclone ◽

Large Error ◽

Ensemble Prediction ◽

Probabilistic Forecasting ◽

Cyclone Track ◽

Ensemble Prediction System ◽

Weather Forecasts ◽

Probabilistic Forecasts ◽

Medium Range ◽

Uncertainty Information

Abstract Several tropical cyclone forecasting centers issue uncertainty information with regard to their official track forecasts, generally using the climatological distribution of position error. However, such methods are not able to convey information that depends on the situation. The purpose of the present study is to assess the skill of the Ensemble Prediction System (EPS) from the European Centre for Medium-Range Weather Forecasts (ECMWF) at measuring the uncertainty of up to 3-day track forecasts issued by the Regional Specialized Meteorological Centre (RSMC) La Réunion in the southwestern Indian Ocean. The dispersion of cyclone positions in the EPS is extracted and translated at the RSMC forecast position. The verification relies on existing methods for probabilistic forecasts that are presently adapted to a cyclone-position metric. First, the probability distribution of forecast positions is compared to the climatological distribution using Brier scores. The probabilistic forecasts have better scores than the climatology, particularly after applying a simple calibration scheme. Second, uncertainty circles are built by fixing the probability at 75%. Their skill at detecting small and large error values is assessed. The circles have some skill for large errors up to the 3-day forecast (and maybe after); but the detection of small radii is skillful only up to 2-day forecasts. The applied methodology may be used to assess and to compare the skill of different probabilistic forecasting systems of cyclone position.
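As a rough illustration of building an uncertainty circle at a fixed probability of 75%, the sketch below takes the ensemble spread around the ensemble mean and translates it to the official forecast position. It is only a sketch under simplifying assumptions: positions are planar (x, y) coordinates in km rather than latitude and longitude, the radius is the smallest one containing at least 75% of the members, and all names are hypothetical rather than taken from the study.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def ensemble_mean(positions: Sequence[Point]) -> Point:
    """Mean position of the ensemble members."""
    n = len(positions)
    return (sum(x for x, _ in positions) / n,
            sum(y for _, y in positions) / n)


def uncertainty_radius(positions: Sequence[Point],
                       probability: float = 0.75) -> float:
    """Smallest radius around the ensemble mean containing the given
    fraction of members."""
    cx, cy = ensemble_mean(positions)
    distances: List[float] = sorted(
        math.hypot(x - cx, y - cy) for x, y in positions)
    # Index of the member that completes the requested fraction.
    index = math.ceil(probability * len(distances)) - 1
    return distances[index]


def uncertainty_circle(official_position: Point,
                       positions: Sequence[Point],
                       probability: float = 0.75) -> Tuple[Point, float]:
    """Circle centred on the official forecast with the ensemble radius."""
    return official_position, uncertainty_radius(positions, probability)


# Toy ensemble of four members and an official forecast position (km).
members = [(1.0, 0.0), (-1.0, 0.0), (0.0, 2.0), (0.0, -2.0)]
centre, radius = uncertainty_circle((10.0, 5.0), members)
```

Verification would then check how often the observed position falls inside the circle and whether small and large radii correspond to small and large position errors.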


Evaluating sub-seasonal heatwave reforecasts of the ECMWF over Europe

10.5194/ems2021-146 ◽

2021 ◽

Author(s):

Natalia Korhonen ◽

Otto Hyvärinen ◽

Matti Kämäräinen ◽

Kirsti Jylhä

Keyword(s):

Heat Wave ◽

Heat Waves ◽

Lead Times ◽

Weather Forecasts ◽

Wave Forecast ◽

Extended Range ◽

Hot Days ◽

Medium Range ◽

Summer Temperatures

Severe heatwaves have harmful impacts on ecosystems and society. Early warning of heat waves helps to decrease their harmful impact. Previous research shows that the Extended Range Forecasts (ERF) of the European Centre for Medium-Range Weather Forecasts (ECMWF) have a somewhat higher reforecast skill over Europe for extreme hot summer temperatures than for long-term mean temperatures. It has also been shown that the reforecast skill of the ERFs of the ECMWF was strongly increased by the most severe heat waves (the European heatwave 2003 and the Russian heatwave 2010). Our aim is to be able to estimate the skill of a heat wave forecast at the time the forecast is given. For that we investigated the spatial and temporal reforecast skill of the ERFs of the ECMWF to forecast hot days (here defined as a day on which the 5-day running mean surface temperature is above its summer 90th percentile) in continental Europe in summers 2000-2019. We used the ECMWF 2-meter temperature reforecasts and verified them against the ERA5 reanalysis. The skill of the hot day reforecasts was estimated by the symmetric extremal dependence index (SEDI), which considers both hit rates and false alarm rates of the hot day forecasts. Further, we investigated how the skill of the heatwave reforecasts depends on the forecast time steps at which the hot days were forecasted. We found that on the mesoscale (horizontal scale of ~500 km) the ERFs of the ECMWF were most skillful in predicting the life cycle of a heat wave (lasting up to 25 days) about a week before its start and during its course. That is, on the mesoscale those reforecasts in which hot day(s) were forecasted to occur during the first 7–11 days were more skillful on lead times up to 25 days than the rest of the heat wave forecasts. This finding is valuable information, e.g., in the energy and health sectors while preparing for a coming heat wave. The work presented here is part of the research project HEATCLIM (Heat and health in the changing climate) funded by the Academy of Finland.
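For readers unfamiliar with SEDI, a minimal sketch of the standard definition follows. It assumes a 2x2 contingency table of hot-day forecasts (hits, false alarms, misses, correct negatives), with H the hit rate and F the false alarm rate. The function name and the toy counts are illustrative and not taken from the study.

```python
import math


def sedi(hits: int, false_alarms: int, misses: int,
         correct_negatives: int) -> float:
    """Symmetric extremal dependence index from a 2x2 contingency table.

    SEDI = (ln F - ln H - ln(1 - F) + ln(1 - H))
         / (ln F + ln H + ln(1 - F) + ln(1 - H))
    Undefined when H or F equals 0 or 1.
    """
    hit_rate = hits / (hits + misses)
    false_alarm_rate = false_alarms / (false_alarms + correct_negatives)
    numerator = (math.log(false_alarm_rate) - math.log(hit_rate)
                 - math.log(1.0 - false_alarm_rate)
                 + math.log(1.0 - hit_rate))
    denominator = (math.log(false_alarm_rate) + math.log(hit_rate)
                   + math.log(1.0 - false_alarm_rate)
                   + math.log(1.0 - hit_rate))
    return numerator / denominator


# Toy table: hit rate 0.8, false alarm rate 0.1.
skilful = sedi(hits=8, false_alarms=10, misses=2, correct_negatives=90)
# Toy table with equal hit rate and false alarm rate (no skill).
no_skill = sedi(hits=5, false_alarms=10, misses=5, correct_negatives=10)
```

A forecast whose hit rate equals its false alarm rate scores 0, and the index approaches 1 as the hit rate rises and the false alarm rate falls.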


Effects of physics packages on medium-range forecasts in a global forecasting system

Journal of Atmospheric and Solar-Terrestrial Physics ◽

10.1016/j.jastp.2013.03.027 ◽

2013 ◽

Vol 100-101 ◽

pp. 50-58 ◽

Cited By ~ 5

Author(s):

Byoung-Kwon Park ◽

Song-You Hong

Keyword(s):

Medium Range ◽

Forecasting System ◽

Range Forecasts


Influence of the Madden–Julian Oscillation on Forecasts of Extreme Precipitation in the Contiguous United States

Monthly Weather Review ◽

10.1175/2010mwr3512.1 ◽

2011 ◽

Vol 139 (2) ◽

pp. 332-350 ◽

Cited By ~ 37

Author(s):

Charles Jones ◽

Jon Gottschalck ◽

Leila M. V. Carvalho ◽

Wayne Higgins

Keyword(s):

United States ◽

Extreme Precipitation ◽

Western Hemisphere ◽

Boreal Winter ◽

Lead Times ◽

Madden Julian Oscillation ◽

Extreme Precipitation Events ◽

Skill Scores ◽

Probabilistic Forecasts ◽

Precipitation Events

Abstract Extreme precipitation events are among the most devastating weather phenomena since they are frequently accompanied by loss of life and property. This study uses reforecasts of the NCEP Climate Forecast System (CFS.v1) to evaluate the skill of nonprobabilistic and probabilistic forecasts of extreme precipitation in the contiguous United States (CONUS) during boreal winter for lead times up to two weeks. The CFS model realistically simulates the spatial patterns of extreme precipitation events over the CONUS, although the magnitudes of the extremes in the model are much larger than in the observations. Heidke skill scores (HSS) for forecasts of extreme precipitation at the 75th and 90th percentiles showed that the CFS model has good skill at week 1 and modest skill at week 2. Forecast skill is usually higher when the Madden–Julian oscillation (MJO) is active and has enhanced convection occurring over the Western Hemisphere, Africa, and/or the western Indian Ocean than in quiescent periods. HSS greater than 0.1 extends to lead times of up to two weeks in these situations. Approximately 10%–30% of the CONUS has HSS greater than 0.1 at lead times of 1–14 days when the MJO is active. Probabilistic forecasts for extreme precipitation events at the 75th percentile show improvements over climatology of 0%–40% at 1-day lead and 0%–5% at 7-day leads. The CFS has better skill in forecasting severe extremes (i.e., events exceeding the 90th percentile) at longer leads than moderate extremes (75th percentile). Improvements over climatology between 10% and 30% at leads of 3 days are observed over several areas across the CONUS—especially in California and in the Midwest.
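The Heidke skill score used above can be computed from a 2x2 contingency table of event forecasts. The sketch below uses the standard two-category formula and assumes that an "event" means exceeding a chosen percentile threshold. The function name and counts are illustrative and not taken from the study.

```python
def heidke_skill_score(hits: int, false_alarms: int, misses: int,
                       correct_negatives: int) -> float:
    """Heidke skill score for a 2x2 contingency table.

    HSS = 2 (a d - b c) / ((a + c)(c + d) + (a + b)(b + d)),
    with a = hits, b = false alarms, c = misses, d = correct negatives.
    It measures the fraction of correct forecasts relative to chance.
    """
    a, b, c, d = hits, false_alarms, misses, correct_negatives
    numerator = 2.0 * (a * d - b * c)
    denominator = (a + c) * (c + d) + (a + b) * (b + d)
    return numerator / denominator


# A perfect forecast, a chance-level forecast, and a modest one.
perfect = heidke_skill_score(10, 0, 0, 30)
chance = heidke_skill_score(10, 10, 10, 10)
modest = heidke_skill_score(6, 4, 4, 26)
```

A score of 1 indicates perfect forecasts and 0 indicates no improvement over chance, so a value such as 0.1 represents modest but positive skill.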
