A framework for probabilistic weather forecast post-processing across models and lead times using machine learning

Forecasting the weather is an increasingly data-intensive exercise. Numerical weather prediction (NWP) models are becoming more complex, with higher resolutions, and there are increasing numbers of different models in operation. While the forecasting skill of NWP models continues to improve, the number and complexity of these models poses a new challenge for the operational meteorologist: how should the information from all available models, each with their own unique biases and limitations, be combined in order to provide stakeholders with well-calibrated probabilistic forecasts to use in decision making? In this paper, we use a road surface temperature example to demonstrate a three-stage framework that uses machine learning to bridge the gap between sets of separate forecasts from NWP models and the ‘ideal’ forecast for decision support: probabilities of future weather outcomes. First, we use quantile regression forests to learn the error profile of each numerical model, and use these to apply empirically derived probability distributions to forecasts. Second, we combine these probabilistic forecasts using quantile averaging. Third, we interpolate between the aggregate quantiles in order to generate a full predictive distribution, which we demonstrate has properties suitable for decision support. Our results suggest that this approach provides an effective and operationally viable framework for the cohesive post-processing of weather forecasts across multiple models and lead times to produce a well-calibrated probabilistic output. This article is part of the theme issue ‘Machine learning for weather and climate modelling’.
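
The second and third stages described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the quantile levels, the three sets of road surface temperature quantiles, and the function names are hypothetical, and piecewise-linear interpolation is an assumption, since the abstract does not name the interpolation scheme.

```python
import numpy as np

# Hypothetical quantile levels and per-model quantile forecasts of road
# surface temperature (deg C). In the paper these would come from quantile
# regression forests; here they are made-up numbers for illustration.
levels = np.array([0.05, 0.25, 0.50, 0.75, 0.95])
model_quantiles = np.array([
    [-1.8, -0.6, 0.2, 1.0, 2.3],   # model A
    [-2.4, -1.0, -0.1, 0.7, 1.9],  # model B
    [-1.2, -0.2, 0.5, 1.4, 2.9],   # model C
])


def quantile_average(quantiles, weights=None):
    """Combine forecasts by averaging them level by level.

    Equal weights are used unless `weights` (one per model) is given.
    """
    return np.average(quantiles, axis=0, weights=weights)


def cdf_from_quantiles(x, levels, quantiles):
    """Evaluate a piecewise-linear CDF through the (quantile, level) points.

    Simplification: outside the outermost quantiles, np.interp clamps the
    result to the lowest or highest level instead of modelling the tails.
    """
    return np.interp(x, quantiles, levels)


combined = quantile_average(model_quantiles)
p_below_zero = float(cdf_from_quantiles(0.0, levels, combined))
print(f"P(temperature <= 0 deg C) = {p_below_zero:.4f}")
```

Averaging quantiles rather than probabilities keeps the combined forecast about as sharp as its inputs, which is one common motivation for the technique. A real application would also need an explicit treatment of the distribution tails.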

Post-processing rainfall forecasts from numerical weather prediction models for short-term streamflow forecasting

Hydrology and Earth System Sciences ◽

10.5194/hess-17-3587-2013 ◽

2013 ◽

Vol 17 (9) ◽

pp. 3587-3603 ◽

Cited By ~ 56

Author(s):

D. E. Robertson ◽

D. L. Shrestha ◽

Q. J. Wang

Keyword(s):

Numerical Weather Prediction ◽

Prediction Models ◽

Probability Distributions ◽

Weather Prediction ◽

Lead Times ◽

Streamflow Forecasting ◽

Post Processing ◽

Short Term ◽

Ensemble Forecasts ◽

Numerical Weather

Abstract. Sub-daily ensemble rainfall forecasts that are bias free and reliably quantify forecast uncertainty are critical for flood and short-term ensemble streamflow forecasting. Post-processing of rainfall predictions from numerical weather prediction models is typically required to provide rainfall forecasts with these properties. In this paper, a new approach to generate ensemble rainfall forecasts by post-processing raw numerical weather prediction (NWP) rainfall predictions is introduced. The approach uses a simplified version of the Bayesian joint probability modelling approach to produce forecast probability distributions for individual locations and forecast lead times. Ensemble forecasts with appropriate spatial and temporal correlations are then generated by linking samples from the forecast probability distributions using the Schaake shuffle. The new approach is evaluated by applying it to post-process predictions from the ACCESS-R numerical weather prediction model at rain gauge locations in the Ovens catchment in southern Australia. The joint distribution of NWP predicted and observed rainfall is shown to be well described by the assumed log-sinh transformed bivariate normal distribution. Ensemble forecasts produced using the approach are shown to be more skilful than the raw NWP predictions both for individual forecast lead times and for cumulative totals throughout all forecast lead times. Skill increases result from the correction of not only the mean bias, but also biases conditional on the magnitude of the NWP rainfall prediction. The post-processed forecast ensembles are demonstrated to successfully discriminate between events and non-events for both small and large rainfall occurrences, and reliably quantify the forecast uncertainty. 
Future work will assess the efficacy of the post-processing method for a wider range of climatic conditions and also investigate the benefits of using post-processed rainfall forecasts for flood and short-term streamflow forecasting.
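
The Schaake shuffle mentioned in this abstract reorders independently drawn samples so that their rank structure across locations and lead times matches that of a set of historical observations. A minimal sketch follows, assuming one sample per ensemble member and an equally sized historical template. The function name and the toy numbers are hypothetical and are not taken from the paper.

```python
import numpy as np


def schaake_shuffle(samples, template):
    """Reorder samples column by column to follow the template's ranks.

    samples:  (n_members, n_dims) independent draws from the forecast
              distribution of each dimension (location / lead time).
    template: (n_members, n_dims) historical observations whose
              cross-dimension rank structure should be imposed.
    """
    samples = np.asarray(samples, dtype=float)
    template = np.asarray(template, dtype=float)
    if samples.shape != template.shape:
        raise ValueError("samples and template must have the same shape")

    # Sort each dimension's samples, then place the r-th smallest sample
    # wherever the template holds its r-th smallest value.
    sorted_samples = np.sort(samples, axis=0)
    ranks = np.argsort(np.argsort(template, axis=0), axis=0)
    return np.take_along_axis(sorted_samples, ranks, axis=0)


# Toy example: 3 members, 2 dimensions. The template is perfectly
# rank-correlated across dimensions, so the shuffled output is too.
template = np.array([[1.0, 10.0], [3.0, 30.0], [2.0, 20.0]])
samples = np.array([[0.5, 9.0], [0.1, 7.0], [0.9, 8.0]])
shuffled = schaake_shuffle(samples, template)
print(shuffled)
```

The marginal distribution in each column is unchanged by the shuffle. Only the pairing of values across columns is altered.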

Probabilistic fire-danger forecasting: A framework for week-two forecasts using statistical post-processing techniques and the Global ECMWF Fire Forecast System (GEFF)

Weather and Forecasting ◽

10.1175/waf-d-21-0075.1 ◽

2021 ◽

Author(s):

Rochelle P. Worsnop ◽

Michael Scheuerer ◽

Francesca Di Giuseppe ◽

Christopher Barnard ◽

Thomas M. Hamill ◽

...

Keyword(s):

Prediction Models ◽

Weather Prediction ◽

Weather Forecast ◽

Reanalysis Data ◽

Post Processing ◽

Medium Range Weather Forecast ◽

Skill Scores ◽

Probabilistic Forecasts ◽

Systematic Biases ◽

Processing Techniques

Abstract Wildfire guidance two weeks ahead is needed for strategic planning of fire mitigation and suppression. However, fire forecasts driven by meteorological forecasts from numerical weather prediction models inherently suffer from systematic biases. This study uses several statistical post-processing methods to correct these biases and increase the skill of ensemble fire forecasts over the contiguous United States 8–14 days ahead. We train and validate the post-processing models on 20 years of European Centre for Medium-Range Weather Forecasts (ECMWF) reforecasts and ERA5 reanalysis data for 11 meteorological variables related to fire, such as surface temperature, wind speed, relative humidity, cloud cover, and precipitation. The calibrated variables are then input to the Global ECMWF Fire Forecast (GEFF) system to produce probabilistic forecasts of daily fire indicators, which characterize the relationships between fuels, weather, and topography. Skill scores show that the post-processed forecasts overall have greater positive skill at Days 8–14 relative to raw and climatological forecasts. It is shown that the post-processed forecasts are more reliable at predicting above- and below-normal probabilities of various fire indicators than the raw forecasts and that the greatest skill for Days 8–14 is achieved by aggregating forecast days together.

Artificial Learning Dispatch Planning with Probabilistic Forecasts: Using Uncertainties as an Asset

Energies ◽

10.3390/en13030616 ◽

2020 ◽

Vol 13 (3) ◽

pp. 616

Author(s):

Ana Carolina do Amaral Burghi ◽

Tobias Hirsch ◽

Robert Pitz-Paal

Keyword(s):

Machine Learning ◽

Learning Algorithm ◽

Weather Forecast ◽

Machine Learning Algorithm ◽

Post Processing ◽

Weather Forecasts ◽

Additional Information ◽

Model Following ◽

Probabilistic Forecasts ◽

Artificial Learning

Weather forecast uncertainty is a key element of energy market volatility. By intelligently accounting for uncertainties during schedule development, renewable energy systems with storage could improve dispatching accuracy and therefore participate effectively in electricity wholesale markets. Deterministic forecasts have traditionally been used to support dispatch planning, and they carry little or no uncertainty information about the future weather. Probabilistic forecasts have been developed to better represent the uncertainties involved and to increase forecasting accuracy. For dispatch planning, this can strongly influence the development of a more precise schedule. This work extends a dispatch planning method to the use of probabilistic weather forecasts. The underlying method used a schedule optimizer coupled to a post-processing machine learning algorithm. This machine learning algorithm was adapted to include probabilistic forecasts, considering their additional information on uncertainties. This post-processing applied a calibration of the planned schedule considering the knowledge about uncertainties obtained from similar past situations. Simulations performed with a concentrated solar power plant model following the proposed strategy demonstrated promising financial improvement and relevant potential in dealing with uncertainties. In particular, the results show that information included in probabilistic forecasts can increase financial revenues by up to 15% (in comparison to a persistence solar driven approach) if processed in a suitable way.

Comparing Area Probability Forecasts of (Extreme) Local Precipitation Using Parametric and Machine Learning Statistical Postprocessing Methods

Monthly Weather Review ◽

10.1175/mwr-d-17-0290.1 ◽

2018 ◽

Vol 146 (11) ◽

pp. 3651-3673 ◽

Cited By ~ 14

Author(s):

Kirien Whan ◽

Maurice Schmeits

Keyword(s):

Machine Learning ◽

Deterministic Model ◽

Weather Prediction ◽

Location Parameter ◽

Important Predictor ◽

Lead Times ◽

Hourly Precipitation ◽

Severe Thunderstorms ◽

Probabilistic Forecasts ◽

Precipitation Thresholds

Abstract Probabilistic forecasts, which communicate forecast uncertainties, enable users to make better weather-based decisions. Using precipitation and numerous instability indices from the deterministic model HARMONIE–AROME (HA; a nonhydrostatic numerical weather prediction model) as potential predictors, we generate summer areal probabilistic maximum hourly precipitation forecasts across 11 regions of the Netherlands. We compare the skill of three statistical postprocessing methods: an extended logistic regression (ELR), a zero-adjusted gamma distribution (ZAGA), and a machine learning-based method, quantile regression forests (QRF). Forecast skill for low and moderate precipitation thresholds increases with the inclusion of extra predictors, in addition to HA precipitation. HA precipitation is the most important predictor at all lead times in ELR and QRF, while in ZAGA, the most important predictor for the location parameter shifts over lead times from HA precipitation to indices of atmospheric instability. All three methods improve upon a climatological forecast for low and moderate precipitation thresholds. ZAGA and QRF are generally the most skillful methods at moderate thresholds. QRF tends to be the most skillful method at higher thresholds, particularly during the afternoon period. Forecasts are reliable at low and moderate thresholds but tend to be overconfident at higher thresholds. QRF and ZAGA have more potential economic value than the deterministic forecast, with value remaining at high thresholds. A maximum local hourly precipitation threshold of 30 mm h⁻¹ (a criterion in the Royal Netherlands Meteorological Institute’s code yellow warning for severe thunderstorms) is skillfully forecast by QRF in the afternoon period at short lead times.

ARPEGE cloud cover forecast post-processing with convolutional neural network

10.5194/egusphere-egu2020-18325 ◽

2020 ◽

Author(s):

Florian Dupuy ◽

Olivier Mestre ◽

Léo Pfitzner

Keyword(s):

Neural Network ◽

Machine Learning ◽

Convolutional Neural Network ◽

Random Forests ◽

Cloud Cover ◽

Spatial Information ◽

Weather Prediction ◽

Excellent Result ◽

Machine Learning Algorithms ◽

Post Processing

Cloud cover is crucial information for many applications, such as planning land observation missions from space. However, cloud cover remains a challenging variable to forecast, and Numerical Weather Prediction (NWP) models suffer from significant biases, which justifies the use of statistical post-processing techniques. In our application, the ground truth is a gridded cloud cover product derived from satellite observations over Europe, and the predictors are spatial fields of various variables produced by ARPEGE (the Météo-France global NWP model) at the corresponding lead time. In this study, ARPEGE cloud cover is post-processed using a convolutional neural network (CNN). CNNs are the most popular machine learning tool for dealing with images. In our case, the CNN makes it possible to integrate the spatial information contained in NWP outputs. We show that a simple U-Net architecture produces significant improvements over Europe. Compared to the raw ARPEGE forecasts, MAE drops from 25.1 % to 17.8 % and RMSE decreases from 37.0 % to 31.6 %. Considering the specific needs of earth observation, special attention was paid to forecasts of low cloud cover conditions (< 10 %). For this particular nebulosity class, we show that the hit rate jumps from 40.6 to 70.7 (which is the order of magnitude of what can be achieved using classical machine learning algorithms such as random forests) while the false alarm rate decreases from 38.2 to 29.9. This is an excellent result, since improving hit rates by means of random forests usually also results in a slight increase in false alarms.

Exploring multi-modalities in weather prediction using a univariate graph based on machine learning techniques

10.5194/egusphere-egu21-11747 ◽

2021 ◽

Author(s):

Natacha Galmiche ◽

Nello Blaser ◽

Morten Brun ◽

Helwig Hauser ◽

Thomas Spengler ◽

...

Keyword(s):

Machine Learning ◽

Standard Deviation ◽

Probability Distributions ◽

Weather Prediction ◽

A Priori ◽

Clustering Algorithms ◽

Quantitative Information ◽

Machine Learning Techniques ◽

Topological Data Analysis ◽

Learning Techniques

Probability distributions based on ensemble forecasts are commonly used to assess uncertainty in weather prediction. However, interpreting these distributions is not trivial, especially in the case of multimodality with distinct likely outcomes. The conventional summary employs mean and standard deviation across ensemble members, which works well for unimodal, Gaussian-like distributions. In the case of multimodality this misleads, discarding crucial information. We aim to combine previously developed clustering algorithms in machine learning and topological data analysis to extract useful information such as the number of clusters in an ensemble. Given the chaotic behaviour of the atmosphere, machine learning techniques can provide relevant results even if no, or very little, a priori information about the data is available. In addition, topological methods that analyse the shape of the data can make results explainable. Given an ensemble of univariate time series, a graph is generated whose edges and vertices represent clusters of members, including additional information for each cluster such as the members belonging to it, its uncertainty, and its relevance according to the graph. In the case of multimodality, this approach provides relevant and quantitative information beyond the commonly used mean and standard deviation approach, which helps to further characterise the predictability.

Numerical Weather Forecast Post-processing with Ensemble Learning and Transfer Learning

10.5194/egusphere-egu2020-3885 ◽

2020 ◽

Author(s):

Yuwen Chen ◽

Xiaomeng Huang

Keyword(s):

Machine Learning ◽

Transfer Learning ◽

Ensemble Learning ◽

Weather Forecasting ◽

Weather Forecast ◽

Processing Temperature ◽

Forecast Errors ◽

Post Processing ◽

Numerical Weather ◽

One Year

Statistical approaches have been used for decades to augment and interpret numerical weather forecasts. The emergence of artificial intelligence algorithms has provided new perspectives in this field, but the extension of algorithms developed for station networks with rich historical records to include newly-built stations remains a challenge. To address this, we design a framework that combines two machine learning methods: temperature prediction based on an ensemble of multiple machine learning models and transfer learning for newly-built stations. We then evaluate this framework by post-processing temperature forecasts provided by a leading weather forecast center and observations from 301 weather stations in China. Station clustering reduces forecast errors by 24.4% on average, while transfer learning improves predictions by 13.4% for recently-built sites with only one year of data available. This work demonstrates how ensemble learning and transfer learning can be used to supplement weather forecasting.

Statistical post-processing of wind speed forecasts using convolutional neural networks

Monthly Weather Review ◽

10.1175/mwr-d-20-0219.1 ◽

2021 ◽

Author(s):

Simon Veldkamp ◽

Kirien Whan ◽

Sjoerd Dirksen ◽

Maurice Schmeits

Keyword(s):

Neural Networks ◽

Wind Speed ◽

Convolutional Neural Networks ◽

Weather Prediction ◽

Estimation Methods ◽

Post Processing ◽

Skill Scores ◽

Probabilistic Forecasts ◽

Truncated Normal ◽

Nwp Model

Abstract Current statistical post-processing methods for probabilistic weather forecasting are not capable of using full spatial patterns from the numerical weather prediction (NWP) model. In this paper we incorporate spatial wind speed information by using convolutional neural networks (CNNs) and obtain probabilistic wind speed forecasts in the Netherlands for 48 hours ahead, based on KNMI’s deterministic Harmonie-Arome NWP model. The probabilistic forecasts from the CNNs are shown to have higher Brier skill scores for medium to higher wind speeds, as well as a better continuous ranked probability score (CRPS) and logarithmic score, than the forecasts from fully connected neural networks and quantile regression forests. As a secondary result, we have compared the CNNs using 3 different density estimation methods (quantized softmax (QS), kernel mixture networks, and fitting a truncated normal distribution), and found the probabilistic forecasts based on the QS method to be best.
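
For readers unfamiliar with the Brier skill score reported in this abstract, a minimal sketch of its standard definition follows. The forecast probabilities and outcomes are made-up numbers, not results from the paper. The reference forecast is assumed here to be the observed base rate of the event, which is one common choice of climatology.

```python
import numpy as np


def brier_score(prob, outcome):
    """Mean squared difference between forecast probability and outcome (0/1)."""
    prob = np.asarray(prob, dtype=float)
    outcome = np.asarray(outcome, dtype=float)
    return float(np.mean((prob - outcome) ** 2))


def brier_skill_score(prob, outcome):
    """Skill relative to a constant base-rate forecast (assumed reference).

    1 is a perfect forecast, 0 matches the reference, negative is worse.
    """
    outcome = np.asarray(outcome, dtype=float)
    reference = np.full_like(outcome, outcome.mean())
    return 1.0 - brier_score(prob, outcome) / brier_score(reference, outcome)


# Hypothetical probabilities that wind speed exceeds some threshold,
# with the matching observed outcomes (1 = exceeded, 0 = not exceeded).
prob = [0.9, 0.1, 0.8, 0.2]
outcome = [1, 0, 1, 0]
print(brier_score(prob, outcome), brier_skill_score(prob, outcome))
```

Note that the base-rate reference is undefined when the event never or always occurs in the sample, because its Brier score is then zero.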

Using Artificial Neural Networks for Generating Probabilistic Subseasonal Precipitation Forecasts over California

Monthly Weather Review ◽

10.1175/mwr-d-20-0096.1 ◽

2020 ◽

Vol 148 (8) ◽

pp. 3489-3506

Author(s):

Michael Scheuerer ◽

Matthew B. Switanek ◽

Rochelle P. Worsnop ◽

Thomas M. Hamill

Keyword(s):

Neural Network ◽

Large Scale ◽

Signal To Noise Ratio ◽

Weather Prediction ◽

Forecast Skill ◽

Lead Times ◽

Probabilistic Forecasts ◽

Medium Range ◽

Artificial Neural ◽

Artificial Neural Network Ann

Abstract Forecast skill of numerical weather prediction (NWP) models for precipitation accumulations over California is rather limited at subseasonal time scales, and the low signal-to-noise ratio makes it challenging to extract information that provides reliable probabilistic forecasts. A statistical postprocessing framework is proposed that uses an artificial neural network (ANN) to establish relationships between NWP ensemble forecast and gridded observed 7-day precipitation accumulations, and to model the increase or decrease of the probabilities for different precipitation categories relative to their climatological frequencies. Adding predictors with geographic information and location-specific normalization of forecast information permits the use of a single ANN for the entire forecast domain and thus reduces the risk of overfitting. In addition, a convolutional neural network (CNN) framework is proposed that extends the basic ANN and takes images of large-scale predictors as inputs that inform local increase or decrease of precipitation probabilities relative to climatology. Both methods are demonstrated with ECMWF ensemble reforecasts over California for lead times up to 4 weeks. They compare favorably with a state-of-the-art postprocessing technique developed for medium-range ensemble precipitation forecasts, and their forecast skill relative to climatology is positive everywhere within the domain. The magnitude of skill, however, is low for week-3 and week-4, and suggests that additional sources of predictability need to be explored.

Post-processing for NWP Outputs Based on Machine Learning for 2022 Winter Olympics Games over Complex Terrain

10.5194/egusphere-egu2020-10463 ◽

2020 ◽

Author(s):

Kang Yanyan ◽

Li Haochen ◽

Xia Jiangjiang ◽

Zhang Yingxin

Keyword(s):

Machine Learning ◽

Wind Speed ◽

Complex Terrain ◽

Weather Prediction ◽

Grid Point ◽

Forecast Accuracy ◽

Post Processing ◽

Winter Olympics ◽

Forecast Time ◽

Station Point

Weather forecasts play an important role in the Olympic Games, especially for the mountain snow events, where they help to find a "window period" for competition. The Winter Olympics tracks are located on very complex terrain, so a detailed weather forecast is needed. A post-processing method based on machine learning is used for 10-day weather prediction at 1-km spatial resolution and 1-hour temporal resolution, which can greatly improve the accuracy and refinement of numerical weather prediction (NWP). ECMWF/RMAPS model data and automatic weather station (AWS) data from 2015-2018 are used as training and test data, comprising 48 features and 4 labels (the observed 2 m temperature, relative humidity, 10 m wind speed, and wind direction). The model data are on grid points, while the AWS data are at station points. We use the nearest 9 model grid points to predict each station point, instead of interpolating between grid points and the station point, so the number of features in the dataset becomes 48 × 9. This eliminates the interpolation error from grid point to station and takes the spatial distribution into account to some extent. The machine learning methods we use are SVM, Random Forest, Gradient Boosting Decision Tree (GBDT), and XGBoost. We find that XGBoost performs best, slightly better than GBDT and Random Forest. We did some feature engineering before training and found that more features do not necessarily give a better model: 10 features are enough. Interestingly, the features most closely related to the label values, such as the model-output 2 m temperature, 10 m wind speed, and wind direction, become less important as the forecast time increases, while some features that forecasters do not usually pay attention to, such as latent heat flux and snow depth, become more important in the 6-10 day predictions. It is therefore necessary to train the model with dynamic weight parameters for different forecast times. With post-processing based on machine learning, the forecast accuracy is greatly improved compared with the EC model. The averaged forecast accuracy over 0-10 days for 2 m relative humidity, 10 m wind speed, and wind direction increases by almost 15%, and the temperature accuracy increases by 20%~40% (40% for 0-3 days, with the accuracy decreasing with forecast time).
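
The feature construction described in this abstract (stacking the 48 model variables from the 9 grid points nearest to a station, instead of interpolating) can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the function name, the toy 5 × 5 grid, and the random field values are hypothetical, and distance is measured as plain Euclidean distance in coordinate space for simplicity.

```python
import numpy as np


def nearest_point_features(grid_lat, grid_lon, fields, station_lat, station_lon, k=9):
    """Concatenate the features of the k grid points nearest to a station.

    grid_lat, grid_lon: (n_points,) coordinates of the model grid points.
    fields:             (n_points, n_features) model variables per grid point.
    Returns a vector of length k * n_features, nearest point first.
    Assumption: squared coordinate differences stand in for true distance.
    """
    d2 = (grid_lat - station_lat) ** 2 + (grid_lon - station_lon) ** 2
    nearest = np.argsort(d2, kind="stable")[:k]
    return fields[nearest].ravel()


# Toy setup: a 5 x 5 grid with 48 random features per grid point.
rng = np.random.default_rng(0)
lat, lon = np.meshgrid(np.arange(5.0), np.arange(5.0), indexing="ij")
grid_lat, grid_lon = lat.ravel(), lon.ravel()
fields = rng.normal(size=(25, 48))

# A station slightly off the central grid point (row 2, column 2).
features = nearest_point_features(grid_lat, grid_lon, fields, 2.1, 1.9)
print(features.shape)
```

The resulting vector has 48 × 9 = 432 entries and could be passed to any of the regressors named in the abstract.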
