A precipitation forecasting model using machine learning on big data in clouds environment

MAUSAM ◽  
2021 ◽  
Vol 72 (4) ◽  
pp. 781-790
Author(s):  
MAHBOOB ALAM ◽  
MOHD. AMJAD

Numerical weather prediction (NWP) has long been a difficult task for meteorologists. Atmospheric dynamics is extremely complicated to model, and chaos theory teaches us that the mathematical equations used to predict the weather are sensitive to initial conditions; that is, slightly perturbed initial conditions can yield very different forecasts. Over the years, meteorologists have developed a number of different mathematical models for atmospheric dynamics, each making slightly different assumptions and simplifications, and hence each yielding different forecasts. Each model has its strengths and weaknesses in different forecasting situations, so to improve performance, scientists now use an ensemble forecast consisting of several models run with different initial conditions. This ensemble method relies on statistical post-processing, usually linear regression. Recently, machine learning techniques have started to be applied to NWP. Studies of neural networks, logistic regression, and genetic algorithms have shown improvements over standard linear regression for precipitation prediction. Gagne et al. proposed using multiple machine learning techniques to improve precipitation forecasting. They used Breiman's random forest technique, which had previously been applied to other areas of meteorology, and verified its performance using Next Generation Weather Radar (NEXRAD) data. Rather than relying on an ensemble forecast, this paper discusses how machine learning techniques can be used to improve precipitation forecasts. It presents an approach for mapping precipitation data and aims to arrive at an optimal, data-driven machine learning method for predicting precipitation levels, thereby aiding farmers and benefiting the agricultural domain.
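As a rough illustration of the random-forest approach mentioned above, the following sketch trains a forest on tabular forecast features to predict observed precipitation. The file name and feature columns are hypothetical placeholders, not the data set used by Gagne et al. or in this paper.

```python
# Minimal sketch: a random forest trained on NWP ensemble features to
# predict precipitation. The CSV file and column names are hypothetical
# placeholders for radar-verified (e.g. NEXRAD-derived) training data.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

df = pd.read_csv("precip_features.csv")  # hypothetical table, one row per forecast point
features = ["ens_mean_precip", "ens_std_precip", "temp_850hPa", "rel_humidity"]
X_train, X_test, y_train, y_test = train_test_split(
    df[features], df["observed_precip"], test_size=0.2, random_state=0)

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train, y_train)
print("MAE:", mean_absolute_error(y_test, model.predict(X_test)))
```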

2021 ◽  
Author(s):  
Natacha Galmiche ◽  
Nello Blaser ◽  
Morten Brun ◽  
Helwig Hauser ◽  
Thomas Spengler ◽  
...  

Probability distributions based on ensemble forecasts are commonly used to assess uncertainty in weather prediction. However, interpreting these distributions is not trivial, especially in the case of multimodality with distinct likely outcomes. The conventional summary uses the mean and standard deviation across ensemble members, which works well for unimodal, Gaussian-like distributions. In the case of multimodality, however, this summary is misleading and discards crucial information.

We aim to combine previously developed clustering algorithms from machine learning and topological data analysis to extract useful information, such as the number of clusters in an ensemble. Given the chaotic behaviour of the atmosphere, machine learning techniques can provide relevant results even if little or no a priori information about the data is available. In addition, topological methods that analyse the shape of the data can make the results explainable.

Given an ensemble of univariate time series, a graph is generated whose edges and vertices represent clusters of members, with additional information for each cluster such as the members belonging to it, its uncertainty, and its relevance according to the graph. In the case of multimodality, this approach provides relevant and quantitative information beyond the commonly used mean-and-standard-deviation summary and helps to further characterise predictability.
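A minimal sketch of the clustering idea: grouping ensemble members of a univariate forecast and estimating the number of distinct outcomes. Plain k-means with a silhouette criterion on a synthetic bimodal ensemble is used here as a simplified stand-in for the topological clustering described above; the data and the choice of method are illustrative assumptions, not the authors' approach.

```python
# Estimate the number of distinct outcomes in a synthetic bimodal ensemble
# (50 members x 40 time steps) by clustering members and scoring cluster counts.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
members = np.concatenate([
    rng.normal(loc=2.0, scale=0.3, size=(25, 40)),   # first likely outcome
    rng.normal(loc=-1.0, scale=0.3, size=(25, 40)),  # second likely outcome
])

best_k, best_score = 1, -1.0
for k in range(2, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(members)
    score = silhouette_score(members, labels)
    if score > best_score:
        best_k, best_score = k, score

print("Estimated number of distinct outcomes:", best_k)  # expect 2
```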


Author(s):  
Jonathan Becker ◽  
Aveek Purohit ◽  
Zheng Sun

The USARSim group at NIST developed a simulated robot that operated in the Unreal Tournament 3 (UT3) gaming environment. They used a software PID controller to control the robot in UT3 worlds. Unfortunately, the PID controller did not work well, so NIST asked us to develop a better controller using machine learning techniques. In the process, we characterized the software PID controller and the robot's behavior in UT3 worlds. Using data collected from our simulations, we compared different machine learning techniques, including linear regression and reinforcement learning (RL). Finally, we implemented an RL-based controller in Matlab and ran it in the UT3 environment via a TCP/IP link between Matlab and UT3.
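For illustration only, the sketch below shows tabular Q-learning on a toy one-dimensional tracking task, the same family of reinforcement-learning controller described above. The discretization, reward, and toy plant are assumptions and do not reproduce the authors' Matlab/UT3 implementation.

```python
# Tabular Q-learning for a toy set-point tracking task: the state is a
# discretized tracking error and the actions nudge it left, hold, or right.
import numpy as np

n_states, n_actions = 21, 3
q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.95, 0.1
rng = np.random.default_rng(0)

def step(state, action):
    """Toy plant: the action shifts the discretized error by -1, 0 or +1."""
    next_state = int(np.clip(state + (action - 1), 0, n_states - 1))
    reward = -abs(next_state - n_states // 2)  # zero error (center state) is best
    return next_state, reward

for episode in range(500):
    state = int(rng.integers(n_states))
    for _ in range(50):
        if rng.random() < epsilon:
            action = int(rng.integers(n_actions))   # explore
        else:
            action = int(np.argmax(q[state]))       # exploit
        next_state, reward = step(state, action)
        q[state, action] += alpha * (reward + gamma * q[next_state].max() - q[state, action])
        state = next_state
```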


2020 ◽  
Author(s):  
Pramod Kumar ◽  
Sameer Ambekar ◽  
Manish Kumar ◽  
Subarna Roy

This chapter aims to introduce the common methods and practices of statistical machine learning techniques. It covers the development and application of algorithms and the ways in which they learn from observed data by building models, which in turn can be used for prediction. Although one might assume that machine learning and statistics are not closely related, it is evident that they go hand in hand. We observe how statistical methods such as linear regression and classification are used in machine learning, and we look at implementation techniques for classification and regression. Although standard machine learning libraries implement a large number of algorithms, we examine how to tune these algorithms and which of their parameters and features affect performance, from the standpoint of statistical methods.
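As a concrete example of the interplay described above, the following sketch fits a standard regularized linear regression and tunes its main hyperparameter with cross-validated grid search; the data set and parameter grid are illustrative choices, not taken from the chapter.

```python
# Ridge regression (a statistical linear model) tuned with grid search:
# the regularization strength alpha is the parameter whose value affects
# performance, selected here by 5-fold cross-validation.
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

X, y = load_diabetes(return_X_y=True)
search = GridSearchCV(
    Ridge(),
    param_grid={"alpha": [0.01, 0.1, 1.0, 10.0]},
    cv=5,
    scoring="neg_mean_squared_error",
)
search.fit(X, y)
print("Best alpha:", search.best_params_["alpha"])
```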


2020 ◽  
Author(s):  
Nicola Bodini ◽  
Julie K. Lundquist ◽  
Mike Optis

Abstract. Current turbulence parameterizations in numerical weather prediction models at the mesoscale assume a local equilibrium between production and dissipation of turbulence. As this assumption does not hold at fine horizontal resolutions, improved ways to represent turbulent kinetic energy (TKE) dissipation rate (ε) are needed. Here, we use a 6-week data set of turbulence measurements from 184 sonic anemometers in complex terrain at the Perdigão field campaign to suggest improved representations of dissipation rate. First, we demonstrate that a widely used Mellor, Yamada, Nakanishi, and Niino (MYNN) parameterization of TKE dissipation rate leads to a large inaccuracy and bias in the representation of ε. Next, we assess the potential of machine-learning techniques to predict TKE dissipation rate from a set of atmospheric and terrain-related features. We train and test several machine-learning algorithms using the data at Perdigão, and we find that multivariate polynomial regressions and random forests can eliminate the bias MYNN currently shows in representing ε, while also reducing the average error by up to 30 %. Of all the variables included in the algorithms, TKE is the variable responsible for most of the variability of ε, and a strong positive correlation exists between the two. These results suggest further consideration of machine-learning techniques to enhance parameterizations of turbulence in numerical weather prediction models.
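A minimal sketch of the machine-learning step described above: a random-forest regression of ε on atmospheric and terrain-related features, with the fitted model's bias, mean absolute error, and feature importances inspected afterwards. The file name and column names are hypothetical placeholders, not the actual Perdigão data set.

```python
# Random-forest regression of TKE dissipation rate (epsilon) on atmospheric
# and terrain features; column names and the data file are placeholders.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

df = pd.read_csv("perdigao_sonic_features.csv")  # hypothetical file
features = ["tke", "wind_speed", "height_agl", "terrain_slope"]
X_train, X_test, y_train, y_test = train_test_split(
    df[features], df["epsilon"], test_size=0.2, random_state=0)

rf = RandomForestRegressor(n_estimators=300, random_state=0).fit(X_train, y_train)
pred = rf.predict(X_test)
bias = np.mean(pred - y_test)                   # near-zero bias is the goal
mae = np.mean(np.abs(pred - y_test))
print(f"bias = {bias:.3e}, MAE = {mae:.3e}")
print(dict(zip(features, rf.feature_importances_)))  # TKE expected to dominate
```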


2020 ◽  
Vol 13 (9) ◽  
pp. 4271-4285
Author(s):  
Nicola Bodini ◽  
Julie K. Lundquist ◽  
Mike Optis

Abstract. Current turbulence parameterizations in numerical weather prediction models at the mesoscale assume a local equilibrium between production and dissipation of turbulence. As this assumption does not hold at fine horizontal resolutions, improved ways to represent turbulent kinetic energy (TKE) dissipation rate (ϵ) are needed. Here, we use a 6-week data set of turbulence measurements from 184 sonic anemometers in complex terrain at the Perdigão field campaign to suggest improved representations of dissipation rate. First, we demonstrate that the widely used Mellor, Yamada, Nakanishi, and Niino (MYNN) parameterization of TKE dissipation rate leads to a large inaccuracy and bias in the representation of ϵ. Next, we assess the potential of machine-learning techniques to predict TKE dissipation rate from a set of atmospheric and terrain-related features. We train and test several machine-learning algorithms using the data at Perdigão, and we find that the models eliminate the bias MYNN currently shows in representing ϵ, while also reducing the average error by up to almost 40 %. Of all the variables included in the algorithms, TKE is the variable responsible for most of the variability of ϵ, and a strong positive correlation exists between the two. These results suggest further consideration of machine-learning techniques to enhance parameterizations of turbulence in numerical weather prediction models.


2021 ◽  
Author(s):  
Hrvoje Kalinić ◽  
Zvonimir Bilokapić ◽  
Frano Matić

In certain measurement endeavours the spatial resolution of the data is restricted, while in others the data have poor temporal resolution. Typical examples of these scenarios come from geoscience, where measurement stations are fixed and sparsely scattered in space, resulting in poor spatial resolution of the acquired data. We therefore ask whether it is possible to use a portion of the data as a proxy to estimate the rest of the data using different machine learning techniques. In this study, four supervised machine learning methods are trained on wind data from the Adriatic Sea and used to reconstruct the missing data. The wind vector components at 10 m height are taken from the ERA5 reanalysis model for the period 1981 to 2017, sampled every 6 hours. Data from the northern part of the Adriatic Sea were used to estimate the wind in the southern part of the Adriatic. The machine learning models utilized for this task were linear regression, K-nearest neighbours, decision trees, and a neural network. The difference between the true and estimated wind values in the southern Adriatic was used as a measure of reconstruction quality. The results show that all four models reconstruct the data a few hundred kilometres away with an average amplitude error below 1 m/s. Linear regression, K-nearest neighbours, decision trees, and the neural network show average amplitude reconstruction errors of 0.52, 0.91, 0.76, and 0.73 m/s, with standard deviations of 1.00, 1.42, 1.23, and 1.17 m/s, respectively. This work has been supported by the Croatian Science Foundation under project UIP-2019-04-1737.
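The comparison described above can be sketched as follows, with synthetic arrays standing in for the ERA5 wind components of the northern (inputs) and southern (targets) Adriatic; the synthetic data and model settings are assumptions for illustration, not the study's configuration.

```python
# Four supervised models trained to map wind components in one region to
# those in another; amplitude error is the per-sample Euclidean norm of the
# difference between predicted and true (u, v) components.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 20))                                      # "northern" u/v samples
y = X @ rng.normal(size=(20, 2)) + 0.5 * rng.normal(size=(5000, 2))  # "southern" u/v targets

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
models = {
    "linear regression": LinearRegression(),
    "K-nearest neighbours": KNeighborsRegressor(n_neighbors=5),
    "decision tree": DecisionTreeRegressor(max_depth=10, random_state=0),
    "neural network": MLPRegressor(hidden_layer_sizes=(64,), max_iter=500, random_state=0),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    err = np.linalg.norm(model.predict(X_te) - y_te, axis=1)  # amplitude error
    print(f"{name}: mean {err.mean():.2f}, std {err.std():.2f}")
```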


2020 ◽  
Vol 12 (5) ◽  
pp. 854-864
Author(s):  
Mehdi Gholami Rostam ◽  
Seyyed Javad Sadatinejad ◽  
Arash Malekian

Energies ◽  
2018 ◽  
Vol 11 (8) ◽  
pp. 1975 ◽  
Author(s):  
Wei Dong ◽  
Qiang Yang ◽  
Xinli Fang

Accurate generation prediction at multiple time-steps is of paramount importance for the reliable and economical operation of wind farms. This study proposed a novel algorithmic solution that combines various machine learning techniques in a hybrid manner, including phase space reconstruction (PSR), input variable selection (IVS), K-means clustering, and an adaptive neuro-fuzzy inference system (ANFIS). The PSR technique transforms the historical time series into a set of phase-space variables, which are combined with numerical weather prediction (NWP) data to prepare candidate inputs. A filtering approach based on the minimal redundancy maximal relevance (mRMR) criterion is used to automatically select the optimal input variables for multi-step-ahead prediction. The input instances are then divided into subsets using K-means clustering to train the ANFIS, whose parameters are further optimized with a particle swarm optimization (PSO) algorithm to improve prediction performance. The proposed solution is extensively evaluated through case studies of two realistic wind farms, and the numerical results confirm its effectiveness and improved prediction accuracy compared to benchmark solutions.
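Two building blocks of the hybrid scheme, phase space reconstruction and K-means clustering of the embedded vectors, can be sketched as below. The synthetic series, embedding dimension, and delay are illustrative assumptions; the mRMR selection, ANFIS model, and PSO tuning are not shown.

```python
# Phase space reconstruction (time-delay embedding) of a wind power series,
# followed by K-means clustering of the embedded vectors into training subsets.
import numpy as np
from sklearn.cluster import KMeans

def phase_space_reconstruct(series, dim=4, delay=6):
    """Time-delay embedding: row t is [x(t), x(t+delay), ..., x(t+(dim-1)*delay)]."""
    n = len(series) - (dim - 1) * delay
    return np.column_stack([series[i * delay: i * delay + n] for i in range(dim)])

rng = np.random.default_rng(0)
power = np.sin(np.linspace(0, 60, 2000)) + 0.1 * rng.normal(size=2000)  # synthetic series

embedded = phase_space_reconstruct(power, dim=4, delay=6)
clusters = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(embedded)
print("cluster sizes:", np.bincount(clusters))
```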

