scholarly journals Combining ensemble Kalman filter and reservoir computing to predict spatiotemporal chaotic systems from imperfect observations and models

2021 ◽  
Vol 14 (9) ◽  
pp. 5623-5635 ◽  
Author(s):  
Futo Tomizawa ◽  
Yohei Sawada

Abstract. Prediction of spatiotemporal chaotic systems is important in various fields, such as numerical weather prediction (NWP). While data assimilation methods have been applied in NWP, machine learning techniques, such as reservoir computing (RC), have recently been recognized as promising tools to predict spatiotemporal chaotic systems. However, the sensitivity of the skill of the machine-learning-based prediction to the imperfectness of observations is unclear. In this study, we evaluate the skill of RC with noisy and sparsely distributed observations. We intensively compare the performances of RC and local ensemble transform Kalman filter (LETKF) by applying them to the prediction of the Lorenz 96 system. In order to increase the scalability to larger systems, we applied a parallelized RC framework. Although RC can successfully predict the Lorenz 96 system if the system is perfectly observed, we find that RC is vulnerable to observation sparsity compared with LETKF. To overcome this limitation of RC, we propose to combine LETKF and RC. In our proposed method, the system is predicted by RC that learned the analysis time series estimated by LETKF. Our proposed method can successfully predict the Lorenz 96 system using noisy and sparsely distributed observations. Most importantly, our method can predict better than LETKF when the process-based model is imperfect.

2020 ◽  
Author(s):  
Futo Tomizawa ◽  
Yohei Sawada

Abstract. Prediction of spatio-temporal chaotic systems is important in various fields, such as Numerical Weather Prediction (NWP). While data assimilation methods have been applied in NWP, machine learning techniques, such as Reservoir Computing (RC), are recently recognized as promising tools to predict spatio-temporal chaotic systems. However, the sensitivity of the skill of the machine learning based prediction to the imperfectness of observations is unclear. In this study, we evaluate the skill of RC with noisy and sparsely distributed observations. We intensively compare the performances of RC and Local Ensemble Transform Kalman Filter (LETKF) by applying them to the prediction of the Lorenz 96 system. Although RC can successfully predict the Lorenz 96 system if the system is perfectly observed, we find that RC is vulnerable to observation sparsity compared with LETKF. To overcome this limitation of RC, we propose to combine LETKF and RC. In our proposed method, the system is predicted by RC that learned the analysis time series estimated by LETKF. Our proposed method can successfully predict the Lorenz 96 system using noisy and sparsely distributed observations. Most importantly, our method can predict better than LETKF when the process-based model is imperfect.


2021 ◽  
Author(s):  
Natacha Galmiche ◽  
Nello Blaser ◽  
Morten Brun ◽  
Helwig Hauser ◽  
Thomas Spengler ◽  
...  

<p>Probability distributions based on ensemble forecasts are commonly used to assess uncertainty in weather prediction. However, interpreting these distributions is not trivial, especially in the case of multimodality with distinct likely outcomes. The conventional summary employs mean and standard deviation across ensemble members, which works well for unimodal, Gaussian-like distributions. In the case of multimodality this misleads, discarding crucial information. </p><p>We aim at combining previously developed clustering algorithms in machine learning and topological data analysis to extract useful information such as the number of clusters in an ensemble. Given the chaotic behaviour of the atmosphere, machine learning techniques can provide relevant results even if no, or very little, a priori information about the data is available. In addition, topological methods that analyse the shape of the data can make results explainable.</p><p>Given an ensemble of univariate time series, a graph is generated whose edges and vertices represent clusters of members, including additional information for each cluster such as the members belonging to them, their uncertainty, and their relevance according to the graph. In the case of multimodality, this approach provides relevant and quantitative information beyond the commonly used mean and standard deviation approach that helps to further characterise the predictability.</p>


2020 ◽  
Author(s):  
Nicola Bodini ◽  
Julie K. Lundquist ◽  
Mike Optis

Abstract. Current turbulence parameterizations in numerical weather prediction models at the mesoscale assume a local equilibrium between production and dissipation of turbulence. As this assumption does not hold at fine horizontal resolutions, improved ways to represent turbulent kinetic energy (TKE) dissipation rate (ε) are needed. Here, we use a 6-week data set of turbulence measurements from 184 sonic anemometers in complex terrain at the Perdigão field campaign to suggest improved representations of dissipation rate. First, we demonstrate that a widely used Mellor, Yamada, Nakanishi, and Niino (MYNN) parameterization of TKE dissipation rate leads to a large inaccuracy and bias in the representation of ε. Next, we assess the potential of machine-learning techniques to predict TKE dissipation rate from a set of atmospheric and terrain-related features. We train and test several machine-learning algorithms using the data at Perdigão, and we find that multivariate polynomial regressions and random forests can eliminate the bias MYNN currently shows in representing ε, while also reducing the average error by up to 30 %. Of all the variables included in the algorithms, TKE is the variable responsible for most of the variability of ε, and a strong positive correlation exists between the two. These results suggest further consideration of machine-learning techniques to enhance parameterizations of turbulence in numerical weather prediction models.


2020 ◽  
Vol 13 (9) ◽  
pp. 4271-4285
Author(s):  
Nicola Bodini ◽  
Julie K. Lundquist ◽  
Mike Optis

Abstract. Current turbulence parameterizations in numerical weather prediction models at the mesoscale assume a local equilibrium between production and dissipation of turbulence. As this assumption does not hold at fine horizontal resolutions, improved ways to represent turbulent kinetic energy (TKE) dissipation rate (ϵ) are needed. Here, we use a 6-week data set of turbulence measurements from 184 sonic anemometers in complex terrain at the Perdigão field campaign to suggest improved representations of dissipation rate. First, we demonstrate that the widely used Mellor, Yamada, Nakanishi, and Niino (MYNN) parameterization of TKE dissipation rate leads to a large inaccuracy and bias in the representation of ϵ. Next, we assess the potential of machine-learning techniques to predict TKE dissipation rate from a set of atmospheric and terrain-related features. We train and test several machine-learning algorithms using the data at Perdigão, and we find that the models eliminate the bias MYNN currently shows in representing ϵ, while also reducing the average error by up to almost 40 %. Of all the variables included in the algorithms, TKE is the variable responsible for most of the variability of ϵ, and a strong positive correlation exists between the two. These results suggest further consideration of machine-learning techniques to enhance parameterizations of turbulence in numerical weather prediction models.


Author(s):  
Sumit Kumar ◽  
Sanlap Acharya

The prediction of stock prices has always been a very challenging problem for investors. Using machine learning techniques to predict stock prices is also one of the favourite topics for academics working in this domain. This chapter discusses five supervised learning techniques and two unsupervised learning techniques to solve the problem of stock price prediction and has compared the performances of all the algorithms. Among the supervised learning techniques, Long Short-Term Memory (LSTM) algorithm performed better than the others whereas, among the unsupervised learning techniques, Restricted Boltzmann Machine (RBM) performed better. RBM is found to be performing even better than LSTM.


Author(s):  
Muaz Gultekin ◽  
Oya Kalipsiz

Until now, numerous effort estimation models for software projects have been developed, most of them producing accurate results but not providing the flexibility to decision makers during the software development process. The main objective of this study is to objectively and accurately estimate the effort when using the Scrum methodology. A dynamic effort estimation model is developed by using regression-based machine learning algorithms. Story point as a unit of measure is used for estimating the effort involved in an issue. Projects are divided into phases and the phases are respectively divided into iterations and issues. Effort estimation is performed for each issue, then the total effort is calculated with aggregate functions respectively for iteration, phase and project. This architecture of our model provides flexibility to decision makers in any case of deviation from the project plan. An empirical evaluation demonstrates that the error rate of our story point-based estimation model is better than others.


Energies ◽  
2018 ◽  
Vol 11 (8) ◽  
pp. 1975 ◽  
Author(s):  
Wei Dong ◽  
Qiang Yang ◽  
Xinli Fang

Accurate generation prediction at multiple time-steps is of paramount importance for reliable and economical operation of wind farms. This study proposed a novel algorithmic solution using various forms of machine learning techniques in a hybrid manner, including phase space reconstruction (PSR), input variable selection (IVS), K-means clustering and adaptive neuro-fuzzy inference system (ANFIS). The PSR technique transforms the historical time series into a set of phase-space variables combining with the numerical weather prediction (NWP) data to prepare candidate inputs. A minimal redundancy maximal relevance (mRMR) criterion based filtering approach is used to automatically select the optimal input variables for the multi-step ahead prediction. Then, the input instances are divided into a set of subsets using the K-means clustering to train the ANFIS. The ANFIS parameters are further optimized to improve the prediction performance by the use of particle swarm optimization (PSO) algorithm. The proposed solution is extensively evaluated through case studies of two realistic wind farms and the numerical results clearly confirm its effectiveness and improved prediction accuracy compared to benchmark solutions.


2021 ◽  
Author(s):  
Moohanad Jawthari ◽  
Veronika Stoffová

AbstractThe target (dependent) variable is often influenced not only by ratio scale variables, but also by qualitative (nominal scale) variables in classification analysis. Majority of machine learning techniques accept only numerical inputs. Hence, it is necessary to encode these categorical variables into numerical values using encoding techniques. If the variable does not have relation or order between its values, assigning numbers will mislead the machine learning techniques. This paper presents a modified k-nearest-neighbors algorithm that calculates the distances values of categorical (nominal) variables without encoding them. A student’s academic performance dataset is used for testing the enhanced algorithm. It shows that the proposed algorithm outperforms standard one that needs nominal variables encoding to calculate the distance between the nominal variables. The results show the proposed algorithm preforms 14% better than standard one in accuracy, and it is not sensitive to outliers.


Author(s):  
Nurmi Hidayasari ◽  
Imam Riadi ◽  
Yudi Prayudi

Steganalysis method is used to detect the presence or absence of steganography files or can be referred to anti-steganography. Steganalysis can be used for positive purposes, which is to know the weaknesses of a steganography method, so that improvements can be made. One category of steganalysis is blind steganalysis, which is a way to detect secret files without knowing what steganography method is used. Blind steganalysis is difficult to implement, but then machine learning techniques emerged that could be used to create a detection model using experimental data, one of which is Convolutional Neural Networks (CNN). A study proposes that the CNN method can detect steganography files using the latest method with a low error probability value compared to other methods, CNN Yedroudj-net. As one of the steganalysis methods with the latest machine learning steganalysis techniques, an experiment is needed to find out whether Yedroudj-net can be a steganalysis for the output of many tools commonly used for steganography applications. Knowing the performance of CNN Yedroudj-net on several steganography tools is very important, to measure the level of ability in terms of steganalysis of some of these tools. Especially so far, machine learning performance is still doubtful in blind steganalysis. Plus some previous research only focused on certain methods to prove the performance of the proposed technique, including Yedroudj-net. This study will use five tools that are Hide In Picture (HIP), OpenStego, SilentEye, Steg and S-Tools, which are not known exactly what steganography methods are used on the tools. Yedroudj-net method will be implemented in the steganography file from the output of the five tools. Then a comparison with the popular steganalysis tool is used, StegSpy. The results show that Yedroudj-net is quite capable of detecting the presence of steganography files, slightly better than StegSpy.


Sign in / Sign up

Export Citation Format

Share Document