Analysis of Gradient Vanishing of RNNs and Performance Comparison

Information ◽  
2021 ◽  
Vol 12 (11) ◽  
pp. 442
Author(s):  
Seol-Hyun Noh

A recurrent neural network (RNN) combines variable-length input data with a hidden state that depends on previous time steps to generate output data. RNNs are widely used in time-series data analysis, and various RNN algorithms have been proposed, such as the standard RNN, long short-term memory (LSTM), and the gated recurrent unit (GRU). In particular, it has been experimentally shown that LSTM and GRU achieve higher validation and prediction accuracy than the standard RNN. Learning ability here is measured by how effectively the gradient of the error information is backpropagated. This study provides a theoretical and experimental basis for the result that LSTM and GRU train more effectively under gradient descent than the standard RNN, by analyzing and experimentally measuring the vanishing gradients of the standard RNN, LSTM, and GRU. The results show that LSTM and GRU remain robust to vanishing gradients even when learning long-range input data, meaning that their learning ability exceeds that of the standard RNN on such data; this is why LSTM and GRU attain higher validation and prediction accuracy. In addition, it was verified that the experimental results of river-level prediction models, solar power generation prediction models, and speech signal models using the standard RNN, LSTM, and GRU are consistent with the vanishing-gradient analysis.
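The vanishing-gradient effect the abstract analyzes can be illustrated numerically. The following is a minimal sketch, not the paper's experiment: it measures the norm of the Jacobian of the final hidden state with respect to the initial hidden state in a vanilla tanh RNN with illustrative weight scales, using the chain rule dh_t/dh_{t-1} = diag(1 − h_t²)·W_h.

```python
import numpy as np

# Hypothetical sketch: how the gradient of the final hidden state with
# respect to the initial hidden state shrinks in a vanilla tanh RNN.
# Sizes and weight scales are illustrative, not taken from the paper.

rng = np.random.default_rng(0)
n = 16
W_h = rng.normal(scale=0.3 / np.sqrt(n), size=(n, n))  # recurrent weights
W_x = rng.normal(scale=1.0 / np.sqrt(n), size=(n, n))  # input weights

def gradient_norm(T):
    """Norm of d h_T / d h_0 after T tanh-RNN steps."""
    h = np.zeros(n)
    J = np.eye(n)
    for _ in range(T):
        h = np.tanh(W_h @ h + W_x @ rng.normal(size=n))
        J = (np.diag(1.0 - h**2) @ W_h) @ J   # chain rule, one step back
    return np.linalg.norm(J)

norms = [gradient_norm(T) for T in (5, 20, 50)]
# The norm decays roughly geometrically with sequence length, which is the
# vanishing-gradient behavior that LSTM/GRU gating is designed to avoid.
```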

Open Physics ◽  
2021 ◽  
Vol 19 (1) ◽  
pp. 618-627
Author(s):  
Weixing Song ◽  
Jingjing Wu ◽  
Jianshe Kang ◽  
Jun Zhang

Abstract The aim of this study was to improve the low accuracy of equipment spare parts requirement prediction, which affects the quality and efficiency of maintenance support, building on a summary and analysis of existing spare parts requirement prediction research. Given the time-series character of spare parts consumption data, this article applies the long short-term memory (LSTM) algorithm, which is well suited to time-series data processing, to equipment spare parts requirement prediction. A method for predicting the requirement for maintenance spare parts based on the LSTM recurrent neural network is proposed; the network structure is designed in detail, and the realization of network training and prediction is given. A particle swarm algorithm is introduced to optimize the network parameters, and actual consumption data for three types of equipment spare parts are used for experiments. Performance comparisons with predictive models such as the BP neural network, generalized regression neural network, wavelet neural network, and squeeze-and-excitation network show that the new method is effective, providing a sound basis for scientifically predicting the requirement for maintenance spare parts and improving the quality of equipment maintenance.
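The particle-swarm step used to tune the network parameters can be sketched generically. In this hedged example the expensive "train an LSTM and return its validation loss" objective is replaced by a simple quadratic surrogate; all constants and the optimum are illustrative, not from the paper.

```python
import numpy as np

# Sketch of particle swarm optimization (PSO) for hyperparameter tuning.
# surrogate_loss stands in for "train the LSTM with parameters x and
# return validation loss"; the optimum (0.3, 0.7) is an arbitrary choice.

rng = np.random.default_rng(1)

def surrogate_loss(x):
    return np.sum((x - np.array([0.3, 0.7])) ** 2, axis=-1)

n_particles, dims, iters = 20, 2, 100
w, c1, c2 = 0.7, 1.5, 1.5                 # inertia / cognitive / social weights

pos = rng.uniform(0, 1, (n_particles, dims))
vel = np.zeros_like(pos)
pbest = pos.copy()                         # per-particle best position
pbest_val = surrogate_loss(pos)
gbest = pbest[np.argmin(pbest_val)].copy() # swarm-wide best position

for _ in range(iters):
    r1, r2 = rng.uniform(size=(2, n_particles, dims))
    vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    pos = pos + vel
    val = surrogate_loss(pos)
    improved = val < pbest_val
    pbest[improved], pbest_val[improved] = pos[improved], val[improved]
    gbest = pbest[np.argmin(pbest_val)].copy()

# gbest converges toward the optimum of the surrogate loss.
```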


Agriculture ◽  
2020 ◽  
Vol 10 (12) ◽  
pp. 612
Author(s):  
Helin Yin ◽  
Dong Jin ◽  
Yeong Hyeon Gu ◽  
Chang Jin Park ◽  
Sang Keun Han ◽  
...  

It is difficult to forecast vegetable prices because they are affected by numerous factors, such as weather and crop production, and the time-series data have strong non-linear and non-stationary characteristics. To address these issues, we propose the STL-ATTLSTM (STL-Attention-based LSTM) model, which integrates seasonal-trend decomposition using Loess (STL) as a preprocessing step with an attention mechanism based on long short-term memory (LSTM). The proposed STL-ATTLSTM forecasts monthly vegetable prices using various types of information, such as vegetable prices, weather information of the main production areas, and market trading volumes. The STL method decomposes time-series vegetable price data into trend, seasonality, and remainder components; the model then uses the remainder component, with the trend and seasonality removed. In the model training process, attention weights are assigned to all input variables; thus, the model’s prediction performance is improved by focusing on the variables that affect the prediction results. The proposed STL-ATTLSTM was applied to five crops, namely cabbage, radish, onion, hot pepper, and garlic, and its performance was compared to three benchmark models (i.e., LSTM, attention LSTM, and STL-LSTM). The performance results show that the LSTM model combined with the STL method (STL-LSTM) achieved a 12% higher prediction accuracy than the attention LSTM model that did not use the STL method and solved the prediction lag arising from high seasonality. The attention LSTM model improved the prediction accuracy by approximately 4% to 5% compared to the LSTM model. The STL-ATTLSTM model achieved the best performance, with an average root mean square error (RMSE) of 380, and an average mean absolute percentage error (MAPE) of 7%.
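The decomposition idea behind the STL preprocessing can be sketched on synthetic data. Real STL uses iterated Loess smoothing; in this simplified stand-in a centered moving average estimates the trend and per-position averaging estimates the seasonal component, on an invented monthly series (not the paper's price data).

```python
import numpy as np

# Simplified stand-in for STL: split a synthetic monthly series into
# trend, seasonal, and remainder parts. A centered 2x12 moving average
# replaces the Loess trend smoother used by actual STL.

rng = np.random.default_rng(2)
period = 12
t = np.arange(120)                                  # ten years, monthly
series = 0.5 * t + 10 * np.sin(2 * np.pi * t / period) + rng.normal(0, 1, t.size)

# Trend: centered moving average spanning one full seasonal period.
kernel = np.ones(period + 1)
kernel[0] = kernel[-1] = 0.5
kernel /= period
trend = np.convolve(series, kernel, mode="same")

# Seasonal: average the detrended values at each within-period position.
detrended = series - trend
seasonal = np.array([detrended[i::period].mean() for i in range(period)])
seasonal = np.tile(seasonal, t.size // period)

remainder = series - trend - seasonal   # the component the paper models
```

The three components reconstruct the series exactly by construction, and away from the boundary the remainder is small compared with the seasonal swing.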


Computers ◽  
2020 ◽  
Vol 9 (4) ◽  
pp. 99
Author(s):  
Sultan Daud Khan ◽  
Louai Alarabi ◽  
Saleh Basalamah

COVID-19 caused the largest economic recession in history by placing more than one third of the world’s population in lockdown. The prolonged restrictions on economic and business activities caused huge economic turmoil that significantly affected the financial markets. To ease the growing pressure on the economy, scientists proposed intermittent lockdowns, commonly known as “smart lockdowns”. Under a smart lockdown, areas that contain infected clusters of population, namely hotspots, are placed on lockdown, while economic activities are allowed to operate in uninfected areas. In this study, we propose a novel deep learning framework for the accurate prediction of hotspots. We exploit the benefits of two deep learning models, i.e., the Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM), and propose a hybrid framework that can extract multi-time-scale features from the convolutional layers of the CNN. The multi-time-scale features are then concatenated and provided as input to a 2-layer LSTM model. The LSTM model identifies short-, medium-, and long-term dependencies by learning the representation of the time-series data. We perform a series of experiments and compare the proposed framework with other state-of-the-art statistical and machine learning based prediction models. The experimental results demonstrate that the proposed framework beats the other existing methods by a clear margin.
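The multi-time-scale feature extraction described above can be sketched as follows. This is a hedged illustration, not the authors' network: moving-average filters of different widths stand in for learned 1-D convolution kernels, and the toy case-count series is invented.

```python
import numpy as np

# Hypothetical sketch of multi-time-scale features: filter the same daily
# series at several widths and stack the results channel-wise, which is
# the kind of tensor a downstream 2-layer LSTM would consume.
# Kernel widths (3, 7, 14) and the toy data are illustrative only.

rng = np.random.default_rng(3)
cases = rng.poisson(50, size=100).astype(float)     # toy daily counts

def causal_conv(x, width):
    """Moving-average filter as a stand-in for a learned 1-D conv kernel."""
    kernel = np.ones(width) / width
    return np.convolve(x, kernel, mode="full")[: x.size]

# Short-, medium-, and long-range views of the same series.
features = np.stack([causal_conv(cases, w) for w in (3, 7, 14)], axis=-1)
# features has shape (time, 3): one channel per time scale.
```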


Sensors ◽  
2021 ◽  
Vol 21 (13) ◽  
pp. 4466
Author(s):  
Li Guo ◽  
Runze Li ◽  
Bin Jiang

The monitoring of electrical equipment and power grid systems is essential for power transmission and distribution. Predicting faults in advance from a long monitored sequence is of great significance for ensuring the safe operation of the power system. Many approaches, such as the recurrent neural network (RNN) and the long short-term memory (LSTM) network, have shown an outstanding ability to increase prediction accuracy. However, some limitations still prevent those methods from predicting long time-series sequences in real-world applications. To address these issues, this paper proposes a data-driven method using an improved stacked-Informer network and applies it to electrical line trip fault sequence prediction. The method constructs a stacked-Informer network to extract the underlying features of long sequence time-series data, and combines gradient centralization (GC) with the optimizer, replacing the Adam optimizer used in the original Informer network; this yields superior generalization ability and faster training. The data sequences used for experimental validation were collected from a wind and solar hybrid substation located in Zhangjiakou city, China. The experimental results and concrete analysis show that the presented method improves fault sequence prediction accuracy and achieves fast training in real scenarios.
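The gradient centralization operation mentioned above is simple to state: before the optimizer update, each weight gradient is re-centered to zero mean. A minimal sketch, with illustrative shapes (centering row-wise over the input dimension of a weight matrix):

```python
import numpy as np

# Sketch of gradient centralization (GC): subtract from each weight
# gradient its mean over the input dimension before the optimizer step.
# The gradient here is random data with an illustrative shape.

def centralize_gradient(grad):
    """Center a weight-matrix gradient row-wise (one mean per output unit)."""
    return grad - grad.mean(axis=1, keepdims=True)

rng = np.random.default_rng(4)
grad = rng.normal(size=(8, 32))          # (output units, input units)
gc_grad = centralize_gradient(grad)
# Every row of gc_grad sums to zero; GC's proponents report this smooths
# the loss landscape and speeds up training, which matches the faster
# training efficiency claimed in the abstract.
```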


2021 ◽  
Vol 2021 ◽  
pp. 1-13
Author(s):  
Yao Li

Faults occurring in a production line can cause many losses. Predicting fault events before they occur, or identifying their causes, can effectively reduce such losses. A modern production line can provide enough data to address the problem, but for complex industrial processes the problem becomes very difficult with traditional methods. In this paper, we propose a new approach based on a deep learning (DL) algorithm. First, we treat the process data as a spatial sequence following the production process, which differs from traditional time-series data. Second, we improve the long short-term memory (LSTM) neural network in an encoder-decoder model to adapt it to the branch structure corresponding to the spatial sequence. Meanwhile, an attention mechanism (AM) algorithm is used for fault detection and cause identification. Third, instead of traditional binary classification, the output is defined as a sequence of fault types. The proposed approach has two advantages. On the one hand, treating the data as a spatial sequence rather than a time sequence overcomes multidimensional problems and improves prediction accuracy. On the other hand, in the trained neural network, the weight vectors generated by the AM algorithm represent the correlation between faults and the input data, which helps engineers identify the causes of faults. The approach is compared with several well-developed fault diagnosis methods on the Tennessee Eastman process. Experimental results show that it achieves higher prediction accuracy and that the weight vector accurately labels the factors that cause faults.
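The attention read-out that produces those interpretable weight vectors can be sketched in a few lines. This is a generic dot-product attention illustration, not the authors' exact mechanism; the "encoder states per spatial step" are random placeholders.

```python
import numpy as np

# Minimal attention sketch: scores between a decoder query and the encoder
# states are softmax-normalized into weights; the weights form the context
# vector and indicate which input positions drive the prediction.
# Dimensions and data are illustrative.

rng = np.random.default_rng(5)
d, steps = 16, 10
encoder_states = rng.normal(size=(steps, d))   # one state per spatial step
query = rng.normal(size=d)                     # current decoder state

scores = encoder_states @ query                # dot-product attention scores
weights = np.exp(scores - scores.max())
weights /= weights.sum()                       # softmax over input positions
context = weights @ encoder_states             # weighted sum of states

# `weights` is the kind of vector engineers can inspect to localize which
# inputs correlate with a predicted fault.
```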


2021 ◽  
Vol 2068 (1) ◽  
pp. 012041
Author(s):  
Lingyun Duan ◽  
Ziyuan Liu ◽  
Wen Yu ◽  
Wei Chen ◽  
Dongyan Jin ◽  
...  

Abstract To compare the prediction performance of a traditional econometric model and a deep learning model, regional GDP is taken as an example: two prediction models, ARMA-ECM and LSTM-SVR, are established, and the prediction results of the two models are compared and analyzed. The results show some deviation between the predictions of the two models, but the predicted trends are the same. The prediction accuracy of the LSTM-SVR model decreases significantly as the number of time-series data samples is reduced, whereas the ARMA-ECM model is not as sensitive.


Author(s):  
Takeru Aoki ◽  
Keiki Takadama ◽  
Hiroyuki Sato

The cortical learning algorithm (CLA) is a time-series data prediction method designed based on the human neocortex. The CLA has multiple columns that are associated with the input data bits by synapses, and the input data are converted into an internal column representation based on the synapse relation. Because the synapse relation between the columns and the input data bits is fixed during the entire prediction process in the conventional CLA, it cannot adapt to biases in the input data. Consequently, columns that are not used for internal representations arise, resulting in low prediction accuracy in the conventional CLA. To improve the prediction accuracy of the CLA, we propose a CLA that self-adaptively arranges the column synapses according to the input data tendencies, and verify its effectiveness on several artificial time-series datasets and real-world electricity load prediction data from New York City. Experimental results show that the proposed CLA achieves higher prediction accuracy than the conventional CLA and LSTMs trained with different network optimization algorithms, by arranging the column synapses according to the input data tendency.


Electronics ◽  
2019 ◽  
Vol 8 (8) ◽  
pp. 876 ◽  
Author(s):  
Renzhuo Wan ◽  
Shuping Mei ◽  
Jun Wang ◽  
Min Liu ◽  
Fan Yang

Multivariate time-series prediction has been widely studied in power energy, aerology, meteorology, finance, transportation, etc. Traditional modeling methods struggle with complex patterns and are inefficient at capturing long-term multivariate dependencies for the desired forecasting accuracy. To address such concerns, various deep learning models based on Recurrent Neural Network (RNN) and Convolutional Neural Network (CNN) methods have been proposed. To improve prediction accuracy and minimize dependence on the multivariate time-series structure for aperiodic data, this article analyzes the Beijing PM2.5 and ISO-NE datasets with a novel Multivariate Temporal Convolution Network (M-TCN) model. In this model, multivariate time-series prediction is formulated as a sequence-to-sequence task for non-periodic datasets, and multichannel residual blocks in parallel with an asymmetric structure, based on deep convolutional neural networks, are proposed. The results are compared with strong competing algorithms: long short-term memory (LSTM), convolutional LSTM (ConvLSTM), Temporal Convolution Network (TCN), and Multivariate Attention LSTM-FCN (MALSTM-FCN). They indicate significant improvements in the prediction accuracy, robustness, and generalization of the model.
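The temporal-convolution building block behind TCN-style models such as M-TCN relies on dilated causal convolutions: stacking layers whose dilation doubles makes the receptive field grow exponentially with depth. A hedged sketch (kernel size and depth are illustrative choices, not the paper's):

```python
import numpy as np

# Dilated causal convolution sketch. Feeding a unit impulse through a
# stack of kernel-size-2 layers with dilations 1, 2, 4, 8 shows how far
# back in time the output at each step can "see".

def dilated_causal_conv(x, kernel, dilation):
    """y[t] = sum_k kernel[k] * x[t - k*dilation], with zero padding."""
    y = np.zeros_like(x)
    for k, w in enumerate(kernel):
        shift = k * dilation
        y[shift:] += w * x[: x.size - shift]
    return y

x = np.zeros(64)
x[0] = 1.0                                   # unit impulse
kernel = np.ones(2)                          # kernel size 2, weights of 1
out = x
for layer in range(4):                       # dilations 1, 2, 4, 8
    out = dilated_causal_conv(out, kernel, 2 ** layer)

receptive_field = np.flatnonzero(out).max() + 1
# Four layers of kernel size 2 reach 1 + 1 + 2 + 4 + 8 = 16 time steps.
```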


2021 ◽  
Vol 2021 ◽  
pp. 1-16
Author(s):  
Rusul L. Abduljabbar ◽  
Hussein Dia ◽  
Pei-Wei Tsai

This paper presents the development and evaluation of short-term traffic prediction models using unidirectional and bidirectional deep learning long short-term memory (LSTM) neural networks. The unidirectional LSTM (Uni-LSTM) model provides high performance through its ability to recognize longer sequences of traffic time-series data. In this work, Uni-LSTM is extended to bidirectional LSTM (BiLSTM) networks, which process the input sequence in both forward and backward directions. The paper presents a comparative evaluation of the two models for short-term speed and traffic flow prediction using a common dataset of field observations collected from multiple freeways in Australia. The results showed that BiLSTM performed better across variable prediction horizons for both speed and flow. Stacked and mixed Uni-LSTM and BiLSTM models were also investigated for 15-minute prediction horizons, with improved accuracy when using 4-layer BiLSTM networks. The optimized 4-layer BiLSTM model was then calibrated and validated for multiple prediction horizons using data from three different freeways. The validation results showed a high degree of prediction accuracy, exceeding 90% for speed at prediction horizons up to 60 minutes. For flow, the model achieved accuracies above 90% for 5- and 10-minute prediction horizons and more than 80% for 15- and 30-minute prediction horizons. These findings extend the set of AI models available to road operators and provide confidence in applying robust models that have been tested and evaluated on different freeways in Australia.
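The bidirectional idea is easy to sketch: run a recurrent cell over the sequence forwards and backwards and concatenate the two hidden states at each time step. In this hedged illustration a plain tanh cell stands in for the LSTM cell, and all sizes, weights, and the toy traffic data are invented.

```python
import numpy as np

# Conceptual BiLSTM sketch with a simple tanh recurrence in place of the
# LSTM cell. h_bi[t] then summarizes both what precedes and what follows
# time step t, which is the property the abstract credits for BiLSTM's
# advantage. Dimensions and data are illustrative.

rng = np.random.default_rng(6)
T, d_in, d_h = 12, 4, 8                      # e.g. twelve 5-minute intervals
x = rng.normal(size=(T, d_in))               # toy speed/flow observations
W_x = rng.normal(scale=0.3, size=(d_h, d_in))
W_h = rng.normal(scale=0.3, size=(d_h, d_h))

def run_rnn(seq):
    h = np.zeros(d_h)
    out = []
    for x_t in seq:
        h = np.tanh(W_x @ x_t + W_h @ h)
        out.append(h)
    return np.array(out)

h_fwd = run_rnn(x)                           # past -> future pass
h_bwd = run_rnn(x[::-1])[::-1]               # future -> past, re-aligned
h_bi = np.concatenate([h_fwd, h_bwd], axis=1)
```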


2020 ◽  
Vol 9 (2) ◽  
Author(s):  
Dev Patel ◽  
Krish Patel ◽  
Charles Dela Cuesta

The US stock market is an integral part of modern society. Nearly 55% of Americans own corporate shares in the US stock market (What Percentage of Americans Own Stock?, 2019), and as of June 30th, 2020, the total value of the US stock market was over 35 trillion USD (Total Market Value of U.S. Stock Market, 2020). The stock market is also extremely volatile, and many people have gone bankrupt from poor investments. To minimize risk and capitalize on the massive amounts of data on corporations and share prices, algorithmic trading began to rise. Trading algorithms have the potential for huge returns, and while many algorithms employ strategies like long-short equity, very few attempt to use machine learning because of the unpredictable nature of the stock market. Many time-series prediction models, such as the autoregressive integrated moving average (ARIMA), and even neural networks such as long short-term memory (LSTM) networks, often fail when predicting stock market data because, unlike other time-series data, the stock market is almost never univariate and rarely follows seasonal trends. Where other models fall short, echo state networks (ESNs) excel, owing to their reservoir computing model, which lets them perform better on messy, non-traditional time-series data. Using a combination of ESNs to predict prices and clustering, we created an algorithmic model that can predict trends with over 95% confidence but had mixed results in accurately predicting returns.
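The reservoir computing model behind ESNs can be sketched end to end: a fixed random recurrent "reservoir" whose weights are never trained, plus a linear read-out fitted by ridge regression. In this hedged sketch a sine wave stands in for price data, and the reservoir size, spectral radius, and ridge penalty are illustrative choices.

```python
import numpy as np

# Echo state network sketch: random fixed reservoir, trained linear
# read-out, one-step-ahead prediction of a sine wave as a stand-in for
# market data. Only w_out is learned; W_in and W stay random.

rng = np.random.default_rng(7)
n_res, T = 200, 500
series = np.sin(0.1 * np.arange(T + 1))

W_in = rng.uniform(-0.5, 0.5, size=n_res)
W = rng.normal(size=(n_res, n_res))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))   # spectral radius 0.9

# Drive the reservoir with the input series and collect its states.
states = np.zeros((T, n_res))
h = np.zeros(n_res)
for t in range(T):
    h = np.tanh(W_in * series[t] + W @ h)
    states[t] = h

# Ridge-regression read-out: predict the next value from the state,
# discarding an initial washout period.
washout, lam = 50, 1e-6
X, y = states[washout:], series[washout + 1 : T + 1]
w_out = np.linalg.solve(X.T @ X + lam * np.eye(n_res), X.T @ y)

mse = np.mean((X @ w_out - y) ** 2)
# The read-out tracks the target closely despite the untrained reservoir.
```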

