scholarly journals Attention-LSTM-Attention Model for Speech Emotion Recognition and Analysis of IEMOCAP Database

Electronics ◽  
2020 ◽  
Vol 9 (5) ◽  
pp. 713 ◽  
Author(s):  
Yeonguk Yu ◽  
Yoon-Joong Kim

We propose a speech-emotion recognition (SER) model with an “attention-long Long Short-Term Memory (LSTM)-attention” component to combine IS09, a commonly used feature for SER, and mel spectrogram, and we analyze the reliability problem of the interactive emotional dyadic motion capture (IEMOCAP) database. The attention mechanism of the model focuses on emotion-related elements of the IS09 and mel spectrogram feature and the emotion-related duration from the time of the feature. Thus, the model extracts emotion information from a given speech signal. The proposed model for the baseline study achieved a weighted accuracy (WA) of 68% for the improvised dataset of IEMOCAP. However, the WA of the proposed model of the main study and modified models could not achieve more than 68% in the improvised dataset. This is because of the reliability limit of the IEMOCAP dataset. A more reliable dataset is required for a more accurate evaluation of the model’s performance. Therefore, in this study, we reconstructed a more reliable dataset based on the labeling results provided by IEMOCAP. The experimental results of the model for the more reliable dataset confirmed a WA of 73%.

2019 ◽  
Vol 2019 ◽  
pp. 1-9 ◽  
Author(s):  
Linqin Cai ◽  
Yaxin Hu ◽  
Jiangong Dong ◽  
Sitong Zhou

With the rapid development in social media, single-modal emotion recognition is hard to satisfy the demands of the current emotional recognition system. Aiming to optimize the performance of the emotional recognition system, a multimodal emotion recognition model from speech and text was proposed in this paper. Considering the complementarity between different modes, CNN (convolutional neural network) and LSTM (long short-term memory) were combined in a form of binary channels to learn acoustic emotion features; meanwhile, an effective Bi-LSTM (bidirectional long short-term memory) network was resorted to capture the textual features. Furthermore, we applied a deep neural network to learn and classify the fusion features. The final emotional state was determined by the output of both speech and text emotion analysis. Finally, the multimodal fusion experiments were carried out to validate the proposed model on the IEMOCAP database. In comparison with the single modal, the overall recognition accuracy of text increased 6.70%, and that of speech emotion recognition soared 13.85%. Experimental results show that the recognition accuracy of our multimodal is higher than that of the single modal and outperforms other published multimodal models on the test datasets.


Sensors ◽  
2021 ◽  
Vol 21 (11) ◽  
pp. 3678
Author(s):  
Dongwon Lee ◽  
Minji Choi ◽  
Joohyun Lee

In this paper, we propose a prediction algorithm, the combination of Long Short-Term Memory (LSTM) and attention model, based on machine learning models to predict the vision coordinates when watching 360-degree videos in a Virtual Reality (VR) or Augmented Reality (AR) system. Predicting the vision coordinates while video streaming is important when the network condition is degraded. However, the traditional prediction models such as Moving Average (MA) and Autoregression Moving Average (ARMA) are linear so they cannot consider the nonlinear relationship. Therefore, machine learning models based on deep learning are recently used for nonlinear predictions. We use the Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) neural network methods, originated in Recurrent Neural Networks (RNN), and predict the head position in the 360-degree videos. Therefore, we adopt the attention model to LSTM to make more accurate results. We also compare the performance of the proposed model with the other machine learning models such as Multi-Layer Perceptron (MLP) and RNN using the root mean squared error (RMSE) of predicted and real coordinates. We demonstrate that our model can predict the vision coordinates more accurately than the other models in various videos.


Author(s):  
Azim Heydari ◽  
Meysam Majidi Nezhad ◽  
Davide Astiaso Garcia ◽  
Farshid Keynia ◽  
Livio De Santoli

AbstractAir pollution monitoring is constantly increasing, giving more and more attention to its consequences on human health. Since Nitrogen dioxide (NO2) and sulfur dioxide (SO2) are the major pollutants, various models have been developed on predicting their potential damages. Nevertheless, providing precise predictions is almost impossible. In this study, a new hybrid intelligent model based on long short-term memory (LSTM) and multi-verse optimization algorithm (MVO) has been developed to predict and analysis the air pollution obtained from Combined Cycle Power Plants. In the proposed model, long short-term memory model is a forecaster engine to predict the amount of produced NO2 and SO2 by the Combined Cycle Power Plant, where the MVO algorithm is used to optimize the LSTM parameters in order to achieve a lower forecasting error. In addition, in order to evaluate the proposed model performance, the model has been applied using real data from a Combined Cycle Power Plant in Kerman, Iran. The datasets include wind speed, air temperature, NO2, and SO2 for five months (May–September 2019) with a time step of 3-h. In addition, the model has been tested based on two different types of input parameters: type (1) includes wind speed, air temperature, and different lagged values of the output variables (NO2 and SO2); type (2) includes just lagged values of the output variables (NO2 and SO2). The obtained results show that the proposed model has higher accuracy than other combined forecasting benchmark models (ENN-PSO, ENN-MVO, and LSTM-PSO) considering different network input variables. Graphic abstract


2021 ◽  
pp. 1-10
Author(s):  
Hye-Jeong Song ◽  
Tak-Sung Heo ◽  
Jong-Dae Kim ◽  
Chan-Young Park ◽  
Yu-Seop Kim

Sentence similarity evaluation is a significant task used in machine translation, classification, and information extraction in the field of natural language processing. When two sentences are given, an accurate judgment should be made whether the meaning of the sentences is equivalent even if the words and contexts of the sentences are different. To this end, existing studies have measured the similarity of sentences by focusing on the analysis of words, morphemes, and letters. To measure sentence similarity, this study uses Sent2Vec, a sentence embedding, as well as morpheme word embedding. Vectors representing words are input to the 1-dimension convolutional neural network (1D-CNN) with various sizes of kernels and bidirectional long short-term memory (Bi-LSTM). Self-attention is applied to the features transformed through Bi-LSTM. Subsequently, vectors undergoing 1D-CNN and self-attention are converted through global max pooling and global average pooling to extract specific values, respectively. The vectors generated through the above process are concatenated to the vector generated through Sent2Vec and are represented as a single vector. The vector is input to softmax layer, and finally, the similarity between the two sentences is determined. The proposed model can improve the accuracy by up to 5.42% point compared with the conventional sentence similarity estimation models.


2021 ◽  
pp. 1-17
Author(s):  
Enda Du ◽  
Yuetian Liu ◽  
Ziyan Cheng ◽  
Liang Xue ◽  
Jing Ma ◽  
...  

Summary Accurate production forecasting is an essential task and accompanies the entire process of reservoir development. With the limitation of prediction principles and processes, the traditional approaches are difficult to make rapid predictions. With the development of artificial intelligence, the data-driven model provides an alternative approach for production forecasting. To fully take the impact of interwell interference on production into account, this paper proposes a deep learning-based hybrid model (GCN-LSTM), where graph convolutional network (GCN) is used to capture complicated spatial patterns between each well, and long short-term memory (LSTM) neural network is adopted to extract intricate temporal correlations from historical production data. To implement the proposed model more efficiently, two data preprocessing procedures are performed: Outliers in the data set are removed by using a box plot visualization, and measurement noise is reduced by a wavelet transform. The robustness and applicability of the proposed model are evaluated in two scenarios of different data types with the root mean square error (RMSE), the mean absolute error (MAE), and the mean absolute percentage error (MAPE). The results show that the proposed model can effectively capture spatial and temporal correlations to make a rapid and accurate oil production forecast.


Author(s):  
Preethi D. ◽  
Neelu Khare

This chapter presents an ensemble-based feature selection with long short-term memory (LSTM) model. A deep recurrent learning model is proposed for classifying network intrusion. This model uses ensemble-based feature selection (EFS) for selecting the appropriate features from the dataset and long short-term memory for the classification of network intrusions. The EFS combines five feature selection techniques, namely information gain, gain ratio, chi-square, correlation-based feature selection, and symmetric uncertainty-based feature selection. The experiments were conducted using the standard benchmark NSL-KDD dataset and implemented using tensor flow and python. The proposed model is evaluated using the classification performance metrics and also compared with all the 41 features without any feature selection as well as with each individual feature selection technique and classified using LSTM. The performance study showed that the proposed model performs better, with 99.8% accuracy, with a higher detection and lower false alarm rates.


2020 ◽  
pp. 1-15
Author(s):  
Hongchang Sun ◽  
Yadong wang ◽  
Lanqiang Niu ◽  
Fengyu Zhou ◽  
Heng Li

Building energy consumption (BEC) prediction is very important for energy management and conservation. This paper presents a short-term energy consumption prediction method that integrates the Fuzzy Rough Set (FRS) theory and the Long Short-Term Memory (LSTM) model, and is thus named FRS-LSTM. This method can find the most directly related factors from the complex and diverse factors influencing the energy consumption, which improves the prediction accuracy and efficiency. First, the FRS is used to reduce the redundancy of the input features by the attribute reduction of the factors affecting the energy consumption forecasting, and solves the data loss problem caused by the data discretization of a classical rough set. Then, the final attribute set after reduction is taken as the input of the LSTM networks to obtain the final prediction results. To validate the effectiveness of the proposed model, this study used the actual data of a public building to predict the building’s energy consumption, and compared the proposed model with the LSTM, Levenberg-Marquardt Back Propagation (LM-BP), and Support Vector Regression (SVR) models. The experimental results reveal that the presented FRS-LSTM model achieves higher prediction accuracy compared with other comparative models.


Sign in / Sign up

Export Citation Format

Share Document