Attention-LSTM-Attention Model for Speech Emotion Recognition and Analysis of IEMOCAP Database

Yeonguk Yu; Yoon-Joong Kim

doi:10.3390/electronics9050713

Electronics ◽

10.3390/electronics9050713 ◽

2020 ◽

Vol 9 (5) ◽

pp. 713 ◽

Cited By ~ 3

Author(s):

Yeonguk Yu ◽

Yoon-Joong Kim

Keyword(s):

Emotion Recognition ◽

Motion Capture ◽

Short Term Memory ◽

Speech Emotion Recognition ◽

Main Study ◽

Short Term ◽

Attention Model ◽

Proposed Model ◽

Long Short Term Memory ◽

Spectrogram Feature

We propose a speech emotion recognition (SER) model with an “attention-Long Short-Term Memory (LSTM)-attention” component that combines IS09, a commonly used feature set for SER, with the mel spectrogram, and we analyze the reliability problem of the interactive emotional dyadic motion capture (IEMOCAP) database. The model’s attention mechanism focuses on the emotion-related elements of the IS09 and mel spectrogram features and on the emotion-related time spans within those features. Thus, the model extracts emotion information from a given speech signal. The proposed model for the baseline study achieved a weighted accuracy (WA) of 68% on the improvised dataset of IEMOCAP. However, neither the proposed model of the main study nor the modified models could exceed a WA of 68% on the improvised dataset. This is because of the reliability limit of the IEMOCAP dataset. A more reliable dataset is required for a more accurate evaluation of the model’s performance. Therefore, in this study, we reconstructed a more reliable dataset based on the labeling results provided by IEMOCAP. On this more reliable dataset, the model achieved a WA of 73%.
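
The idea of attending to emotion-related time spans can be illustrated with a generic attention-pooling step over frame-level features. This is only a minimal sketch, not the authors' architecture: the function names, the dot-product scoring, and the `query` vector (standing in for a learned parameter) are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(frames, query):
    """Pool T frame vectors into a single vector using attention weights.

    frames: list of T feature vectors, each a list of D floats
    query:  hypothetical learned vector of D floats used to score each frame
    """
    # Dot-product score per frame: how relevant the frame looks to the query.
    scores = [sum(f * q for f, q in zip(frame, query)) for frame in frames]
    weights = softmax(scores)
    dim = len(frames[0])
    # Weighted sum over time, one output value per feature dimension.
    pooled = [sum(w * frame[d] for w, frame in zip(weights, frames))
              for d in range(dim)]
    return pooled, weights
```

With an all-zero `query`, every frame receives the same weight and the result reduces to a plain average over time; a query aligned with some frames shifts nearly all of the weight onto them.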

Audio-Textual Emotion Recognition Based on Improved Neural Networks

Mathematical Problems in Engineering ◽

10.1155/2019/2593036 ◽

2019 ◽

Vol 2019 ◽

pp. 1-9 ◽

Cited By ~ 4

Author(s):

Linqin Cai ◽

Yaxin Hu ◽

Jiangong Dong ◽

Sitong Zhou

Keyword(s):

Neural Network ◽

Emotion Recognition ◽

Short Term Memory ◽

Recognition Accuracy ◽

Recognition System ◽

Speech Emotion Recognition ◽

Short Term ◽

Term Memory ◽

Emotional Recognition ◽

Long Short Term Memory

With the rapid development of social media, single-modal emotion recognition struggles to satisfy the demands of current emotion recognition systems. To optimize the performance of such systems, this paper proposes a multimodal emotion recognition model that uses speech and text. Considering the complementarity between the modalities, a CNN (convolutional neural network) and an LSTM (long short-term memory) network were combined as two parallel channels to learn acoustic emotion features; meanwhile, a Bi-LSTM (bidirectional long short-term memory) network was used to capture the textual features. Furthermore, we applied a deep neural network to learn and classify the fused features. The final emotional state was determined by the output of both the speech and the text emotion analysis. Finally, multimodal fusion experiments were carried out to validate the proposed model on the IEMOCAP database. Compared with the single-modal results, the overall recognition accuracy for text increased by 6.70%, and that for speech emotion recognition rose by 13.85%. Experimental results show that the recognition accuracy of our multimodal model is higher than that of the single-modal models and that it outperforms other published multimodal models on the test datasets.

Attention-based convolution skip bidirectional long short-term memory network for speech emotion recognition

IEEE Access ◽

10.1109/access.2020.3047395 ◽

2020 ◽

pp. 1-1

Author(s):

Huiyun Zhang ◽

Heming Huang ◽

Henry Han

Keyword(s):

Emotion Recognition ◽

Short Term Memory ◽

Speech Emotion Recognition ◽

Short Term ◽

Term Memory ◽

Memory Network ◽

Long Short Term Memory

Speech Emotion Recognition for Indonesian Language Using Long Short-Term Memory

2018 International Conference on Computer, Control, Informatics and its Applications (IC3INA) ◽

10.1109/ic3ina.2018.8629525 ◽

2018 ◽

Cited By ~ 2

Author(s):

Jeremia Jason Lasiman ◽

Dessi Puji Lestari

Keyword(s):

Emotion Recognition ◽

Short Term Memory ◽

Speech Emotion Recognition ◽

Short Term ◽

Term Memory ◽

Long Short Term Memory

Speech emotion recognition using convolutional long short-term memory neural network and support vector machines

2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) ◽

10.1109/apsipa.2017.8282315 ◽

2017 ◽

Cited By ~ 1

Author(s):

Nattapong Kurpukdee ◽

Tomoki Koriyama ◽

Takao Kobayashi ◽

Sawit Kasuriya ◽

Chai Wutiwiwatchai ◽

...

Keyword(s):

Neural Network ◽

Support Vector Machines ◽

Emotion Recognition ◽

Short Term Memory ◽

Speech Emotion Recognition ◽

Support Vector ◽

Short Term ◽

Term Memory ◽

Vector Machines ◽

Long Short Term Memory

Prediction of Head Movement in 360-Degree Videos Using Attention Model

Sensors ◽

10.3390/s21113678 ◽

2021 ◽

Vol 21 (11) ◽

pp. 3678

Author(s):

Dongwon Lee ◽

Minji Choi ◽

Joohyun Lee

Keyword(s):

Machine Learning ◽

Short Term Memory ◽

Moving Average ◽

The Other ◽

Learning Models ◽

Short Term ◽

Term Memory ◽

Attention Model ◽

Long Short Term Memory ◽

Machine Learning Models

In this paper, we propose a prediction algorithm that combines Long Short-Term Memory (LSTM) with an attention model to predict the vision coordinates of a user watching 360-degree videos in a Virtual Reality (VR) or Augmented Reality (AR) system. Predicting the vision coordinates during video streaming is important when the network condition is degraded. However, traditional prediction models such as Moving Average (MA) and Autoregressive Moving Average (ARMA) are linear, so they cannot capture nonlinear relationships. Therefore, machine learning models based on deep learning have recently been used for nonlinear prediction. We use the LSTM and Gated Recurrent Unit (GRU) neural network methods, both of which derive from Recurrent Neural Networks (RNN), to predict the head position in 360-degree videos. We then add the attention model to the LSTM to obtain more accurate results. We also compare the performance of the proposed model with other machine learning models such as the Multi-Layer Perceptron (MLP) and RNN, using the root mean squared error (RMSE) between the predicted and real coordinates. We demonstrate that our model predicts the vision coordinates more accurately than the other models across various videos.
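
For context, the linear Moving Average baseline and the RMSE metric named in the abstract can be written in a few lines. This is an illustrative sketch only: the function names, the window size, and the one-dimensional coordinate are assumptions, and the paper's own data format is not given here.

```python
import math


def moving_average_forecast(history, window):
    """Predict the next value as the mean of the last `window` observations."""
    recent = history[-window:]
    return sum(recent) / len(recent)


def rmse(predicted, actual):
    """Root mean squared error between two equally long sequences."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical yaw angles (degrees) sampled at a fixed interval.
yaw_history = [1.0, 2.0, 3.0, 4.0]
next_yaw = moving_average_forecast(yaw_history, window=2)
```

Because the forecast is a fixed linear combination of past values, it lags behind abrupt head turns, which is the limitation that motivates the nonlinear models in the abstract.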

Air pollution forecasting application based on deep learning model and optimization algorithm

Clean Technologies and Environmental Policy ◽

10.1007/s10098-021-02080-5 ◽

2021 ◽

Author(s):

Azim Heydari ◽

Meysam Majidi Nezhad ◽

Davide Astiaso Garcia ◽

Farshid Keynia ◽

Livio De Santoli

Keyword(s):

Air Pollution ◽

Wind Speed ◽

Power Plant ◽

Air Temperature ◽

Short Term Memory ◽

Combined Cycle ◽

Short Term ◽

Term Memory ◽

Proposed Model ◽

Long Short Term Memory

Air pollution monitoring is constantly increasing, with more and more attention given to its consequences for human health. Since nitrogen dioxide (NO2) and sulfur dioxide (SO2) are the major pollutants, various models have been developed to predict their potential damage. Nevertheless, providing precise predictions is almost impossible. In this study, a new hybrid intelligent model based on long short-term memory (LSTM) and the multi-verse optimization (MVO) algorithm has been developed to predict and analyze the air pollution from Combined Cycle Power Plants. In the proposed model, the LSTM model is the forecasting engine that predicts the amount of NO2 and SO2 produced by the Combined Cycle Power Plant, while the MVO algorithm optimizes the LSTM parameters to achieve a lower forecasting error. To evaluate the proposed model’s performance, it has been applied to real data from a Combined Cycle Power Plant in Kerman, Iran. The datasets include wind speed, air temperature, NO2, and SO2 for five months (May–September 2019) with a time step of 3 h. The model has been tested with two types of input parameters: type (1) includes wind speed, air temperature, and different lagged values of the output variables (NO2 and SO2); type (2) includes only lagged values of the output variables (NO2 and SO2). The results show that the proposed model has higher accuracy than other combined forecasting benchmark models (ENN-PSO, ENN-MVO, and LSTM-PSO) across the different network input variables.
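
The two input types described above can be sketched as a small feature-building routine. This is a hedged illustration, not the authors' preprocessing: the function name, the choice of lags, and the use of same-time-step weather values are assumptions, and the sample numbers are made up.

```python
def make_lagged_rows(target, lags, exogenous=None):
    """Build (features, output) pairs from a time series.

    target:    pollutant readings, one per time step
    lags:      positive offsets, e.g. [1, 2] for the two previous steps
    exogenous: optional dict of extra series (assumed aligned with target);
               omit it for type (2) inputs, pass weather data for type (1)
    """
    exogenous = exogenous or {}
    rows = []
    for t in range(max(lags), len(target)):
        features = [target[t - lag] for lag in lags]
        # Assumption: weather values from the same time step are available.
        features += [series[t] for series in exogenous.values()]
        rows.append((features, target[t]))
    return rows


no2 = [10.0, 11.0, 12.0, 13.0]       # made-up readings
wind = [1.0, 2.0, 3.0, 4.0]          # made-up wind speeds
type2_rows = make_lagged_rows(no2, lags=[1, 2])
type1_rows = make_lagged_rows(no2, lags=[1, 2], exogenous={"wind_speed": wind})
```

Each row can then be fed to any forecaster; the first `max(lags)` time steps are dropped because they lack a full history.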

Sentence similarity evaluation using Sent2Vec and siamese neural network with parallel structure

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-189593 ◽

2021 ◽

pp. 1-10

Author(s):

Hye-Jeong Song ◽

Tak-Sung Heo ◽

Jong-Dae Kim ◽

Chan-Young Park ◽

Yu-Seop Kim

Keyword(s):

Neural Network ◽

Language Processing ◽

Short Term Memory ◽

Parallel Structure ◽

Short Term ◽

Similarity Estimation ◽

Accurate Judgment ◽

Proposed Model ◽

Sentence Similarity ◽

Long Short Term Memory

Sentence similarity evaluation is a significant task used in machine translation, classification, and information extraction in the field of natural language processing. Given two sentences, an accurate judgment should be made as to whether their meanings are equivalent, even if their words and contexts differ. To this end, existing studies have measured the similarity of sentences by focusing on the analysis of words, morphemes, and letters. To measure sentence similarity, this study uses Sent2Vec, a sentence embedding, as well as morpheme word embedding. Vectors representing words are input to a one-dimensional convolutional neural network (1D-CNN) with kernels of various sizes and to a bidirectional long short-term memory (Bi-LSTM) network. Self-attention is applied to the features transformed through the Bi-LSTM. Subsequently, the vectors produced by the 1D-CNN and by self-attention are passed through global max pooling and global average pooling, respectively, to extract specific values. The resulting vectors are concatenated with the vector generated by Sent2Vec to form a single vector. This vector is input to a softmax layer, which determines the similarity between the two sentences. The proposed model improves accuracy by up to 5.42 percentage points compared with conventional sentence similarity estimation models.
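
The pooling and concatenation steps in this pipeline are simple enough to show directly. The sketch below assumes the 1D-CNN and self-attention outputs are already available as lists of per-position vectors; the function names and the toy numbers are illustrative and do not come from the paper.

```python
def global_max_pool(sequence):
    """Keep the largest value of each feature dimension across positions."""
    return [max(column) for column in zip(*sequence)]


def global_avg_pool(sequence):
    """Average each feature dimension across positions."""
    return [sum(column) / len(sequence) for column in zip(*sequence)]


def fuse(cnn_features, attention_features, sentence_vector):
    """Concatenate pooled CNN, pooled attention, and sentence embeddings."""
    return (global_max_pool(cnn_features)
            + global_avg_pool(attention_features)
            + list(sentence_vector))


# Toy inputs: two positions with two features each, plus a 1-D sentence vector.
fused = fuse([[1.0, 4.0], [3.0, 2.0]], [[1.0, 4.0], [3.0, 2.0]], [0.5])
```

Both pooling operations remove the sequence-length dimension, so sentences of different lengths yield fused vectors of the same size.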

Production Forecasting with the Interwell Interference by Integrating Graph Convolutional and Long Short-Term Memory Neural Network

SPE Reservoir Evaluation & Engineering ◽

10.2118/208596-pa ◽

2021 ◽

pp. 1-17

Author(s):

Enda Du ◽

Yuetian Liu ◽

Ziyan Cheng ◽

Liang Xue ◽

Jing Ma ◽

...

Keyword(s):

Neural Network ◽

Short Term Memory ◽

Short Term ◽

Term Memory ◽

Production Forecasting ◽

Temporal Correlations ◽

Proposed Model ◽

The Mean ◽

Long Short Term Memory ◽

The Impact

Accurate production forecasting is an essential task that accompanies the entire process of reservoir development. Limited by their prediction principles and processes, traditional approaches struggle to make rapid predictions. With the development of artificial intelligence, data-driven models provide an alternative approach to production forecasting. To fully account for the impact of interwell interference on production, this paper proposes a deep learning-based hybrid model (GCN-LSTM), in which a graph convolutional network (GCN) captures complicated spatial patterns between wells and a long short-term memory (LSTM) neural network extracts intricate temporal correlations from historical production data. To implement the proposed model more efficiently, two data preprocessing procedures are performed: outliers in the data set are removed by using a box plot visualization, and measurement noise is reduced by a wavelet transform. The robustness and applicability of the proposed model are evaluated in two scenarios with different data types, using the root mean square error (RMSE), the mean absolute error (MAE), and the mean absolute percentage error (MAPE). The results show that the proposed model can effectively capture spatial and temporal correlations to make a rapid and accurate oil production forecast.
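
A box plot flags points that lie more than 1.5 interquartile ranges outside the quartiles, and the three error metrics have standard definitions. The following sketch shows both under stated assumptions: the 1.5 factor is the conventional box-plot whisker rule rather than a value taken from the paper, the function names are illustrative, and the wavelet denoising step is not covered.

```python
import math
import statistics


def remove_boxplot_outliers(values):
    """Drop points beyond the conventional 1.5 * IQR box-plot whiskers."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if low <= v <= high]


def rmse(actual, predicted):
    """Root mean square error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted))
                     / len(actual))


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error; actual values must be nonzero."""
    return 100.0 * sum(abs(a - p) / abs(a)
                       for a, p in zip(actual, predicted)) / len(actual)
```

RMSE penalizes large errors more heavily than MAE, while MAPE is scale-free, which makes it convenient for comparing wells with very different production rates.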

Ensemble-Based Feature Selection With Long Short-Term Memory for Classification of Network Intrusion

Advances in Social Networking and Online Communities - E-Collaboration Technologies and Strategies for Competitive Advantage Amid Challenging Times ◽

10.4018/978-1-7998-7764-6.ch008 ◽

2021 ◽

pp. 228-245

Author(s):

Preethi D. ◽

Neelu Khare

Keyword(s):

Feature Selection ◽

Performance Metrics ◽

Short Term Memory ◽

Short Term ◽

Chi Square ◽

Term Memory ◽

Network Intrusion ◽

Proposed Model ◽

Long Short Term Memory

This chapter presents an ensemble-based feature selection with long short-term memory (LSTM) model. A deep recurrent learning model is proposed for classifying network intrusion. This model uses ensemble-based feature selection (EFS) to select the appropriate features from the dataset and long short-term memory to classify network intrusions. The EFS combines five feature selection techniques, namely information gain, gain ratio, chi-square, correlation-based feature selection, and symmetric uncertainty-based feature selection. The experiments were conducted using the standard benchmark NSL-KDD dataset and implemented using TensorFlow and Python. The proposed model is evaluated using classification performance metrics and is compared both with LSTM classification on all 41 features without any feature selection and with LSTM classification on the features chosen by each individual feature selection technique. The performance study showed that the proposed model performs better, with 99.8% accuracy, a higher detection rate, and a lower false alarm rate.
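
The abstract does not say how the five selectors' outputs are merged. One common way to combine them is majority voting, sketched below purely as an assumption; the function name, the vote threshold, and the example feature subsets are hypothetical.

```python
from collections import Counter


def majority_vote_features(selections, min_votes):
    """Keep features chosen by at least `min_votes` of the selectors.

    selections: one list of feature names per selection technique
    """
    # set() ensures each selector contributes at most one vote per feature.
    votes = Counter(name for chosen in selections for name in set(chosen))
    return sorted(name for name, count in votes.items() if count >= min_votes)


# Hypothetical outputs of three selectors over NSL-KDD style feature names.
selector_outputs = [
    ["duration", "src_bytes"],
    ["src_bytes", "flag"],
    ["src_bytes", "duration"],
]
kept = majority_vote_features(selector_outputs, min_votes=2)
```

Raising `min_votes` yields a smaller, more conservative feature set, while `min_votes=1` is simply the union of all selections.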

A novel fuzzy rough set based long short-term memory integration model for energy consumption prediction of public buildings

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-201857 ◽

2020 ◽

pp. 1-15

Author(s):

Hongchang Sun ◽

Yadong Wang ◽

Lanqiang Niu ◽

Fengyu Zhou ◽

Heng Li

Keyword(s):

Energy Consumption ◽

Rough Set ◽

Prediction Accuracy ◽

Short Term Memory ◽

Short Term ◽

Fuzzy Rough Set ◽

Proposed Model ◽

Long Short Term Memory ◽

Energy Consumption Prediction ◽

Consumption Prediction

Building energy consumption (BEC) prediction is very important for energy management and conservation. This paper presents a short-term energy consumption prediction method that integrates Fuzzy Rough Set (FRS) theory and the Long Short-Term Memory (LSTM) model, and is thus named FRS-LSTM. The method identifies the most directly related factors among the complex and diverse factors influencing energy consumption, which improves prediction accuracy and efficiency. First, the FRS reduces the redundancy of the input features through attribute reduction of the factors affecting energy consumption forecasting, and it avoids the data loss caused by the data discretization of a classical rough set. Then, the attribute set that remains after reduction is taken as the input of the LSTM networks to obtain the final prediction results. To validate the effectiveness of the proposed model, this study used actual data from a public building to predict the building’s energy consumption and compared the proposed model with the LSTM, Levenberg-Marquardt Back Propagation (LM-BP), and Support Vector Regression (SVR) models. The experimental results reveal that the FRS-LSTM model achieves higher prediction accuracy than the comparison models.
