RNN-based Dimensional Speech Emotion Recognition

2020
Author(s):  
Bagus Tris Atmaja

◆ A speech emotion recognition system based on recurrent neural networks is developed using long short-term memory (LSTM) networks.
◆ Two acoustic feature sets are evaluated: a 31-feature set (3 time-domain features, 5 frequency-domain features, 13 MFCCs, 5 F0 features, and 5 harmonic features) and the eGeMAPS feature set (23 features).
◆ Performance is evaluated with several metrics: mean squared error (MSE), mean absolute percentage error (MAPE), mean absolute error (MAE), and concordance correlation coefficient (CCC). Among these, CCC is the main focus, as it is the metric used by other researchers.
◆ The developed system uses multi-task learning to maximize the CCC of arousal, valence, and dominance simultaneously by minimizing a CCC loss (1 - CCC). The results show that LSTM networks improve the CCC score over a baseline dense network. The best CCC score is obtained on arousal, followed by dominance and valence.
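
As a concrete illustration, the CCC and the 1 - CCC loss described above fit in a few lines. The sketch below uses NumPy with illustrative function names; it is not the paper's code.

```python
# A minimal NumPy sketch of the CCC loss (1 - CCC) used per emotion
# dimension; names are illustrative assumptions, not the paper's code.
import numpy as np

def ccc(y_true, y_pred):
    """Concordance correlation coefficient between two 1-D arrays."""
    mean_true, mean_pred = y_true.mean(), y_pred.mean()
    var_true, var_pred = y_true.var(), y_pred.var()
    cov = ((y_true - mean_true) * (y_pred - mean_pred)).mean()
    return 2 * cov / (var_true + var_pred + (mean_true - mean_pred) ** 2)

def ccc_loss(y_true, y_pred):
    """Loss minimized for one dimension: 1 - CCC."""
    return 1.0 - ccc(y_true, y_pred)

def multitask_ccc_loss(targets, predictions):
    """Multi-task objective: average the CCC losses of all three dimensions.
    targets/predictions: dicts keyed by 'arousal', 'valence', 'dominance'."""
    return np.mean([ccc_loss(targets[k], predictions[k])
                    for k in ('arousal', 'valence', 'dominance')])
```

Averaging the three per-dimension losses lets one network optimize arousal, valence, and dominance jointly, which is the multi-task setup the abstract describes.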

2019
Vol 2019
pp. 1-9
Author(s):  
Linqin Cai
Yaxin Hu
Jiangong Dong
Sitong Zhou

With the rapid development of social media, single-modal emotion recognition can hardly satisfy the demands of current emotion recognition systems. Aiming to optimize the performance of the emotion recognition system, a multimodal emotion recognition model using speech and text was proposed in this paper. Considering the complementarity between different modalities, a CNN (convolutional neural network) and an LSTM (long short-term memory) network were combined as binary channels to learn acoustic emotion features; meanwhile, a Bi-LSTM (bidirectional long short-term memory) network was used to capture the textual features. Furthermore, we applied a deep neural network to learn and classify the fused features. The final emotional state was determined by the output of both the speech and text emotion analyses. Finally, multimodal fusion experiments were carried out on the IEMOCAP database to validate the proposed model. In comparison with the single-modal baselines, the overall recognition accuracy increased by 6.70% for text and by 13.85% for speech emotion recognition. Experimental results show that the recognition accuracy of our multimodal model is higher than that of either single modality and outperforms other published multimodal models on the test datasets.
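
A minimal Keras sketch of this fusion design follows; all input shapes, layer sizes, and the four-class setup are assumptions for illustration, not the paper's exact configuration.

```python
# Hedged sketch: CNN and LSTM as parallel ("binary") channels over
# acoustic features, a Bi-LSTM over text, and a dense fusion classifier.
# Shapes and sizes are assumptions, not the paper's settings.
import tensorflow as tf
from tensorflow.keras import layers, Model

n_frames, n_acoustic = 300, 40      # assumed acoustic input shape
max_words, vocab = 50, 10000        # assumed text input shape
n_emotions = 4                      # a common IEMOCAP 4-class setup

# Acoustic branch: CNN and LSTM channels over the same input.
aud_in = layers.Input(shape=(n_frames, n_acoustic))
cnn = layers.Conv1D(64, 5, activation='relu')(aud_in)
cnn = layers.GlobalMaxPooling1D()(cnn)
lstm = layers.LSTM(64)(aud_in)
aud_feat = layers.concatenate([cnn, lstm])

# Text branch: embedding followed by a bidirectional LSTM.
txt_in = layers.Input(shape=(max_words,))
emb = layers.Embedding(vocab, 128)(txt_in)
txt_feat = layers.Bidirectional(layers.LSTM(64))(emb)

# Fusion: concatenate both modalities, then classify with a deep network.
fused = layers.concatenate([aud_feat, txt_feat])
x = layers.Dense(128, activation='relu')(fused)
out = layers.Dense(n_emotions, activation='softmax')(x)

model = Model([aud_in, txt_in], out)
model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['accuracy'])
```

Concatenating the CNN and LSTM channel outputs is one straightforward reading of "binary channels"; the key point is that local (convolutional) and sequential (recurrent) acoustic cues are learned side by side before fusion with text.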


Author(s):  
Mehdi Azarafza
Mohammad Azarafza
Jafar Tanha

Since December 2019, coronavirus disease (COVID-19) has spread outward from China, infecting more than 4,666,000 people and causing thousands of deaths. Unfortunately, infection counts and deaths are still rising rapidly, pushing the world to the edge of catastrophe. Artificial intelligence and spatiotemporal distribution techniques can play a key role in forecasting infections at the national and provincial levels in many countries. As its methodology, the presented study employs long short-term memory (LSTM)-based deep learning for time-series forecasting of confirmed cases at both the national and provincial levels in Iran. The data were collected from February 19 to March 22, 2020 at the provincial level and from February 19 to May 13, 2020 at the national level from nationally recognised sources. For comparison, we use recurrent neural network, seasonal autoregressive integrated moving average (SARIMA), Holt-Winters exponential smoothing, and moving-average approaches. Furthermore, the mean absolute error (MAE), mean squared error (MSE), and mean absolute percentage error (MAPE) metrics are used as evaluation factors, together with trend analysis. The results of our experiments show that the LSTM model performs better than the other methods on the collected COVID-19 dataset for Iran.
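
For illustration, a sliding-window LSTM forecaster of the kind described can be sketched as follows; the window length, network size, and the synthetic stand-in series are assumptions, not the authors' setup.

```python
# Hedged sketch of univariate LSTM time-series forecasting with sliding
# windows, evaluated with the metrics the study reports (MSE, MAE, MAPE).
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

def make_windows(series, window=7):
    """Turn a 1-D series into (samples, window, 1) inputs and next-step targets."""
    X = np.stack([series[i:i + window] for i in range(len(series) - window)])
    y = series[window:]
    return X[..., None], y

# Stand-in monotone series in place of the real daily case counts.
series = np.cumsum(np.random.rand(80)).astype('float32')
X, y = make_windows(series)

model = models.Sequential([
    layers.Input(shape=(7, 1)),
    layers.LSTM(32),
    layers.Dense(1),
])
model.compile(optimizer='adam', loss='mse')
model.fit(X, y, epochs=10, verbose=0)

pred = model.predict(X, verbose=0).ravel()
mse = np.mean((y - pred) ** 2)
mae = np.mean(np.abs(y - pred))
mape = np.mean(np.abs((y - pred) / y)) * 100
print(f"MSE={mse:.4f}  MAE={mae:.4f}  MAPE={mape:.2f}%")
```

The same window-and-predict loop applies whether the series is national or provincial; only the input data changes.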


Plant Methods
2021
Vol 17 (1)
Author(s):  
Peng Guan
Yili Zheng
Guannan Lei

Abstract Background Forest canopies are highly sensitive indicators of tree growth, health, and climate change. This study aims to obtain time-sequence images of mixed forests using a near-earth remote sensing method to track the seasonal variation in color indices and to select the optimal color index. Three regions of interest (ROIs) were defined and six color indices (GRVI, HUE, GGR, RCC, GCC, and GEI) were calculated to analyze differences in the microenvironment. The key phenological phases were identified using the double logistic model and the derivative method, and the phenology forecast of the color indices was performed with a long short-term memory (LSTM) model. Results The results showed that the same color index in different ROIs, and different color indices in the same ROI, present slight differences in the number of days of growth and the days corresponding to the peak value, exhibiting different phenological phases. The mean squared error (MSE), root mean squared error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE) of the LSTM model were 0.0016, 0.0405, 0.0334, and 12.55%, respectively, indicating that the model has a good forecast effect. Conclusions In different areas of the same forest, differences in the canopy micro-ecological environment were prevalent, with the internal growth mechanism affected by different cultivation methods and the external environment. Moreover, the optimal color index varies with species in its phenological response; that is, different color indices should be used for different forests. With color-index data as the training and forecast sets, the feasibility of the LSTM model for phenology forecasting is verified.
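
The fit-then-differentiate procedure for extracting key phenological dates can be sketched as below; the double logistic form, initial guesses, and the synthetic GCC series are common choices assumed for illustration, not necessarily the paper's exact ones.

```python
# Hedged sketch: fit a double logistic curve to a seasonal color-index
# series, then locate transition dates from the fitted curve's derivative.
import numpy as np
from scipy.optimize import curve_fit

def double_logistic(t, base, amp, k1, t1, k2, t2):
    """Spring rise (k1, t1) and autumn fall (k2, t2) around a baseline."""
    return base + amp * (1 / (1 + np.exp(-k1 * (t - t1)))
                         - 1 / (1 + np.exp(-k2 * (t - t2))))

doy = np.arange(1, 366, dtype=float)                  # day of year
gcc = double_logistic(doy, 0.35, 0.1, 0.1, 120, 0.08, 280)
gcc += np.random.normal(0, 0.005, doy.size)           # synthetic noisy GCC

p0 = [gcc.min(), np.ptp(gcc), 0.1, 100, 0.1, 250]     # rough initial guess
params, _ = curve_fit(double_logistic, doy, gcc, p0=p0, maxfev=10000)

# Derivative method: green-up at the maximum slope, senescence at the minimum.
fitted = double_logistic(doy, *params)
slope = np.gradient(fitted, doy)
green_up_doy = doy[np.argmax(slope)]
senescence_doy = doy[np.argmin(slope)]
print(green_up_doy, senescence_doy)
```

Once the key dates are extracted per ROI and index, the color-index series themselves feed the LSTM forecaster, as in the sliding-window sketch earlier in this listing.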


2021
Vol 40
pp. 03006
Author(s):  
Beenaa Salian
Omkar Narvade
Rujuta Tambewagh
Smita Bharne

Speech has several distinguishing characteristic features, which has kept it a state-of-the-art medium for extracting valuable information from audio samples. Our aim is to develop an emotion recognition system using these speech features that can accurately and efficiently recognize emotions through audio analysis. In this article, we employ a hybrid neural network comprising four blocks of time-distributed convolutional layers followed by a long short-term memory layer to achieve this. The audio samples for the speech dataset are collectively assembled from the RAVDESS, TESS, and SAVEE audio datasets and are further augmented by injecting noise. Mel spectrograms are computed from the audio samples and used to train the neural network. We achieve a testing accuracy of about 89.26%.
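
A hedged Keras sketch of such a hybrid network follows; the spectrogram chunking, filter counts, and eight-class output are illustrative assumptions, not the authors' exact settings.

```python
# Hedged sketch: four TimeDistributed convolutional blocks over
# mel-spectrogram chunks, followed by a single LSTM layer.
import tensorflow as tf
from tensorflow.keras import layers, models

n_chunks, mel_bins, chunk_len = 8, 128, 16   # assumed spectrogram chunking
n_emotions = 8                               # RAVDESS-style label set

model = models.Sequential([layers.Input(shape=(n_chunks, mel_bins, chunk_len, 1))])
for filters in (16, 32, 64, 64):             # four convolutional blocks
    model.add(layers.TimeDistributed(
        layers.Conv2D(filters, 3, padding='same', activation='relu')))
    model.add(layers.TimeDistributed(layers.MaxPooling2D(2)))
model.add(layers.TimeDistributed(layers.Flatten()))
model.add(layers.LSTM(64))
model.add(layers.Dense(n_emotions, activation='softmax'))

model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['accuracy'])
model.summary()
```

Wrapping each convolutional block in TimeDistributed applies the same 2-D feature extractor to every spectrogram chunk, so the LSTM then models how those per-chunk features evolve over the utterance.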


Sensors
2021
Vol 21 (5)
pp. 1579
Author(s):  
Kyoung Ju Noh
Chi Yoon Jeong
Jiyoun Lim
Seungeun Chung
Gague Kim
...  

Speech emotion recognition (SER) is a natural method of recognizing individual emotions in everyday life. To distribute SER models to real-world applications, some key challenges must be overcome, such as the lack of datasets tagged with emotion labels and the weak generalization of the SER model for an unseen target domain. This study proposes a multi-path and group-loss-based network (MPGLN) for SER to support multi-domain adaptation. The proposed model includes a bidirectional long short-term memory-based temporal feature generator and a transferred feature extractor from the pre-trained VGG-like audio classification model (VGGish), and it learns simultaneously based on multiple losses according to the association of emotion labels in the discrete and dimensional models. For the evaluation of the MPGLN SER as applied to multi-cultural domain datasets, the Korean Emotional Speech Database (KESD), including KESDy18 and KESDy19, is constructed, and the English-speaking Interactive Emotional Dyadic Motion Capture database (IEMOCAP) is used. The evaluation of multi-domain adaptation and domain generalization showed 3.7% and 3.5% improvements, respectively, of the F1 score when comparing the performance of MPGLN SER with a baseline SER model that uses a temporal feature generator. We show that the MPGLN SER efficiently supports multi-domain adaptation and reinforces model generalization.
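
The multi-path, multi-loss idea can be sketched as below; shapes, layer sizes, and loss weights are assumptions, and the VGGish embeddings are assumed to be extracted offline into fixed-size vectors rather than computed in-graph. This is an illustrative reading of the architecture, not the authors' implementation.

```python
# Hedged sketch: one path is a BiLSTM temporal feature generator over
# frame-level features; the other consumes pre-extracted VGGish
# embeddings. Two heads train jointly on a discrete-label loss and a
# dimensional-label loss (a simplified stand-in for the group losses).
import tensorflow as tf
from tensorflow.keras import layers, Model

frames, feat_dim, vggish_dim = 200, 40, 128   # assumed input shapes
n_classes = 4                                 # discrete emotion labels

# Path 1: temporal feature generator.
seq_in = layers.Input(shape=(frames, feat_dim))
temporal = layers.Bidirectional(layers.LSTM(64))(seq_in)

# Path 2: transferred features from the pre-trained VGGish model,
# assumed extracted offline as one 128-D vector per utterance.
vgg_in = layers.Input(shape=(vggish_dim,))

shared = layers.concatenate([temporal, vgg_in])
shared = layers.Dense(128, activation='relu')(shared)

# Multiple losses: discrete classification plus dimensional regression.
cls_out = layers.Dense(n_classes, activation='softmax', name='discrete')(shared)
dim_out = layers.Dense(2, name='dimensional')(shared)  # arousal, valence

model = Model([seq_in, vgg_in], [cls_out, dim_out])
model.compile(optimizer='adam',
              loss={'discrete': 'categorical_crossentropy',
                    'dimensional': 'mse'},
              loss_weights={'discrete': 1.0, 'dimensional': 0.5})
```

Training both heads against associated discrete and dimensional labels is what lets the shared representation transfer across the KESD and IEMOCAP domains in the evaluation described above.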


Author(s):  
Jian Zhou
Guoyin Wang
Yong Yang

Speech emotion recognition is becoming more and more important in computer application fields such as health care and children's education. To improve prediction performance or to provide a faster and more cost-effective recognition system, attribute selection is often carried out beforehand to select the important attributes from the input attribute set. However, traditional feature selection methods used in speech emotion recognition are time-consuming when determining an optimal or suboptimal feature subset. Rough set theory offers an alternative, formal methodology that can be employed to reduce the dimensionality of data. The purpose of this study is to investigate the effectiveness of rough set theory in identifying important features in a speech emotion recognition system. Experiments on the CLDC emotion speech database show that this approach can reduce the computational cost while retaining a suitably high recognition rate.
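
As an illustration of rough-set attribute reduction, a greedy QuickReduct-style search on the dependency degree can be sketched as follows; the toy data and names are assumptions, not the CLDC experiments.

```python
# Hedged sketch of rough-set attribute reduction: greedily grow an
# attribute subset until its dependency degree (size of the positive
# region) matches that of the full attribute set.
import numpy as np

def dependency(X, y, attrs):
    """Fraction of samples whose equivalence class over `attrs` maps
    consistently to a single decision label (the positive region)."""
    if not attrs:
        return 0.0
    keys = [tuple(row) for row in X[:, attrs]]
    consistent = 0
    for key in set(keys):
        idx = [i for i, k in enumerate(keys) if k == key]
        if len(set(y[idx])) == 1:
            consistent += len(idx)
    return consistent / len(y)

def quick_reduct(X, y):
    all_attrs = list(range(X.shape[1]))
    target = dependency(X, y, all_attrs)
    reduct = []
    while dependency(X, y, reduct) < target:
        best = max((a for a in all_attrs if a not in reduct),
                   key=lambda a: dependency(X, y, reduct + [a]))
        reduct.append(best)
    return reduct

# Toy discretized feature table: 4 attributes, binary decision.
X = np.array([[0,1,0,1],[0,1,1,1],[1,0,0,0],[1,0,1,0],[0,0,0,1],[0,0,1,1]])
y = np.array([0, 0, 1, 1, 0, 0])
print(quick_reduct(X, y))  # prints [0]: attribute 0 alone separates the classes
```

In a speech emotion pipeline, the discretized acoustic attributes take the place of the toy columns, and only the attributes in the resulting reduct are passed to the classifier, which is where the computational saving comes from.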

