Improving speech emotion recognition based on acoustic words emotion dictionary

2020 ◽  
pp. 1-15
Author(s):  
Wang Wei ◽  
Xinyi Cao ◽  
He Li ◽  
Lingjie Shen ◽  
Yaqin Feng ◽  
...  

To improve speech emotion recognition, a U-AWED features model is proposed based on an acoustic words emotion dictionary (AWED). The method models emotional information at the acoustic-word level across the different emotion classes. The top-listed words in each emotion class are selected to generate the AWED vector. The U-AWED model is then constructed by combining utterance-level acoustic features with the AWED features. A support vector machine and a convolutional neural network are employed as the classifiers in our experiment. The results show that the proposed method provides a significant improvement in unweighted average recall on all four emotion-classification tasks.
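The combination step described above can be sketched as follows. The top-word selection rule, the word counts, and the acoustic feature values are illustrative assumptions, not the authors' exact formulation:

```python
import numpy as np
from collections import Counter

# Hypothetical per-class word counts from a labelled corpus (assumed data).
word_counts = {
    "happy": Counter({"great": 9, "wow": 7, "fine": 2}),
    "angry": Counter({"stop": 8, "no": 6, "fine": 3}),
}

# Select the top-listed words in each emotion class to form the dictionary.
top_k = 2
awed_words = sorted({w for c in word_counts.values() for w, _ in c.most_common(top_k)})

def awed_vector(utterance_words):
    """Count how often each AWED dictionary word occurs in one utterance."""
    counts = Counter(utterance_words)
    return np.array([counts[w] for w in awed_words], dtype=float)

def u_awed(utterance_level_features, utterance_words):
    """Concatenate utterance-level acoustic features with the AWED features."""
    return np.concatenate([utterance_level_features, awed_vector(utterance_words)])

acoustic = np.array([0.3, 1.2, 0.7])  # e.g. pitch/energy statistics (assumed)
features = u_awed(acoustic, ["wow", "great", "great"])
```

The fused vector can then be fed to either classifier (SVM or CNN) unchanged.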

Sensors ◽  
2020 ◽  
Vol 20 (21) ◽  
pp. 6008 ◽  
Author(s):  
Misbah Farooq ◽  
Fawad Hussain ◽  
Naveed Khan Baloch ◽  
Fawad Riasat Raja ◽  
Heejung Yu ◽  
...  

Speech emotion recognition (SER) plays a significant role in human–machine interaction. Emotion recognition from speech and its precise classification is a challenging task because a machine is unable to understand its context. For an accurate emotion classification, emotionally relevant features must be extracted from the speech data. Traditionally, handcrafted features were used for emotion classification from speech signals; however, they are not efficient enough to accurately depict the emotional states of the speaker. In this study, the benefits of a deep convolutional neural network (DCNN) for SER are explored. For this purpose, a pretrained network is used to extract features from state-of-the-art speech emotional datasets. Subsequently, a correlation-based feature selection technique is applied to the extracted features to select the most appropriate and discriminative features for SER. For the classification of emotions, we utilize support vector machines, random forests, the k-nearest neighbors algorithm, and neural network classifiers. Experiments are performed for speaker-dependent and speaker-independent SER using four publicly available datasets: the Berlin Dataset of Emotional Speech (Emo-DB), Surrey Audio Visual Expressed Emotion (SAVEE), Interactive Emotional Dyadic Motion Capture (IEMOCAP), and the Ryerson Audio Visual Dataset of Emotional Speech and Song (RAVDESS). Our proposed method achieves an accuracy of 95.10% for Emo-DB, 82.10% for SAVEE, 83.80% for IEMOCAP, and 81.30% for RAVDESS in speaker-dependent SER experiments. Moreover, our method yields the best results for speaker-independent SER when compared with existing handcrafted-feature-based SER approaches.
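The correlation-based feature selection step can be sketched as below. The scoring rule (absolute Pearson correlation of each feature with the class label) and the cutoff `k` are assumptions for illustration, not the paper's exact procedure:

```python
import numpy as np

def select_by_correlation(X, y, k):
    """Keep the k features whose absolute Pearson correlation with the
    label vector y is highest (a simple filter-style selector)."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    # Per-feature correlation: cov(x_j, y) / (std(x_j) * std(y))
    num = Xc.T @ yc
    den = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    scores = np.abs(num / den)
    keep = np.argsort(scores)[::-1][:k]
    return np.sort(keep)

# Toy data: feature 0 tracks the label, feature 1 is noise (assumed data).
rng = np.random.default_rng(0)
y = np.array([0, 0, 1, 1, 0, 1, 1, 0], dtype=float)
X = np.column_stack([y + 0.1 * rng.standard_normal(8),
                     rng.standard_normal(8)])
kept = select_by_correlation(X, y, k=1)
```

The surviving columns of `X` would then be passed to the downstream classifiers (SVM, RF, KNN, neural network).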


2021 ◽  
Vol 7 ◽  
pp. e766
Author(s):  
Ammar Amjad ◽  
Lal Khan ◽  
Hsien-Tsung Chang

Speech emotion recognition (SER) is a challenging problem because it is not clear which features are effective for classification. Emotion-related features are typically extracted from speech signals for emotion classification. Handcrafted features have mainly been used for emotion identification from audio signals; however, they are not sufficient to reliably identify the emotional state of the speaker. The advantages of a deep convolutional neural network (DCNN) are investigated in the proposed work. A pretrained framework is used to extract the features from speech emotion databases. In this work, we adopt a feature selection (FS) approach to find the most discriminative and important features for SER. We use random forest (RF), decision tree (DT), support vector machine (SVM), multilayer perceptron (MLP), and k-nearest neighbors (KNN) classifiers to classify seven emotions. All experiments are performed on four different publicly accessible databases. Our method obtains accuracies of 92.02%, 88.77%, 93.61%, and 77.23% for Emo-DB, SAVEE, RAVDESS, and IEMOCAP, respectively, for speaker-dependent (SD) recognition with the feature selection method. Furthermore, compared to current handcrafted-feature-based SER methods, the proposed method shows the best results for speaker-independent SER. For Emo-DB, all classifiers attain an accuracy of more than 80% with or without the feature selection technique.
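Speaker-independent evaluation, as reported in this and the previous study, is commonly done by holding out all utterances of one speaker per fold (leave-one-speaker-out). The sketch below shows only the data split; the speaker IDs are assumed toy data:

```python
import numpy as np

def leave_one_speaker_out(speaker_ids):
    """Yield (speaker, train_idx, test_idx) per fold, holding out one speaker,
    so the classifier is never tested on a speaker it was trained on."""
    speaker_ids = np.asarray(speaker_ids)
    for spk in np.unique(speaker_ids):
        test_idx = np.where(speaker_ids == spk)[0]
        train_idx = np.where(speaker_ids != spk)[0]
        yield spk, train_idx, test_idx

# Toy corpus: six utterances from three speakers (assumed labels).
speakers = ["s1", "s1", "s2", "s2", "s3", "s3"]
folds = list(leave_one_speaker_out(speakers))
```

Averaging per-fold accuracy over all speakers gives the speaker-independent score.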


Author(s):  
Niha Kamal Basha ◽  
Aisha Banu Wahab

Absence seizure is a type of brain disorder in which the subject experiences sudden lapses in attention, i.e., sudden changes in brain stimulation. This disorder is most widely found in children (5–18 years). The associated electroencephalogram (EEG) signals are captured with a long-term monitoring system and analyzed individually. In this paper, a convolutional neural network is proposed to empower the monitoring system with automatic detection of absence seizures. After pre-processing, single-channel EEG seizure features, such as power, the log sum of the wavelet transform, cross-correlation, and the mean phase variance of each frame in a window, are extracted and classified into the normal or absence-seizure class. The training data are collected from normal and absence-seizure subjects in the form of EEG recordings; the objective is automatic detection of absence seizures using a single-channel EEG signal as input. The convolutional neural network consists of three layers: (1) a convolutional layer, which extracts the features in the form of a vector; (2) a pooling layer, which reduces the dimensionality of the convolutional layer's output; and (3) a fully connected layer, in which the softmax activation function produces a probability distribution over the output classes. This paper describes the automatic detection of absence seizures in detail and provides a comparative analysis of classification between a support vector machine and the convolutional neural network. The proposed approach outperforms the support vector machine by 80% in automatic detection of absence seizures and is validated using a confusion matrix.
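The three-layer structure described above (convolution, pooling, softmax) can be sketched on a toy 1-D signal. The filter weights, window sizes, and fully connected weights are arbitrary illustrative values, not the trained network:

```python
import numpy as np

def conv1d(x, kernel):
    """Convolutional layer: slide a filter over the signal ('valid' mode)."""
    n = len(x) - len(kernel) + 1
    return np.array([np.dot(x[i:i + len(kernel)], kernel) for i in range(n)])

def max_pool(x, size):
    """Pooling layer: reduce dimensionality by taking the max of each window."""
    n = len(x) // size
    return x[:n * size].reshape(n, size).max(axis=1)

def softmax(z):
    """Activation of the fully connected layer: probability distribution."""
    e = np.exp(z - z.max())
    return e / e.sum()

signal = np.array([0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0])  # toy EEG frame
feat = max_pool(conv1d(signal, np.array([1.0, -1.0])), size=2)
# Tiny fully connected layer with assumed weights: 2 classes (normal / seizure).
W = np.array([[0.5, -0.5, 0.2], [-0.5, 0.5, -0.2]])
probs = softmax(W @ feat)
```

The class with the highest probability is taken as the detection result.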


Author(s):  
Wanli Wang ◽  
Botao Zhang ◽  
Kaiqi Wu ◽  
Sergey A Chepinskiy ◽  
Anton A Zhilenkov ◽  
...  

In this paper, a hybrid method based on deep learning is proposed to visually classify terrains encountered by mobile robots. Considering the limited computing resources on mobile robots and the requirement for high classification accuracy, the proposed hybrid method combines a convolutional neural network with a support vector machine to keep a high classification accuracy while improving efficiency. The key idea is that the convolutional neural network performs a multi-class classification while the support vector machine simultaneously performs a two-class classification. The two-class classification performed by the support vector machine is aimed at the one kind of terrain that users are most concerned with. The results of the two classifications are consolidated to obtain the final classification result. The convolutional neural network used in this method is modified for on-board usage on mobile robots; to enhance efficiency, it has a simple architecture. The convolutional neural network and the support vector machine are trained and tested using RGB images of six kinds of common terrains. Experimental results demonstrate that this method can help robots classify terrains accurately and efficiently. Therefore, the proposed method has significant potential for on-board usage on mobile robots.
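The consolidation of the two classifiers' outputs might look like the sketch below. The override rule (trust the SVM on the terrain of interest when its margin is confident) and the terrain names are assumptions, since the abstract does not specify the fusion rule:

```python
import numpy as np

TERRAINS = ["grass", "gravel", "sand", "asphalt", "mud", "snow"]
TARGET = "mud"  # the terrain users are most concerned with (assumed)

def consolidate(cnn_probs, svm_margin, threshold=1.0):
    """Fuse the CNN's multi-class output with the SVM's two-class decision.
    If the SVM is confidently positive/negative about the target terrain,
    its verdict overrides the CNN; otherwise keep the CNN's argmax."""
    cnn_label = TERRAINS[int(np.argmax(cnn_probs))]
    if svm_margin > threshold:
        return TARGET
    if svm_margin < -threshold and cnn_label == TARGET:
        # SVM confidently rules out the target: fall back to the runner-up.
        order = np.argsort(cnn_probs)[::-1]
        return TERRAINS[int(order[1])]
    return cnn_label

probs = np.array([0.1, 0.15, 0.1, 0.5, 0.05, 0.1])  # CNN says "asphalt"
```

Any fusion rule with this shape keeps the SVM's higher precision on the critical terrain without discarding the CNN's multi-class coverage.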


Metals ◽  
2021 ◽  
Vol 11 (4) ◽  
pp. 639
Author(s):  
Chen Ma ◽  
Haifei Dang ◽  
Jun Du ◽  
Pengfei He ◽  
Minbo Jiang ◽  
...  

This paper proposes a novel metal additive manufacturing process that combines gas tungsten arc (GTA) welding and droplet deposition manufacturing (DDM). Due to the complex physical and metallurgical processes involved, such as droplet impact, spreading, and surface pre-melting, defects including lack of fusion, overflow, and discontinuity of deposited layers always occur. To assure the quality of GTA-assisted DDM-ed parts, online monitoring based on visual sensing has been implemented. The current study also focuses on automated defect classification, avoiding the low efficiency and bias of manual recognition, by means of a convolutional neural network–support vector machine (CNN-SVM) model. A best accuracy of 98.9%, with an execution time of about 12 milliseconds per image, shows that the model is suitable for real-time feedback control of the process.
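A CNN-SVM model of this kind typically feeds CNN-derived feature vectors into a linear SVM. A minimal sketch of that final stage, training a linear SVM by subgradient descent on the hinge loss, is shown below; the feature values, labels, and hyperparameters are assumed toy data, not the paper's trained model:

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Fit a linear SVM on (CNN-derived) feature vectors by subgradient
    descent on the regularised hinge loss; labels y must be in {-1, +1}."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (xi @ w + b) < 1:          # margin violated: push
                w += lr * (yi * xi - lam * w)
                b += lr * yi
            else:                              # margin satisfied: decay only
                w -= lr * lam * w
    return w, b

# Toy "CNN feature" vectors: defective vs. defect-free layers (assumed data).
X = np.array([[2.0, 2.0], [2.5, 1.5], [-2.0, -1.0], [-1.5, -2.5]])
y = np.array([1, 1, -1, -1])
w, b = train_linear_svm(X, y)
pred = np.sign(X @ w + b)
```

At inference time only the dot product `X @ w + b` is needed, which is consistent with millisecond-scale per-image execution.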

