GMM FOR EMOTION RECOGNITION OF VIETNAMESE

2018 ◽  
Vol 33 (3) ◽  
pp. 229-246
Author(s):  
Đào Thị Lệ Thủy ◽  
Trinh Van Loan ◽  
Nguyen Hong Quang

This paper presents the results of GMM-based recognition for four basic emotions of Vietnamese: neutral, sadness, anger, and happiness. The characteristic parameters of these emotions are extracted from speech signals and divided into different parameter sets for the experiments. The experiments cover speaker-dependent or speaker-independent and content-dependent or content-independent recognition. The results show that recognition scores are rather high when the full combination of parameters is used: MFCC and its first and second derivatives, fundamental frequency (F0), energy, formants and their corresponding bandwidths, spectral characteristics, and F0 variants. On average, the speaker-dependent, content-dependent recognition score is 89.21%. The average score is 82.27% for speaker-dependent, content-independent recognition; 70.35% for speaker-independent, content-dependent recognition; and 66.99% for speaker-independent, content-independent recognition. Information on F0 significantly increased the recognition score.
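The classification scheme described above, one GMM per emotion with the decision made by maximum log-likelihood over an utterance's frames, can be sketched as follows. The feature matrices here are synthetic stand-ins for the MFCC/F0/energy vectors the paper extracts; the class means and GMM settings are illustrative assumptions, not the paper's configuration:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Hypothetical training data: one matrix of frame-level feature vectors
# (e.g. MFCCs + deltas + F0) per emotion class; synthetic for illustration.
train = {
    "neutral":   rng.normal(0.0, 1.0, size=(500, 13)),
    "sadness":   rng.normal(2.0, 1.0, size=(500, 13)),
    "anger":     rng.normal(-2.0, 1.0, size=(500, 13)),
    "happiness": rng.normal(4.0, 1.0, size=(500, 13)),
}

# One GMM per emotion class.
models = {emo: GaussianMixture(n_components=4, random_state=0).fit(X)
          for emo, X in train.items()}

def classify(frames):
    """Pick the emotion whose GMM gives the highest average log-likelihood."""
    scores = {emo: m.score(frames) for emo, m in models.items()}
    return max(scores, key=scores.get)

# Frames from a synthetic test utterance drawn near the "happiness" mean.
utterance = rng.normal(4.0, 1.0, size=(200, 13))
print(classify(utterance))
```

With well-separated synthetic classes this decision rule is trivial; the paper's reported scores reflect the much harder case of real, overlapping speech features.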

1990 ◽  
Vol 112 (1) ◽  
pp. 83-87 ◽  
Author(s):  
R. H. Fries ◽  
B. M. Coffey

Solution of rail vehicle dynamics models by means of numerical simulation has become more prevalent and more sophisticated in recent years. At the same time, analysts and designers are increasingly interested in the response of vehicles to random rail irregularities. The work described in this paper provides a convenient method to generate random vertical and crosslevel irregularities when their time histories are required as inputs to a numerical simulation. The solution begins with mathematical models of vertical and crosslevel power spectral densities (PSDs) representing PSDs of track classes 4, 5, and 6. The method implements state-space models of shape filters whose frequency response magnitude squared matches the desired PSDs. The shape filters give time histories possessing the proper spectral content when driven by white noise inputs. The state equations are solved directly under the assumption that the white noise inputs are constant between time steps. Thus, the state transition matrix and the forcing matrix are obtained in closed form. Some simulations require not only vertical and crosslevel alignments, but also the first and occasionally the second derivatives of these signals. To accommodate these requirements, the first and second derivatives of the signals are also generated. The responses of the random vertical and crosslevel generators depend upon vehicle speed, sample interval, and track class. They possess the desired PSDs over wide ranges of speed and sample interval. The paper includes a comparison between synthetic and measured spectral characteristics of class 4 track. The agreement is very good.
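The closed-form discretization the paper describes, with the white-noise input held constant between time steps so that the state transition and forcing matrices come out in closed form, can be sketched for a hypothetical second-order shape filter. The filter coefficients below are illustrative assumptions, not the paper's fitted track-class PSD models:

```python
import numpy as np
from scipy.linalg import expm

# Hypothetical second-order shape filter (not the paper's PSD fit):
# H(s) = w0^2 / (s^2 + 2*zeta*w0*s + w0^2)
w0, zeta = 2.0 * np.pi * 1.0, 0.7
A = np.array([[0.0, 1.0],
              [-w0**2, -2.0 * zeta * w0]])
B = np.array([[0.0], [w0**2]])
C = np.array([[1.0, 0.0]])

dt = 0.01  # sample interval (speed-dependent in the paper's setting)

# Closed-form discretization under the paper's assumption that the white
# noise input is constant between time steps (zero-order hold):
Ad = expm(A * dt)                               # state transition matrix
Bd = np.linalg.solve(A, (Ad - np.eye(2)) @ B)   # forcing matrix

rng = np.random.default_rng(1)
x = np.zeros((2, 1))
y = np.empty(5000)
for k in range(y.size):
    w = rng.normal() / np.sqrt(dt)  # discrete white noise, unit intensity
    x = Ad @ x + Bd * w
    y[k] = (C @ x).item()
# y is a time history whose spectral content is shaped by |H(jw)|^2
```

Driving a second filter with an independent noise stream would give the crosslevel channel; differentiating the filter output states yields the first and second derivatives the paper mentions.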


2021 ◽  
Author(s):  
Talieh Seyed Tabtabae

Automatic Emotion Recognition (AER) is an emerging research area in the Human-Computer Interaction (HCI) field. As computers become more popular every day, the study of interaction between humans (users) and computers is attracting more attention. In order to have a more natural and friendly interface between humans and computers, it would be beneficial to give computers the ability to recognize situations the same way a human does. Equipped with an emotion recognition system, computers would be able to recognize their users' emotional state and react appropriately. In today's HCI systems, machines can recognize the speaker and the content of the speech using speaker identification and speech recognition techniques. If machines are also equipped with emotion recognition techniques, they can know "how it is said," react more appropriately, and make the interaction more natural. One of the most important human communication channels is the auditory channel, which carries speech and vocal intonation. In fact, people can perceive each other's emotional state by the way they talk. Therefore, in this work the speech signals are analyzed in order to build an automatic system that recognizes the human emotional state. Six discrete emotional states are considered and categorized in this research: anger, happiness, fear, surprise, sadness, and disgust. A set of novel spectral features is proposed in this contribution. Two approaches are applied and the results are compared. In the first approach, all the acoustic features are extracted from consecutive frames along the speech signals. The statistical values of the features constitute the feature vectors. A Support Vector Machine (SVM), a relatively new approach in the field of machine learning, is used to classify the emotional states. In the second approach, spectral features are extracted from non-overlapping, logarithmically spaced frequency sub-bands. In order to make use of all the extracted information, sequence-discriminant SVMs are adopted. The empirical results show that the employed techniques are very promising.
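The first approach, frame-level features collapsed into per-utterance statistics and fed to an SVM, can be sketched as follows. Synthetic features stand in for the spectral measurements, only three of the six emotion classes are shown for brevity, and a standard scikit-learn SVC is used rather than the sequence-discriminant variant of the second approach:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

def utterance_features(frames):
    """Collapse frame-level features into one statistics vector
    (mean and standard deviation per coefficient)."""
    return np.concatenate([frames.mean(axis=0), frames.std(axis=0)])

# Hypothetical corpus: 60 utterances per emotion, each a matrix of
# frame-level spectral features (synthetic stand-ins here).
X, y = [], []
for label, mu in enumerate([0.0, 1.5, 3.0]):  # 3 illustrative classes
    for _ in range(60):
        frames = rng.normal(mu, 1.0, size=(100, 12))
        X.append(utterance_features(frames))
        y.append(label)
X, y = np.array(X), np.array(y)

Xtr, Xte, ytr, yte = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)
clf = SVC(kernel="rbf").fit(Xtr, ytr)
print(f"held-out accuracy: {clf.score(Xte, yte):.2f}")
```

Averaging over 100 frames makes the synthetic classes nearly separable, so the accuracy here is close to 1.0; real emotional speech yields far noisier statistics vectors.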


Revista CEFAC ◽  
2019 ◽  
Vol 21 (6) ◽  
Author(s):  
Flávia Viegas ◽  
Danieli Viegas ◽  
Glaucio Serra Guimarães ◽  
Margareth Maria Gomes de Souza ◽  
Ronir Raggio Luiz ◽  
...  

ABSTRACT Purpose: to compare the measurements of fundamental frequency (F0) and of the first two formant frequencies (F1 and F2) of the seven oral vowels of Brazilian Portuguese in two speech tasks, in adults without voice and speech disorders. Methods: eighty participants aged 18 to 40 years, paired by gender, were selected after orofacial, orthodontic, and auditory-perceptual assessments of voice and speech. The speech signals were obtained from carrier phrases and sustained vowels, and the values of F0 and the frequencies of F1 and F2 were estimated. Differences were verified through the t-test, and the effect size was calculated. Results: differences were found in the F0 measurements between the two speech tasks in two vowels in males and in five vowels in females. In the F1 frequencies, differences were noted in six vowels in men and in two in women. In the F2 frequencies, there was a difference in four vowels in men and in three in women. Conclusion: based on the differences found, it is concluded that the speech task used to evaluate fundamental frequency and formant frequencies in Brazilian Portuguese can show distinct results in both glottal and supraglottal measures in the production of the different oral vowels of this language. Thus, it is suggested that clinicians and researchers consider both forms of emission for a more accurate interpretation of the implications of these data in the evaluation of oral communication and therapeutic conduct.
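Formant frequencies like F1 and F2 are commonly estimated from the pole angles of a linear predictive coding (LPC) fit. The sketch below shows that idea on a synthetic two-tone frame; it is an illustration of the general technique, not the study's measurement protocol, and clinical analysis uses dedicated tools with more careful preprocessing:

```python
import numpy as np

def lpc(signal, order):
    """Autocorrelation-method LPC coefficients (a[0] == 1)."""
    r = np.correlate(signal, signal, mode="full")[len(signal) - 1:]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, -r[1:order + 1])
    return np.concatenate([[1.0], a])

def formants(frame, sr, order):
    """Estimate formant frequencies (Hz) from LPC pole angles.
    For real speech a common rule of thumb is order ~ 2 + sr/1000."""
    frame = frame * np.hamming(len(frame))
    roots = np.roots(lpc(frame, order))
    roots = roots[np.imag(roots) > 0]  # one root per conjugate pair
    return np.sort(np.angle(roots) * sr / (2 * np.pi))

sr = 8000
t = np.arange(int(0.03 * sr)) / sr
# Synthetic frame with spectral energy near 700 Hz and 1200 Hz,
# mimicking two "formant" peaks.
frame = np.sin(2 * np.pi * 700 * t) + 0.5 * np.sin(2 * np.pi * 1200 * t)
print(formants(frame, sr, order=4))  # should land roughly near 700 and 1200 Hz
```

F0, by contrast, is typically estimated with a separate pitch tracker (autocorrelation- or cepstrum-based) rather than from the LPC envelope.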


2015 ◽  
Vol 14 ◽  
pp. 57-76
Author(s):  
Hasrul Mohd Nazid ◽  
Hariharan Muthusamy ◽  
Vikneswaran Vijean ◽  
Sazali Yaacob

2017 ◽  
Vol 41 (S1) ◽  
pp. S190-S190
Author(s):  
V.P. Bozikas ◽  
S. Tsotsi ◽  
A. Dardagani ◽  
E. Dandi ◽  
E.I. Nazlidou ◽  
...  

Deficits in emotion perception in patients with a first episode of psychosis have been reported by many researchers. Until now, training programs have focused mainly on patients with schizophrenia rather than on first-episode psychosis (FEP) patients. We used a new intervention for facial affect recognition in a group of 35 FEP patients (26 male). The emotion recognition intervention included coloured pictures of individuals expressing six basic emotions (happiness, sadness, anger, disgust, surprise, fear) and a neutral expression. The patients were trained to detect changes in facial features according to the emotion displayed. A comprehensive battery of neuropsychological tests was also administered, measuring attention, memory, working memory, visuospatial ability, and executive function using specific tests of the Cambridge Neuropsychological Test Automated Battery (CANTAB). We explored whether cognitive performance can explain the difference noted between the original assessment of emotion recognition and the post-intervention assessment. According to our data, overall cognitive performance did not correlate with post-intervention change in emotion recognition, nor did any specific cognitive domain. This finding may suggest that interventions for emotion recognition target specific processes that underlie emotion perception, and that their effect can be independent of general cognitive function. Disclosure of interest: The authors have not supplied their declaration of competing interest.


Author(s):  
Riadh Ouerchefani ◽  
Naoufel Ouerchefani ◽  
Mohamed Riadh Ben Rejeb ◽  
Didier Le Gall

Abstract Objective: Patients with prefrontal cortex damage often transgress social rules and show lower accuracy in identifying and explaining inappropriate social behavior. The objective of this study was to examine the relationship between the ability to perceive others' unintentional transgressions of social norms and both decision making and emotion recognition, as these abilities are critical for appropriate social behavior. Method: We examined a group of patients with focal prefrontal cortex damage (N = 28) and a group of matched control participants (N = 28) for their abilities to detect unintentional transgressions of social norms using the "Faux-Pas" task of theory of mind, to make advantageous decisions on the Iowa gambling task, and to recognize basic emotions on the Ekman facial affect test. Results: The group of patients with frontal lobe damage was impaired on all of these tasks compared with control participants. Moreover, performance on the "Faux-Pas", Iowa gambling, and emotion recognition tasks was significantly associated with and predicted by executive measures of inhibition, flexibility, or planning. However, only measures from the Iowa gambling task were associated with and predicted performance on the "Faux-Pas" task; these tasks were not associated with performance in recognition of basic emotions. These findings suggest that theory of mind, executive functions, and decision-making abilities act in an interdependent way for appropriate social behavior, whereas theory of mind and emotion recognition seem to have distinct but additive effects upon social behavior. Results from a voxel-based lesion-symptom mapping (VLSM) analysis corroborate these data by showing a partially overlapping prefrontal circuitry underlying these cognitive domains.
