Emotional Speech
Recently Published Documents

TOTAL DOCUMENTS: 584 (FIVE YEARS: 152)
H-INDEX: 27 (FIVE YEARS: 2)

Author(s):  
Chieh Kao ◽  
Maria D. Sera ◽  
Yang Zhang

Purpose: The aim of this study was to investigate infants' listening preference for emotional prosodies in spoken words and identify their acoustic correlates. Method: Forty-six 3- to 12-month-old infants (M age = 7.6 months) completed a central fixation (or look-to-listen) paradigm in which four emotional prosodies (happy, sad, angry, and neutral) were presented. Infants' looking time to the string of words was recorded as a proxy of their listening attention. Five acoustic variables—mean fundamental frequency (F0), word duration, intensity variation, harmonics-to-noise ratio (HNR), and spectral centroid—were also analyzed to account for infants' attentiveness to each emotion. Results: Infants generally preferred affective over neutral prosody, with more listening attention to the happy and sad voices. Happy sounds with breathy voice quality (low HNR) and less brightness (low spectral centroid) maintained infants' attention more. Sad speech with shorter word duration (i.e., faster speech rate), less breathiness, and more brightness gained infants' attention more than happy speech did. Infants listened less to angry than to happy and sad prosodies, and none of the acoustic variables were associated with infants' listening interests in angry voices. Neutral words with a lower F0 attracted infants' attention more than those with a higher F0. Neither age nor sex effects were observed. Conclusions: This study provides evidence for infants' sensitivity to the prosodic patterns of the basic emotion categories in spoken words and for how the acoustic properties of emotional speech may guide their attention. The results point to the need to study the interplay between early socioaffective and language development.
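The five acoustic variables listed above map onto standard open-source tooling. The sketch below is one illustrative way to approximate them for a single recorded word, assuming librosa for F0, duration, intensity variation and spectral centroid, and praat-parselmouth for the harmonics-to-noise ratio; the file path, pitch range and frame settings are placeholders, not values from the study.

```python
# Sketch: approximating the five acoustic variables described above for one word recording.
# Assumes librosa and praat-parselmouth are installed; paths and parameter values are illustrative.
import numpy as np
import librosa
import parselmouth

def word_acoustics(path):
    y, sr = librosa.load(path, sr=None)

    # Mean fundamental frequency (F0), Hz, via the YIN estimator
    f0 = librosa.yin(y, fmin=75, fmax=600, sr=sr)
    mean_f0 = float(np.nanmean(f0))

    # Word duration, seconds
    duration = len(y) / sr

    # Intensity variation: standard deviation of frame-level RMS energy in dB
    rms_db = librosa.amplitude_to_db(librosa.feature.rms(y=y)[0])
    intensity_var = float(np.std(rms_db))

    # Harmonics-to-noise ratio (HNR), dB, via Praat's harmonicity analysis
    harmonicity = parselmouth.Sound(path).to_harmonicity()
    voiced = harmonicity.values[harmonicity.values != -200]  # -200 marks unvoiced frames
    hnr = float(np.mean(voiced))

    # Spectral centroid, Hz (a proxy for "brightness")
    centroid = float(np.mean(librosa.feature.spectral_centroid(y=y, sr=sr)))

    return dict(mean_f0=mean_f0, duration=duration,
                intensity_var=intensity_var, hnr=hnr, centroid=centroid)
```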


2022 ◽  
Vol 3 (4) ◽  
pp. 295-307
Author(s):  
Subarna Shakya

Personal computer-based data collection and analysis systems may now be more resilient due to recent advances in digital signal processing technology. Speaker recognition is a signal processing approach that uses information contained in voice waves to automatically identify the speaker. This study examines systems that can recognize a wide range of emotional states in speech from a single source. Because it offers insight into human brain states, emotion recognition is an active topic in the development of human-computer interfaces for speech processing, where recognizing the emotional state of the user is often necessary. This research analyses an effort to discern emotional states such as anger, joy, neutral, fear and sadness using classification methods. An acoustic feature measuring unpredictability is used in conjunction with a non-linear signal quantification approach to identify emotions. The entropy measurements computed for each emotional signal are assembled into a feature vector that captures its unpredictability. The acoustic features extracted from the speech signal are then used to train the proposed neural network, whose outputs are passed to a linear discriminant analysis (LDA) stage for further classification. In addition, this article compares the proposed work with modern classifiers such as k-nearest neighbors, support vector machines and linear discriminant analysis. The proposed algorithm thus combines acoustic feature extraction with LDA-based classification. Its main advantage is that it separates negative and positive emotion features and yields good classification results. According to the results of efficient cross-validation on an accessible emotional speech dataset within the proposed framework, a single-source LDA classifier can recognize emotions in speech signals with above 90 percent accuracy across the various emotional states.
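As a rough illustration of the entropy-plus-LDA pipeline described above (not the authors' implementation), the sketch below computes frame-wise spectral entropy as an unpredictability measure, pools it into a feature vector per utterance, and cross-validates a scikit-learn linear discriminant analysis classifier; the window length, summary statistics and data-loading step are assumptions.

```python
# Illustrative sketch of an entropy-based feature vector fed to an LDA classifier.
# Not the paper's implementation; window size, features, and data loading are assumed.
import numpy as np
from scipy.signal import stft
from scipy.stats import entropy
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

def entropy_features(signal, sr, n_fft=512):
    """Shannon entropy of the normalized power spectrum, per frame, summarized."""
    _, _, Z = stft(signal, fs=sr, nperseg=n_fft)
    power = np.abs(Z) ** 2
    p = power / (power.sum(axis=0, keepdims=True) + 1e-12)   # per-frame spectral distribution
    frame_entropy = entropy(p, base=2, axis=0)                # unpredictability per frame
    return np.array([frame_entropy.mean(), frame_entropy.std(),
                     frame_entropy.min(), frame_entropy.max()])

# X: one entropy feature vector per utterance; y: emotion labels (anger, joy, neutral, ...).
# signals, srs, labels would come from an emotional-speech corpus loader (assumed).
# X = np.vstack([entropy_features(s, sr) for s, sr in zip(signals, srs)])
# y = np.array(labels)
# clf = LinearDiscriminantAnalysis()
# print(cross_val_score(clf, X, y, cv=5).mean())   # cross-validated accuracy
```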


2021 ◽  
Vol 11 (24) ◽  
pp. 11748
Author(s):  
Jiří Přibil ◽  
Anna Přibilová ◽  
Ivan Frollo

This paper deals with two modalities for stress detection and evaluation—vowel phonation speech signal and photo-plethysmography (PPG) signal. The main measurement is carried out in four phases representing different stress conditions for the tested person. The first and last phases are realized in laboratory conditions. During the middle two phases, the PPG and phonation signals are recorded inside a magnetic resonance imaging scanner working with a weak magnetic field of up to 0.2 T, in a silent state and/or with a running scan sequence. From the recorded phonation signal, different speech features are determined for statistical analysis and evaluation by a Gaussian mixture model (GMM) classifier. A database of affective sounds and two databases of emotional speech were used for GMM creation and training. The second part of the developed method compares the results obtained from the statistical description of the sensed PPG wave with the determined heart rate and Oliva–Roztocil index values. The fusion of the results obtained from both modalities gives the final stress level. The performed experiments confirm our working assumption that a fusion of both types of analysis is usable for this task: the final stress level values give better results than the speech or PPG signals alone.
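A minimal sketch of the GMM classification stage mentioned above, assuming scikit-learn's GaussianMixture with one model per stress class and classification by highest log-likelihood; the number of mixture components, covariance type and feature vectors are placeholders rather than the authors' settings.

```python
# Sketch: one Gaussian mixture model per class, classification by highest log-likelihood.
# Feature vectors (e.g., from phonation recordings) and component counts are assumed.
import numpy as np
from sklearn.mixture import GaussianMixture

class GMMClassifier:
    def __init__(self, n_components=8):
        self.n_components = n_components
        self.models = {}

    def fit(self, X, y):
        # Train a separate GMM on the feature vectors of each class
        for label in np.unique(y):
            gmm = GaussianMixture(n_components=self.n_components,
                                  covariance_type="diag", random_state=0)
            gmm.fit(X[y == label])
            self.models[label] = gmm
        return self

    def predict(self, X):
        labels = list(self.models)
        # score_samples returns the per-sample log-likelihood under each class model
        scores = np.column_stack([self.models[l].score_samples(X) for l in labels])
        return np.array(labels)[scores.argmax(axis=1)]

# Usage (placeholder data): X_train, y_train are speech feature vectors and stress-phase
# labels; each test vector is assigned the class whose GMM scores it highest.
# clf = GMMClassifier(n_components=8).fit(X_train, y_train)
# y_pred = clf.predict(X_test)
```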


Data ◽  
2021 ◽  
Vol 6 (12) ◽  
pp. 130
Author(s):  
Mathilde Marie Duville ◽  
Luz María Alonso-Valerdi ◽  
David I. Ibarra-Zarate

In this paper, the Mexican Emotional Speech Database (MESD), which contains single-word emotional utterances for anger, disgust, fear, happiness, neutral and sadness with adult (male and female) and child voices, is described. To validate the emotional prosody of the uttered words, a cubic support vector machine (SVM) classifier was trained on the basis of prosodic, spectral and voice quality features for each case study: (1) male adult, (2) female adult and (3) child. In addition, the cultural, semantic, and linguistic shaping of emotional expression was assessed by statistical analysis. This study was registered at BioMed Central and is part of the implementation of a published study protocol. Mean emotional classification accuracies were 93.3%, 89.4% and 83.3% for male, female and child utterances, respectively. Statistical analysis emphasized the shaping of emotional prosodies by semantic and linguistic features. A cultural variation in emotional expression was highlighted by comparing the MESD with the INTERFACE for Castilian Spanish database. The MESD provides reliable content for linguistic emotional prosody shaped by the Mexican cultural environment. To facilitate further investigations, two additional corpora are provided: one controlled for linguistic features and emotional semantics, and one containing words repeated across voices and emotions. The MESD is made freely available.
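The cubic SVM used for validation corresponds, in scikit-learn terms, to an SVC with a degree-3 polynomial kernel. The sketch below shows that setup on placeholder feature matrices standing in for the prosodic, spectral and voice quality features; it is not the MESD validation code.

```python
# Sketch: "cubic SVM" validation as a degree-3 polynomial-kernel SVC in scikit-learn.
# Feature matrices and labels stand in for the MESD prosodic/spectral/voice-quality features.
import numpy as np
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

def cubic_svm_accuracy(X, y, cv=5):
    """Cross-validated accuracy of a cubic (degree-3 polynomial) SVM."""
    clf = make_pipeline(StandardScaler(),
                        SVC(kernel="poly", degree=3, C=1.0, gamma="scale"))
    return cross_val_score(clf, X, y, cv=cv).mean()

# One call per case study, e.g. male adult, female adult, child (placeholder arrays):
# acc_male = cubic_svm_accuracy(X_male, y_male)
```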


Author(s):  
C. Revathy ◽  
R. Sureshbabu

Speech processing is a core area of digital signal processing concerned with the analysis of speech signals. It is applied in fields such as emotion recognition, virtual assistants and voice identification. Among these applications, emotion recognition is critical because it aims to recognize people's actual emotions and address related physiological issues. Several researchers combine signal processing and machine learning techniques to identify human emotions, but these approaches struggle to achieve high accuracy at low computational complexity. This paper introduces an intelligent computational technique called the cat swarm optimized spiking neural network (CSSPNN). First, emotional speech signals are collected from the Toronto emotional speech set (TESS) dataset and processed with a wavelet approach to extract features. The derived features are then examined by the proposed CSSPNN classifier, which recognizes human emotions through an effective training and learning process. Finally, the performance of the system is evaluated through experiments and discussion. The proposed system recognizes speech emotions with up to 99.3% accuracy, outperforming recurrent neural networks (RNNs), deep neural networks (DNNs) and deep shallow neural networks (DSNNs).
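The CSSPNN classifier itself is specific to the paper, but the wavelet feature-extraction step it builds on can be sketched with PyWavelets; the wavelet family, decomposition level and per-band statistics below are assumptions, and the resulting vectors would feed whatever classifier follows (CSSPNN in the paper).

```python
# Sketch of the wavelet feature-extraction stage; only generic statistics per sub-band are
# shown here. Wavelet family, decomposition level, and statistics are assumptions.
import numpy as np
import pywt

def wavelet_features(signal, wavelet="db4", level=4):
    """Summary statistics of each wavelet sub-band of a speech signal."""
    coeffs = pywt.wavedec(signal, wavelet, level=level)   # [approx, detail_L, ..., detail_1]
    feats = []
    for band in coeffs:
        feats.extend([band.mean(), band.std(),
                      np.abs(band).max(), np.sum(band ** 2)])  # energy per band
    return np.array(feats)

# Usage with TESS-style utterances (loading code assumed):
# X = np.vstack([wavelet_features(sig) for sig in signals])
# These vectors would then be passed to the emotion classifier (CSSPNN in the paper).
```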


Author(s):  
Igor Mandaric ◽  
Mia Vujovic ◽  
Sinisa Suzic ◽  
Tijana Nosek ◽  
Nikola Simic ◽  
...  

2021 ◽  
Author(s):  
Hubert Nourtel ◽  
Pierre Champion ◽  
Denis Jouvet ◽  
Anthony Larcher ◽  
Marie Tahon
