Emotional Speech Processing in 3- to 12-Month-Old Infants: Influences of Emotion Categories and Acoustic Parameters

Author(s):  
Chieh Kao ◽  
Maria D. Sera ◽  
Yang Zhang

Purpose: The aim of this study was to investigate infants' listening preference for emotional prosodies in spoken words and to identify their acoustic correlates. Method: Forty-six 3- to 12-month-old infants (M age = 7.6 months) completed a central fixation (or look-to-listen) paradigm in which four emotional prosodies (happy, sad, angry, and neutral) were presented. Infants' looking time to the string of words was recorded as a proxy for their listening attention. Five acoustic variables—mean fundamental frequency (F0), word duration, intensity variation, harmonics-to-noise ratio (HNR), and spectral centroid—were also analyzed to account for infants' attentiveness to each emotion. Results: Infants generally preferred affective over neutral prosody, with more listening attention to the happy and sad voices. Happy sounds with breathy voice quality (low HNR) and less brightness (low spectral centroid) maintained infants' attention more. Sad speech with shorter word duration (i.e., faster speech rate), less breathiness, and more brightness gained infants' attention more than happy speech did. Infants listened less to angry than to happy and sad prosodies, and none of the acoustic variables were associated with infants' listening interest in angry voices. Neutral words with a lower F0 attracted infants' attention more than those with a higher F0. Neither age nor sex effects were observed. Conclusions: This study provides evidence for infants' sensitivity to the prosodic patterns of the basic emotion categories in spoken words and for how the acoustic properties of emotional speech may guide their attention. The results point to the need to study the interplay between early socioaffective and language development.
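The five acoustic variables above can be extracted with standard phonetics tooling. Below is a minimal sketch using the parselmouth Python interface to Praat; the file name and analysis settings are illustrative assumptions, not the study's actual pipeline.

```python
# A sketch of the five acoustic measures, assuming the parselmouth
# wrapper around Praat; settings are common defaults, not the study's.
import numpy as np
import parselmouth
from parselmouth.praat import call

snd = parselmouth.Sound("happy_word.wav")  # hypothetical stimulus file

# Mean fundamental frequency (F0) over voiced frames
pitch = snd.to_pitch()
f0 = pitch.selected_array['frequency']
mean_f0 = f0[f0 > 0].mean()

# Word duration (s)
duration = snd.get_total_duration()

# Intensity variation: SD of the intensity contour (dB)
intensity_sd = np.std(snd.to_intensity().values)

# Harmonics-to-noise ratio (HNR, dB): mean of Praat's harmonicity track
harmonicity = call(snd, "To Harmonicity (cc)", 0.01, 75, 0.1, 1.0)
mean_hnr = call(harmonicity, "Get mean", 0, 0)

# Spectral centroid ("brightness"): the spectrum's centre of gravity (Hz)
spectrum = snd.to_spectrum()
centroid = call(spectrum, "Get centre of gravity", 2.0)

print(mean_f0, duration, intensity_sd, mean_hnr, centroid)
```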

2012 ◽  
Vol 24 (1) ◽  
pp. 1-31 ◽  
Author(s):  
Jacqui Nokes ◽  
Jennifer Hay

Abstract This paper reports on a large-scale diachronic investigation into the timing of New Zealand English (NZE), which points to changes in its rhythmic structure. The Pairwise Variability Index (PVI) was used to measure the mean variation in duration, intensity, and pitch of successive vowels in the speech of over 500 New Zealanders born between 1851 and 1988. Normalized vocalic PVIs for duration have decreased over time, after allowing for changes in speech rate, supporting existing findings that stressed and unstressed vowels are less differentiated by duration in modern NZE than in other varieties of English. Rhythmically, syllable duration may be playing a reduced role in signalling prominence in NZE. This is supported by the finding that there have been contemporaneous changes in pitch and intensity variation. We discuss external and internal influences on the timing of NZE, including contact with Māori, the emergence of Māori English, and diachronic vowel shift.
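For readers unfamiliar with the measure, the normalized PVI averages the duration differences of successive vowel pairs, scaled by each pair's mean so that the index is insensitive to overall speech rate. A small self-contained sketch with invented durations:

```python
import numpy as np

def npvi(durations):
    """Normalized Pairwise Variability Index over successive durations.

    nPVI = 100 * mean( |d_k - d_{k+1}| / ((d_k + d_{k+1}) / 2) )
    """
    d = np.asarray(durations, dtype=float)
    diffs = np.abs(d[:-1] - d[1:]) / ((d[:-1] + d[1:]) / 2.0)
    return 100.0 * diffs.mean()

# Illustrative vowel durations (seconds) for one utterance; a lower nPVI
# means stressed and unstressed vowels are less differentiated by duration.
print(npvi([0.12, 0.06, 0.14, 0.05, 0.11]))
```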


2021 ◽  
pp. 1-26
Author(s):  
Teresa Pratt

Abstract This article argues for a focus on affect in sociolinguistic style. I integrate recent scholarship on affective practice (Wetherell 2015) and the circulation of affective value (Ahmed 2004b) in order to situate the linguistic and bodily semiotics of affect as components of stylistic practice. At a Bay Area public arts high school, ideologically distinct affects of chill or high-energy are co-constructed across signs and subjects. I analyze a group of cisgender young men's use of creaky voice quality, speech rate, and bodily hexis in enacting and circulating these affective values. Crucially, affect co-constructs students’ positioning within the high school political economy (as college-bound or not, artistically driven or not), highlighting the ideological motivations of stylistic practice. Building on recent scholarship, I propose that a more thorough consideration of affect can deepen our understanding of meaning-making as it occurs in everyday interaction in institutional settings. (Affect, political economy, embodiment, bricolage, voice quality, speech rate, high school)


2021 ◽  
pp. 2150022
Author(s):  
Caio Cesar Enside de Abreu ◽  
Marco Aparecido Queiroz Duarte ◽  
Bruno Rodrigues de Oliveira ◽  
Jozue Vieira Filho ◽  
Francisco Villarreal

Speech processing systems are very important in different applications involving speech and voice quality, such as automatic speech recognition, forensic phonetics, and speech enhancement, among others. In most of them, acoustic environmental noise is added to the original signal, decreasing the signal-to-noise ratio (SNR) and, consequently, the speech quality. Estimating noise is therefore one of the most important steps in speech processing, whether the goal is to reduce the noise before further processing or to design robust algorithms. In this paper, a new approach to estimate noise from speech signals is presented and its effectiveness is tested in the speech enhancement context. For this purpose, partial least squares (PLS) regression is used to model the acoustic environment (AE), and a Wiener filter based on a priori SNR estimation is implemented to evaluate the proposed approach. Six noise types are used to create seven acoustically modeled noises. The basic idea is to use the AE model to identify the noise type and estimate its power, which is then used in a speech processing system. Speech signals processed using the proposed method and classical noise estimators are evaluated through objective measures. Results show that the proposed method produces better speech quality than state-of-the-art noise estimators, enabling its use in real-time applications in the fields of robotics, telecommunications, and acoustic analysis.
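The evaluation stage described here, a Wiener filter driven by an a priori SNR estimate, is a classical construction. The sketch below shows the textbook decision-directed form; the noise power spectrum is assumed to come from an external estimator (in the paper, the PLS-based AE model), and the smoothing factor is a conventional choice rather than the paper's exact setting.

```python
import numpy as np

def wiener_enhance(stft_noisy, noise_psd, alpha=0.98):
    """Decision-directed a priori SNR estimation with a Wiener gain.

    stft_noisy: complex STFT, shape (n_freq, n_frames)
    noise_psd:  per-bin noise power estimate, shape (n_freq,)
    alpha:      smoothing factor for the decision-directed recursion
    """
    n_freq, n_frames = stft_noisy.shape
    enhanced = np.zeros_like(stft_noisy)
    prev_clean_power = np.zeros(n_freq)
    for t in range(n_frames):
        noisy_power = np.abs(stft_noisy[:, t]) ** 2
        snr_post = noisy_power / noise_psd              # a posteriori SNR
        # Decision-directed a priori SNR: mix the previous frame's clean
        # estimate with the current instantaneous SNR.
        snr_prio = (alpha * prev_clean_power / noise_psd
                    + (1.0 - alpha) * np.maximum(snr_post - 1.0, 0.0))
        gain = snr_prio / (1.0 + snr_prio)              # Wiener gain
        enhanced[:, t] = gain * stft_noisy[:, t]
        prev_clean_power = np.abs(enhanced[:, t]) ** 2  # for next frame
    return enhanced
```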


2021 ◽  
Author(s):  
Joshua Penney ◽  
Andy Gibson ◽  
Felicity Cox ◽  
Michael Proctor ◽  
Anita Szakay

2002 ◽  
Vol 45 (4) ◽  
pp. 689-699 ◽  
Author(s):  
Donald G. Jamieson ◽  
Vijay Parsa ◽  
Moneca C. Price ◽  
James Till

We investigated how standard speech coders, currently used in modern communication systems, affect the quality of the speech of persons who have common speech and voice disorders. Three standardized speech coders (GSM 6.10 RPE-LTP, FS1016 CELP, and FS1015 LPC) and two speech coders based on subband processing were evaluated for their performance. Coder effects were assessed by measuring the quality of speech samples both before and after processing by the speech coders. Speech quality was rated by 10 listeners with normal hearing on 28 different scales representing pitch and loudness changes, speech rate, laryngeal and resonatory dysfunction, and coder-induced distortions. Results showed that (a) nine scale items were consistently and reliably rated by the listeners; (b) all coders degraded speech quality on these nine scales, with the GSM and CELP coders providing the better quality speech; and (c) interactions between coders and individual voices did occur on several voice quality scales.


1998 ◽  
Vol 41 (6) ◽  
pp. 1265-1281 ◽  
Author(s):  
Ludo Max ◽  
Anthony J. Caruso

This study is part of a series investigating the hypothesis that stuttering adaptation is a result of motor learning. Previous investigations indicate that nonspeech motor learning typically is associated with an increase in speed of performance. Previous investigations of stuttering, on the other hand, indicate that improvements in fluency during most fluency-enhancing conditions or after stuttering treatment tend to be associated with decreased speech rate, increased duration of specific acoustic segments, and decreased vowel duration variability. The present acoustic findings, obtained from 8 individuals who stutter, reveal that speech adjustments occurring during adaptation differ from those reported for other fluency-enhancing conditions or stuttering treatment. Instead, the observed changes are consistent with those occurring during skill improvements for nonspeech motor tasks and, thus, with a motor learning hypothesis of stuttering adaptation. During the last of 6 repeated readings, a statistically significant increase in articulation rate was observed, together with a decrease in word duration, vowel duration, and consonant-vowel (CV) transition extent. Other adjustments showing relatively consistent trends across individual subjects included decreased CV transition rate and duration, and increased variability of both CV transition extent and vowel duration.
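For illustration only (not the authors' analysis code), the sketch below computes an articulation rate and a simple vowel-duration variability measure from hypothetical segment annotations of one reading.

```python
import numpy as np

# Hypothetical (start, end, label) word annotations and vowel intervals,
# in seconds; values are invented, not from the study.
words = [(0.00, 0.42, "the"), (0.42, 1.10, "rainbow"), (1.10, 1.65, "is")]
vowels = [(0.10, 0.22), (0.55, 0.70), (0.80, 0.95), (1.30, 1.48)]
n_syllables = 5  # hypothetical syllable count for the stretch above

# Articulation rate: syllables per second of speaking time
total_speech_time = sum(end - start for start, end, _ in words)
articulation_rate = n_syllables / total_speech_time

# Vowel duration variability: coefficient of variation, a common
# rate-insensitive choice (the study's exact measure may differ)
vowel_durations = np.array([end - start for start, end in vowels])
vowel_cv = vowel_durations.std() / vowel_durations.mean()

print(articulation_rate, vowel_cv)
```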


2014 ◽  
Vol 5 (1) ◽  
pp. 1-11 ◽  
Author(s):  
Mohammad Rabiei ◽  
Alessandro Gasparetto

Abstract A system for recognition of emotions based on speech analysis can have interesting applications in human-robot interaction. In this paper, we carry out an exploratory study on the possibility of using a proposed methodology to recognize basic emotions (sadness, surprise, happiness, anger, fear, and disgust) based on the phonetic and acoustic properties of emotive speech, with minimal use of signal processing algorithms. We set up an experimental test consisting of three types of speakers, namely: (i) five adult European speakers, (ii) five adult Asian (Middle East) speakers, and (iii) five adult American speakers. The speakers had to repeat 6 sentences in English (with durations typically between 1 s and 3 s) in order to emphasize rising-falling intonation and pitch movement. Intensity, pitch peak and range, and speech rate were evaluated. The proposed methodology consists of generating and analyzing graphs of formants, pitch, and intensity, using the open-source Praat program. From the experimental results, it was possible to recognize the basic emotions in most cases.
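A rough sketch of how such measurements can be pulled from a recording with the parselmouth Python interface to Praat; the file name, word count, and measurement points are illustrative assumptions, not the paper's procedure.

```python
import parselmouth

snd = parselmouth.Sound("utterance.wav")  # hypothetical recording

# Pitch peak and range (Hz) over voiced frames
pitch = snd.to_pitch()
f0 = pitch.selected_array['frequency']
voiced = f0[f0 > 0]
pitch_peak = voiced.max()
pitch_range = voiced.max() - voiced.min()

# Mean intensity (dB)
mean_intensity = snd.to_intensity().values.mean()

# First two formants at the utterance midpoint (Hz)
formants = snd.to_formant_burg()
t_mid = snd.get_total_duration() / 2
f1 = formants.get_value_at_time(1, t_mid)
f2 = formants.get_value_at_time(2, t_mid)

# Crude speech-rate proxy: words per second, given a known transcript
n_words = 6  # hypothetical word count for this sentence
speech_rate = n_words / snd.get_total_duration()

print(pitch_peak, pitch_range, mean_intensity, f1, f2, speech_rate)
```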


2021 ◽  
Vol 26 (1) ◽  
pp. 7-19
Author(s):  
Valentina Baić ◽  
Zvonimir Ivanović ◽  
Milan Veljković

The paper presents research aimed at analysing the frequency of verbal and vocal signs in false and true statements when a secondary task is introduced. The research involved 100 students (47 men and 53 women) of the master's programme in criminal investigation at the University of Criminal Investigation and Police Studies, aged 23-44. Based on the observation of twenty selected videos (10 true statements and 10 false statements), the students' task was to mark the frequency of each individual verbal and vocal sign on a previously generated and prepared list. The results show statistically significant differences between false and true statements in the frequency of all verbal and vocal signs: response latency, speech hesitation, speech errors, speech rate, number of spoken words in the utterance, and length of utterance. Response latency, speech hesitation, and speech errors have higher median values in false utterances than in true ones, while speech rate, number of words spoken, and length of utterance show higher median values in true than in false utterances.
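As an illustration of how such median comparisons are commonly tested (the abstract does not name the test used, and the counts below are invented), a nonparametric comparison in Python might look like this:

```python
import numpy as np
from scipy.stats import mannwhitneyu

# Hypothetical per-video frequency counts of one vocal sign (e.g., speech
# hesitations) in false vs. true statements; numbers are illustrative only.
false_counts = np.array([5, 7, 6, 8, 4, 9, 6, 7, 5, 8])
true_counts = np.array([2, 3, 1, 4, 2, 3, 2, 1, 3, 2])

stat, p = mannwhitneyu(false_counts, true_counts, alternative='two-sided')
print(f"U = {stat}, p = {p:.4f}")
print(f"median(false) = {np.median(false_counts)}, "
      f"median(true) = {np.median(true_counts)}")
```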

