Emotional Speech Processing in 3- to 12-Month-Old Infants: Influences of Emotion Categories and Acoustic Parameters

Author(s):  
Chieh Kao ◽  
Maria D. Sera ◽  
Yang Zhang

Purpose: The aim of this study was to investigate infants' listening preference for emotional prosodies in spoken words and to identify their acoustic correlates. Method: Forty-six 3- to 12-month-old infants (M age = 7.6 months) completed a central fixation (or look-to-listen) paradigm in which four emotional prosodies (happy, sad, angry, and neutral) were presented. Infants' looking time to the string of words was recorded as a proxy for their listening attention. Five acoustic variables—mean fundamental frequency (F0), word duration, intensity variation, harmonics-to-noise ratio (HNR), and spectral centroid—were also analyzed to account for infants' attentiveness to each emotion. Results: Infants generally preferred affective over neutral prosody, with more listening attention to the happy and sad voices. Happy sounds with breathy voice quality (low HNR) and less brightness (low spectral centroid) maintained infants' attention more. Sad speech with shorter word duration (i.e., faster speech rate), less breathiness, and more brightness gained infants' attention more than happy speech did. Infants listened less to angry than to happy and sad prosodies, and none of the acoustic variables were associated with infants' listening interest in angry voices. Neutral words with a lower F0 attracted infants' attention more than those with a higher F0. Neither age nor sex effects were observed. Conclusions: This study provides evidence for infants' sensitivity to the prosodic patterns of the basic emotion categories in spoken words and for how the acoustic properties of emotional speech may guide their attention. The results point to the need to study the interplay between early socioaffective and language development.
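The five acoustic variables above can be extracted with standard phonetics tooling. Below is a minimal sketch using the parselmouth Python interface to Praat; the file name and analysis settings are illustrative assumptions, not the study's actual pipeline.

```python
# A sketch of the five acoustic measures, assuming the parselmouth
# wrapper around Praat; settings are common defaults, not the study's.
import numpy as np
import parselmouth
from parselmouth.praat import call

snd = parselmouth.Sound("happy_word.wav")  # hypothetical stimulus file

# Mean fundamental frequency (F0) over voiced frames
pitch = snd.to_pitch()
f0 = pitch.selected_array['frequency']
mean_f0 = f0[f0 > 0].mean()

# Word duration (s)
duration = snd.get_total_duration()

# Intensity variation: SD of the intensity contour (dB)
intensity_sd = np.std(snd.to_intensity().values)

# Harmonics-to-noise ratio (HNR, dB): mean of Praat's harmonicity track
harmonicity = call(snd, "To Harmonicity (cc)", 0.01, 75, 0.1, 1.0)
mean_hnr = call(harmonicity, "Get mean", 0, 0)

# Spectral centroid ("brightness"): the spectrum's centre of gravity (Hz)
spectrum = snd.to_spectrum()
centroid = call(spectrum, "Get centre of gravity", 2.0)

print(mean_f0, duration, intensity_sd, mean_hnr, centroid)
```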

2012 ◽  
Vol 24 (1) ◽  
pp. 1-31 ◽  
Author(s):  
Jacqui Nokes ◽  
Jennifer Hay

Abstract This paper reports on a large-scale diachronic investigation into the timing of New Zealand English (NZE), which points to changes in its rhythmic structure. The Pairwise Variability Index (PVI) was used to measure the mean variation in duration, intensity, and pitch of successive vowels in the speech of over 500 New Zealanders born between 1851 and 1988. Normalized vocalic PVIs for duration have decreased over time, after allowing for changes in speech rate, supporting existing findings that stressed and unstressed vowels are less differentiated by duration in modern NZE than in other varieties of English. Rhythmically, syllable duration may be playing a reduced role in signalling prominence in NZE. This is supported by the finding that there have been contemporaneous changes in pitch and intensity variation. We discuss external and internal influences on the timing of NZE, including contact with Māori, the emergence of Māori English, and diachronic vowel shift.
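For readers unfamiliar with the measure, the normalized PVI averages the duration differences of successive vowel pairs, scaled by each pair's mean so that the index is insensitive to overall speech rate. A small self-contained sketch with invented durations:

```python
import numpy as np

def npvi(durations):
    """Normalized Pairwise Variability Index over successive durations.

    nPVI = 100 * mean( |d_k - d_{k+1}| / ((d_k + d_{k+1}) / 2) )
    """
    d = np.asarray(durations, dtype=float)
    diffs = np.abs(d[:-1] - d[1:]) / ((d[:-1] + d[1:]) / 2.0)
    return 100.0 * diffs.mean()

# Illustrative vowel durations (seconds) for one utterance; a lower nPVI
# means stressed and unstressed vowels are less differentiated by duration.
print(npvi([0.12, 0.06, 0.14, 0.05, 0.11]))
```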


2021 ◽  
pp. 1-26
Author(s):  
Teresa Pratt

Abstract This article argues for a focus on affect in sociolinguistic style. I integrate recent scholarship on affective practice (Wetherell 2015) and the circulation of affective value (Ahmed 2004b) in order to situate the linguistic and bodily semiotics of affect as components of stylistic practice. At a Bay Area public arts high school, ideologically distinct affects of chill or high-energy are co-constructed across signs and subjects. I analyze a group of cisgender young men's use of creaky voice quality, speech rate, and bodily hexis in enacting and circulating these affective values. Crucially, affect co-constructs students’ positioning within the high school political economy (as college-bound or not, artistically driven or not), highlighting the ideological motivations of stylistic practice. Building on recent scholarship, I propose that a more thorough consideration of affect can deepen our understanding of meaning-making as it occurs in everyday interaction in institutional settings. (Affect, political economy, embodiment, bricolage, voice quality, speech rate, high school)


2021 ◽  
pp. 2150022
Author(s):  
Caio Cesar Enside de Abreu ◽  
Marco Aparecido Queiroz Duarte ◽  
Bruno Rodrigues de Oliveira ◽  
Jozue Vieira Filho ◽  
Francisco Villarreal

Speech processing systems are very important in different applications involving speech and voice quality, such as automatic speech recognition, forensic phonetics, and speech enhancement, among others. In most of them, acoustic environmental noise is added to the original signal, decreasing the signal-to-noise ratio (SNR) and, consequently, the speech quality. Estimating noise is therefore one of the most important steps in speech processing, whether the goal is to reduce the noise before further processing or to design robust algorithms. In this paper, a new approach to estimate noise from speech signals is presented and its effectiveness is tested in the speech enhancement context. For this purpose, partial least squares (PLS) regression is used to model the acoustic environment (AE), and a Wiener filter based on a priori SNR estimation is implemented to evaluate the proposed approach. Six noise types are used to create seven acoustically modeled noises. The basic idea is to use the AE model to identify the noise type and estimate its power, which is then used in a speech processing system. Speech signals processed using the proposed method and classical noise estimators are evaluated through objective measures. Results show that the proposed method produces better speech quality than state-of-the-art noise estimators, enabling its use in real-time applications in the fields of robotics, telecommunications, and acoustic analysis.
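The evaluation stage described here, a Wiener filter driven by an a priori SNR estimate, is a classical construction. The sketch below shows the textbook decision-directed form; the noise power spectrum is assumed to come from an external estimator (in the paper, the PLS-based AE model), and the smoothing factor is a conventional choice rather than the paper's exact setting.

```python
import numpy as np

def wiener_enhance(stft_noisy, noise_psd, alpha=0.98):
    """Decision-directed a priori SNR estimation with a Wiener gain.

    stft_noisy: complex STFT, shape (n_freq, n_frames)
    noise_psd:  per-bin noise power estimate, shape (n_freq,)
    alpha:      smoothing factor for the decision-directed recursion
    """
    n_freq, n_frames = stft_noisy.shape
    enhanced = np.zeros_like(stft_noisy)
    prev_clean_power = np.zeros(n_freq)
    for t in range(n_frames):
        noisy_power = np.abs(stft_noisy[:, t]) ** 2
        snr_post = noisy_power / noise_psd              # a posteriori SNR
        # Decision-directed a priori SNR: mix the previous frame's clean
        # estimate with the current instantaneous SNR.
        snr_prio = (alpha * prev_clean_power / noise_psd
                    + (1.0 - alpha) * np.maximum(snr_post - 1.0, 0.0))
        gain = snr_prio / (1.0 + snr_prio)              # Wiener gain
        enhanced[:, t] = gain * stft_noisy[:, t]
        prev_clean_power = np.abs(enhanced[:, t]) ** 2  # for next frame
    return enhanced
```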


2021 ◽  
Author(s):  
Joshua Penney ◽  
Andy Gibson ◽  
Felicity Cox ◽  
Michael Proctor ◽  
Anita Szakay

2002 ◽  
Vol 45 (4) ◽  
pp. 689-699 ◽  
Author(s):  
Donald G. Jamieson ◽  
Vijay Parsa ◽  
Moneca C. Price ◽  
James Till

We investigated how standard speech coders, currently used in modern communication systems, affect the quality of the speech of persons who have common speech and voice disorders. Three standardized speech coders (GSM 6.10 RPE-LTP, FS1016 CELP, and FS1015 LPC) and two speech coders based on subband processing were evaluated for their performance. Coder effects were assessed by measuring the quality of speech samples both before and after processing by the speech coders. Speech quality was rated by 10 listeners with normal hearing on 28 different scales representing pitch and loudness changes, speech rate, laryngeal and resonatory dysfunction, and coder-induced distortions. Results showed that (a) nine scale items were consistently and reliably rated by the listeners; (b) all coders degraded speech quality on these nine scales, with the GSM and CELP coders providing the better quality speech; and (c) interactions between coders and individual voices did occur on several voice quality scales.


1998 ◽  
Vol 41 (6) ◽  
pp. 1265-1281 ◽  
Author(s):  
Ludo Max ◽  
Anthony J. Caruso

This study is part of a series investigating the hypothesis that stuttering adaptation is a result of motor learning. Previous investigations indicate that nonspeech motor learning typically is associated with an increase in speed of performance. Previous investigations of stuttering, on the other hand, indicate that improvements in fluency during most fluency-enhancing conditions or after stuttering treatment tend to be associated with decreased speech rate, increased duration of specific acoustic segments, and decreased vowel duration variability. The present acoustic findings, obtained from 8 individuals who stutter, reveal that speech adjustments occurring during adaptation differ from those reported for other fluency-enhancing conditions or stuttering treatment. Instead, the observed changes are consistent with those occurring during skill improvements for nonspeech motor tasks and, thus, with a motor learning hypothesis of stuttering adaptation. During the last of 6 repeated readings, a statistically significant increase in articulation rate was observed, together with a decrease in word duration, vowel duration, and consonant-vowel (CV) transition extent. Other adjustments showing relatively consistent trends across individual subjects included decreased CV transition rate and duration, and increased variability of both CV transition extent and vowel duration.
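For illustration only (not the authors' analysis code), the sketch below computes an articulation rate and a simple vowel-duration variability measure from hypothetical segment annotations of one reading.

```python
import numpy as np

# Hypothetical (start, end, label) word annotations and vowel intervals,
# in seconds; values are invented, not from the study.
words = [(0.00, 0.42, "the"), (0.42, 1.10, "rainbow"), (1.10, 1.65, "is")]
vowels = [(0.10, 0.22), (0.55, 0.70), (0.80, 0.95), (1.30, 1.48)]
n_syllables = 5  # hypothetical syllable count for the stretch above

# Articulation rate: syllables per second of speaking time
total_speech_time = sum(end - start for start, end, _ in words)
articulation_rate = n_syllables / total_speech_time

# Vowel duration variability: coefficient of variation, a common
# rate-insensitive choice (the study's exact measure may differ)
vowel_durations = np.array([end - start for start, end in vowels])
vowel_cv = vowel_durations.std() / vowel_durations.mean()

print(articulation_rate, vowel_cv)
```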


2014 ◽  
Vol 5 (1) ◽  
pp. 1-11 ◽  
Author(s):  
Mohammad Rabiei ◽  
Alessandro Gasparetto

Abstract A system for recognition of emotions based on speech analysis can have interesting applications in human-robot interaction. In this paper, we carry out an exploratory study on the possibility of using a proposed methodology to recognize basic emotions (sadness, surprise, happiness, anger, fear, and disgust) based on the phonetic and acoustic properties of emotive speech, with minimal use of signal processing algorithms. We set up an experimental test consisting of three types of speakers, namely: (i) five adult European speakers, (ii) five adult Asian (Middle East) speakers, and (iii) five adult American speakers. The speakers had to repeat 6 sentences in English (with durations typically between 1 s and 3 s) in order to emphasize rising-falling intonation and pitch movement. Intensity, pitch peak and range, and speech rate were evaluated. The proposed methodology consists of generating and analyzing graphs of formants, pitch, and intensity, using the open-source Praat program. From the experimental results, it was possible to recognize the basic emotions in most cases.
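A rough sketch of how such measurements can be pulled from a recording with the parselmouth Python interface to Praat; the file name, word count, and measurement points are illustrative assumptions, not the paper's procedure.

```python
import parselmouth

snd = parselmouth.Sound("utterance.wav")  # hypothetical recording

# Pitch peak and range (Hz) over voiced frames
pitch = snd.to_pitch()
f0 = pitch.selected_array['frequency']
voiced = f0[f0 > 0]
pitch_peak = voiced.max()
pitch_range = voiced.max() - voiced.min()

# Mean intensity (dB)
mean_intensity = snd.to_intensity().values.mean()

# First two formants at the utterance midpoint (Hz)
formants = snd.to_formant_burg()
t_mid = snd.get_total_duration() / 2
f1 = formants.get_value_at_time(1, t_mid)
f2 = formants.get_value_at_time(2, t_mid)

# Crude speech-rate proxy: words per second, given a known transcript
n_words = 6  # hypothetical word count for this sentence
speech_rate = n_words / snd.get_total_duration()

print(pitch_peak, pitch_range, mean_intensity, f1, f2, speech_rate)
```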


2021 ◽  
Vol 26 (1) ◽  
pp. 7-19
Author(s):  
Valentina Baić ◽  
Zvonimir Ivanović ◽  
Milan Veljković

The paper presents research aimed at analysing the frequency of verbal and vocal signs in false and true statements when a secondary task is introduced. The research involved 100 students (47 men and 53 women) of the master's programme in criminal investigation at the University of Criminal Investigation and Police Studies, aged 23-44. Based on the observation of twenty selected videos (10 true statements and 10 false statements), the students' task was to mark the frequency of each individual verbal and vocal sign on a previously generated and prepared list. The results show statistically significant differences between false and true statements in the frequency of all verbal and vocal signs: response latency, speech hesitation, speech errors, speech rate, number of spoken words in the utterance, and length of utterance. Response latency, speech hesitation, and speech errors have higher median values in false utterances than in true ones, while speech rate, number of words spoken, and length of utterance show higher median values in true than in false utterances.
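As an illustration of how such median comparisons are commonly tested (the abstract does not name the test used, and the counts below are invented), a nonparametric comparison in Python might look like this:

```python
import numpy as np
from scipy.stats import mannwhitneyu

# Hypothetical per-video frequency counts of one vocal sign (e.g., speech
# hesitations) in false vs. true statements; numbers are illustrative only.
false_counts = np.array([5, 7, 6, 8, 4, 9, 6, 7, 5, 8])
true_counts = np.array([2, 3, 1, 4, 2, 3, 2, 1, 3, 2])

stat, p = mannwhitneyu(false_counts, true_counts, alternative='two-sided')
print(f"U = {stat}, p = {p:.4f}")
print(f"median(false) = {np.median(false_counts)}, "
      f"median(true) = {np.median(true_counts)}")
```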

