Neurocomputational Models of Voice and Speech Perception

Author(s): Bernd J. Kröger

This chapter outlines a comprehensive neurocomputational model of voice and speech perception based on (i) already established computational models and (ii) neurophysiological data on the underlying neural processes. Neurocomputational models of speech perception comprise auditory as well as cognitive modules in order to extract sound features and linguistic information (linguistic content). A model of voice and speech perception additionally needs to process paralinguistic information such as the gender, age, and emotional or affective state of the speaker. It is argued here that the modules of a neurocomputational model of voice and speech perception need to interact with modules that go beyond unimodal auditory processing because, for example, the processing of paralinguistic information is closely related to other modalities such as visual perception of the face. Thus, this chapter describes neural modelling of voice and speech perception in relation to general communication and social-interaction processes, which makes it necessary to develop a hypermodal processing approach.

2011 · Vol. 32 (2) · pp. 560-570
Author(s): Bart Boets, Maaike Vandermosten, Hanne Poelmans, Heleen Luts, Jan Wouters, et al.

2014 · Vol. 281 (1787) · pp. 20140480
Author(s): Michelle J. Spierings, Carel ten Cate

Variation in pitch, amplitude and rhythm adds crucial paralinguistic information to human speech. Such prosodic cues can reveal information about the meaning or emphasis of a sentence or the emotional state of the speaker. To examine the hypothesis that sensitivity to prosodic cues is language independent and not human specific, we tested prosody perception in a controlled experiment with zebra finches. Using a go/no-go procedure, subjects were trained to discriminate between speech syllables arranged in XYXY patterns with prosodic stress on the first syllable and XXYY patterns with prosodic stress on the final syllable. To systematically determine the salience of the various prosodic cues (pitch, duration and amplitude) to the zebra finches, they were subjected to five tests with different combinations of these cues. The zebra finches generalized the prosodic pattern to sequences that consisted of new syllables and used prosodic features over structural ones to discriminate between stimuli. This strong sensitivity to the prosodic pattern was maintained when only a single prosodic cue was available. The change in pitch was treated as more salient than changes in the other prosodic features. These results show that zebra finches are sensitive to the same prosodic cues known to affect human speech perception.


2021
Author(s): Shannon L.M. Heald, Stephen C. Van Hedger, John Veillette, Katherine Reis, Joel S. Snyder, et al.

Abstract

The ability to generalize rapidly across specific experiences is vital for robust recognition of new patterns, especially in speech perception considering acoustic-phonetic pattern variability. Behavioral research has demonstrated that listeners are rapidly able to generalize their experience with a talker's speech and quickly improve understanding of a difficult-to-understand talker without prolonged practice, e.g., even after a single training session. Here, we examine the differences in neural responses to generalized versus rote learning in auditory cortical processing by training listeners to understand a novel synthetic talker using a Pretest-Posttest design with electroencephalography (EEG). Participants were trained using either (1) a large inventory of words where no words repeated across the experiment (generalized learning) or (2) a small inventory of words where words repeated (rote learning). Analysis of long-latency auditory evoked potentials at Pretest and Posttest revealed that while rote and generalized learning both produce rapid changes in auditory processing, the nature of these changes differed. In the context of adapting to a talker, generalized learning is marked by an amplitude reduction in the N1-P2 complex and by the presence of a late-negative (LN) wave in the auditory evoked potential following training. Rote learning, however, is marked only by temporally later source configuration changes. The early N1-P2 change, found only for generalized learning, suggests that generalized learning relies on the attentional system to reorganize the way acoustic features are selectively processed. This change in relatively early sensory processing (i.e., during the first 250 ms) is consistent with an active processing account of speech perception, which proposes that the ability to rapidly adjust to the specific vocal characteristics of a new talker (for which rote learning is rare) relies on attentional mechanisms to adaptively tune early auditory processing sensitivity.

Statement of Significance

Previous research on perceptual learning has typically examined neural responses during rote learning: training and testing are carried out with the same stimuli. As a result, it is not clear that findings from these studies can explain learning that generalizes to novel patterns, which is critical in speech perception. Are neural responses to generalized learning in auditory processing different from neural responses to rote learning? Results indicate that rote learning of a particular talker's speech involves brain regions focused on encoding and retrieving specific learned patterns, whereas generalized learning involves brain regions involved in reorganizing attention during early sensory processing. In learning speech from a novel talker, only generalized learning is marked by changes in the N1-P2 complex (reflective of secondary auditory cortical processing). The results are consistent with the view that robust speech perception relies on fast adjustment of attentional mechanisms to adaptively tune auditory sensitivity to cope with acoustic variability.


2016
Author(s): Jennifer Padilla, Thierry Morlet, Kyoko Nagao, Rachel Crum, L. Ashleigh Greenwood, et al.

1998 · Vol. 21 (2) · pp. 280-281
Author(s): Athanassios Protopapas, Paula Tallal

The arguments for the orderly output constraint concern phylogenetic matters and do not address the ontogeny of combination-specific neurons and the corresponding processing mechanisms. Locus equations are too variable to be strongly predetermined and too inconsistent to be easily learned. Findings on the development of speech perception and underlying auditory processing must be taken into account in the formulation of neural encoding theories.


2020 · Vol. 6 (30) · pp. eaba7830
Author(s): Laurianne Cabrera, Judit Gervain

Speech perception is constrained by auditory processing. Although at birth infants have an immature auditory system and limited language experience, they show remarkable speech perception skills. To assess neonates’ ability to process the complex acoustic cues of speech, we combined near-infrared spectroscopy (NIRS) and electroencephalography (EEG) to measure brain responses to syllables differing in consonants. The syllables were presented in three conditions preserving (i) original temporal modulations of speech [both amplitude modulation (AM) and frequency modulation (FM)], (ii) both fast and slow AM, but not FM, or (iii) only the slowest AM (<8 Hz). EEG responses indicate that neonates can encode consonants in all conditions, even without the fast temporal modulations, similarly to adults. Yet, the fast and slow AM activate different neural areas, as shown by NIRS. Thus, the immature human brain is already able to decompose the acoustic components of speech, laying the foundations of language learning.


1996 · Vol. 39 (2) · pp. 278-297
Author(s): Susan Nittrouer

Studies of children’s speech perception have shown that young children process speech signals differently than adults. Specifically, the relative contributions made by various acoustic parameters to some linguistic decisions seem to differ for children and adults. Such findings have led to the hypothesis that there is a developmental shift in the perceptual weighting of acoustic parameters that results from experience with a native language (i.e., the Developmental Weighting Shift). This developmental shift eventually leads the child to adopt the optimal perceptual weighting strategy for the native language being learned (i.e., one that allows the listener to make accurate decisions about the phonemic structure of his or her native language). Although this proposal has intuitive appeal, there is at least one serious challenge that can be leveled against it: Perhaps age-related differences in speech perception can more appropriately be explained by age-related differences in basic auditory-processing abilities. That is, perhaps children are not as sensitive as adults to subtle differences in acoustic structure and so make linguistic decisions based on the acoustic information that is most perceptually salient. The present study tested this hypothesis for the acoustic cues relevant to fricative identity in fricative-vowel syllables. Results indicated that 3-year-olds were not as sensitive to changes in these acoustic cues as adults were, but that these age-related differences in auditory sensitivity could not entirely account for age-related differences in perceptual weighting strategies.

