Independent mechanisms of temporal and linguistic cue correspondence benefiting audiovisual speech processing

2020 ◽  
Author(s):  
Sara Fiscella ◽  
Madeline S Cappelloni ◽  
Ross K Maddox

When listening is difficult, seeing the face of the talker aids speech comprehension. Faces carry both temporal cues (the low-level physical correspondence of mouth movement and auditory speech) and linguistic cues (the learned correspondence of mouth shape, or viseme, and speech sound, or phoneme). Listeners participated in two experiments investigating how these cues may be used to process sentences when maskers are present. In Experiment I, faces were rotated to disrupt linguistic but not temporal cue correspondence. Listeners suffered a deficit in speech comprehension when the faces were rotated, indicating that visemes are processed in a rotation-dependent manner and that linguistic cues aid comprehension. In Experiment II, listeners were asked to detect pitch modulation in the target speech while viewing upright or inverted faces that matched either the target or the masker speech, such that performance differences could be explained by binding, an early multisensory integration mechanism distinct from traditional late integration. Performance in this task replicated previous findings that temporal coherence induces binding, but there was no behavioral evidence for a role of linguistic cues in binding. Together, these experiments point to temporal cues providing a speech processing benefit through binding and linguistic cues providing a benefit through late integration.

2021 ◽  
Author(s):  
Zuzanna Laudańska ◽  
Aleksandra Dopierała ◽  
Magdalena Szmytke ◽  
Dianna Ilyka ◽  
Anna Malinowska-Korczak ◽  
...  

Abstract
Configural processing is a specialised perceptual mechanism that allows adult humans to quickly process facial information. It emerges before the first birthday and can be disrupted by upside-down presentation of the face (inversion). To date, little is known about the relationship of configural face processing to the emerging knowledge of audiovisual (AV) speech in infancy. Using eye-tracking, we measured attention to the speaking mouth in upright and inverted faces that were either congruent or incongruent with the speech sound. Face inversion affected looking at AV speech only in the older infants (9- to 11- and 12- to 14-month-olds). The youngest group (5- to 7-month-olds) showed no differences in looking durations between upright and inverted faces, whereas in both older groups face inversion reduced looking at the articulating mouth. We also observed a stronger interest in the eyes in the youngest infants, followed by an increase in looking time to the mouth in both older groups. Our findings suggest that configural face processing is involved in AV speech processing already in infancy, indicating early integration of face and speech processing mechanisms in cognitive development.


2010 ◽  
Vol 125 (3) ◽  
pp. 236-245 ◽  
Author(s):  
U A Kumar ◽  
M Jayaram

Abstract
Objective: This study aimed to evaluate the effect of lengthening the transition duration of selected speech segments upon the perception of those segments in individuals with auditory dys-synchrony.
Methods: Thirty individuals with auditory dys-synchrony participated in the study, along with 30 age-matched normal hearing listeners. Eight consonant–vowel syllables were used as auditory stimuli. Two experiments were conducted. Experiment one measured the 'just noticeable difference' time: the smallest prolongation of the speech sound transition duration which was noticeable by the subject. In experiment two, speech sounds were modified by lengthening the transition duration by multiples of the just noticeable difference time, and subjects' speech identification scores for the modified speech sounds were assessed.
Results: Subjects with auditory dys-synchrony demonstrated poor processing of temporal auditory information. Lengthening of speech sound transition duration improved these subjects' perception of both the placement and voicing features of the speech syllables used.
Conclusion: These results suggest that innovative speech processing strategies which enhance temporal cues may benefit individuals with auditory dys-synchrony.
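To make the stimulus manipulation concrete: lengthening a consonant–vowel transition by multiples of a JND amounts to time-stretching only that segment of the waveform. The sketch below is a minimal illustration of that idea, not the authors' stimulus-preparation procedure; the sampling rate, JND value, transition boundaries, and the plain interpolation method are all assumptions (a real pipeline would use a pitch-preserving stretch such as PSOLA or a phase vocoder).

```python
import numpy as np

def lengthen_transition(signal, fs, t_start, t_end, factor):
    """Stretch the [t_start, t_end] segment of `signal` by `factor`
    via linear-interpolation resampling; the rest is left untouched."""
    i0, i1 = int(t_start * fs), int(t_end * fs)
    transition = signal[i0:i1]
    n_new = int(len(transition) * factor)
    # Simple interpolation onto a denser time grid; this changes pitch,
    # which a PSOLA/phase-vocoder stretch would preserve.
    stretched = np.interp(
        np.linspace(0, len(transition) - 1, n_new),
        np.arange(len(transition)),
        transition,
    )
    return np.concatenate([signal[:i0], stretched, signal[i1:]])

# Example: lengthen a 40 ms formant transition by 2 JND steps of 10 ms each
fs = 16000
syllable = np.random.randn(int(0.3 * fs))   # stand-in for a CV syllable
jnd_s, base_s = 0.010, 0.040                # hypothetical JND and base duration
factor = (base_s + 2 * jnd_s) / base_s
modified = lengthen_transition(syllable, fs, 0.05, 0.09, factor)
```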


Author(s):  
Laura Mora ◽  
Anna Sedda ◽  
Teresa Esteban ◽  
Gianna Cocchini

Abstract
The representation of the metrics of the hands is distorted, but is malleable with expert dexterity (magicians) and long-term tool use (baseball players). However, it remains unclear whether such modulation leads to a stable representation of the hand that is adopted in every circumstance, or whether the modulation is closely linked to the spatial context where the expertise occurs. To this end, a group of 10 experienced Sign Language (SL) interpreters were recruited to study the selective influence of expertise and spatial localisation on the metric representation of the hands. Experiment 1 explored differences in hand size representation between the SL interpreters and 10 age-matched controls in near-reaching (Condition 1) and far-reaching space (Condition 2), using a localisation task. SL interpreters showed a reduced hand size representation in the near-reaching condition, with characteristic underestimation of finger lengths and reduced overestimation of hand and wrist widths in comparison with controls. This difference was lost in far-reaching space, confirming that the effect of expertise on hand representations is closely linked to the spatial context where an action is performed. As SL interpreters are also experts in the use of their face for communicative purposes, the effect of expertise on the metrics of the face was also studied (Experiment 2). SL interpreters were more accurate than controls, with an overall reduction of width overestimation. Overall, expertise modifies the representation of relevant body parts in a specific and context-dependent manner. Hence, different representations of the same body part can coexist simultaneously.
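Localisation tasks of this kind are typically scored by deriving perceived lengths and widths from judged landmark positions and comparing them with the measured hand. The toy sketch below shows that arithmetic only; the landmark coordinates, actual dimensions, and scoring formula are invented for illustration and are not taken from this study.

```python
import numpy as np

def percent_misestimation(perceived, actual):
    """Signed misestimation: positive = overestimation, negative = under."""
    return 100.0 * (perceived - actual) / actual

# Hypothetical landmark judgements (cm) from one localisation trial:
# knuckle and tip of the index finger, plus the two sides of the wrist.
knuckle = np.array([4.0, 0.0]); tip = np.array([4.2, 6.1])
wrist_l = np.array([0.0, -2.0]); wrist_r = np.array([7.6, -2.0])

perceived_len = np.linalg.norm(tip - knuckle)
perceived_wid = np.linalg.norm(wrist_r - wrist_l)

# Actual measured dimensions of the participant's hand (assumed values)
actual_len, actual_wid = 7.0, 6.5

print(f"finger length: {percent_misestimation(perceived_len, actual_len):+.1f}%")
print(f"wrist width:   {percent_misestimation(perceived_wid, actual_wid):+.1f}%")
```

On these invented numbers the output shows the qualitative pattern the abstract describes: finger length underestimated, width overestimated.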


2022 ◽  
Vol 13 (1) ◽  
Author(s):  
Maxwell Shinn ◽  
Daeyeol Lee ◽  
John D. Murray ◽  
Hyojung Seo

Abstract
In noisy but stationary environments, decisions should be based on the temporal integration of sequentially sampled evidence. This strategy has been supported by many behavioral studies and is qualitatively consistent with neural activity in multiple brain areas. By contrast, decision-making in the face of non-stationary sensory evidence remains poorly understood. Here, we trained monkeys to identify and respond via saccade to the dominant color of a dynamically refreshed bicolor patch that becomes informative after a variable delay. Animals' behavioral responses were briefly suppressed after evidence changes, and many neurons in the frontal eye field displayed a corresponding dip in activity at this time, similar to that frequently observed after stimulus onset but sensitive to stimulus strength. Generalized drift-diffusion models revealed consistency of behavior and neural activity with brief suppression of motor output, but not with pausing or resetting of evidence accumulation. These results suggest that momentary arrest of motor preparation is important for dynamic perceptual decision-making.
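The model comparison hinges on where the post-change dip is placed: in the motor output, or in the accumulator itself. Below is a minimal toy simulation of the motor-suppression variant, not the authors' generalized drift-diffusion fits; all parameter values and the suppression-window mechanism are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_trial(drift_pre=0.0, drift_post=0.8, t_change=0.5,
                   suppress=0.15, bound=1.0, sigma=1.0, dt=0.001, t_max=3.0):
    """One drift-diffusion trial: evidence is uninformative (zero drift) until
    t_change, then drifts toward the correct bound. For `suppress` seconds
    after the change, accumulation continues but responding is withheld
    (suppression of motor output, as opposed to pausing the accumulator)."""
    x, t = 0.0, 0.0
    while t < t_max:
        drift = drift_post if t >= t_change else drift_pre
        x += drift * dt + sigma * np.sqrt(dt) * rng.standard_normal()
        in_suppression = t_change <= t < t_change + suppress
        if abs(x) >= bound and not in_suppression:
            return t, np.sign(x)   # reaction time, choice
        t += dt
    return t_max, 0.0              # no response within the trial

rts = [simulate_trial()[0] for _ in range(500)]
print(f"mean RT: {np.mean(rts):.3f} s")
```

A "pausing" variant would instead freeze `x` during the suppression window; the two produce different RT distributions, which is what lets the model comparison discriminate them.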


Author(s):  
Sheila Blumstein

This article reviews current knowledge about the nature of auditory word recognition deficits in aphasia. It assumes that the language functioning of adults with aphasia was normal prior to sustaining brain injury, and that their word recognition system was intact. As a consequence, the study of aphasia provides insight into how damage to particular areas of the brain affects speech and language processing, and thus provides a crucial step in mapping out the neural systems underlying speech and language processing. To this end, much of the discussion focuses on word recognition deficits in Broca's and Wernicke's aphasics, two clinical syndromes that have provided the basis for much of the study of the neural basis of language. Clinically, Broca's aphasics have a profound expressive impairment in the face of relatively good auditory language comprehension. This article also considers deficits in processing the sound structure of language, graded activation of the lexicon, lexical competition, influence of word recognition on speech processing, and influence of sentential context on word recognition.


Author(s):  
Margaret M. Kehoe ◽  
Emilie Cretton

Purpose: This study examines intraword variability in 40 typically developing French-speaking monolingual and bilingual children, aged 2;6–4;8 (years;months). Specifically, it measures the rate of intraword variability and investigates which factors best account for it. These include child-specific factors, such as age, expressive vocabulary, gender, bilingual status, and speech sound production ability, and word-specific factors, such as phonological complexity (including number of syllables), phonological neighborhood density (PND), and word frequency.
Method: A variability test was developed, consisting of 25 words that differed in terms of phonological complexity, PND, and word frequency. Children produced three exemplars of each word during a single session, and productions of words were coded as variable or not variable. In addition, children were administered an expressive vocabulary test and two tests tapping speech motor ability (an oral motor assessment and a diadochokinetic test). Speech sound ability was also assessed by measuring percent consonants correct on all words produced by the children during the session. Data were entered into a binomial logistic regression.
Results: Average intraword variability was 29% across all children. Several factors were found to predict intraword variability, including age, gender, bilingual status, speech sound production ability, phonological complexity, and PND.
Conclusions: Intraword variability was found to be lower in French than what has been reported in English, consistent with phonological differences between French and English. Our findings support those of other investigators in indicating that the factors influencing intraword variability are multiple and reflect sources at various levels in the speech processing system.
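The abstract names a binomial logistic regression over the listed child- and word-level predictors. The sketch below shows the shape of such an analysis in Python with statsmodels; the data and column names are invented for illustration, and the published analysis may well have added random effects for child and word, which a plain logit omits.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical trial-level data: one row per word per child, with
# predictors mirroring those named in the abstract (values invented).
rng = np.random.default_rng(1)
n = 1000
df = pd.DataFrame({
    "variable":   rng.integers(0, 2, n),    # 1 = word produced variably
    "age_months": rng.uniform(30, 56, n),
    "male":       rng.integers(0, 2, n),
    "bilingual":  rng.integers(0, 2, n),
    "pcc":        rng.uniform(0.5, 1.0, n), # percent consonants correct
    "complexity": rng.integers(1, 8, n),    # phonological complexity score
    "pnd":        rng.integers(0, 30, n),   # neighborhood density
    "log_freq":   rng.normal(3, 1, n),      # log word frequency
})

model = smf.logit(
    "variable ~ age_months + male + bilingual + pcc + complexity + pnd + log_freq",
    data=df,
).fit()
print(model.summary())
```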


2021 ◽  
Vol 4 (3) ◽  
pp. 37-41
Author(s):  
Sayora Ibragimova ◽  

This work deals with the basic theory of the wavelet transform and multi-scale analysis of speech signals, and briefly reviews the main differences between the wavelet transform and the Fourier transform in the analysis of speech signals. It discusses the possibilities of applying wavelet analysis to speech recognition systems and its main advantages. In most existing recognition and analysis systems, the speech sound is treated as a stream of vectors whose elements are frequency-domain features; real-time speech processing with sequential algorithms therefore requires high-performance computing resources. Examples are given of how this method can be used to process speech signals and to build reference patterns for recognition systems.
Key words: digital signal processing, Fourier transform, wavelet analysis, speech signal, wavelet transform
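The core contrast with Fourier analysis is that a wavelet decomposition keeps time localization at every scale, whereas a single Fourier spectrum averages non-stationary structure away. A minimal sketch using PyWavelets on a synthetic signal (the signal, wavelet choice, and decomposition depth are assumptions, not taken from the paper):

```python
import numpy as np
import pywt  # PyWavelets

fs = 16000
t = np.arange(0, 0.5, 1 / fs)
# Synthetic "speech-like" signal: a tone whose frequency jumps mid-frame,
# the kind of non-stationarity a single Fourier spectrum cannot localize.
signal = np.where(t < 0.25,
                  np.sin(2 * np.pi * 300 * t),
                  np.sin(2 * np.pi * 1200 * t))

# 5-level multi-scale decomposition with a Daubechies wavelet; each level
# is a time-localized frequency band, unlike the global Fourier basis.
coeffs = pywt.wavedec(signal, "db4", level=5)
for i, c in enumerate(coeffs):
    band = "approx" if i == 0 else f"detail {i}"
    print(f"{band:>9}: {len(c)} coefficients, energy {np.sum(c**2):.1f}")
```

Feature vectors for a recognizer can then be built from per-band energies, which is one way such coefficients feed into the reference patterns the abstract mentions.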


2019 ◽  
Vol 30 (3) ◽  
pp. 942-951 ◽  
Author(s):  
Lanfang Liu ◽  
Yuxuan Zhang ◽  
Qi Zhou ◽  
Douglas D Garrett ◽  
Chunming Lu ◽  
...  

Abstract
Whether auditory processing of speech relies on reference to the articulatory motor information of the speaker remains elusive. Here, we addressed this issue under a two-brain framework. Functional magnetic resonance imaging was applied to record the brain activity of speakers telling real-life stories and later of listeners listening to the audio recordings of these stories. Based on between-brain seed-to-voxel correlation analyses, we revealed that neural dynamics in listeners' auditory temporal cortex are temporally coupled with the dynamics in the speaker's larynx/phonation area. Moreover, the coupling response in the listener's left auditory temporal cortex follows the hierarchical organization for speech processing, with response lags in A1+, STG/STS, and MTG increasing linearly. Further, listeners showing greater coupling responses understood the speech better. When comprehension failed, such interbrain auditory-articulation coupling vanished substantially. These findings suggest that a listener's auditory system and a speaker's articulatory system are inherently aligned during naturalistic verbal interaction, and that such alignment is associated with high-level information transfer from the speaker to the listener. Our study provides reliable evidence that reference to the speaker's articulatory motor information facilitates speech comprehension in naturalistic settings.
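The between-brain analysis correlates a speaker seed time course with listener time courses at increasing hemodynamic lags and asks where the coupling peaks. The sketch below illustrates that logic on toy series only; the function, the TR, and the lag range are assumptions, not the authors' pipeline.

```python
import numpy as np

def lagged_coupling(seed, target, max_lag, tr=2.0):
    """Pearson correlation between a speaker seed time course and a listener
    ROI time course at each lag (in TRs); returns the best lag in seconds."""
    r = []
    for k in range(max_lag + 1):
        a = seed[: len(seed) - k] if k else seed
        b = target[k:]
        r.append(np.corrcoef(a, b)[0, 1])
    best = int(np.argmax(r))
    return best * tr, r[best]

# Toy data: the "listener" series echoes the "speaker" series 3 TRs later.
rng = np.random.default_rng(2)
speaker = rng.standard_normal(200)
listener = np.roll(speaker, 3) + 0.5 * rng.standard_normal(200)

lag_s, r_max = lagged_coupling(speaker, listener, max_lag=10)
print(f"peak coupling r = {r_max:.2f} at lag {lag_s:.0f} s")
```

Applied voxelwise along the listener's temporal cortex, increasing peak lags from A1+ to STG/STS to MTG would reproduce the hierarchical pattern the abstract reports.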

