Hearing voices then seeing lips: Fragmentation and renormalisation of subjective timing in the McGurk illusion

2012 · Vol 25 (0) · p. 9
Author(s): Elliot D. Freeman, Alberta Ipser

Due to physical and neural delays, the sight and sound of a person speaking arrive in the brain as a cacophony of asynchronous events. How can we still perceive them as simultaneous? Our converging evidence suggests that, in fact, we do not. Patient PH, who has midbrain and auditory brainstem lesions, experiences voices leading lip movements by approximately 200 ms. In temporal order judgements (TOJ) he experiences simultaneity only when voices physically lag lips. In contrast, he requires the opposite asynchrony, a visual lag of again about 200 ms, to experience the classic McGurk illusion (e.g., hearing /da/ when listening to /ba/ while watching lips say /ga/), consistent with pathological auditory slowing. These delays appear to be specific to speech stimuli. Is PH just an anomaly? Surprisingly, in neurotypical individuals the temporal tunings of McGurk integration and TOJ are negatively correlated: some people require a small auditory lead for optimal McGurk integration but an auditory lag for subjective simultaneity (like PH, though less extreme), while others show the opposite pattern. Evidently, any individual can concurrently experience the same external events as happening at different times. These dissociations confirm that distinct mechanisms for audiovisual synchronization and integration are each subject to different neural delays. To explain the apparent repulsion between their respective timings, we propose that multimodal synchronization is achieved by discounting the average neural event time within each modality. Lesions or individual differences that slow the propagation of neural signals will then attract the average, so that relatively undelayed neural signals are experienced as occurring relatively early.
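The proposed renormalisation account lends itself to a simple numerical illustration. The following is a minimal sketch, with all delay values hypothetical: discounting the within-modality mean delay makes a pathologically slowed signal appear late and, by attracting the mean, pushes relatively undelayed signals earlier.

```python
# Toy illustration of the proposed temporal renormalisation account.
# All delay values are hypothetical and chosen only for illustration.

def perceived_time(physical_time, neural_delay, mean_modality_delay):
    """Subjective event time after discounting the modality's mean neural delay."""
    return physical_time + neural_delay - mean_modality_delay

# Two signal types within one modality (ms): a pathologically slowed pathway
# (e.g., 200 ms of extra delay after a lesion) and a relatively undelayed one.
delays = {"slowed pathway": 250.0, "undelayed pathway": 50.0}
mean_delay = sum(delays.values()) / len(delays)  # the discounted average (150 ms)

for name, d in delays.items():
    t = perceived_time(0.0, d, mean_delay)
    print(f"{name}: perceived at {t:+.0f} ms relative to the physical event")

# The slowed signal is perceived late (+100 ms); by attracting the average,
# it also makes the undelayed signal appear early (-100 ms).
```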

2012 · Vol 25 (0) · pp. 14-15
Author(s): Alberta Ipser, Diana Paunoiu, Elliot D. Freeman

It has often been claimed that there is mutual dependence between the perceived synchrony of auditory and visual sources and the extent to which they perceptually integrate (the 'unity assumption': Vroomen and Keetels, 2010; Welch and Warren, 1980). However, subjective audiovisual synchrony can vary widely between subjects (Stone, 2001) and between paradigms (van Eijk, 2008). Do such individual differences in subjective synchrony correlate positively with individual differences in the optimal timing for integration, as expected under the unity assumption? In separate experiments we measured the optimal audiovisual asynchrony for the McGurk illusion (McGurk and MacDonald, 1976) and for the stream-bounce illusion (Sekuler et al., 1997). We concurrently elicited either temporal order judgements (TOJ) or simultaneity judgements (SJ), in counterbalanced sessions, from which we derived the point of subjective simultaneity (PSS). In both experiments, the asynchrony producing the maximum illusion showed a significant positive correlation with the PSS derived from SJ, consistent with the unity assumption. But surprisingly, the analogous correlation with the PSS derived from TOJ was significantly negative. The temporal mechanisms underlying this pairing of tasks seem neither unitary nor fully independent, but apparently antagonistic. A tentative temporal renormalisation mechanism explains these paradoxical results as follows: (1) subjective timing in our different tasks can depend on independent mechanisms subject to their own neural delays; (2) inter-modal synchronization is achieved by first discounting the mean neural delay within each modality; and (3) the apparent antagonism between estimates of subjective timing emerges as the mean is attracted towards deviants in the unimodal temporal distribution.
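As a concrete illustration of one step in this kind of analysis, the sketch below derives a PSS by fitting a Gaussian to the proportion of 'simultaneous' responses across stimulus onset asynchronies (SOAs). The data, the sign convention, and the choice of a Gaussian are illustrative assumptions, not necessarily the fitting procedure used in the study.

```python
# Sketch: deriving a point of subjective simultaneity (PSS) from simultaneity
# judgements (SJ) by fitting a Gaussian to the proportion of "simultaneous"
# responses. Synthetic data for one hypothetical observer.
import numpy as np
from scipy.optimize import curve_fit

def gaussian(soa, peak, pss, sigma):
    """Proportion of 'simultaneous' responses as a function of SOA (ms)."""
    return peak * np.exp(-((soa - pss) ** 2) / (2 * sigma ** 2))

# SOA = auditory minus visual onset, so negative values mean the voice leads.
soas = np.array([-300, -200, -100, 0, 100, 200, 300], dtype=float)
p_simultaneous = np.array([0.05, 0.20, 0.60, 0.90, 0.85, 0.40, 0.10])

params, _ = curve_fit(gaussian, soas, p_simultaneous, p0=[1.0, 0.0, 100.0])
peak, pss, sigma = params
print(f"PSS = {pss:.1f} ms")  # the asynchrony yielding maximal subjective simultaneity
```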


2012 · Vol 36 (1) · pp. 26-27
Author(s): Sara Konrath, Irene Cheung

We review two subjective (mis)perceptions that influence revenge and forgiveness systems. Individual differences predict more revenge (e.g., narcissism) or less revenge (e.g., empathy), with the opposite pattern for forgiveness. Moreover, differences between victims' and perpetrators' perceptions can influence revenge and forgiveness systems, perpetuating never-ending cycles of revenge. These two examples point to the need for theories of revenge and forgiveness to address the role of cognitive and motivational biases in the functionality of such behavioral responses.


PLoS ONE · 2021 · Vol 16 (11) · e0260090
Author(s): Emanuele Perugia, Ghada BinKhamis, Josef Schlittenlacher, Karolina Kluk

Current clinical strategies to assess benefit from hearing aids (HAs) are based on self-reported questionnaires and speech-in-noise (SIN) tests, both of which require behavioural cooperation. By contrast, objective measures based on auditory brainstem responses (ABRs) to speech stimuli would not require the individual's cooperation. Here, we re-analysed an existing dataset to predict behavioural measures from speech-ABRs using regression trees. Ninety-two HA users completed a self-reported questionnaire (SSQ-Speech) and performed two aided SIN tests: sentences in noise (BKB-SIN) and vowel-consonant-vowels (VCV) in noise. Speech-ABRs were evoked by a 40 ms [da] and recorded in 2 × 2 conditions: aided vs. unaided, and quiet vs. background noise. For each recording condition, two sets of features were extracted: (1) amplitudes and latencies of speech-ABR peaks, and (2) amplitudes and latencies of speech-ABR F0 encoding. Two regression trees were fitted for each of the three behavioural measures, with either feature set plus age, digit span forward and backward, and pure tone average (PTA) as possible predictors. The PTA was the only predictor in the SSQ-Speech trees. In the BKB-SIN trees, performance was predicted by the aided latency of peak F in quiet for participants with PTAs between 43 and 61 dB HL. In the VCV trees, performance was predicted by the aided F0-encoding latency and the aided amplitude of peak VA in quiet for participants with PTAs ≤ 47 dB HL. These findings indicate that PTA was more informative than any speech-ABR measure, as the latter were relevant only for a subset of the participants. Therefore, speech-ABRs evoked by a 40 ms [da] are not a clinical predictor of behavioural measures in HA users.
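To make the modelling step concrete, here is a minimal sketch of fitting a pruned regression tree to predict a behavioural score from speech-ABR features and listener covariates, using scikit-learn's DecisionTreeRegressor. The data, feature names, and tree settings are hypothetical stand-ins, not the study's dataset or exact pipeline.

```python
# Sketch: predicting a behavioural score (e.g., a SIN test result) from
# speech-ABR features plus listener covariates with a regression tree.
# All data and feature names below are simulated placeholders.
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

rng = np.random.default_rng(0)
n = 92  # number of HA users in the study

# Hypothetical predictors: PTA, age, digit span, and aided-in-quiet
# speech-ABR peak latency/amplitude features.
X = np.column_stack([
    rng.uniform(25, 75, n),    # PTA (dB HL)
    rng.uniform(50, 85, n),    # age (years)
    rng.integers(4, 10, n),    # digit span forward
    rng.normal(6.8, 0.4, n),   # aided peak latency in quiet (ms)
    rng.normal(0.2, 0.05, n),  # aided peak amplitude in quiet (uV)
])
feature_names = ["PTA", "age", "digit_fwd", "lat_ms", "amp_uV"]
y = 0.5 * X[:, 0] + rng.normal(0, 5, n)  # toy outcome driven mainly by PTA

# Shallow, pruned tree, so only genuinely informative splits survive.
tree = DecisionTreeRegressor(max_depth=3, min_samples_leaf=10, ccp_alpha=1.0)
tree.fit(X, y)
print(export_text(tree, feature_names=feature_names))
```

With the toy outcome above, the printed tree splits almost exclusively on PTA, mirroring the pattern the study reports for the SSQ-Speech trees.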


2021 · Vol 15
Author(s): Florine L. Bachmann, Ewen N. MacDonald, Jens Hjortkjær

Linearized encoding models are increasingly employed to model cortical responses to running speech. Recent extensions to subcortical responses suggest clinical potential, complementing the auditory brainstem responses (ABRs) and frequency-following responses (FFRs) that are current clinical standards. However, while it is well known that the auditory brainstem responds both to transient amplitude variations and to the stimulus periodicity that gives rise to pitch, these features co-vary in running speech. Here, we discuss the challenges in disentangling the features that drive the subcortical response to running speech. Cortical and subcortical electroencephalographic (EEG) responses to running speech from 19 normal-hearing listeners (12 female) were analyzed. Using forward regression models, we confirm that responses to the rectified broadband speech signal yield temporal response functions consistent with wave V of the ABR, as shown in previous work. Peak latency and amplitude of the speech-evoked brainstem response were correlated with standard click-evoked ABRs recorded at the vertex electrode (Cz). Similar responses could be obtained using the fundamental frequency (F0) of the speech signal as the model predictor. However, simulations indicated that dissociating responses to temporal fine structure at the F0 from responses to broadband amplitude variations is not possible, given the high covariance of the features and the poor signal-to-noise ratio (SNR) of subcortical EEG responses. In cortex, both simulations and data replicated previous findings indicating that envelope tracking on frontal electrodes can be dissociated from responses to slow variations in F0 (relative pitch). Yet no association between subcortical F0-tracking and cortical responses to relative pitch could be detected. These results indicate that while subcortical speech responses are comparable to click-evoked ABRs, dissociating pitch-related processing in the auditory brainstem may be challenging with natural speech stimuli.
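For readers unfamiliar with forward (linearized) encoding models, the sketch below estimates a temporal response function (TRF) by regressing simulated EEG on time-lagged copies of a stimulus feature, using closed-form ridge regression. All signals and parameter values are synthetic assumptions; real pipelines additionally filter the data and cross-validate the regularization parameter.

```python
# Minimal sketch of a forward (linearized) encoding model: ridge regression
# of EEG on time-lagged copies of a stimulus feature (here a stand-in for
# the rectified broadband speech signal). Everything is simulated.
import numpy as np

fs = 1000                                   # sampling rate (Hz)
n = 20 * fs                                 # 20 s of data
rng = np.random.default_rng(1)
stim = np.abs(rng.normal(size=n))           # stand-in "rectified speech" feature

# Ground-truth kernel: a Gaussian bump at 7 ms, loosely wave-V-like.
lags = np.arange(int(0.030 * fs))           # model lags 0-30 ms
t = lags / fs
true_trf = np.exp(-((t - 0.007) ** 2) / (2 * 0.001 ** 2))
eeg = np.convolve(stim, true_trf)[:n] + rng.normal(scale=5.0, size=n)

# Time-lagged design matrix: column k holds the stimulus delayed by k samples.
X = np.zeros((n, lags.size))
for k in lags:
    X[k:, k] = stim[:n - k]

lam = 1e2                                   # ridge parameter (untuned here)
trf = np.linalg.solve(X.T @ X + lam * np.eye(lags.size), X.T @ eeg)
print(f"Estimated TRF peak latency: {1000 * lags[np.argmax(trf)] / fs:.0f} ms")
```

Because the lagged copies of a broadband feature and of an F0-based feature co-vary strongly in real speech, the corresponding columns of such a design matrix become nearly collinear, which is the identifiability problem the abstract describes.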


2021
Author(s): Jonathan Wilbiks, Julia Feld Strand, Violet Aurora Brown

Many natural events generate both visual and auditory signals, and humans are remarkably adept at integrating information from those sources. However, individuals appear to differ markedly in their ability or propensity to combine what they hear with what they see. Individual differences in audiovisual integration have been established using a range of materials, including speech stimuli (seeing and hearing a talker) and simpler audiovisual stimuli (seeing flashes of light combined with tones). Although there are multiple tasks in the literature that are referred to as “measures of audiovisual integration,” the tasks differ widely with respect to both the type of stimuli used (speech versus non-speech) and the nature of the task (e.g., some use conflicting auditory and visual stimuli whereas others use congruent stimuli). It is not clear whether these varied tasks actually measure the same underlying construct: audiovisual integration. This study tested the convergent validity of four commonly used measures of audiovisual integration, two of which use speech stimuli (susceptibility to the McGurk effect and a measure of audiovisual benefit) and two of which use non-speech stimuli (the sound-induced flash illusion and audiovisual integration capacity). We replicated previous work showing large individual differences in each measure, but found no significant correlations between any of the measures. These results suggest that tasks commonly referred to as measures of audiovisual integration may not be tapping into the same underlying construct.
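At its core, a convergent-validity analysis of this kind reduces to pairwise correlations between scores on the four tasks. The sketch below runs that computation on simulated independent scores (all names and values hypothetical), which by construction reproduce the null pattern reported here.

```python
# Sketch of a convergent-validity check: pairwise Pearson correlations among
# scores on four audiovisual-integration tasks. Scores are simulated and
# independent, so near-zero correlations are expected by construction.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(2)
n = 100  # hypothetical sample size
measures = {
    "mcgurk_susceptibility": rng.uniform(0, 1, n),
    "av_benefit": rng.normal(0.3, 0.1, n),
    "flash_illusion": rng.uniform(0, 1, n),
    "integration_capacity": rng.normal(1.5, 0.5, n),
}

names = list(measures)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        r, p = pearsonr(measures[a], measures[b])
        print(f"{a} vs {b}: r = {r:+.2f}, p = {p:.2f}")
```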

