auditory signals
Recently Published Documents


TOTAL DOCUMENTS: 206 (FIVE YEARS: 25)

H-INDEX: 28 (FIVE YEARS: 1)

2021
Author(s): Mate Aller, Heidi Solberg Okland, Lucy J MacGregor, Helen Blank, Matthew H. Davis

Speech perception in noisy environments is enhanced by seeing the facial movements of communication partners. However, the neural mechanisms by which auditory and visual speech are combined are not fully understood. We examined phase locking to auditory and visual signals in MEG recordings from 14 human participants (6 female) who reported words from single spoken sentences. We manipulated acoustic clarity and the visual speech signal such that critical speech information was present in the auditory modality, the visual modality, or both. MEG coherence analysis revealed that both auditory and visual speech envelopes (auditory amplitude modulations and lip aperture changes) were phase-locked to 2-6 Hz brain responses in auditory and visual cortex, consistent with entrainment to syllable-rate speech components. Partial coherence analysis was used to separate neural responses to the correlated audio-visual signals and showed above-zero phase locking to the auditory envelope in occipital cortex during audio-visual (AV) speech. Furthermore, phase locking to auditory signals in visual cortex was enhanced for AV speech compared with audio-only (AO) speech matched for intelligibility. Conversely, auditory regions of the superior temporal gyrus (STG) did not show above-chance partial coherence with visual speech signals during AV conditions, but did show partial coherence during visual-only (VO) conditions. Hence, visual speech enabled stronger phase locking to auditory signals in visual areas, whereas phase locking to visual speech in auditory regions occurred only during silent lip-reading. Differences in these cross-modal interactions between auditory and visual speech signals are interpreted in line with cross-modal predictive mechanisms during speech perception.
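For readers unfamiliar with the analysis named above, partial coherence is a standard frequency-domain quantity: the coherence between an MEG signal X and the auditory envelope Y after the part that is linearly predictable from the visual signal Z has been removed. The textbook definition below is offered as context only and is not necessarily the exact implementation used in the study.

```latex
% Ordinary coherence between an MEG signal X and the auditory envelope Y at frequency f
C_{XY}(f) = \frac{\lvert S_{XY}(f)\rvert^{2}}{S_{XX}(f)\,S_{YY}(f)}

% Partial cross-spectrum: the X-Y cross-spectrum with the visual signal Z partialled out
S_{XY\mid Z}(f) = S_{XY}(f) - \frac{S_{XZ}(f)\,S_{ZY}(f)}{S_{ZZ}(f)}

% Partial coherence, with S_{XX|Z} and S_{YY|Z} defined analogously
C_{XY\mid Z}(f) = \frac{\lvert S_{XY\mid Z}(f)\rvert^{2}}{S_{XX\mid Z}(f)\,S_{YY\mid Z}(f)}
```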


2021, Vol 11 (1)
Author(s): Bruno Laeng, Sarjo Kuyateh, Tejaswinee Kelkar

Abstract Cross-modal integration is ubiquitous within perception and, in humans, the McGurk effect demonstrates that seeing a person articulating speech can change what we hear into a new auditory percept. It remains unclear whether cross-modal integration of sight and sound generalizes to other visible vocal articulations, like those made by singers. We surmise that perceptual integrative effects should involve music deeply, since there is ample indeterminacy and variability in its auditory signals. We show that switching the videos of sung musical intervals systematically changes the estimated distance between the two notes of a musical interval: pairing the video of a smaller sung interval with a relatively larger auditory interval led to compression effects on rated intervals, whereas the reverse pairing led to a stretching effect. In addition, after seeing a visually switched video of an equally tempered sung interval and then hearing the same interval played on the piano, the two intervals were often judged to be different, even though they differed only in instrument. These findings reveal spontaneous cross-modal integration of vocal sounds and clearly indicate that strong integration of sound and sight can occur beyond the articulations of natural speech.
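As background for the interval-size judgments described above, the distance between two notes is conventionally expressed in semitones computed from the ratio of their fundamental frequencies; the standard conversion is shown below (general music-theory background, not a formula taken from the paper).

```latex
% Interval size in semitones for tones with fundamental frequencies f_1 and f_2
\Delta_{\text{semitones}} = 12\,\log_{2}\!\left(\frac{f_{2}}{f_{1}}\right)
% Example: f_2 / f_1 = 3/2 (a perfect fifth) gives \Delta \approx 7.02 semitones
```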


Author(s): Xu Chen, Shibo Wang, Houguang Liu, Jianhua Yang, Songyong Liu, ...

Abstract Many data-driven coal gangue recognition (CGR) methods based on the vibration or sound of collapsed coal and gangue have been proposed to achieve automatic CGR, which is important for realizing intelligent top-coal caving. However, the strong background noise and complex environment in underground coal mines render this task challenging in practical applications. Inspired by the fact that workers distinguish coal and gangue from underground noise by listening to the hydraulic support sound, we propose an auditory-model-based CGR method that simulates human auditory recognition by combining an auditory spectrogram with a convolutional neural network (CNN). First, we adjust the characteristic frequency (CF) distribution of the auditory peripheral model (APM) based on the spectral characteristics of the collapsed sound signals from coal and gangue, and then process the sound signals using the adjusted APM to obtain inferior colliculus auditory signals with multiple CFs. Subsequently, the auditory signals of all CFs are converted into gray images separately and then concatenated into a multichannel auditory spectrum along the channel dimension. Finally, we input the multichannel auditory spectrum as a feature map to a two-dimensional CNN, whose convolutional layers automatically extract features, while the fully connected layer and softmax layer flatten the features and predict the recognition result, respectively. The CNN is optimized for CGR based on a comparison of four typical CNN structures with different training hyperparameters. The experimental results show that this method affords accurate CGR, with a recognition accuracy of 99.5%. Moreover, it offers excellent noise immunity compared with typically used CGR methods under various noisy conditions.
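The recognition pipeline described above (a multichannel auditory spectrum fed to a 2-D CNN whose convolutional layers extract features and whose fully connected and softmax layers produce the prediction) can be sketched as follows. This is a minimal PyTorch illustration under assumed dimensions (16 CF channels, 64x64 gray images per channel, and two output classes, coal and gangue); the layer sizes are placeholders, not the authors' optimized architecture.

```python
import torch
import torch.nn as nn

class GangueCNN(nn.Module):
    """Minimal 2-D CNN over a multichannel auditory spectrum (illustrative sketch only)."""

    def __init__(self, n_cf_channels: int = 16, n_classes: int = 2):
        super().__init__()
        # Convolutional layers: automatic feature extraction from the spectrum
        self.features = nn.Sequential(
            nn.Conv2d(n_cf_channels, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),   # 64x64 -> 32x32
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),   # 32x32 -> 16x16
        )
        # Fully connected layer: flatten the feature maps and map them to class scores
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 16 * 16, n_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_cf_channels, 64, 64) stack of per-CF gray images
        logits = self.classifier(self.features(x))
        return torch.softmax(logits, dim=1)   # softmax layer predicts the recognition result

# Example: a batch of 8 hypothetical multichannel auditory spectra
probs = GangueCNN()(torch.randn(8, 16, 64, 64))
print(probs.shape)  # torch.Size([8, 2])
```

In a real training loop one would normally return the raw logits and apply nn.CrossEntropyLoss, which includes the softmax internally; the explicit softmax is kept here only to mirror the layer sequence given in the abstract.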


2021
Author(s): Armein Z. R. Langi, Marco William Langi, Kusprasapta Mutijarsa, Yoanes Bandung

2021
Author(s): Marco William Langi, Kusprasapta Mutijarsa, Yoanes Bandung, Armein Z. R. Langi

2021, Vol 263 (2), pp. 4388-4393
Author(s): Rikako Abe, Sho Otsuka, Seiji Nakagawa

Disaster alerts are usually accompanied by auditory signals at the beginning. It is desirable that the auditory signal itself produce a sense of warning. The effects of (1) the degree of consonance and (2) the temporal pattern of the auditory signal on the auditory impression of warning were investigated using paired-comparison tests. In both tests, sequences of three triads were used as stimuli. First, 7 types of stimuli were generated by varying the degree of consonance of the triad (the frequency ratio of the sinusoids was varied systematically from 2:3:4, 4:5:6, 6:7:8, 8:9:10, 10:11:12, 12:13:14 through to 14:15:16). Each subject's impression of warning changed with the degree of consonance; however, variation among subjects was observed. Second, 21 types of stimuli in total were generated by changing several temporal parameters (duration of the triad, interval between the triads, duty rate of the sequence). The results indicated that the auditory impression of warning increased as the duration of the triad increased and as the interval between the triads decreased.
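A stimulus of the kind described (a sequence of three sinusoidal triads with a given frequency ratio, triad duration, and inter-triad interval) can be synthesized roughly as in the sketch below; the sampling rate, base frequency, and default durations are illustrative assumptions, not parameters reported by the authors.

```python
import numpy as np

def triad_sequence(ratio=(4, 5, 6), f_base=440.0, triad_dur=0.3,
                   gap_dur=0.1, n_triads=3, fs=44100):
    """Sequence of sinusoidal triads with the given frequency ratio, separated by silences."""
    t = np.arange(int(triad_dur * fs)) / fs
    # Sum of three sinusoids whose frequencies follow the requested ratio,
    # with the lowest component placed at f_base
    triad = sum(np.sin(2 * np.pi * f_base * (r / ratio[0]) * t) for r in ratio)
    triad = triad / np.max(np.abs(triad))   # normalize amplitude
    gap = np.zeros(int(gap_dur * fs))       # silent interval between triads
    # Duty rate of the sequence is triad_dur / (triad_dur + gap_dur)
    return np.tile(np.concatenate((triad, gap)), n_triads)

# Example: a consonant 4:5:6 sequence versus a more dissonant 14:15:16 sequence
consonant = triad_sequence(ratio=(4, 5, 6))
dissonant = triad_sequence(ratio=(14, 15, 16))
```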


2021, Vol 15
Author(s): Timothy S. Balmer, Laurence O. Trussell

The dorsal cochlear nucleus (DCN) is the first site of multisensory integration in the auditory pathway of mammals. The DCN circuit integrates non-auditory information, such as head and ear position, with auditory signals, and this convergence may contribute to the ability to localize sound sources or to suppress perceptions of self-generated sounds. Several extrinsic sources of these non-auditory signals have been described in various species, among them first- and second-order trigeminal axonal projections. Trigeminal sensory signals from the face and ears could provide the non-auditory information that the DCN requires for its roles in sound source localization and cancelation of self-generated sounds, for example, head and ear position or mouth movements that could predict the production of chewing or licking sounds. There is evidence for these axonal projections in guinea pigs and rats, although the size of the pathway is smaller than might be expected for a function essential to a prey animal's survival. However, evidence for these projections in mice, an increasingly important species in auditory neuroscience, is lacking, raising questions about the universality of such proposed functions. We therefore investigated the presence of trigeminal projections to the DCN in mice using viral and transgenic approaches. We found that the spinal trigeminal nucleus indeed projects to the DCN, targeting granule cells and unipolar brush cells. However, direct axonal projections from the trigeminal ganglion itself were undetectable. Thus, in mice, secondary brainstem sources carry non-auditory signals that could provide a processed trigeminal signal to the DCN, but primary trigeminal afferents are not integrated directly by the DCN.


2021
Author(s): Jonathan Wilbiks, Julia Feld Strand, Violet Aurora Brown

Many natural events generate both visual and auditory signals, and humans are remarkably adept at integrating information from those sources. However, individuals appear to differ markedly in their ability or propensity to combine what they hear with what they see. Individual differences in audiovisual integration have been established using a range of materials, including speech stimuli (seeing and hearing a talker) and simpler audiovisual stimuli (seeing flashes of light combined with tones). Although there are multiple tasks in the literature that are referred to as "measures of audiovisual integration," these tasks differ widely with respect to both the type of stimuli used (speech versus non-speech) and the nature of the tasks themselves (e.g., some tasks use conflicting auditory and visual stimuli whereas others use congruent stimuli). It is not clear whether these varied tasks are actually measuring the same underlying construct: audiovisual integration. This study tested the convergent validity of four commonly used measures of audiovisual integration, two of which use speech stimuli (susceptibility to the McGurk effect and a measure of audiovisual benefit) and two of which use non-speech stimuli (the sound-induced flash illusion and audiovisual integration capacity). We replicated previous work showing large individual differences in each measure but found no significant correlations between any of the measures. These results suggest that tasks commonly referred to as measures of audiovisual integration may not be tapping into the same underlying construct.
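Convergent validity of this kind is usually assessed with pairwise correlations of participants' scores across tasks. The sketch below shows one such check with pandas; the column names and simulated data are hypothetical stand-ins for the four measures (McGurk susceptibility, audiovisual benefit, the sound-induced flash illusion, and integration capacity), not the study's data.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 100   # hypothetical number of participants

# Simulated, mutually independent scores standing in for the four integration measures
scores = pd.DataFrame({
    "mcgurk": rng.normal(size=n),
    "av_benefit": rng.normal(size=n),
    "flash_illusion": rng.normal(size=n),
    "av_capacity": rng.normal(size=n),
})

# Pairwise Pearson correlations; convergent validity would predict sizeable positive values
print(scores.corr(method="pearson").round(2))
```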


Author(s): Valeria C Caruso, Daniel S Pages, Marc A. Sommer, Jennifer M Groh

Stimulus locations are detected differently by different sensory systems, but ultimately they yield similar percepts and behavioral responses. How the brain transcends initial differences to compute similar codes is unclear. We quantitatively compared the reference frames of two sensory modalities, vision and audition, across three interconnected brain areas involved in generating saccades, namely the frontal eye fields (FEF), lateral and medial parietal cortex (M/LIP), and superior colliculus (SC). We recorded from single neurons in head-restrained monkeys performing auditory- and visually-guided saccades from variable initial fixation locations, and evaluated whether their receptive fields were better described as eye-centered, head-centered, or hybrid (i.e. not anchored uniquely to head- or eye-orientation). We found a progression of reference frames across areas and across time, with considerable hybrid-ness and persistent differences between modalities during most epochs/brain regions. For both modalities, the SC was more eye-centered than the FEF, which in turn was more eye-centered than the predominantly hybrid M/LIP. In all three areas and temporal epochs from stimulus onset to movement, visual signals were more eye-centered than auditory signals. In the SC and FEF, auditory signals became more eye-centered at the time of the saccade than they were initially after stimulus onset, but only in the SC at the time of the saccade did the auditory signals become predominantly eye-centered. The results indicate that visual and auditory signals both undergo transformations, ultimately reaching the same final reference frame but via different dynamics across brain regions and time.
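One common way to classify a neuron's reference frame, and not necessarily the authors' exact metric, is to ask whether its spatial tuning aligns better across initial fixation positions when target locations are expressed in eye-centered or in head-centered coordinates. The toy sketch below illustrates that logic with a fabricated, eye-centered tuning curve; all numbers are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
fixations = np.array([-12.0, 0.0, 12.0])   # initial eye positions (deg), hypothetical
targets = np.arange(-24.0, 25.0, 6.0)      # target locations in head-centered coordinates

def gauss_tuning(x, center=6.0, width=10.0):
    return np.exp(-0.5 * ((x - center) / width) ** 2)

# Fabricated neuron whose response depends on target location relative to the eye
responses = np.array([gauss_tuning(targets - fx) + 0.05 * rng.normal(size=targets.size)
                      for fx in fixations])   # shape: (n_fixations, n_targets)

def mean_pairwise_corr(resp):
    """Average correlation of tuning curves across all pairs of fixation positions."""
    pairs = [(i, j) for i in range(len(resp)) for j in range(i + 1, len(resp))]
    return np.mean([np.corrcoef(resp[i], resp[j])[0, 1] for i, j in pairs])

# Head-centered alignment: curves indexed by target location in head coordinates
head_corr = mean_pairwise_corr(responses)

# Eye-centered alignment: re-index each curve by target-minus-fixation before comparing
eye_resp = np.array([np.interp(targets, targets - fx, responses[k])
                     for k, fx in enumerate(fixations)])
eye_corr = mean_pairwise_corr(eye_resp)

print(f"head-centered alignment: {head_corr:.2f}, eye-centered alignment: {eye_corr:.2f}")
# Higher eye-centered alignment suggests an eye-centered receptive field; comparable
# values under both alignments would correspond to the 'hybrid' category described above.
```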

