auditory scene
Recently Published Documents


TOTAL DOCUMENTS: 405 (FIVE YEARS: 57)
H-INDEX: 38 (FIVE YEARS: 3)

Author(s): Lisa Straetmans, B. Holtze, Stefan Debener, Manuela Jaeger, Bojana Mirkovic

Abstract Objective. Neuro-steered assistive technologies have been suggested to offer a major advancement in future devices like neuro-steered hearing aids. Auditory attention decoding methods would, in that case, allow for identification of an attended speaker within complex auditory environments, exclusively from neural data. Decoding the attended speaker using neural information has so far only been done in controlled laboratory settings. Yet, it is known that ever-present factors like distraction and movement are reflected in the neural signal parameters related to attention. Approach. Thus, in the current study we applied a two-competing-speaker paradigm to investigate the performance of a commonly applied EEG-based auditory attention decoding (AAD) model outside of the laboratory, during leisurely walking and distraction. Unique environmental sounds were added to the auditory scene and served as distractor events. Main results. The current study shows, for the first time, that the attended speaker can be accurately decoded during natural movement. At a temporal resolution as short as 5 seconds and without artifact attenuation, decoding was significantly above chance level. Further, as hypothesized, we found a decrease in attention to both the to-be-attended and the to-be-ignored speech streams after the occurrence of a salient event. Additionally, we demonstrate that it is possible to predict neural correlates of distraction with a computational model of auditory saliency based on acoustic features. Conclusion. Taken together, our study shows that auditory attention tracking outside of the laboratory, in ecologically valid conditions, is feasible and marks a step towards the development of future neuro-steered hearing aids.
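The decoding scheme this abstract refers to is, in broad strokes, a backward (stimulus-reconstruction) model: time-lagged EEG is mapped onto a speech envelope, and the reconstruction is then correlated with each speaker's true envelope. The sketch below is a hedged illustration of that general scheme, not the authors' implementation; the function names, lag range, and ridge regularisation are illustrative assumptions.

```python
# Minimal sketch of correlation-based auditory attention decoding (AAD) with a
# linear backward model. Assumes preprocessed EEG (samples x channels) and
# speech envelopes at the same sampling rate; all parameters are illustrative.
import numpy as np
from sklearn.linear_model import Ridge


def lag_matrix(eeg, max_lag):
    """Stack time-lagged copies of every EEG channel (lags 0..max_lag samples)."""
    n_samples, n_channels = eeg.shape
    lagged = np.zeros((n_samples, n_channels * (max_lag + 1)))
    for lag in range(max_lag + 1):
        lagged[lag:, lag * n_channels:(lag + 1) * n_channels] = eeg[:n_samples - lag]
    return lagged


def train_decoder(eeg_trials, attended_envs, max_lag=32, alpha=1.0):
    """Fit one ridge decoder mapping lagged EEG onto the attended speech envelope."""
    X = np.vstack([lag_matrix(e, max_lag) for e in eeg_trials])
    y = np.concatenate(attended_envs)
    return Ridge(alpha=alpha).fit(X, y)


def decode_attention(decoder, eeg_segment, env_a, env_b, max_lag=32):
    """Reconstruct the envelope from a short EEG segment and pick the speaker
    whose real envelope correlates more strongly with the reconstruction."""
    rec = decoder.predict(lag_matrix(eeg_segment, max_lag))
    r_a = np.corrcoef(rec, env_a)[0, 1]
    r_b = np.corrcoef(rec, env_b)[0, 1]
    return "A" if r_a > r_b else "B"
```

At a 5-second temporal resolution, `eeg_segment` would simply be a 5-second slice of the recording, and the decision would be repeated segment by segment.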


2021, Vol 2021, pp. 1-9
Author(s): Sihua Sun

Audio scene recognition is a task that enables devices to understand their environment through digital audio analysis; it belongs to the field of computational auditory scene analysis. At present, this technology is widely used in intelligent wearable devices, robot sensing services, and other application scenarios. To explore the applicability of machine learning to digital audio scene recognition, an audio scene recognition method based on optimized audio processing and a convolutional neural network is proposed. First, unlike traditional feature extraction based on mel-frequency cepstral coefficients, the proposed method applies a binaural representation and harmonic-percussive source separation to the original audio before extracting features, so that the system can exploit the spatial characteristics of the scene and thereby improve recognition accuracy. Then, an audio scene recognition system with a two-layer convolution module is designed and implemented. In terms of network structure, it borrows from the VGGNet architecture used in image recognition to increase network depth and improve the system's flexibility. Experimental analysis shows that, compared with traditional machine learning methods, the proposed method greatly improves recognition accuracy for each scene and generalizes better across different data.
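As a concrete illustration of the kind of pipeline this abstract describes (binaural input, harmonic-percussive separation, then a small convolutional classifier), here is a hedged sketch using librosa and PyTorch. The layer sizes, mel resolution, and number of classes are placeholder assumptions, not the paper's settings.

```python
# Illustrative feature-extraction and classifier sketch: stereo audio is split
# into harmonic and percussive components per channel, converted to log-mel
# spectrograms, and fed to a small two-block CNN. Assumes stereo input files.
import librosa
import numpy as np
import torch
import torch.nn as nn


def binaural_hpss_features(path, sr=44100, n_mels=64):
    """Left/right log-mel spectrograms of the harmonic and percussive components."""
    audio, _ = librosa.load(path, sr=sr, mono=False)  # shape (2, samples) for stereo
    channels = []
    for ch in audio:
        harmonic, percussive = librosa.effects.hpss(ch)
        for comp in (harmonic, percussive):
            mel = librosa.feature.melspectrogram(y=comp, sr=sr, n_mels=n_mels)
            channels.append(librosa.power_to_db(mel))
    return np.stack(channels)  # (4, n_mels, frames)


class TwoBlockCNN(nn.Module):
    """Two convolution blocks followed by a linear classifier over scene classes."""
    def __init__(self, n_classes=10, in_channels=4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(64, n_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))
```

Stacking the harmonic and percussive log-mel spectrograms of both stereo channels gives the network a four-channel input, which is one simple way to preserve the spatial cues the abstract refers to.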


2021, Vol 126 (4), pp. 1314-1325
Author(s): Michaela Warnecke, James A. Simmons, Andrea Megela Simmons

Echolocating bats navigate through cluttered environments that return cascades of echoes in response to the bat's broadcasts. We show that local field potentials from the big brown bat's auditory midbrain respond consistently to a simulated echo cascade across variations in echo delay and stimulus amplitude, despite the differing selectivities of individual neurons. These results suggest that population activity in the midbrain can build a cohesive percept of an auditory scene by aggregating activity over neuronal subpopulations.


2021, Vol 15
Author(s): Lars Hausfeld, Niels R. Disbergen, Giancarlo Valente, Robert J. Zatorre, Elia Formisano

Numerous neuroimaging studies have demonstrated that the auditory cortex tracks ongoing speech and that, in multi-speaker environments, tracking of the attended speaker is enhanced compared to tracking of irrelevant speakers. In contrast to speech, multi-instrument music can be appreciated by attending not only to its individual entities (i.e., segregation) but also to multiple instruments simultaneously (i.e., integration). We investigated the neural correlates of these two modes of music listening using electroencephalography (EEG) and sound envelope tracking. To this end, we presented uniquely composed music pieces played by two instruments, a bassoon and a cello, in combination with a previously validated music auditory scene analysis behavioral paradigm (Disbergen et al., 2018). Similar to results obtained in selective listening tasks for speech, relevant instruments could be reconstructed better than irrelevant ones during the segregation task. A delay-specific analysis showed higher reconstruction accuracy for the relevant instrument in a middle-latency window for both the bassoon and the cello, and in a late window for the bassoon. During the integration task, we did not observe significant attentional modulation when reconstructing the overall music envelope. Subsequent analyses indicated that this null result might be due to the heterogeneous strategies listeners employ during the integration task. Overall, our results suggest that, subsequent to a common processing stage, top-down modulations consistently enhance the relevant instrument's representation during an instrument segregation task, whereas no such enhancement is observed during an instrument integration task. These findings extend previous results from speech tracking to the tracking of multi-instrument music and, furthermore, inform current theories of polyphonic music perception.
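Envelope tracking of this kind starts from an amplitude envelope for each instrument (or for the mixture, in the integration case). The short sketch below shows one common way to derive such an envelope; the cut-off frequency and target rate are illustrative assumptions, not the values used in this study.

```python
# Minimal sketch of broadband-envelope extraction for envelope-tracking analyses:
# Hilbert envelope, low-pass filtering, and resampling to the EEG rate.
import numpy as np
from scipy.signal import hilbert, butter, filtfilt, resample


def broadband_envelope(audio, audio_sr, eeg_sr=64, cutoff_hz=8.0):
    """Amplitude envelope of one instrument/speaker track at the EEG sampling rate."""
    env = np.abs(hilbert(audio))                 # magnitude of the analytic signal
    b, a = butter(3, cutoff_hz, fs=audio_sr)     # low-pass below ~8 Hz
    env = filtfilt(b, a, env)
    n_out = int(len(env) * eeg_sr / audio_sr)
    return resample(env, n_out)                  # match the EEG time axis
```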


2021, pp. 420-436
Author(s): Sue Denham, István Winkler

Our perceptual systems provide us with information about the world around us and the things within it. However, understanding this apparently simple function is surprisingly difficult. In this chapter we focus on auditory perception and the ways in which we use sound to obtain information about the behaviour of objects in our environment. After a brief description of the auditory system, we discuss auditory scene analysis and the problem of partitioning the combined information from an unknown number of sources into the discrete perceptual objects with which we interact. Through this discussion, we conclude that auditory processing is shaped by the need to engage flexibly with the rhythms of living organisms and temporal regularities in the world.


2021, Vol 15
Author(s): Natsumi Y. Homma, Victoria M. Bajo

Sound information is transmitted from the ear to the central auditory stations of the brain via several nuclei. In addition to these ascending pathways, there are descending projections that can influence information processing at each of these nuclei. A major descending pathway in the auditory system is the feedback projection from layer VI of the primary auditory cortex (A1) to the ventral division of the medial geniculate body (MGBv) in the thalamus. The corticothalamic axons have small glutamatergic terminals that can modulate thalamic processing and thalamocortical information transmission. Corticothalamic neurons also provide input to GABAergic neurons of the thalamic reticular nucleus (TRN), which receives collaterals from the ascending thalamic axons. The balance of corticothalamic and TRN inputs has been shown to refine the frequency tuning, firing patterns, and gating of MGBv neurons. The thalamus is therefore not merely a relay stage in the chain of auditory nuclei but participates in complex aspects of sound processing, including top-down modulation. In this review, we aim (i) to examine how lemniscal corticothalamic feedback modulates responses in MGBv neurons, and (ii) to explore how this feedback contributes to auditory scene analysis, particularly to frequency and harmonic perception. Finally, we discuss potential implications of corticothalamic feedback for music and speech perception, where precise spectral and temporal processing is essential.


2021, Vol 12
Author(s): Ivine Kuruvila, Jan Muncke, Eghart Fischer, Ulrich Hoppe

The human brain performs remarkably well in segregating a particular speaker from interfering ones in a multispeaker scenario. We can quantitatively evaluate this segregation capability by modeling a relationship between the speech signals present in an auditory scene and the listener's cortical signals measured using electroencephalography (EEG). This has opened up avenues to integrate neuro-feedback into hearing aids, where the device can infer the user's attention and enhance the attended speaker. Commonly used algorithms to infer auditory attention are based on linear systems theory, in which cues such as speech envelopes are mapped onto the EEG signals. Here, we present a joint convolutional neural network (CNN) and long short-term memory (LSTM) model to infer auditory attention. The joint CNN-LSTM model takes the EEG signals and the spectrograms of the multiple speakers as inputs and classifies the attention to one of the speakers. We evaluated the reliability of our network using three different datasets comprising 61 subjects, where each subject undertook a dual-speaker experiment. The three datasets corresponded to speech stimuli presented in three different languages, namely German, Danish, and Dutch. Using the proposed joint CNN-LSTM model, we obtained a median decoding accuracy of 77.2% at a trial duration of 3 s. Furthermore, we evaluated the amount of sparsity that the model can tolerate by means of magnitude pruning and found a tolerance of up to 50% sparsity without substantial loss of decoding accuracy.
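To make the model class concrete, here is a hedged PyTorch sketch of a joint CNN-LSTM attention classifier loosely following the description above: a convolutional front-end for the EEG and the two speaker spectrograms, an LSTM over time, and a two-way attention decision. All layer sizes and shapes are illustrative assumptions; the paper's exact architecture may differ.

```python
# Sketch of a joint CNN-LSTM auditory attention classifier (illustrative only).
import torch
import torch.nn as nn


class JointCnnLstm(nn.Module):
    def __init__(self, eeg_channels=64, spec_bins=40, hidden=64):
        super().__init__()
        # 1-D convolutions over time, applied to EEG and (with shared weights)
        # to each speaker's spectrogram
        self.eeg_cnn = nn.Sequential(nn.Conv1d(eeg_channels, 32, 5, padding=2), nn.ReLU())
        self.spec_cnn = nn.Sequential(nn.Conv1d(spec_bins, 32, 5, padding=2), nn.ReLU())
        self.lstm = nn.LSTM(input_size=32 * 3, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2)  # attended speaker: 0 or 1

    def forward(self, eeg, spec_a, spec_b):
        # eeg: (batch, eeg_channels, time); spec_*: (batch, spec_bins, time)
        feats = torch.cat([self.eeg_cnn(eeg),
                           self.spec_cnn(spec_a),
                           self.spec_cnn(spec_b)], dim=1)   # (batch, 96, time)
        out, _ = self.lstm(feats.transpose(1, 2))           # (batch, time, hidden)
        return self.head(out[:, -1])                        # logits per speaker
```

For the sparsity experiment mentioned in the abstract, magnitude pruning of such a model could be approximated with torch.nn.utils.prune.l1_unstructured on the convolutional and linear weights; this is an assumption about tooling, not the authors' exact procedure.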


Litera, 2021, pp. 70-80
Author(s): Tamara Anikyan

The subject of this research is the expressive potential of prosody in Joe Biden's 2021 inaugural speech. The analysis examines how prosodic means such as melody, accentuation, pausation, and rhythm function in the speech, assesses their interaction with widespread stylistic techniques, and considers their role in carrying out the functions traditional for inaugural rhetoric that determine its distinctness as a genre. The article employs the method of auditory scene analysis of the political speech, which vividly illustrates the significance of modifications of suprasegmental parameters for conveying the communicative intent of the speech. The scientific novelty lies in studying the expressive capabilities of prosodic means within a specific variety of political discourse, the inaugural speech as a genre of epideictic rhetoric, and in viewing the implementation of specific functions in the unity of linguistic and extralinguistic factors. Attention is given to the general peculiarities of the discursive practice of inaugural speeches, to the context of the specific communicative situation, namely the unprecedented circumstances in which the 46th President of the United States delivered the speech, and to the personal traits of the speaker. The results demonstrate the expressive potential of prosodic modifications in oral speech and can be used in teaching philology students to analyse political discourse through the prism of prosody, expressive syntax, stylistics, and rhetoric.

