Auditory scene analysis
Recently Published Documents

Total documents: 315 (five years: 49)
H-index: 36 (five years: 3)

Author(s): Henri Pöntynen, Nelli Salminen

Abstract: Spatial hearing facilitates the perceptual organization of complex soundscapes into accurate mental representations of sound sources in the environment. Yet the role of binaural cues in auditory scene analysis (ASA) has received relatively little attention in recent neuroscientific studies employing novel, spectro-temporally complex stimuli. This may be because a stimulation paradigm that provides binaurally derived grouping cues of sufficient spectro-temporal complexity has not yet been established for neuroscientific ASA experiments. Random-chord stereograms (RCS) are a class of auditory stimuli that exploit spectro-temporal variations in the interaural envelope correlation of noise-like sounds with interaurally coherent fine structure; they evoke salient auditory percepts that emerge only under binaural listening. Here, our aim was to assess the usability of the RCS paradigm for indexing binaural processing in the human brain. To this end, we recorded EEG responses to RCS stimuli from 12 normal-hearing subjects. The stimuli consisted of an initial 3-s noise segment with interaurally uncorrelated envelopes, followed by another 3-s segment in which envelope correlation was modulated periodically according to the RCS paradigm. Modulations were applied either across the entire stimulus bandwidth (wideband stimuli) or in temporally shifting frequency bands (ripple stimulus). Event-related potential and inter-trial phase coherence analyses of the EEG responses showed that the introduction of the 3- or 5-Hz wideband modulations produced a prominent change-onset complex and ongoing synchronized responses to the RCS modulations. In contrast, the ripple stimulus elicited a change-onset response but no response to the ongoing RCS modulation. Frequency-domain analyses revealed increased spectral power at the fundamental frequency and the first harmonic of the wideband RCS modulations. RCS stimulation yields robust EEG measures of binaurally driven auditory reorganization and has the potential to provide a flexible stimulation paradigm suitable for isolating binaural effects in ASA experiments.
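For readers unfamiliar with the phase-locking measure named above, the following is a minimal sketch of inter-trial phase coherence (ITPC). The sampling rate, filter band, and simulated 3-Hz modulation are illustrative assumptions, not the study's actual parameters or pipeline.

```python
# Minimal ITPC sketch: |mean over trials of exp(i * phase)| per time point.
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def itpc(trials, fs, band):
    """trials: (n_trials, n_samples) EEG epochs; band: (lo, hi) in Hz.
    Returns ITPC over time: 0 = random phase across trials, 1 = perfectly locked."""
    b, a = butter(4, [band[0] / (fs / 2), band[1] / (fs / 2)], btype="band")
    filtered = filtfilt(b, a, trials, axis=-1)           # isolate the band of interest
    phase = np.angle(hilbert(filtered, axis=-1))         # instantaneous phase per trial
    return np.abs(np.mean(np.exp(1j * phase), axis=0))   # resultant vector length

# Example: 60 simulated trials phase-locked to a 3-Hz modulation plus noise.
fs = 250
t = np.arange(0, 3.0, 1 / fs)
rng = np.random.default_rng(0)
trials = np.sin(2 * np.pi * 3 * t) + 0.5 * rng.standard_normal((60, t.size))
print(itpc(trials, fs, (2.0, 4.0)).mean())  # well above the ~1/sqrt(60) chance level
```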


2021, Vol. 15
Author(s): Lars Hausfeld, Niels R. Disbergen, Giancarlo Valente, Robert J. Zatorre, Elia Formisano

Numerous neuroimaging studies have demonstrated that the auditory cortex tracks ongoing speech and that, in multi-speaker environments, tracking of the attended speaker is enhanced compared to that of the other, irrelevant speakers. In contrast to speech, multi-instrument music can be appreciated by attending not only to its individual entities (i.e., segregation) but also to multiple instruments simultaneously (i.e., integration). We investigated the neural correlates of these two modes of music listening using electroencephalography (EEG) and sound envelope tracking. To this end, we presented uniquely composed music pieces played by two instruments, a bassoon and a cello, in combination with a previously validated music auditory scene analysis behavioral paradigm (Disbergen et al., 2018). Similar to results obtained in selective listening tasks for speech, relevant instruments could be reconstructed better than irrelevant ones during the segregation task. A delay-specific analysis showed higher reconstruction accuracy for the relevant instrument during a middle-latency window for both the bassoon and the cello, and during a late window for the bassoon. During the integration task, we did not observe significant attentional modulation when reconstructing the overall music envelope. Subsequent analyses indicated that this null result might be due to the heterogeneous strategies listeners employ during the integration task. Overall, our results suggest that, subsequent to a common processing stage, top-down modulations consistently enhance the relevant instrument's representation during an instrument segregation task, whereas no such enhancement is observed during an instrument integration task. These findings extend previous results from speech tracking to the tracking of multi-instrument music and, furthermore, inform current theories of polyphonic music perception.
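Envelope tracking of this kind is typically quantified with a backward model: a ridge-regularized linear decoder maps time-lagged EEG channels onto the sound envelope, and reconstruction accuracy is the correlation between decoded and actual envelopes. The sketch below illustrates that general approach; the lag range, ridge parameter, and function names are assumptions, not the paper's settings.

```python
# Backward-model envelope reconstruction (mTRF-style decoder), schematic only.
import numpy as np

def lag_matrix(eeg, max_lag):
    """Stack time-lagged copies of the EEG (n_samples, n_channels) into a
    design matrix of shape (n_samples, n_channels * (max_lag + 1))."""
    n, c = eeg.shape
    X = np.zeros((n, c * (max_lag + 1)))
    for k in range(max_lag + 1):
        X[k:, k * c:(k + 1) * c] = eeg[:n - k]
    return X

def train_decoder(eeg, envelope, max_lag=32, ridge=1e3):
    """Closed-form ridge regression: w = (X'X + lambda * I)^(-1) X'y."""
    X = lag_matrix(eeg, max_lag)
    XtX = X.T @ X + ridge * np.eye(X.shape[1])
    return np.linalg.solve(XtX, X.T @ envelope), max_lag

def reconstruct(eeg, decoder):
    """Apply a trained decoder to held-out EEG to recover the envelope."""
    w, max_lag = decoder
    return lag_matrix(eeg, max_lag) @ w

# Reconstruction accuracy is then the correlation between the reconstructed
# envelope and the true envelope of the attended (or, for comparison, the
# irrelevant) instrument, computed on data not used for training.
```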


2021, pp. 420-436
Author(s): Sue Denham, István Winkler

Our perceptual systems provide us with information about the world around us and the things within it. However, understanding this apparently simple function is surprisingly difficult. In this chapter we focus on auditory perception and the ways in which we use sound to obtain information about the behaviour of objects in our environment. After a brief description of the auditory system, we discuss auditory scene analysis and the problem of partitioning the combined information from an unknown number of sources into the discrete perceptual objects with which we interact. Through this discussion, we conclude that auditory processing is shaped by the need to engage flexibly with the rhythms of living organisms and the temporal regularities of the world.


2021
Author(s): Vibha Viswanathan, Barbara G. Shinn-Cunningham, Michael G. Heinz

Abstract: Temporal coherence of sound fluctuations across spectral channels is thought to aid auditory grouping and scene segregation. Although prior studies on the neural bases of temporal-coherence processing focused mostly on cortical contributions, neurophysiological evidence suggests that temporal-coherence-based scene analysis may start as early as the cochlear nucleus (i.e., the first auditory region supporting cross-channel processing over a wide frequency range). Accordingly, we hypothesized that aspects of temporal-coherence processing that could be realized in early auditory areas may shape speech understanding in noise. We then explored whether physiologically plausible computational models could account for results from a behavioral experiment that measured consonant categorization in different masking conditions. We tested whether within-channel masking of target-speech modulations predicted consonant confusions across the different conditions, and whether predicted performance was improved by adding across-channel temporal-coherence processing mirroring the computations known to exist in the cochlear nucleus. Consonant confusions provide a rich characterization of error patterns in speech categorization, and are thus crucial for rigorously testing models of speech perception; however, to the best of our knowledge, they have not been utilized in prior studies of scene analysis. We find that within-channel modulation masking can reasonably account for category confusions, but that it fails when temporal fine structure (TFS) cues are unavailable. However, the addition of across-channel temporal-coherence processing significantly improves confusion predictions across all tested conditions. Our results suggest that temporal-coherence processing strongly shapes speech understanding in noise, and that physiological computations that exist early along the auditory pathway may contribute to this process.
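The across-channel computation described here can be illustrated schematically: envelopes are extracted in a bank of bandpass channels and correlated pairwise, so that channels whose fluctuations cohere over time (and thus likely belong to the same source) can be grouped. The sketch below is a simplified toy, not the physiologically detailed models tested in the paper; the filter bank and band edges are invented for illustration.

```python
# Toy across-channel temporal-coherence sketch: envelope correlations
# between bandpass channels of a sound mixture.
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def envelope_bank(x, fs, edges):
    """Hilbert envelopes of 'x' in bandpass channels with the given edges (Hz)."""
    envs = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(4, [lo / (fs / 2), hi / (fs / 2)], btype="band", output="sos")
        envs.append(np.abs(hilbert(sosfiltfilt(sos, x))))
    return np.array(envs)                                # (n_channels, n_samples)

def coherence_matrix(envs):
    """Pairwise correlation of channel envelopes; high values suggest the
    channels carry coherent fluctuations from a common source."""
    z = (envs - envs.mean(axis=1, keepdims=True)) / envs.std(axis=1, keepdims=True)
    return (z @ z.T) / envs.shape[1]

# Example with a toy two-source mixture: a 4-Hz-modulated low tone and a
# 7-Hz-modulated high tone produce two coherent channel groups.
fs = 16000
t = np.arange(0, 1.0, 1 / fs)
mix = ((1 + np.sin(2 * np.pi * 4 * t)) * np.sin(2 * np.pi * 300 * t)
       + (1 + np.sin(2 * np.pi * 7 * t)) * np.sin(2 * np.pi * 2000 * t))
C = coherence_matrix(envelope_bank(mix, fs, np.geomspace(100, 6000, 13)))
```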


2021, Vol. 15
Author(s): Natsumi Y. Homma, Victoria M. Bajo

Sound information is transmitted from the ear to the central auditory stations of the brain via several nuclei. In addition to these ascending pathways, there are descending projections that can influence information processing at each of these nuclei. A major descending pathway in the auditory system is the feedback projection from layer VI of the primary auditory cortex (A1) to the ventral division of the medial geniculate body (MGBv) in the thalamus. The corticothalamic axons have small glutamatergic terminals that can modulate thalamic processing and thalamocortical information transmission. Corticothalamic neurons also provide input to GABAergic neurons of the thalamic reticular nucleus (TRN), which receives collaterals from the ascending thalamic axons. The balance of corticothalamic and TRN inputs has been shown to refine the frequency tuning, firing patterns, and gating of MGBv neurons. The thalamus is therefore not merely a relay stage in the chain of auditory nuclei but participates in complex aspects of sound processing, including top-down modulation. In this review, we aim (i) to examine how lemniscal corticothalamic feedback modulates responses in MGBv neurons, and (ii) to explore how this feedback contributes to auditory scene analysis, particularly to frequency and harmonic perception. Finally, we discuss the potential implications of corticothalamic feedback for music and speech perception, where precise spectral and temporal processing is essential.


Litera, 2021, pp. 70-80
Author(s): Tamara Anikyan

The subject of this research is the expressive potential of prosody in the 2021 inaugural speech of Joe Biden. Analysis is conducted on the functioning of such prosodic means as melody, accentuation, pausation, and rhythm, among others. Assessment is given to their interaction with widespread stylistic techniques, as well as to their role in carrying out the functions traditional to inaugural rhetoric that determine its distinctness as a genre. The article employs the method of auditory analysis of the politician's speech, which vividly illustrates the significance of modifications of suprasegmental parameters for conveying the communicative intent of the speech. The scientific novelty lies in studying the expressive capabilities of prosodic means within a specific variety of political discourse, the inaugural speech as a genre of epideictic rhetoric, and in viewing the implementation of specific functions in the unity of linguistic and extralinguistic factors. Attention is given to the general peculiarities of the discursive practice of inaugural speeches, as well as to the context of the specific communicative situation: the unprecedented circumstances in which the 46th President of the United States delivered the speech, and the personal traits of the speaker. The results demonstrate the expressive potential of prosodic modifications in oral speech, which can be used in teaching philology students to analyse texts of political discourse through the prism of prosody, expressive syntax, stylistics, and rhetoric.


Sensors, 2021, Vol. 21 (15), pp. 5005
Author(s): Caleb Rascon

Beamforming is a class of audio array processing techniques used for interference reduction, sound source localization, and as a pre-processing stage for audio event classification and speaker identification. The auditory scene analysis community can benefit from a systematic evaluation and comparison of different beamforming techniques. In this paper, five popular beamforming techniques are evaluated in two different acoustic environments, while varying the number of microphones, the number of interfering sources, and the direction-of-arrival error, using the Acoustic Interactions for Robot Audition (AIRA) corpus and a common software framework. Additionally, a highly efficient phase-based frequency-masking beamformer is evaluated and shown to outperform all five techniques. Both the evaluation corpus and the beamforming implementations are freely available, for experiment repeatability and transparency. Raw results are also provided to the reader as a complement to this work, to facilitate an informed decision about which technique to use. Finally, the insights and tendencies observed in the evaluation results are presented.
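As a point of reference for the techniques compared here, the following is a minimal delay-and-sum beamformer, the simplest member of the family. It assumes a uniform linear array and a far-field source; it is an illustrative sketch, not one of the paper's evaluated implementations.

```python
# Minimal frequency-domain delay-and-sum beamformer, schematic only.
import numpy as np

def delay_and_sum(mics, fs, doa_deg, spacing, c=343.0):
    """mics: (n_mics, n_samples) signals from a uniform linear array with
    'spacing' metres between sensors. Steers toward doa_deg (0 = broadside)
    by time-aligning the channels, then averaging."""
    n_mics, n_samples = mics.shape
    # Relative arrival delays of a far-field source at doa_deg.
    delays = np.arange(n_mics) * spacing * np.sin(np.deg2rad(doa_deg)) / c
    freqs = np.fft.rfftfreq(n_samples, 1 / fs)
    spectra = np.fft.rfft(mics, axis=1)
    # Advance each channel by its delay (a pure phase shift per frequency)
    # so the target direction adds coherently and off-axis sources do not.
    aligned = spectra * np.exp(2j * np.pi * freqs * delays[:, None])
    return np.fft.irfft(aligned.mean(axis=0), n_samples)
```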


2021, Vol. 15 (3-4), pp. 202-222
Author(s): Finn Upham, Julie Cumming

How did Renaissance listeners experience the polyphonic mass ordinary cycle in the soundscape of the church? We hypothesize that the differences in textural complexity between mass movements allowed listeners to track the progress of the service, regardless of the intelligibility of the text or of sophisticated musical knowledge. Building on the principles of auditory scene analysis, this article introduces the Auditory Streaming Complexity Estimate, a measure of the blending or separation of each part in polyphony, resulting in a moment-by-moment tally of how many independent streams or sound objects might be heard. Applying the estimate to symbolic scores for a corpus of 216 polyphonic mass ordinary cycles composed between c. 1450 and 1600, we show that it captures information distinct from the number of parts in the score or the distribution of voices active through the piece. While composers did not all follow the same relative-complexity strategy for mass ordinary movements, a robust hierarchy emerges from the corpus as a whole: a shallow V shape with the Credo as the least complex movement and the Agnus Dei as the most complex. The streaming complexity of masses also increased significantly over the years represented in this corpus.
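As a rough illustration of the streaming principle such a measure builds on (this is not the authors' actual Auditory Streaming Complexity Estimate), the toy sketch below merges simultaneously sounding voices into a single stream when their pitches are close and their onsets synchronous, and counts the remaining streams; all thresholds are invented.

```python
# Toy stream counter for one moment of a polyphonic score, schematic only.
import numpy as np

def stream_count(pitches, onsets, pitch_thresh=7.0, onset_thresh=0.05):
    """pitches: MIDI pitches of the notes sounding at one moment (NaN = rest);
    onsets: onset times (s) of those notes. Voices are greedily merged into
    one stream when both cues favour fusion; the stream count is returned."""
    active = [i for i in range(len(pitches)) if not np.isnan(pitches[i])]
    streams = []
    for i in active:
        for stream in streams:
            j = stream[0]
            if (abs(pitches[i] - pitches[j]) <= pitch_thresh
                    and abs(onsets[i] - onsets[j]) <= onset_thresh):
                stream.append(i)   # fuses with an existing stream
                break
        else:
            streams.append([i])    # starts a new stream
    return len(streams)

# Scored at every moment of a movement, the running count yields a
# moment-by-moment complexity profile of the kind described above.
```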

