cocktail party problem
Recently Published Documents



2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Peter C. Bermant

We introduce the Bioacoustic Cocktail Party Problem Network (BioCPPNet), a lightweight, modular, and robust U-Net-based machine learning architecture optimized for bioacoustic source separation across diverse biological taxa. Employing learnable or handcrafted encoders, BioCPPNet operates directly on the raw acoustic mixture waveform containing overlapping vocalizations and separates the input waveform into estimates corresponding to the sources in the mixture. Predictions are compared to the reference ground truth waveforms by searching over the space of (output, target) source order permutations, and we train using an objective function motivated by perceptual audio quality. We apply BioCPPNet to several species with unique vocal behavior, including macaques, bottlenose dolphins, and Egyptian fruit bats, and we evaluate reconstruction quality of separated waveforms using the scale-invariant signal-to-distortion ratio (SI-SDR) and downstream identity classification accuracy. We consider mixtures with two or three concurrent conspecific vocalizers, and we examine separation performance in open and closed speaker scenarios. To our knowledge, this paper redefines the state-of-the-art in end-to-end single-channel bioacoustic source separation in a permutation-invariant regime across a heterogeneous set of non-human species. This study serves as a major step toward the deployment of bioacoustic source separation systems for processing substantial volumes of previously unusable data containing overlapping bioacoustic signals.
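
The permutation search and the SI-SDR metric mentioned in this abstract can be illustrated with a minimal sketch. It is not the BioCPPNet implementation: the function names `si_sdr` and `best_permutation` are hypothetical, waveforms are plain Python lists, and SI-SDR is used here as the matching score purely for illustration (the abstract says only that training uses an objective motivated by perceptual audio quality, without specifying it).

```python
import itertools
import math


def si_sdr(estimate, target, eps=1e-12):
    """Scale-invariant signal-to-distortion ratio in dB (illustrative helper)."""
    # Project the estimate onto the target to remove any overall gain difference.
    dot = sum(e * t for e, t in zip(estimate, target))
    target_energy = sum(t * t for t in target) + eps
    scale = dot / target_energy
    projection = [scale * t for t in target]
    # Whatever is left after the projection counts as distortion.
    residual = [e - p for e, p in zip(estimate, projection)]
    signal_energy = sum(p * p for p in projection)
    residual_energy = sum(r * r for r in residual)
    return 10.0 * math.log10((signal_energy + eps) / (residual_energy + eps))


def best_permutation(estimates, targets):
    """Try every (output, target) ordering and keep the one with the highest mean SI-SDR.

    Returns (score, perm), where perm[i] is the index of the target
    matched to estimates[i].
    """
    best_score, best_perm = -math.inf, None
    for perm in itertools.permutations(range(len(targets))):
        score = sum(
            si_sdr(estimates[i], targets[j]) for i, j in enumerate(perm)
        ) / len(perm)
        if score > best_score:
            best_score, best_perm = score, perm
    return best_score, best_perm


# Two toy sources; the estimates come out in swapped order and with different gains.
targets = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, -1.0]]
estimates = [[0.0, 2.0, 0.0, -2.0], [0.5, 0.0, 0.5, 0.0]]
score, perm = best_permutation(estimates, targets)
```

Exhaustive search costs one evaluation per ordering, which is 2 or 6 orderings for the two- and three-vocalizer mixtures considered in the abstract.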


2021 ◽  
Vol 17 (8) ◽  
pp. e1009356
Author(s):  
Kenny F. Chou ◽  
Kamal Sen

Attentional modulation of cortical networks is critical for the cognitive flexibility required to process complex scenes. Current theoretical frameworks for attention are based almost exclusively on studies in visual cortex, where attentional effects are typically modest and excitatory. In contrast, attentional effects in auditory cortex can be large and suppressive. A theoretical framework for explaining attentional effects in auditory cortex is lacking, preventing a broader understanding of cortical mechanisms underlying attention. Here, we present a cortical network model of attention in primary auditory cortex (A1). A key mechanism in our network is attentional inhibitory modulation (AIM) of cortical inhibitory neurons. In this mechanism, top-down inhibitory neurons disinhibit bottom-up cortical circuits, a prominent circuit motif observed in sensory cortex. Our results reveal that the same underlying mechanisms in the AIM network can explain diverse attentional effects on both spatial and frequency tuning in A1. We find that a dominant effect of disinhibition on cortical tuning is suppressive, consistent with experimental observations. Functionally, the AIM network may play a key role in solving the cocktail party problem. We demonstrate how attention can guide the AIM network to monitor an acoustic scene, select a specific target, or switch to a different target, providing flexible outputs for solving the cocktail party problem.


Author(s):  
Alistair J. Harvey ◽  
C. Philip Beaman

Rationale: To test the notion that alcohol impairs auditory attentional control by reducing the listener’s cognitive capacity.
Objectives: We examined the effect of alcohol consumption and working memory span on dichotic speech shadowing and the cocktail party effect, the ability to focus on one of many simultaneous speakers yet still detect mention of one’s name amidst the background speech. Alcohol was expected either to increase name detection, by weakening the inhibition of irrelevant speech, or reduce name detection, by restricting auditory attention on to the primary input channel. Low-span participants were expected to show larger drug impairments than high-span counterparts.
Methods: On completion of the working memory span task, participants (n = 81) were randomly assigned to an alcohol or placebo beverage treatment. After alcohol absorption, they shadowed speech presented to one ear while ignoring the synchronised speech of a different speaker presented to the other. Each participant’s first name was covertly embedded in to-be-ignored speech.
Results: The “cocktail party effect” was not affected by alcohol or working memory span, though low-span participants made more shadowing errors and recalled fewer words from the primary channel than high-span counterparts. Bayes factors support a null effect of alcohol on the cocktail party phenomenon, on shadowing errors and on memory for either shadowed or ignored speech.
Conclusion: Findings suggest that an alcoholic beverage producing a moderate level of intoxication (M BAC ≈ 0.08%) neither enhances nor impairs the cocktail party effect.


Author(s):  
Stephen Grossberg

This far-ranging chapter provides unified explanations of data about audition, speech, and language, and the general cognitive processes that they specialize. The ventral What stream and dorsal Where cortical stream in vision have analogous ventral sound-to-meaning and dorsal sound-to-action streams in audition. Circular reactions for learning to reach using vision are homologous to circular reactions for learning to speak using audition. VITE circuits control arm movement properties of synergy, synchrony, and speed. Volitional basal ganglia GO signals choose which limb to move and how fast it moves. VAM models use a circular reaction to calibrate VITE circuit signals. VITE is joined with the FLETE model to compensate for variable loads, unexpected perturbations, and obstacles. Properties of cells in cortical areas 4 and 5, spinal cord, and cerebellum are quantitatively simulated. Motor equivalent reaching using clamped joints or tools arises from circular reactions that learn representations of space around an actor. Homologous circuits model motor-equivalent speech production, including coarticulation. Stream-shroud resonances play the role for audition that surface-shroud resonances play in vision. They support auditory consciousness and speech production. Strip maps and spectral-pitch resonances cooperate to solve the cocktail party problem whereby humans track voices of speakers in noisy environments with multiple sources. Auditory streaming and speaker normalization use networks with similar designs. Item-Order-Rank working memories and Masking Field networks temporarily store sequences of events while categorizing them into list chunks. Analog numerical representations and place-value number systems emerge from phylogenetically earlier Where and What stream spatial and categorical processes.


2021 ◽  
Author(s):  
Peter C Bermant

We introduce the Bioacoustic Cocktail Party Problem Network (BioCPPNet), a lightweight, modular, and robust UNet-based machine learning architecture optimized for bioacoustic source separation across diverse biological taxa. Employing learnable or handcrafted encoders, BioCPPNet operates directly on the raw acoustic mixture waveform containing overlapping vocalizations and separates the input waveform into estimates corresponding to the sources in the mixture. Predictions are compared to the reference ground truth waveforms by searching over the space of (output, target) source order permutations, and we train using an objective function motivated by perceptual audio quality. We apply BioCPPNet to several species with unique vocal behavior, including macaques, bottlenose dolphins, and Egyptian fruit bats, and we evaluate reconstruction quality of separated waveforms using the scale-invariant signal-to-distortion ratio (SI-SDR) and downstream identity classification accuracy. We consider mixtures with two or three concurrent conspecific vocalizers, and we examine separation performance in open and closed speaker scenarios. To our knowledge, this paper redefines the state-of-the-art in end-to-end single-channel bioacoustic source separation in a permutation-invariant regime across a heterogeneous set of non-human species. This study serves as a major step toward the deployment of bioacoustic source separation systems for processing substantial volumes of previously unusable data containing overlapping bioacoustic signals.


2021 ◽  
Author(s):  
Emma Holmes ◽  
Thomas Parr ◽  
Timothy D Griffiths ◽  
Karl Friston

In this paper, we introduce a new generative model for an active inference account of preparatory and selective attention, in the context of a classic ‘cocktail party’ paradigm. In this setup, two talkers speak simultaneously and an instructive spatial cue directs attention to the left or right talker. We use this generative model to test competing hypotheses about the way that human listeners direct preparatory and selective attention. We show that assigning low precision to words at attended—relative to unattended—locations can explain why a listener reports words from a competing sentence. Under this model, temporal changes in sensory precision were not needed to account for faster reaction times with longer cue-target intervals, but were necessary to explain ramping effects on event-related potentials—resembling the contingent negative variation (CNV)—during the preparatory interval. These simulations demonstrate that behavioural and electrophysiological correlates of voluntary attention emerge from neuronally plausible belief updating or message passing and, crucially, distinguish between the effects of deploying precision in different parts of a generative model.
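
As a rough illustration of what "precision" means in a discrete generative model, the sketch below treats it as an exponent applied to a likelihood distribution before renormalizing, a common convention in discrete active inference formulations. This is an assumption made for illustration, not the authors' model: the function names `apply_precision` and `entropy` and the example word likelihoods are hypothetical.

```python
import math


def apply_precision(probs, precision):
    """Sharpen (precision > 1) or flatten (precision < 1) a probability vector."""
    weighted = [p ** precision for p in probs]
    total = sum(weighted)
    return [w / total for w in weighted]


def entropy(probs):
    """Shannon entropy in nats; higher means a less informative distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


# Hypothetical likelihood over three candidate words at one spatial location.
word_likelihood = [0.7, 0.2, 0.1]
high_precision = apply_precision(word_likelihood, 2.0)
low_precision = apply_precision(word_likelihood, 0.5)
```

Under this convention, a low-precision location carries flatter and therefore less informative evidence about which word was spoken there.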

