Computational models of neural auditory processing

Author(s):  
R. Lyon
Author(s):  
Bernd J. Kröger

This chapter outlines a comprehensive neurocomputational model of voice and speech perception based on (i) already established computational models and (ii) neurophysiological data on the underlying neural processes. Neurocomputational models of speech perception comprise auditory as well as cognitive modules in order to extract sound features as well as linguistic information (linguistic content). A model of voice and speech perception additionally needs to process paralinguistic information, such as the gender, age, and emotional or affective state of the speaker. It is argued here that the modules of a neurocomputational model of voice and speech perception need to interact with modules that go beyond unimodal auditory processing because, for example, the processing of paralinguistic information is closely related to visual processes such as facial perception. Thus, this chapter describes the neural modelling of voice and speech perception in relation to general communication and social-interaction processes, which makes it necessary to develop a hypermodal processing approach.


2020 ◽  
Author(s):  
F. Cervantes Constantino ◽  
T. Sánchez-Costa ◽  
G.A. Cipriani ◽  
A. Carboni

Surroundings continually propagate audiovisual (AV) signals, and by attending we make clear and precise sense of those that matter at any given time. In such cases, parallel visual and auditory contributions may jointly serve as a basis for selection. It is unclear what hierarchical effects arise when initial selection criteria are unimodal or involve uncertainty. Uncertainty in sensory information is a factor considered in computational models of attention that propose precision weighting as a primary mechanism for selection. The effects of visuospatial selection on auditory processing were investigated here with electroencephalography (EEG). We examined the encoding of random tone pips probabilistically associated with spatially attended visual changes, via a temporal response function (TRF) model of the auditory EEG time series. AV precision, or temporal uncertainty, was manipulated across stimuli while participants sustained endogenous visuospatial attention. The TRF data showed that cross-modal modulations were dominated by AV precision between auditory and visual onset times. The roles of unimodal (visuospatial and auditory) uncertainties, each a consequence of non-synchronous AV presentations, were further investigated. The TRF data demonstrated that visuospatial uncertainty in attended sector size determines transfer effects by enabling the visual priming of tones when relevant for auditory segregation, in line with top-down processing timescales. Auditory uncertainty in distractor proportion, on the other hand, determined the susceptibility of early tone encoding to automatic change driven by incoming visual update processing. The findings provide a hierarchical account of the role of uni- and cross-modal sources of uncertainty in the neural encoding of sound dynamics in a multimodal attention task.
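
A TRF of the kind used here is commonly estimated as a regularized linear mapping from stimulus events to the EEG. Below is a minimal Python/NumPy sketch of that general approach, assuming a single EEG channel and a binary tone-onset regressor; the variable names, lag span, and ridge parameter are illustrative, not taken from the study.

```python
import numpy as np

def estimate_trf(stimulus, eeg, n_lags, ridge=1.0):
    """Ridge-regression estimate of a temporal response function (TRF).

    stimulus : 1-D array, e.g., a binary tone-onset time series
    eeg      : 1-D array, EEG samples aligned with the stimulus
    n_lags   : number of time lags (samples) the TRF spans
    ridge    : regularization strength
    """
    # Build a lagged design matrix: each column is the stimulus
    # shifted by one additional sample of delay.
    n = len(stimulus)
    X = np.zeros((n, n_lags))
    for lag in range(n_lags):
        X[lag:, lag] = stimulus[:n - lag]
    # Closed-form ridge solution: w = (X'X + lambda*I)^-1 X'y
    w = np.linalg.solve(X.T @ X + ridge * np.eye(n_lags), X.T @ eeg)
    return w  # TRF weights, one per lag

# Toy usage: recover a known response kernel from simulated data.
rng = np.random.default_rng(0)
stim = (rng.random(5000) < 0.01).astype(float)      # sparse tone pips
kernel = np.hanning(50)                             # "true" response
eeg = np.convolve(stim, kernel)[:5000] + 0.1 * rng.standard_normal(5000)
trf = estimate_trf(stim, eeg, n_lags=80)
```

The recovered weights approximate the simulated kernel; in practice the ridge parameter is chosen by cross-validation across trials.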


2020 ◽  
Author(s):  
Chen Ming ◽  
Stephanie Haro ◽  
Andrea Megela Simmons ◽  
James A. Simmons

Computational models of animal biosonar seek to identify critical aspects of echo processing responsible for the superior, real-time performance of echolocating bats and dolphins in target tracking and clutter rejection. The Spectrogram Correlation and Transformation (SCAT) model replicates aspects of biosonar imaging in both species by processing wideband biosonar sounds and echoes with auditory mechanisms identified from experiments with bats. The model acquires broadband biosonar broadcasts and echoes, represents them as time-frequency spectrograms using parallel bandpass filters, translates the filtered signals into ten parallel amplitude threshold levels, and then operates on the resulting time-of-occurrence values at each frequency to estimate overall echo range delay. It uses the structure of the echo spectrum by depicting it as a series of local frequency nulls arranged regularly along the frequency axis of the spectrograms after dechirping them relative to the broadcast. Computations take place entirely on the timing of threshold-crossing events for each echo relative to the threshold-crossing events for the broadcast. Threshold-crossing times take into account amplitude-latency trading, a physiological feature absent from conventional digital signal processing. Amplitude-latency trading transposes the profile of amplitudes across frequencies into a profile of time-registrations across frequencies. Target shape is extracted from the spacing of the object’s individual acoustic reflecting points, or glints, using the mutual interference pattern of peaks and nulls in the echo spectrum. These are merged with the overall range-delay estimate to produce a delay-based reconstruction of the object’s distance as well as its glints. Clutter echoes indiscriminately activate multiple parts of the null-detecting system, which then produces the equivalent glint-delay spacings in images, thus blurring the overall echo-delay estimates by adding spurious glint delays to the image. Blurring acts as an anticorrelation process that rejects clutter intrusion into perceptions.

Author summary: Bats and dolphins use their biological sonar as a versatile, high-resolution perceptual system that performs at levels desirable in man-made sonar or radar systems. To capture the superior real-time capabilities of biosonar so they can be imported into the design of new man-made systems, we developed a computer model of the sonar receiver used by echolocating bats and dolphins. Our intention was to discover the processing methods responsible for the animals’ ability to find and identify targets, guide locomotion, and prevent the classic types of sonar or radar interference that hamper the performance of man-made systems in complex, rapidly changing surroundings. We have identified several features of the ears, hearing, time-frequency representation, and auditory processing that are critical for organizing the echo-processing methods and displays manifested in the animals’ perceptions.
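
As a rough illustration of the front end described above, the sketch below (Python/SciPy) converts one bandpass channel of an echo into first-crossing times at ten amplitude thresholds. The filter design, threshold spacing, and signal parameters are assumptions for illustration, not the SCAT model's actual settings.

```python
import numpy as np
from scipy.signal import butter, sosfilt, hilbert

def threshold_crossing_times(signal, fs, band, n_levels=10):
    """First upward-crossing time (s) of one bandpass channel's envelope
    at each of n_levels amplitude thresholds, mimicking the conversion
    of filtered signals into threshold-crossing events."""
    sos = butter(4, band, btype='bandpass', fs=fs, output='sos')
    envelope = np.abs(hilbert(sosfilt(sos, signal)))
    levels = np.linspace(0.1, 1.0, n_levels) * envelope.max()
    times = np.full(n_levels, np.nan)
    for i, level in enumerate(levels):
        crossings = np.flatnonzero(envelope >= level)
        if crossings.size:
            times[i] = crossings[0] / fs
    return times

# Toy usage: a downward FM sweep ("echo") seen by a 35-45 kHz channel.
fs = 500_000
t = np.arange(int(0.002 * fs)) / fs
echo = np.sin(2 * np.pi * (60_000 - 1e7 * t) * t)
print(threshold_crossing_times(echo, fs, (35_000, 45_000)))
```

Because a louder echo crosses a given threshold earlier, a representation built on such crossing times naturally exhibits the amplitude-latency trading described in the abstract.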


2021 ◽
Vol 17 (12) ◽
pp. e1008664
Author(s):  
Aviv Dotan ◽  
Oren Shriki

Sensory deprivation has long been known to cause hallucinations or “phantom” sensations, the most common of which is tinnitus induced by hearing loss, affecting 10–20% of the population. An observable hearing loss, causing auditory sensory deprivation over a band of frequencies, is present in over 90% of people with tinnitus. Existing plasticity-based computational models for tinnitus are usually driven by homeostatic mechanisms, modeled to fit phenomenological findings. Here, we use an objective-driven learning algorithm to model an early auditory processing neuronal network, e.g., in the dorsal cochlear nucleus. The learning algorithm maximizes the network’s output entropy by learning the feed-forward and recurrent interactions in the model. We show that the connectivity patterns and responses learned by the model display several hallmarks of early auditory neuronal networks. We further demonstrate that attenuation of peripheral inputs drives the recurrent network towards its critical point and a transition into a tinnitus-like state. In this state, the network activity resembles responses to genuine inputs even in the absence of external stimulation; namely, it “hallucinates” auditory responses. These findings demonstrate how objective-driven plasticity mechanisms that normally act to optimize the network’s input representation can also elicit pathologies such as tinnitus as a result of sensory deprivation.

Author summary: Tinnitus, or “ringing in the ears,” is a common pathology. It may result from mechanical damage in the inner ear, as well as from certain drugs such as salicylate (aspirin). A common approach toward a computational model for tinnitus is to use a neural network model with inherent plasticity applied to early auditory processing, where the input layer models the auditory nerve and the output layer models a nucleus in the brain stem. However, most of the existing computational models are phenomenological in nature, driven by a homeostatic principle. Here, we use an objective-driven learning algorithm based on information theory to learn the feed-forward interactions between the layers, as well as the recurrent interactions within the output layer. Through numerical simulations of the learning process, we show that attenuation of peripheral inputs drives the network into a tinnitus-like state, where the network activity resembles responses to genuine inputs even in the absence of external stimulation; namely, it “hallucinates” auditory responses. These findings demonstrate how plasticity mechanisms that normally act to optimize network performance can also lead to undesired outcomes, such as tinnitus, as a result of reduced peripheral hearing.
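
A classic instance of objective-driven entropy maximization of the kind invoked here is the Bell–Sejnowski infomax rule. The sketch below (Python/NumPy) implements only a feed-forward infomax update as an illustration of the principle; it is not the authors' full algorithm, which also learns recurrent interactions, and all names and parameters are illustrative.

```python
import numpy as np

def infomax_step(W, x, lr=0.005):
    """One natural-gradient infomax update (Bell & Sejnowski, 1995):
    ascend the gradient of the output entropy of y = sigmoid(W @ x).
    Feed-forward weights only; recurrent learning is omitted here."""
    u = W @ x
    y = 1.0 / (1.0 + np.exp(-u))
    # Natural-gradient form of dH/dW: dW = lr * (I + (1 - 2y) u^T) W
    return W + lr * (np.eye(W.shape[0]) + np.outer(1.0 - 2.0 * y, u)) @ W

# Toy usage: a small network driven by correlated "auditory nerve" input.
rng = np.random.default_rng(1)
n = 8
W = np.eye(n) + 0.01 * rng.standard_normal((n, n))
mixing = rng.standard_normal((n, n)) / n       # correlates the inputs
for _ in range(2000):
    x = mixing @ rng.laplace(size=n)           # heavy-tailed source
    W = infomax_step(W, x)
```

In the spirit of the study's deprivation experiments, one could attenuate x during continued learning and examine how the learned gains compensate.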


Author(s):  
Jacob Pennington ◽  
Stephen David

An important step toward understanding how the brain represents complex natural sounds is to develop accurate models of auditory coding by single neurons. A common model for auditory coding is the linear-nonlinear spectro-temporal receptive field (LN) model. The LN model accounts for many features of auditory tuning, but it cannot account for long-lasting effects of sensory context on sound-evoked activity. Two mechanisms that may support these contextual effects are short-term plasticity (STP) and contrast-dependent gain control (GC), each of which has inspired an expanded version of the LN model. Both of these models improve performance over the LN model, but they have never been compared directly. Thus, it is unclear whether they account for distinct processes or describe the same phenomenon in different ways. To address this question, we recorded the activity of neurons in the primary auditory cortex of awake ferrets during presentation of natural sounds. We then fit models incorporating one nonlinear mechanism (GC or STP) or both (GC+STP) using this single dataset, and measured the correlation between the models’ predictions and the recorded neural activity. Both the STP and GC models performed significantly better than the LN model, but the GC+STP model performed better than either individual model. We also quantified the similarity between STP and GC model predictions and found only modest equivalence between them. Similar results were observed for a smaller dataset collected in clean and noisy acoustic contexts. These results suggest that the STP and GC models describe distinct, complementary processes in the auditory system.

Significance statement: Computational models are used widely to study neural sensory coding. However, models developed in separate studies are often difficult to compare because of differences in stimuli and experimental preparation. This study develops an approach for making systematic comparisons between models that measures the net benefit of incorporating additional nonlinear elements into models of auditory encoding. This approach was then used to compare two different hypotheses for how sensory context, that is, slow changes in the statistics of the acoustic environment, influences activity in auditory cortex. Each model accounted for complementary aspects of the neural response, indicating that a hybrid model incorporating elements of both provides the most complete characterization of auditory processing.
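
For concreteness, the LN prediction pipeline and the correlation metric used for model comparison can be sketched as follows (Python/NumPy); the STRF shape, output nonlinearity, and data here are stand-ins, not the authors' fitted models.

```python
import numpy as np

def ln_predict(spectrogram, strf, base=0.1, gain=5.0):
    """Linear-nonlinear (LN) prediction: linearly filter the stimulus
    spectrogram with a spectro-temporal receptive field, then pass the
    result through a static sigmoid output nonlinearity."""
    n_freq, n_lag = strf.shape
    T = spectrogram.shape[1]
    linear = np.zeros(T)
    for lag in range(n_lag):
        # weight the spectrogram `lag` bins in the past by the STRF column
        linear[lag:] += strf[:, lag] @ spectrogram[:, :T - lag]
    return base + gain / (1.0 + np.exp(-linear))   # predicted firing rate

def prediction_correlation(pred, response):
    """Model accuracy as Pearson correlation with the recorded response."""
    return np.corrcoef(pred, response)[0, 1]

# Toy usage with random data standing in for sound and neural activity.
rng = np.random.default_rng(2)
spec = rng.random((18, 1000))          # 18 freq channels x 1000 time bins
strf = rng.standard_normal((18, 15)) * 0.05
resp = ln_predict(spec, strf) + 0.2 * rng.standard_normal(1000)
print(prediction_correlation(ln_predict(spec, strf), resp))
```

The expanded STP and GC variants insert an extra nonlinear stage before or around the linear filtering step; comparing them reduces to comparing these prediction correlations on held-out data.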


2020 ◽  
Vol 29 (4) ◽  
pp. 710-727
Author(s):  
Beula M. Magimairaj ◽  
Naveen K. Nagaraj ◽  
Alexander V. Sergeev ◽  
Natalie J. Benafield

Objectives: School-age children with and without parent-reported listening difficulties (LiD) were compared on auditory processing, language, memory, and attention abilities. The objective was to extend what is known so far in the literature about children with LiD by using multiple measures, including selective novel measures, across the above areas.

Design: Twenty-six children who were reported by their parents as having LiD and 26 age-matched typically developing children completed clinical tests of auditory processing and multiple measures of language, attention, and memory. All children had normal-range pure-tone hearing thresholds bilaterally. Group differences were examined.

Results: In addition to significantly poorer speech-perception-in-noise scores, children with LiD had reduced speed and accuracy of word retrieval from long-term memory, poorer short-term memory, sentence recall, and inferencing ability. Statistically significant group differences were of moderate effect size; however, standard test scores of children with LiD were not clinically poor. No statistically significant group differences were observed in attention, working memory capacity, vocabulary, and nonverbal IQ.

Conclusions: Mild signal-to-noise ratio loss, as reflected by the group mean of children with LiD, corroborated the children's functional listening problems. In addition, the children's relative weakness in select areas of language performance, short-term memory, and the speed and accuracy of long-term memory lexical retrieval adds to previous research on evidence-based areas that need to be evaluated in children with LiD, who almost always have heterogeneous profiles. Importantly, the functional difficulties faced by children with LiD in relation to their test results indicated, to some extent, that commonly used assessments may not be adequately capturing the children's listening challenges.

Supplemental Material: https://doi.org/10.23641/asha.12808607
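
The "moderate effect size" reported for the group differences is conventionally quantified with a standardized mean difference such as Cohen's d; below is a minimal sketch with illustrative data, not the study's scores.

```python
import numpy as np

def cohens_d(group_a, group_b):
    """Cohen's d: standardized mean difference using the pooled SD."""
    a, b = np.asarray(group_a), np.asarray(group_b)
    na, nb = len(a), len(b)
    pooled_sd = np.sqrt(((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1))
                        / (na + nb - 2))
    return (a.mean() - b.mean()) / pooled_sd

# Illustrative: two groups of 26 scores; d near 0.5 is "moderate".
rng = np.random.default_rng(3)
typical = rng.normal(100, 15, 26)
lid = rng.normal(92, 15, 26)
print(cohens_d(typical, lid))
```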


2019 ◽  
Vol 28 (4) ◽  
pp. 834-842
Author(s):  
Harini Vasudevan ◽  
Hari Prakash Palaniswamy ◽  
Ramaswamy Balakrishnan

Purpose: The main purpose of the study was to explore the auditory selective attention abilities (using event-related potentials) and the neuronal oscillatory activity in the default mode network sites (using electroencephalogram [EEG]) in individuals with tinnitus.

Method: Auditory selective attention was measured using P300, and the resting-state EEG was assessed using the default mode function analysis. Ten individuals with continuous and bothersome tinnitus, along with 10 age- and gender-matched control participants, underwent event-related potential testing and 5 min of EEG recording (at wakeful rest).

Results: Individuals with tinnitus were observed to have larger N1 and P3 amplitudes along with prolonged P3 latency. The default mode function analysis revealed no significant oscillatory differences between the groups.

Conclusion: The current study shows changes in both the early sensory and late cognitive components of auditory processing. The change in the P3 component is suggestive of a selective auditory attention deficit, and the sensory component (N1) suggests altered bottom-up processing in individuals with tinnitus.
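
Component measures such as P3 amplitude and latency are typically read off the trial-averaged ERP as the largest positive deflection in a post-stimulus search window. Below is a minimal single-channel sketch; the sampling rate, window, and simulated data are illustrative, not the study's recordings.

```python
import numpy as np

def erp_peak(epochs, fs, window=(0.25, 0.50)):
    """Peak amplitude and latency of the average ERP (e.g., P3) within a
    post-stimulus search window, for single-channel epoched EEG."""
    erp = epochs.mean(axis=0)                 # average over trials
    start, stop = (int(w * fs) for w in window)
    segment = erp[start:stop]
    i = np.argmax(segment)                    # largest positive deflection
    return segment[i], (start + i) / fs       # amplitude, latency (s)

# Toy usage: 40 trials of 0.8 s at 250 Hz with a simulated P3 near 350 ms.
fs = 250
t = np.arange(int(0.8 * fs)) / fs
rng = np.random.default_rng(4)
epochs = (5 * np.exp(-((t - 0.35) ** 2) / 0.002)
          + rng.standard_normal((40, len(t))))
print(erp_peak(epochs, fs))
```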

