speech categorization
Recently Published Documents

Total documents: 65 (five years: 23)
H-index: 13 (five years: 2)

2021
Author(s): Gavin Bidelman, Jared Carter

Spoken language comprehension requires that listeners map continuous features of the speech signal to discrete category labels. Categories, however, are malleable to surrounding context; listeners' percepts can shift dynamically depending on the sequencing of adjacent stimuli, resulting in a warping of the heard phonetic category (i.e., hysteresis). Here, we investigated whether such perceptual nonlinearities, which amplify categorical hearing, might further aid speech processing in noise-degraded listening scenarios. We measured continuous dynamics in perception and category judgments of an acoustic-phonetic vowel gradient via mouse tracking. Tokens were presented in serial vs. random orders to induce more or less perceptual warping while listeners categorized continua in clean and noise conditions. Listeners' responses were faster, and their mouse trajectories stayed closer to the ultimate behavioral selection (marked visually on the screen), in serial vs. random order, suggesting increased perceptual attraction to category exemplars. Interestingly, order effects emerged earlier and persisted later in the trial time course when categorizing speech in noise. These data describe a previously undocumented functional benefit of perceptual nonlinearities to speech perception: warping strengthens the behavioral attraction to relevant speech categories while simultaneously assisting perception in degraded acoustic environments.
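
For readers unfamiliar with mouse-tracking measures, the sketch below (not from the paper) shows one common way to quantify how directly a cursor trajectory moves toward the eventually selected category: the cumulative deviation between the observed path and the ideal straight path from start to response. The trajectory data, coordinates, and function name are hypothetical.

```python
# Illustrative sketch (not from the paper): quantifying how closely a mouse
# trajectory tracks the eventual category choice, as in mouse-tracking studies.
# All trajectories, coordinates, and labels below are hypothetical.
import numpy as np

def trajectory_deviation(trajectory, start, target):
    """Cumulative deviation between a trajectory and the straight start-to-target line.

    Smaller values indicate a more direct pull toward the chosen category.
    trajectory: (n_samples, 2) array of x/y cursor positions.
    """
    t = np.asarray(trajectory, dtype=float)
    direct = np.linspace(start, target, len(t))      # ideal straight path
    deviation = np.linalg.norm(t - direct, axis=1)   # per-sample distance from that path
    return np.trapz(deviation)                       # integrate deviation over samples

# Hypothetical trial: cursor drifts toward a response box located at (1, 1).
trial = np.column_stack([np.linspace(0, 1, 50) ** 1.5,
                         np.linspace(0, 1, 50)])
print(trajectory_deviation(trial, start=(0.0, 0.0), target=(1.0, 1.0)))
```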


2021
Author(s): Jared A. Carter, Eugene H. Buder, Gavin Bidelman

Surrounding context influences speech listening, resulting in dynamic shifts to category percepts. To examine its neural basis, event-related potentials (ERPs) were recorded during vowel identification with continua presented in random, forward, and backward orders to induce perceptual nonlinearities. Behaviorally, sequential order shifted listeners' categorical boundary relative to random delivery, revealing perceptual warping (biasing) of the heard phonetic category dependent on recent stimulus history. ERPs revealed later (~300 ms) activity localized to superior temporal and middle/inferior frontal gyri that predicted listeners' hysteresis magnitudes. Findings demonstrate that top-down, stimulus-history effects on speech categorization are governed by interactions between frontotemporal brain regions.


2021
Author(s): Rakib Al-Fahad, Mohammed Yeasin, Kazi Ashraf Moinuddin, Gavin M Bidelman

Understanding the many-to-many mapping between patterns of functional brain connectivity and discrete behavioral responses is critical for speech-language processing. We present a microstate-based analysis of EEG recordings to characterize the spatio-temporal dynamics of neural activity that underlies rapid speech categorization decisions. We implemented a data-driven approach using Bayesian non-parametrics to capture the mapping between EEG and the speed of listeners' phoneme identification [i.e., response time (RT)] during speech labeling tasks. Based on our empirical analyses, we show that task-relevant events such as resting state, stimulus coding, auditory-perceptual object (category) formation, and response selection can be explained using patterns of microstate dwell time and are decodable as unique time segments during speech perception. State-dependent activities localize to a fronto-temporo-parietal circuit (superior temporal, supramarginal, and inferior frontal gyri), exposing a core decision brain network (DN) underlying rapid speech categorization. Furthermore, RTs were inversely proportional to the frequency of state transitions, such that the rate of change between brain microstates was higher for trials with slower compared to faster RTs. Our findings imply that during rapid speech perception, higher uncertainty producing prolonged RTs (slower decision-making) is associated with staying in the DN longer compared to lower RTs (faster decisions). We also show that listeners' perceptual RTs are highly sensitive to individual differences. Our computational method opens a new avenue in segmentation and dynamic brain connectivity for modeling neuroimaging data and understanding task-related cognitive events.
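
For readers curious how dwell times and transition rates are typically derived once a microstate segmentation is in hand, here is a minimal sketch (not the authors' pipeline); the label sequence, sampling rate, and helper names are invented for illustration.

```python
# Illustrative sketch (not the authors' pipeline): summarizing a per-sample
# microstate label sequence by mean dwell time and transition rate, the kinds
# of quantities that can then be related to response time (RT).
import numpy as np
from itertools import groupby

def dwell_times(labels, srate):
    """Mean dwell time (s) per microstate from a per-sample label sequence."""
    runs = [(state, sum(1 for _ in run)) for state, run in groupby(labels)]
    per_state = {}
    for state, length in runs:
        per_state.setdefault(state, []).append(length / srate)
    return {state: float(np.mean(d)) for state, d in per_state.items()}

def transition_rate(labels, srate):
    """Number of microstate changes per second."""
    labels = np.asarray(labels)
    n_transitions = int(np.sum(labels[1:] != labels[:-1]))
    return n_transitions / (len(labels) / srate)

# Hypothetical 2-second trial sampled at 250 Hz with four microstate classes.
labels = np.random.default_rng(0).choice(["A", "B", "C", "D"], size=500)
print(dwell_times(labels, srate=250), transition_rate(labels, srate=250))
```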


2021
Author(s): Ashley E Symons, Adam Tierney

Speech perception requires the integration of evidence from acoustic cues across multiple dimensions. Individuals differ in their cue weighting strategies, i.e., the weight they assign to different acoustic dimensions during speech categorization. In two experiments, we investigated musical training as one potential predictor of individual differences in prosodic cue weighting strategies. Attentional theories of speech categorization suggest that prior experience with the task-relevance of a particular acoustic dimension leads that dimension to attract attention. Therefore, Experiment 1 tested whether musicians and non-musicians differed in their ability to selectively attend to pitch and loudness in speech. Compared to non-musicians, musicians showed enhanced dimension-selective attention to pitch but not loudness. In Experiment 2, we tested the hypothesis that musicians would show greater pitch weighting during prosodic categorization due to prior experience with the task-relevance of pitch cues in music. In this experiment, listeners categorized phrases that varied in the extent to which pitch and duration signaled the location of linguistic focus and phrase boundaries. During linguistic focus categorization only, musicians up-weighted pitch compared to non-musicians. These results suggest that musical training is linked with domain-general enhancements of the salience of pitch cues, and that this increase in pitch salience may lead to an up-weighting of pitch during some prosodic categorization tasks. These findings also support attentional theories of cue weighting, in which more salient acoustic dimensions are given more importance during speech categorization.
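
To make the notion of cue weighting concrete, the sketch below (not the authors' analysis) estimates relative pitch vs. duration weights by fitting a logistic regression to categorization responses with standardized cue values; the simulated listener and all parameter values are hypothetical.

```python
# Illustrative sketch (not from the paper): cue weights as the relative
# magnitudes of logistic-regression coefficients for standardized cues.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
pitch = rng.uniform(-1, 1, 400)      # pitch cue, arbitrary units
duration = rng.uniform(-1, 1, 400)   # duration cue, arbitrary units

# Simulated listener who weights pitch about twice as heavily as duration.
p = 1 / (1 + np.exp(-(2.0 * pitch + 1.0 * duration)))
responses = rng.binomial(1, p)       # binary category choices

X = StandardScaler().fit_transform(np.column_stack([pitch, duration]))
coefs = LogisticRegression().fit(X, responses).coef_[0]
weights = np.abs(coefs) / np.abs(coefs).sum()   # normalized cue weights
print(dict(zip(["pitch", "duration"], weights.round(2))))
```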


2021
Author(s): Vibha Viswanathan, Barbara G. Shinn-Cunningham, Michael G. Heinz

Temporal coherence of sound fluctuations across spectral channels is thought to aid auditory grouping and scene segregation. Although prior studies on the neural bases of temporal-coherence processing focused mostly on cortical contributions, neurophysiological evidence suggests that temporal-coherence-based scene analysis may start as early as the cochlear nucleus (i.e., the first auditory region supporting cross-channel processing over a wide frequency range). Accordingly, we hypothesized that aspects of temporal-coherence processing that could be realized in early auditory areas may shape speech understanding in noise. We then explored whether physiologically plausible computational models could account for results from a behavioral experiment that measured consonant categorization in different masking conditions. We tested whether within-channel masking of target-speech modulations predicted consonant confusions across the different conditions, and whether predicted performance was improved by adding across-channel temporal-coherence processing mirroring the computations known to exist in the cochlear nucleus. Consonant confusions provide a rich characterization of error patterns in speech categorization and are thus crucial for rigorously testing models of speech perception; however, to the best of our knowledge, they have not been utilized in prior studies of scene analysis. We find that within-channel modulation masking can reasonably account for category confusions, but that it fails when temporal fine structure (TFS) cues are unavailable. However, the addition of across-channel temporal-coherence processing significantly improves confusion predictions across all tested conditions. Our results suggest that temporal-coherence processing strongly shapes speech understanding in noise, and that physiological computations that exist early along the auditory pathway may contribute to this process.
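
As background on the behavioral measure, here is a short sketch (not the authors' model) of how a consonant confusion matrix is tabulated from stimulus/response pairs; the consonant set and trial data are hypothetical.

```python
# Illustrative sketch (not from the paper): building a consonant confusion
# matrix, the kind of error pattern used to evaluate speech-in-noise models.
import numpy as np

consonants = ["b", "d", "g", "p", "t", "k"]
idx = {c: i for i, c in enumerate(consonants)}

# (presented, reported) pairs from a hypothetical listening test
trials = [("b", "b"), ("b", "p"), ("d", "d"), ("g", "d"), ("t", "t"), ("k", "g")]

confusions = np.zeros((len(consonants), len(consonants)), dtype=int)
for presented, reported in trials:
    confusions[idx[presented], idx[reported]] += 1

# Row-normalize to get the probability of each response given the stimulus;
# rows with no trials are left at zero.
row_sums = confusions.sum(axis=1, keepdims=True)
p_response = np.divide(confusions, row_sums,
                       out=np.zeros_like(confusions, dtype=float),
                       where=row_sums > 0)
print(p_response.round(2))
```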


2021
Author(s): Jinghua Ou, Alan Yu

Categorization is a fundamental cognitive ability to treat different objects as the same. This ability is particularly indispensable for human speech perception, yet individual differences in speech categorization are nonetheless ubiquitous. The present study investigates the neurophysiological mechanisms underlying the variability in categorization of voice-onset time (VOT). Subcortical and cortical speech-evoked responses are recorded to investigate speech representations at two functional levels of auditory processing. Individual differences in psychometric functions correlate positively with how faithfully subcortical responses encode VOT differences. Moreover, individuals also differ in how strongly the subcortical and cortical representations correlate with each other. Listeners with gradient categorization show higher correspondences between the two representations, indicating that acoustic information is relayed faithfully from the subcortical to the cortical level; listeners with discrete categorization exhibit decreased similarity between the two representations, suggesting that the subcortical acoustic encoding is transformed at the cortical level to reflect phonetic category information.
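
For readers unfamiliar with how categorization gradiency is commonly indexed, here is a minimal sketch (not from the paper) that fits a logistic psychometric function to hypothetical VOT identification data; the slope parameter then serves as a discreteness/gradiency index, and the inflection point as the category boundary.

```python
# Illustrative sketch (not from the paper): fitting a logistic psychometric
# function to VOT identification data. A steeper slope indicates more discrete
# (step-like) categorization; a shallower slope indicates more gradient listening.
import numpy as np
from scipy.optimize import curve_fit

def logistic(vot, boundary, slope):
    """Proportion of 'voiceless' responses as a function of VOT (ms)."""
    return 1.0 / (1.0 + np.exp(-slope * (vot - boundary)))

# Hypothetical continuum steps (ms) and identification proportions.
vot_ms = np.array([0, 10, 20, 30, 40, 50, 60], dtype=float)
prop_voiceless = np.array([0.02, 0.05, 0.20, 0.55, 0.85, 0.95, 0.99])

(boundary, slope), _ = curve_fit(logistic, vot_ms, prop_voiceless, p0=[30.0, 0.2])
print(f"category boundary ~ {boundary:.1f} ms, slope (gradiency index) ~ {slope:.2f}")
```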


2021
Author(s): Efthymia C Kapnoula, Bob McMurray

Listeners vary in how they categorize speech sounds: some are more step-like, while others are more gradient. Recent work suggests that gradient listeners are more flexible in cue integration and recovery from misperceptions (Kapnoula et al., 2017, 2021). We investigated the source of these differences and asked how they cascade to lexical processing. Individual differences in speech categorization were assessed via a visual analogue scaling (VAS) task. Following Toscano et al. (2010), we used the N1 ERP component to track pre-categorical encoding of speech cues. Separate tasks were used to measure inhibitory control and lexical processes. The N1 linearly tracked the continuum, reflecting fundamentally gradient speech perception; however, for step-like listeners this linearity was disrupted near the category boundary. This suggests that, while all listeners are generally gradient, there are individual differences deriving from the idiosyncratic encoding of specific cues, and that cue-level gradiency cascades throughout the system.

