speech encoding
Recently Published Documents

TOTAL DOCUMENTS: 89 (five years: 13)
H-INDEX: 17 (five years: 0)

2021
Author(s): T. Christina Zhao, Fernando Llanos, Bharath Chandrasekaran, Patricia K. Kuhl

The sensitive period for phonetic learning (6~12 months), evidenced by increases in native and declines in nonnative speech processing, represents an early milestone in language acquisition. We examined the extent to which sensory encoding of speech is altered by experience during this period by testing two hypotheses: (1) early sensory encoding of nonnative speech declines as infants gain native-language experience, and (2) music intervention reverses this decline. We longitudinally measured the frequency-following response (FFR), a robust indicator of early sensory encoding along the auditory pathway, to a Mandarin lexical tone in 7- and 11-month-old monolingual English-learning infants. Between FFR recordings, infants were randomly assigned to receive either a music intervention (music-intervention group) or no intervention (language-experience group). The language-experience group exhibited the expected decline in FFR pitch-tracking accuracy for the Mandarin tone, while the music-intervention group did not. Our results support both hypotheses and demonstrate that both language and music experience alter infants’ speech encoding.
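As a rough illustration of the pitch-tracking accuracy measure described above, the sketch below estimates an f0 contour from a stimulus and from an averaged FFR waveform via short-time autocorrelation and correlates the two. All signals, the sampling rate, and the window settings are illustrative assumptions, not the authors' actual pipeline.

```python
# Hypothetical sketch of FFR pitch-tracking accuracy; signals and
# parameters are placeholders, not the study's actual pipeline.
import numpy as np
from scipy.stats import pearsonr

def f0_contour(signal, fs, win=0.040, step=0.010, fmin=80.0, fmax=400.0):
    """Short-time autocorrelation f0 estimate, one value per window."""
    n_win, n_step = int(win * fs), int(step * fs)
    lag_min, lag_max = int(fs / fmax), int(fs / fmin)
    f0s = []
    for start in range(0, len(signal) - n_win, n_step):
        frame = signal[start:start + n_win]
        frame = frame - frame.mean()
        ac = np.correlate(frame, frame, mode="full")[n_win - 1:]  # lags >= 0
        lag = lag_min + np.argmax(ac[lag_min:lag_max])            # best pitch lag
        f0s.append(fs / lag)
    return np.array(f0s)

fs = 16000                       # assumed sampling rate
stimulus = np.random.randn(fs)   # placeholder for the Mandarin tone stimulus
ffr_avg = np.random.randn(fs)    # placeholder for the averaged FFR waveform

# Pitch-tracking accuracy: correlation of stimulus and response f0 contours.
r, _ = pearsonr(f0_contour(stimulus, fs), f0_contour(ffr_avg, fs))
print(f"pitch-tracking accuracy r = {r:.2f}")
```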


Author(s): Patrick C. M. Wong, Ching Man Lai, Peggy H. Y. Chan, Ting Fan Leung, Hugh Simon Lam, ...

Purpose: This study aimed to construct an objective and cost-effective prognostic tool to forecast the future language and communication abilities of individual infants.
Method: Speech-evoked electroencephalography (EEG) data were collected from 118 infants during the first year of life while they listened to speech stimuli that differed principally in fundamental frequency. Language and communication outcomes, namely four subtests of the MacArthur–Bates Communicative Development Inventories (MCDI)–Chinese version, were collected between 3 and 16 months after initial EEG testing. In the two-way classification, children were classified into those with future MCDI scores below the 25th percentile for their age group and those above that percentile; the three-way classification grouped them into < 25th, 25th–75th, and > 75th percentile groups. Machine learning (support vector machine classification) with cross-validation was used for model construction, and statistical significance was assessed.
Results: Across the four MCDI measures of early gestures, later gestures, vocabulary comprehension, and vocabulary production, the areas under the receiver-operating characteristic curve of the predictive models were, respectively, .92 ± .031, .91 ± .028, .90 ± .035, and .89 ± .039 for the two-way classification, and .88 ± .041, .89 ± .033, .85 ± .047, and .85 ± .050 for the three-way classification (p < .01 for all models).
Conclusions: Future language and communication variability can be predicted by an objective EEG method that indexes the function of the auditory neural pathway foundational to spoken language development, with precision sufficient for individual predictions. Longer-term research is needed to assess predictability of categorical diagnostic status.
Supplemental Material: https://doi.org/10.23641/asha.15138546
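The modelling step named in the Method (support vector machine classification with cross-validation, scored by area under the ROC curve) can be sketched as below; the feature matrix, labels, kernel, and fold count are placeholder assumptions rather than the study's settings.

```python
# Hedged sketch of SVM classification with cross-validated AUC, as named
# in the Method; data and settings are simulated placeholders.
import numpy as np
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import StratifiedKFold, cross_val_score

rng = np.random.default_rng(0)
X = rng.standard_normal((118, 20))  # 118 infants x 20 EEG features (illustrative)
y = rng.integers(0, 2, size=118)    # 1 = future MCDI score below 25th percentile

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))  # assumed kernel
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
aucs = cross_val_score(clf, X, y, cv=cv, scoring="roc_auc")
print(f"AUC = {aucs.mean():.2f} ± {aucs.std():.3f}")
```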


2021
pp. 1-59
Author(s): Nikolay Novitskiy, Akshay R Maggu, Ching Man Lai, Peggy H Y Chan, Kay H Y Wong, ...

Abstract: We investigated the development of early-latency and long-latency brain responses to native and non-native speech to shed light on the neurophysiological underpinnings of perceptual narrowing and early language development. Specifically, we postulated a two-level process to explain the decrease in sensitivity to non-native phonemes towards the end of infancy. Neurons at the earlier stages of the ascending auditory pathway mature rapidly during infancy, facilitating the encoding of both native and non-native sounds. This growth enables neurons at the later stages of the auditory pathway to assign phonological status to speech according to the infant’s native-language environment. To test this hypothesis, we collected early-latency and long-latency neural responses to native and non-native lexical tones from 85 Cantonese-learning children aged between 23 days and 24 months and 16 days. As expected, a broad range of presumably subcortical early-latency neural encoding measures grew rapidly and substantially during the first two years for both native and non-native tones. By contrast, long-latency cortical electrophysiological changes occurred on a much slower scale and showed sensitivity to nativeness at around six months. Our study provided a comprehensive understanding of early language development by revealing the complementary roles of earlier and later stages of speech processing in the developing brain.
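One way to picture the two-level analysis is to derive early-latency and long-latency responses from the same epoched EEG with a frequency split, as is common in this literature. The cutoffs, sampling rate, and epoch layout below are assumptions for illustration, not the authors' exact parameters.

```python
# Illustrative separation of early-latency (fast, presumably subcortical)
# and long-latency (slower, cortical) responses from the same epochs.
# Filter cutoffs and epoch layout are assumptions, not the study's values.
import numpy as np
from scipy.signal import butter, sosfiltfilt

fs = 4096                                  # assumed EEG sampling rate
epochs = np.random.randn(1000, fs // 2)    # 1000 epochs x 500 ms (placeholder)

sos_early = butter(4, [80, 1000], btype="bandpass", fs=fs, output="sos")
sos_late = butter(4, 20, btype="lowpass", fs=fs, output="sos")

ffr_like = sosfiltfilt(sos_early, epochs, axis=1).mean(axis=0)  # early-latency average
cortical = sosfiltfilt(sos_late, epochs, axis=1).mean(axis=0)   # long-latency average
```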


2021
Author(s): Vibha Viswanathan, Hari M Bharadwaj, Barbara G Shinn-Cunningham, Michael G Heinz

A fundamental question in the neuroscience of everyday communication is how scene acoustics shape the neural processing of attended speech sounds and in turn impact speech intelligibility. While it is well known that the temporal envelopes in target speech are important for intelligibility, how the neural encoding of target-speech envelopes is influenced by background sounds or other acoustic features of the scene is unknown. Here, we combine human electroencephalography (EEG) with simultaneous intelligibility measurements to address this key gap. We find that the neural envelope-domain SNR in target-speech encoding, which is shaped by masker modulations, predicts intelligibility over a range of strategically chosen realistic listening conditions unseen by the predictive model. This provides neurophysiological evidence for modulation masking. Moreover, using high-resolution vocoding to carefully control peripheral envelopes, we show that target-envelope coding fidelity in the brain depends not only on envelopes conveyed by the cochlea, but also on the temporal fine structure (TFS), which supports scene segregation. Our results are consistent with the notion that temporal coherence of sound elements across envelopes and/or TFS influences scene analysis and attentive selection of a target sound. Our findings also inform speech intelligibility models and technologies attempting to improve real-world speech communication.
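A minimal sketch of the temporal-envelope extraction that underlies envelope-domain analyses like this one: take the Hilbert magnitude of the waveform and keep only the slow modulations. The cutoff and signals are assumptions; the paper's envelope-domain SNR analysis builds on this kind of representation but is more involved.

```python
# Sketch of temporal-envelope extraction; cutoff and signal are
# illustrative assumptions, not the paper's exact processing chain.
import numpy as np
from scipy.signal import hilbert, butter, sosfiltfilt

def temporal_envelope(x, fs, cutoff=32.0):
    env = np.abs(hilbert(x))                            # broadband Hilbert envelope
    sos = butter(4, cutoff, btype="lowpass", fs=fs, output="sos")
    return sosfiltfilt(sos, env)                        # keep slow modulations

fs = 16000
target = np.random.randn(2 * fs)                        # stands in for target speech
env = temporal_envelope(target, fs)
```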


eLife
2021, Vol 10
Author(s): Melissa J Polonenko, Ross K Maddox

Speech processing is built upon encoding by the auditory nerve and brainstem, yet we know very little about how these processes unfold in specific subcortical structures. These structures are deep and respond quickly, making them difficult to study during ongoing speech. Recent techniques begin to address this problem, but yield temporally broad responses with consequently ambiguous neural origins. Here we describe a method that pairs re-synthesized 'peaky' speech with deconvolution analysis of EEG recordings. We show that in adults with normal hearing, the method quickly yields robust responses whose component waves reflect activity from distinct subcortical structures spanning auditory nerve to rostral brainstem. We further demonstrate the versatility of peaky speech by simultaneously measuring bilateral and ear-specific responses across different frequency bands, and discuss important practical considerations such as talker choice. The peaky speech method holds promise as a tool for investigating speech encoding and processing, and for clinical applications.
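The core deconvolution idea can be sketched as follows: model the EEG as a pulse-train regressor (the "peaks" in peaky speech) convolved with an unknown kernel, and recover that kernel by regularized frequency-domain division. All signals and the regularization constant below are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of deconvolving a subcortical response from EEG
# using a pulse-train regressor; signals are simulated placeholders.
import numpy as np

fs = 10000
pulses = np.zeros(60 * fs)                      # 60 s pulse-train regressor
pulses[np.random.randint(0, len(pulses), 6000)] = 1.0
eeg = np.convolve(pulses, np.random.randn(100), mode="same")  # fake EEG

# Regularized frequency-domain division recovers the response kernel.
P, E = np.fft.rfft(pulses), np.fft.rfft(eeg)
kernel = np.fft.irfft(E * np.conj(P) / (np.abs(P) ** 2 + 1e-6))
response = kernel[: int(0.040 * fs)]            # keep 0-40 ms, brainstem range
```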


2021
Author(s): Sung-Joo Lim, Yaminah D. Carter, J. Michelle Njoroge, Barbara G. Shinn-Cunningham, Tyler K. Perrachione

Abstract: Speech is processed less efficiently from discontinuous, mixed talkers than from one consistent talker, but little is known about the neural mechanisms for processing talker variability. Here, we measured psychophysiological responses to talker variability using electroencephalography (EEG) and pupillometry while listeners performed a delayed-recall digit-span task. Listeners heard and recalled seven-digit sequences with both talker (single- vs. mixed-talker digits) and temporal (0- vs. 500-ms inter-digit intervals) discontinuities. Talker discontinuity reduced serial recall accuracy. Both talker and temporal discontinuities elicited P3a-like neural evoked responses, while rapid processing of mixed-talkers’ speech led to increased phasic pupil dilation. Furthermore, mixed-talkers’ speech produced less alpha oscillatory power during working memory maintenance, but not during speech encoding. Overall, these results are consistent with an auditory attention and streaming framework in which talker discontinuity leads to involuntary, stimulus-driven attentional reorientation to novel speech sources, resulting in the processing interference classically associated with talker variability.
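The alpha-power measure reported above can be illustrated with a standard band-pass-plus-Hilbert estimate; the retention-window bounds, band edges, and filter settings below are assumptions, not the study's analysis parameters.

```python
# Illustrative alpha (8-12 Hz) power in an assumed working-memory
# maintenance window; all settings are placeholders.
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

fs = 500
trial = np.random.randn(10 * fs)                    # one EEG trial (placeholder)
maintenance = trial[int(6 * fs): int(9 * fs)]       # assumed retention window

sos = butter(4, [8, 12], btype="bandpass", fs=fs, output="sos")
alpha = sosfiltfilt(sos, maintenance)
alpha_power = np.mean(np.abs(hilbert(alpha)) ** 2)  # mean alpha-band power
print(f"alpha power = {alpha_power:.3f}")
```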


2021
Vol 14
Author(s): Heather R. Dial, G. Nike Gnanateja, Rachel S. Tessmer, Maria Luisa Gorno-Tempini, Bharath Chandrasekaran, ...

Logopenic variant primary progressive aphasia (lvPPA) is a neurodegenerative language disorder primarily characterized by impaired phonological processing. Sentence repetition and comprehension deficits are observed in lvPPA and linked to impaired phonological working memory, but recent evidence also implicates impaired speech perception. Currently, neural encoding of the speech envelope, which forms the scaffolding for perception, is not clearly understood in lvPPA. We leveraged recent analytical advances in electrophysiology to examine speech envelope encoding in lvPPA. We assessed cortical tracking of the speech envelope and in-task comprehension of two spoken narratives in individuals with lvPPA (n = 10) and age-matched (n = 10) controls. Despite markedly reduced narrative comprehension relative to controls, individuals with lvPPA had increased cortical tracking of the speech envelope in theta oscillations, which track low-level features (e.g., syllables), but not delta oscillations, which track speech units that unfold across a longer time scale (e.g., words, phrases, prosody). This neural signature was highly correlated across narratives. Results indicate an increased reliance on acoustic cues during speech encoding. This may reflect inefficient encoding of bottom-up speech cues, likely as a consequence of dysfunctional temporoparietal cortex.
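A simplified picture of band-specific cortical tracking: filter both the EEG and the speech envelope into delta (1-4 Hz) or theta (4-8 Hz) and correlate them. The band edges are conventional, and the lag-free correlation is a simplification of the tracking measures used in this literature; all signals are placeholders.

```python
# Simplified delta- vs. theta-band cortical tracking; band edges and the
# lag-free correlation are assumptions for illustration only.
import numpy as np
from scipy.signal import butter, sosfiltfilt
from scipy.stats import pearsonr

def band_tracking(eeg, envelope, fs, band):
    sos = butter(4, band, btype="bandpass", fs=fs, output="sos")
    r, _ = pearsonr(sosfiltfilt(sos, eeg), sosfiltfilt(sos, envelope))
    return r

fs = 128
eeg = np.random.randn(60 * fs)   # one EEG channel (placeholder)
env = np.random.randn(60 * fs)   # speech envelope (placeholder)
print("delta tracking:", band_tracking(eeg, env, fs, [1, 4]))
print("theta tracking:", band_tracking(eeg, env, fs, [4, 8]))
```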


2021
Vol 25, pp. 233121652110132
Author(s): Frauke Kraus, Sarah Tune, Anna Ruhe, Jonas Obleser, Malte Wöstmann

Hearing loss is often asymmetric such that hearing thresholds differ substantially between the two ears. The extreme case of such asymmetric hearing is single-sided deafness. A unilateral cochlear implant (CI) on the more severely impaired ear is an effective treatment to restore hearing. The interactive effects of unilateral acoustic degradation and spatial attention to one sound source in multitalker situations are at present unclear. Here, we simulated some features of listening with a unilateral CI in young, normal-hearing listeners (N = 22) who were presented with 8-band noise-vocoded speech to one ear and intact speech to the other ear. Neural responses were recorded in the electroencephalogram to obtain the spectrotemporal response function to speech. Listeners made more mistakes when answering questions about vocoded (vs. intact) attended speech. At the neural level, we asked how unilateral acoustic degradation would impact the attention-induced amplification of tracking target versus distracting speech. Interestingly, unilateral degradation did not per se reduce the attention-induced amplification but instead delayed it in time: Speech encoding accuracy, modelled on the basis of the spectrotemporal response function, was significantly enhanced for attended versus ignored intact speech at earlier neural response latencies (< ~250 ms). This attentional enhancement was not absent but delayed for vocoded speech. These findings suggest that attentional selection of unilateral, degraded speech is feasible but induces delayed neural separation of competing speech, which might explain listening challenges experienced by unilateral CI users.
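A temporal response function of the kind referenced here is commonly estimated by ridge regression on time-lagged copies of the stimulus representation (the spectrotemporal variant adds a frequency-band axis). The sketch below follows that standard recipe; the lag range, regularization, and signals are assumptions, not the authors' settings.

```python
# Hedged sketch of TRF estimation via ridge regression on lagged
# stimulus copies; parameters and signals are placeholders.
import numpy as np

def estimate_trf(stim, eeg, fs, tmin=0.0, tmax=0.4, lam=1e2):
    lags = np.arange(int(tmin * fs), int(tmax * fs))
    # Lagged design matrix (np.roll wraps at the edges; fine for a sketch).
    X = np.column_stack([np.roll(stim, lag) for lag in lags])
    w = np.linalg.solve(X.T @ X + lam * np.eye(len(lags)), X.T @ eeg)
    return lags / fs, w                       # latencies (s), TRF weights

fs = 128
stim = np.random.randn(120 * fs)  # envelope of attended speech (placeholder)
eeg = np.random.randn(120 * fs)   # one EEG channel (placeholder)
latencies, trf = estimate_trf(stim, eeg, fs)
```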


2020
Author(s): Frauke Kraus, Sarah Tune, Anna Ruhe, Jonas Obleser, Malte Wöstmann

Abstract: Hearing loss is often asymmetric, such that hearing thresholds differ substantially between the two ears. The extreme case of such asymmetric hearing is single-sided deafness. A unilateral cochlear implant (CI) on the more severely impaired ear is an effective treatment to restore hearing. The neuro-cognitive cost of listening with a unilateral CI in multi-talker situations is at present unclear. Here, we simulated listening with a unilateral CI in young, normal-hearing listeners (N = 22) who were presented with 8-band noise-vocoded speech to one ear and intact speech to the other ear. Neural responses were recorded in the electroencephalogram (EEG) to obtain the spectro-temporal response function (sTRF) to speech. Listeners made more mistakes when answering questions about vocoded (versus intact) attended speech, indicating the behavioural cost of attending to spectrally degraded speech. At the neural level, we asked how unilateral acoustic degradation would impact the attention-induced amplification of tracking target versus distracting speech. Interestingly, unilateral degradation did not per se reduce the attention-induced amplification but instead delayed it in time: Speech encoding accuracy, modelled on the basis of the sTRF, was significantly enhanced for attended versus ignored intact speech at earlier neural response latencies (< ~250 ms). Notably, this attentional enhancement was not absent but delayed for vocoded speech. These findings suggest that attentional selection of unilateral, degraded speech is feasible but comes at the cost of delayed neural separation of competing speech, which might explain listening challenges experienced by unilateral CI users.

