Signal envelope and speech intelligibility differentially impact auditory motion perception

2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Michaela Warnecke ◽  
Ruth Y. Litovsky

Abstract
Our acoustic environment contains a plethora of complex sounds that are often in motion. To gauge approaching danger and communicate effectively, listeners need to localize and identify sounds, which includes determining sound motion. This study addresses which acoustic cues impact listeners’ ability to determine sound motion. Signal envelope (ENV) cues are implicated in both sound motion tracking and stimulus intelligibility, suggesting that these processes could compete for sound-processing resources. We created auditory chimaeras from speech and noise stimuli and varied the number of frequency bands, effectively manipulating speech intelligibility. Normal-hearing adults were presented with stationary or moving chimaeras and reported perceived sound motion and content. Results show that sensitivity to sound motion is not affected by speech intelligibility, but differs clearly between the original noise and speech stimuli. Further, acoustic chimaeras with speech-like ENVs and intelligible content induced a strong bias in listeners to report sounds as stationary. Increasing stimulus intelligibility systematically increased that bias, and removing intelligible content reduced it, suggesting that sound content may be prioritized over sound motion. These findings suggest that sound motion processing in the auditory system can be biased by acoustic parameters related to speech intelligibility.
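The chimaera construction described above can be sketched in a few lines: within each frequency band, take the envelope of one signal and the temporal fine structure of another, then recombine. This is a minimal sketch assuming the classic Hilbert-based approach; the filter order, band edges, and function name are invented for illustration, not taken from the study.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def make_chimaera(env_donor, fts_donor, fs, n_bands, f_lo=80.0, f_hi=7000.0):
    """Combine the Hilbert envelope (ENV) of one signal with the temporal
    fine structure of another within each of n_bands log-spaced bands."""
    edges = np.logspace(np.log10(f_lo), np.log10(f_hi), n_bands + 1)
    out = np.zeros(len(env_donor))
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        band_env = sosfiltfilt(sos, env_donor)  # band from the envelope donor
        band_fts = sosfiltfilt(sos, fts_donor)  # band from the fine-structure donor
        envelope = np.abs(hilbert(band_env))        # ENV of donor 1
        fine = np.cos(np.angle(hilbert(band_fts)))  # fine structure of donor 2
        out += envelope * fine
    return out
```

A speech-ENV chimaera would pass speech as `env_donor` and noise as `fts_donor`; increasing `n_bands` makes the result more intelligible, as manipulated in the study.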

2015 ◽  
Vol 27 (3) ◽  
pp. 533-545 ◽  
Author(s):  
Rebecca E. Millman ◽  
Sam R. Johnson ◽  
Garreth Prendergast

The temporal envelope of speech is important for speech intelligibility. Entrainment of cortical oscillations to the speech temporal envelope is a putative mechanism underlying speech intelligibility. Here we used magnetoencephalography (MEG) to test the hypothesis that phase-locking to the speech temporal envelope is enhanced for intelligible compared with unintelligible speech sentences. Perceptual “pop-out” was used to change the percept of physically identical tone-vocoded speech sentences from unintelligible to intelligible. The use of pop-out dissociates changes in phase-locking that arise from acoustical differences between unintelligible and intelligible speech from changes due to speech intelligibility itself. Novel, bespoke whole-head beamforming analyses, based on significant cross-correlation between the temporal envelopes of the speech stimuli and phase-locked neural activity, were used to localize neural sources that track the speech temporal envelope of both intelligible and unintelligible speech. Location-of-interest analyses were carried out in a priori defined locations to measure the representation of the speech temporal envelope for both unintelligible and intelligible speech in both the time domain (cross-correlation) and the frequency domain (coherence). Whole-brain beamforming analyses identified neural sources phase-locked to the temporal envelopes of both unintelligible and intelligible speech sentences. Crucially, there was no difference in phase-locking to the temporal envelope of speech in the pop-out condition in either the whole-brain or location-of-interest analyses, demonstrating that phase-locking to the speech temporal envelope is not enhanced by linguistic information.
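The two dependent measures named above, cross-correlation in the time domain and coherence in the frequency domain, can be illustrated on simulated signals. This sketch uses stand-in data (a Hilbert envelope of noise as the "speech envelope" and a noisy weighted copy as the "neural source"); the sampling rate and mixing weight are assumptions, not values from the study.

```python
import numpy as np
from scipy.signal import coherence, hilbert

rng = np.random.default_rng(0)
fs = 250                      # assumed post-downsampling sampling rate
n = 10 * fs                   # 10 s of data
speech_env = np.abs(hilbert(rng.normal(size=n)))  # stand-in speech envelope
neural = 0.5 * speech_env + rng.normal(size=n)    # simulated phase-locked source

# Time domain: cross-correlation (here, the zero-lag Pearson correlation)
r = np.corrcoef(speech_env, neural)[0, 1]

# Frequency domain: magnitude-squared coherence, averaged over low frequencies
f, Cxy = coherence(speech_env, neural, fs=fs, nperseg=512)
low_coh = Cxy[(f >= 1) & (f <= 8)].mean()
```

Both metrics are bounded and sensitive to the same underlying tracking, which is why the study reports them as complementary time- and frequency-domain views of envelope phase-locking.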


2016 ◽  
Vol 25 (4) ◽  
pp. 561-575 ◽  
Author(s):  
Paul M. Evitts ◽  
Heather Starmer ◽  
Kristine Teets ◽  
Christen Montgomery ◽  
Lauren Calhoun ◽  
...  

Purpose There is currently minimal information on the impact of dysphonia secondary to phonotrauma on listeners. Considering the high incidence of voice disorders among professional voice users, it is important to understand the impact of a dysphonic voice on their audiences. Methods Ninety-one healthy listeners (39 men, 52 women; mean age = 23.62 years) were presented with speech stimuli from 5 healthy speakers and 5 speakers diagnosed with dysphonia secondary to phonotrauma. Dependent variables included processing speed (reaction time [RT] ratio), speech intelligibility, and listener comprehension. Voice quality ratings were also obtained for all speakers from 3 expert listeners. Results Statistical results showed significant differences in RT ratio and in the number of speech intelligibility errors between healthy and dysphonic voices. There was no significant difference in listener comprehension errors. Multiple regression analyses showed that voice quality ratings from the Consensus Auditory-Perceptual Evaluation of Voice (Kempster, Gerratt, Verdolini Abbott, Barkmeier-Kraemer, & Hillman, 2009) were able to predict RT ratio and speech intelligibility but not listener comprehension. Conclusions Results of the study suggest that although listeners require more time to process speech stimuli from speakers with dysphonia secondary to phonotrauma and make more intelligibility errors, listener comprehension may not be affected.


2019 ◽  
Author(s):  
Steven Losorelli ◽  
Blair Kaneshiro ◽  
Gabriella A. Musacchia ◽  
Nikolas H. Blevins ◽  
Matthew B. Fitzgerald

Abstract
The ability to differentiate complex sounds is essential for communication. Here, we propose using a machine-learning approach, called classification, to objectively evaluate auditory perception. In this study, we recorded frequency following responses (FFRs) from 13 normal-hearing adult participants to six short music and speech stimuli sharing similar fundamental frequencies but varying in overall spectral and temporal characteristics. Each participant completed a perceptual identification test using the same stimuli. We used linear discriminant analysis to classify FFRs. Results showed statistically significant FFR classification accuracies using both the full response epoch in the time domain (72.3% accuracy, p < 0.001) and real and imaginary Fourier coefficients up to 1 kHz (74.6%, p < 0.001). We classified decomposed versions of the responses in order to examine which response features contributed to successful decoding. Classifier accuracies using Fourier magnitude and phase alone in the same frequency range were lower but still significant (58.2% and 41.3%, respectively; p < 0.001). Classification of overlapping 20-msec subsets of the FFR in the time domain similarly produced reduced but significant accuracies (42.3%–62.8%, p < 0.001). Participants’ mean perceptual responses were most accurate (90.6%, p < 0.001). Confusion matrices from FFR classifications and perceptual responses were converted to distance matrices and visualized as dendrograms. FFR classifications and perceptual responses demonstrate similar patterns of confusion across the stimuli. Our results demonstrate that classification can differentiate auditory stimuli from FFRs with high accuracy.
Moreover, the reduced accuracies obtained when the FFR is decomposed in the time and frequency domains suggest that different response features contribute complementary information, similar to how the human auditory system is thought to rely on both timing and frequency information to accurately process sound. Taken together, these results suggest that FFR classification is a promising approach for objective assessment of auditory perception.
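The core analysis, linear discriminant analysis applied to per-trial response waveforms with cross-validation, can be sketched on simulated data. The class templates, noise level, and trial counts below are invented stand-ins for real FFR epochs; only the pipeline (LDA + cross-validated accuracy against a 1/6 chance level) mirrors the description above.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_classes, trials_per_class, n_samples = 6, 20, 200
labels = np.repeat(np.arange(n_classes), trials_per_class)

# One noiseless "response" template per stimulus, plus trial-to-trial noise
templates = rng.normal(size=(n_classes, n_samples))
X = templates[labels] + 0.5 * rng.normal(size=(labels.size, n_samples))

lda = LinearDiscriminantAnalysis()
acc = cross_val_score(lda, X, labels, cv=5).mean()  # chance would be ~1/6
```

The same scheme extends to the decomposed feature sets mentioned above by replacing the time-domain rows of `X` with Fourier coefficients, magnitudes, or phases of each epoch.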


2016 ◽  
Vol 37 ◽  
pp. 1-10 ◽  
Author(s):  
Renee P. Clapham ◽  
Jean-Pierre Martens ◽  
Rob J.J.H. van Son ◽  
Frans J.M. Hilgers ◽  
Michiel M.W. van den Brekel ◽  
...  

2019 ◽  
Author(s):  
Arijit Chakraborty ◽  
Tiffany T. Tran ◽  
Andrew E. Silva ◽  
Deborah Giaschi ◽  
Benjamin Thompson

Abstract
Attentive motion tracking deficits, measured using multiple object tracking (MOT) tasks, have been identified in a number of visual and neurodevelopmental disorders such as amblyopia and autism. These deficits are often attributed to the abnormal development of high-level attentional networks. However, neuroimaging evidence from amblyopia suggests that reduced MOT performance can be explained by impaired function in motion-sensitive area MT+ alone. To test the hypothesis that MT+ plays an important role in MOT, we assessed whether modulation of MT+ activity using continuous theta burst stimulation (cTBS) influenced MOT performance in participants with normal vision. An additional experiment involving numerosity judgements of MOT stimulus elements was conducted to control for non-specific effects of MT+ cTBS on psychophysical task performance. The MOT stimulus consisted of 4 target and 4 distractor dots and was presented at 10° eccentricity in the right or left hemifield. Functional MRI-guided cTBS was applied to left MT+. Participants (n = 13, age: 27 ± 3) attended separate active and sham cTBS sessions where the MOT task was completed before, 5 min post, and 30 min post cTBS. Active cTBS significantly impaired MOT task accuracy relative to baseline for the right (stimulated) hemifield at 5 min (10 ± 2% reduction; t12 = 1.95, p = 0.03) and 30 min (14 ± 3% reduction; t12 = 2.96, p = 0.01) post stimulation. No impairment occurred within the left (control) hemifield after active cTBS or for either hemifield after sham cTBS. Numerosity task performance was unaffected by cTBS. These results highlight the importance of lower-level motion processing for MOT and suggest that abnormal function of MT+ alone is sufficient to cause a deficit in MOT task performance.
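The reported t12 statistics correspond to paired comparisons of accuracy across the 13 participants (df = n − 1 = 12). This sketch runs the same kind of test on simulated accuracies; the means and spreads below are invented for illustration and are not the study's data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 13                                            # participants
baseline = rng.normal(0.85, 0.05, n)              # assumed MOT accuracy pre-cTBS
post_5min = baseline - rng.normal(0.10, 0.03, n)  # ~10% drop 5 min post-cTBS

# Paired comparison across participants (df = n - 1 = 12)
t_stat, p_value = stats.ttest_rel(baseline, post_5min)
```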


2016 ◽  
Vol 16 (12) ◽  
pp. 182 ◽  
Author(s):  
Devon Greer ◽  
Sung Jun Joo ◽  
Lawrence Cormack ◽  
Alexander Huk

2018 ◽  
Vol 4 (1) ◽  
pp. 501-523 ◽  
Author(s):  
Shin'ya Nishida ◽  
Takahiro Kawabe ◽  
Masataka Sawayama ◽  
Taiki Fukiage

Visual motion processing can be conceptually divided into two levels. In the lower level, local motion signals are detected by spatiotemporal-frequency-selective sensors and then integrated into a motion vector flow. Although the model based on V1-MT physiology provides a good computational framework for this level of processing, it needs to be updated to fully explain psychophysical findings about motion perception, including complex motion signal interactions in the spatiotemporal-frequency and space domains. In the higher level, the velocity map is interpreted. Although there are many motion interpretation processes, we highlight the recent progress in research on the perception of material (e.g., specular reflection, liquid viscosity) and on animacy perception. We then consider possible linking mechanisms of the two levels and propose intrinsic flow decomposition as the key problem. To provide insights into computational mechanisms of motion perception, in addition to psychophysics and neurosciences, we review machine vision studies seeking to solve similar problems.
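The lower level's "spatiotemporal-frequency-selective sensors" can be made concrete with a minimal motion-energy computation in the style of the standard V1 model: quadrature pairs of space-time Gabor filters whose squared, summed outputs signal drift in a preferred direction. All sizes, frequencies, and the window width below are assumed values chosen for a readable sketch.

```python
import numpy as np

# 1-D space x time stimulus: a sinusoidal grating drifting rightward
nx, nt = 64, 64
x = np.linspace(-1, 1, nx)[None, :]   # space (columns)
t = np.linspace(-1, 1, nt)[:, None]   # time (rows)
stimulus = np.cos(2 * np.pi * (4 * x - 4 * t))

def motion_energy(stim, fx, ft):
    """Energy of a quadrature pair of space-time Gabors tuned to (fx, ft)."""
    window = np.exp(-(x ** 2 + t ** 2) / 0.2)
    even = window * np.cos(2 * np.pi * (fx * x + ft * t))
    odd = window * np.sin(2 * np.pi * (fx * x + ft * t))
    return np.sum(stim * even) ** 2 + np.sum(stim * odd) ** 2

rightward = motion_energy(stimulus, 4, -4)  # matches the drift direction
leftward = motion_energy(stimulus, 4, 4)    # opposite direction
opponent = rightward - leftward             # positive => net rightward signal
```

A bank of such sensors over many (fx, ft) tunings yields the local motion vectors that the higher level then integrates and interprets.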


2018 ◽  
Author(s):  
Jonas Vanthornhout ◽  
Lien Decruy ◽  
Jan Wouters ◽  
Jonathan Z. Simon ◽  
Tom Francart

Abstract
Speech intelligibility is currently measured by scoring how well a person can identify a speech signal. The results of such behavioral measures reflect neural processing of the speech signal, but are also influenced by language processing, motivation, and memory. Electrophysiological measures of hearing often give insight into the neural processing of sound, but most methods use non-speech stimuli, making it hard to relate the results to behavioral measures of speech intelligibility. The use of natural running speech as a stimulus in electrophysiological measures of hearing is a paradigm shift that allows us to bridge the gap between behavioral and electrophysiological measures. Here, by decoding the speech envelope from the electroencephalogram and correlating it with the stimulus envelope, we demonstrate an electrophysiological measure of neural processing of running speech. We show that behaviorally measured speech intelligibility is strongly correlated with our electrophysiological measure. Our results pave the way towards an objective and automatic way of assessing neural processing of speech presented through auditory prostheses, reducing confounds such as attention and cognitive capabilities. We anticipate that our electrophysiological measure will allow better differential diagnosis of the auditory system, and will enable the development of closed-loop auditory prostheses that automatically adapt to individual users.
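The core idea, a linear backward model that reconstructs the speech envelope from multichannel EEG and scores the result by correlation with the true envelope, can be sketched on simulated data. This is a deliberately simplified zero-lag, in-sample version; practical decoders use time-lagged regressors, regularization, and held-out data, and every number below (sampling rate, channel count, mixing weights) is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
fs, seconds, n_channels = 64, 60, 8      # assumed downsampled EEG parameters
n = fs * seconds
envelope = np.abs(rng.normal(size=n))    # stand-in speech envelope
weights = rng.normal(size=n_channels)    # how strongly each channel tracks it
eeg = envelope[:, None] * weights + rng.normal(size=(n, n_channels))

# Backward model: least-squares decoder mapping EEG channels to the envelope
decoder, *_ = np.linalg.lstsq(eeg, envelope, rcond=None)
reconstructed = eeg @ decoder

# The objective measure: correlation between reconstructed and actual envelope
score = np.corrcoef(reconstructed, envelope)[0, 1]
```

A higher `score` indicates stronger neural tracking of the envelope, which is the quantity the study relates to behaviorally measured speech intelligibility.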


2021 ◽  
Vol 2069 (1) ◽  
pp. 012165
Author(s):  
G Minelli ◽  
G E Puglisi ◽  
A Astolfi ◽  
C Hauth ◽  
A Warzybok

Abstract
Since the fundamental phases of the learning process take place in elementary classrooms, it is necessary to guarantee a proper acoustic environment for the listening activity of the children immersed in them. In this framework, speech intelligibility is especially important. In order to better understand and objectively quantify the effect of background noise and reverberation on speech intelligibility, various models have been developed. Here, a binaural speech intelligibility model (BSIM) is investigated for speech intelligibility predictions in a real classroom, considering the effect of talker-to-listener distance and of binaural unmasking due to the spatial separation of noise and speech sources. BSIM predictions are compared to well-established room acoustic measures such as reverberation time (T30), clarity, and definition. Objective acoustical measurements were carried out in one Italian primary school classroom before (T30 = 1.43 ± 0.03 s) and after (T30 = 0.45 ± 0.02 s) the acoustical treatment. Speech reception thresholds (SRTs), i.e., the signal-to-noise ratios yielding 80% speech intelligibility, will be obtained through BSIM simulations using the measured binaural room impulse responses (BRIRs). A focus on the effect of different speech and noise source spatial positions on the SRT values will aim to show the importance of a model able to deal with the binaural aspects of the auditory system. In particular, it will be observed how the position of the noise source influences speech intelligibility when the target speech source always lies in the same position.
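Independently of BSIM's internals (which are not detailed here), the SRT criterion used above is simply the SNR at which a psychometric function crosses 80% intelligibility. This sketch inverts an assumed logistic psychometric function numerically; the midpoint and slope values are invented for illustration.

```python
import numpy as np

def psychometric(snr_db, srt50=-6.0, slope=0.12):
    """Assumed logistic intelligibility function: proportion correct vs SNR,
    with 50% intelligibility at srt50 and growth rate set by slope."""
    return 1.0 / (1.0 + np.exp(-4.0 * slope * (snr_db - srt50)))

# SRT at the 80% criterion: find the SNR where the curve crosses 0.80
snrs = np.linspace(-20.0, 10.0, 3001)
srt80 = snrs[np.argmin(np.abs(psychometric(snrs) - 0.80))]
```

Because the 80% point lies above the 50% midpoint, `srt80` is always a few dB higher than `srt50`; binaural unmasking from spatially separating speech and noise shifts the whole curve leftward, lowering the SRT.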

