Contribution of Low-Level Acoustic and Higher-Level Lexical-Semantic Cues to Speech Recognition in Noise and Reverberation

2021 ◽  
Vol 7 ◽  
Author(s):  
Anna Warzybok ◽  
Jan Rennies ◽  
Birger Kollmeier

Masking noise and reverberation strongly influence speech intelligibility and decrease listening comfort. To optimize acoustics for ensuring a comfortable environment, it is crucial to understand the respective contributions of bottom-up signal-driven cues and top-down linguistic-semantic cues to speech recognition in noise and reverberation. Since the relevance of these cues differs across speech test materials and the training status of the listeners, we investigate the influence of speech material type on speech recognition in noise, reverberation, and combinations of noise and reverberation. We also examine the influence of training on performance for a subset of measurement conditions. Speech recognition is measured with an open-set, everyday Plomp-type sentence test and compared to the recognition scores for a closed-set Matrix-type test consisting of syntactically fixed and semantically unpredictable sentences (cf. Rennies et al., J. Acoust. Soc. Am., 2014, 136, 2642–2653). While both tests yield approximately the same recognition threshold in noise in trained normal-hearing listeners, performance may differ as a result of cognitive factors, i.e., the closed-set test is more sensitive to training effects, while the open-set test is more affected by language familiarity. All experimental data were obtained at a fixed signal-to-noise ratio (SNR) and/or reverberation time set to yield speech transmission index (STI) values of 0.17, 0.30, and 0.43, respectively, thus linking the data to STI predictions as a measure of purely low-level acoustic effects. The results confirm the difference in robustness to reverberation between Matrix-type and Plomp-type sentences reported in the literature, especially at poor and medium speech intelligibility.
The robustness of the closed-set Matrix-type sentences against reverberation disappeared when listeners had no a priori knowledge of the speech material (sentence structure and words used), demonstrating the influence of higher-level lexical-semantic cues on speech recognition. In addition, the consistent difference between reverberation- and noise-induced recognition scores for everyday sentences at medium and high STI values, together with the differences between Matrix-type and Plomp-type sentence scores, clearly demonstrates the limited utility of the STI in predicting speech recognition in noise and reverberation.
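The link between fixed SNR and target STI values used above can be illustrated with a minimal sketch. This is not the study's own computation: it assumes the simplified relation underlying IEC 60268-16, in which each octave band's apparent SNR is clipped to ±15 dB and mapped linearly to a transmission index; the full standard's octave-band weights and redundancy corrections are omitted.

```python
import numpy as np

def apparent_snr_to_sti(band_snrs_db):
    """Map per-band apparent SNRs (dB) to a simplified STI.

    Each band SNR is clipped to [-15, +15] dB and scaled linearly to a
    transmission index in [0, 1]; the band indices are then averaged.
    (The full IEC 60268-16 method adds octave-band weights and redundancy
    corrections, which are omitted here for clarity.)
    """
    snrs = np.clip(np.asarray(band_snrs_db, dtype=float), -15.0, 15.0)
    ti = (snrs + 15.0) / 30.0  # -15 dB -> 0.0, +15 dB -> 1.0
    return float(ti.mean())

# A uniform apparent SNR of -4.8 dB across seven octave bands gives an
# STI of 0.34, close to the 0.30 condition used in the study.
print(apparent_snr_to_sti([-4.8] * 7))
```

Inverting this relation is how a target STI (say, 0.30) can be turned into the fixed SNR at which the measurements are run.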

1991 ◽  
Vol 34 (5) ◽  
pp. 1180-1184 ◽  
Author(s):  
Larry E. Humes ◽  
Kathleen J. Nelson ◽  
David B. Pisoni

The Modified Rhyme Test (MRT), recorded using natural speech and two forms of synthetic speech, DECtalk and Votrax, was used to measure both open-set and closed-set speech-recognition performance. Performance of hearing-impaired elderly listeners was compared to that of two groups of young normal-hearing adults, one listening in quiet and the other listening in a background of spectrally shaped noise designed to simulate the peripheral hearing loss of the elderly. Votrax synthetic speech yielded significant decrements in speech recognition compared to either natural or DECtalk synthetic speech for all three subject groups. There were no differences in performance between natural speech and DECtalk speech for the elderly hearing-impaired listeners or the young listeners with simulated hearing loss. The normal-hearing young adults listening in quiet outperformed both of the other groups, but there were no differences in performance between the young listeners with simulated hearing loss and the elderly hearing-impaired listeners. When the closed-set identification of synthetic speech was compared to its open-set recognition, the hearing-impaired elderly gained as much from the reduction in stimulus/response uncertainty as the two younger groups. Finally, among the elderly hearing-impaired listeners, speech-recognition performance was correlated negatively with hearing sensitivity, but scores were correlated positively across the different talker conditions. Those listeners with the greatest hearing loss had the most difficulty understanding speech, and those having the most trouble understanding natural speech also had the greatest difficulty with synthetic speech.


2007 ◽  
Vol 44 (2) ◽  
pp. 163-174 ◽  
Author(s):  
Megan Hodge ◽  
Carrie L. Gotzke

Objective: This study describes a preliminary evaluation of the construct and concurrent validity of the Speech Intelligibility Probe for Children With Cleft Palate. Design: The study used a prospective between-groups design with convenience samples. Participants: Participants (ages 39 to 82 months) included 5 children with cleft palate and 10 children with typical speech development and no history of craniofacial abnormalities. All children had age-appropriate language skills. Interventions: Each child completed the Speech Intelligibility Probe for Children With Cleft Palate by imitating single words. Each child's word productions were recorded and played back to listeners who completed open-set and closed-set response tasks. Recorded utterances representing a contiguous 100-word sample of each child's spontaneous speech also were played back to listeners for completion of an open-set word identification task. Main Outcome Measures: Measures reported include group means for (1) intelligibility scores for the open-set Speech Intelligibility Probe for Children With Cleft Palate and spontaneous speech sample conditions, and (2) percentages of phonetic contrasts correct and correct-distorted from the Speech Intelligibility Probe for Children With Cleft Palate closed-set response task. Results: The group of children with cleft palate had significantly lower intelligibility scores, a lower percentage of correct phonetic contrasts, and a higher percentage of correct-distorted items (construct validity). A strong positive correlation (r = .88, p < .01) was found between intelligibility scores from the Speech Intelligibility Probe for Children With Cleft Palate and the spontaneous sample (concurrent validity). Conclusions: The results provide preliminary support for the construct and concurrent validity of the Speech Intelligibility Probe for Children With Cleft Palate as a measure of children's speech intelligibility.


1994 ◽  
Vol 3 (1) ◽  
pp. 19-22 ◽  
Author(s):  
June Antablin McCullough ◽  
Richard H. Wilson ◽  
Jonathan D. Birck ◽  
Linda G. Anderson

Spanish picture-identification materials have been created specifically for presentation in a computer-driven multimedia format. Normative performance data in conventional oral, open-set conditions and in computer-controlled pointing, closed-set conditions are being established for the Spanish Picture-Identification Task and will be forthcoming. The use of auditory/visual materials in the computer-controlled format represents an innovative use of multimedia technology that appears to be uniquely applicable for assessing the growing numbers of multilingual clients. It is reasonable to expect that word lists and corresponding illustrations will be developed soon for many common languages, perhaps stored on CD-ROM and accessed by audiologists through the touch of a button.


2002 ◽  
Vol 116 (S28) ◽  
pp. 47-51 ◽  
Author(s):  
Sunil N. Dutt ◽  
Ann-Louise McDermott ◽  
Stuart P. Burrell ◽  
Huw R. Cooper ◽  
Andrew P. Reid ◽  
...  

The Birmingham bone-anchored hearing aid (BAHA) programme, since its inception in 1988, has fitted more than 300 patients with unilateral bone-anchored hearing aids. Recently, some of the patients who benefited greatly from unilateral aids applied for bilateral amplification. To date, 15 patients have been fitted with bilateral BAHAs. The benefits of bilateral amplification have been compared to unilateral amplification in 11 of these patients who have used their second BAHA for 12 months or longer. Following a subjective analysis in the form of comprehensive questionnaires, objective testing was undertaken to assess specific issues such as ‘speech recognition in quiet’, ‘speech recognition in noise’ and a modified ‘speech-in-simulated-party-noise’ (Plomp) test. ‘Speech in quiet’ testing revealed a 100 per cent score with both unilateral and bilateral BAHAs. With ‘speech in noise’, all 11 patients scored marginally better with bilateral aids compared to the best unilateral responses. The modified Plomp test demonstrated that bilateral BAHAs provided maximum flexibility when the origin of noise cannot be controlled, as in day-to-day situations. In this small case series the results are positive and comparable to the experience of the Nijmegen BAHA group.


2021 ◽  
Vol 4 ◽  
Author(s):  
Alireza Goudarzi ◽  
Gemma Moya-Galé

The sophistication of artificial intelligence (AI) technologies has advanced significantly in the past decade. However, the unpredictability and variability of AI behavior with noisy signals remain underexplored and pose a challenge when generalizing AI behavior to real-life environments, especially for people with a speech disorder, who already experience reduced speech intelligibility. In the context of developing assistive technology for people with Parkinson's disease using automatic speech recognition (ASR), this pilot study reports on the performance of Google Cloud speech-to-text technology with dysarthric and healthy speech in the presence of multi-talker babble noise at different intensity levels. Despite sensitivities and shortcomings, it is possible to control the performance of these systems with current tools in order to measure speech intelligibility in real-life conditions.
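The noise-mixing step behind such an evaluation is straightforward to reproduce. The sketch below is a generic procedure, not the study's actual pipeline, and the function name is illustrative: it scales a babble (or any noise) recording so that the speech-to-noise power ratio of the mixture hits a target SNR in dB.

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech-to-noise power ratio of the mixture
    equals `snr_db`, then return speech + scaled noise.
    (Illustrative helper; not the study's actual pipeline.)"""
    speech = np.asarray(speech, dtype=float)
    noise = np.asarray(noise, dtype=float)
    if noise.size < speech.size:  # loop the noise if it is too short
        noise = np.tile(noise, int(np.ceil(speech.size / noise.size)))
    noise = noise[:speech.size]
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    target_p_noise = p_speech / (10.0 ** (snr_db / 10.0))
    return speech + noise * np.sqrt(target_p_noise / p_noise)

# Example: a 5 Hz tone mixed with white noise at 0 dB SNR.
rng = np.random.default_rng(0)
tone = np.sin(2.0 * np.pi * 5.0 * np.linspace(0.0, 1.0, 16000, endpoint=False))
mixture = mix_at_snr(tone, rng.standard_normal(16000), 0.0)
```

Feeding such mixtures to an ASR service at several SNRs and scoring word error rate against the reference transcript is the usual way to trace out a performance-vs-noise curve.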


2018 ◽  
Vol 23 (1) ◽  
pp. 48-57 ◽  
Author(s):  
Elina Kari ◽  
John L. Go ◽  
Janice Loggins ◽  
Neelmini Emmanuel ◽  
Laurel M. Fisher

Objective: To describe imaging characteristics and hearing outcomes in children with cochleovestibular nerve (CVN) abnormalities. Study Design: Retrospective, critical review. Setting: Tertiary referral academic center. Patients: Twenty-seven children with CVN abnormalities imaged with magnetic resonance imaging (MRI) and/or computed tomography (CT). Study Intervention(s): None. Main Outcome Measure(s): The likely presence or absence of a CVN and auditory stimulation responses. Results: Two of 27 cases had unilateral hearing loss, and all others had bilateral loss. Eleven (46%) were identified with a disability or additional condition. Twenty-two (42%) ears received a cochlear implant (CI) and 9 ears (17%) experienced no apparent benefit from the device. MRI acquisition protocols were suboptimal for identification of the nerve in 22 (42%) ears. A likely CVN absence was associated with a narrow cochlear aperture and internal auditory canal and cochlear malformation. Thirteen (48%) children with an abnormal nerve exhibited normal cochleae on the same side. Hearing data were available for 30 ears, and 25 ears (83%) exhibited hearing with or without an assistive device. One child achieved closed-set speech recognition with a hearing aid, another with a CI. One child achieved open-set speech recognition with a CI. Conclusions: Current imaging cannot accurately characterize the functional status of the CVN or predict assistive device benefit. Children who would otherwise have been denied a CI exhibited auditory responses after implantation. A CI should be considered in children with an abnormal CVN. Furthermore, imaging acquisition protocols need standardization for clear temporal bone imaging.


2019 ◽  
Author(s):  
Jonathan Henry Venezia ◽  
Robert Sandlin ◽  
Leon Wojno ◽  
Anthony Duc Tran ◽  
Gregory Hickok ◽  
...  

Static and dynamic visual speech cues contribute to audiovisual (AV) speech recognition in noise. Static cues (e.g., “lipreading”) provide complementary information that enables perceivers to ascertain ambiguous acoustic-phonetic content. The role of dynamic cues is less clear, but one suggestion is that temporal covariation between facial motion trajectories and the speech envelope enables perceivers to recover a more robust representation of the time-varying acoustic signal. Modeling studies show this is computationally feasible, though it has not been confirmed experimentally. We conducted two experiments to determine whether AV speech recognition depends on the magnitude of cross-sensory temporal coherence (AVC). In Experiment 1, sentence-keyword recognition in steady-state noise (SSN) was assessed across a range of signal-to-noise ratios (SNRs) for auditory and AV speech. The auditory signal was unprocessed or filtered to remove 3-7 Hz temporal modulations. Filtering severely reduced AVC (magnitude-squared coherence of lip trajectories with cochlear-narrowband speech envelopes), but did not reduce the magnitude of the AV advantage (AV > A; ~4 dB). This did not depend on the presence of static cues, manipulated via facial blurring. Experiment 2 assessed AV speech recognition in SSN at a fixed SNR (-10.5 dB) for subsets of Exp. 1 stimuli with naturally high or low AVC. A small effect (~5% correct; high-AVC > low-AVC) was observed. A computational model of AV speech intelligibility based on AVC yielded good overall predictions of performance, but over-predicted the differential effects of AVC. These results suggest the role and/or computational characterization of AVC must be re-conceptualized.
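The coherence quantity at issue, magnitude-squared coherence between a lip trajectory and a speech envelope, can be computed with standard spectral tools. The sketch below uses synthetic stand-ins rather than the study's stimuli: a shared 4 Hz "syllabic" modulation drives both signals, each corrupted by independent noise, and the sampling rate is an assumption.

```python
import numpy as np
from scipy.signal import coherence

fs = 100.0                      # Hz; assumed sampling rate of the lip trajectory
t = np.arange(0.0, 60.0, 1.0 / fs)
rng = np.random.default_rng(0)

# Toy stand-ins for the study's signals: a shared 4 Hz "syllabic"
# modulation drives both the cochlear-band envelope and the lip
# aperture, each with independent additive noise.
shared = np.sin(2.0 * np.pi * 4.0 * t)
envelope = shared + 0.5 * rng.standard_normal(t.size)
lip = shared + 0.5 * rng.standard_normal(t.size)

# Magnitude-squared coherence, inspected over the 3-7 Hz band that the
# filtering manipulation in Experiment 1 targeted.
f, msc = coherence(envelope, lip, fs=fs, nperseg=512)
band = (f >= 3.0) & (f <= 7.0)
print(f"peak MSC in 3-7 Hz band: {msc[band].max():.2f}")
```

Removing the 3-7 Hz modulations from one signal collapses the coherence in that band, which is exactly the AVC reduction the filtering manipulation produces.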


Author(s):  
Seong Jun Song ◽  
Hyun Joon Shim ◽  
Chul Ho Park ◽  
Seong Hee Lee ◽  
Sang Won Yoon

2020 ◽  
Vol 9 (11) ◽  
pp. 9353-9360
Author(s):  
G. Selvi ◽  
I. Rajasekaran

This paper introduces concepts of semi-generalized closed sets in strong generalized topological spaces, namely the $sg^{\star \star}_\mu$-closed set, $sg^{\star \star}_\mu$-open set, $g^{\star \star}_\mu$-closed set, and $g^{\star \star}_\mu$-open set, and studies some of their basic properties, including $sg^{\star \star}_\mu$-continuous maps, $sg^{\star \star}_\mu$-irresolute maps, and the $T_{\frac{1}{2}}$-space in strong generalized topological spaces.
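For orientation, the unstarred template behind these definitions can be sketched as follows. This is only the standard base case of generalized closedness in a generalized topological space $(X,\mu)$ with closure operator $c_\mu$, not the paper's starred variants, which refine the choice of closure operator and of the $\mu$-open test sets $U$.

```latex
% Base case: generalized-closed sets in a GTS $(X,\mu)$.
% $A$ is $g_\mu$-closed iff its $\mu$-closure stays inside
% every $\mu$-open superset of $A$.
\[
  A \subseteq X \text{ is } g_\mu\text{-closed} \iff
  \bigl(\, A \subseteq U,\ U \in \mu \implies c_\mu(A) \subseteq U \,\bigr),
\]
\[
  A \text{ is } g_\mu\text{-open} \iff X \setminus A \text{ is } g_\mu\text{-closed}.
\]
```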

