The influence of bigram constraints on word recognition by humans: implications for computer speech recognition

Author(s): R.A. Cole, Yonghong Yan, T. Bailey

Author(s): Jace Wolfe, Mila Duke, Sharon Miller, Erin Schafer, Christine Jones, ...

Background: For children with hearing loss, the primary goal of hearing aids is to provide improved access to the auditory environment within the limits of hearing aid technology and the child’s auditory abilities. However, there are limited data examining aided speech recognition at very low (40 dBA) and low (50 dBA) presentation levels. Purpose: Due to the paucity of studies exploring aided speech recognition at low presentation levels for children with hearing loss, the present study aimed to 1) compare aided speech recognition at different presentation levels between groups of children with normal hearing and hearing loss, 2) explore the effects of aided pure tone average (PTA) and aided Speech Intelligibility Index (SII) on aided speech recognition at low presentation levels for children with hearing loss ranging in degree from mild to severe, and 3) evaluate the effect of increasing low-level gain on aided speech recognition of children with hearing loss. Research Design: In phase 1 of this study, a two-group, repeated-measures design was used to evaluate differences in speech recognition. In phase 2 of this study, a single-group, repeated-measures design was used to evaluate the potential benefit of additional low-level hearing aid gain for low-level aided speech recognition of children with hearing loss. Study Sample: The first phase of the study included 27 school-age children with mild to severe sensorineural hearing loss and 12 school-age children with normal hearing. The second phase included eight children with mild to moderate sensorineural hearing loss. Intervention: Prior to the study, children with hearing loss were fitted binaurally with digital hearing aids. 
Children in the second phase were fitted binaurally with digital study hearing aids and completed a trial period with two different gain settings: 1) gain required to match hearing aid output to prescriptive targets (i.e., primary program), and 2) a 6-dB increase in overall gain for low-level inputs relative to the primary program. In both phases of this study, real-ear verification measures were completed to ensure the hearing aid output matched prescriptive targets. Data Collection and Analysis: Phase 1 included monosyllabic word recognition and syllable-final plural recognition at three presentation levels (40, 50, and 60 dBA). Phase 2 compared speech recognition performance for the same test measures and presentation levels with the two differing gain prescriptions. Results and Conclusions: In phase 1 of the study, aided speech recognition was significantly poorer in children with hearing loss at all presentation levels. Higher aided SII in the better ear (55 dB SPL input) was associated with higher CNC word recognition at a 40 dBA presentation level. In phase 2, increasing the hearing aid gain for low-level inputs provided a significant improvement in syllable-final plural recognition at very low-level inputs and resulted in a non-significant trend toward better monosyllabic word recognition at very low presentation levels. Additional research is needed to document the speech recognition difficulties children with hearing aids may experience with low-level speech in the real world, as well as the potential benefit or detriment of providing additional low-level hearing aid gain.
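At its core, the aided Speech Intelligibility Index referenced in this abstract is an importance-weighted sum of per-band audibilities (ANSI S3.5 defines the actual frequency bands, importance weights, and audibility calculation). A minimal sketch of that weighted sum, with hypothetical octave-band weights and audibility values chosen purely for illustration:

```python
def speech_intelligibility_index(importance, audibility):
    """Importance-weighted sum of band audibilities.

    importance: per-band importance weights (should sum to ~1.0)
    audibility: per-band audibility, each between 0 (inaudible)
                and 1 (fully audible)
    """
    if len(importance) != len(audibility):
        raise ValueError("band counts must match")
    return sum(i * a for i, a in zip(importance, audibility))

# Hypothetical six octave bands (250 Hz - 8 kHz), illustration only;
# the ANSI standard specifies the real bands and weights.
weights = [0.10, 0.15, 0.25, 0.25, 0.15, 0.10]
audible = [1.0, 1.0, 0.8, 0.5, 0.2, 0.0]   # sloping loss: highs inaudible
sii = speech_intelligibility_index(weights, audible)   # 0.605
```

An SII near 1.0 means essentially the whole speech spectrum is audible; the sloping-loss example above loses most of its index in the heavily weighted mid-to-high bands, which is why high-frequency audibility matters so much for aided word recognition.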


2021, Vol 32 (08), pp. 547-554
Author(s): Soha N. Garadat, Ana'am Alkharabsheh, Nihad A. Almasri, Abdulrahman Hagr

Abstract Background Speech audiometry materials are widely available in many different languages. However, there are no known standardized materials for the assessment of speech recognition in Arabic-speaking children. Purpose The aim of the study was to develop and validate phonetically balanced and psychometrically equivalent monosyllabic word recognition lists for children through a picture identification task. Research Design A prospective repeated-measures design was used. Monosyllabic words were chosen from children's storybooks and were evaluated for familiarity. The selected words were then divided into four phonetically balanced word lists. The final lists were evaluated for homogeneity and equivalency. Study Sample Ten adults and 32 children with normal hearing sensitivity were recruited. Data Collection and Analyses Lists were presented to adult subjects in 5-dB increments from 0 to 60 dB hearing level. Individual data were then fitted with a sigmoid function, from which the 50% threshold, the slope at the 50% point, and the slope between the 20 and 80% points were derived to determine list psychometric properties. Lists were next presented to children in two separate sessions to assess their equivalency, validity, and reliability. Data were subjected to a mixed-design analysis of variance. Results No statistically significant difference was found among the word lists. Conclusion This study provided evidence that the monosyllabic word lists had comparable psychometric characteristics and reliability, supporting the constructed speech corpus as a valid tool for assessing speech recognition in Arabic-speaking children.
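The fitting step described above can be sketched with a logistic psychometric function. The abstract does not specify the authors' fitting procedure, so this is only an illustrative least-squares grid search over midpoint (the 50% threshold) and slope, run on a synthetic listener:

```python
import math

def logistic(level, midpoint, slope):
    """Predicted proportion correct at a presentation level (dB HL)."""
    return 1.0 / (1.0 + math.exp(-slope * (level - midpoint)))

def fit_logistic(levels, proportions):
    """Least-squares grid search for the 50% point and slope parameter."""
    best_mid, best_slope, best_err = None, None, float("inf")
    for m in range(0, 121):            # midpoints 0.0 .. 60.0 dB HL, 0.5 steps
        for s in range(1, 21):         # slopes 0.05 .. 1.00 per dB, 0.05 steps
            mid, slope = m * 0.5, s * 0.05
            err = sum((logistic(x, mid, slope) - p) ** 2
                      for x, p in zip(levels, proportions))
            if err < best_err:
                best_mid, best_slope, best_err = mid, slope, err
    return best_mid, best_slope

levels = list(range(0, 65, 5))                    # 0-60 dB HL in 5-dB steps
props = [logistic(x, 25.0, 0.3) for x in levels]  # synthetic listener
midpoint, slope = fit_logistic(levels, props)     # recovers 25.0 dB, 0.30
```

Note that for a logistic function the steepness at the 50% point in proportion-per-dB terms is the slope parameter divided by 4, which is how a fitted curve yields both the threshold and the slope values the abstract mentions.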


2012, Vol 55 (3), pp. 879-891
Author(s): Stanley A. Gelfand, Jessica T. Gelfand

Method Complete psychometric functions for phoneme and word recognition scores at 8 signal-to-noise ratios from −15 dB to 20 dB were generated for the first 10, 20, and 25 three-word presentations, as well as for all 50 presentations, of the Tri-Word or Computer Assisted Speech Recognition Assessment (CASRA) Test (Gelfand, 1998), based on the results of 12 normal-hearing young adult participants from the original study. Results The psychometric functions for phoneme and word scores were very similar and essentially overlapping across all set sizes. Performance on the shortened tests accounted for 98.8% to 99.5% of the full (50-set) test variance with phoneme scoring, and 95.8% to 99.2% of the full test variance with word scoring. Shortening the tests accounted for little if any of the variance in the slopes of the functions. Conclusions The psychometric functions for abbreviated versions of the Tri-Word speech recognition test using 10, 20, and 25 presentation sets were described and are comparable to those of the original 50-presentation approach for both phoneme and word scoring in healthy, normal-hearing young adult participants.
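"Variance accounted for" between the shortened and full-length test scores is the squared Pearson correlation (r²). A minimal sketch of that statistic; the scores below are made up for illustration and are not the study's data:

```python
def variance_accounted_for(short_scores, full_scores):
    """Squared Pearson correlation (r^2) between two score lists."""
    n = len(short_scores)
    mx = sum(short_scores) / n
    my = sum(full_scores) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(short_scores, full_scores))
    sxx = sum((x - mx) ** 2 for x in short_scores)
    syy = sum((y - my) ** 2 for y in full_scores)
    return (sxy * sxy) / (sxx * syy)

# Illustrative 10-set vs. full 50-set percent-correct scores, 6 listeners
short = [72, 80, 85, 90, 94, 98]
full  = [70, 79, 86, 91, 95, 97]
r2 = variance_accounted_for(short, full)
```

An r² near 1.0, as reported above (95.8-99.5%), means the abbreviated test rank-orders and scales listeners almost exactly as the full 50-set test does, which is the practical justification for shortening it.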


2011, Vol 22 (07), pp. 405-423
Author(s): Richard H. Wilson

Background: Since the 1940s, measures of pure-tone sensitivity and speech recognition in quiet have been vital components of the audiologic evaluation. Although early investigators urged that speech recognition in noise also should be a component of the audiologic evaluation, only recently has this suggestion started to become a reality. This report focuses on the Words-in-Noise (WIN) Test, which evaluates word recognition in multitalker babble at seven signal-to-noise ratios and uses the 50% correct point (in dB SNR) calculated with the Spearman-Kärber equation as the primary metric. The WIN was developed and validated in a series of 12 laboratory studies. The current study examined the effectiveness of the WIN materials for measuring the word-recognition performance of patients in a typical clinical setting. Purpose: To examine the relations among three audiometric measures including pure-tone thresholds, word-recognition performances in quiet, and word-recognition performances in multitalker babble for veterans seeking remediation for their hearing loss. Research Design: Retrospective, descriptive. Study Sample: The participants were 3430 veterans who for the most part were evaluated consecutively in the Audiology Clinic at the VA Medical Center, Mountain Home, Tennessee. The mean age was 62.3 yr (SD = 12.8 yr). Data Collection and Analysis: The data were collected in the course of a 60 min routine audiologic evaluation. A history, otoscopy, and aural-acoustic immittance measures also were included in the clinic protocol but were not evaluated in this report. Results: Overall, the 1000–8000 Hz thresholds were significantly lower (better) in the right ear (RE) than in the left ear (LE). There was a direct relation between age and the pure-tone thresholds, with greater change across age in the high frequencies than in the low frequencies. 
Notched audiograms at 4000 Hz were observed in at least one ear in 41% of the participants with more unilateral than bilateral notches. Normal pure-tone thresholds (≤20 dB HL) were obtained from 6% of the participants. Maximum performance on the Northwestern University Auditory Test No. 6 (NU-6) in quiet was ≥90% correct by 50% of the participants, with an additional 20% performing at ≥80% correct; the RE performed 1–3% better than the LE. Of the 3291 who completed the WIN on both ears, only 7% exhibited normal performance (50% correct point of ≤6 dB SNR). Overall, WIN performance was significantly better in the RE (mean = 13.3 dB SNR) than in the LE (mean = 13.8 dB SNR). Recognition performance on both the NU-6 and the WIN decreased as a function of both pure-tone hearing loss and age. There was a stronger relation between the high-frequency pure-tone average (1000, 2000, and 4000 Hz) and the WIN than between the pure-tone average (500, 1000, and 2000 Hz) and the WIN. Conclusions: The results on the WIN from both the previous laboratory studies and the current clinical study indicate that the WIN is an appropriate clinic instrument to assess word-recognition performance in background noise. Recognition performance on a speech-in-quiet task does not predict performance on a speech-in-noise task, as the two tasks reflect different domains of auditory function. Experience with the WIN indicates that word-in-noise tasks should be considered the “stress test” for auditory function.
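The WIN's primary metric, the 50% correct point in dB SNR, is computed with the Spearman-Kärber equation over equally spaced SNR steps. A minimal sketch using the WIN's 0-24 dB range in 4-dB steps; the proportions below are an illustrative listener, not study data:

```python
def spearman_karber_50(snrs, prop_correct):
    """50% correct point (dB SNR) via the Spearman-Karber equation.

    snrs: equally spaced SNRs in ascending order
    prop_correct: proportion of words correct at each SNR
    """
    step = snrs[1] - snrs[0]
    return snrs[-1] + step / 2.0 - step * sum(prop_correct)

snrs = [0, 4, 8, 12, 16, 20, 24]             # WIN presentation SNRs
props = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0, 1.0]  # proportion correct per SNR
t50 = spearman_karber_50(snrs, props)        # 10.0 dB SNR
```

A listener who gets every word correct at every SNR bottoms out at −2.0 dB SNR with this step size, so the ≤6 dB SNR cutoff for "normal" performance cited above sits well within the measurable range.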


1995, Vol 11 (3), pp. 165-175
Author(s): Linda Ferrier, Howard Shane, Holly Ballard, Tyler Carpenter, Anne Benoit

2021, Vol 3 (1), pp. 68-83
Author(s): Wiqas Ghai, Navdeep Singh

Punjabi is a tonal language of the Indo-Aryan family with speakers all around the world. It has gained acceptance in media and communication and therefore deserves a place in the growing field of automatic speech recognition, which has already been explored successfully for a number of other Indian and foreign languages. Some work has been done on isolated-word speech recognition for Punjabi, but only with whole-word acoustic models; a phone-based approach has yet to be applied to Punjabi speech recognition. This paper describes an automatic speech recognizer that recognizes isolated-word and connected-word speech using a triphone-based acoustic model on the HTK 3.4.1 speech engine and compares its performance with an ASR system based on whole-word acoustic models. Word recognition accuracy for isolated-word speech was 92.05% with the whole-word model and 97.14% with the triphone model, whereas word recognition accuracy for connected-word speech was 87.75% with the whole-word model and 91.62% with the triphone model.
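Word recognition accuracy figures like those above are conventionally computed (e.g., by HTK's HResults tool) as (N − D − S − I)/N, where D, S, and I are the deletions, substitutions, and insertions found by aligning the recognizer's output against the reference transcript. A minimal sketch of that metric via Levenshtein alignment; the sentences are illustrative:

```python
def word_accuracy(reference, hypothesis):
    """HTK-style accuracy: (N - D - S - I) / N via edit-distance alignment."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution or match
    return (len(ref) - dp[len(ref)][len(hyp)]) / len(ref)

acc = word_accuracy("open the door now", "open a door now")  # 0.75
```

Because insertions are penalized, accuracy can fall below zero on very noisy output, which is why connected-word accuracy (where word boundaries must be found) typically trails isolated-word accuracy, as in the 87.75%/91.62% versus 92.05%/97.14% results above.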


2019, Vol 28 (3S), pp. 742-755
Author(s): Annalise Fletcher, Megan McAuliffe, Sarah Kerr, Donal Sinex

Purpose This study aims to examine the combined influence of vocabulary knowledge and statistical properties of language on speech recognition in adverse listening conditions. Furthermore, it aims to determine whether any effects identified are more salient at particular levels of signal degradation. Method One hundred three young healthy listeners transcribed phrases presented at 4 different signal-to-noise ratios, which were coded for recognition accuracy. Participants also completed tests of hearing acuity, vocabulary knowledge, nonverbal intelligence, processing speed, and working memory. Results Vocabulary knowledge and working memory demonstrated independent effects on word recognition accuracy when controlling for hearing acuity, nonverbal intelligence, and processing speed. These effects were strongest at the same moderate level of signal degradation. Although listener variables were statistically significant, their effects were subtle in comparison to the influence of word frequency and phonological content. These language-based factors had large effects on word recognition at all signal-to-noise ratios. Discussion Language experience and working memory may have complementary effects on accurate word recognition. However, adequate glimpses of acoustic information appear necessary for speakers to leverage vocabulary knowledge when processing speech in adverse conditions.

