Differences between acoustic characteristics of spontaneous and read speech and their effects on speech recognition performance

Several studies have demonstrated that extended high frequencies (EHFs; >8 kHz) in speech are not only audible but also have some utility for speech recognition, including for speech-in-speech recognition when maskers are facing away from the listener. However, the contribution of EHF spectral versus temporal information to speech recognition is unknown. Here, we show that access to EHF temporal information improved speech-in-speech recognition relative to speech bandlimited at 8 kHz but that additional access to EHF spectral detail provided an additional small but significant benefit. Results suggest that both EHF spectral structure and the temporal envelope contribute to the observed EHF benefit. Speech recognition performance was quite sensitive to masker head orientation, with a rotation of only 15° providing a highly significant benefit. An exploratory analysis indicated that pure-tone thresholds at EHFs are better predictors of speech recognition performance than low-frequency pure-tone thresholds.


Working Memory Capacity May Influence Perceived Effort during Aided Speech Recognition in Noise

Journal of the American Academy of Audiology ◽

10.3766/jaaa.23.7.7 ◽

2012 ◽

Vol 23 (08) ◽

pp. 577-589 ◽

Cited By ~ 73

Author(s):

Mary Rudner ◽

Thomas Lunner ◽

Thomas Behrens ◽

Elisabet Sundewall Thorén ◽

Jerker Rönnberg

Keyword(s):

Working Memory ◽

Hearing Loss ◽

Speech Recognition ◽

Hearing Aid ◽

Recognition Performance ◽

Cognitive Capacity ◽

Subjective Ratings ◽

Perceived Effort ◽

Noise Type ◽

Speech Recognition In Noise

Background: Recently there has been interest in using subjective ratings as a measure of perceived effort during speech recognition in noise. Perceived effort may be an indicator of cognitive load. Thus, subjective effort ratings during speech recognition in noise may covary both with signal-to-noise ratio (SNR) and individual cognitive capacity. Purpose: The present study investigated the relation between subjective ratings of the effort involved in listening to speech in noise, speech recognition performance, and individual working memory (WM) capacity in hearing-impaired hearing aid users. Research Design: In two experiments, participants with hearing loss rated perceived effort during aided speech perception in noise. Noise type and SNR were manipulated in both experiments, and in the second experiment hearing aid compression release settings were also manipulated. Speech recognition performance was measured along with WM capacity. Study Sample: There were 46 participants in all with bilateral mild to moderate sloping hearing loss. In Experiment 1 there were 16 native Danish speakers (eight women and eight men) with a mean age of 63.5 yr (SD = 12.1) and average pure tone (PT) threshold of 47.6 dB (SD = 9.8). In Experiment 2 there were 30 native Swedish speakers (19 women and 11 men) with a mean age of 70 yr (SD = 7.8) and average PT threshold of 45.8 dB (SD = 6.6). Data Collection and Analysis: A visual analog scale (VAS) was used for effort rating in both experiments. In Experiment 1, effort was rated at individually adapted SNRs while in Experiment 2 it was rated at fixed SNRs. Speech recognition in noise performance was measured using adaptive procedures in both experiments with Dantale II sentences in Experiment 1 and Hagerman sentences in Experiment 2. WM capacity was measured using a letter-monitoring task in Experiment 1 and the reading span task in Experiment 2.
Results: In both experiments, there was a strong and significant relation between rated effort and SNR that was independent of individual WM capacity, whereas the relation between rated effort and noise type seemed to be influenced by individual WM capacity. Experiment 2 showed that hearing aid compression setting influenced rated effort. Conclusions: Subjective ratings of the effort involved in speech recognition in noise reflect SNRs, and individual cognitive capacity seems to influence relative rating of noise type.


Speech Recognition Performance at Loudness Discomfort Level

Scandinavian Audiology ◽

10.3109/01050398109076187 ◽

1981 ◽

Vol 10 (4) ◽

pp. 239-246 ◽

Cited By ~ 9

Author(s):

D. D. Dirks ◽

C. A. Kamm ◽

J. R. Dubno ◽

T. M. Velde

Keyword(s):

Speech Recognition ◽

Recognition Performance


End-to-end recognition of streaming Japanese speech using CTC and local attention

APSIPA Transactions on Signal and Information Processing ◽

10.1017/atsip.2020.23 ◽

2020 ◽

Vol 9 ◽

Author(s):

Jiahao Chen ◽

Ryota Nishimura ◽

Norihide Kitaoka

Keyword(s):

Speech Recognition ◽

Recognition Performance ◽

Time Lag ◽

Recognition Algorithm ◽

Streaming Data ◽

Continuous Speech Recognition ◽

Voice Input ◽

Sequence Modeling ◽

End To End ◽

Bidirectional Networks

Many end-to-end, large vocabulary, continuous speech recognition systems are now able to achieve better speech recognition performance than conventional systems. Most of these approaches, however, are based on bidirectional networks and sequence-to-sequence modeling, so automatic speech recognition (ASR) systems using such techniques must wait for an entire segment of voice input to be entered before they can begin processing the data. The resulting time lag can be a serious drawback in some applications. An obvious solution to this problem is to develop a speech recognition algorithm capable of processing streaming data. Therefore, in this paper we explore the possibility of a streaming, online ASR system for Japanese using a model based on unidirectional LSTMs trained using connectionist temporal classification (CTC) criteria, with local attention. Such an approach has not been well investigated for use with Japanese, as most Japanese-language ASR systems employ bidirectional networks. The best result for our proposed system during experimental evaluation was a character error rate of 9.87%.
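The abstract reports a character error rate but does not say how it was computed. A common definition, assumed here, is the character-level edit distance (substitutions, insertions, and deletions) between the reference and the hypothesis, divided by the reference length. The sketch below illustrates that definition only; the function names and the sample strings are hypothetical and are not taken from the paper.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    # prev[j] holds the distance between the first i-1 reference characters
    # and the first j hypothesis characters.
    prev = list(range(len(hyp) + 1))
    for i, ref_char in enumerate(ref, start=1):
        curr = [i]  # turning i reference characters into "" costs i deletions
        for j, hyp_char in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_char == hyp_char else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    """Edit distance divided by reference length (assumed CER definition)."""
    if not ref:
        raise ValueError("reference transcript must not be empty")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # Hypothetical example: one substituted character in a four-character reference.
    reference = "音声認識"
    hypothesis = "音声認織"
    print(f"CER = {character_error_rate(reference, hypothesis):.2%}")
```

Because Japanese text has no spaces between words, scoring at the character level avoids depending on a word segmenter, which is one reason character error rate is often preferred over word error rate for Japanese ASR.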


Multivariate Predictors of Music Perception and Appraisal by Adult Cochlear Implant Users

Journal of the American Academy of Audiology ◽

10.3766/jaaa.19.2.3 ◽

2008 ◽

Vol 19 (02) ◽

pp. 120-134 ◽

Cited By ~ 63

Author(s):

Kate Gfeller ◽

Jacob Oleson ◽

John F. Knutson ◽

Patrick Breheny ◽

Virginia Driscoll ◽

...

Keyword(s):

Speech Recognition ◽

Cochlear Implant ◽

Music Perception ◽

Life Experience ◽

Recognition Performance ◽

Strong Predictor ◽

Music Listening ◽

Linear Regression Models ◽

Hearing Aid Use ◽

Music Appraisal

The research examined whether performance by adult cochlear implant recipients on a variety of recognition and appraisal tests derived from real-world music could be predicted from technological, demographic, and life experience variables, as well as speech recognition scores. A representative sample of 209 adults implanted between 1985 and 2006 participated. Using multiple linear regression models and generalized linear mixed models, sets of optimal predictor variables were selected that effectively predicted performance on a test battery that assessed different aspects of music listening. These analyses established the importance of distinguishing between the accuracy of music perception and the appraisal of musical stimuli when using music listening as an index of implant success. Importantly, neither device type nor processing strategy predicted music perception or music appraisal. Speech recognition performance was not a strong predictor of music perception, and primarily predicted music perception when the test stimuli included lyrics. Additionally, limitations in the utility of speech perception in predicting musical perception and appraisal underscore the utility of music perception as an alternative outcome measure for evaluating implant outcomes. Music listening background, residual hearing (i.e., hearing aid use), cognitive factors, and some demographic factors predicted several indices of perceptual accuracy or appraisal of music.


Speech recognition performance on a voicemail transcription task

Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181) ◽

10.1109/icassp.1998.675414 ◽

2002 ◽

Cited By ~ 4

Author(s):

M. Padmanabhan ◽

E. Eide ◽

B. Ramabhadran ◽

G. Ramaswamy ◽

L.R. Bahl

Keyword(s):

Speech Recognition ◽

Recognition Performance


An Analysis of Speech Recognition Performance Based Upon Network Layers and Transfer Functions

International Journal of Computer Science Engineering and Applications ◽

10.5121/ijcsea.2011.1302 ◽

2011 ◽

Vol 1 (3) ◽

pp. 11-20 ◽

Cited By ~ 1

Author(s):

Kuldeep Kumar ◽

R. K Aggarwal ◽

Ankita Jain

Keyword(s):

Speech Recognition ◽

Transfer Functions ◽

Recognition Performance ◽

Network Layers


Comparison of Speech Recognition Performance Between Kaldi and Google Cloud Speech API

Recent Advances in Intelligent Information Hiding and Multimedia Signal Processing - Smart Innovation, Systems and Technologies ◽

10.1007/978-3-030-03748-2_13 ◽

2018 ◽

pp. 109-115

Author(s):

Takashi Kimura ◽

Takashi Nose ◽

Shinji Hirooka ◽

Yuya Chiba ◽

Akinori Ito

Keyword(s):

Speech Recognition ◽

Recognition Performance


Speech Recognition Performance under Noisy Conditions of Children with Hearing Loss

Clinical and Experimental Otorhinolaryngology ◽

10.3342/ceo.2012.5.s1.s73 ◽

2012 ◽

Vol 5 (Suppl 1) ◽

pp. S73 ◽

Cited By ~ 5

Author(s):

Hui-Mei Yang ◽

Yi-Jung Hsieh ◽

Jiunn-Liang Wu

Keyword(s):

Hearing Loss ◽

Speech Recognition ◽

Recognition Performance ◽

Noisy Conditions ◽

Children With Hearing Loss


Cochlear Synaptopathy: A Primary Factor Affecting Speech Recognition Performance in Presbycusis

BioMed Research International ◽

10.1155/2021/6667531 ◽

2021 ◽

Vol 2021 ◽

pp. 1-7

Author(s):

Zhe Chen ◽

Yanmei Zhang ◽

Junbo Zhang ◽

Rui Zhou ◽

Zhen Zhong ◽

...

Keyword(s):

Speech Recognition ◽

Action Potential ◽

Animal Studies ◽

Signal To Noise Ratio ◽

Recognition Performance ◽

Auditory Pathway ◽

Auditory Brainstem ◽

Auditory Brainstem Responses ◽

Hearing Ability ◽

Summating Potential

The results of recent animal studies have suggested that cochlear synaptopathy may be an important factor involved in presbycusis. Therefore, here, we aimed to examine whether cochlear synaptopathy frequently exists in patients with presbycusis and to describe the effect of cochlear synaptopathy on speech recognition in noise. Based on the medical history and an audiological examination, 94 elderly patients with bilateral, symmetrical, sensorineural hearing loss were diagnosed with presbycusis. An electrocochleogram, auditory brainstem responses, auditory cortical evoked potentials, and speech audiometry were recorded to assess the function of the auditory pathway. First, 65 ears with hearing levels of 41-50 dB HL were grouped based on the summating potential/action potential (SP/AP) ratio, and the amplitudes of AP and SP were compared between the two resulting groups. Second, 188 ears were divided into two groups: the normal SP/AP and abnormal SP/AP groups. The speech recognition abilities in the two groups were compared. Finally, the relationship between abnormal electrocochleogram and poor speech recognition (signal-to-noise ratio loss ≥7 dB) was analyzed in 188 ears. The results of the present study showed: (1) a remarkable reduction in the action potential amplitude was observed in patients with abnormal SP/AP ratios; this suggests that cochlear synaptopathy was involved in presbycusis. (2) There was a large proportion of patients with poor speech recognition in the abnormal SP/AP group. Furthermore, a larger number of cases with abnormal SP/AP ratios were confirmed among patients with presbycusis and poor speech recognition. We concluded that cochlear synaptopathy is not uncommon among elderly individuals who have hearing ability deficits, and it may have a more pronounced effect on ears with declining auditory performance in noisy environments.
