Experimental results for baseline speech recognition performance using input acquired from a linear microphone array

Author(s):  
Harvey F. Silverman ◽  
Stuart E. Kirtman ◽  
John E. Adcock ◽  
Paul C. Meuse
2012 ◽  
Vol 239-240 ◽  
pp. 1100-1103 ◽  
Author(s):  
Jing Yun ◽  
Zhi Qiang Ma ◽  
Yi La Su ◽  
Xiu Lan Xie

Triphone DDBHMM (Duration Distribution Based HMM) is presented as the acoustic model for Mongolian continuous speech recognition, and the Mongolian acoustic model is optimized by state tying. The experiment compared the triphone DDBHMM, the diphone DDBHMM, and the triphone HMM on the HTK platform and analyzed their effects on acoustic-layer accuracy. The experimental results show that the triphone DDBHMM significantly improves the recognition performance of Mongolian continuous speech recognition.
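The core idea of DDBHMM can be illustrated compactly: a standard HMM's self-loop implies a geometric state-duration distribution, whereas a DDBHMM scores durations against an explicit distribution estimated from training data. The sketch below is a minimal illustration in Python, not the paper's implementation; the Gaussian duration model and all parameter values are assumptions.

```python
# Minimal sketch (not the authors' implementation): the core idea of a
# duration-distribution-based HMM (DDBHMM) is to replace the implicit
# geometric state-duration model of a standard HMM with an explicit
# per-state duration distribution. All parameters here are illustrative.
import numpy as np
from scipy.stats import norm

def geometric_duration_logprob(d, self_loop_p):
    # Standard HMM: staying d frames in a state has probability
    # p^(d-1) * (1 - p), i.e. a geometric distribution.
    return (d - 1) * np.log(self_loop_p) + np.log(1.0 - self_loop_p)

def explicit_duration_logprob(d, mean_dur, std_dur):
    # DDBHMM-style: score the duration against an explicit (here
    # Gaussian) duration distribution estimated from training data.
    return norm.logpdf(d, loc=mean_dur, scale=std_dur)

# Compare how the two models score plausible vs. implausible durations
# for a state whose training durations average 8 frames.
for d in (2, 8, 30):
    print(d,
          geometric_duration_logprob(d, self_loop_p=0.875),
          explicit_duration_logprob(d, mean_dur=8.0, std_dur=2.0))
```

For a state whose typical duration is 8 frames, the explicit model penalizes a 30-frame stay far more sharply than the geometric model does, which is the behavior that motivates DDBHMM for continuous speech.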


2020 ◽  
Vol 24 ◽  
pp. 233121652098029
Author(s):  
Allison Trine ◽  
Brian B. Monson

Several studies have demonstrated that extended high frequencies (EHFs; >8 kHz) in speech are not only audible but also have some utility for speech recognition, including for speech-in-speech recognition when maskers are facing away from the listener. However, the relative contributions of EHF spectral and temporal information to speech recognition are unknown. Here, we show that access to EHF temporal information improved speech-in-speech recognition relative to speech bandlimited at 8 kHz, and that access to EHF spectral detail provided a further small but significant benefit. Results suggest that both EHF spectral structure and the EHF temporal envelope contribute to the observed EHF benefit. Speech recognition performance was quite sensitive to masker head orientation, with a rotation of only 15° providing a highly significant benefit. An exploratory analysis indicated that pure-tone thresholds at EHFs are better predictors of speech recognition performance than low-frequency pure-tone thresholds.
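Although the study's exact stimulus processing is not given here, one common way to preserve EHF temporal-envelope information while discarding EHF spectral detail is vocoder-style processing: keep speech below 8 kHz intact and replace the EHF band with noise modulated by that band's temporal envelope. The following Python sketch illustrates the idea; the sampling rate, filter orders, and envelope-extraction method are all assumptions, not the authors' protocol.

```python
# Illustrative sketch (not the study's exact signal processing) of an
# "EHF temporal envelope only" condition: speech lowpassed at 8 kHz plus
# noise carrying the extended-high-frequency (EHF, >8 kHz) envelope.
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

FS = 44_100          # assumed sampling rate, Hz
CUTOFF = 8_000       # EHF boundary used in the abstract, Hz

def split_bands(x):
    lo = sosfiltfilt(butter(8, CUTOFF, 'low', fs=FS, output='sos'), x)
    hi = sosfiltfilt(butter(8, CUTOFF, 'high', fs=FS, output='sos'), x)
    return lo, hi

def ehf_envelope_only(x, rng=np.random.default_rng(0)):
    lo, hi = split_bands(x)
    env = np.abs(hilbert(hi))                 # EHF temporal envelope
    carrier = rng.standard_normal(len(x))     # spectrally flat noise
    carrier = sosfiltfilt(butter(8, CUTOFF, 'high', fs=FS, output='sos'),
                          carrier)
    carrier *= env                            # impose the EHF envelope
    # match the RMS of the original EHF band
    carrier *= np.sqrt(np.mean(hi**2) / (np.mean(carrier**2) + 1e-12))
    return lo + carrier

# demo on a synthetic signal with low- and high-frequency components
t = np.arange(FS) / FS
speech_like = np.sin(2 * np.pi * 500 * t) + 0.1 * np.sin(2 * np.pi * 10_000 * t)
processed = ehf_envelope_only(speech_like)
```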


2012 ◽  
Vol 23 (08) ◽  
pp. 577-589 ◽  
Author(s):  
Mary Rudner ◽  
Thomas Lunner ◽  
Thomas Behrens ◽  
Elisabet Sundewall Thorén ◽  
Jerker Rönnberg

Background: Recently there has been interest in using subjective ratings as a measure of perceived effort during speech recognition in noise. Perceived effort may be an indicator of cognitive load. Thus, subjective effort ratings during speech recognition in noise may covary both with signal-to-noise ratio (SNR) and with individual cognitive capacity.
Purpose: The present study investigated the relation between subjective ratings of the effort involved in listening to speech in noise, speech recognition performance, and individual working memory (WM) capacity in hearing-impaired hearing aid users.
Research Design: In two experiments, participants with hearing loss rated perceived effort during aided speech perception in noise. Noise type and SNR were manipulated in both experiments, and in the second experiment hearing aid compression release settings were also manipulated. Speech recognition performance was measured along with WM capacity.
Study Sample: There were 46 participants in all, with bilateral mild to moderate sloping hearing loss. In Experiment 1 there were 16 native Danish speakers (eight women and eight men) with a mean age of 63.5 yr (SD = 12.1) and an average pure tone (PT) threshold of 47.6 dB (SD = 9.8). In Experiment 2 there were 30 native Swedish speakers (19 women and 11 men) with a mean age of 70 yr (SD = 7.8) and an average PT threshold of 45.8 dB (SD = 6.6).
Data Collection and Analysis: A visual analog scale (VAS) was used for effort rating in both experiments. In Experiment 1, effort was rated at individually adapted SNRs, while in Experiment 2 it was rated at fixed SNRs. Speech recognition in noise performance was measured using adaptive procedures in both experiments, with Dantale II sentences in Experiment 1 and Hagerman sentences in Experiment 2. WM capacity was measured using a letter-monitoring task in Experiment 1 and the reading span task in Experiment 2.
Results: In both experiments there was a strong and significant relation between rated effort and SNR that was independent of individual WM capacity, whereas the relation between rated effort and noise type appeared to be influenced by individual WM capacity. Experiment 2 showed that the hearing aid compression setting influenced rated effort.
Conclusions: Subjective ratings of the effort involved in speech recognition in noise reflect SNRs, and individual cognitive capacity appears to influence the relative rating of noise types.
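The adaptive speech-in-noise procedures mentioned above typically adjust the SNR trial by trial based on the listener's responses. As a rough illustration only: the Python sketch below assumes a 1-up/1-down rule with fixed 2 dB steps and a simulated listener, not the study's exact Dantale II or Hagerman protocol.

```python
# Minimal sketch of a simple adaptive (1-up/1-down) SNR tracking
# procedure of the general kind used with sentence tests in noise;
# step size, starting SNR, and scoring rule are illustrative assumptions.
import random

def adaptive_snr_track(present_sentence, n_trials=30,
                       start_snr=0.0, step_db=2.0):
    """present_sentence(snr_db) -> True if the sentence was repeated
    correctly. A 1-up/1-down rule converges toward the SNR yielding
    ~50% correct (the speech reception threshold, SRT)."""
    snr, history = start_snr, []
    for _ in range(n_trials):
        history.append(snr)
        # harder (lower SNR) after a correct response, easier after an error
        snr += -step_db if present_sentence(snr) else step_db
    return sum(history[-10:]) / 10   # SRT estimate: mean of last 10 SNRs

# simulated listener whose true SRT is -5 dB SNR
random.seed(1)
def fake_listener(snr_db, srt=-5.0, slope=0.5):
    p_correct = 1.0 / (1.0 + 10 ** (-slope * (snr_db - srt)))
    return random.random() < p_correct

print(adaptive_snr_track(fake_listener))
```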


2013 ◽  
Vol 333-335 ◽  
pp. 1106-1109
Author(s):  
Wei Wu

Palm vein pattern recognition is one of the newest biometric techniques researched today. This paper proposes projecting the palm vein image matrix directly using independent component analysis, then computing the Euclidean distance between projection matrices and classifying by the nearest distance. The experiments were carried out on a self-built palm vein database. Experimental results show that the independent component analysis algorithm is suitable for palm vein recognition and that its recognition performance is practical.
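The matching scheme the abstract describes reduces to an ICA projection followed by a nearest-neighbor decision under Euclidean distance. A minimal Python sketch follows, using scikit-learn's FastICA on simulated data; the image size, component count, and gallery layout are illustrative assumptions, not the paper's setup.

```python
# Minimal sketch of the described pipeline: project palm vein images
# with independent component analysis (ICA), then classify by nearest
# Euclidean distance in the projected space. Data here are simulated.
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
# stand-in for a gallery of flattened palm vein images (n_samples x n_pixels)
gallery = rng.random((40, 64 * 64))
labels = np.repeat(np.arange(10), 4)        # 10 subjects, 4 images each

ica = FastICA(n_components=20, random_state=0)
gallery_proj = ica.fit_transform(gallery)   # ICA projection of the gallery

def identify(probe_image):
    probe_proj = ica.transform(probe_image.reshape(1, -1))
    dists = np.linalg.norm(gallery_proj - probe_proj, axis=1)
    return labels[np.argmin(dists)]         # nearest-distance decision

# probe: a slightly perturbed copy of a gallery image
print(identify(gallery[7] + 0.01 * rng.random(64 * 64)))
```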


1981 ◽  
Vol 10 (4) ◽  
pp. 239-246 ◽  
Author(s):  
D. D. Dirks ◽  
C. A. Kamm ◽  
J. R. Dubno ◽  
T. M. Velde

Author(s):  
Jiahao Chen ◽  
Ryota Nishimura ◽  
Norihide Kitaoka

Many end-to-end, large vocabulary, continuous speech recognition systems are now able to achieve better speech recognition performance than conventional systems. However, most of these approaches are based on bidirectional networks and sequence-to-sequence modeling, so automatic speech recognition (ASR) systems using such techniques must wait for an entire segment of voice input before they can begin processing the data, resulting in a lengthy time lag that can be a serious drawback in some applications. An obvious solution to this problem is to develop a speech recognition algorithm capable of processing streaming data. Therefore, in this paper we explore the possibility of a streaming, online ASR system for Japanese using a model based on unidirectional LSTMs trained using the connectionist temporal classification (CTC) criterion, with local attention. Such an approach has not been well investigated for Japanese, as most Japanese-language ASR systems employ bidirectional networks. The best result for our proposed system during experimental evaluation was a character error rate of 9.87%.
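A minimal PyTorch sketch of this model family follows: unidirectional LSTM layers trained with the CTC criterion. The local-attention component is omitted, and the feature and vocabulary sizes are assumptions, not the paper's configuration.

```python
# Minimal sketch (assumed PyTorch, not the authors' code) of a
# streaming-friendly acoustic model: unidirectional LSTMs with CTC.
# Because the LSTM is unidirectional, each output depends only on past
# frames, so decoding can begin before the utterance ends.
import torch
import torch.nn as nn

class StreamingCTCModel(nn.Module):
    def __init__(self, n_feats=80, n_hidden=256, n_tokens=100):
        super().__init__()
        self.lstm = nn.LSTM(n_feats, n_hidden, num_layers=2,
                            batch_first=True, bidirectional=False)
        self.proj = nn.Linear(n_hidden, n_tokens + 1)   # +1 for CTC blank

    def forward(self, feats, state=None):
        out, state = self.lstm(feats, state)            # state enables streaming
        return self.proj(out).log_softmax(dim=-1), state

model = StreamingCTCModel()
ctc_loss = nn.CTCLoss(blank=100)                        # blank index = n_tokens

# one training step on dummy data: 4 utterances, 120 frames, 80-dim feats
feats = torch.randn(4, 120, 80)
targets = torch.randint(0, 100, (4, 30))
log_probs, _ = model(feats)
loss = ctc_loss(log_probs.transpose(0, 1),              # CTCLoss wants (T, N, C)
                targets,
                input_lengths=torch.full((4,), 120, dtype=torch.long),
                target_lengths=torch.full((4,), 30, dtype=torch.long))
loss.backward()
```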


2008 ◽  
Vol 19 (02) ◽  
pp. 120-134 ◽  
Author(s):  
Kate Gfeller ◽  
Jacob Oleson ◽  
John F. Knutson ◽  
Patrick Breheny ◽  
Virginia Driscoll ◽  
...  

The research examined whether performance by adult cochlear implant recipients on a variety of recognition and appraisal tests derived from real-world music could be predicted from technological, demographic, and life experience variables, as well as speech recognition scores. A representative sample of 209 adults implanted between 1985 and 2006 participated. Using multiple linear regression models and generalized linear mixed models, sets of optimal predictor variables were selected that effectively predicted performance on a test battery that assessed different aspects of music listening. These analyses established the importance of distinguishing between the accuracy of music perception and the appraisal of musical stimuli when using music listening as an index of implant success. Importantly, neither device type nor processing strategy predicted music perception or music appraisal. Speech recognition performance was not a strong predictor of music perception, and primarily predicted music perception when the test stimuli included lyrics. Additionally, limitations in the utility of speech perception in predicting musical perception and appraisal underscore the utility of music perception as an alternative outcome measure for evaluating implant outcomes. Music listening background, residual hearing (i.e., hearing aid use), cognitive factors, and some demographic factors predicted several indices of perceptual accuracy or appraisal of music.
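The first modeling step the abstract names, multiple linear regression over candidate predictors, can be sketched as follows. The data below are simulated placeholders in Python with scikit-learn; the predictor set is loosely inspired by the variables listed above, and the generalized linear mixed models are omitted.

```python
# Minimal sketch of multiple linear regression over candidate predictors
# of a music perception score; all data here are simulated, not the
# study's dataset.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 209                                  # sample size from the abstract
# candidate predictors: age, music listening background,
# hearing-aid use (0/1), and speech recognition score (all assumed)
X = np.column_stack([
    rng.normal(60, 10, n),               # age, years
    rng.normal(20, 8, n),                # music listening background
    rng.integers(0, 2, n),               # residual hearing / HA use
    rng.normal(70, 15, n),               # speech recognition score, %
])
# simulated music perception outcome with some dependence on predictors
y = 0.2 * X[:, 1] + 5 * X[:, 2] + 0.1 * X[:, 3] + rng.normal(0, 5, n)

model = LinearRegression().fit(X, y)
print("R^2:", model.score(X, y))
print("coefficients:", model.coef_)
```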

