The Effects of Recognition Accuracy and Vocabulary Size of a Speech Recognition System on Task Performance and User Acceptance

1988 ◽  
Vol 32 (4) ◽  
pp. 232-236 ◽  
Author(s):  
Sherry P. Casali ◽  
Robert D. Dryden ◽  
Beverly H. Williges

The purpose of the present study was to determine the effects of recognizer accuracy and vocabulary size on the system performance of a speech recognition system. Subjects, ranging in age from 20 to 55 years, performed a data entry task using a simulated speech recognizer that provided three accuracy levels and three levels of available vocabulary. Task completion times and subjective measures of acceptability were recorded. Results indicated that the accuracy level at which the recognizer was performing significantly influenced the task completion time and the user's acceptability ratings. Vocabulary size also significantly affected task completion time; however, its effect on the acceptability ratings was negligible. Older subjects in general required longer times to complete the tasks; however, they consistently rated the speech input systems more favorably than the younger subjects did.

2019 ◽  
Vol 9 (10) ◽  
pp. 2166 ◽  
Author(s):  
Mohamed Tamazin ◽  
Ahmed Gouda ◽  
Mohamed Khedr

Many new consumer applications are based on the use of automatic speech recognition (ASR) systems, such as voice command interfaces, speech-to-text applications, and data entry processes. Although ASR systems have improved remarkably in recent decades, speech recognition performance still degrades significantly in noisy environments. Developing a robust ASR system that can work in real-world noise and other acoustically distorting conditions is an attractive research topic. Many advanced algorithms have been developed in the literature to deal with this problem; most of these algorithms are based on modeling the behavior of the human auditory system with perceived noisy speech. In this research, the power-normalized cepstral coefficient (PNCC) system is modified to increase robustness against different types of environmental noise, where a new technique based on gammatone channel filtering combined with channel bias minimization is used to suppress the noise effects. The TIDIGITS database is utilized to evaluate the performance of the proposed system in comparison to state-of-the-art techniques in the presence of additive white Gaussian noise (AWGN) and seven different types of environmental noise. In this research, one word is recognized from a set containing only 11 possibilities. The experimental results showed that the proposed method provides significant improvements in recognition accuracy at low signal-to-noise ratios (SNRs). In the case of subway noise at SNR = 5 dB, the proposed method outperforms the mel-frequency cepstral coefficient (MFCC) and relative spectral (RASTA)–perceptual linear predictive (PLP) methods by 55% and 47%, respectively. Moreover, the recognition rate of the proposed method is higher than that of the gammatone frequency cepstral coefficient (GFCC) and PNCC methods in the case of car noise: it is enhanced by 40% in comparison to the GFCC method at SNR = 0 dB, and by 20% in comparison to the PNCC method at SNR = −5 dB.
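
As an illustration of the evaluation condition described above, the following is a minimal sketch of how additive white Gaussian noise can be mixed into a clean signal at a target SNR in dB. It is not the authors' code: the function names `add_awgn` and `measured_snr_db`, the 440 Hz test tone, and the 16 kHz sample rate are assumptions chosen for the example. The noise power is set from the definition SNR = 10 log10(P_signal / P_noise).

```python
import math
import random


def add_awgn(signal, snr_db, seed=None):
    """Return the signal plus white Gaussian noise scaled to the target SNR (dB).

    Hypothetical helper for illustration; not taken from the paper.
    """
    rng = random.Random(seed)
    # Average power of the clean signal.
    signal_power = sum(x * x for x in signal) / len(signal)
    # SNR_dB = 10 * log10(P_signal / P_noise)  =>  P_noise = P_signal / 10^(SNR_dB / 10)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]


def measured_snr_db(clean, noisy):
    """Estimate the SNR in dB from a clean signal and its noisy version."""
    signal_power = sum(c * c for c in clean) / len(clean)
    noise_power = sum((n - c) ** 2 for c, n in zip(clean, noisy)) / len(clean)
    return 10 * math.log10(signal_power / noise_power)


# Usage: one second of a 440 Hz tone at an assumed 16 kHz sample rate,
# corrupted at SNR = 5 dB (one of the conditions quoted in the abstract).
clean = [math.sin(2 * math.pi * 440 * n / 16000) for n in range(16000)]
noisy = add_awgn(clean, snr_db=5.0, seed=0)
print(round(measured_snr_db(clean, noisy), 2))
```

The measured value only approximates the target because the noise is a finite random sample; longer signals give a closer match.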


Author(s):  
M. Petroni ◽  
C. Collet ◽  
N. Fumai ◽  
K. Roger ◽  
C. Yien ◽  
...  

An automatic speech recognition system is being developed for a patient data management system (PDMS) for the pediatric intensive care unit (ICU) at the Montreal Children’s Hospital. Here, fourteen bedside monitors are linked by a local area network to a personal computer for real-time acquisition of vital sign data and the graphical display of trends. The PDMS also allows for the manual input of data, such as fluid balance data, by means of a keyboard and a pointing device. This paper presents a description of the multimodal human-computer interface of the bedside data entry system, focusing on the speech recognition and generation sub-systems and their integration in the OS/2 Presentation Manager environment.


2020 ◽  
Vol 25 (3) ◽  
pp. 93-98
Author(s):  
Kyu-Seok Kim

Real-time voice translation systems receive a speaker's voice and translate their speech into another language. However, the meaning of a whole Korean sentence can be unintentionally changed because Korean words and syllables can be merged or divided by spaces. Therefore, the spaces between the speaker's sentences are occasionally not identified by the speech recognition system, so the translated sentences are sometimes incorrect. This paper presents a methodology to enhance the accuracy of voice translation by adding intentional spaces. An Android application was implemented using Google speech recognizer for Android and Google translator for the Web. The Google speech recognizer app for Android receives the speaker's voice sentences in Korean and shows the text results. Next, the proposed Android application adds spaces when the speaker speaks the dedicated word for the space. Finally, the modified Korean sentences are translated into English by Google translator for the Web. Using this method can enhance interpretation accuracy for translation systems.
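
The space-insertion step described above can be sketched as a small text post-processing function. This is an illustrative assumption, not the paper's implementation: the abstract does not name the dedicated word, so `SPACE_MARKER` below is a hypothetical placeholder, and the function name `insert_intentional_spaces` is invented for the example. The sketch replaces each spoken marker in the recognizer's text output with a single space before the text would be passed to a translator.

```python
import re

# Hypothetical dedicated word the speaker says where a space is intended.
# The abstract does not specify the actual word; this is a placeholder.
SPACE_MARKER = "띄어쓰기"


def insert_intentional_spaces(recognized_text, marker=SPACE_MARKER):
    """Replace each run of the spoken marker with a single space.

    Whitespace the recognizer placed around the marker is absorbed so
    that exactly one space remains at each marked position.
    """
    pattern = r"(?:\s*" + re.escape(marker) + r"\s*)+"
    return re.sub(pattern, " ", recognized_text).strip()


# Usage: the same syllable sequence is segmented where the speaker
# said the marker, which fixes the intended word boundaries.
print(insert_intentional_spaces("아버지가띄어쓰기방에띄어쓰기들어가신다"))
```

Spaces that the recognizer inserted elsewhere are left untouched in this sketch; whether the original application keeps or discards them is not stated in the abstract.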


Author(s):  
Taylor Shupsky ◽  
Adriana Lyman ◽  
Jibo He ◽  
Maryam Zahabi

Objective: The objective of this study was to assess police officers’ performance and workload in using two mobile computer terminal (MCT) configurations under operational and tactical driving conditions. Background: Crash reports have identified in-vehicle distraction to be a major cause of law enforcement vehicle crashes. The MCT has been found to be the most frequently used in-vehicle technology and the main source of police in-vehicle distraction. Method: Twenty police officers participated in a driving simulator-based assessment of driving behavior, task completion time, and perceived workload with two MCT configurations under operational and tactical levels of driving. Results: The findings revealed that using the MCT configuration with speech-based data entry and head-up display location while driving improved driving performance, decreased task completion time, and reduced police officers’ workload as compared to the current MCT configuration used by police departments. Officers had better driving performance but worse secondary task performance under the operational driving condition as compared to the tactical driving condition. Conclusion: This study provided empirical support for the use of an enhanced MCT configuration in police vehicles to improve police officers’ safety and performance. In addition, the findings emphasize the need for more training to improve officers’ tactical driving skills and multitasking behavior. Application: The findings provide guidelines for vehicle manufacturers, MCT developers, and police agencies to improve the design and implementation of MCTs in police vehicles considering input modality and display eccentricity, which are expected to increase officer and civilian safety.


Author(s):  
J.M. KOO ◽  
H.S. KIM ◽  
C.K. UN

In this paper, we introduce a Korean large vocabulary speech recognition system. This system recognizes sentence utterances with a vocabulary size of 1160 words, and is designed for an automatic telephone number query service. The system consists of four subsystems. The first is an acoustic processor that recognizes words in an input sentence using a Hidden Markov Model (HMM) based speech recognition algorithm. The second subsystem is a linguistic processor which estimates input sentences from the results of the acoustic processor and determines the following words using syntactic information. The third is a time reduction processor that reduces recognition time by limiting the number of candidate words to be computed by the acoustic processor. The time reduction processor uses linguistic information and acoustic information contained in the input sentence. The last subsystem is a speaker adaptation processor which quickly adapts parameters of the speech recognition system to new speakers. This subsystem uses VQ adaptation and HMM parameter adaptation based on spectral mapping. We also present our recent work on improving the performance of the large vocabulary speech recognition system. This work focused on the enhancement of the acoustic processor and the time reduction processor for speaker-independent speech recognition. A new approach for speaker adaptation is also described.

