Effects of recognition accuracy and vocabulary size of a speech recognition system on task performance and user acceptance

Many studies have indicated that stress and workload can affect the recognition accuracy of speech recognition systems. Contributing factors include noise, vibration, G-force, information overload, vocal quality in noise, vocal quality and psychological stress, concurrent task performance and vocal fatigue. Commercially available speech recognition systems are not yet able to recognize natural human speech perfectly. The military application of automatic speech recognition systems has been studied widely. Verbex’ Voice Master was recommended in its instruction book as especially well suited for use in noisy environments. This system was selected as a candidate system for use in cockpits. Before it is implemented in the cockpit, its strengths and weaknesses for special utterances need to be tested in a laboratory environment. The purpose of the study was to investigate the effects of noise on recognition accuracy in dual-task performance. The experiment was carried out in a noise-insulated room. The Verbex’ Voice Master speech recognition system was installed on the computer. Eleven male Swedish students served as subjects. Two noise levels were set up in combination with mental workload and physical workload. The results showed that without noise and mental workload, the recognition accuracy could be as good as 99.4%. With noise and mental workload, the recognition accuracy could be reduced to 95%. The results indicated that noise had significant effects on computer error, while mental workload had significant effects on both subject error and computer error.

A Research on HMM based Speech Recognition in Spoken English

Recent Advances in Electrical & Electronic Engineering (Formerly Recent Patents on Electrical & Electronic Engineering) ◽

10.2174/2352096514666210413122517 ◽

2021 ◽

Vol 14 ◽

Author(s):

Na Wang ◽

Xiaohong Zhang ◽

Ashutosh Sharma

Keyword(s):

Speech Recognition ◽

Recognition Accuracy ◽

Recognition System ◽

Learning System ◽

Computer Assisted ◽

Speech Signals ◽

English Learning ◽

Speech Recognition System ◽

Spoken English ◽

High Recognition Accuracy

Computer-assisted speech recognition systems, which use sound digitization to recognize and understand spoken words, are used extensively in education, scientific research, industry, etc. This article presents the technological perspective of automated speech recognition in order to realize a spoken English speech recognition system based on MATLAB. A speech recognition technology has been designed and implemented in this work which can collect the speech signals of the spoken English learning system and then filter those speech signals. This paper mainly adopts a preprocessing module that processes the collected raw speech data using MATLAB commands. The method of feature extraction is based on the HMM model, codebook generation and template training. The research results show that a recognition accuracy of 98% is achieved by the spoken English speech recognition system studied in this paper. It can be seen that the spoken English speech recognition system based on MATLAB has high recognition accuracy and fast speed. This work addresses current research issues that need to be tackled in the speech recognition field. This approach is able to provide the technical support and interface for the spoken English learning system.

Simplified neural network architectures for a hybrid speech recognition system with small vocabulary size

Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181) ◽

10.1109/icassp.1998.675464 ◽

2002 ◽

Author(s):

H. Sedarat ◽

R. Khadem ◽

H. Franco

Keyword(s):

Neural Network ◽

Speech Recognition ◽

Recognition System ◽

Network Architectures ◽

Speech Recognition System ◽

Vocabulary Size ◽

Neural Network Architectures

Continuous Speech Recognition System for Kannada Language with Triphone Modelling using HTK

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.c5394.098319 ◽

2019 ◽

Vol 8 (3) ◽

pp. 7827-7831

Keyword(s):

Speech Recognition ◽

Recognition Accuracy ◽

Gaussian Mixture ◽

Recognition System ◽

Experimental Result ◽

Speech Recognition System ◽

Continuous Speech Recognition ◽

Frequency Scale ◽

Context Dependent ◽

Mel Frequency Cepstral Coefficient

Kannada is a regional language of India spoken in Karnataka. This paper presents the development of a continuous Kannada speech recognition system with monophone modelling and triphone modelling using HTK. Mel Frequency Cepstral Coefficients (MFCC) are used for feature extraction; they exploit the cepstral and perceptual frequency scale, which leads to good recognition accuracy. A Hidden Markov Model is used as the classifier. In this paper, Gaussian mixture splitting is performed to capture the variations of the phones. The paper presents the performance of the continuous Kannada Automatic Speech Recognition (ASR) system with respect to 2, 4, 8, 16 and 32 Gaussian mixtures with monophone and context-dependent triphone modelling. The experimental results show that better recognition accuracy is achieved with context-dependent triphone modelling than with monophone modelling as the number of Gaussian mixtures is increased.
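
The perceptual frequency scale behind MFCC features can be illustrated with a small sketch. It uses the common mel-scale formula m = 2595 log10(1 + f/700); the function names, the 26-filter count and the 0 to 8000 Hz range are assumptions chosen for illustration and are not taken from the paper or from HTK.

```python
import math

def hz_to_mel(f_hz: float) -> float:
    """Convert a frequency in Hz to mels (common 2595/700 variant)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_band_edges(f_min: float, f_max: float, n_filters: int) -> list:
    """Return n_filters + 2 edge frequencies (Hz), equally spaced in mels.

    Consecutive triples of edges define the triangular filters of a
    mel filterbank; spacing widens with frequency on the Hz axis.
    """
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + i * step) for i in range(n_filters + 2)]

# Hypothetical configuration: 26 filters between 0 Hz and 8 kHz.
edges = mel_band_edges(0.0, 8000.0, 26)
```

The edges are dense at low frequencies and sparse at high frequencies, which mirrors the resolution of human hearing.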

Noise Speech Recognition Based on Compressive Sensing

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.268-270.82 ◽

2011 ◽

Vol 268-270 ◽

pp. 82-87

Author(s):

Zhi Peng Zhao ◽

Yi Gang Cen ◽

Xiao Fang Chen

Keyword(s):

Speech Recognition ◽

Word Recognition ◽

Compressive Sensing ◽

Recognition Accuracy ◽

Recognition Performance ◽

Recognition System ◽

Speech Recognition System ◽

Recognition Method ◽

Isolated Word ◽

Isolated Word Recognition

In this paper, we propose a new noisy speech recognition method based on compressive sensing theory. Through compressive sensing, our method greatly increases the noise robustness of the speech recognition system, which improves the recognition accuracy. In our experiments, the proposed method achieved better recognition performance than the traditional isolated word recognition method based on the DTW algorithm.
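
For orientation, the DTW baseline mentioned in the abstract can be sketched as follows. This is a minimal textbook version of dynamic time warping for isolated word recognition, not the authors' implementation and not their compressive sensing method; the function names and the toy one-dimensional "features" are assumptions for illustration.

```python
import math

def dtw_distance(a, b):
    """Dynamic time warping cost between two sequences of feature vectors.

    Each step may advance in a, in b, or in both; the local cost is the
    Euclidean distance between the aligned frames.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a advances
                                 cost[i][j - 1],      # b advances
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]

def recognize(utterance, templates):
    """Return the label of the stored template with the lowest DTW cost."""
    return min(templates, key=lambda w: dtw_distance(utterance, templates[w]))

# Toy templates with 1-D frames (real systems would use e.g. MFCC vectors).
templates = {
    "yes": [(0.0,), (1.0,), (2.0,)],
    "no": [(5.0,), (5.0,), (5.0,)],
}
label = recognize([(0.0,), (0.0,), (1.0,), (2.0,)], templates)
```

Because the warping path can repeat frames, an utterance spoken more slowly than its template still aligns at low cost.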

Keyword Recognition Based on MFCC

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.926-930.1729 ◽

2014 ◽

Vol 926-930 ◽

pp. 1729-1732

Author(s):

Sha Yang ◽

Tian Hu ◽

Yun Lu Zhang

Keyword(s):

Speech Recognition ◽

Recognition Accuracy ◽

Hidden Markov ◽

Recognition System ◽

Recognition Algorithm ◽

Speech Recognition System ◽

Continuous Speech Recognition ◽

Recognition Time ◽

State 1 ◽

Model Approach

After about 50 years of development, speech recognition technology has reached the point of large-vocabulary, speaker-independent continuous speech recognition systems. Taking the features of Chinese pronunciation into account, we study small-vocabulary, speaker-independent Chinese speech recognition based on a continuous Hidden Markov Model (CHMM) approach. Comparing the VQ/DTW, VQ/DHMM, CHMM state-1 and CHMM state-2 recognition algorithms, the results of our experiment show that: (1) the CHMM state-2 branch method performs very well in reducing the recognition time; and (2) the recognition accuracy is ultimately improved.

Using Vector Quantization Technique in Recognition of Gurbani Hymns (Japji Sahib): LBG Algorithm (VQ)

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.f8667.088619 ◽

2019 ◽

Vol 8 (6) ◽

pp. 4708-4713

Keyword(s):

Speech Recognition ◽

Vector Quantization ◽

Automatic Speech Recognition ◽

Recognition Accuracy ◽

Recognition System ◽

Background Music ◽

Speech Recognition System ◽

Continuous Mode ◽

Speech Corpus ◽

Lbg Algorithm

An improved variation of Automatic Speech Recognition (ASR) based on Vector Quantization (VQ) is presented. ASR systems have so far been introduced for different languages and different applications. In this paper, we present a speech recognition system to recognize the hymns (paath) of Gurbani (sentences of Japji Sahib) as a continuous mode of speech. For this, a speech corpus has been generated in which the entire paath has been recited by different speakers. The speech mode here can be taken as continuous speech accompanied by background music and different kinds of additional noise, which have been eliminated. The work has been done using the VQ approach to speech recognition and the LBG algorithm, which designs optimal codebooks for the recognition process. Experimental results are included which show that the recognition accuracy of the system was found to be 92.6% and 95.8% for different and same speakers with different and same sentences.

A KOREAN LARGE VOCABULARY SPEECH RECOGNITION SYSTEM FOR AUTOMATIC TELEPHONE NUMBER QUERY SERVICE

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s0218001494000103 ◽

1994 ◽

Vol 08 (01) ◽

pp. 215-232 ◽

Cited By ~ 1

Author(s):

J.M. KOO ◽

H.S. KIM ◽

C.K. UN

Keyword(s):

Speech Recognition ◽

Recognition System ◽

Recognition Algorithm ◽

Speech Recognition System ◽

Telephone Number ◽

Vocabulary Size ◽

Large Vocabulary ◽

Input Sentence ◽

Large Vocabulary Speech Recognition ◽

Time Reduction

In this paper, we introduce a Korean large vocabulary speech recognition system. This system recognizes sentence utterances with a vocabulary size of 1160 words, and is designed for an automatic telephone number query service. The system consists of four subsystems. The first is an acoustic processor recognizing words in an input sentence by a Hidden Markov Model (HMM) based speech recognition algorithm. The second subsystem is a linguistic processor which estimates input sentences from the results of the acoustic processor and determines the following words using syntactic information. The third is a time reduction processor reducing recognition time by limiting the number of candidate words to be computed by the acoustic processor. The time reduction processor uses linguistic information and acoustic information contained in the input sentence. The last subsystem is a speaker adaptation processor which quickly adapts parameters of the speech recognition system to new speakers. This subsystem uses VQ adaptation and HMM parameter adaptation based on spectral mapping. We also present our recent work on improving the performance of the large vocabulary speech recognition system. These works focused on the enhancement of the acoustic processor and the time reduction processor for speaker-independent speech recognition. A new approach for speaker adaptation is also described.

The Effects of Recognition Accuracy and Vocabulary Size of a Speech Recognition System on Task Performance and User Acceptance

Proceedings of the Human Factors Society Annual Meeting ◽

10.1177/154193128803200405 ◽

1988 ◽

Vol 32 (4) ◽

pp. 232-236 ◽

Cited By ~ 1

Author(s):

Sherry P. Casali ◽

Robert D. Dryden ◽

Beverly H. Williges

Keyword(s):

Speech Recognition ◽

Completion Time ◽

Data Entry ◽

Recognition System ◽

Task Completion ◽

Speech Recognition System ◽

Vocabulary Size ◽

Task Completion Time ◽

Older Subjects ◽

Speech Recognizer

The purpose of the present study was to determine the effects of recognizer accuracy and vocabulary size on system performance of a speech recognition system. Subjects, ranging in age from 20 to 55 years, performed a data entry task using a simulated speech recognizer which simulated three accuracy levels and three levels of available vocabulary. Task completion times and subjective measures of acceptability were recorded. Results indicated that the accuracy level at which the recognizer was performing significantly influenced the task completion time and the user's acceptability ratings. Vocabulary size also significantly affected task completion time; however, its effect on the acceptability ratings was negligible. Older subjects in general required longer times to complete the tasks; however, they consistently rated the speech input systems more favorably than the younger subjects.
