Feature matching by SKPCA with unsupervised algorithm and maximum probability in speech recognition

2011 ◽  
Vol 1 (1) ◽  
pp. 9-13
Author(s):  
Pavithra M ◽  
Chinnasamy G ◽  
Azha Periasamy

A speech recognition system requires a combination of techniques and algorithms, each of which performs a specific task toward the main goal of the system. Recognition performance can be enhanced by selecting a proper acoustic model. In this work, feature extraction and matching are done by SKPCA with an unsupervised learning algorithm and maximum probability. SKPCA provides a sparse solution for KPCA: the original data can be reduced by considering the weights, i.e., the weights indicate the vectors that most influence the maximization. The unsupervised learning algorithm is implemented to find a suitable representation of the labels, and maximum probability is used to maximize the normalized acoustic likelihood of the most likely state sequences of the training data. The experimental results show the efficiency of the SKPCA technique: the proposed approach with maximum probability produces strong performance in the speech recognition system.
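The kernel-PCA step this abstract builds on can be sketched as follows. This is plain KPCA (SKPCA additionally sparsifies the expansion weights so that only a few training vectors carry non-zero weight); the RBF kernel and the `gamma` value are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def rbf_kernel(X, gamma=0.5):
    # Pairwise RBF kernel matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2).
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * d2)

def kpca_features(X, n_components=2, gamma=0.5):
    # Kernel PCA: double-center the kernel matrix, eigendecompose it,
    # and project onto the leading components as extracted features.
    n = X.shape[0]
    K = rbf_kernel(X, gamma)
    one = np.ones((n, n)) / n
    Kc = K - one @ K - K @ one + one @ K @ one   # double centering
    vals, vecs = np.linalg.eigh(Kc)              # ascending eigenvalues
    idx = np.argsort(vals)[::-1][:n_components]  # take the largest
    alphas = vecs[:, idx] / np.sqrt(np.maximum(vals[idx], 1e-12))
    return Kc @ alphas                           # projected features
```

A sparse variant would additionally zero out most rows of `alphas`, keeping only the training vectors that dominate the variance maximization.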

10.2196/18677 ◽  
2020 ◽  
Vol 8 (6) ◽  
pp. e18677
Author(s):  
Weifeng Fu

Background: Speech recognition is a technology that enables machines to understand human language. Objective: In this study, speech recognition of isolated words from a small vocabulary was applied to the field of mental health counseling. Methods: A software platform was used to establish a human-machine chat for psychological counseling. The software uses voice recognition technology to decode the user's voice information, analyzes and processes it against several internal databases, and then gives the user accurate feedback. For users who need psychological treatment, the system provides psychological education. Results: The speech recognition system included speech extraction, endpoint detection, feature value extraction, training, and recognition. Conclusions: A hidden Markov model was adopted, with multithreaded programming under a VC2005 compilation environment, to parallelize the algorithm and improve the efficiency of speech recognition. After the design was completed, simulation debugging was performed in the laboratory. The experimental results showed that the designed program met the basic requirements of a speech recognition system.
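The decoding core of an HMM-based isolated-word recognizer like the one described is the Viterbi algorithm. A minimal log-domain sketch, independent of the abstract's VC2005 implementation (the state counts and probabilities below are illustrative):

```python
import numpy as np

def viterbi(log_start, log_trans, log_emit):
    # log_start: (N,) initial state log-probabilities
    # log_trans: (N, N) transition log-probabilities (from, to)
    # log_emit:  (T, N) per-frame emission log-likelihoods
    # Returns the most likely state sequence of length T.
    T, N = log_emit.shape
    delta = log_start + log_emit[0]          # best score ending in each state
    psi = np.zeros((T, N), dtype=int)        # backpointers
    for t in range(1, T):
        scores = delta[:, None] + log_trans  # scores[i, j]: best via i -> j
        psi[t] = np.argmax(scores, axis=0)
        delta = scores[psi[t], np.arange(N)] + log_emit[t]
    path = [int(np.argmax(delta))]
    for t in range(T - 1, 0, -1):            # backtrace
        path.append(int(psi[t][path[-1]]))
    return path[::-1]
```

In a full recognizer, one such HMM is trained per vocabulary word and the word whose model yields the highest Viterbi score wins.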


2020 ◽  
Vol 39 (4) ◽  
pp. 4891-4902
Author(s):  
Hongmei Zhu

An English speech recognition system is affected by a variety of interference factors, and pairing the algorithm with modern computer technology can improve the model's effectiveness. Based on a study of current mainstream controlled-natural-language thesauri, this paper proposes a vocabulary classification scheme for controlled natural language. It defines a domain thesaurus according to the WordNet knowledge description framework and uses WordNet's synonymy, antonymy, hypernymy, and hyponymy relations. In this way, the controlled-natural-language system can use WordNet's semantic relations to identify out-of-domain words entered by the user and map them to words in the domain thesaurus, improving the system's ease of use. In addition, a controlled experiment was designed to analyze the system's performance. The results show that the model constructed in this paper has a significant effect.
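The mapping step can be sketched with a toy synonym table. The entries below are a hypothetical mini-thesaurus standing in for the WordNet-derived domain thesaurus the paper describes; a real system would query WordNet synsets instead of a hand-written dict.

```python
# Hypothetical mini-thesaurus: controlled-vocabulary term -> accepted
# out-of-domain synonyms (stand-in for WordNet-derived relations).
DOMAIN_SYNONYMS = {
    "car": {"automobile", "auto", "motorcar"},
    "buy": {"purchase", "acquire"},
}

def map_to_domain(word):
    # Map a user-entered word to its controlled-vocabulary term,
    # or return None if it is not covered by the domain thesaurus.
    word = word.lower()
    for domain_term, synonyms in DOMAIN_SYNONYMS.items():
        if word == domain_term or word in synonyms:
            return domain_term
    return None
```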


Author(s):  
Tim Barry ◽  
Tom Solz ◽  
John Reising ◽  
Dave Williamson

Eleven subjects participated in a study designed to test the accuracy of a newer-generation connected speech recognition system using a 49-word vocabulary likely to be used in an aircraft cockpit environment. The 49 vocabulary words were used to create 392 test phrases, divided into three groups: Complex phrases, containing more than five words, and two groups of Simple phrases, containing five words or fewer. The Simple phrases were divided into Simple Alternate and Simple No-Alternate phrases, depending on whether or not the phrase was the only one in the entire vocabulary capable of carrying out a particular action once recognized. Performance of the recognition system was measured with three accuracy statistics: word accuracy, the most commonly reported statistic in speech recognition research; phrase accuracy, which is gaining popularity in connected speech recognition research; and intent accuracy, which is probably the most relevant statistic for research of this type. Significantly different word, phrase, and intent accuracy results were obtained for the three phrase types.
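The three accuracy statistics can be sketched as follows. Word accuracy is computed here as 1 minus the word error rate via Levenshtein alignment (the standard convention); phrase accuracy is exact-match; intent accuracy compares triggered actions, so an alternate phrase that fires the same cockpit command still counts. The data in the usage note are illustrative, not the study's.

```python
def edit_distance(r, h):
    # Word-level Levenshtein distance (substitutions, insertions, deletions).
    D = list(range(len(h) + 1))
    for i, rw in enumerate(r, 1):
        prev, D[0] = D[0], i
        for j, hw in enumerate(h, 1):
            prev, D[j] = D[j], min(D[j] + 1, D[j - 1] + 1, prev + (rw != hw))
    return D[len(h)]

def word_accuracy(refs, hyps):
    # 1 - WER over the whole test set.
    errors = sum(edit_distance(r.split(), h.split()) for r, h in zip(refs, hyps))
    words = sum(len(r.split()) for r in refs)
    return 1 - errors / words

def phrase_accuracy(refs, hyps):
    # Fraction of phrases recognized exactly.
    return sum(r == h for r, h in zip(refs, hyps)) / len(refs)

def intent_accuracy(ref_actions, hyp_actions):
    # Fraction of utterances whose triggered action matches, even if
    # the recognized words differ.
    return sum(r == h for r, h in zip(ref_actions, hyp_actions)) / len(ref_actions)
```

For example, with references `["select radio one", "gear up"]` and hypotheses `["select radio two", "gear up"]`, word accuracy is 0.8, while phrase and intent accuracy are both 0.5.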


2011 ◽  
Vol 268-270 ◽  
pp. 82-87
Author(s):  
Zhi Peng Zhao ◽  
Yi Gang Cen ◽  
Xiao Fang Chen

In this paper, we propose a new noisy-speech recognition method based on compressive sensing theory. Through compressive sensing, our method greatly increases the noise robustness of the speech recognition system, which improves recognition accuracy. Experiments show that the proposed method achieves better recognition performance than the traditional isolated-word recognition method based on the DTW algorithm.
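The DTW baseline the paper compares against is the classic dynamic-programming alignment between two feature sequences; a minimal sketch (feature vectors and distance metric are illustrative):

```python
import numpy as np

def dtw_distance(a, b):
    # Dynamic time warping between two sequences of feature vectors
    # (frames x dims); returns the accumulated alignment cost using
    # Euclidean frame distance and steps (i-1,j), (i,j-1), (i-1,j-1).
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(np.asarray(a[i - 1], float) - np.asarray(b[j - 1], float))
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(D[n, m])
```

An isolated-word recognizer stores one reference template per word and picks the word with the smallest DTW distance to the input utterance.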




2020 ◽  
Vol 24 ◽  
pp. 233121652093892
Author(s):  
Marc R. Schädler ◽  
David Hülsmeier ◽  
Anna Warzybok ◽  
Birger Kollmeier

The benefit in speech-recognition performance due to the compensation of a hearing loss can vary between listeners, even if unaided performance and hearing thresholds are similar. To accurately predict the individual performance benefit due to a specific hearing device, a prediction model is proposed which takes into account hearing thresholds and a frequency-dependent suprathreshold component of impaired hearing. To test the model, the German matrix sentence test was performed in unaided and individually aided conditions in quiet and in noise by 18 listeners with different degrees of hearing loss. The outcomes were predicted by an individualized automatic speech-recognition system where the individualization parameter for the suprathreshold component of hearing loss was inferred from tone-in-noise detection thresholds. The suprathreshold component was implemented as a frequency-dependent multiplicative noise (mimicking level uncertainty) in the feature-extraction stage of the automatic speech-recognition system. Its inclusion improved the root-mean-square prediction error of individual speech-recognition thresholds (SRTs) from 6.3 dB to 4.2 dB and of individual benefits in SRT due to common compensation strategies from 5.1 dB to 3.4 dB. The outcome predictions are highly correlated with both the corresponding observed SRTs (R² = .94) and the benefits in SRT (R² = .89) and hence might help to better understand, and eventually mitigate, the perceptual consequences of as yet unexplained hearing problems, also discussed in the context of hidden hearing loss.
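The "level uncertainty" idea can be sketched directly: multiplicative noise on linear-domain features is additive Gaussian noise on log (dB-like) features, with a per-band standard deviation as the individualization parameter. This is a minimal illustration of the concept, not the authors' feature pipeline.

```python
import numpy as np

def add_level_uncertainty(log_feats, sigma_db, rng=None):
    # log_feats: (frames, bands) log-domain features.
    # sigma_db: scalar or per-band array of "level uncertainty" in dB;
    #           multiplicative noise in the linear domain becomes
    #           additive Gaussian noise in the log domain.
    rng = np.random.default_rng(rng)
    sigma = np.broadcast_to(np.asarray(sigma_db, float), log_feats.shape)
    return log_feats + rng.normal(0.0, 1.0, log_feats.shape) * sigma
```

In the model, a larger `sigma_db` degrades the simulated listener's recognition, letting the ASR system reproduce individual suprathreshold deficits beyond audiometric thresholds.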


2014 ◽  
Vol 623 ◽  
pp. 267-273
Author(s):  
Xin Fei Liu ◽  
Hui Zhou

This paper describes a Chinese small-vocabulary offline speech recognition system based on PocketSphinx, in which the acoustic models are regenerated by improving the existing Sphinx models and the language model is generated with the LMTool online tool. An offline speech recognition system is then built that can run on Android smartphones, developed in the Android development environment under Linux. The experimental results show that the system, used for recognizing voice commands on a cell phone, achieves good recognition performance.


2016 ◽  
Vol 31 (4) ◽  
pp. 267
Author(s):  
Bao Quoc Nguyen ◽  
Thang Tat Vu ◽  
Mai Chi Luong

In this paper, a pre-training method based on denoising auto-encoders is investigated and shown to provide good models for initializing the bottleneck networks of a Vietnamese speech recognition system, resulting in better recognition performance than the base bottleneck features reported previously. Experiments are carried out on a dataset containing speech from the Voice of Vietnam (VOV) channel. The results show that DBNF extraction for Vietnamese recognition decreases the relative word error rate by 14% and 39% compared with the base bottleneck features and the MFCC baseline, respectively.
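The denoising auto-encoder idea can be sketched with a tiny tied-weight network trained by plain gradient descent: corrupt the input with masking noise, reconstruct the clean version, and reuse the hidden layer as the bottleneck feature. All sizes, the noise level, and the learning rate below are illustrative; the paper's networks are of course much larger and trained differently.

```python
import numpy as np

def train_dae(X, hidden=8, noise=0.3, lr=0.1, epochs=80, seed=0):
    # Tiny tied-weight denoising auto-encoder:
    #   corrupt -> encode (tanh) -> decode (linear, W transposed) -> MSE
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(0.0, 0.1, (d, hidden))
    b, c = np.zeros(hidden), np.zeros(d)
    losses = []
    for _ in range(epochs):
        Xn = X * (rng.random(X.shape) > noise)   # masking corruption
        H = np.tanh(Xn @ W + b)                  # encode (bottleneck)
        R = H @ W.T + c                          # decode, tied weights
        E = R - X                                # reconstruct the CLEAN input
        losses.append(float(np.mean(E**2)))
        G = (E @ W) * (1 - H**2)                 # backprop through tanh
        W -= lr * (Xn.T @ G + E.T @ H) / len(X)  # encode + decode paths
        b -= lr * G.mean(axis=0)
        c -= lr * E.mean(axis=0)
    return W, losses
```

After pre-training, `W` and `b` would initialize the bottleneck layer of the supervised network instead of random weights.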


Author(s):  
Hosung Park ◽  
Changmin Kim ◽  
Hyunsoo Son ◽  
Soonshin Seo ◽  
Ji-Hwan Kim

In this study, an automatic end-to-end speech recognition system based on a hybrid CTC-attention network is proposed for Korean. Deep neural network/hidden Markov model (DNN/HMM)-based systems have driven dramatic improvements in this area, but they are difficult for non-experts to develop for new applications. End-to-end approaches simplify speech recognition into a single network architecture, allowing systems to be developed without expert knowledge. In this paper, we propose a hybrid CTC-attention network as an end-to-end speech recognition model for Korean. The model uses a CTC objective function during attention-model training, which improves both recognition accuracy and training speed. In most languages, end-to-end speech recognition uses characters as output labels. For Korean, however, character-based end-to-end recognition is inefficient because the language has 11,172 possible characters, a relatively large number compared with other languages (English has 26 characters; Japanese has about 50 kana). To address this problem, we use 49 Korean graphemes as output labels. Experimental results show a 10.02% character error rate (CER) when 740 hours of Korean training data are used.
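The grapheme decomposition that makes the small label set possible follows the Unicode arithmetic for precomposed Hangul syllables (base U+AC00, 19 initials × 21 medials × 28 final slots). The sketch below emits raw jamo; the paper's 49-grapheme inventory may group or normalize these differently.

```python
# Decompose Hangul syllables into jamo graphemes using the standard
# Unicode arithmetic for precomposed syllables (U+AC00..U+D7A3).
INITIALS = list("ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ")            # 19
MEDIALS = list("ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ")          # 21
FINALS = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")  # 28 slots

def to_graphemes(text):
    # Split each precomposed syllable into initial/medial/(final) jamo;
    # pass non-Hangul characters through unchanged.
    out = []
    for ch in text:
        code = ord(ch) - 0xAC00
        if 0 <= code < 11172:
            out.append(INITIALS[code // 588])         # 588 = 21 * 28
            out.append(MEDIALS[(code // 28) % 21])
            if code % 28:                             # 0 = no final
                out.append(FINALS[code % 28])
        else:
            out.append(ch)
    return out
```

This is why grapheme labels suffice: every one of the 11,172 syllables decomposes into at most three symbols from a fixed small inventory.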

