Feature matching by SKPCA with unsupervised algorithm and maximum probability in speech recognition

2011 ◽  
Vol 1 (1) ◽  
pp. 9-13
Author(s):  
Pavithra M ◽  
Chinnasamy G ◽  
Azha Periasamy

A speech recognition system requires a combination of techniques and algorithms, each of which performs a specific task toward the main goal of the system. Recognition performance can be enhanced by selecting a proper acoustic model. In this work, feature extraction and matching are done by SKPCA with an unsupervised learning algorithm and maximum probability. SKPCA provides a sparse solution for KPCA: the original data can be reduced by considering the weights, i.e., the weights indicate the vectors that most influence the maximization. The unsupervised learning algorithm is implemented to find a suitable representation of the labels, and maximum probability is used to maximize the normalized acoustic likelihood of the most likely state sequences of the training data. The experimental results show the efficiency of the SKPCA technique: the proposed approach with maximum probability produces strong performance in the speech recognition system.
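The kernel-PCA step this abstract builds on can be sketched as follows. This is plain KPCA (SKPCA additionally sparsifies the expansion weights so that only a few training vectors carry non-zero weight); the RBF kernel and the `gamma` value are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def rbf_kernel(X, gamma=0.5):
    # Pairwise RBF kernel matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2).
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * d2)

def kpca_features(X, n_components=2, gamma=0.5):
    # Kernel PCA: double-center the kernel matrix, eigendecompose it,
    # and project onto the leading components as extracted features.
    n = X.shape[0]
    K = rbf_kernel(X, gamma)
    one = np.ones((n, n)) / n
    Kc = K - one @ K - K @ one + one @ K @ one   # double centering
    vals, vecs = np.linalg.eigh(Kc)              # ascending eigenvalues
    idx = np.argsort(vals)[::-1][:n_components]  # take the largest
    alphas = vecs[:, idx] / np.sqrt(np.maximum(vals[idx], 1e-12))
    return Kc @ alphas                           # projected features
```

A sparse variant would additionally zero out most rows of `alphas`, keeping only the training vectors that dominate the variance maximization.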

10.2196/18677 ◽  
2020 ◽  
Vol 8 (6) ◽  
pp. e18677
Author(s):  
Weifeng Fu

Background: Speech recognition is a technology that enables machines to understand human language. Objective: In this study, speech recognition of isolated words from a small vocabulary was applied to the field of mental health counseling. Methods: A software platform was used to establish a human-machine chat for psychological counseling. The software uses voice recognition technology to decode the user's voice information, analyzes and processes it against several internal databases, and then gives the user accurate feedback. For users who need psychological treatment, the system provides psychological education. Results: The speech recognition system included speech extraction, endpoint detection, feature value extraction, training, and recognition. Conclusions: A hidden Markov model was adopted, with multithreaded programming under a VC2005 compilation environment, to parallelize the algorithm and improve the efficiency of speech recognition. After the design was completed, simulation debugging was performed in the laboratory. The experimental results showed that the designed program met the basic requirements of a speech recognition system.
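The decoding core of an HMM-based isolated-word recognizer like the one described is the Viterbi algorithm. A minimal log-domain sketch, independent of the abstract's VC2005 implementation (the state counts and probabilities below are illustrative):

```python
import numpy as np

def viterbi(log_start, log_trans, log_emit):
    # log_start: (N,) initial state log-probabilities
    # log_trans: (N, N) transition log-probabilities (from, to)
    # log_emit:  (T, N) per-frame emission log-likelihoods
    # Returns the most likely state sequence of length T.
    T, N = log_emit.shape
    delta = log_start + log_emit[0]          # best score ending in each state
    psi = np.zeros((T, N), dtype=int)        # backpointers
    for t in range(1, T):
        scores = delta[:, None] + log_trans  # scores[i, j]: best via i -> j
        psi[t] = np.argmax(scores, axis=0)
        delta = scores[psi[t], np.arange(N)] + log_emit[t]
    path = [int(np.argmax(delta))]
    for t in range(T - 1, 0, -1):            # backtrace
        path.append(int(psi[t][path[-1]]))
    return path[::-1]
```

In a full recognizer, one such HMM is trained per vocabulary word and the word whose model yields the highest Viterbi score wins.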


2020 ◽  
Vol 39 (4) ◽  
pp. 4891-4902
Author(s):  
Hongmei Zhu

An English speech recognition system is affected by a variety of interference factors, and pairing the algorithm with modern computer technology can improve the model's effectiveness. Based on a study of current mainstream controlled-natural-language thesauri, this paper proposes a vocabulary classification scheme for controlled natural language. It defines a domain thesaurus according to the WordNet knowledge description framework and uses WordNet's synonymy, antonymy, hypernymy, and hyponymy relations. In this way, the controlled-natural-language system can use WordNet's semantic relations to identify out-of-domain words entered by the user and map them to words in the domain thesaurus, improving the system's ease of use. In addition, a controlled experiment was designed to analyze the system's performance. The results show that the model constructed in this paper has a significant effect.
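The mapping step can be sketched with a toy synonym table. The entries below are a hypothetical mini-thesaurus standing in for the WordNet-derived domain thesaurus the paper describes; a real system would query WordNet synsets instead of a hand-written dict.

```python
# Hypothetical mini-thesaurus: controlled-vocabulary term -> accepted
# out-of-domain synonyms (stand-in for WordNet-derived relations).
DOMAIN_SYNONYMS = {
    "car": {"automobile", "auto", "motorcar"},
    "buy": {"purchase", "acquire"},
}

def map_to_domain(word):
    # Map a user-entered word to its controlled-vocabulary term,
    # or return None if it is not covered by the domain thesaurus.
    word = word.lower()
    for domain_term, synonyms in DOMAIN_SYNONYMS.items():
        if word == domain_term or word in synonyms:
            return domain_term
    return None
```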


Author(s):  
Tim Barry ◽  
Tom Solz ◽  
John Reising ◽  
Dave Williamson

Eleven subjects participated in a study designed to test the accuracy of a newer-generation connected speech recognition system using a 49-word vocabulary likely to be used in an aircraft cockpit environment. The 49 vocabulary words were used to create 392 test phrases, divided into three groups: Complex phrases, containing more than five words, and two groups of Simple phrases, containing five words or fewer. The Simple phrases were divided into Simple Alternate and Simple No-Alternate phrases, depending on whether or not the phrase was the only one in the entire vocabulary capable of carrying out a particular action once recognized. Performance of the recognition system was measured with three accuracy statistics: word accuracy, the most commonly reported statistic in speech recognition research; phrase accuracy, which is gaining popularity in connected speech recognition research; and intent accuracy, which is probably the most relevant statistic for research of this type. Significantly different word, phrase, and intent accuracy results were obtained for the three phrase types.
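The three accuracy statistics can be sketched as follows. Word accuracy is computed here as 1 minus the word error rate via Levenshtein alignment (the standard convention); phrase accuracy is exact-match; intent accuracy compares triggered actions, so an alternate phrase that fires the same cockpit command still counts. The data in the usage note are illustrative, not the study's.

```python
def edit_distance(r, h):
    # Word-level Levenshtein distance (substitutions, insertions, deletions).
    D = list(range(len(h) + 1))
    for i, rw in enumerate(r, 1):
        prev, D[0] = D[0], i
        for j, hw in enumerate(h, 1):
            prev, D[j] = D[j], min(D[j] + 1, D[j - 1] + 1, prev + (rw != hw))
    return D[len(h)]

def word_accuracy(refs, hyps):
    # 1 - WER over the whole test set.
    errors = sum(edit_distance(r.split(), h.split()) for r, h in zip(refs, hyps))
    words = sum(len(r.split()) for r in refs)
    return 1 - errors / words

def phrase_accuracy(refs, hyps):
    # Fraction of phrases recognized exactly.
    return sum(r == h for r, h in zip(refs, hyps)) / len(refs)

def intent_accuracy(ref_actions, hyp_actions):
    # Fraction of utterances whose triggered action matches, even if
    # the recognized words differ.
    return sum(r == h for r, h in zip(ref_actions, hyp_actions)) / len(ref_actions)
```

For example, with references `["select radio one", "gear up"]` and hypotheses `["select radio two", "gear up"]`, word accuracy is 0.8, while phrase and intent accuracy are both 0.5.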


2011 ◽  
Vol 268-270 ◽  
pp. 82-87
Author(s):  
Zhi Peng Zhao ◽  
Yi Gang Cen ◽  
Xiao Fang Chen

In this paper, we propose a new noisy-speech recognition method based on compressive sensing theory. Through compressive sensing, our method greatly increases the noise robustness of the speech recognition system, which improves recognition accuracy. Experiments show that the proposed method achieves better recognition performance than the traditional isolated-word recognition method based on the DTW algorithm.
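The DTW baseline the paper compares against is the classic dynamic-programming alignment between two feature sequences; a minimal sketch (feature vectors and distance metric are illustrative):

```python
import numpy as np

def dtw_distance(a, b):
    # Dynamic time warping between two sequences of feature vectors
    # (frames x dims); returns the accumulated alignment cost using
    # Euclidean frame distance and steps (i-1,j), (i,j-1), (i-1,j-1).
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(np.asarray(a[i - 1], float) - np.asarray(b[j - 1], float))
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(D[n, m])
```

An isolated-word recognizer stores one reference template per word and picks the word with the smallest DTW distance to the input utterance.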




2020 ◽  
Vol 24 ◽  
pp. 233121652093892
Author(s):  
Marc R. Schädler ◽  
David Hülsmeier ◽  
Anna Warzybok ◽  
Birger Kollmeier

The benefit in speech-recognition performance due to the compensation of a hearing loss can vary between listeners, even if unaided performance and hearing thresholds are similar. To accurately predict the individual performance benefit due to a specific hearing device, a prediction model is proposed which takes into account hearing thresholds and a frequency-dependent suprathreshold component of impaired hearing. To test the model, the German matrix sentence test was performed in unaided and individually aided conditions in quiet and in noise by 18 listeners with different degrees of hearing loss. The outcomes were predicted by an individualized automatic speech-recognition system where the individualization parameter for the suprathreshold component of hearing loss was inferred from tone-in-noise detection thresholds. The suprathreshold component was implemented as a frequency-dependent multiplicative noise (mimicking level uncertainty) in the feature-extraction stage of the automatic speech-recognition system. Its inclusion improved the root-mean-square prediction error of individual speech-recognition thresholds (SRTs) from 6.3 dB to 4.2 dB and of individual benefits in SRT due to common compensation strategies from 5.1 dB to 3.4 dB. The outcome predictions are highly correlated with both the corresponding observed SRTs (R² = .94) and the benefits in SRT (R² = .89) and hence might help to better understand, and eventually mitigate, the perceptual consequences of as yet unexplained hearing problems, also discussed in the context of hidden hearing loss.
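The "level uncertainty" idea can be sketched directly: multiplicative noise on linear-domain features is additive Gaussian noise on log (dB-like) features, with a per-band standard deviation as the individualization parameter. This is a minimal illustration of the concept, not the authors' feature pipeline.

```python
import numpy as np

def add_level_uncertainty(log_feats, sigma_db, rng=None):
    # log_feats: (frames, bands) log-domain features.
    # sigma_db: scalar or per-band array of "level uncertainty" in dB;
    #           multiplicative noise in the linear domain becomes
    #           additive Gaussian noise in the log domain.
    rng = np.random.default_rng(rng)
    sigma = np.broadcast_to(np.asarray(sigma_db, float), log_feats.shape)
    return log_feats + rng.normal(0.0, 1.0, log_feats.shape) * sigma
```

In the model, a larger `sigma_db` degrades the simulated listener's recognition, letting the ASR system reproduce individual suprathreshold deficits beyond audiometric thresholds.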


2014 ◽  
Vol 623 ◽  
pp. 267-273
Author(s):  
Xin Fei Liu ◽  
Hui Zhou

This paper describes a Chinese small-vocabulary offline speech recognition system based on PocketSphinx, in which the acoustic models are regenerated by improving the existing Sphinx models and the language model is generated with the LMTool online tool. An offline speech recognition system is then built that can run on Android smartphones, developed in the Android development environment under Linux. The experimental results show that the system, used for recognizing voice commands on a cell phone, achieves good recognition performance.


2016 ◽  
Vol 31 (4) ◽  
pp. 267
Author(s):  
Bao Quoc Nguyen ◽  
Thang Tat Vu ◽  
Mai Chi Luong

In this paper, a pre-training method based on denoising auto-encoders is investigated and shown to provide good models for initializing the bottleneck networks of a Vietnamese speech recognition system, resulting in better recognition performance than the base bottleneck features reported previously. Experiments are carried out on a dataset containing speech from the Voice of Vietnam (VOV) channel. The results show that DBNF extraction for Vietnamese recognition decreases the relative word error rate by 14% and 39% compared with the base bottleneck features and the MFCC baseline, respectively.
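The denoising auto-encoder idea can be sketched with a tiny tied-weight network trained by plain gradient descent: corrupt the input with masking noise, reconstruct the clean version, and reuse the hidden layer as the bottleneck feature. All sizes, the noise level, and the learning rate below are illustrative; the paper's networks are of course much larger and trained differently.

```python
import numpy as np

def train_dae(X, hidden=8, noise=0.3, lr=0.1, epochs=80, seed=0):
    # Tiny tied-weight denoising auto-encoder:
    #   corrupt -> encode (tanh) -> decode (linear, W transposed) -> MSE
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(0.0, 0.1, (d, hidden))
    b, c = np.zeros(hidden), np.zeros(d)
    losses = []
    for _ in range(epochs):
        Xn = X * (rng.random(X.shape) > noise)   # masking corruption
        H = np.tanh(Xn @ W + b)                  # encode (bottleneck)
        R = H @ W.T + c                          # decode, tied weights
        E = R - X                                # reconstruct the CLEAN input
        losses.append(float(np.mean(E**2)))
        G = (E @ W) * (1 - H**2)                 # backprop through tanh
        W -= lr * (Xn.T @ G + E.T @ H) / len(X)  # encode + decode paths
        b -= lr * G.mean(axis=0)
        c -= lr * E.mean(axis=0)
    return W, losses
```

After pre-training, `W` and `b` would initialize the bottleneck layer of the supervised network instead of random weights.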


Author(s):  
Hosung Park ◽  
Changmin Kim ◽  
Hyunsoo Son ◽  
Soonshin Seo ◽  
Ji-Hwan Kim

In this study, an automatic end-to-end speech recognition system based on a hybrid CTC-attention network is proposed for Korean. Deep neural network/hidden Markov model (DNN/HMM)-based systems have driven dramatic improvements in this area, but they are difficult for non-experts to develop for new applications. End-to-end approaches simplify speech recognition into a single network architecture, allowing systems to be developed without expert knowledge. In this paper, we propose a hybrid CTC-attention network as an end-to-end speech recognition model for Korean. The model uses a CTC objective function during attention-model training, which improves both recognition accuracy and training speed. In most languages, end-to-end speech recognition uses characters as output labels. For Korean, however, character-based end-to-end recognition is inefficient because the language has 11,172 possible characters, a relatively large number compared with other languages (English has 26 characters; Japanese has about 50 kana). To address this problem, we use 49 Korean graphemes as output labels. Experimental results show a 10.02% character error rate (CER) when 740 hours of Korean training data are used.
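The grapheme decomposition that makes the small label set possible follows the Unicode arithmetic for precomposed Hangul syllables (base U+AC00, 19 initials × 21 medials × 28 final slots). The sketch below emits raw jamo; the paper's 49-grapheme inventory may group or normalize these differently.

```python
# Decompose Hangul syllables into jamo graphemes using the standard
# Unicode arithmetic for precomposed syllables (U+AC00..U+D7A3).
INITIALS = list("ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ")            # 19
MEDIALS = list("ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ")          # 21
FINALS = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")  # 28 slots

def to_graphemes(text):
    # Split each precomposed syllable into initial/medial/(final) jamo;
    # pass non-Hangul characters through unchanged.
    out = []
    for ch in text:
        code = ord(ch) - 0xAC00
        if 0 <= code < 11172:
            out.append(INITIALS[code // 588])         # 588 = 21 * 28
            out.append(MEDIALS[(code // 28) % 21])
            if code % 28:                             # 0 = no final
                out.append(FINALS[code % 28])
        else:
            out.append(ch)
    return out
```

This is why grapheme labels suffice: every one of the 11,172 syllables decomposes into at most three symbols from a fixed small inventory.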

