Improvement Comparison of Different Lattice-based Discriminative Training Methods in Chinese-monolingual and Chinese-English-bilingual Speech Recognition

2012, Vol. 38 (7), pp. 1162-1168
Author(s): Yan-Min QIAN, Yu-Xiang SHAN, Lin-Fang WANG, Jia LIU
2013, Vol. 6 (1), pp. 266-271
Author(s): Anurag Upadhyay, Chitranjanjit Kaur

This paper addresses the problem of speech recognition for identifying various modes of speech data, where speaker sounds are the acoustic sounds of speech. Statistical models of speech, combined with neural networks, have been widely used for speech recognition. In this paper we propose and justify a new model in which speech coarticulation, the effect of phonetic context on speech sounds, is modeled explicitly within a statistical framework. We study speech phone recognition with recurrent neural networks and SOUL neural networks. A general framework for recurrent neural networks and considerations for network training are discussed in detail. The SOUL NN clusters the large vocabulary, compressing huge speech data sets. This project also covers different Indian languages uttered by different speakers in different modes such as aggressive, happy, sad, and angry. Many alternative energy measures and training methods are proposed and implemented. A speaker-independent phone recognition rate of 82% with a 25% frame error rate has been achieved on the neural database. Neural speech recognition experiments on the NTIMIT database result in a phone recognition rate of 68% correct. The research results in this thesis are competitive with the best results reported in the literature.
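To make the recurrent-network phone recognition and the frame error rate metric concrete, the following is a minimal sketch, not the paper's actual system: a randomly initialised recurrent network maps a sequence of acoustic feature frames to per-frame phone posteriors, and the frame error rate is the fraction of frames whose predicted phone differs from a frame-aligned reference. All sizes, weights, and labels here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: 13-dim MFCC-like features, a small hidden
# state, and a 5-phone inventory (the paper's real setup is larger).
n_feat, n_hidden, n_phones = 13, 32, 5

# Randomly initialised weights stand in for trained parameters.
W_xh = rng.standard_normal((n_hidden, n_feat)) * 0.1
W_hh = rng.standard_normal((n_hidden, n_hidden)) * 0.1
W_hy = rng.standard_normal((n_phones, n_hidden)) * 0.1

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def rnn_phone_posteriors(frames):
    """Return per-frame phone posteriors for a (T, n_feat) sequence."""
    h = np.zeros(n_hidden)
    out = []
    for x in frames:
        h = np.tanh(W_xh @ x + W_hh @ h)  # recurrent state update
        out.append(softmax(W_hy @ h))     # frame-level phone posterior
    return np.array(out)

frames = rng.standard_normal((20, n_feat))   # 20 dummy feature frames
post = rnn_phone_posteriors(frames)          # shape (20, n_phones)

# Frame error rate against a dummy frame-aligned reference labelling:
# the share of frames where the argmax phone disagrees with the reference.
ref = rng.integers(0, n_phones, size=20)
fer = np.mean(post.argmax(axis=1) != ref)
print(post.shape)  # (20, 5): one posterior row per frame
```

In practice the weights would be trained (e.g. by backpropagation through time) and the phone recognition rate would be scored on decoded phone sequences rather than raw frames, but the per-frame pipeline above is the core loop such a recogniser runs.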

