Multilingual and multimode phone recognition system for Indian languages

2020 ◽  
Vol 119 ◽  
pp. 12-23
Author(s):  
Kumud Tripathi ◽  
M. Kiran Reddy ◽  
K. Sreenivasa Rao
Author(s):  
Manjunath K. E. ◽  
Srinivasa Raghavan K. M. ◽  
K. Sreenivasa Rao ◽  
Dinesh Babu Jayagopi ◽  
V. Ramasubramanian

In this study, we evaluate and compare two different approaches for multilingual phone recognition in code-switched and non-code-switched scenarios. The first approach uses a front-end Language Identification (LID) system to switch to a monolingual phone recognizer (LID-Mono) trained individually on each of the languages present in the multilingual dataset. In the second approach, a common multilingual phone-set derived from the International Phonetic Alphabet (IPA) transcription of the multilingual dataset is used to develop a Multilingual Phone Recognition System (Multi-PRS). The bilingual code-switching experiments are conducted using the Kannada and Urdu languages. In the first approach, LID is performed using state-of-the-art i-vectors. Both monolingual and multilingual phone recognition systems are trained using Deep Neural Networks. The performance of the LID-Mono and Multi-PRS approaches is compared and analysed in detail. It is found that the Multi-PRS approach outperforms the more conventional LID-Mono approach in both code-switched and non-code-switched scenarios. For code-switched speech, the effect of the length of the segments used to perform LID on the performance of the LID-Mono system is studied by varying the window size from 500 ms to 5.0 s, and up to the full utterance. The LID-Mono approach depends heavily on the accuracy of the LID system, and LID errors cannot be recovered. The Multi-PRS system, by contrast, does not require front-end LID switching and is designed around the common multilingual phone-set derived from several languages; it is therefore not constrained by the accuracy of the LID system and performs effectively on both code-switched and non-code-switched speech, offering lower Phone Error Rates than the LID-Mono system.
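The abstract compares the two systems by Phone Error Rate but does not define it. A minimal sketch follows, assuming the conventional definition: the number of substitutions, deletions and insertions needed to turn the reference phone sequence into the recognized one, divided by the reference length. The function names `edit_distance` and `phone_error_rate` and the phone labels in the usage note are hypothetical, chosen for illustration; they are not taken from the paper.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two phone sequences (lists of labels)."""
    # prev[j] holds the distance between the reference prefix seen so far
    # and the first j hypothesis phones.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # aligning i reference phones to an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference phone missed)
                curr[j - 1] + 1,     # insertion (extra hypothesis phone)
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1]


def phone_error_rate(ref, hyp):
    """Assumed definition: PER = (S + D + I) / N, N = reference length."""
    if not ref:
        raise ValueError("reference sequence must be non-empty")
    return edit_distance(ref, hyp) / len(ref)
```

For example, `phone_error_rate(["k", "a", "n", "a"], ["k", "a", "m", "a"])` counts one substitution against four reference phones. Because insertions are counted against the reference length, the value can exceed 1.0 when the hypothesis is much longer than the reference.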


2013 ◽  
Vol 6 (1) ◽  
pp. 266-271
Author(s):  
Anurag Upadhyay ◽  
Chitranjanjit Kaur

This paper addresses the problem of speech recognition to identify various modes of speech data. Speaker sounds are the acoustic sounds of speech. Statistical models of speech have been widely used for speech recognition with neural networks. In this paper, we propose and try to justify a new model in which speech coarticulation, the effect of phonetic context on speech sounds, is modeled explicitly under a statistical framework. We study phone recognition using recurrent neural networks and SOUL Neural Networks. A general framework for recurrent neural networks and considerations for network training are discussed in detail. The SOUL NN clusters the large vocabulary, which compresses huge speech data sets. This project also covers different Indian languages uttered by different speakers in different modes such as aggressive, happy, sad, and angry. Many alternative energy measures and training methods are proposed and implemented. A speaker-independent phone recognition rate of 82% with a 25% frame error rate has been achieved on the neural data base. Neural speech recognition experiments on the NTIMIT database result in a phone recognition rate of 68%. The research results in this thesis are competitive with the best results reported in the literature.


Author(s):  
Ramasamy M ◽  
Rania Anjum S ◽  
V. R. Shree Harini ◽  
Sreevidya Bharathan Rajalakshmi ◽  
Mr. P Dineshkumar

While most of the Indian industries are in the process of automation, it is a bitter truth that the Indian Postal System is still using manual intervention for its mail sorting and processing. Although for postal automation there are many pieces of work towards street name recognition in non-Indian languages, to the best of our knowledge there is no work on street name recognition in Indian languages. The Automatic Mail Processor (AMP), which we have designed, scans a mail and interprets the imperative fields of the destination address such as the Pin Code, City name, Locality name and the Street name. The interpreted address is subsequently converted into a QR code. The code is reprinted onto the mail which can be read by a low-cost machine. By converting the destination address into a barcode, all of the future sorting processes can be accomplished by using a mechanical machine sorter, which can sort the mails according to the barcode present on them. We used two main approaches to accomplish this task: classifying words directly and character segmentation. For the former, we use a Convolutional Neural Network (CNN) with various architectures to train a model that can precisely classify words. We then pass the segmented characters to a Recurrent Neural Network (RNN) for classification and then reconstruct each word according to the results of classification and segmentation.


Offline handwritten character recognition has been a challenge for Indian scripts, especially for South Indian languages. The huge number of characters in local languages, including alphabets, consonants and composite characters, makes the recognition system more complicated. A good recognition system for a subset of the Tamil script, a famous South Indian script, is proposed in this work. A variable-length feature vector is extracted from the thinned character image. This extracted feature is given to a novel, simple classification algorithm based on probability. A subset of the Tamil script, comprising 20 character classes, is considered for the experiments. The samples were taken from the HP Labs dataset for the Tamil language, and a recognition accuracy of 88.15% was obtained.

