Approaches for Multilingual Phone Recognition in Code-switched and Non-code-switched Scenarios Using Indian Languages

Author(s):  
Manjunath K. E. ◽  
Srinivasa Raghavan K. M. ◽  
K. Sreenivasa Rao ◽  
Dinesh Babu Jayagopi ◽  
V. Ramasubramanian

In this study, we evaluate and compare two approaches to multilingual phone recognition in code-switched and non-code-switched scenarios. The first approach uses a front-end Language Identification (LID) system to switch to a monolingual phone recognizer (LID-Mono), with one recognizer trained individually on each language present in the multilingual dataset. In the second approach, a common multilingual phone-set derived from the International Phonetic Alphabet (IPA) transcription of the multilingual dataset is used to develop a Multilingual Phone Recognition System (Multi-PRS). The bilingual code-switching experiments are conducted using the Kannada and Urdu languages. In the first approach, LID is performed using state-of-the-art i-vectors. Both the monolingual and multilingual phone recognition systems are trained using Deep Neural Networks. The performance of the LID-Mono and Multi-PRS approaches is compared and analysed in detail. The Multi-PRS approach is found to outperform the more conventional LID-Mono approach in both code-switched and non-code-switched scenarios. For code-switched speech, the effect of the length of the segments used to perform LID on the performance of the LID-Mono system is studied by varying the window size from 500 ms to 5.0 s, and also by using the full utterance. The LID-Mono approach depends heavily on the accuracy of the LID system, and LID errors cannot be recovered. The Multi-PRS system, in contrast, requires no front-end LID switching and is designed around the common multilingual phone-set derived from several languages; it is therefore not constrained by the accuracy of the LID system and performs effectively on both code-switched and non-code-switched speech, offering lower Phone Error Rates than the LID-Mono system.
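The abstract compares systems by Phone Error Rate but does not define it. A common definition, assumed here rather than taken from the paper, is the Levenshtein (edit) distance between the reference and recognized phone sequences divided by the reference length. The function names and the toy phone sequences below are hypothetical and only illustrate the arithmetic.

```python
def edit_distance(ref, hyp):
    """Minimum number of substitutions, insertions and deletions
    needed to turn the sequence `ref` into the sequence `hyp`."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference
    for i, r in enumerate(ref, 1):
        curr = [i]  # distance from ref[:i] to an empty hypothesis
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def phone_error_rate(ref, hyp):
    """Edit distance normalized by the reference length (assumed definition)."""
    if not ref:
        raise ValueError("reference phone sequence must not be empty")
    return edit_distance(ref, hyp) / len(ref)


# Toy example: one substituted phone out of four reference phones.
print(phone_error_rate(["k", "a", "n", "a"], ["k", "a", "m", "a"]))  # 0.25
```

Because insertions count as errors, this measure can exceed 1.0 when the hypothesis is much longer than the reference.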

2013 ◽  
Vol 6 (1) ◽  
pp. 266-271
Author(s):  
Anurag Upadhyay ◽  
Chitranjanjit Kaur

This paper addresses the problem of speech recognition for identifying various modes of speech data. Speaker sounds are the acoustic sounds of speech. Statistical models of speech have been widely used for speech recognition with neural networks. In this paper, we propose and try to justify a new model in which speech coarticulation, the effect of phonetic context on a speech sound, is modeled explicitly within a statistical framework. We study phone recognition using recurrent neural networks and SOUL neural networks. A general framework for recurrent neural networks and considerations for network training are discussed in detail. The SOUL NN clusters the large vocabulary, which compresses huge speech data sets. This project also considers different Indian languages uttered by different speakers in different modes, such as aggressive, happy, sad, and angry. Many alternative energy measures and training methods are proposed and implemented. A speaker-independent phone recognition rate of 82% with a 25% frame error rate has been achieved on the neural database. Neural speech recognition experiments on the NTIMIT database result in a phone recognition rate of 68% correct. The research results in this thesis are competitive with the best results reported in the literature.


Complexity ◽  
2020 ◽  
Vol 2020 ◽  
pp. 1-9
Author(s):  
Jiang Lin ◽  
Yi Yumei ◽  
Zhang Maosheng ◽  
Chen Defeng ◽  
Wang Chao ◽  
...  

In speaker recognition systems, feature extraction is a challenging task under noisy environmental conditions. To improve the robustness of the features, we propose a multiscale chaotic feature for speaker recognition. We use a multiresolution analysis technique to capture finer information about different speakers in the frequency domain. We then extract the chaotic characteristics of the speech based on a nonlinear dynamic model, which helps to improve the discriminability of the features. Finally, we use a GMM-UBM model to build a speaker recognition system. Our experimental results verify its good performance. Under clean and noisy speech conditions, the EER of our method is reduced by 13.94% and 26.5%, respectively, compared with the state-of-the-art method.
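The error figure quoted above is presumably an equal error rate (EER), the usual metric for speaker verification; that reading is an assumption, as is everything in the sketch below. It shows one simple way to estimate an EER from two lists of similarity scores: sweep a decision threshold and take the point where the false acceptance rate and the false rejection rate are closest. The function name and the toy scores are hypothetical, and this is not the evaluation code of the paper.

```python
def equal_error_rate(genuine, impostor):
    """Estimate the EER from similarity scores (higher means more similar).

    genuine:  scores of trials where the claimed speaker is the true speaker
    impostor: scores of trials where the claimed speaker is someone else
    """
    if not genuine or not impostor:
        raise ValueError("both score lists must be non-empty")
    best_gap, eer = None, None
    # Every observed score is tried as the acceptance threshold.
    for threshold in sorted(set(genuine) | set(impostor)):
        far = sum(s >= threshold for s in impostor) / len(impostor)  # false accepts
        frr = sum(s < threshold for s in genuine) / len(genuine)     # false rejects
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Toy scores: one genuine trial scores below one impostor trial.
print(equal_error_rate([0.9, 0.8, 0.7, 0.4], [0.6, 0.3, 0.2, 0.1]))  # 0.25
```

With few trials the two error curves rarely cross exactly, so averaging the two rates at the closest point is only an approximation; larger evaluations usually interpolate between thresholds.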


Author(s):  
S. A. Sakulin ◽  
A. N. Alfimtsev ◽  
D. A. Loktev ◽  
A. O. Kovalenko ◽  
V. V. Devyatkov

Recently, human recognition systems based on deep machine learning, in particular on deep neural networks, have become widespread. As a result, research on protection against recognition by such systems has become relevant. This article proposes a method for designing a specially selected type of camouflage, applied to clothing, that protects a person from recognition both by a human observer and by a deep neural network recognition system. This type of camouflage is constructed from adversarial examples generated by a deep neural network. The article describes experiments on protecting people from recognition by the Faster-RCNN (Region-based Convolutional Neural Network) Inception V2 and Faster-RCNN ResNet101 systems. The implementation of the camouflage is considered at the macro level, which assesses the combination of the camouflage and the background, and at the micro level, which analyzes the relationship between the properties of individual regions of the camouflage and the properties of adjacent regions, with constraints on their continuity, smoothness, closure, and asymmetry. The dependence of the camouflage characteristics on the conditions under which the object and the environment are observed is also considered: the transparency of the atmosphere, the pixel intensity of the sky horizon and the background, the level of contrast between the background and the camouflaged object, and the distance to the object. As an example of a possible attack, a "black box" attack is considered, which involves preliminary testing of generated adversarial examples on a target recognition system without knowledge of the internal structure of that system. The results of these experiments showed the high efficiency of the proposed method in the virtual world, where there is access to every pixel of the image supplied to the input of the systems. In the real world, the results are less impressive, which can be explained by the distortion of colors when printing on fabric, as well as the limited spatial resolution of the print.


2020 ◽  
Vol 119 ◽  
pp. 12-23
Author(s):  
Kumud Tripathi ◽  
M. Kiran Reddy ◽  
K. Sreenivasa Rao

The iris is the distinctively patterned annular part of the human eye that is visible from a distance. Iris recognition is useful for identifying individuals, and iris recognition systems are used in a number of applications. Most countries use biometric systems for security purposes, such as airport boarding, customs clearance, and entrance to gatherings. The Indian government also uses biometric systems to identify citizens in various applications, such as ration shops, the Aadhaar project, government examination forms, and registration departments. Conventional iris recognition systems use near-infrared (NIR) sensors to acquire images of the iris. With this method, however, the iris can be acquired only at distances of less than 1 meter. Over the last several years, there have been various efforts to design and build iris recognition systems that operate at longer distances, ranging from 1 meter to 60 meters. Such long-range iris recognition and acquisition systems therefore open up better applications for users. In this paper, an effective technique for iris recognition is presented to identify the individual. It uses an iris-recognition-at-a-distance (IAAD) system and state-of-the-art design methods to review the iris recognition system. The primary aim of this article is to analyze the significance and applications of IAAD systems with respect to human recognition, to review existing IAAD frameworks, to compare the different methods already implemented in the literature, and to improve IAAD accuracy.


2019 ◽  
Vol 22 (1) ◽  
pp. 157-168
Author(s):  
K. E. Manjunath ◽  
Dinesh Babu Jayagopi ◽  
K. Sreenivasa Rao ◽  
V. Ramasubramanian

2016 ◽  
Vol 41 (4) ◽  
pp. 669-682
Author(s):  
Gábor Gosztolya ◽  
András Beke ◽  
Tilda Neuberger ◽  
László Tóth

Laughter is one of the most important paralinguistic events, and it has specific roles in human conversation. The automatic detection of laughter occurrences in human speech can aid automatic speech recognition systems as well as some paralinguistic tasks such as emotion detection. In this study we apply Deep Neural Networks (DNN) for laughter detection, as this technology is nowadays considered state-of-the-art in similar tasks like phoneme identification. We carry out our experiments using two corpora containing spontaneous speech in two languages (Hungarian and English). Also, as we find it reasonable that not all frequency regions are required for efficient laughter detection, we perform feature selection to find a sufficient feature subset.


Author(s):  
V. Jagan Naveen ◽  
K. Krishna Kishore ◽  
P. Rajesh Kumar

In the modern world, human recognition systems play an important role in improving security by reducing the chances of evasion. The human ear can be used for person identification. In an empirical study of the human ear, 10,000 images were taken to establish the uniqueness of the ear. An ear-based system is one of the few biometric systems that can provide characteristics that remain stable with age. In this paper, ear images are taken from the mathematical analysis of images (AMI) ear database, and ear pattern recognition is analysed using the Expectation-Maximization algorithm and the k-means algorithm. Patterns of ears affected by different types of noise are recognized using the Principal Component Analysis (PCA) algorithm.
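The abstract names the k-means algorithm without describing how it is applied. As a rough illustration only, the sketch below runs plain k-means (Lloyd's iteration) on toy two-dimensional points. The function names, the choice of the first k points as initial centroids, and the toy data are all assumptions made for this example; the paper's ear features and settings are not given in the abstract.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iterations=100):
    """Plain k-means: alternate nearest-centroid assignment and mean update."""
    # Assumption for reproducibility: the first k points seed the centroids.
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c]))
                      for p in points]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, new_labels) if label == c]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_labels == labels and new_centroids == centroids:
            break  # converged: nothing changed in this iteration
        labels, centroids = new_labels, new_centroids
    return labels, centroids


# Toy data: two well-separated groups of two points each.
labels, centroids = kmeans([(0, 0), (0, 1), (10, 10), (10, 11)], k=2)
print(labels)     # [0, 0, 1, 1]
print(centroids)  # [(0.0, 0.5), (10.0, 10.5)]
```

The result of k-means depends on the initial centroids, so practical systems usually run it from several random starts and keep the solution with the lowest total within-cluster distance.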

