Evaluation of Phonetic System for Speech Recognition on Smartphone

This paper presents a detailed study and performance evaluation of a phonetic system by comparing it with several classification techniques for automatic speech recognition (ASR): Neural Network, Hidden Markov Model (HMM), Support Vector Machine (SVM) and Gaussian Mixture Model (GMM). In the phonetic system, recognized speech is post-processed using language processing, i.e., phoneme matching, which yields more accurate output text. The speech recognition accuracy of the ASR classifiers and the phonetic system is evaluated on day-to-day human-to-machine communication recorded with high-quality equipment, while the enhancement of existing systems is evaluated on everyday Android phones for normal conversations in Hindi and English. After the speech signal is fragmented, a classifier is used to classify the resulting phonemes or words. Different classification techniques are implemented and their recognition accuracies compared. GMM is found to be better at classifying the signal data: the performance evaluation shows that GMM outperforms the other three classifiers in accuracy by more than 20%. This result is then compared with the implemented phonetic system, which shows that ASR accuracy using the phonetic system is better than with GMM. We observed a 6% improvement in ASR accuracy with the phonetic system.
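The abstract does not describe how the phonetic system matches phonemes. As an illustration only, the following sketch assumes the recognizer emits a phoneme sequence per word and that a pronunciation lexicon is available; it maps the recognized sequence to the lexicon word with the smallest Levenshtein (edit) distance. `edit_distance`, `match_word` and the toy `LEXICON` are hypothetical names, not the authors' implementation.

```python
from typing import Dict, List, Sequence, Tuple


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two phoneme sequences (unit costs)."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, pb in enumerate(b, start=1):
            curr[j] = min(
                prev[j] + 1,               # delete pa
                curr[j - 1] + 1,           # insert pb
                prev[j - 1] + (pa != pb),  # substitute (free if phonemes match)
            )
        prev = curr
    return prev[len(b)]


def match_word(phonemes: List[str], lexicon: Dict[str, List[str]]) -> Tuple[str, int]:
    """Return (word, distance) for the lexicon entry closest to `phonemes`."""
    return min(
        ((word, edit_distance(phonemes, pron)) for word, pron in lexicon.items()),
        key=lambda pair: pair[1],
    )


# Hypothetical toy lexicon with ARPAbet-style symbols; not taken from the paper.
LEXICON = {
    "call": ["K", "AO", "L"],
    "cool": ["K", "UW", "L"],
    "mother": ["M", "AH", "DH", "ER"],
}

if __name__ == "__main__":
    # Noisy recognizer output: D was recognized where DH was spoken.
    print(match_word(["M", "AH", "D", "ER"], LEXICON))  # ('mother', 1)
```

With unit costs, ties are broken by lexicon order because `min` keeps the first minimal entry; a practical system would more likely weight substitutions by acoustic confusability and combine the score with a language model.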

Author(s): D. Wang, M. Hollaus, N. Pfeifer

Classification of the wood and leaf components of trees is an essential prerequisite for deriving vital tree attributes, such as wood mass, leaf area index (LAI) and woody-to-total area. Laser scanning has emerged as a promising solution for this task. Intensity-based approaches have been widely proposed, since different components of a tree can exhibit discriminative optical properties at the operating wavelengths of a sensor system. For geometry-based methods, machine learning algorithms are often used to separate wood and leaf points, given proper training samples. However, it remains unclear how the choice of machine learning classifier and features influences the classification results. To this end, we compare four popular machine learning classifiers, namely Support Vector Machine (SVM), Naïve Bayes (NB), Random Forest (RF), and Gaussian Mixture Model (GMM), for separating wood and leaf points in terrestrial laser scanning (TLS) data. Two trees, an Erythrophleum fordii and a Betula pendula (silver birch), are used to test the impact of the classifier, feature set, and training samples. Our results show that RF is the best model in terms of accuracy, and that local density related features are important. Experimental results confirm the feasibility of machine learning algorithms for the reliable classification of wood and leaf points. Note that our study is based on isolated trees; further tests should be performed on more tree species and on data from more complex environments.
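The abstract reports that local density related features matter but does not define them. A minimal sketch, assuming "local density" means the number of neighbouring points within a fixed radius of each point; the function name `local_density` and the radius value are illustrative assumptions, and the brute-force search stands in for the spatial index a real TLS point cloud would need.

```python
import numpy as np


def local_density(points: np.ndarray, radius: float) -> np.ndarray:
    """Count, for each 3-D point, how many *other* points lie within `radius`.

    Brute-force O(n^2) pairwise distances: fine for a toy cloud, but a k-d tree
    or similar index would be required for millions of TLS points.
    """
    diff = points[:, None, :] - points[None, :, :]  # (n, n, 3) pairwise offsets
    dist = np.sqrt((diff ** 2).sum(axis=-1))         # (n, n) Euclidean distances
    within = dist <= radius
    return within.sum(axis=1) - 1                     # subtract the point itself


if __name__ == "__main__":
    cloud = np.array([
        [0.00, 0.00, 0.00],
        [0.01, 0.00, 0.00],
        [0.00, 0.01, 0.00],
        [0.00, 0.00, 0.01],
        [1.00, 1.00, 1.00],  # isolated point
    ])
    print(local_density(cloud, radius=0.05))  # [3 3 3 3 0]
```

Per-point values like these would be stacked with other geometric features into a feature matrix for a classifier such as RF; computing them at several radii is a common way to capture different scales.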


2019, Vol 33 (35), pp. 1950438
Author(s): Manish Gupta, Shambhu Shankar Bharti, Suneeta Agarwal

Speech is a convenient medium of communication among human beings. Speaker recognition is the process of automatically recognizing a speaker by processing the information contained in the speech signal. In this paper, a new two-level approach to speaker recognition from the speech signal is proposed. At the first level, the gender of the speaker is recognized; at the second level, the speaker is recognized based on the gender identified at the first level. Once the gender is recognized, the search space for the second level is reduced by half, since the system searches only the set of speech signals belonging to the identified gender. To identify gender, the gender-specific features Mel Frequency Cepstral Coefficients (MFCC) and pitch are used. The speaker is recognized using the speaker-specific features MFCC, pitch and RASTA-PLP. Support Vector Machine (SVM) and Gaussian Mixture Model (GMM) classifiers are used for identifying the gender and recognizing the speaker, respectively. Experiments are performed on speech signals from two databases: “IIT-Madras speech synthesis and recognition” (containing speech samples spoken in English by eight male and eight female speakers from eight different regions) and “ELSDSR” (containing speech samples spoken in English by five male and five female speakers). Experimentally, the two-level approach reduces the time taken for speaker recognition by 30–32% compared with recognizing the speaker without first identifying the gender (the single-level approach). The accuracy of speaker recognition also improves from 99.7% to 99.9% compared with the single-level approach. It is concluded from the experiments that a speech signal with a minimum duration of 1.12 (after discarding silent parts) is sufficient for recognizing the speaker.
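The two-level search can be sketched as follows, under stated assumptions: a gender classifier runs first, and only the speaker models of the predicted gender are scored. The pitch threshold in `by_pitch` and the distance-based `make_scorer` are hypothetical stand-ins for the paper's SVM and per-speaker GMMs, and `two_level_identify` and `MODELS` are illustrative names.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Features = Sequence[float]
Scorer = Callable[[Features], float]


def two_level_identify(
    features: Features,
    classify_gender: Callable[[Features], str],
    speaker_models: Dict[str, Tuple[str, Scorer]],
) -> Tuple[str, int]:
    """Level 1: predict gender. Level 2: score only the speakers of that gender.

    `speaker_models` maps speaker id -> (gender, scorer); a higher score means a
    better match (for a GMM this would be the utterance log-likelihood).
    Returns (best speaker id, number of speaker models that were scored).
    """
    gender = classify_gender(features)
    candidates = {spk: scorer for spk, (g, scorer) in speaker_models.items() if g == gender}
    if not candidates:
        raise ValueError(f"no enrolled speakers of gender {gender!r}")
    best = max(candidates, key=lambda spk: candidates[spk](features))
    return best, len(candidates)


def make_scorer(mean: List[float]) -> Scorer:
    """Hypothetical stand-in for a speaker GMM: negative squared distance to a mean."""
    return lambda x: -sum((a - b) ** 2 for a, b in zip(x, mean))


def by_pitch(x: Features) -> str:
    """Hypothetical stand-in for the SVM gender classifier: threshold on pitch in Hz."""
    return "F" if x[0] > 165.0 else "M"


# Toy enrolled speakers: feature vector = [pitch in Hz, one other feature].
MODELS = {
    "m1": ("M", make_scorer([120.0, 1.0])),
    "m2": ("M", make_scorer([110.0, 3.0])),
    "f1": ("F", make_scorer([210.0, 2.0])),
    "f2": ("F", make_scorer([230.0, 0.5])),
}

if __name__ == "__main__":
    print(two_level_identify([212.0, 1.8], by_pitch, MODELS))  # ('f1', 2)
```

Only two of the four models are scored, which is where the time saving comes from; the trade-off is that a first-level gender error cannot be recovered at the second level.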


Author(s): Ergün Yücesoy

In this study, the classification of speakers by age and gender is discussed. Age and gender classes were first examined separately and then combined into a classification with a total of 7 classes. Speech signals represented by Mel-Frequency Cepstral Coefficients (MFCC) and delta parameters were converted into Gaussian Mixture Model (GMM) mean supervectors and classified with a Support Vector Machine (SVM). The GMM mean supervectors were formed using a Maximum-a-Posteriori (MAP) adapted GMM-Universal Background Model (UBM) configuration; the number of components was varied from 16 to 512 and the optimum number of components was determined. On the aGender dataset, the system's gender classification accuracy was 99.02% for two classes and 92.58% for three classes, and its age group classification accuracy was 67.03% for females and 63.79% for males. Classifying age and gender classes together in a single step yielded an accuracy of 61.46%. The study also proposes a two-level approach for classifying age and gender classes together: speakers are first divided into three classes (child, male and female), and males and females are then classified by age group, realizing a 7-class classification. This two-level approach increased the classification accuracy in all cases except when 32-component GMMs were used. The highest improvement, 2.45%, was achieved with 64-component GMMs, while an improvement of 0.79% was achieved with 256-component GMMs.
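A GMM mean supervector can be sketched with the standard relevance-MAP adaptation of UBM means: compute each frame's posterior over the UBM components, accumulate soft counts and first-order statistics, and interpolate toward the UBM means. This assumes diagonal covariances and a relevance factor of 16, a common choice that the abstract does not state; `map_mean_supervector` is a hypothetical name, and the SVM stage is not shown.

```python
import numpy as np


def map_mean_supervector(
    frames: np.ndarray,       # (T, D) feature frames, e.g. MFCC + deltas
    weights: np.ndarray,      # (K,) UBM mixture weights
    means: np.ndarray,        # (K, D) UBM component means
    variances: np.ndarray,    # (K, D) UBM diagonal covariances
    relevance: float = 16.0,  # MAP relevance factor r (assumed value)
) -> np.ndarray:
    """MAP-adapt the UBM means to one utterance and stack them into a (K*D,) vector."""
    # Log-density of every frame under every diagonal Gaussian: shape (T, K).
    diff = frames[:, None, :] - means[None, :, :]
    log_gauss = -0.5 * (
        np.sum(diff ** 2 / variances[None, :, :], axis=-1)
        + np.sum(np.log(2.0 * np.pi * variances), axis=-1)[None, :]
    )
    log_post = np.log(weights)[None, :] + log_gauss
    # Normalize to posteriors (responsibilities); subtracting the row max keeps exp stable.
    log_post -= log_post.max(axis=1, keepdims=True)
    resp = np.exp(log_post)
    resp /= resp.sum(axis=1, keepdims=True)

    n_k = resp.sum(axis=0)          # (K,) soft frame counts per component
    first_order = resp.T @ frames   # (K, D) sum over t of gamma_k(t) * x_t
    # mu_k' = (sum_t gamma_k(t) x_t + r * mu_k) / (n_k + r); safe even when n_k == 0.
    adapted = (first_order + relevance * means) / (n_k + relevance)[:, None]
    return adapted.reshape(-1)


if __name__ == "__main__":
    ubm_w = np.array([0.5, 0.5])
    ubm_mu = np.array([[0.0, 0.0], [100.0, 100.0]])
    ubm_var = np.ones((2, 2))
    utterance = np.tile([1.0, 1.0], (16, 1))  # 16 identical frames near component 0
    print(map_mean_supervector(utterance, ubm_w, ubm_mu, ubm_var))  # [  0.5   0.5 100.  100. ]
```

Components that see many frames move toward the utterance data, while unused components keep their UBM means. The supervector length is K times D, so raising the component count from 16 to 512 grows the SVM input dimension 32-fold.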


2016, Vol 2016, pp. 1-10
Author(s): Manana Khachidze, Magda Tsintsadze, Maia Archuadze

According to the Ministry of Labor, Health and Social Affairs of Georgia, a new health management system is to be introduced in the near future. This raises the problem of structuring and classifying documents containing the full history of medical services provided. The present work introduces a tool for classifying medical records written in the Georgian language; it is the first attempt at such a classification of Georgian-language medical records. In total, 24,855 examination records were studied. The documents were classified into three main groups (ultrasonography, endoscopy, and X-ray) and 13 subgroups using two well-known methods: Support Vector Machine (SVM) and K-Nearest Neighbor (KNN). The results demonstrate that both machine learning methods performed successfully, with a slight advantage for SVM. During classification, a “shrink” method based on feature selection was introduced and applied. At the first stage of classification, the results of the “shrink” case were better; however, at the second stage, classification into subclasses, 23% of all documents could not be assigned to a single subclass (liver or biliary system) because of features shared by these subclasses. Overall, the results of the study were successful.
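For readers unfamiliar with KNN text classification, the following is a minimal sketch, assuming whitespace tokenization, raw term counts and cosine similarity; the paper's actual preprocessing of Georgian text and its “shrink” feature selection are not described in the abstract and are not reproduced. The English training snippets in `TRAIN` are invented toy data, and `cosine` and `knn_classify` are hypothetical names.

```python
import math
from collections import Counter
from typing import List, Tuple


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def knn_classify(text: str, train: List[Tuple[str, str]], k: int = 3) -> str:
    """Label `text` by majority vote among its k most cosine-similar training documents."""
    query = Counter(text.lower().split())
    scored = sorted(
        ((cosine(query, Counter(doc.lower().split())), label) for doc, label in train),
        reverse=True,
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


# Invented toy records for the three main groups; not data from the study.
TRAIN = [
    ("liver echogenicity normal gallbladder without stones", "ultrasonography"),
    ("kidney size normal no hydronephrosis on ultrasound", "ultrasonography"),
    ("gastric mucosa hyperemic biopsy taken during endoscopy", "endoscopy"),
    ("esophagus mucosa intact duodenum normal endoscopy", "endoscopy"),
    ("chest x-ray lung fields clear no infiltrates", "x-ray"),
    ("x-ray of the knee joint space narrowing", "x-ray"),
]

if __name__ == "__main__":
    print(knn_classify("endoscopy shows hyperemic gastric mucosa", TRAIN))  # endoscopy
```

KNN needs no training step but compares each query with every stored record, whereas an SVM learns a decision boundary once; on the study's roughly 25,000 records, an inverted index or TF-IDF weighting would be natural refinements.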


2020, Vol 2 (4)
Author(s): Alex Mathew

The paper reviews how human-centered artificial intelligence and security primitives have influenced life in the modern world and how they will be useful in the future. Human-centered A.I. has enhanced our capabilities through intelligent, human-informed technology, enabling machines and computers to carry out their functions intelligently. Security primitives have enhanced the safety of data and increased its accessibility from anywhere, regardless of whether a password is known. This has improved personalized customer activities and narrowed the gap between humans and machines. This success is due to the use of heuristics, which solve problems through experimentation; support vector machines, which evaluate and group data; and natural language processing systems, which convert speech into language. These techniques lead to applications such as image recognition, games, speech recognition, translation, and question answering. In conclusion, human-centered A.I. and security primitives are an advanced form of technology that uses statistical and mathematical models to provide tools for performing certain tasks. These technologies continue to advance and spread over the years and will become common in our lives.

