Text Independent Amharic Language Speaker Identification in Noisy Environments using Speech Processing Techniques

Author(s):  
Abrham Debasu Mengistu ◽  
Dagnachew Melesew Alemayehu

<p>In Ethiopia, the largest ethnic and linguistic groups are the Oromos, Amharas and Tigrayans. This paper presents a performance analysis of a text-independent speaker identification system for the Amharic language in noisy environments. VQ (Vector Quantization), GMM (Gaussian Mixture Models), BPNN (Back Propagation Neural Network), MFCC (Mel-Frequency Cepstral Coefficients), GFCC (Gammatone Frequency Cepstral Coefficients), and a hybrid approach were used to identify speakers of Amharic in noisy environments. For the identification process, speech signals were collected from speakers of both sexes; the data set comprises speech samples from 90 speakers, each sample 10 seconds long. On these speakers, accuracies of 59.2%, 70.9% and 84.7% were achieved when VQ, GMM and BPNN, respectively, were applied to the combined MFCC and GFCC feature vector. </p>
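As a rough illustration of the VQ branch of the abstract above, the sketch below identifies a speaker by picking the codebook with the lowest average distortion. It assumes that the "combined feature vector" means frame-wise concatenation of MFCC and GFCC coefficients, which the abstract does not confirm; the function names, codebooks and feature values are hypothetical toy data, not the authors' implementation.

```python
def combine_features(mfcc_frames, gfcc_frames):
    """Concatenate MFCC and GFCC coefficients frame by frame (assumed combination rule)."""
    return [m + g for m, g in zip(mfcc_frames, gfcc_frames)]


def nearest_distortion(vector, codebook):
    """Squared Euclidean distance from one feature vector to its closest codeword."""
    return min(
        sum((v - c) ** 2 for v, c in zip(vector, codeword))
        for codeword in codebook
    )


def average_distortion(features, codebook):
    """Mean quantization distortion of all frames against one speaker's codebook."""
    return sum(nearest_distortion(f, codebook) for f in features) / len(features)


def identify_speaker(features, codebooks):
    """Return the speaker whose codebook quantizes the frames with least distortion."""
    return min(codebooks, key=lambda name: average_distortion(features, codebooks[name]))


# Hypothetical codebooks; in practice they would be trained per speaker (e.g. with k-means).
codebooks = {
    "speaker_a": [[0.0, 0.0], [1.0, 1.0]],
    "speaker_b": [[5.0, 5.0], [6.0, 6.0]],
}
# One MFCC value and one GFCC value per frame, purely for illustration.
test_features = combine_features([[0.1], [0.9], [0.2]], [[-0.1], [1.2], [0.1]])
print(identify_speaker(test_features, codebooks))
```

The same decision rule (score the test frames against every enrolled speaker and keep the best) carries over to the GMM case, with log-likelihood replacing distortion.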

2021 ◽  
Vol 17 (12) ◽  
pp. 151-164
Author(s):  
Abdelouahed Ait Ider ◽  
Said Nouri ◽  
Abdelkrim Maarir

Segmentation and recognition techniques for printed Arabic script vary from one font to another, because each font has its own calligraphic and structural properties. Most segmentation systems struggle to segment words or sub-words into characters because they apply a single algorithm to every Arabic printed font, style and size. The goal of this work is to build a word or sub-word Optical Font Arabic Recognition (OFAR) system for different font sizes and styles of printed Arabic script, so that it can be integrated into a global Arabic Optical Character Recognition (AOCR) system to select a suitable segmentation algorithm. The APTI database was used to extract the last ten pixels of each word or sub-word and build a new database; OFAR is based on this new database and on our extraction approach, the Pixels Continuity (PC) algorithm, applied in different matrix directions together with some histogram statistics to extract 20 features. Three KNN classifiers with K=5, using the Cityblock, Euclidean and Correlation distances and combined by majority vote, are used to evaluate the system's robustness. This classifier is compared first with a Back Propagation Neural Network and the Steerable Pyramid (SP) algorithm for recognizing three font families, and then with Gaussian Mixture Models (GMMs) for recognizing font and size. The average recognition rates obtained were 99.55% for font and size and 98.17% for font, size and style.
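The voting scheme described above (three K=5 nearest-neighbour classifiers, one per distance, combined by majority vote) can be sketched as follows. This is a minimal illustration under assumptions: the feature vectors and font labels are hypothetical toy data with 3 features instead of the 20 in the abstract, and ties are broken by first occurrence, which the abstract does not specify.

```python
import math
from collections import Counter


def cityblock(a, b):
    """Sum of absolute coordinate differences (L1 distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean(a, b):
    """Straight-line (L2) distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def correlation(a, b):
    """Correlation distance: 1 minus the Pearson correlation of the two vectors."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    norm = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if norm == 0.0:
        return 1.0  # assumption: treat constant vectors as uncorrelated
    return 1.0 - sum(x * y for x, y in zip(da, db)) / norm


def knn_predict(train, query, distance, k=5):
    """Label chosen by the k nearest training samples under one distance."""
    nearest = sorted(train, key=lambda item: distance(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def vote_predict(train, query, k=5):
    """Majority vote over the three single-distance KNN classifiers."""
    votes = [knn_predict(train, query, d, k) for d in (cityblock, euclidean, correlation)]
    return Counter(votes).most_common(1)[0][0]


# Hypothetical training set: (feature vector, font label).
train = [
    ([1.0, 2.0, 3.0], "font_a"), ([1.1, 2.1, 3.2], "font_a"), ([0.9, 2.0, 2.9], "font_a"),
    ([1.2, 1.9, 3.1], "font_a"), ([1.0, 2.2, 3.0], "font_a"),
    ([9.0, 8.0, 1.0], "font_b"), ([9.1, 8.2, 1.1], "font_b"), ([8.9, 7.9, 0.8], "font_b"),
    ([9.2, 8.1, 1.2], "font_b"), ([9.0, 8.0, 0.9], "font_b"),
]
query = [1.05, 2.05, 3.05]
print(vote_predict(train, query))
```

Using three distances gives the vote some diversity: Cityblock and Euclidean compare magnitudes, while the correlation distance compares only the shape of the feature profile.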


2021 ◽  
Vol 38 (6) ◽  
pp. 1793-1799
Author(s):  
Shivaprasad Satla ◽  
Sadanandam Manchala

Dialect identification is the process of identifying the dialects of a particular standard language. Telugu is one of the historical and important languages. Like any other language, Telugu has dialects, chiefly three: Telangana, Costa Andhra and Rayalaseema. Research on dialect identification is scarce compared with language identification because of the dearth of databases. In any dialect identification system, the database and feature engineering play vital roles because most words are similar in pronunciation. Most researchers apply statistical approaches such as the Hidden Markov Model (HMM) and the Gaussian Mixture Model (GMM) to speech processing applications. Today, however, neural networks play a vital role in all application domains and produce good results. Deep Neural Networks (DNNs) are one type of neural network and achieve state-of-the-art performance in several fields, such as speech recognition and speaker identification. In this work, a DNN-based Multilayer Perceptron model is used to identify the regional dialects of Telugu using enhanced Mel Frequency Cepstral Coefficient (MFCC) features. For this purpose, a database of Telugu dialects with a duration of 5 h 45 min was created from different speakers in different environments. The results of the DNN model are compared with those of the HMM and GMM models, and the DNN model is observed to provide good performance.


2020 ◽  
Vol 27 (1) ◽  
pp. 291-298
Author(s):  
Shoukai Chen ◽  
Yongqiwen Fu ◽  
Lei Guo ◽  
Shifeng Yang ◽  
Yajing Bie

A data set of cemented sand and gravel (CSG) mix proportions and 28-day compressive strength was established, with outliers identified and removed using a boxplot. The distribution of CSG compressive strength was then analyzed using skewness-kurtosis and single-sample Kolmogorov-Smirnov tests. Using Python, a model based on a Back Propagation neural network was built to predict the compressive strength of CSG from its mix proportion. The results showed that the compressive strength follows a normal distribution, with an expected value and variance of 5.471 MPa and 3.962 MPa respectively, and that the average relative error was 7.16%, indicating that the compressive strength of CSG is predictable and correlated with the mix proportion.
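Two of the steps named above, boxplot-based outlier removal and the average relative error, are simple enough to sketch. The version below assumes the conventional 1.5 × IQR whisker rule and Python's default quartile method, neither of which the abstract specifies; the strength values are hypothetical, not the study's data.

```python
import statistics


def remove_boxplot_outliers(values, whisker=1.5):
    """Drop values outside [Q1 - whisker*IQR, Q3 + whisker*IQR] (assumed boxplot rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - whisker * iqr, q3 + whisker * iqr
    return [v for v in values if low <= v <= high]


def average_relative_error(actual, predicted):
    """Mean of |predicted - actual| / actual over all samples."""
    return sum(abs(p - a) / a for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical 28-day compressive strengths in MPa; 30.0 is an obvious outlier.
strengths = [4.0, 5.0, 5.5, 6.0, 6.5, 7.0, 30.0]
cleaned = remove_boxplot_outliers(strengths)
print(cleaned)

# Hypothetical measured versus predicted strengths.
print(average_relative_error([5.0, 4.0], [5.5, 3.8]))
```

Different quartile conventions move the fences slightly, so the set of removed points can differ between tools on small samples.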


2015 ◽  
Vol 785 ◽  
pp. 14-18 ◽  
Author(s):  
Badar ul Islam ◽  
Zuhairi Baharudin ◽  
Perumal Nallagownden

Although Back Propagation Neural Networks are frequently used to forecast short-term electricity load, this training algorithm is criticized for its slow and improper convergence and poor generalization. There is a great need to explore techniques that can overcome these limitations and improve forecast accuracy. In this paper, an improved BP neural network training algorithm is proposed that hybridizes simulated annealing and a genetic algorithm (SA-GA). This hybrid approach integrates the powerful local search capability of simulated annealing with the near-accurate global search performance of the genetic algorithm. The proposed technique has shown better results in terms of load forecast accuracy and faster convergence. Five years of ISO New England data are used to develop a case study that validates the efficacy of the proposed technique.


Author(s):  
Musab T. S. Al-Kaltakchi ◽  
Haithem Abd Al-Raheem Taha ◽  
Mohanad Abd Shehab ◽  
Mohamed A.M. Abdullah

<p><span lang="EN-GB">In this paper, different feature extraction and feature normalization methods are investigated for speaker recognition. To give a good representation of acoustic speech signals, Power Normalized Cepstral Coefficients (PNCCs) and Mel Frequency Cepstral Coefficients (MFCCs) are employed for feature extraction. Then, to mitigate the effect of the linear channel, Cepstral Mean-Variance Normalization (CMVN) and feature warping are utilized. The paper investigates a text-independent speaker identification system using 16 coefficients from both the MFCC and PNCC features. Eight speakers, two female and six male, are selected from the GRID-Audiovisual database. The speakers are modeled by coupling the Universal Background Model with Gaussian Mixture Models (GMM-UBM) in order to obtain fast scoring and better performance. The system achieves 100% speaker identification accuracy. The results show that PNCC features perform better than MFCC features in identifying female speakers compared with male speakers. Furthermore, feature warping performed better than the CMVN method. </span></p>
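For readers unfamiliar with CMVN, the sketch below shows the usual per-utterance form: each cepstral coefficient is shifted to zero mean and scaled to unit variance across the frames of an utterance. Whether the paper normalizes per utterance or over a sliding window is not stated, so the per-utterance choice, the `eps` guard and the toy frames are assumptions.

```python
import math


def cmvn(frames, eps=1e-10):
    """Normalize each coefficient to zero mean and unit variance across frames."""
    n = len(frames)
    dims = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dims)]
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / n)
        for d in range(dims)
    ]
    # eps avoids division by zero when a coefficient is constant over the utterance.
    return [
        [(f[d] - means[d]) / (stds[d] + eps) for d in range(dims)]
        for f in frames
    ]


# Hypothetical utterance: 3 frames with 2 cepstral coefficients each.
frames = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
normalized = cmvn(frames)
print(normalized)
```

Subtracting the mean removes a constant offset in the cepstral domain, which is how a fixed linear channel appears; the variance scaling additionally equalizes the dynamic range of the coefficients.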


2012 ◽  
Vol 263-266 ◽  
pp. 2173-2178
Author(s):  
Xin Guang Li ◽  
Min Feng Yao ◽  
Li Rui Jian ◽  
Zhen Jiang Li

A probabilistic neural network (PNN) speech recognition model based on the partition clustering algorithm is proposed in this paper. The most important advantage of a PNN is that training is easy and instantaneous, so it is capable of real-time speech recognition. Because the selection of the data set is one of the most important factors in PNN performance, this paper proposes using the partition clustering algorithm to select the data. The proposed model is tested on two data sets of spoken Arabic numbers, with promising results. Its performance is compared with that of a single back propagation neural network and an integrated back propagation neural network. The final comparison shows that the proposed model performs better than the other two neural networks, with an accuracy rate of 92.41%.
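The claim that PNN training is "easy and instantaneous" follows from its structure: training only stores the samples, and classification averages a Gaussian kernel over the stored samples of each class. A minimal sketch, assuming a shared smoothing parameter `sigma` and hypothetical toy vectors (the partition clustering step that selects the stored samples is not shown):

```python
import math
from collections import defaultdict


def pnn_classify(train, query, sigma=1.0):
    """Return the class with the highest average Gaussian kernel response."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for vector, label in train:
        sq_dist = sum((x - q) ** 2 for x, q in zip(vector, query))
        sums[label] += math.exp(-sq_dist / (2.0 * sigma ** 2))
        counts[label] += 1
    return max(sums, key=lambda label: sums[label] / counts[label])


# Hypothetical stored patterns: (feature vector, spoken-digit label).
train = [
    ([0.0, 0.0], "zero"), ([0.2, 0.1], "zero"),
    ([3.0, 3.0], "one"), ([3.1, 2.9], "one"),
]
print(pnn_classify(train, [0.1, 0.0]))
```

Because every stored sample is visited at classification time, reducing the stored set (as the clustering-based selection in the abstract aims to do) directly reduces the cost per decision.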


2011 ◽  
Vol 2011 ◽  
pp. 1-8 ◽  
Author(s):  
Phaklen EhKan ◽  
Timothy Allen ◽  
Steven F. Quigley

In today's society, highly accurate personal identification systems are required. Passwords or PINs can be forgotten or forged and are no longer considered to offer a high level of security. The use of biological features, biometrics, is becoming widely accepted as the next level for security systems. Biometric-based speaker identification is a method of identifying persons from their voice. Speaker-specific characteristics exist in speech signals because different speakers have different resonances of the vocal tract. These differences can be exploited by extracting feature vectors such as Mel-Frequency Cepstral Coefficients (MFCCs) from the speech signal. A well-known statistical modelling process, the Gaussian Mixture Model (GMM), then models the distribution of each speaker's MFCCs in a multidimensional acoustic space. The GMM-based speaker identification system has features that make it promising for hardware acceleration. This paper describes the hardware implementation for classification of a text-independent GMM-based speaker identification system. The aim was to produce a system that can perform simultaneous identification of large numbers of voice streams in real time. This has important potential applications in security and in automated call centres. A speedup factor of ninety was achieved compared to a software implementation on a standard PC.
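The classification step that such a system accelerates can be sketched in software as follows: each speaker is a GMM, the test frames are scored against every model, and the speaker with the highest total log-likelihood wins. The sketch assumes diagonal covariances and uses hypothetical model parameters; it illustrates the general GMM decision rule, not the paper's hardware design.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(frames, gmm):
    """Total log-likelihood of all frames under one GMM of (weight, mean, var) components."""
    total = 0.0
    for x in frames:
        logs = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in gmm]
        peak = max(logs)
        # Log-sum-exp keeps the mixture sum numerically stable.
        total += peak + math.log(sum(math.exp(value - peak) for value in logs))
    return total


def identify(frames, models):
    """Return the speaker whose GMM gives the frames the highest log-likelihood."""
    return max(models, key=lambda name: gmm_log_likelihood(frames, models[name]))


# Hypothetical two-dimensional models; real systems use many MFCCs and components.
models = {
    "speaker_a": [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [1.0, 1.0], [1.0, 1.0])],
    "speaker_b": [(1.0, [5.0, 5.0], [1.0, 1.0])],
}
frames = [[0.2, 0.1], [0.9, 1.1]]
print(identify(frames, models))
```

Each frame, component and speaker model is scored independently before the final sums and the maximum, which is the kind of regular, parallel arithmetic that maps well to hardware.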

