Dialect Identification in Telugu Language Speech Utterance Using Modified Features with Deep Neural Network

Shivaprasad Satla; Sadanandam Manchala

doi:10.18280/ts.380623

Traitement du signal ◽

10.18280/ts.380623 ◽

2021 ◽

Vol 38 (6) ◽

pp. 1793-1799

Author(s):

Shivaprasad Satla ◽

Sadanandam Manchala

Keyword(s):

Neural Network ◽

Neural Networks ◽

Speech Processing ◽

Deep Neural Network ◽

Speaker Identification ◽

Research Work ◽

Gaussian Mixture ◽

Vital Role ◽

Identification System ◽

Mel Frequency Cepstral Coefficients

Dialect identification is the process of identifying the dialects of a particular standard language. Telugu is one of the historically important languages and, like any other language, has dialects, mainly three: Telangana, Coastal Andhra, and Rayalaseema. Research on dialect identification is scarce compared with language identification because of the dearth of databases. In any dialect identification system, the database and feature engineering play vital roles because most words are similar in pronunciation. Most researchers apply statistical approaches such as the Hidden Markov Model (HMM) and the Gaussian Mixture Model (GMM) to speech processing applications. Today, however, neural networks play a vital role in all application domains and produce good results. One type of neural network, the Deep Neural Network (DNN), achieves state-of-the-art performance in several fields, such as speech recognition and speaker identification. In this work, a DNN-based model, the Multilayer Perceptron, is used to identify the regional dialects of Telugu from enhanced Mel Frequency Cepstral Coefficients (MFCC) features. For this purpose, a database of Telugu dialects with a duration of 5 h 45 min was created, collected from different speakers in different environments. The results produced by the DNN model are compared with those of the HMM and GMM models, and the DNN model is observed to provide good performance.
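
As a rough illustration of the classification step, the sketch below runs a forward pass of a small multilayer perceptron that maps a feature vector to one of the three dialects. It is a minimal sketch, not the authors' model: the layer sizes, the weights, the ReLU activation, and the names `mlp_predict` and `DIALECTS` are all assumptions made for illustration, and the "enhanced MFCC" feature extraction is not reproduced because the abstract does not specify it.

```python
import math

# The three dialect classes named in the abstract.
DIALECTS = ["Telangana", "Coastal Andhra", "Rayalaseema"]


def dense(x, weights, biases):
    """Affine layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def relu(values):
    return [max(0.0, v) for v in values]


def softmax(values):
    """Numerically stable softmax."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def mlp_predict(features, layers):
    """Forward pass through hidden ReLU layers and a softmax output layer.

    `layers` is a list of (weights, biases) pairs; the last pair is the
    output layer and must have one row per dialect. All values here are
    hypothetical placeholders, not trained parameters.
    """
    hidden = features
    for weights, biases in layers[:-1]:
        hidden = relu(dense(hidden, weights, biases))
    out_weights, out_biases = layers[-1]
    probs = softmax(dense(hidden, out_weights, out_biases))
    return DIALECTS[probs.index(max(probs))], probs
```

In practice the input would be a per-utterance feature vector (for example, MFCC statistics) and the weights would come from training with back-propagation; neither is shown here.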

Comparison of feature extraction and normalization methods for speaker recognition using grid-audiovisual database

Indonesian Journal of Electrical Engineering and Computer Science ◽

10.11591/ijeecs.v18.i2.pp782-789 ◽

2020 ◽

Vol 18 (2) ◽

pp. 782

Author(s):

Musab T. S. Al-Kaltakchi ◽

Haithem Abd Al-Raheem Taha ◽

Mohanad Abd Shehab ◽

Mohamed A.M. Abdullah

Keyword(s):

Feature Extraction ◽

Speaker Recognition ◽

Speaker Identification ◽

Gaussian Mixture ◽

Identification Accuracy ◽

Identification System ◽

Good Representation ◽

Mel Frequency Cepstral Coefficients ◽

Normalization Methods ◽

Cepstral Coefficients

In this paper, different feature extraction and feature normalization methods are investigated for speaker recognition. To give a good representation of acoustic speech signals, Power Normalized Cepstral Coefficients (PNCCs) and Mel Frequency Cepstral Coefficients (MFCCs) are employed for feature extraction. Then, to mitigate the effect of the linear channel, Cepstral Mean-Variance Normalization (CMVN) and feature warping are utilized. The paper investigates a text-independent speaker identification system using 16 coefficients from both the MFCC and PNCC features. Eight speakers, two female and six male, are selected from the GRID-Audiovisual database. The speakers are modeled by coupling the Universal Background Model with Gaussian Mixture Models (GMM-UBM) in order to obtain a fast scoring technique and better performance. The system achieves 100% speaker identification accuracy. The results show that PNCC features perform better than MFCC features in identifying female speakers compared to male speakers. Furthermore, feature warping performs better than the CMVN method.
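
To make the normalization step concrete, the following is a minimal sketch of cepstral mean-variance normalization over one utterance. It assumes features arrive as a list of per-frame coefficient lists; the function name `cmvn` and the per-utterance (rather than sliding-window) statistics are illustrative choices, not details taken from the paper.

```python
import math


def cmvn(frames):
    """Scale each cepstral coefficient to zero mean and unit variance.

    `frames` is a list of equal-length feature vectors, one per frame.
    Statistics are computed over the whole utterance (an assumption).
    """
    n = len(frames)
    dim = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    stds = []
    for d in range(dim):
        var = sum((f[d] - means[d]) ** 2 for f in frames) / n
        # A constant coefficient has zero variance; divide by 1 instead.
        stds.append(math.sqrt(var) or 1.0)
    return [[(f[d] - means[d]) / stds[d] for d in range(dim)]
            for f in frames]
```

Because a linear channel adds a constant offset in the cepstral domain, subtracting the per-coefficient mean removes it: shifting every frame by the same vector leaves the output of `cmvn` unchanged.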

FPGA Implementation for GMM-Based Speaker Identification

International Journal of Reconfigurable Computing ◽

10.1155/2011/420369 ◽

2011 ◽

Vol 2011 ◽

pp. 1-8 ◽

Cited By ~ 13

Author(s):

Phaklen EhKan ◽

Timothy Allen ◽

Steven F. Quigley

Keyword(s):

Vocal Tract ◽

Speaker Identification ◽

Hardware Acceleration ◽

Statistical Modelling ◽

Gaussian Mixture ◽

Personal Identification ◽

Identification System ◽

Call Centre ◽

Mel Frequency Cepstral Coefficients ◽

High Level

In today's society, highly accurate personal identification systems are required. Passwords or pin numbers can be forgotten or forged and are no longer considered to offer a high level of security. The use of biological features, biometrics, is becoming widely accepted as the next level for security systems. Biometric-based speaker identification is a method of identifying persons from their voice. Speaker-specific characteristics exist in speech signals due to different speakers having different resonances of the vocal tract. These differences can be exploited by extracting feature vectors such as Mel-Frequency Cepstral Coefficients (MFCCs) from the speech signal. A well-known statistical modelling process, the Gaussian Mixture Model (GMM), then models the distribution of each speaker's MFCCs in a multidimensional acoustic space. The GMM-based speaker identification system has features that make it promising for hardware acceleration. This paper describes the hardware implementation for classification of a text-independent GMM-based speaker identification system. The aim was to produce a system that can perform simultaneous identification of large numbers of voice streams in real time. This has important potential applications in security and in automated call centre applications. A speedup factor of ninety was achieved compared to a software implementation on a standard PC.
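
The classification stage described above can be sketched in software as scoring the input frames against each speaker's GMM and choosing the highest log-likelihood. This is a minimal sketch assuming diagonal covariances and already-trained models; the function names and the model layout (a list of `(weight, mean, variance)` tuples per speaker) are hypothetical and say nothing about the paper's FPGA design.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at x."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def gmm_log_likelihood(frames, gmm):
    """Total log-likelihood of all frames under one speaker's GMM.

    `gmm` is a list of (weight, mean, var) mixture components.
    """
    total = 0.0
    for x in frames:
        comps = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
        peak = max(comps)
        # Log-sum-exp over components, stabilized by the largest term.
        total += peak + math.log(sum(math.exp(c - peak) for c in comps))
    return total


def identify(frames, speaker_models):
    """Return the speaker whose GMM gives the highest log-likelihood."""
    return max(speaker_models,
               key=lambda name: gmm_log_likelihood(frames, speaker_models[name]))
```

Each frame is scored independently against each model, so the work parallelizes naturally across frames, mixture components, and voice streams.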

Arabic Word Dependent Speaker Identification System Using Artificial Neural Network

International Journal of Circuits, Systems and Signal Processing ◽

10.46300/9106.2020.14.41 ◽

2020 ◽

Vol 14 ◽

Keyword(s):

Neural Network ◽

Speaker Identification ◽

Signal To Noise Ratio ◽

Security System ◽

Identification System ◽

Accuracy Rate ◽

Mel Frequency Cepstral Coefficients ◽

Arabic Speakers ◽

Arabic Word ◽

Processing Steps

The security of systems is a vital issue for any society. Hence, the need for authentication mechanisms that protect the confidentiality of users is important. This paper proposes a speech-based security system that is able to identify Arabic speakers by using an Arabic word (شكرا), which means “Thank you”. The pre-processing steps are performed on the speech signals to enhance the signal-to-noise ratio. Features of speakers are obtained as Mel-Frequency Cepstral Coefficients (MFCC). Moreover, feature selection (FS) and a radial basis function neural network (RBFNN) are implemented to classify and identify speakers. The proposed security system gives a 97.5% accuracy rate in its user identification process.

Speaker identification analysis for SGMM with k-means and fuzzy C-means clustering using SVM statistical technique

International Journal of Knowledge-based and Intelligent Engineering Systems ◽

10.3233/kes-210073 ◽

2021 ◽

Vol 25 (3) ◽

pp. 309-314

Author(s):

K. Manikandan ◽

E. Chandra

Keyword(s):

Speaker Identification ◽

Gaussian Mixture ◽

Statistical Technique ◽

Identification System ◽

Mel Frequency Cepstral Coefficients ◽

Fuzzy C Means ◽

Memory Footprint ◽

Fuzzy C Means Clustering ◽

Input Model ◽

Identification Analysis

Speaker identification compares an input speech sample against the speech samples of known speakers and identifies the best match for the input model. The SGMFC method combines the Sub Gaussian Mixture Model (SGMM) with Mel-frequency Cepstral Coefficients (MFCC) for feature extraction. The SGMFC method reduces the error rate, memory footprint, and computational requirements of a medium-vocabulary speaker identification system intended for use on a portable or other device. Fuzzy C-means and k-means clustering are used in the SGMM method to improve efficiency, and their outcomes are compared on parameters such as precision, sensitivity, and specificity.

Novel cascaded Gaussian mixture model-deep neural network classifier for speaker identification in emotional talking environments

Neural Computing and Applications ◽

10.1007/s00521-018-3760-2 ◽

2018 ◽

Vol 32 (7) ◽

pp. 2575-2587 ◽

Cited By ~ 6

Author(s):

Ismail Shahin ◽

Ali Bou Nassif ◽

Shibani Hamsa

Keyword(s):

Neural Network ◽

Gaussian Mixture Model ◽

Mixture Model ◽

Deep Neural Network ◽

Speaker Identification ◽

Gaussian Mixture ◽

Neural Network Classifier ◽

Emotional Talking Environments

Text Independent Amharic Language Speaker Identification in Noisy Environments using Speech Processing Techniques

Indonesian Journal of Electrical Engineering and Computer Science ◽

10.11591/ijeecs.v5.i1.pp109-114 ◽

2017 ◽

Vol 5 (1) ◽

pp. 109 ◽

Cited By ~ 1

Author(s):

Abrham Debasu Mengistu ◽

Dagnachew Melesew Alemayehu

Keyword(s):

Speech Processing ◽

Speaker Identification ◽

Hybrid Approach ◽

Gaussian Mixture Models ◽

Back Propagation ◽

Gaussian Mixture ◽

Back Propagation Neural Network ◽

Identification System ◽

Noisy Environments ◽

Data Set

In Ethiopia, the largest ethnic and linguistic groups are the Oromos, Amharas and Tigrayans. This paper presents the performance analysis of a text-independent speaker identification system for the Amharic language in noisy environments. VQ (Vector Quantization), GMM (Gaussian Mixture Models), BPNN (Back Propagation Neural Network), MFCC (Mel-frequency cepstrum coefficients), GFCC (Gammatone Frequency Cepstral Coefficients), and a hybrid approach were used as techniques for identifying speakers of the Amharic language in noisy environments. For the identification process, speech signals were collected from different speakers of both sexes; the data set contains speech samples from a total of 90 speakers, with 10 seconds of speech from each individual. For these speakers, accuracies of 59.2%, 70.9% and 84.7% are achieved when VQ, GMM and BPNN, respectively, are used on the combined feature vector of MFCC and GFCC.

Short-Term Load Forecasting of Natural Gas with Deep Neural Network Regression

Energies ◽

10.3390/en11082008 ◽

2018 ◽

Vol 11 (8) ◽

pp. 2008 ◽

Cited By ~ 24

Author(s):

Gregory Merkel ◽

Richard Povinelli ◽

Ronald Brown

Keyword(s):

Neural Network ◽

Neural Networks ◽

Deep Learning ◽

Natural Gas ◽

Speech Processing ◽

Deep Neural Network ◽

Load Forecasting ◽

Classification Problems ◽

Short Term ◽

Artificial Neural

Deep neural networks are proposed for short-term natural gas load forecasting. Deep learning has proven to be a powerful tool for many classification problems, seeing significant use in machine learning fields such as image recognition and speech processing. We provide an overview of natural gas forecasting. Next, the deep learning method, contrastive divergence, is explained. We compare our proposed deep neural network method to a linear regression model and a traditional artificial neural network on 62 operating areas, each of which has at least 10 years of data. The proposed deep network outperforms traditional artificial neural networks by 9.83% weighted mean absolute percent error (WMAPE).
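
The abstract does not define WMAPE, so the sketch below uses one common definition as an assumption: total absolute error divided by total absolute actual load, expressed as a percentage. The function name `wmape` and the sample values in the usage note are illustrative only.

```python
def wmape(actual, forecast):
    """Weighted mean absolute percent error, in percent.

    Assumed definition: 100 * sum(|a - f|) / sum(|a|), which weights
    each period's percentage error by its actual load.
    """
    denom = sum(abs(a) for a in actual)
    if denom == 0:
        raise ValueError("actual values sum to zero; WMAPE is undefined")
    abs_error = sum(abs(a - f) for a, f in zip(actual, forecast))
    return 100.0 * abs_error / denom
```

For example, with actual loads of 100, 200 and 300 and forecasts of 110, 190 and 330, the absolute errors total 50 against a total load of 600, giving a WMAPE of about 8.33%. Unlike plain MAPE, high-load periods count for more than low-load ones.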

Deep neural networks using a single neuron: folded-in-time architecture using feedback-modulated delay loops

Nature Communications ◽

10.1038/s41467-021-25427-4 ◽

2021 ◽

Vol 12 (1) ◽

Author(s):

Florian Stelzer ◽

André Röhm ◽

Raul Vicente ◽

Ingo Fischer ◽

Serhiy Yanchuk

Keyword(s):

Neural Network ◽

Neural Networks ◽

Deep Neural Network ◽

Single Neuron ◽

Deep Neural Networks ◽

Back Propagation ◽

Local Network ◽

Multiple Time ◽

Learning Tools ◽

Back Propagation Algorithm

Deep neural networks are among the most widely applied machine learning tools showing outstanding performance in a broad range of tasks. We present a method for folding a deep neural network of arbitrary size into a single neuron with multiple time-delayed feedback loops. This single-neuron deep neural network comprises only a single nonlinearity and appropriately adjusted modulations of the feedback signals. The network states emerge in time as a temporal unfolding of the neuron’s dynamics. By adjusting the feedback-modulation within the loops, we adapt the network’s connection weights. These connection weights are determined via a back-propagation algorithm, where both the delay-induced and local network connections must be taken into account. Our approach can fully represent standard Deep Neural Networks (DNN), encompasses sparse DNNs, and extends the DNN concept toward dynamical systems implementations. The new method, which we call Folded-in-time DNN (Fit-DNN), exhibits promising performance in a set of benchmark tasks.

A Hybrid Hidden Markov Model for Pipeline Leakage Detection

Applied Sciences ◽

10.3390/app11073138 ◽

2021 ◽

Vol 11 (7) ◽

pp. 3138

Author(s):

Mingchi Zhang ◽

Xuemin Chen ◽

Wei Li

Keyword(s):

Neural Network ◽

Markov Model ◽

Hidden Markov Model ◽

Gaussian Mixture Model ◽

Mixture Model ◽

Deep Neural Network ◽

Hidden Markov ◽

Gaussian Mixture ◽

State Sequence ◽

Non Linear

In this paper, a deep neural network hidden Markov model (DNN-HMM) is proposed to detect pipeline leakage location. A long pipeline is divided into several sections, and leakage in each section is defined as a different state of the hidden Markov model (HMM). The hybrid HMM, i.e., DNN-HMM, consists of a deep neural network (DNN) with multiple layers to exploit the non-linear data. The DNN is initialized by using a deep belief network (DBN). The DBN is a pre-trained model built by stacking top-down restricted Boltzmann machines (RBM) that compute the emission probabilities for the HMM instead of a Gaussian mixture model (GMM). Two comparative studies based on different numbers of states are performed using the Gaussian mixture model-hidden Markov model (GMM-HMM) and the DNN-HMM. The agreement between the detected state sequence and the actual state sequence on the test data is measured by the micro F1 score. The micro F1 score approaches 0.94 for the GMM-HMM method and is close to 0.95 for the DNN-HMM method when the pipeline is divided into three sections. In the experiment that divides the pipeline into five sections, the micro F1 score for GMM-HMM is 0.69, while it approaches 0.96 with the DNN-HMM method. The results demonstrate that the DNN-HMM can learn a better model of non-linear data and achieve better performance than the GMM-HMM method.
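
The micro F1 score used above pools true positives, false positives and false negatives over all states before computing precision and recall. The following is a minimal sketch of that computation for two state sequences; the function name `micro_f1` and the integer state labels are assumptions for illustration, not the paper's code.

```python
def micro_f1(true_states, pred_states):
    """Micro-averaged F1 between an actual and a detected state sequence."""
    labels = set(true_states) | set(pred_states)
    tp = fp = fn = 0
    for label in labels:
        for t, p in zip(true_states, pred_states):
            if p == label and t == label:
                tp += 1  # state correctly detected
            elif p == label:
                fp += 1  # state detected but not actual
            elif t == label:
                fn += 1  # actual state missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

When every time step has exactly one actual state and one detected state, each error counts once as a false positive and once as a false negative, so micro F1 reduces to the fraction of time steps where the two sequences agree.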

Deep neural network analysis of nanoparticle ordering to identify defects in layered carbon materials

Chemical Science ◽

10.1039/d0sc05696k ◽

2021 ◽

Author(s):

Daniil A. Boiko ◽

Evgeniy O. Pentsak ◽

Vera A. Cherepanova ◽

Evgeniy G. Gordeev ◽

Valentine P. Ananikov

Keyword(s):

Neural Network ◽

Neural Networks ◽

Network Analysis ◽

Carbon Material ◽

Deep Neural Network ◽

Carbon Materials ◽

Material Surface ◽

Neural Network Analysis ◽

Pd Nanoparticle ◽

Sem Imaging

Defectiveness of carbon material surface is a key issue for many applications. Pd-nanoparticle SEM imaging was used to highlight “hidden” defects and analyzed by neural networks to solve order/disorder classification and defect segmentation tasks.
