Building LSTM neural network based speaker identification system

Author(s):  
Laurynas Dovydaitis ◽  
Vytautas Rudžionis
1992 ◽  
Author(s):  
Christopher J. Burke ◽  
Syama P. Chaudhuri ◽  
Gary Dean

Author(s):  
Anny Tandyo ◽  
Martono Martono ◽  
Adi Widyatmoko

This article discussed a speaker identification system, which is a part of speaker recognition. The system identified a subject based on the voice, matched against a group of patterns that had been saved beforehand. The system used the wavelet discrete transformation as the feature-extraction method and a back-propagation artificial neural network as the classification method. The voice input was processed by the wavelet discrete transformation to obtain the low-frequency signal coefficients of the decomposition result, which retain each person's voice characteristics. The coefficients were then classified by the back-propagation artificial neural network. A system trial was conducted by collecting 225 voice samples directly with microphones in non-soundproof rooms, from 15 subjects (persons) each providing 15 voice samples. Ten samples per subject were used as training voices and the other 5 as testing voices. The identification accuracy rate reached 84 percent. Testing was also done on subjects who pronounced the same words. It can be concluded that the selection of similar words by different subjects has no influence on the accuracy rate produced by the system.
Keywords: speaker identification, wavelet discrete transformation, artificial neural network, back-propagation.
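
A minimal sketch of the pipeline described above: wavelet decomposition for feature extraction followed by a back-propagation multilayer perceptron for classification. This is an illustrative reconstruction rather than the authors' implementation; the wavelet family, decomposition depth, summary statistics, and network size are all assumptions, and it relies on the PyWavelets and scikit-learn libraries.

```python
# Sketch only: wavelet-based features + back-propagation MLP classifier.
# Wavelet family ('db4'), level, and network size are assumptions.
import numpy as np
import pywt                                    # PyWavelets
from sklearn.neural_network import MLPClassifier

def dwt_features(signal, wavelet="db4", level=5):
    """Keep the low-frequency approximation coefficients of the
    decomposition, which carry the speaker-specific characteristics."""
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    approx = coeffs[0]                         # low-frequency part
    # Fixed-length summary so every sample yields the same feature size.
    return np.array([approx.mean(), approx.std(),
                     np.abs(approx).max(), (approx ** 2).sum()])

def train_and_score(train_signals, train_labels, test_signals, test_labels):
    F_train = np.vstack([dwt_features(x) for x in train_signals])
    F_test = np.vstack([dwt_features(x) for x in test_signals])
    clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=2000)  # back-prop MLP
    clf.fit(F_train, train_labels)
    return clf.score(F_test, test_labels)      # identification accuracy
```

With the setup in the abstract, such a function would be fed 10 samples per speaker for training and the remaining 5 per speaker for testing.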


Author(s):  
SAWIT KASURIYA ◽  
CHAI WUTIWIWATCHAI ◽  
VARIN ACHARIYAKULPORN ◽  
CHULARAT TANPRASERT

This paper reports a comparative study between a continuous hidden Markov model (CHMM) and an artificial neural network (ANN) on a text-dependent, closed-set speaker identification (SID) system with Thai-language recordings in office and telephone environments. Thai isolated digits "0–9" and their concatenations are used as the speaking text. Mel-frequency cepstral coefficients (MFCC) are selected as the studied features. Two well-known recognition engines, CHMM and ANN, are built and compared. The ANN system (a multilayer perceptron network with the backpropagation learning algorithm) is applied with a specially designed input-feeding method to avoid the distortion introduced by the normalization process. A general Gaussian density distribution HMM is developed for the CHMM system. After optimizing some system parameters through preliminary experiments, CHMM gives the best identification rate at 90.4%, which is slightly better than the 90.1% of ANN on digit "5" in the office environment. For the telephone environment, ANN gives the best identification rate at 88.84% on digit "0", which is higher than the 81.1% of CHMM on digit "3". When using 3-concatenated digits, the identification rates of ANN and CHMM reach 97.3% and 95.7% respectively for the office environment, and 92.1% and 96.3% respectively for the telephone environment.
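
The two engines compared in the paper can be sketched roughly as follows: a per-speaker continuous (Gaussian) HMM scored by log-likelihood, and a single MLP trained with backpropagation, both operating on MFCC features. Library choices (librosa, hmmlearn, scikit-learn), model sizes, and the utterance-level averaging used for the ANN input are assumptions that simplify the paper's special input-feeding scheme.

```python
# Sketch of the two recognition engines: continuous-density HMM vs. MLP,
# both on MFCC features. All model sizes are assumptions.
import numpy as np
import librosa
from hmmlearn import hmm
from sklearn.neural_network import MLPClassifier

def mfcc_features(y, sr=8000, n_mfcc=13):
    # Frames x coefficients matrix of Mel-frequency cepstral coefficients.
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T

# CHMM engine: one continuous-density HMM per enrolled speaker; the speaker
# whose model gives the highest log-likelihood is the identification result.
def train_chmm(enrollment):                     # {speaker: [mfcc matrices]}
    models = {}
    for spk, utterances in enrollment.items():
        m = hmm.GaussianHMM(n_components=5, covariance_type="diag", n_iter=20)
        m.fit(np.vstack(utterances), lengths=[len(u) for u in utterances])
        models[spk] = m
    return models

def identify_chmm(models, mfcc):
    return max(models, key=lambda spk: models[spk].score(mfcc))

# ANN engine: a single back-propagation MLP over per-utterance averaged MFCCs
# (a simplification of the paper's input-feeding design).
def train_ann(enrollment):
    X = [u.mean(axis=0) for utts in enrollment.values() for u in utts]
    y = [spk for spk, utts in enrollment.items() for _ in utts]
    clf = MLPClassifier(hidden_layer_sizes=(128,), max_iter=2000)
    clf.fit(np.vstack(X), y)
    return clf
```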


The security of systems is a vital issue for any society. Hence, the need for authentication mechanisms that protect the confidentiality of users is important. This paper proposes a speech-based security system that is able to identify Arabic speakers by using the Arabic word (شكرا), which means "Thank you". Pre-processing steps are performed on the speech signals to enhance the signal-to-noise ratio. Speaker features are obtained as Mel-frequency cepstral coefficients (MFCC). Moreover, feature selection (FS) and a radial basis function neural network (RBFNN) are implemented to classify and identify speakers. The proposed security system gives a 97.5% accuracy rate in its user identification process.
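
A rough sketch of the described pipeline follows: one MFCC vector per utterance of the word, a feature-selection step, and a radial basis function network built here from k-means centres with Gaussian activations and a linear read-out. The centre count, kernel width, number of selected features, and the use of librosa and scikit-learn are assumptions, not the paper's settings.

```python
# Sketch: MFCC features -> feature selection -> simple RBF network.
# Centre count, gamma, and k are illustrative assumptions.
import numpy as np
import librosa
from sklearn.cluster import KMeans
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression

def mfcc_vector(y, sr=16000, n_mfcc=20):
    # One fixed-length vector per utterance of the target word.
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).mean(axis=1)

class SimpleRBFN:
    """RBF network: k-means centres, Gaussian activations, linear read-out."""
    def __init__(self, n_centers=30, gamma=0.5):
        self.km = KMeans(n_clusters=n_centers, n_init=10)
        self.out = LogisticRegression(max_iter=1000)
        self.gamma = gamma

    def _phi(self, X):
        # Gaussian activations with respect to the learned centres.
        d = np.linalg.norm(X[:, None, :] - self.centers_[None, :, :], axis=2)
        return np.exp(-self.gamma * d ** 2)

    def fit(self, X, y):
        self.centers_ = self.km.fit(X).cluster_centers_
        self.out.fit(self._phi(X), y)
        return self

    def predict(self, X):
        return self.out.predict(self._phi(X))

def fit_identifier(X, y, k=12):
    selector = SelectKBest(f_classif, k=k)     # feature-selection (FS) step
    Xs = selector.fit_transform(X, y)
    model = SimpleRBFN().fit(Xs, y)
    return selector, model
```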


2021 ◽  
Vol 38 (6) ◽  
pp. 1793-1799
Author(s):  
Shivaprasad Satla ◽  
Sadanandam Manchala

Dialect identification is the process of identifying the dialects of a particular standard language. The Telugu language is one of the historical and important languages. Like many other languages, Telugu contains mainly three dialects: Telangana, Coastal Andhra, and Rayalaseema. Research work on dialect identification is much scarcer than on language identification because of the dearth of databases. In any dialect identification system, the database and feature engineering play vital roles, because most words are similar in pronunciation; moreover, most researchers apply statistical approaches such as the hidden Markov model (HMM) and the Gaussian mixture model (GMM) to speech processing applications. In today's world, however, neural networks play a vital role in all application domains and produce good results. One type of neural network is the deep neural network (DNN), which has been used to achieve state-of-the-art performance in several fields such as speech recognition and speaker identification. In this work, a DNN-based multilayer perceptron model is used to identify the regional dialects of the Telugu language using enhanced Mel-frequency cepstral coefficient (MFCC) features. To do this, a database of the Telugu dialects with a duration of 5 h 45 m was created, collected from different speakers in different environments. The results produced by the DNN model are compared with the HMM and GMM models, and it is observed that the DNN model provides good performance.
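
A sketch of the setup described above, with "enhanced" MFCCs read here as static MFCCs augmented with delta and delta-delta coefficients, fed to a multilayer perceptron with several hidden layers. The exact feature enhancement, network topology, and training regime used by the authors are not given in the abstract, so everything below is an assumption built on librosa and scikit-learn.

```python
# Sketch: enhanced MFCCs (statics + deltas) -> multilayer-perceptron DNN
# over the three Telugu dialect classes. Topology is an assumption.
import numpy as np
import librosa
from sklearn.neural_network import MLPClassifier

DIALECTS = ["Telangana", "Coastal Andhra", "Rayalaseema"]

def enhanced_mfcc(y, sr=16000, n_mfcc=13):
    m = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    d1 = librosa.feature.delta(m)              # first-order dynamics
    d2 = librosa.feature.delta(m, order=2)     # second-order dynamics
    feats = np.vstack([m, d1, d2])             # 39-dim frame features
    return feats.mean(axis=1)                  # utterance-level vector

def train_dialect_dnn(utterances, dialect_labels):
    X = np.vstack([enhanced_mfcc(y) for y in utterances])
    dnn = MLPClassifier(hidden_layer_sizes=(256, 128, 64), max_iter=3000)
    dnn.fit(X, dialect_labels)
    return dnn
```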


2020 ◽  
pp. 65-72
Author(s):  
V. V. Savchenko ◽  
A. V. Savchenko

This paper is devoted to the distortions present in a speech signal that is transmitted over a communication channel to a biometric system during voice-based remote identification. We propose to preliminarily correct the frequency spectrum of the received signal based on the pre-distortion principle. Taking a priori uncertainty into account, a new information indicator of speech signal distortions and a method for measuring it under small samples of observations are proposed. An example of a fast practical implementation of the method based on a parametric spectral analysis algorithm is considered. Experimental results of our approach are provided for three different versions of the communication channel. It is shown that the proposed method makes it possible to bring the initially distorted speech signal into compliance with the registered voice template according to an acceptable information discrimination criterion. It is demonstrated that our approach may be used in existing biometric systems and technologies for speaker identification.
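
A rough sketch of the spectrum-correction idea: estimate parametric (LPC / autoregressive) spectra of the received signal and of the enrolled voice template, derive a correction curve from their ratio, and apply it as a pre-distortion filter before identification. The filter order, frequency grid, and FIR design used here are assumptions; this is not the authors' algorithm or their information indicator.

```python
# Sketch: parametric (AR/LPC) spectrum estimation and a pre-distortion
# correction filter derived from the template/received spectral ratio.
import numpy as np
import librosa
from scipy.signal import freqz, lfilter, firwin2

LPC_ORDER = 16
N_FREQ = 257                                   # frequency-grid size

def ar_spectrum(y, order=LPC_ORDER, n_freq=N_FREQ):
    a = librosa.lpc(y, order=order)            # AR coefficients (1, a1..ap)
    w = np.linspace(0.0, np.pi, n_freq)        # 0 .. Nyquist grid
    _, h = freqz([1.0], a, worN=w)             # all-pole model spectrum
    return w, np.abs(h)

def correction_filter(received, template, numtaps=129):
    w, s_recv = ar_spectrum(received)
    _, s_tmpl = ar_spectrum(template)
    gain = s_tmpl / np.maximum(s_recv, 1e-8)   # pre-distortion curve
    gain = gain / gain.max()                   # keep the filter passive
    freqs = w / np.pi                          # normalised 0..1 grid
    return firwin2(numtaps, freqs, gain)       # linear-phase FIR correction

def correct(received, template):
    taps = correction_filter(received, template)
    return lfilter(taps, [1.0], received)      # corrected signal for the SID stage
```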

