Speaker recognition system using neural network

1996 ◽  
Vol 100 (2) ◽  
pp. 692
Author(s):  
Shingo Nishimura
Author(s):  
Mridusmita Sharma ◽  
Rituraj Kaushik ◽  
Kandarpa Kumar Sarma

Speaker recognition is the task of identifying a person by the unique identifying features or behavioural characteristics contained in his or her speech. Speaker recognition thus deals with the identity of the speaker. It is a biometric modality that uses speaker features influenced both by individual behaviour and by the characteristics of the vocal cords. The problem becomes more complex when regional languages are considered. Here, the authors report the design of a speaker recognition system that uses normal and telephonic Assamese speech as a case study. The authors use i-vectors as features to generate an optimal feature set and a feed-forward neural network for recognition, which gives a fairly high recognition rate.
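As a rough illustration of the recognition stage only, the sketch below trains a one-hidden-layer feed-forward network with a softmax output on fixed-length vectors that stand in for i-vectors. It is a minimal sketch under stated assumptions, not the authors' implementation: i-vector extraction (which needs a trained background model and total-variability matrix) is not shown, the synthetic 20-dimensional "i-vectors" for four speakers are hypothetical, and the hidden size, tanh activation, learning rate and plain full-batch gradient descent are all assumed choices.

```python
import numpy as np

def train_ffnn(X, y, n_classes, hidden=32, lr=0.1, epochs=300, seed=0):
    """Train a one-hidden-layer feed-forward network (tanh hidden layer,
    softmax output) by full-batch gradient descent on cross-entropy loss."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W1 = rng.normal(0.0, 1.0 / np.sqrt(d), (d, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 1.0 / np.sqrt(hidden), (hidden, n_classes))
    b2 = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]  # one-hot speaker labels
    for _ in range(epochs):
        # Forward pass
        H = np.tanh(X @ W1 + b1)
        logits = H @ W2 + b2
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        P = np.exp(logits)
        P /= P.sum(axis=1, keepdims=True)
        # Backward pass: gradient of the mean cross-entropy loss
        G = (P - Y) / len(X)
        dW2, db2 = H.T @ G, G.sum(axis=0)
        dA1 = (G @ W2.T) * (1.0 - H ** 2)
        dW1, db1 = X.T @ dA1, dA1.sum(axis=0)
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2
    return W1, b1, W2, b2

def predict(params, X):
    W1, b1, W2, b2 = params
    return np.argmax(np.tanh(X @ W1 + b1) @ W2 + b2, axis=1)

# Hypothetical data: 4 speakers, 50 utterances each, 20-dimensional "i-vectors"
rng = np.random.default_rng(1)
centroids = rng.normal(0.0, 1.0, (4, 20))
y = np.repeat(np.arange(4), 50)
X_train = centroids[y] + rng.normal(0.0, 0.3, (200, 20))
X_test = centroids[y] + rng.normal(0.0, 0.3, (200, 20))

params = train_ffnn(X_train, y, n_classes=4)
test_accuracy = float(np.mean(predict(params, X_test) == y))
```

Because each utterance is summarised by one fixed-length vector, a plain feed-forward classifier suffices at this stage; frame-level features would instead need pooling or a sequence model.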


1996 ◽  
Vol 7 (1) ◽  
pp. 87-99 ◽  
Author(s):  
M. Zaki ◽  
A. Ghalwash ◽  
A. A. Elkouny

2020 ◽  
Author(s):  
Karthika Kuppusamy ◽  
Chandra Eswaran

Abstract With the growth of conversational voice recognition systems such as Alexa, Siri and OK Google, natural-language conversational systems, including chatbots and voice recognition systems, have reached new heights, and determining the age of a speaker is critical for setting the pertinent context. Age can be inferred from the speech signal by analysing various factors such as the physical attributes of the voice, linguistic attributes, frequency and speech rate. This article discusses extracting spectral features of speech, such as cepstral coefficients, spectral decrease, centroid, flatness, spectral entropy, F0DIFF, jitter and shimmer, as inputs for classifying speaker age through deep learning techniques. A novel approach is presented, together with an implementation model that uses a Deep Neural Network and a Convolutional Neural Network to classify the features with three different classifiers: Gaussian Mixture Model (GMM), Support Vector Machine (SVM) and GMM-SVM. The results obtained from the proposed system outline its performance in speaker age recognition.
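Several of the listed input features have common textbook definitions, sketched below in Python with NumPy. This is an illustrative sketch, not the article's code: the Hann window, the Peeters-style spectral decrease, the normalised entropy and the "local" jitter and shimmer formulas (computed from already-extracted pitch periods and peak amplitudes) are assumed definitions, F0DIFF is omitted because its definition is not given, and the 440 Hz tone and white-noise frames are hypothetical test inputs.

```python
import numpy as np

def power_spectrum(frame):
    """Power spectrum of one Hann-windowed frame (non-negative frequencies only)."""
    return np.abs(np.fft.rfft(frame * np.hanning(len(frame)))) ** 2

def spectral_centroid(power, freqs):
    # Power-weighted mean frequency, in Hz
    return float(np.sum(freqs * power) / np.sum(power))

def spectral_flatness(power, eps=1e-12):
    # Geometric mean over arithmetic mean: near 0 for tonal, higher for noise-like frames
    return float(np.exp(np.mean(np.log(power + eps))) / (np.mean(power) + eps))

def spectral_entropy(power, eps=1e-12):
    # Shannon entropy of the normalised spectrum, scaled to [0, 1]
    p = power / np.sum(power)
    return float(-np.sum(p * np.log2(p + eps)) / np.log2(len(p)))

def spectral_decrease(magnitude):
    # Average slope relative to the first bin (Peeters-style definition)
    k = np.arange(1, len(magnitude))
    return float(np.sum((magnitude[1:] - magnitude[0]) / k) / np.sum(magnitude[1:]))

def local_jitter(periods):
    # Mean absolute difference of consecutive pitch periods over the mean period
    t = np.asarray(periods, dtype=float)
    return float(np.mean(np.abs(np.diff(t))) / np.mean(t))

def local_shimmer(amplitudes):
    # Mean absolute difference of consecutive peak amplitudes over the mean amplitude
    a = np.asarray(amplitudes, dtype=float)
    return float(np.mean(np.abs(np.diff(a))) / np.mean(a))

# Hypothetical frames: a pure 440 Hz tone and white noise, 1024 samples at 16 kHz
sr, n = 16000, 1024
t = np.arange(n) / sr
freqs = np.fft.rfftfreq(n, d=1.0 / sr)
tone = power_spectrum(np.sin(2 * np.pi * 440.0 * t))
noise = power_spectrum(np.random.default_rng(0).normal(size=n))
```

A tonal frame yields a centroid close to its pitch and low flatness and entropy, while a noise-like frame pushes flatness and entropy up, which is why such descriptors can carry voice-quality information.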


2019 ◽  
Vol 8 (4) ◽  
pp. 9139-9143

Speaker recognition is the procedure of validating a speaker’s claimed identity using his or her speech characteristics, which are unique to each individual. The primary objective of such systems is a man-machine interface that grants access to the system on the basis of voice characteristics. Such a system can serve as a highly secure biometric system where security is the primary concern. The primary aim of this paper is to classify each speaker accurately using Mel-frequency cepstral coefficients (MFCC) and a back-propagation neural network trained with the scaled conjugate gradient training function. A small database of 10 people (five male and five female) is created, with each person uttering the same sentence five times. The sentence consists of five different words. The classification data set contains 22,182 samples. The classification accuracy obtained is 92.1%, with a misclassification rate of 7.9%, which is acceptably good. MATLAB is used as the simulation tool.
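The MFCC front end described here can be sketched in Python with NumPy (the paper itself used MATLAB, and its framing parameters are not given). This is a minimal sketch assuming common defaults: 0.97 pre-emphasis, 25 ms Hamming frames with a 10 ms hop at 16 kHz, a 512-point FFT, 26 triangular mel filters and 13 coefficients from an orthonormal DCT-II. The back-propagation network with scaled conjugate gradient training is not shown, and the one-second noise signal is hypothetical.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters spaced evenly on the mel scale from 0 Hz to sr/2."""
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):  # rising edge
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):  # falling edge
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

def dct_ii(x, n_coeffs):
    """Orthonormal DCT-II along the last axis, keeping the first n_coeffs."""
    N = x.shape[-1]
    n = np.arange(N)
    k = np.arange(n_coeffs)[:, None]
    basis = np.cos(np.pi * k * (2 * n + 1) / (2 * N))
    scale = np.full(n_coeffs, np.sqrt(2.0 / N))
    scale[0] = np.sqrt(1.0 / N)
    return (x @ basis.T) * scale

def mfcc(signal, sr, frame_len=400, hop=160, n_fft=512, n_filters=26, n_coeffs=13):
    signal = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])  # pre-emphasis
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    energies = power @ mel_filterbank(n_filters, n_fft, sr).T
    return dct_ii(np.log(energies + 1e-10), n_coeffs)  # (n_frames, n_coeffs)

# Hypothetical input: one second of noise at 16 kHz
feats = mfcc(np.random.default_rng(0).normal(size=16000), sr=16000)
```

Each row of `feats` is one frame's coefficient vector; frame-level rows like these are what a back-propagation classifier would typically receive as training samples.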


2020 ◽  
pp. 805-829
Author(s):  
Mridusmita Sharma ◽  
Rituraj Kaushik ◽  
Kandarpa Kumar Sarma

Speaker recognition is the task of identifying a person by the unique identifying features or behavioural characteristics contained in his or her speech. Speaker recognition thus deals with the identity of the speaker. It is a biometric modality that uses speaker features influenced both by individual behaviour and by the characteristics of the vocal cords. The problem becomes more complex when regional languages are considered. Here, the authors report the design of a speaker recognition system that uses normal and telephonic Assamese speech as a case study. The authors use i-vectors as features to generate an optimal feature set and a feed-forward neural network for recognition, which gives a fairly high recognition rate.


1996 ◽  
Vol 7 (2) ◽  
pp. 203-212 ◽  
Author(s):  
M. Zaki ◽  
A. Ghalwash ◽  
A. A. Elkouny

This paper presents an approach that combines supervised and unsupervised neural network models for speaker recognition. To improve the overall operation and performance of recognition, the proposed strategy integrates the two techniques into one global model called the cascaded model. We first present a simple conventional technique based on the distance between a test vector and a reference vector for each speaker in the population. This distance metric weights down the components in those directions along which the intraspeaker variance is large. This method is presented to clarify the difference in performance between the conventional and neural network approaches. We then introduce an unsupervised learning technique, represented by the winner-take-all model, as a means of recognition. Based on several tests, and to improve this model's performance on noisy patterns, we precede it with a supervised learning model, the pattern association model, which acts as a filtering stage. This work covers the design and implementation of both the conventional and neural network approaches for recognizing speaker templates, which are introduced to the system via a voice master card and preprocessed before the recognition features are extracted. The results indicate that the neural network system outperforms the conventional one, degrading gracefully on noisy patterns and achieving higher performance on noise-free patterns.
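Two of the components described above can be sketched in Python with NumPy: a distance that weights down high intraspeaker-variance directions, and unsupervised winner-take-all (competitive) learning. This is a sketch under assumptions, not the paper's method: the distance uses a diagonal inverse-variance weighting per feature dimension (the abstract does not give the exact metric), the winner-take-all update is the standard competitive-learning rule, the pattern association filtering stage is not shown, and the two-speaker 2-D data are hypothetical.

```python
import numpy as np

def fit_references(X, y):
    """Per-speaker mean reference vectors plus the pooled intraspeaker variance."""
    speakers = np.unique(y)
    refs = np.stack([X[y == s].mean(axis=0) for s in speakers])
    intra_var = np.mean([X[y == s].var(axis=0) for s in speakers], axis=0)
    return speakers, refs, intra_var

def weighted_distance(test, reference, intra_var, eps=1e-12):
    # Each squared component difference is divided by that dimension's intraspeaker
    # variance, so high-variance directions are weighted down (diagonal assumption).
    return float(np.sum((test - reference) ** 2 / (intra_var + eps)))

def classify_by_distance(x, speakers, refs, intra_var):
    distances = [weighted_distance(x, r, intra_var) for r in refs]
    return speakers[int(np.argmin(distances))]

def train_wta(X, n_units, lr=0.1, epochs=20, seed=0):
    """Winner-take-all learning: for each input, only the closest unit (the winner)
    moves its weight vector toward that input."""
    rng = np.random.default_rng(seed)
    W = X[rng.choice(len(X), size=n_units, replace=False)].astype(float)
    for _ in range(epochs):
        for x in X[rng.permutation(len(X))]:
            winner = int(np.argmin(np.sum((W - x) ** 2, axis=1)))
            W[winner] += lr * (x - W[winner])
    return W

def wta_winner(W, x):
    return int(np.argmin(np.sum((W - x) ** 2, axis=1)))

# Hypothetical 2-D features for two speakers; dimension 1 varies a lot within a speaker
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 50)
means = np.array([[0.0, 0.0], [3.0, 0.0]])
spread = np.array([0.3, 5.0])
X = means[y] + rng.normal(0.0, 1.0, (100, 2)) * spread
X_test = means[y] + rng.normal(0.0, 1.0, (100, 2)) * spread
speakers, refs, intra_var = fit_references(X, y)
accuracy = np.mean([classify_by_distance(x, speakers, refs, intra_var) == t
                    for x, t in zip(X_test, y)])
W = train_wta(X, n_units=2)
```

In this toy data the speakers differ only along the low-variance dimension, so down-weighting the noisy dimension keeps the nearest-reference decision stable; the winner-take-all units, by contrast, are learned without labels and would still need to be associated with speakers.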


2019 ◽  
Vol 3 (6) ◽  
pp. 1-6
Author(s):  
Haowei Li

This paper lies in the field of digital signal processing. It presents a speaker recognition system that identifies different speakers using deep learning. The method consists of the following steps. First, voice data are collected from different people. Second, the selected data are preprocessed by extracting their Mel Frequency Cepstral Coefficients (MFCC) and randomly divided into a training set and a test set. Third, the training set is split into batches and fed into a convolutional neural network consisting of convolutional layers, max-pooling layers and fully connected layers. After repeated tuning of network parameters such as the learning rate, dropout rate and decay rate, the model reaches its best performance. Finally, the test set is likewise split into batches and fed into the trained network. The final recognition accuracy is 70.23%. In brief, the system can automatically and efficiently recognize different speakers.
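The layer sequence named above (convolution, max pooling, fully connected) can be illustrated with a NumPy forward pass over an MFCC matrix treated as a one-channel image. This is only a sketch with hypothetical sizes, not the paper's network: it assumes a single convolutional layer of eight 3x3 kernels with ReLU, 2x2 max pooling, one fully connected softmax layer over five speakers and a 98-frame by 13-coefficient input, and the random weights are untrained. Batching, dropout, learning-rate decay and training itself are omitted.

```python
import numpy as np

def conv2d_valid(x, kernels, bias):
    """x: (H, W) single-channel input; kernels: (C, kh, kw). Returns (C, H-kh+1, W-kw+1)."""
    C, kh, kw = kernels.shape
    H, W = x.shape
    out = np.empty((C, H - kh + 1, W - kw + 1))
    for i in range(H - kh + 1):
        for j in range(W - kw + 1):
            patch = x[i:i + kh, j:j + kw]
            # Dot product of every kernel with the current patch, plus per-channel bias
            out[:, i, j] = np.tensordot(kernels, patch, axes=([1, 2], [0, 1])) + bias
    return out

def max_pool2d(x, size=2):
    """Non-overlapping max pooling; trailing rows/columns that do not fill a window are dropped."""
    C, H, W = x.shape
    H2, W2 = H // size, W // size
    x = x[:, :H2 * size, :W2 * size].reshape(C, H2, size, W2, size)
    return x.max(axis=(2, 4))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def cnn_forward(mfcc, params):
    """Conv -> ReLU -> max-pool -> flatten -> fully connected -> softmax over speakers."""
    h = np.maximum(conv2d_valid(mfcc, params["kernels"], params["conv_bias"]), 0.0)
    h = max_pool2d(h, 2).ravel()
    return softmax(params["fc_w"] @ h + params["fc_b"])

# Hypothetical, untrained parameters for 5 speakers and a 98 x 13 MFCC input
rng = np.random.default_rng(0)
n_speakers = 5
params = {
    "kernels": rng.normal(0.0, 0.1, (8, 3, 3)),
    "conv_bias": np.zeros(8),
    "fc_w": rng.normal(0.0, 0.01, (n_speakers, 8 * 48 * 5)),  # 8 channels x 48 x 5 after pooling
    "fc_b": np.zeros(n_speakers),
}
features = rng.normal(size=(98, 13))  # stand-in for one utterance's MFCC matrix
probs = cnn_forward(features, params)
```

With random weights the output is only a valid probability vector over speakers; meaningful predictions require training the kernels and fully connected weights on labelled batches.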

