Speaker recognition system using neural network

Speaker recognition is the task of identifying a person from the unique identifying features or behavioural characteristics contained in his/her speech. It deals with the identity of the speaker rather than with what is said. It is a biometric modality that uses features of the speaker that are influenced by individual behaviour as well as by the characteristics of the vocal cords. The problem becomes more complex when regional languages are considered. Here, the authors report the design of a speaker recognition system that uses normal and telephonic Assamese speech as a case study. They use i-vectors as features to generate an optimal feature set and a Feed Forward Neural Network for recognition, which gives a fairly high recognition rate.

Design of speaker recognition system based on artificial neural network

DOI 10.1117/12.970642; 2012

Author(s): Yanhong Chen, Li Wang, Han Lin, Jinlong Li

Keyword(s): Neural Network, Artificial Neural Network, Speaker Recognition, Recognition System, Artificial Neural

CNN: A speaker recognition system using a cascaded neural network

Multidimensional Systems and Signal Processing; DOI 10.1007/bf02106109; 1996; Vol 7 (1); pp. 87-99; Cited By ~ 3

Author(s): M. Zaki, A. Ghalwash, A. A. Elkouny

Keyword(s): Neural Network, Speaker Recognition, Recognition System

Speaker Recognition System based on Age-related Features using Convolutional and Deep Neural Networks

DOI 10.21203/rs.2.23454/v1; 2020

Author(s): Karthika Kuppusamy, Chandra Eswaran

Keyword(s): Neural Network, Speaker Recognition, Speech Rate, Voice Recognition, Gaussian Mixture, Recognition System, Support Vector, Age Related, Novel Approach, Recognition Systems

With the growth of conversational voice recognition systems such as Alexa, Siri, and OK Google, natural language conversational systems, including chatbots and voice recognition systems, are at a new high, and determining the age of a speaker is critical for setting the pertinent context. Age can be inferred from the speech signal through various factors such as physical attributes of the voice, linguistic attributes, frequency, and speech rate. The article discusses extracting the spectral features of speech, such as Cepstral Coefficients, Spectral Decrease, Centroid, Flatness, Spectral Entropy, F0DIFF, Jitter and Shimmer, as inputs. These help in classifying speaker age through deep learning techniques. A novel approach is presented, along with a model for implementation using a Deep Neural Network and a Convolutional Neural Network, for classifying the features using three different classifiers: Gaussian Mixture Model (GMM), Support Vector Machine (SVM) and GMM-SVM. The results obtained from the proposed system outline its performance in speaker age recognition.
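As a rough illustration of three of the spectral features named above (centroid, flatness and spectral entropy), the following Python sketch computes them for a single frame. The formulas are common textbook definitions and are an assumption here, since the abstract does not state which definitions the authors use; the function names, the naive DFT and the synthetic 1 kHz test tone are likewise illustrative only.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..N/2 (adequate for short demo frames)."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        mags.append(abs(acc))
    return mags


def spectral_features(frame, sample_rate):
    """Return (centroid in Hz, flatness in [0, 1], entropy in bits)."""
    n = len(frame)
    mags = magnitude_spectrum(frame)
    power = [m * m + 1e-12 for m in mags]  # small floor avoids log(0)
    total = sum(power)
    freqs = [k * sample_rate / n for k in range(len(mags))]

    # Centroid: power-weighted mean frequency (assumed definition).
    centroid = sum(f * p for f, p in zip(freqs, power)) / total
    # Flatness: geometric mean divided by arithmetic mean of the power spectrum.
    geo_mean = math.exp(sum(math.log(p) for p in power) / len(power))
    flatness = geo_mean / (total / len(power))
    # Entropy: Shannon entropy of the normalised power spectrum.
    probs = [p / total for p in power]
    entropy = -sum(p * math.log2(p) for p in probs)
    return centroid, flatness, entropy


# Synthetic 1 kHz tone sampled at 8 kHz, standing in for a real speech frame.
SAMPLE_RATE = 8000
tone = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE) for t in range(128)]
centroid, flatness, entropy = spectral_features(tone, SAMPLE_RATE)
```

For a pure tone the centroid sits at the tone frequency while flatness and entropy are close to zero; noisier, more speech-like frames give larger flatness and entropy values.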

Text Dependent Speakers Pattern Classification with Back Propagation Neural Network

International Journal of Recent Technology and Engineering - 2; DOI 10.35940/ijrte.d8889.118419; 2019; Vol 8 (4); pp. 9139-9143

Keyword(s): Neural Network, Speaker Recognition, Back Propagation, Recognition System, Back Propagation Neural Network, Primary Objective, Primary Concern, Data Set, Speech Characteristics, Machine Interface

Speaker recognition is the procedure of validating a speaker's claimed identity using his/her speech characteristics, which are unique to each individual. The primary objective of all speech recognition systems is a man-machine interface that grants access to the system based on voice characteristics. This serves as a highly secure biometric system where security is the primary concern. The primary aim of this paper is to classify each speaker accurately with MFCC and a Back Propagation Neural Network. The scaled conjugate gradient training function is used for the back propagation neural network. A small database of 10 people is created from a group of five male and five female speakers, each uttering the same sentence five times. The sentence consists of five different words. The number of data sets for classification is 22182. The classification accuracy is 92.1%, with a misclassification rate of 7.9%, which is acceptably good. The simulation tool is MATLAB.
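The general idea of classifying speakers with a back propagation network can be sketched in a few lines of Python. This is not the paper's MATLAB setup: plain per-example gradient descent is swapped in for the scaled conjugate gradient training function, random 2-D clusters stand in for MFCC vectors, and the three speakers, layer sizes and learning rate are all hypothetical choices made for illustration.

```python
import math
import random

random.seed(0)

# Synthetic stand-in for MFCC frames: 3 hypothetical speakers, 2-D features.
CENTRES = [(-2.0, 0.0), (2.0, 0.0), (0.0, 3.0)]
data = [([random.gauss(cx, 0.5), random.gauss(cy, 0.5)], label)
        for label, (cx, cy) in enumerate(CENTRES) for _ in range(30)]

N_IN, N_HID, N_OUT = 2, 6, len(CENTRES)
w1 = [[random.uniform(-0.5, 0.5) for _ in range(N_IN)] for _ in range(N_HID)]
b1 = [0.0] * N_HID
w2 = [[random.uniform(-0.5, 0.5) for _ in range(N_HID)] for _ in range(N_OUT)]
b2 = [0.0] * N_OUT


def forward(x):
    """Return hidden activations (tanh) and output probabilities (softmax)."""
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(w1, b1)]
    z = [sum(w * hi for w, hi in zip(row, h)) + b for row, b in zip(w2, b2)]
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return h, [v / s for v in e]


def train_step(x, label, lr=0.05):
    """One backpropagation update for a single example (cross-entropy loss)."""
    h, p = forward(x)
    # Output error: softmax probabilities minus the one-hot target.
    dz = [p[k] - (1.0 if k == label else 0.0) for k in range(N_OUT)]
    # Propagate the error back through w2 and the tanh derivative.
    dh = [(1.0 - h[j] ** 2) * sum(dz[k] * w2[k][j] for k in range(N_OUT))
          for j in range(N_HID)]
    for k in range(N_OUT):
        for j in range(N_HID):
            w2[k][j] -= lr * dz[k] * h[j]
        b2[k] -= lr * dz[k]
    for j in range(N_HID):
        for i in range(N_IN):
            w1[j][i] -= lr * dh[j] * x[i]
        b1[j] -= lr * dh[j]


def predict(x):
    """Index of the most probable speaker for feature vector x."""
    _, p = forward(x)
    return p.index(max(p))


for epoch in range(100):
    random.shuffle(data)
    for x, label in data:
        train_step(x, label)

# Accuracy on the training examples; misclassification is its complement.
accuracy = sum(1 for x, label in data if predict(x) == label) / len(data)
misclassification = 1.0 - accuracy
```

Accuracy and misclassification are complementary by construction, which is why the reported 92.1% and 7.9% sum to 100%.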

Speaker Recognition With Normal and Telephonic Assamese Speech Using I-Vector and Learning-Based Classifier

Cognitive Analytics; DOI 10.4018/978-1-7998-2460-2.ch042; 2020; pp. 805-829

Author(s): Mridusmita Sharma, Rituraj Kaushik, Kandarpa Kumar Sarma

Keyword(s): Neural Network, Speaker Recognition, Recognition Rate, Recognition System, Feed Forward Neural Network, Individual Behaviour, Behavioural Characteristics, Regional Languages, Optimal Feature

Speaker recognition is the task of identifying a person from the unique identifying features or behavioural characteristics contained in his/her speech. It deals with the identity of the speaker rather than with what is said. It is a biometric modality that uses features of the speaker that are influenced by individual behaviour as well as by the characteristics of the vocal cords. The problem becomes more complex when regional languages are considered. Here, the authors report the design of a speaker recognition system that uses normal and telephonic Assamese speech as a case study. They use i-vectors as features to generate an optimal feature set and a Feed Forward Neural Network for recognition, which gives a fairly high recognition rate.

Automatic speaker recognition system using the discrete Hartley transform and an artificial neural network

[1991] Conference Record of the Twenty-Fifth Asilomar Conference on Signals, Systems & Computers; DOI 10.1109/acssc.1991.186628; 2002; Cited By ~ 1

Author(s): C.N. Gedo, J.C. Eremic

Keyword(s): Neural Network, Artificial Neural Network, Speaker Recognition, Recognition System, Discrete Hartley Transform, Hartley Transform, Automatic Speaker Recognition, Artificial Neural

CNN: A speaker recognition system using a cascaded neural network

International Journal of Neural Systems; DOI 10.1142/s0129065796000178; 1996; Vol 07 (02); pp. 203-212; Cited By ~ 5

Author(s): M. Zaki, A. Ghalwash, A. A. Elkouny

Keyword(s): Neural Network, Speaker Recognition, Network Models, Recognition System, Conventional Technique, Neural Network Models, Neural Network Approach, Unsupervised Neural Network, And Performance, Filtration Stage

The main emphasis of this paper is to present an approach for combining supervised and unsupervised neural network models for speaker recognition. To enhance the overall operation and performance of recognition, the proposed strategy integrates the two techniques into one global model called the cascaded model. We first present a simple conventional technique based on the distance measured between a test vector and a reference vector for different speakers in the population. This distance metric has the property of weighting down the components in those directions along which the intraspeaker variance is large. This method is presented to clarify the discrepancy in performance between the conventional and neural network approaches. We then introduce the idea of using an unsupervised learning technique, represented by the winner-take-all model, as a means of recognition. Following several tests, and in order to enhance the performance of this model when dealing with noisy patterns, we have preceded it with a supervised learning model, the pattern association model, which acts as a filtration stage. This work includes the design and implementation of both the conventional and neural network approaches to recognizing the speakers' templates, which are introduced to the system via a voice master card and preprocessed before the features used in recognition are extracted. The conclusion indicates that the neural network system performs better than the conventional one, achieving a smooth degradation on noisy patterns and higher performance on noise-free patterns.
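The conventional distance-based technique described above can be sketched as follows. The abstract does not give the exact metric, so this Python example assumes a diagonal form in which each squared component difference is divided by a pooled intraspeaker variance; the speaker names, reference vectors and variances are made-up values chosen only to show the effect of the weighting.

```python
def weighted_distance(test, reference, variances):
    """Squared distance with each component scaled by 1 / variance.

    Components with large intraspeaker variance contribute less
    (assumed diagonal form; the paper's exact metric is not given).
    """
    return sum((t - r) ** 2 / v for t, r, v in zip(test, reference, variances))


def identify(test, references, variances):
    """Return the speaker whose reference vector is closest to the test vector."""
    return min(references,
               key=lambda name: weighted_distance(test, references[name], variances))


# Hypothetical reference vectors for two speakers (two features each).
REFERENCES = {"A": [1.0, 5.0], "B": [2.0, 1.0]}
# Hypothetical pooled intraspeaker variance: feature 2 varies a lot within a speaker.
POOLED_VARIANCE = [0.1, 4.0]
UNIT_VARIANCE = [1.0, 1.0]  # reduces the metric to plain squared Euclidean distance

test_vector = [1.1, 2.0]
weighted_choice = identify(test_vector, REFERENCES, POOLED_VARIANCE)
euclidean_choice = identify(test_vector, REFERENCES, UNIT_VARIANCE)
```

With these values the weighted metric selects speaker A, because the large difference in the unreliable second feature is discounted, whereas the unweighted Euclidean distance selects speaker B.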

A Speaker Recognition System Based on Deep Learning

Journal of Electronic Research and Application; DOI 10.26689/jera.v3i6.1056; 2019; Vol 3 (6); pp. 1-6

Author(s): Haowei Li

Keyword(s): Neural Network, Deep Learning, Speaker Recognition, Digital Signal, Dropout Rate, Recognition System, Training Set, Mel Frequency Cepstral Coefficients, Trained Neural Network, Voice Data

This paper lies in the field of digital signal processing. It presents a speaker recognition system that identifies different speakers based on deep learning. The approach consists of the following steps. First, voice data are collected from different people. Second, the selected data are preprocessed by extracting their Mel Frequency Cepstral Coefficients (MFCC) and are divided randomly into a training set and a test set. Third, the training set is cut into batches and fed into a convolutional neural network consisting of convolutional layers, max pooling layers and fully connected layers. The parameters of the network, such as learning rate, dropout rate and decay rate, are adjusted repeatedly until the model reaches its best performance. Finally, the test set is also cut into batches and fed into the trained neural network. The final recognition accuracy is 70.23%. In brief, the system can recognize different speakers automatically.
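The data-handling steps described above (random division into training and test sets, then cutting each set into batches) might look like the Python sketch below. The 20% test fraction, batch size of 8, fixed seed and placeholder MFCC matrices are assumptions for illustration; the abstract does not report these values, and the network itself is not reproduced here.

```python
import random


def split_and_batch(samples, test_fraction, batch_size, seed=0):
    """Randomly split samples into train/test sets and cut each into batches."""
    rng = random.Random(seed)   # fixed seed keeps the split reproducible
    shuffled = samples[:]       # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    test, train = shuffled[:n_test], shuffled[n_test:]

    def batches(items):
        # The last batch may be shorter than batch_size.
        return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]

    return batches(train), batches(test)


# Hypothetical dataset: (mfcc_matrix, speaker_id) pairs with placeholder
# 4-frame x 13-coefficient matrices and 5 speakers.
samples = [([[0.0] * 13 for _ in range(4)], i % 5) for i in range(50)]
train_batches, test_batches = split_and_batch(samples, test_fraction=0.2, batch_size=8)
```

Each training batch would then be passed to the network during training, and the test batches to the trained network to measure recognition accuracy.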
