Deep Neural Network for Speaker Identification Using Static and Dynamic Prosodic Feature for Spontaneous and Dictated Data

Author(s):  
Arifan Rahman ◽  
Wahyu Catur Wibowo
IEEE Access ◽  
2020 ◽  
Vol 8 ◽  
pp. 32187-32202 ◽  
Author(s):  
Rashid Jahangir ◽  
Ying Wah Teh ◽  
Nisar Ahmed Memon ◽  
Ghulam Mujtaba ◽  
Mahdi Zareei ◽  
...  

2021 ◽  
Vol 11 (8) ◽  
pp. 3603
Author(s):  
Feng Ye ◽  
Jun Yang

Speaker identification is a classification task that aims to identify a subject from given time-series sequential data. Since the speech signal is a continuous one-dimensional time series, most current research methods are based on convolutional neural networks (CNN) or recurrent neural networks (RNN). These methods perform well in many tasks, but there has been no attempt to combine the two network models for the speaker identification task. Because a speech signal can be represented as a spectrogram, the voiceprint carries spatial features (corresponding to the voice spectrum), and a CNN is effective for spatial feature extraction (that is, for modeling spectral correlations in acoustic features). At the same time, the speech signal is a time series, and a deep RNN can represent long utterances better than a shallow network. Considering the advantage of the gated recurrent unit (GRU) over the traditional RNN in processing sequence data, we decided to use stacked GRU layers in our model for frame-level feature extraction. In this paper, we propose a deep neural network (DNN) model based on a two-dimensional convolutional neural network (2-D CNN) and the GRU for speaker identification. In the network design, the convolutional layers are used for voiceprint feature extraction and reduce dimensionality in both the time and frequency domains, allowing for faster GRU layer computation. In addition, the stacked GRU recurrent layers learn a speaker's acoustic features. During this research, we also evaluated other neural network structures, including a 2-D CNN, a deep RNN, and a deep LSTM, on the Aishell-1 speech dataset. The experimental results showed that our proposed DNN model, which we call deep GRU, achieved a high recognition accuracy of 98.96%, and they also demonstrate the effectiveness of the proposed deep GRU network model versus the other models for speaker identification. With further optimization, this method could be applied to research problems similar to speaker identification.
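A minimal sketch of the kind of 2-D CNN plus stacked-GRU architecture this abstract describes is given below. The input shape, filter counts, GRU widths, and speaker count are illustrative assumptions (the abstract does not specify them), not the authors' exact configuration.

```python
# Sketch: 2-D CNN front end for voiceprint feature extraction and
# time/frequency downsampling, followed by stacked GRU layers for
# frame-level temporal modeling. Shapes and sizes are assumptions.
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_SPEAKERS = 340            # assumption: number of enrolled speakers
TIME_STEPS, N_FREQ = 300, 64  # assumption: 300 frames of 64-bin spectrogram-like features

inputs = layers.Input(shape=(TIME_STEPS, N_FREQ, 1))

# Strided 2-D convolutions reduce dimensionality in both time and frequency,
# which speeds up the recurrent layers that follow.
x = layers.Conv2D(32, (3, 3), strides=(2, 2), padding="same", activation="relu")(inputs)
x = layers.Conv2D(64, (3, 3), strides=(2, 2), padding="same", activation="relu")(x)

# Fold the frequency axis into the channel dimension: (time, freq, ch) -> (time, freq*ch)
t, f, c = x.shape[1], x.shape[2], x.shape[3]
x = layers.Reshape((t, f * c))(x)

# Stacked GRU layers learn the speaker's frame-level acoustic dynamics.
x = layers.GRU(256, return_sequences=True)(x)
x = layers.GRU(256, return_sequences=False)(x)

outputs = layers.Dense(NUM_SPEAKERS, activation="softmax")(x)
model = models.Model(inputs, outputs)
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```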


2021 ◽  
Vol 38 (6) ◽  
pp. 1793-1799
Author(s):  
Shivaprasad Satla ◽  
Sadanandam Manchala

Dialect identification is the process of identifying the dialects of a particular standard language. Telugu is a historically important language and, like other languages, has three main dialects: Telangana, Coastal Andhra, and Rayalaseema. Research on dialect identification is far less extensive than on language identification because of the dearth of databases. In any dialect identification system, the database and feature engineering play vital roles, because most words are similar in pronunciation. Most researchers have applied statistical approaches such as the Hidden Markov Model (HMM) and the Gaussian Mixture Model (GMM) to speech processing applications, but today neural networks play a vital role across application domains and produce good results. Deep Neural Networks (DNN) in particular have achieved state-of-the-art performance in several fields, such as speech recognition and speaker identification. In this work, a DNN-based Multilayer Perceptron model is used to identify the regional dialects of Telugu using enhanced Mel Frequency Cepstral Coefficient (MFCC) features. To do this, a database of the Telugu dialects with a duration of 5 h 45 m was created, collected from different speakers in different environments. The results produced by the DNN model are compared with the HMM and GMM models, and it is observed that the DNN model provides better performance.
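A minimal sketch of MFCC-based dialect classification with a Multilayer Perceptron, in the spirit of the approach described above, follows. The exact "enhanced" MFCC variant, network sizes, and corpus layout are not given in the abstract, so the feature recipe, hidden-layer sizes, and file handling below are assumptions.

```python
# Sketch: utterance-level MFCC (plus delta and delta-delta) features
# fed to an MLP classifier over the three Telugu dialects.
import numpy as np
import librosa
from sklearn.neural_network import MLPClassifier

DIALECTS = ["telangana", "coastal_andhra", "rayalaseema"]

def mfcc_features(wav_path, n_mfcc=13):
    """Mean-pooled MFCCs with delta and delta-delta coefficients (39-dim)."""
    y, sr = librosa.load(wav_path, sr=16000)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    feats = np.vstack([mfcc,
                       librosa.feature.delta(mfcc),
                       librosa.feature.delta(mfcc, order=2)])
    return feats.mean(axis=1)  # one fixed-length vector per utterance

def train_dialect_mlp(train_files, train_labels):
    """train_files/train_labels are assumed to come from the collected corpus."""
    X = np.array([mfcc_features(f) for f in train_files])
    y = np.array([DIALECTS.index(label) for label in train_labels])
    clf = MLPClassifier(hidden_layer_sizes=(256, 128), max_iter=500)
    clf.fit(X, y)
    return clf
```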


Author(s):  
David T. Wang ◽  
Brady Williamson ◽  
Thomas Eluvathingal ◽  
Bruce Mahoney ◽  
Jennifer Scheler

Author(s):  
P.L. Nikolaev

This article deals with a method for binary classification of images containing small text. Classification is based on the fact that the text can have two orientations: it can be positioned horizontally and read from left to right, or it can be turned 180 degrees, so that the image must be rotated before it can be read. This type of text is found on the covers of a variety of books, so when recognizing covers it is necessary first to determine the orientation of the text before recognizing it directly. The article proposes a deep neural network for determining the text orientation in the context of book cover recognition. The results of training and testing a convolutional neural network on synthetic data, as well as examples of the network operating on real data, are presented.
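A minimal sketch of a binary CNN that decides whether text in an image is upright (0 degrees) or rotated 180 degrees is shown below. The image size, layer configuration, and the way synthetic training pairs are generated are assumptions for illustration; the article's exact network is not specified in this abstract.

```python
# Sketch: small CNN for 0-vs-180-degree text orientation classification.
import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(128, 128, 1)),       # assumption: grayscale crop of the text region
    layers.Conv2D(16, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid"),   # probability that the text is rotated 180 degrees
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Synthetic training pairs can be produced by rendering text images and
# flipping half of them by 180 degrees, e.g. rotated = np.rot90(image, k=2).
```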

