Deep neural network-based bottleneck feature and denoising autoencoder-based dereverberation for distant-talking speaker identification

Author(s):  
Zhaofeng Zhang ◽  
Longbiao Wang ◽  
Atsuhiko Kai ◽  
Takanori Yamada ◽  
Weifeng Li ◽  
...  

IEEE Access ◽  
2020 ◽  
Vol 8 ◽  
pp. 32187-32202 ◽  
Author(s):  
Rashid Jahangir ◽  
Ying Wah Teh ◽  
Nisar Ahmed Memon ◽  
Ghulam Mujtaba ◽  
Mahdi Zareei ◽  
...  

Sensors ◽  
2021 ◽  
Vol 21 (15) ◽  
pp. 5019
Author(s):  
Yeou-Jiunn Chen ◽  
Pei-Chung Chen ◽  
Shih-Chung Chen ◽  
Chung-Min Wu

For subjects with amyotrophic lateral sclerosis (ALS), verbal and nonverbal communication are greatly impaired. Steady-state visually evoked potential (SSVEP)-based brain-computer interfaces (BCIs) are a successful form of alternative and augmentative communication that helps subjects with ALS communicate with other people or with devices. In practical applications, however, the performance of SSVEP-based BCIs is severely degraded by noise, so developing robust SSVEP-based BCIs is very important. In this study, noise suppression-based feature extraction and a deep neural network are proposed to develop a robust SSVEP-based BCI. To suppress the effects of noise, a denoising autoencoder is proposed to extract denoised features. To obtain recognition results acceptable for practical applications, a deep neural network is used to produce the decisions of the SSVEP-based BCI. The experimental results showed that the proposed approaches effectively suppress the effects of noise and greatly improve the performance of SSVEP-based BCIs. Moreover, the deep neural network outperforms the other approaches examined. The proposed robust SSVEP-based BCI is therefore well suited to practical applications.
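A minimal sketch of the denoising-autoencoder idea the abstract describes, written in PyTorch. The layer sizes, noise level, and variable names are illustrative assumptions, not the authors' configuration.

```python
# Denoising-autoencoder sketch (PyTorch). All sizes are illustrative
# assumptions; the paper's actual architecture may differ.
import torch
import torch.nn as nn

class DenoisingAutoencoder(nn.Module):
    def __init__(self, input_dim=256, hidden_dim=64):
        super().__init__()
        # Encoder compresses the noisy feature vector into a low-dim code.
        self.encoder = nn.Sequential(nn.Linear(input_dim, hidden_dim), nn.ReLU())
        # Decoder reconstructs the clean features from the code.
        self.decoder = nn.Linear(hidden_dim, input_dim)

    def forward(self, x):
        return self.decoder(self.encoder(x))

dae = DenoisingAutoencoder()
optimizer = torch.optim.Adam(dae.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

clean = torch.randn(32, 256)                    # stand-in for clean SSVEP features
noisy = clean + 0.1 * torch.randn_like(clean)   # corrupted input

# Train to reconstruct clean features from noisy ones; the encoder
# output then serves as the denoised feature for a downstream classifier.
for _ in range(100):
    optimizer.zero_grad()
    loss = loss_fn(dae(noisy), clean)
    loss.backward()
    optimizer.step()

denoised_features = dae.encoder(noisy)  # fed to a DNN classifier downstream
```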


2020 ◽  
Vol 22 (4) ◽  
pp. 1356-1375 ◽  
Author(s):  
Hong-Gui Han ◽  
Hui-Juan Zhang ◽  
Jun-Fei Qiao

2021 ◽  
Vol 11 (8) ◽  
pp. 3603
Author(s):  
Feng Ye ◽  
Jun Yang

Speaker identification is a classification task that aims to identify a subject from given sequential time-series data. Since the speech signal is a continuous one-dimensional time series, most current research methods are based on convolutional neural networks (CNNs) or recurrent neural networks (RNNs). These methods perform well in many tasks, but the two network models had not previously been combined for the speaker identification task. Because a speech signal can be represented as a spectrogram, a voiceprint carries the spatial features of the voice spectrum, and a CNN is effective at extracting such spatial features, i.e., at modeling the spectral correlations in acoustic features. At the same time, the speech signal is a time series, and a deep RNN can represent long utterances better than a shallow network. Given the advantage of the gated recurrent unit (GRU) over the traditional RNN in processing sequence data, we use stacked GRU layers in our model for frame-level feature extraction. In this paper, we propose a deep neural network (DNN) model based on a two-dimensional convolutional neural network (2-D CNN) and the GRU for speaker identification. In the network design, the convolutional layers extract voiceprint features and reduce dimensionality in both the time and frequency domains, allowing faster computation in the GRU layers. The stacked GRU layers then learn a speaker's acoustic features. During this research, we evaluated various network structures, including the 2-D CNN, deep RNN, and deep LSTM, on the Aishell-1 speech dataset. The experimental results showed that our proposed DNN model, which we call deep GRU, achieved a high recognition accuracy of 98.96% and outperformed the other models for speaker identification. With further optimization, this method could be applied to other research similar to speaker identification.
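A hedged sketch of the 2-D CNN plus stacked-GRU pipeline the abstract describes, in PyTorch. Channel counts, kernel sizes, the number of speakers, and the input feature size are illustrative assumptions rather than the paper's exact configuration.

```python
# CNN + stacked-GRU speaker-identification sketch (PyTorch).
# Shapes and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn

class DeepGRU(nn.Module):
    def __init__(self, n_speakers=400, n_mels=80):
        super().__init__()
        # 2-D convolutions extract voiceprint features and halve both the
        # time and frequency axes twice, so the GRU sees shorter sequences.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
        )
        freq_out = n_mels // 4  # frequency axis after two stride-2 convs
        # Stacked GRU layers model the frame-level temporal dynamics.
        self.gru = nn.GRU(64 * freq_out, 256, num_layers=3, batch_first=True)
        self.classifier = nn.Linear(256, n_speakers)

    def forward(self, spec):                 # spec: (batch, 1, time, n_mels)
        x = self.cnn(spec)                   # (batch, 64, time/4, n_mels/4)
        b, c, t, f = x.shape
        x = x.permute(0, 2, 1, 3).reshape(b, t, c * f)  # (batch, time/4, feat)
        out, _ = self.gru(x)
        return self.classifier(out[:, -1])   # classify from the last frame

model = DeepGRU()
logits = model(torch.randn(4, 1, 200, 80))   # 4 utterances, 200 frames each
print(logits.shape)                          # torch.Size([4, 400])
```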


2021 ◽  
Vol 38 (6) ◽  
pp. 1793-1799
Author(s):  
Shivaprasad Satla ◽  
Sadanandam Manchala

Dialect identification is the process of identifying the dialects of a particular standard language. Telugu is a historically important language and, like many other languages, contains three main dialects: Telangana, Coastal Andhra, and Rayalaseema. Research on dialect identification is far scarcer than on language identification because of the dearth of databases. In any dialect identification system, the database and feature engineering play vital roles, since most words are similar in pronunciation; most researchers have applied statistical approaches such as the Hidden Markov Model (HMM) and the Gaussian Mixture Model (GMM) to speech processing applications. In today's world, however, neural networks play a vital role in all application domains and produce good results. One type of neural network is the deep neural network (DNN), which has been used to achieve state-of-the-art performance in several fields, such as speech recognition and speaker identification. In this work, a DNN-based multilayer perceptron model is used to identify the regional dialects of the Telugu language using enhanced Mel-frequency cepstral coefficient (MFCC) features. To do this, a database of the Telugu dialects with a duration of 5 h 45 min was created, collected from different speakers in different environments. The results produced by the DNN model were compared with those of the HMM and GMM models, and the DNN model was observed to provide better performance.
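A minimal sketch of an MFCC-plus-multilayer-perceptron dialect classifier of the kind the abstract describes, using librosa and scikit-learn. Plain MFCCs stand in for the paper's "enhanced MFCC" features, and the feature averaging, layer sizes, and label names are illustrative assumptions, not the authors' pipeline.

```python
# MFCC + multilayer-perceptron dialect-identification sketch.
# Plain mean MFCCs stand in for the paper's "enhanced MFCC" features;
# all hyperparameters and labels are illustrative assumptions.
import numpy as np
import librosa
from sklearn.neural_network import MLPClassifier

DIALECTS = ["telangana", "coastal_andhra", "rayalaseema"]  # hypothetical labels

def utterance_features(wav_path, n_mfcc=13):
    """Load an utterance and return its mean MFCC vector."""
    y, sr = librosa.load(wav_path, sr=16000)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # (n_mfcc, frames)
    return mfcc.mean(axis=1)  # average over frames -> (n_mfcc,)

# X: one mean-MFCC vector per utterance, y: dialect index per utterance.
# (In the paper these would come from the 5 h 45 min Telugu corpus.)
X = np.random.randn(300, 13)                       # placeholder features
y = np.random.randint(0, len(DIALECTS), size=300)  # placeholder labels

mlp = MLPClassifier(hidden_layer_sizes=(128, 64), max_iter=500)
mlp.fit(X, y)
pred = mlp.predict(X[:5])
print([DIALECTS[i] for i in pred])
```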

