Deep neural network-based bottleneck feature and denoising autoencoder-based dereverberation for distant-talking speaker identification

Author(s):  
Zhaofeng Zhang ◽  
Longbiao Wang ◽  
Atsuhiko Kai ◽  
Takanori Yamada ◽  
Weifeng Li ◽  
...  

IEEE Access ◽  
2020 ◽  
Vol 8 ◽  
pp. 32187-32202 ◽  
Author(s):  
Rashid Jahangir ◽  
Ying Wah Teh ◽  
Nisar Ahmed Memon ◽  
Ghulam Mujtaba ◽  
Mahdi Zareei ◽  
...  

Sensors ◽  
2021 ◽  
Vol 21 (15) ◽  
pp. 5019
Author(s):  
Yeou-Jiunn Chen ◽  
Pei-Chung Chen ◽  
Shih-Chung Chen ◽  
Chung-Min Wu

For subjects with amyotrophic lateral sclerosis (ALS), verbal and nonverbal communication are greatly impaired. Steady-state visually evoked potential (SSVEP)-based brain-computer interfaces (BCIs) are a successful form of alternative and augmentative communication that helps subjects with ALS communicate with other people or with devices. In practical applications, however, the performance of SSVEP-based BCIs is severely degraded by noise, so developing robust SSVEP-based BCIs is very important. In this study, noise suppression-based feature extraction and a deep neural network are proposed to develop a robust SSVEP-based BCI. To suppress the effects of noise, a denoising autoencoder is proposed to extract denoised features. To obtain recognition results acceptable for practical applications, a deep neural network is used to produce the decisions of the SSVEP-based BCI. The experimental results showed that the proposed approaches effectively suppress the effects of noise and greatly improve the performance of SSVEP-based BCIs. Moreover, the deep neural network outperforms the other approaches examined. The proposed robust SSVEP-based BCI is therefore well suited to practical applications.
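A minimal sketch of the denoising-autoencoder idea the abstract describes, written in PyTorch. The layer sizes, noise level, and variable names are illustrative assumptions, not the authors' configuration.

```python
# Denoising-autoencoder sketch (PyTorch). All sizes are illustrative
# assumptions; the paper's actual architecture may differ.
import torch
import torch.nn as nn

class DenoisingAutoencoder(nn.Module):
    def __init__(self, input_dim=256, hidden_dim=64):
        super().__init__()
        # Encoder compresses the noisy feature vector into a low-dim code.
        self.encoder = nn.Sequential(nn.Linear(input_dim, hidden_dim), nn.ReLU())
        # Decoder reconstructs the clean features from the code.
        self.decoder = nn.Linear(hidden_dim, input_dim)

    def forward(self, x):
        return self.decoder(self.encoder(x))

dae = DenoisingAutoencoder()
optimizer = torch.optim.Adam(dae.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

clean = torch.randn(32, 256)                    # stand-in for clean SSVEP features
noisy = clean + 0.1 * torch.randn_like(clean)   # corrupted input

# Train to reconstruct clean features from noisy ones; the encoder
# output then serves as the denoised feature for a downstream classifier.
for _ in range(100):
    optimizer.zero_grad()
    loss = loss_fn(dae(noisy), clean)
    loss.backward()
    optimizer.step()

denoised_features = dae.encoder(noisy)  # fed to a DNN classifier downstream
```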


2020 ◽  
Vol 22 (4) ◽  
pp. 1356-1375 ◽  
Author(s):  
Hong-Gui Han ◽  
Hui-Juan Zhang ◽  
Jun-Fei Qiao

2021 ◽  
Vol 11 (8) ◽  
pp. 3603
Author(s):  
Feng Ye ◽  
Jun Yang

Speaker identification is a classification task that aims to identify a subject from given sequential time-series data. Since the speech signal is a continuous one-dimensional time series, most current research methods are based on convolutional neural networks (CNNs) or recurrent neural networks (RNNs). These methods perform well in many tasks, but the two network models had not previously been combined for the speaker identification task. Because a speech signal can be represented as a spectrogram, a voiceprint carries the spatial features of the voice spectrum, and a CNN is effective at extracting such spatial features, i.e., at modeling the spectral correlations in acoustic features. At the same time, the speech signal is a time series, and a deep RNN can represent long utterances better than a shallow network. Given the advantage of the gated recurrent unit (GRU) over the traditional RNN in processing sequence data, we use stacked GRU layers in our model for frame-level feature extraction. In this paper, we propose a deep neural network (DNN) model based on a two-dimensional convolutional neural network (2-D CNN) and the GRU for speaker identification. In the network design, the convolutional layers extract voiceprint features and reduce dimensionality in both the time and frequency domains, allowing faster computation in the GRU layers. The stacked GRU layers then learn a speaker's acoustic features. During this research, we evaluated various network structures, including the 2-D CNN, deep RNN, and deep LSTM, on the Aishell-1 speech dataset. The experimental results showed that our proposed DNN model, which we call deep GRU, achieved a high recognition accuracy of 98.96% and outperformed the other models for speaker identification. With further optimization, this method could be applied to other research similar to speaker identification.
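A hedged sketch of the 2-D CNN plus stacked-GRU pipeline the abstract describes, in PyTorch. Channel counts, kernel sizes, the number of speakers, and the input feature size are illustrative assumptions rather than the paper's exact configuration.

```python
# CNN + stacked-GRU speaker-identification sketch (PyTorch).
# Shapes and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn

class DeepGRU(nn.Module):
    def __init__(self, n_speakers=400, n_mels=80):
        super().__init__()
        # 2-D convolutions extract voiceprint features and halve both the
        # time and frequency axes twice, so the GRU sees shorter sequences.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
        )
        freq_out = n_mels // 4  # frequency axis after two stride-2 convs
        # Stacked GRU layers model the frame-level temporal dynamics.
        self.gru = nn.GRU(64 * freq_out, 256, num_layers=3, batch_first=True)
        self.classifier = nn.Linear(256, n_speakers)

    def forward(self, spec):                 # spec: (batch, 1, time, n_mels)
        x = self.cnn(spec)                   # (batch, 64, time/4, n_mels/4)
        b, c, t, f = x.shape
        x = x.permute(0, 2, 1, 3).reshape(b, t, c * f)  # (batch, time/4, feat)
        out, _ = self.gru(x)
        return self.classifier(out[:, -1])   # classify from the last frame

model = DeepGRU()
logits = model(torch.randn(4, 1, 200, 80))   # 4 utterances, 200 frames each
print(logits.shape)                          # torch.Size([4, 400])
```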


2021 ◽  
Vol 38 (6) ◽  
pp. 1793-1799
Author(s):  
Shivaprasad Satla ◽  
Sadanandam Manchala

Dialect identification is the process of identifying the dialects of a particular standard language. Telugu is a historically important language and, like many other languages, contains three main dialects: Telangana, Coastal Andhra, and Rayalaseema. Research on dialect identification is far scarcer than on language identification because of the dearth of databases. In any dialect identification system, the database and feature engineering play vital roles, since most words are similar in pronunciation; most researchers have applied statistical approaches such as the Hidden Markov Model (HMM) and the Gaussian Mixture Model (GMM) to speech processing applications. In today's world, however, neural networks play a vital role in all application domains and produce good results. One type of neural network is the deep neural network (DNN), which has been used to achieve state-of-the-art performance in several fields, such as speech recognition and speaker identification. In this work, a DNN-based multilayer perceptron model is used to identify the regional dialects of the Telugu language using enhanced Mel-frequency cepstral coefficient (MFCC) features. To do this, a database of the Telugu dialects with a duration of 5 h 45 min was created, collected from different speakers in different environments. The results produced by the DNN model were compared with those of the HMM and GMM models, and the DNN model was observed to provide better performance.
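A minimal sketch of an MFCC-plus-multilayer-perceptron dialect classifier of the kind the abstract describes, using librosa and scikit-learn. Plain MFCCs stand in for the paper's "enhanced MFCC" features, and the feature averaging, layer sizes, and label names are illustrative assumptions, not the authors' pipeline.

```python
# MFCC + multilayer-perceptron dialect-identification sketch.
# Plain mean MFCCs stand in for the paper's "enhanced MFCC" features;
# all hyperparameters and labels are illustrative assumptions.
import numpy as np
import librosa
from sklearn.neural_network import MLPClassifier

DIALECTS = ["telangana", "coastal_andhra", "rayalaseema"]  # hypothetical labels

def utterance_features(wav_path, n_mfcc=13):
    """Load an utterance and return its mean MFCC vector."""
    y, sr = librosa.load(wav_path, sr=16000)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # (n_mfcc, frames)
    return mfcc.mean(axis=1)  # average over frames -> (n_mfcc,)

# X: one mean-MFCC vector per utterance, y: dialect index per utterance.
# (In the paper these would come from the 5 h 45 min Telugu corpus.)
X = np.random.randn(300, 13)                       # placeholder features
y = np.random.randint(0, len(DIALECTS), size=300)  # placeholder labels

mlp = MLPClassifier(hidden_layer_sizes=(128, 64), max_iter=500)
mlp.fit(X, y)
pred = mlp.predict(X[:5])
print([DIALECTS[i] for i in pred])
```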

