AM-FM based filter bank analysis for estimation of spectro-temporal envelopes and its application for speaker recognition in noisy reverberant environments

Author(s):  
Dhananjaya Gowda ◽  
Rahim Saeidi ◽  
Paavo Alku

The Mel and Bark scales are both designed to mimic the human auditory system: the Mel scale follows the human ear's perception of pitch, whereas the Bark scale is based on critical-band selectivity, at which perceived loudness changes significantly. Filter bank structures defined on these scales are used in speech and speaker recognition systems to extract speaker-specific features. In this work, the performance of Mel-scale and Bark-scale filter banks is evaluated for a text-independent speaker identification system. Bark-scale centre frequencies are found to be more effective than Mel-scale centre frequencies on Indian dialect speaker databases. The recognition rate achieved with the Bark-scale filter bank is 96% on the AISSMSIOIT database and 95% on the Marathi database.
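
The abstract does not give the scale formulas, but the standard mappings are well known: Mel(f) = 2595·log10(1 + f/700) and a common Bark approximation is 13·arctan(0.00076·f) + 3.5·arctan((f/7500)^2). The sketch below (assumed details: 20 filters, 0–8 kHz range, numerical inversion of the Bark mapping) shows how centre frequencies spaced uniformly on each scale differ; it illustrates the general technique, not the paper's exact filter bank.

```python
import numpy as np

def hz_to_mel(f_hz):
    """Standard Mel-scale mapping (O'Shaughnessy formula)."""
    return 2595.0 * np.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def hz_to_bark(f_hz):
    """A common Bark-scale approximation (Zwicker-style)."""
    return 13.0 * np.arctan(0.00076 * f_hz) + 3.5 * np.arctan((f_hz / 7500.0) ** 2)

def mel_filter_centers(n_filters, f_min, f_max):
    """Centre frequencies (Hz) spaced uniformly on the Mel scale."""
    mels = np.linspace(hz_to_mel(f_min), hz_to_mel(f_max), n_filters + 2)
    return mel_to_hz(mels[1:-1])

def bark_filter_centers(n_filters, f_min, f_max):
    """Centre frequencies (Hz) spaced uniformly on the Bark scale,
    obtained by numerically inverting hz_to_bark on a dense grid."""
    grid = np.linspace(f_min, f_max, 100_000)
    barks = hz_to_bark(grid)
    targets = np.linspace(barks[0], barks[-1], n_filters + 2)[1:-1]
    return np.interp(targets, barks, grid)

if __name__ == "__main__":
    print("Mel centres :", np.round(mel_filter_centers(20, 0.0, 8000.0)))
    print("Bark centres:", np.round(bark_filter_centers(20, 0.0, 8000.0)))
```

Comparing the two printed lists makes the practical difference visible: the Bark spacing places relatively more filters in the low-frequency region where critical bands are narrow.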


Author(s):  
Datao You ◽  
Jiqing Han ◽  
Tieran Zheng ◽  
Guibin Zheng

The mismatch between training and testing environments greatly degrades the performance of speaker recognition. Although many robust techniques have been proposed, speaker recognition under mismatched conditions remains a challenge. To address this problem, we propose a sparse-based auditory model as the front-end of speaker recognition, simulating the auditory processing of the speech signal. To this end, we introduce a narrow-band filter bank, instead of the widely used wide-band filter bank, to simulate the basilar membrane filter bank; use sparse representation as an approximation of the basilar membrane coding strategy; and incorporate the frequency-selectivity enhancement mechanism between the tectorial membrane and the basilar membrane through a practical engineering approximation. Compared with the standard Mel-frequency cepstral coefficient approach, our preliminary experimental results indicate that the sparse-based auditory model consistently improves the robustness of speaker recognition under mismatched conditions.
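
The abstract describes the pipeline only at a high level, so the following is a minimal sketch of the general idea: a dictionary of narrow-band atoms (here gammatone impulse responses, a common basilar-membrane model) and a greedy matching-pursuit sparse coder standing in for the sparse coding step. The atom shape, ERB-based bandwidths, channel count, and the matching-pursuit solver are all assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def gammatone_atom(fc, fs, n_samples, order=4):
    """One gammatone impulse response, unit-normalised to serve as a dictionary atom."""
    t = np.arange(n_samples) / fs
    erb = 24.7 + 0.108 * fc                     # equivalent rectangular bandwidth (Hz)
    g = t ** (order - 1) * np.exp(-2 * np.pi * 1.019 * erb * t) * np.cos(2 * np.pi * fc * t)
    return g / (np.linalg.norm(g) + 1e-12)

def build_dictionary(fs, n_samples, n_channels=64, f_min=100.0, f_max=6000.0):
    """Dictionary of narrow-band atoms with log-spaced centre frequencies."""
    fcs = np.geomspace(f_min, f_max, n_channels)
    return np.stack([gammatone_atom(fc, fs, n_samples) for fc in fcs], axis=1)

def sparse_code(frame, D, n_iter=8):
    """Greedy matching pursuit: approximate the frame with a few dictionary atoms."""
    residual = frame.copy()
    coeffs = np.zeros(D.shape[1])
    for _ in range(n_iter):
        corr = D.T @ residual                   # correlate residual with every atom
        k = int(np.argmax(np.abs(corr)))        # pick the best-matching atom
        coeffs[k] += corr[k]
        residual -= corr[k] * D[:, k]           # remove its contribution
    return coeffs

if __name__ == "__main__":
    fs, frame_len = 16000, 400                  # 25 ms frame at 16 kHz
    rng = np.random.default_rng(0)
    frame = np.sin(2 * np.pi * 440 * np.arange(frame_len) / fs) + 0.05 * rng.standard_normal(frame_len)
    D = build_dictionary(fs, frame_len)
    a = sparse_code(frame, D)
    print("active atoms:", np.flatnonzero(a).size)
```

The point of the sketch is the structure of the front-end: each short frame is represented by a small number of active narrow-band channels, which is the sparsity property the abstract argues improves robustness under mismatch.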


2021 ◽  
Author(s):  
Qinghua Zhong ◽  
Ruining Dai ◽  
Han Zhang ◽  
YongSheng Zhu ◽  
Guofu Zhou

Text-independent speaker recognition is widely used in identity recognition. To improve the discriminative power of the features, this paper proposes a text-independent speaker recognition method based on a deep residual network model. First, 64-dimensional log filter bank features are extracted from the original audio. Second, a deep residual network processes these log filter bank features; it consists of a residual network and a Convolutional Attention Statistics Pooling (CASP) layer, which aggregates the frame-level features from the residual network into utterance-level features. Finally, an Adaptive Curriculum Learning Loss (ACLL) classifier optimizes the abstract features output by the deep residual network and completes the text-independent speaker recognition. The proposed method was applied to the large VoxCeleb2 dataset for extensive text-independent speaker recognition experiments, achieving an average equal error rate (EER) of 1.76% on the VoxCeleb1 test set, 1.91% on the VoxCeleb1-E test set, and 3.24% on the VoxCeleb1-H test set. Compared with related speaker recognition methods, EER was improved by 1.11% on the VoxCeleb1 test set, 1.04% on the VoxCeleb1-E test set, and 1.69% on the VoxCeleb1-H test set.
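
The key architectural idea in this abstract is the pooling step that turns variable-length frame-level features into a fixed-length utterance embedding. Below is a simplified attention-based statistics pooling layer in PyTorch as a generic stand-in for the CASP layer: a 1-D convolutional attention head weights each frame, then weighted means and standard deviations are concatenated. The channel count, bottleneck size, and attention design are assumptions; the authors' exact CASP layer and the ACLL classifier are not reproduced here.

```python
import torch
import torch.nn as nn

class AttentiveStatsPooling(nn.Module):
    """Attention-weighted statistics pooling over time (simplified CASP-style layer)."""

    def __init__(self, feat_dim, bottleneck=128):
        super().__init__()
        self.attention = nn.Sequential(
            nn.Conv1d(feat_dim, bottleneck, kernel_size=1),
            nn.Tanh(),
            nn.Conv1d(bottleneck, feat_dim, kernel_size=1),
            nn.Softmax(dim=2),                   # attention weights over the time axis
        )

    def forward(self, x):
        # x: (batch, feat_dim, n_frames) frame-level features from the residual trunk
        w = self.attention(x)
        mean = torch.sum(w * x, dim=2)           # attention-weighted mean per channel
        var = torch.sum(w * x * x, dim=2) - mean ** 2
        std = torch.sqrt(var.clamp(min=1e-8))    # attention-weighted standard deviation
        return torch.cat([mean, std], dim=1)     # (batch, 2 * feat_dim) utterance embedding

if __name__ == "__main__":
    frames = torch.randn(4, 256, 300)            # hypothetical: 256 channels over 300 frames
    pooled = AttentiveStatsPooling(256)(frames)
    print(pooled.shape)                          # torch.Size([4, 512])
```

Concatenating the weighted standard deviation alongside the mean is what distinguishes statistics pooling from plain attention pooling; the resulting fixed-length vector is what the speaker classifier operates on.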


Author(s):  
Mahesh Kumar Nandwana ◽  
Julien van Hout ◽  
Mitchell McLaren ◽  
Allen Stauffer ◽  
Colleen Richey ◽  
...  
