Background & Objective:
Speaker Recognition (SR) techniques have matured considerably over the past few decades.
Existing methods typically rely on features extracted from clean speech signals and can
therefore achieve very high recognition accuracy under idealized conditions. For critical
applications such as security and forensics, however, the robustness and reliability of
the system are crucial.
Methods:
Background noise and reverberation, which occur in many real-world applications, are
known to degrade recognition performance. To improve the performance of speaker verification
systems, an effective and robust feature extraction technique is proposed that can operate
in both clean and noisy conditions. Mel Frequency Cepstral Coefficients (MFCCs)
and Gammatone Frequency Cepstral Coefficients (GFCCs) are mature techniques and the most
common features used for speaker recognition. MFCCs are calculated from the log energies
in frequency bands distributed over a mel scale, whereas GFCCs are derived from a bank of
Gammatone filters, which was originally proposed to model human cochlear filtering. This paper
investigates the performance of GFCC and conventional MFCC features in clean and noisy conditions.
The effects of Signal-to-Noise Ratio (SNR) and language mismatch on system performance
are also taken into account in this work.
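As an illustration of how a noisy test condition at a chosen SNR can be simulated, the following is a minimal sketch, not the procedure used in this work. The function names `mix_at_snr` and `measured_snr_db` are hypothetical, and the sketch assumes additive noise with non-zero power, scaled so that the ratio of speech power to noise power matches the target SNR in dB.

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Add `noise` to `speech` after scaling it to a target SNR in dB.

    Hypothetical helper for illustration; assumes the noise has non-zero power.
    """
    speech = np.asarray(speech, dtype=float)
    noise = np.asarray(noise, dtype=float)

    # Repeat or trim the noise so it covers the whole speech signal.
    reps = int(np.ceil(len(speech) / len(noise)))
    noise = np.tile(noise, reps)[: len(speech)]

    p_speech = np.mean(speech ** 2)  # average speech power
    p_noise = np.mean(noise ** 2)    # average noise power

    # SNR_dB = 10 * log10(p_speech / (gain^2 * p_noise)), solved for gain.
    gain = np.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    return speech + gain * noise

def measured_snr_db(speech, noisy):
    """Recover the SNR in dB from the clean signal and its noisy version."""
    speech = np.asarray(speech, dtype=float)
    residual = np.asarray(noisy, dtype=float) - speech
    return 10.0 * np.log10(np.mean(speech ** 2) / np.mean(residual ** 2))

# Usage with synthetic signals (a tone stands in for speech).
rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
noisy = mix_at_snr(clean, rng.standard_normal(4000), snr_db=10.0)
```

Sweeping `snr_db` over a range of values gives one noisy copy of each utterance per condition, from which features such as MFCC or GFCC could then be extracted and compared.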
Conclusion:
Experimental results show a significant improvement in system performance, as measured by
a reduced equal error rate (EER) and by detection error trade-off (DET) analysis. Recognition
rates under various types of noise and various SNRs were quantified via simulation.
The results of the study are presented and discussed.
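For readers unfamiliar with the equal error rate, it is the operating point at which the false acceptance rate and the false rejection rate are equal. The sketch below shows one simple way to estimate it from verification scores. It assumes that higher scores indicate a more likely genuine speaker; the function name `equal_error_rate` and the example scores are hypothetical and are not taken from the experiments reported here.

```python
import numpy as np

def equal_error_rate(genuine_scores, impostor_scores):
    """Estimate the EER by sweeping a threshold over all observed scores.

    Assumes a trial is accepted when its score is >= the threshold.
    Returns (eer, threshold) at the point where FAR and FRR are closest.
    """
    genuine = np.asarray(genuine_scores, dtype=float)
    impostor = np.asarray(impostor_scores, dtype=float)

    # Candidate thresholds: every distinct score, in ascending order.
    thresholds = np.unique(np.concatenate([genuine, impostor]))

    # False rejection rate: genuine trials scoring below the threshold.
    frr = np.array([np.mean(genuine < t) for t in thresholds])
    # False acceptance rate: impostor trials scoring at or above it.
    far = np.array([np.mean(impostor >= t) for t in thresholds])

    # Pick the threshold where the two error rates are closest.
    idx = int(np.argmin(np.abs(far - frr)))
    return (far[idx] + frr[idx]) / 2.0, thresholds[idx]

# Usage with made-up scores, for illustration only.
eer, threshold = equal_error_rate([0.9, 0.8, 0.7, 0.4], [0.6, 0.3, 0.2, 0.1])
```

Plotting the false rejection rate against the false acceptance rate over the same threshold sweep yields the detection error trade-off curve; the EER is the point where that curve crosses the diagonal.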