Study on Gender Identification Based on Audio Recordings Using Gaussian Mixture Model and Mel Frequency Cepstrum Coefficient Technique

Speaker recognition is the ability to identify a speaker's characteristics from spoken language. The purpose of this study is to identify the gender of speakers from audio recordings. The objectives are to evaluate how accurately this technique differentiates gender and to determine how well it classifies even self-acquired recordings. Audio forensics uses voice recordings as part of the evidence to solve cases, and this study is mainly conducted to provide an easier technique for identifying the characteristics of an unknown speaker in the forensic field. The experiment is carried out by training the pattern classifier on gender-dependent data. To train the model, a speech database is obtained from an online speech corpus comprising both male and female speakers. During the testing phase, audio recordings of UTM students are also used, in addition to the speech-corpus data, to determine the accuracy rate of this speaker identification experiment. The Mel Frequency Cepstrum Coefficient (MFCC) algorithm is used to extract features from the speech data, while a Gaussian Mixture Model (GMM) is used to model the gender identifier. No noise removal was applied to any speech data in this experiment. Python is used to extract the MFCC coefficients and to model their behavior with the GMM technique. Experimental results show that the GMM-MFCC technique can identify gender regardless of language, although the accuracy rate varies.
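
The abstract does not spell out the decision rule, but a common way to use GMMs for this kind of task is to fit one GMM per gender on MFCC frames and label a test utterance with the gender whose model gives the higher average per-frame log-likelihood. The sketch below assumes that rule and diagonal covariances; the function names and the two-dimensional toy parameters are hypothetical stand-ins for real MFCC vectors and trained models.

```python
import math

def diag_gauss_logpdf(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at vector x."""
    return -0.5 * sum(
        math.log(2 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )

def gmm_frame_loglik(x, weights, means, variances):
    """log p(x) = log sum_k w_k N(x; mean_k, diag(var_k)), computed with log-sum-exp."""
    terms = [math.log(w) + diag_gauss_logpdf(x, m, v)
             for w, m, v in zip(weights, means, variances)]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))

def utterance_score(frames, gmm):
    """Average per-frame log-likelihood of an utterance under one GMM."""
    weights, means, variances = gmm
    return sum(gmm_frame_loglik(f, weights, means, variances) for f in frames) / len(frames)

def identify_gender(frames, models):
    """Return the label whose GMM gives the highest average log-likelihood."""
    return max(models, key=lambda label: utterance_score(frames, models[label]))

# Hypothetical toy models: (weights, means, variances) per label, 2-D features.
models = {
    "male": ([0.5, 0.5], [[-1.0, 0.0], [-2.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]]),
    "female": ([0.5, 0.5], [[1.0, 0.0], [2.0, -1.0]], [[1.0, 1.0], [1.0, 1.0]]),
}
frames = [[1.1, -0.2], [1.8, -0.9], [0.9, 0.1]]
print(identify_gender(frames, models))  # female
```

In practice the frames would be MFCC vectors and the per-gender parameters would come from training, for example with scikit-learn's `GaussianMixture(covariance_type="diag")`, whose `score` method returns the same average per-sample log-likelihood.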

Noise Classification Based on GMM and AANN

Applied Mechanics and Materials, Vol 58-60, 2011, pp. 1847-1853. DOI: 10.4028/www.scientific.net/amm.58-60.1847. Cited by ~2.

Author(s): Yan Zhang, Cun Bao Chen, Li Zhao

Keyword(s): Neural Network, Maximum Likelihood, Gaussian Mixture Model, Mixture Model, Gaussian Mixture, Specific Method, Accuracy Rate, Noise Classification, Distribution Type

In this paper, the Gaussian Mixture Model (GMM) is applied as a specific method for noise classification. On this basis, a modified GMM with an embedded Auto-Associative Neural Network (AANN) is proposed, which integrates the merits of both. The GMM and AANN are trained as a whole by Maximum Likelihood (ML), with the parameters of the GMM and the AANN updated alternately during training. The AANN reshapes the distribution of the data and improves the similarity of feature data belonging to the same noise distribution type. Experiments show that the GMM with the embedded AANN achieves a higher noise-classification accuracy rate than the baseline GMM.
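
The abstract trains its models by Maximum Likelihood. For a plain GMM (the baseline here, without the embedded AANN or the alternating updates), ML estimates are usually found with the expectation-maximization (EM) algorithm. The following is a minimal one-dimensional sketch of EM under that assumption; `fit_gmm_1d`, `log_likelihood` and the synthetic data are illustrative only and are not the paper's implementation.

```python
import math
import random

def normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian N(mean, var) at x."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def fit_gmm_1d(data, k, iters=100, seed=0, var_floor=1e-6):
    """Maximum-likelihood fit of a k-component 1-D GMM with the EM algorithm."""
    rng = random.Random(seed)
    means = rng.sample(data, k)  # initialise means at k distinct data points
    mu = sum(data) / len(data)
    var0 = max(sum((x - mu) ** 2 for x in data) / len(data), var_floor)
    variances = [var0] * k       # start every component at the data variance
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each point
        resp = []
        for x in data:
            p = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(p) or 1e-300  # guard against underflow to zero
            resp.append([pj / total for pj in p])
        # M-step: responsibility-weighted re-estimation of weights, means, variances
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-300
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            variances[j] = max(
                sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj,
                var_floor,
            )
    return weights, means, variances

def log_likelihood(data, weights, means, variances):
    """Total log-likelihood of the data under a 1-D GMM."""
    return sum(
        math.log(sum(w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)))
        for x in data
    )

# Synthetic two-cluster data standing in for 1-D noise features.
rng = random.Random(1)
data = [rng.gauss(-3.0, 1.0) for _ in range(200)] + [rng.gauss(3.0, 1.0) for _ in range(200)]
weights, means, variances = fit_gmm_1d(data, k=2)
print(sorted(round(m, 2) for m in means))  # should land close to -3 and 3
```

Each EM iteration does not decrease the likelihood, which is why it is the usual route to ML estimates for mixture models; real noise features would be multi-dimensional.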

Comparison of Parametric representations of Birdcall in Gaussian Mixture model

APTIKOM Journal on Computer Science and Information Technologies, Vol 2 (3), 2017, pp. 124-130. DOI: 10.11591/aptikom.j.csit.71.

Author(s): Ricky Mohanty, Sandeep Singh Solanki

Keyword(s): Gaussian Mixture Model, Mixture Model, Bird Species, Gaussian Mixture, Recognition System, Extraction Methods, Paper Briefly, Mel Frequency Cepstral Coefficients, Model Classification, Audio Recordings

This paper focuses on methods for automatically classifying birds into species based on feature extraction methods and audio recordings of their sounds. The recognition system uses a Gaussian mixture model (GMM) to model the calls of 14 poultry bird species. Mel frequency cepstral coefficient (MFCC) parameters and wavelet parameters are used for feature vector extraction. The paper briefly explains these methods and evaluates their performance in Gaussian Mixture Model classification. The results show that Gaussian Mixture Model classification using wavelet parameters was more efficient in terms of accuracy, at around 80%, and that its computation was also faster.
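
The abstract does not say which wavelet or which wavelet parameters were used. As one hypothetical example of a wavelet-based feature vector, the sketch below applies a multi-level Haar discrete wavelet transform to a frame and takes the log energy of each sub-band; each species' GMM would then be trained on such vectors. The Haar choice, the number of levels, and the function names are assumptions.

```python
import math

def haar_dwt_level(signal):
    """One level of the orthonormal Haar DWT, returning (approximation, detail)."""
    if len(signal) % 2:
        signal = signal + [0.0]  # zero-pad odd-length input; adds no energy
    s = 1.0 / math.sqrt(2.0)
    approx = [s * (signal[i] + signal[i + 1]) for i in range(0, len(signal), 2)]
    detail = [s * (signal[i] - signal[i + 1]) for i in range(0, len(signal), 2)]
    return approx, detail

def wavelet_energy_features(frame, levels=4, eps=1e-12):
    """Log energy of each detail band (finest first) plus the final approximation band."""
    feats = []
    approx = list(frame)
    for _ in range(levels):
        approx, detail = haar_dwt_level(approx)
        feats.append(math.log(sum(d * d for d in detail) + eps))
    feats.append(math.log(sum(a * a for a in approx) + eps))
    return feats

# A 16-sample frame alternating +1/-1: all of its energy is in the finest detail band.
print([round(f, 2) for f in wavelet_energy_features([1.0, -1.0] * 8)])
# [2.77, -27.63, -27.63, -27.63, -27.63]
```

Because the Haar transform is orthonormal, the sub-band energies add up to the frame energy, so the vector describes how a call's energy is spread across scales.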

Acoustic Model Transformation Method for Speech Recognition Employing Gaussian Mixture Model Adaptation Using Untranscribed Speech Database

The Journal of the Korean Institute of Information and Communication Engineering, Vol 19 (5), 2015, pp. 1047-1054. DOI: 10.6109/jkiice.2015.19.5.1047.

Author(s): Wooil Kim

Keyword(s): Speech Recognition, Gaussian Mixture Model, Mixture Model, Model Transformation, Gaussian Mixture, Transformation Method, Acoustic Model, Model Adaptation, Speech Database

Utterance Clustering Using Stereo Audio Channels

Computational Intelligence and Neuroscience, Vol 2021, 2021, pp. 1-8. DOI: 10.1155/2021/6151651.

Author(s): Yingjun Dong, Neil G. MacLaren, Yiding Cao, Francis J. Yammarino, Shelley D. Dionne, ...

Keyword(s): Gaussian Mixture Model, Mixture Model, Audio Signal, Gaussian Mixture, Audio Signal Processing, Audio Signals, Multichannel Audio, Audio Recordings, Left And Right, Complicated Conditions

Utterance clustering is one of the actively researched topics in audio signal processing and machine learning. This study aims to improve the performance of utterance clustering by processing multichannel (stereo) audio signals. Processed audio signals were generated by combining the left- and right-channel audio signals in a few different ways and then extracting embedded features (also called d-vectors) from those processed signals. This study applied the Gaussian mixture model for supervised utterance clustering. In the training phase, a parameter-sharing Gaussian mixture model was used to train a model for each speaker. In the testing phase, the speaker with the maximum likelihood was selected as the detected speaker. Experiments with real audio recordings of multiperson discussion sessions showed that the proposed method using multichannel audio signals achieved significantly better performance than a conventional method using mono audio signals under more complicated conditions.
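
The abstract mentions combining the left and right channels "in a few different ways" without listing them. Two standard stereo combinations are the mid signal (the average of the channels) and the side signal (half their difference); the sketch below assumes those as examples only. The d-vector extraction that follows is done by a separate speaker-embedding network and is not shown; `combine_stereo` and its mode names are hypothetical.

```python
def combine_stereo(left, right, mode="mid"):
    """Combine equal-length left/right channels into one signal (hypothetical modes)."""
    if len(left) != len(right):
        raise ValueError("left and right channels must have the same length")
    if mode == "mid":   # average of the channels: content common to both
        return [(lv + rv) / 2 for lv, rv in zip(left, right)]
    if mode == "side":  # half the difference: what differs between the channels
        return [(lv - rv) / 2 for lv, rv in zip(left, right)]
    raise ValueError(f"unknown mode: {mode!r}")

left = [1.0, 0.5, -2.0]
right = [3.0, 0.5, 2.0]
print(combine_stereo(left, right, "mid"))   # [2.0, 0.5, 0.0]
print(combine_stereo(left, right, "side"))  # [-1.0, 0.0, -2.0]
```

A side signal is near zero when both microphones pick up the same sound, so it highlights exactly the inter-channel differences that a mono signal discards.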