Study on Gender Identification Based on Audio Recordings Using Gaussian Mixture Model and Mel Frequency Cepstrum Coefficient Technique

2021 ◽  
Vol 11 (2) ◽  
pp. 35-41
Author(s):  
Thurgeaswary Rokanatnam ◽  
Hazinah Kutty Mammi

Speaker recognition is the ability to identify a speaker’s characteristics from spoken language. The purpose of this study is to identify the gender of speakers from audio recordings. Its objectives are to evaluate how accurately the technique differentiates gender and to determine how well it classifies self-acquired recordings. Audio forensics uses voice recordings as evidence to solve cases, and this study is conducted mainly to provide an easier technique for identifying the characteristics of an unknown speaker in the forensic field. The experiment is carried out by training the pattern classifier on gender-dependent data. To train the model, a speech database comprising both male and female speakers is obtained from an online speech corpus. During the testing phase, audio recordings of UTM students are used in addition to data from the speech corpus to determine the accuracy rate of this speaker identification experiment. The Mel Frequency Cepstrum Coefficient (MFCC) algorithm is used to extract features from the speech data, while a Gaussian Mixture Model (GMM) is used to model the gender identifier. No noise removal was applied to any speech data in this experiment. Python is used to extract the MFCC features and to model them with the GMM technique. The experimental results show that the GMM-MFCC technique can identify gender regardless of language, but with a varying accuracy rate.
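The classification step described above can be sketched as follows: one GMM is trained per gender, and a test recording is assigned to the gender whose model gives the higher average log-likelihood. This is a minimal, dependency-free illustration and not the authors' implementation. The `DiagGMM` class, the diagonal covariances, the component count, and the synthetic three-dimensional frames standing in for real MFCC vectors are all assumptions made for the example.

```python
import math
import random


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at vector x."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def logsumexp(values):
    """Numerically stable log(sum(exp(values)))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


class DiagGMM:
    """Hypothetical diagonal-covariance GMM trained with plain EM."""

    def __init__(self, n_components, n_iter=25, seed=0):
        self.k, self.n_iter = n_components, n_iter
        self.rng = random.Random(seed)

    def _component_logs(self, x):
        return [math.log(w) + log_gauss_diag(x, m, v)
                for w, m, v in zip(self.weights, self.means, self.vars)]

    def fit(self, frames):
        n, dim = len(frames), len(frames[0])
        # Initialise means from random frames and variances from the data.
        self.means = [list(f) for f in self.rng.sample(frames, self.k)]
        centre = [sum(f[d] for f in frames) / n for d in range(dim)]
        spread = [sum((f[d] - centre[d]) ** 2 for f in frames) / n + 1e-3
                  for d in range(dim)]
        self.vars = [list(spread) for _ in range(self.k)]
        self.weights = [1.0 / self.k] * self.k
        for _ in range(self.n_iter):
            # E-step: responsibility of each component for each frame.
            resp = []
            for x in frames:
                logs = self._component_logs(x)
                norm = logsumexp(logs)
                resp.append([math.exp(v - norm) for v in logs])
            # M-step: re-estimate weights, means and variances.
            for j in range(self.k):
                nj = sum(r[j] for r in resp) + 1e-10
                self.weights[j] = nj / n
                self.means[j] = [sum(r[j] * x[d] for r, x in zip(resp, frames)) / nj
                                 for d in range(dim)]
                self.vars[j] = [sum(r[j] * (x[d] - self.means[j][d]) ** 2
                                    for r, x in zip(resp, frames)) / nj + 1e-3
                                for d in range(dim)]
        return self

    def score(self, frames):
        """Average per-frame log-likelihood of a recording."""
        return sum(logsumexp(self._component_logs(x)) for x in frames) / len(frames)


def fake_frames(rng, centre, n):
    """Synthetic stand-in for MFCC frames (assumption: real features come from audio)."""
    return [[rng.gauss(c, 1.0) for c in centre] for _ in range(n)]


MALE_CENTRE = [-2.0, 0.5, 1.0]    # arbitrary illustrative values
FEMALE_CENTRE = [2.0, -0.5, -1.0]  # arbitrary illustrative values

rng = random.Random(42)
models = {
    "male": DiagGMM(2).fit(fake_frames(rng, MALE_CENTRE, 200)),
    "female": DiagGMM(2).fit(fake_frames(rng, FEMALE_CENTRE, 200)),
}


def identify_gender(frames):
    """Pick the gender whose GMM explains the frames best."""
    return max(models, key=lambda g: models[g].score(frames))


print(identify_gender(fake_frames(rng, FEMALE_CENTRE, 50)))
```

In practice the frames passed to `fit` and `identify_gender` would be MFCC vectors extracted from the training corpus and the test recordings, and an established GMM library would normally replace the hand-written EM loop.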

2011 ◽  
Vol 58-60 ◽  
pp. 1847-1853 ◽  
Author(s):  
Yan Zhang ◽  
Cun Bao Chen ◽  
Li Zhao

In this paper, the Gaussian Mixture Model (GMM) is applied to noise classification. On this basis, a modified Gaussian Mixture Model with an embedded Auto-Associate Neural Network (AANN) is proposed, which integrates the merits of the GMM and the AANN. The GMM and AANN are trained as a whole by means of Maximum Likelihood (ML). During training, the parameters of the GMM and the AANN are updated alternately. The AANN reshapes the distribution of the data and improves the similarity of the feature data within the same type of noise distribution. Experiments show that the GMM with embedded AANN improves the accuracy rate of noise classification over the baseline GMM.


Author(s):  
Ricky Mohanty ◽  
Sandeep Singh Solanki

This paper focuses on methods for automatically classifying birds into species based on feature extraction methods and audio recordings of their sounds. The recognition system uses a Gaussian mixture model (GMM) to model the calls of 14 poultry bird species. Mel frequency cepstral coefficient (MFCC) parameters and wavelet parameters are used for feature vector extraction. The paper briefly explains the methods and evaluates their performance in Gaussian mixture model classification. The results show that Gaussian mixture model classification using wavelet features was more efficient, with an accuracy of around 80% and faster computation.
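For readers unfamiliar with MFCC feature extraction, the usual pipeline (framing, windowing, power spectrum, mel filterbank, logarithm, discrete cosine transform) can be sketched as below. This is a dependency-free illustration, not the procedure used in the paper. The frame length, hop size, filter count, number of coefficients, and the test tone are all assumed values, and the naive DFT is chosen for clarity rather than speed.

```python
import cmath
import math


def hz_to_mel(hz):
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def power_spectrum(frame):
    """Power of the one-sided DFT of a Hamming-windowed frame (naive O(n^2) DFT)."""
    n = len(frame)
    windowed = [s * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, s in enumerate(frame)]
    return [abs(sum(s * cmath.exp(-2j * math.pi * k * i / n)
                    for i, s in enumerate(windowed))) ** 2 / n
            for k in range(n // 2 + 1)]


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose centres are evenly spaced on the mel scale."""
    top = hz_to_mel(sample_rate / 2)
    edges_hz = [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * f / sample_rate)) for f in edges_hz]
    bank = []
    for m in range(1, n_filters + 1):
        lo, mid, hi = bins[m - 1], bins[m], bins[m + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(lo, mid):   # rising edge of the triangle
            row[k] = (k - lo) / (mid - lo)
        for k in range(mid, hi):   # falling edge of the triangle
            row[k] = (hi - k) / (hi - mid)
        bank.append(row)
    return bank


def mfcc(signal, sample_rate, frame_len=256, hop=128, n_filters=20, n_ceps=13):
    """Return one list of n_ceps cepstral coefficients per frame."""
    bank = mel_filterbank(n_filters, frame_len, sample_rate)
    coeffs = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        spec = power_spectrum(signal[start:start + frame_len])
        # Log mel-band energies; the small constant avoids log(0).
        log_e = [math.log(sum(w * p for w, p in zip(row, spec)) + 1e-10)
                 for row in bank]
        # DCT-II decorrelates the log energies into cepstral coefficients.
        coeffs.append([sum(e * math.cos(math.pi * c * (m + 0.5) / n_filters)
                           for m, e in enumerate(log_e))
                       for c in range(n_ceps)])
    return coeffs


SAMPLE_RATE = 8000  # assumed sampling rate for the demo
tone = [math.sin(2 * math.pi * 200 * t / SAMPLE_RATE) for t in range(1024)]
features = mfcc(tone, SAMPLE_RATE)
print(len(features), "frames x", len(features[0]), "coefficients")
```

Each row of `features` is the kind of feature vector a GMM classifier would be trained on. Production code would normally rely on an FFT-based audio library instead of the naive DFT shown here.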


2021 ◽  
Vol 2021 ◽  
pp. 1-8
Author(s):  
Yingjun Dong ◽  
Neil G. MacLaren ◽  
Yiding Cao ◽  
Francis J. Yammarino ◽  
Shelley D. Dionne ◽  
...  

Utterance clustering is one of the actively researched topics in audio signal processing and machine learning. This study aims to improve the performance of utterance clustering by processing multichannel (stereo) audio signals. Processed audio signals were generated by combining left- and right-channel audio signals in a few different ways and then by extracting the embedded features (also called d-vectors) from those processed audio signals. This study applied the Gaussian mixture model for supervised utterance clustering. In the training phase, a parameter-sharing Gaussian mixture model was obtained to train the model for each speaker. In the testing phase, the speaker with the maximum likelihood was selected as the detected speaker. Results of experiments with real audio recordings of multiperson discussion sessions showed that the proposed method that used multichannel audio signals achieved significantly better performance than a conventional method with mono-audio signals in more complicated conditions.
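The abstract does not say which channel combinations were used, so the following is only a hypothetical sketch of what "combining left- and right-channel audio signals in a few different ways" could look like. The function name `combine_channels` and the particular variants (mean, difference, concatenation) are assumptions for illustration, not the authors' method.

```python
def combine_channels(left, right):
    """Build several candidate signals from a stereo pair (illustrative variants only)."""
    if len(left) != len(right):
        raise ValueError("channels must have the same number of samples")
    return {
        "left": list(left),
        "right": list(right),
        # Conventional mono downmix.
        "mean": [(l + r) / 2 for l, r in zip(left, right)],
        # Emphasises what differs between the two microphones.
        "difference": [l - r for l, r in zip(left, right)],
        # Keeps both channels intact, one after the other.
        "concatenated": list(left) + list(right),
    }


variants = combine_channels([0.0, 0.5, 1.0], [1.0, 0.5, 0.0])
for name, samples in variants.items():
    print(name, samples)
```

Each variant would then be passed through the feature extractor (d-vectors in the study) before the per-speaker models are trained and compared.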


2018 ◽  
Vol 30 (4) ◽  
pp. 642
Author(s):  
Guichao Lin ◽  
Yunchao Tang ◽  
Xiangjun Zou ◽  
Qing Zhang ◽  
Xiaojie Shi ◽  
...  
