Utterance Clustering Using Stereo Audio Channels

Utterance clustering is one of the actively researched topics in audio signal processing and machine learning. This study aims to improve the performance of utterance clustering by processing multichannel (stereo) audio signals. Processed audio signals were generated by combining left- and right-channel audio signals in a few different ways and then by extracting the embedded features (also called d-vectors) from those processed audio signals. This study applied the Gaussian mixture model for supervised utterance clustering. In the training phase, a parameter-sharing Gaussian mixture model was obtained to train the model for each speaker. In the testing phase, the speaker with the maximum likelihood was selected as the detected speaker. Results of experiments with real audio recordings of multiperson discussion sessions showed that the proposed method that used multichannel audio signals achieved significantly better performance than a conventional method with mono-audio signals in more complicated conditions.
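The abstract does not say which left/right combinations were used, so the following is only a minimal sketch of the idea of deriving several processed signals from one stereo pair before feature extraction. The function name `combine_channels` and the four variants it returns are illustrative assumptions, not the authors' method.

```python
import numpy as np

def combine_channels(left, right):
    """Return several mono signals derived from one stereo pair.

    The variants below are hypothetical examples; the source only says the
    channels were combined "in a few different ways".
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    if left.shape != right.shape:
        raise ValueError("left and right must have the same shape")
    return {
        "left": left,                        # left channel unchanged
        "right": right,                      # right channel unchanged
        "mean": 0.5 * (left + right),        # conventional mono downmix
        "difference": 0.5 * (left - right),  # keeps inter-channel differences
    }
```

Each returned signal could then be passed to whatever embedding extractor produces the d-vectors, giving one feature set per variant.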


The Generalized Bayes Method for High-Dimensional Data Recognition with Applications to Audio Signal Recognition

Symmetry ◽

10.3390/sym13010019 ◽

2020 ◽

Vol 13 (1) ◽

pp. 19

Author(s):

Hsiuying Wang

Keyword(s):

Gaussian Mixture Model ◽

Mixture Model ◽

Conventional Method ◽

High Dimensional Data ◽

Audio Signal ◽

Gaussian Mixture ◽

High Dimensional ◽

Signal Recognition ◽

Bayes Method ◽

Generalized Bayes

The high-dimensional data recognition problem based on the Gaussian mixture model (GMM) has useful applications in many areas, such as audio signal recognition, image analysis, and biological evolution. The expectation-maximization algorithm is a popular approach to deriving the maximum likelihood estimators of the GMM. An alternative solution is to adopt a generalized Bayes estimator for parameter estimation. In this study, an estimator based on the generalized Bayes approach is established. A simulation study shows that the proposed approach performs competitively with the conventional method in high-dimensional Gaussian mixture model recognition. We use a musical data example to illustrate this recognition problem. Suppose that we have audio data of a piece of music and know that the music is from one of four compositions, but we do not know exactly which composition it comes from. The generalized Bayes method shows a higher average recognition rate than the conventional method. This result shows that the generalized Bayes method is a competitor to the conventional method in this real application.
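For orientation, the conventional baseline mentioned in this abstract, expectation-maximization for a Gaussian mixture, can be sketched in a few lines. This is a one-dimensional toy version with a deterministic initialisation chosen for the example; it is not the paper's high-dimensional setup and does not implement the generalized Bayes estimator. The name `em_gmm_1d` is hypothetical.

```python
import numpy as np

def em_gmm_1d(x, n_components=2, n_iter=50):
    """Fit a 1-D Gaussian mixture by EM; returns (weights, means, variances)."""
    x = np.asarray(x, dtype=float)
    # Assumed initialisation: spread the means evenly over the data range.
    means = np.linspace(x.min(), x.max(), n_components)
    variances = np.full(n_components, x.var())
    weights = np.full(n_components, 1.0 / n_components)
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each point,
        # computed in log space for numerical stability.
        log_p = (np.log(weights)
                 - 0.5 * np.log(2.0 * np.pi * variances)
                 - 0.5 * (x[:, None] - means) ** 2 / variances)
        log_p -= log_p.max(axis=1, keepdims=True)
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate parameters from the weighted data.
        nk = resp.sum(axis=0)
        weights = nk / x.size
        means = (resp * x[:, None]).sum(axis=0) / nk
        variances = (resp * (x[:, None] - means) ** 2).sum(axis=0) / nk
        variances = np.maximum(variances, 1e-6)  # guard against collapse
    return weights, means, variances
```

On two well-separated clusters, for example samples drawn around -3 and 3, the estimated means settle close to the cluster centres.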


Comparison of Parametric representations of Birdcall in Gaussian Mixture model

APTIKOM Journal on Computer Science and Information Technologies ◽

10.11591/aptikom.j.csit.71 ◽

2017 ◽

Vol 2 (3) ◽

pp. 124-130

Author(s):

Ricky Mohanty ◽

Sandeep Singh Solanki

Keyword(s):

Gaussian Mixture Model ◽

Mixture Model ◽

Bird Species ◽

Gaussian Mixture ◽

Recognition System ◽

Extraction Methods ◽

Paper Briefly ◽

Mel Frequency Cepstral Coefficients ◽

Model Classification ◽

Audio Recordings

This paper focuses on methods for automatically classifying birds into species based on feature extraction methods and audio recordings of their sounds. The recognition system uses a Gaussian mixture model (GMM) to model the calls of 14 poultry bird species. Mel frequency cepstral coefficient (MFCC) parameters and wavelet parameters are used for feature vector extraction. The paper briefly explains the methods and evaluates their performance in Gaussian mixture model classification. The results show that Gaussian mixture model classification using wavelet parameters was more accurate, at around 80%, and was also faster to compute.
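Classification with one GMM per class usually means scoring the feature frames of a recording under every class model and choosing the class with the highest total log-likelihood. The sketch below assumes diagonal covariances and already-trained parameters; the function names, the `models` layout, and the toy labels are assumptions made for illustration, not details from the paper.

```python
import numpy as np

def gmm_log_likelihood(frames, weights, means, variances):
    """Total log-likelihood of feature frames under a diagonal-covariance GMM."""
    frames = np.atleast_2d(np.asarray(frames, dtype=float))  # (n_frames, dim)
    means = np.asarray(means, dtype=float)                   # (k, dim)
    variances = np.asarray(variances, dtype=float)           # (k, dim)
    diff = frames[:, None, :] - means[None, :, :]            # (n_frames, k, dim)
    log_comp = (np.log(weights)
                - 0.5 * np.sum(np.log(2.0 * np.pi * variances), axis=1)
                - 0.5 * np.sum(diff ** 2 / variances, axis=2))
    # Log-sum-exp over the k components, then sum over frames.
    peak = log_comp.max(axis=1, keepdims=True)
    per_frame = peak[:, 0] + np.log(np.exp(log_comp - peak).sum(axis=1))
    return float(per_frame.sum())

def classify(frames, models):
    """Return the label whose GMM gives the highest log-likelihood.

    `models` maps label -> (weights, means, variances); hypothetical layout.
    """
    scores = {label: gmm_log_likelihood(frames, *params)
              for label, params in models.items()}
    return max(scores, key=scores.get)
```

The same scoring rule applies whether the frames hold MFCC or wavelet parameters; only the feature extraction step changes.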


Study on Gender Identification Based on Audio Recordings Using Gaussian Mixture Model and Mel Frequency Cepstrum Coefficient Technique

International Journal of Innovative Computing ◽

10.11113/ijic.v11n2.343 ◽

2021 ◽

Vol 11 (2) ◽

pp. 35-41

Author(s):

Thurgeaswary Rokanatnam ◽

Hazinah Kutty Mammi

Keyword(s):

Gaussian Mixture Model ◽

Mixture Model ◽

Gaussian Mixture ◽

Noise Removal ◽

Accuracy Rate ◽

Speech Corpus ◽

Speech Database ◽

Audio Recordings ◽

Speech Data ◽

Mel Frequency Cepstrum Coefficient

Speaker recognition is the ability to identify a speaker's characteristics from spoken language. The purpose of this study is to identify the gender of speakers based on audio recordings. The objective is to evaluate the accuracy rate of this technique in differentiating gender and to determine its classification performance even when using self-acquired recordings. Audio forensics uses voice recordings as evidence to solve cases. This study is mainly conducted to provide an easier technique for identifying unknown speaker characteristics in the forensic field. The experiment is carried out by training the pattern classifier on gender-dependent data. To train the model, a speech database is obtained from an online speech corpus comprising both male and female speakers. During the testing phase, apart from data from the speech corpus, audio recordings of UTM students are also used to determine the accuracy rate of this speaker identification experiment. The Mel Frequency Cepstrum Coefficient (MFCC) algorithm is used to extract features from the speech data, while a Gaussian Mixture Model (GMM) is used to model the gender identifier. Noise removal was not applied to any speech data in this experiment. Python is used to extract the MFCC features and to model them with the GMM technique. Experimental results show that the GMM-MFCC technique can identify gender regardless of language, but with varying accuracy rates.
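The abstract names Python but not the libraries or settings used, so the following is only a NumPy sketch of a standard MFCC pipeline: framing, windowing, power spectrum, triangular mel filterbank, logarithm, and an unnormalised DCT-II. All parameter values (16 kHz sample rate, 25 ms frames, 10 ms hop, 26 filters, 13 coefficients) are common defaults assumed here, not values taken from the study.

```python
import numpy as np

def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + np.asarray(hz, dtype=float) / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)

def mfcc(signal, sample_rate=16000, frame_len=400, hop=160,
         n_fft=512, n_mels=26, n_ceps=13):
    """Return an (n_frames, n_ceps) array of MFCCs for a mono signal."""
    signal = np.asarray(signal, dtype=float)
    if signal.size < frame_len:
        raise ValueError("signal is shorter than one frame")
    # 1. Slice into overlapping frames and apply a Hamming window.
    n_frames = 1 + (signal.size - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    # 2. Power spectrum of each frame.
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2 / n_fft
    # 3. Triangular filters spaced evenly on the mel scale.
    hz_pts = mel_to_hz(np.linspace(hz_to_mel(0.0),
                                   hz_to_mel(sample_rate / 2.0), n_mels + 2))
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    fbank = np.zeros((n_mels, freqs.size))
    for m in range(n_mels):
        lo, mid, hi = hz_pts[m], hz_pts[m + 1], hz_pts[m + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        fbank[m] = np.clip(np.minimum(rising, falling), 0.0, None)
    # 4. Log mel energies, then DCT-II to decorrelate them.
    log_mel = np.log(power @ fbank.T + 1e-10)
    basis = np.cos(np.pi * np.outer(np.arange(n_ceps),
                                    np.arange(n_mels) + 0.5) / n_mels)
    return log_mel @ basis.T
```

The resulting frames could be used to fit one GMM on male speech and one on female speech, with a test recording assigned to whichever model scores it higher.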
