Utterance Clustering Using Stereo Audio Channels

2021 ◽  
Vol 2021 ◽  
pp. 1-8
Author(s):  
Yingjun Dong ◽  
Neil G. MacLaren ◽  
Yiding Cao ◽  
Francis J. Yammarino ◽  
Shelley D. Dionne ◽  
...  

Utterance clustering is one of the actively researched topics in audio signal processing and machine learning. This study aims to improve the performance of utterance clustering by processing multichannel (stereo) audio signals. Processed audio signals were generated by combining left- and right-channel audio signals in a few different ways and then by extracting the embedded features (also called d-vectors) from those processed audio signals. This study applied the Gaussian mixture model for supervised utterance clustering. In the training phase, a parameter-sharing Gaussian mixture model was obtained to train the model for each speaker. In the testing phase, the speaker with the maximum likelihood was selected as the detected speaker. Results of experiments with real audio recordings of multiperson discussion sessions showed that the proposed method that used multichannel audio signals achieved significantly better performance than a conventional method with mono-audio signals in more complicated conditions.
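The testing-phase rule described above (pick the speaker whose model gives the maximum likelihood) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the channel-combination modes (`mean`, `diff`, `concat`) are assumptions, since the abstract only says the channels were combined "in a few different ways"; each speaker is modeled with a single diagonal Gaussian as a one-component stand-in for the Gaussian mixture model; and the embeddings are synthetic placeholders rather than real d-vectors. All function and speaker names are hypothetical.

```python
import math
import random


def combine_channels(left, right, mode="mean"):
    """Combine left/right samples. The available modes are illustrative assumptions."""
    if mode == "mean":
        return [(l + r) / 2.0 for l, r in zip(left, right)]
    if mode == "diff":
        return [l - r for l, r in zip(left, right)]
    if mode == "concat":
        return list(left) + list(right)
    raise ValueError("unknown mode: " + mode)


def fit_diag_gaussian(vectors):
    """Fit a diagonal Gaussian (a one-component stand-in for a per-speaker GMM)."""
    n, dim = len(vectors), len(vectors[0])
    mean = [sum(v[d] for v in vectors) / n for d in range(dim)]
    # A small floor keeps the variance strictly positive.
    var = [sum((v[d] - mean[d]) ** 2 for v in vectors) / n + 1e-6 for d in range(dim)]
    return mean, var


def log_likelihood(x, model):
    """Log density of x under a diagonal Gaussian."""
    mean, var = model
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def detect_speaker(x, models):
    """Return the speaker whose model assigns x the highest likelihood."""
    return max(models, key=lambda speaker: log_likelihood(x, models[speaker]))


# Synthetic stand-ins for per-speaker embeddings (not real d-vectors).
rng = random.Random(0)


def fake_embeddings(center, n=50):
    return [[c + rng.gauss(0.0, 1.0) for c in center] for _ in range(n)]


models = {
    "spk_a": fit_diag_gaussian(fake_embeddings([0.0, 0.0, 0.0])),
    "spk_b": fit_diag_gaussian(fake_embeddings([5.0, 5.0, 5.0])),
}
print(detect_speaker([0.1, -0.2, 0.0], models))
```

In a fuller pipeline, `combine_channels` would run before feature extraction, and the embedding step (omitted here) would turn each processed utterance into the vector passed to `detect_speaker`.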

Symmetry ◽  
2020 ◽  
Vol 13 (1) ◽  
pp. 19
Author(s):  
Hsiuying Wang

The high-dimensional data recognition problem based on the Gaussian mixture model (GMM) has useful applications in many areas, such as audio signal recognition, image analysis, and biological evolution. The expectation-maximization algorithm is a popular approach to deriving the maximum likelihood estimators of the GMM. An alternative is to adopt a generalized Bayes estimator for parameter estimation. In this study, an estimator based on the generalized Bayes approach is established. A simulation study shows that the proposed approach performs competitively with the conventional method in high-dimensional Gaussian mixture model recognition. We use a musical data example to illustrate this recognition problem: suppose that we have audio data of a piece of music and know that it is from one of four compositions, but we do not know which one. The generalized Bayes method shows a higher average recognition rate than the conventional method, which indicates that it is a competitor to the conventional method in this real application.
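For readers unfamiliar with the conventional method mentioned above, the expectation-maximization updates for a Gaussian mixture can be sketched in one dimension. This is a textbook-style illustration under simplifying assumptions (one-dimensional data, two components, hand-picked starting values, synthetic data); it does not reproduce the paper's high-dimensional setting or its generalized Bayes estimator, and the function names are hypothetical.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def em_gmm_1d(data, init_means, init_vars, init_weights, n_iter=50):
    """Run EM for a 1-D Gaussian mixture; returns (means, variances, weights)."""
    means, variances, weights = list(init_means), list(init_vars), list(init_weights)
    k = len(means)
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each point.
        resp = []
        for x in data:
            p = [weights[j] * normal_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(p)
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weights, means, and variances from the posteriors.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            # A small floor keeps the variance strictly positive.
            variances[j] = (
                sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj + 1e-6
            )
    return means, variances, weights


# Synthetic data drawn from two well-separated normal distributions.
rng = random.Random(0)
data = [rng.gauss(-2.0, 1.0) for _ in range(200)] + [rng.gauss(3.0, 1.0) for _ in range(200)]
means, variances, weights = em_gmm_1d(data, [-1.0, 1.0], [1.0, 1.0], [0.5, 0.5])
print(means, weights)
```

With well-separated components like these, the estimated means should settle near the generating values of -2 and 3; in high dimensions the same updates apply to mean vectors and covariance matrices.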


Author(s):  
Ricky Mohanty ◽  
Sandeep Singh Solanki

This paper focuses on methods for automatically classifying birds into different species based on feature extraction methods and audio recordings of their sounds. The recognition system uses a Gaussian mixture model (GMM) to model the calls of 14 poultry bird species. Mel frequency cepstral coefficient (MFCC) parameters and wavelet parameters are used for feature vector extraction. The paper briefly explains the methods and evaluates their performance in GMM classification. The results show that GMM classification using wavelet features was more efficient in terms of accuracy, at around 80%, and was also faster to compute.


2021 ◽  
Vol 11 (2) ◽  
pp. 35-41
Author(s):  
Thurgeaswary Rokanatnam ◽  
Hazinah Kutty Mammi

Speaker recognition is the ability to identify a speaker's characteristics from spoken language. The purpose of this study is to identify the gender of speakers based on audio recordings. Its objective is to evaluate the accuracy of this technique in differentiating gender and to determine its classification performance even when using self-acquired recordings. Audio forensics uses voice recordings as evidence to solve cases. This study is mainly conducted to provide an easier technique for identifying unknown speaker characteristics in the forensic field. The experiment is carried out by training the pattern classifier on gender-dependent data. To train the model, a speech database is obtained from an online speech corpus comprising both male and female speakers. During the testing phase, apart from the data from the speech corpus, audio recordings of UTM students are also used to determine the accuracy of this speaker identification experiment. The Mel Frequency Cepstrum Coefficient (MFCC) algorithm is used to extract features from the speech data, while a Gaussian Mixture Model (GMM) is used to model the gender identifier. Noise removal was not applied to any speech data in this experiment. Python is used to extract the MFCC features and to model the behavior using the GMM technique. Experimental results show that the GMM-MFCC technique can identify gender regardless of language, but with varying accuracy.
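The MFCC feature extraction step named above can be sketched with NumPy as follows. This is a generic illustration of the standard pipeline (framing, windowing, power spectrum, mel filterbank, logarithm, DCT), not the study's own code: the abstract does not give frame sizes, filter counts, or the library used, so every parameter value and function name below is an assumption, and a synthetic tone stands in for a speech recording.

```python
import numpy as np


def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters with centers spaced evenly on the mel scale."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):      # rising edge of the triangle
            fbank[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):     # falling edge of the triangle
            fbank[i - 1, k] = (right - k) / (right - center)
    return fbank


def mfcc(signal, sample_rate=16000, frame_len=400, hop=160,
         n_fft=512, n_filters=26, n_coeffs=13):
    """Return an array of shape (n_frames, n_coeffs). Parameter defaults are assumptions."""
    signal = np.asarray(signal, dtype=float)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping frames and apply a Hamming window.
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    # Power spectrum of each frame.
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2 / n_fft
    # Mel filterbank energies, then log compression (epsilon avoids log(0)).
    energies = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    log_energies = np.log(energies + 1e-10)
    # Unnormalized DCT-II over the filter axis; keep the first n_coeffs.
    n = np.arange(n_filters)
    basis = np.cos(np.pi / n_filters * (n[None, :] + 0.5) * np.arange(n_coeffs)[:, None])
    return log_energies @ basis.T


# A one-second 440 Hz tone stands in for a speech recording.
t = np.arange(16000) / 16000.0
features = mfcc(np.sin(2.0 * np.pi * 440.0 * t))
print(features.shape)
```

Each row of `features` is one frame's coefficient vector; in a GMM-based classifier, vectors like these would be pooled per class (for example, per gender) to fit one mixture model each.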


2018 ◽  
Vol 30 (4) ◽  
pp. 642
Author(s):  
Guichao Lin ◽  
Yunchao Tang ◽  
Xiangjun Zou ◽  
Qing Zhang ◽  
Xiaojie Shi ◽  
...  
