Generative Models and Unsupervised Learning

Machine Learning for Non-Intrusive Speech Quality Assessment

10.26686/wgtn.16985584 ◽

2021 ◽

Author(s):

Mouna Hakami

Keyword(s):

Machine Learning ◽

Quality Assessment ◽

Unsupervised Learning ◽

Supervised Learning ◽

Latent Variable ◽

Generative Models ◽

Speech Quality ◽

Speech Signals ◽

Latent Space ◽

Speech Quality Assessment

This thesis presents two studies on non-intrusive speech quality assessment. The first applies supervised learning, the common approach in machine-learning-based quality assessment, and concentrates on enhancing the feature set in order to outperform existing methods. The second study approaches quality assessment from a different point of view, inspired by the biological brain, and presents the first unsupervised-learning-based non-intrusive quality assessment method, which removes the need for labelled training data. Supervised, non-intrusive quality predictors generally involve developing a regressor that maps signal features to a representation of perceived quality. The performance of the predictor largely depends on 1) how sensitive the features are to the different types of distortion, and 2) how well the model learns the relation between the features and the quality score. We improve quality estimation by enhancing the feature set and using a contemporary machine learning model that fits this objective. We propose an augmented feature set that includes raw features that are presumably redundant. The speech quality assessment system benefits from this redundancy because it reduces the impact of unwanted noise in the input. Feature set augmentation generally leads to the inclusion of features that have non-smooth distributions, so we introduce a new pre-processing method that re-distributes the features to facilitate training. Evaluation on the ITU-T Supplement 23 database shows that the proposed system outperforms the popular standards and contemporary methods in the literature. The unsupervised quality assessment approach presented in this thesis is based on a model learnt from clean speech signals. Consequently, it does not need to learn the statistics of any corruption present in degraded speech signals and is trained only with unlabelled clean speech samples. Quality is given a new definition, based on the divergence between 1) the distribution of the spectrograms of the test signals, and 2) a pre-existing model that represents the distribution of the spectrograms of good-quality speech. The distributions of speech spectrograms are complex, so comparing them is not trivial. To tackle this problem, we propose mapping the spectrograms of speech signals to a simple latent space. Generative models that map simple latent distributions into complex distributions are excellent platforms for our work. A generative model trained on the spectrograms of clean speech signals learns to map a latent variable $Z$ from a simple distribution $P_Z$ into a spectrogram $X$ from the distribution of good-quality speech. An inference model is then developed by inverting the pre-trained generator; it maps the spectrogram of the signal under test, $X_t$, to its corresponding latent variable, $Z_t$, in the latent space. We postulate that the divergence between the distribution of this latent variable and the prior distribution $P_Z$ is a good measure of speech quality. Generative adversarial nets (GANs) are an effective training method and work well in this application, which is a novel application for a GAN. Experimental results on the TIMIT and NOIZEUS databases show that the proposed measure correlates positively with the objective quality scores.
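
The idea of scoring quality by how far the inferred latent variables $Z_t$ drift from the prior $P_Z$ can be illustrated with a small sketch. The abstract does not say which divergence or which prior the thesis uses, so everything here is an assumption: the prior is taken to be a standard normal, the latent vectors are synthetic stand-ins for the output of the inverted generator, and the divergence is the closed-form KL between a fitted diagonal Gaussian and that prior.

```python
import math
import random


def gaussian_kl_to_standard_normal(latents):
    """KL( N(mu, diag(var)) || N(0, I) ), with mu and var fitted per dimension.

    Assumption: the prior P_Z is a standard normal. The thesis abstract does
    not name the prior or the divergence; this is only one possible choice.
    """
    n = len(latents)
    dim = len(latents[0])
    kl = 0.0
    for d in range(dim):
        column = [z[d] for z in latents]
        mu = sum(column) / n
        var = max(sum((v - mu) ** 2 for v in column) / n, 1e-12)
        # Per-dimension KL between N(mu, var) and N(0, 1).
        kl += 0.5 * (var + mu * mu - 1.0 - math.log(var))
    return kl


# Hypothetical latent vectors standing in for Z_t from an inference model.
rng = random.Random(0)
clean_like = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(2000)]
degraded_like = [[rng.gauss(0.8, 1.6) for _ in range(4)] for _ in range(2000)]

kl_clean = gaussian_kl_to_standard_normal(clean_like)
kl_degraded = gaussian_kl_to_standard_normal(degraded_like)
print(f"clean-like: {kl_clean:.4f}  degraded-like: {kl_degraded:.4f}")
```

Under these assumptions a larger divergence would indicate lower quality; latents that match the prior give a value close to zero.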

Towards Unsupervised Learning of Generative Models for 3D Controllable Image Synthesis

2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) ◽

10.1109/cvpr42600.2020.00591 ◽

2020 ◽

Author(s):

Yiyi Liao ◽

Katja Schwarz ◽

Lars Mescheder ◽

Andreas Geiger

Keyword(s):

Unsupervised Learning ◽

Image Synthesis ◽

Generative Models

A Sample-based Criterion for Unsupervised Learning of Complex Models beyond Maximum Likelihood and Density Estimation

ABC Journal of Advanced Research ◽

10.18034/abcjar.v5i2.581 ◽

2016 ◽

Vol 5 (2) ◽

pp. 123-130

Author(s):

Mani Manavalan ◽

Praveen Kumar Donepudi

Keyword(s):

Maximum Likelihood ◽

Unsupervised Learning ◽

Density Estimation ◽

Nonlinear Models ◽

Probability Distributions ◽

Projection Pursuit ◽

Generative Models ◽

Probability Density Estimation ◽

Error Measure ◽

Unconstrained Model

Many unsupervised learning processes aim to align two probability distributions. Recoding models such as ICA and projection pursuit, as well as generative models such as Gaussian mixtures and Boltzmann machines, can be viewed from this perspective. For these types of models, we offer a new sample-based error measure that can be used even when maximum likelihood (ML) and probability density estimation-based formulations cannot, such as when the posteriors are nonlinear or intractable. Our sample-based error measure also avoids the challenges of approximating a density function. We show that with an unconstrained model, (1) our technique converges on the correct solution as the number of samples increases to infinity, and (2) the solution our approach predicts in the generative framework is the ML solution. Finally, we evaluate our approach in simulations of linear and nonlinear models on mixtures of Gaussians and ICA problems. The experiments demonstrate the applicability and generality of our method.
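
To make the notion of a sample-based error measure concrete, the sketch below compares two sets of samples directly, without estimating either density. It is not the criterion proposed in the paper, which the abstract does not specify; the energy distance is swapped in purely as a familiar example of this kind of measure, and the Gaussian samples are hypothetical.

```python
import random


def mean_abs_diff(a, b):
    """Average |x - y| over all pairs (x in a, y in b)."""
    return sum(abs(x - y) for x in a for y in b) / (len(a) * len(b))


def energy_distance(model_samples, data_samples):
    """Energy distance between two 1-D sample sets (V-statistic form).

    Illustrative stand-in only: it needs samples from both distributions
    but no likelihood and no density estimate.
    """
    cross = mean_abs_diff(model_samples, data_samples)
    within_model = mean_abs_diff(model_samples, model_samples)
    within_data = mean_abs_diff(data_samples, data_samples)
    return 2.0 * cross - within_model - within_data


# Hypothetical data and two candidate models, one aligned and one shifted.
rng = random.Random(1)
data = [rng.gauss(0.0, 1.0) for _ in range(300)]
good_model = [rng.gauss(0.0, 1.0) for _ in range(300)]
bad_model = [rng.gauss(2.0, 1.0) for _ in range(300)]

err_good = energy_distance(good_model, data)
err_bad = energy_distance(bad_model, data)
print(f"aligned model: {err_good:.4f}  shifted model: {err_bad:.4f}")
```

The aligned model yields a much smaller error than the shifted one, which is the behaviour any sample-based alignment criterion should show.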

Multi-layer perceptrons as nonlinear generative models for unsupervised learning: a Bayesian treatment

9th International Conference on Artificial Neural Networks: ICANN '99 ◽

10.1049/cp:19991078 ◽

1999 ◽

Cited By ~ 3

Author(s):

H. Lappalainen

Keyword(s):

Unsupervised Learning ◽

Generative Models

Unsupervised learning of motion patterns using generative models

2008 15th IEEE International Conference on Image Processing ◽

10.1109/icip.2008.4711866 ◽

2008 ◽

Cited By ~ 1

Author(s):

Jacinto C. Nascimento ◽

Mario A. T. Figueiredo ◽

Jorge S. Marques

Keyword(s):

Unsupervised Learning ◽

Generative Models ◽

Motion Patterns

Prior knowledge and correlational structure in unsupervised learning.

Canadian Journal of Experimental Psychology/Revue canadienne de psychologie expérimentale ◽

10.1037/cjep20070012 ◽

2007 ◽

Vol 61 (2) ◽

pp. 109-127 ◽

Cited By ~ 3

Author(s):

John P. Clapper

Keyword(s):

Unsupervised Learning ◽

Prior Knowledge

Emergence of a "visual number sense" in hierarchical generative models

PsycEXTRA Dataset ◽

10.1037/e512592013-298 ◽

2011 ◽

Author(s):

M. Zorzi ◽

I. Stoianov

Keyword(s):

Number Sense ◽

Generative Models ◽

Visual Number

Unsupervised learning of object identities and their parts in a hierarchical visual memory

Frontiers in Computational Neuroscience ◽

10.3389/conf.neuro.10.2009.14.168 ◽

1970 ◽

Author(s):

Jenia Jitsev ◽

Christoph von der Malsburg

Keyword(s):

Unsupervised Learning ◽

Visual Memory

Classification of Observations through Combination of the Dimension Reduction and the Cluster Analysis

International Journal of Advanced Research in Computer Science and Software Engineering ◽

10.23956/ijarcsse.v7i8.13 ◽

2017 ◽

Vol 7 (8) ◽

pp. 30

Author(s):

Hyeuk Kim

Keyword(s):

Machine Learning ◽

Principal Component Analysis ◽

Cluster Analysis ◽

Unsupervised Learning ◽

Principal Component ◽

Component Analysis ◽

Baseball Players ◽

Partitioning Around Medoids ◽

Different Characteristics

Unsupervised learning divides data into several groups. Observations in the same group have similar characteristics, and observations in different groups have different characteristics. In this paper, we classify data by partitioning around medoids, which has some advantages over k-means clustering. We apply it to baseball players in the Korea Baseball League. We also apply principal component analysis to the data and plot the result using two components as axes. Through this procedure we interpret the meaning of the clustering graphically. The combination of partitioning around medoids and principal component analysis can be applied to any other data, and the approach makes the characteristics of the groups easy to identify.
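
A minimal sketch of the clustering step may help. It implements only a simplified PAM-style swap search with a naive initialisation (the first k points, not the usual BUILD phase), and it omits the principal component analysis step. The eight two-dimensional points are hypothetical and are not the baseball data from the paper.

```python
import math
from itertools import product


def total_cost(points, medoids):
    """Sum of distances from each point to its nearest medoid."""
    return sum(min(math.dist(p, points[m]) for m in medoids) for p in points)


def pam(points, k):
    """Simplified partitioning around medoids: greedy swap search.

    Assumption: medoids start as the first k points; a swap is accepted
    whenever it strictly lowers the total cost.
    """
    medoids = list(range(k))
    best = total_cost(points, medoids)
    improved = True
    while improved:
        improved = False
        for slot, candidate in product(range(k), range(len(points))):
            if candidate in medoids:
                continue
            trial = medoids[:slot] + [candidate] + medoids[slot + 1:]
            cost = total_cost(points, trial)
            if cost < best - 1e-12:
                medoids, best, improved = trial, cost, True
    labels = [
        min(range(k), key=lambda j: math.dist(p, points[medoids[j]]))
        for p in points
    ]
    return medoids, labels, best


# Hypothetical observations forming two well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)]
medoids, labels, best = pam(points, k=2)
print(medoids, labels, round(best, 3))
```

Unlike k-means centroids, the medoids are always actual observations, which is one reason the method is less sensitive to outliers.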
