Unsupervised Learning with Contrastive Latent Variable Models

Author(s):  
Kristen A. Severson ◽  
Soumya Ghosh ◽  
Kenney Ng

In unsupervised learning, dimensionality reduction is an important tool for data exploration and visualization. Because these aims are typically open-ended, it can be useful to frame the problem as looking for patterns that are enriched in one dataset relative to another. Such pairs of datasets occur commonly, for instance a population of interest vs. a control, or signal vs. signal-free recordings. However, there are few methods that work on sets of data as opposed to data points or sequences. Here, we present a probabilistic model for dimensionality reduction that discovers signal enriched in the target dataset relative to the background dataset. The data in these sets do not need to be paired or grouped beyond set membership. By using a probabilistic model in which some structure is shared between the two datasets and some is unique to the target dataset, we are able to recover interesting structure in the latent space of the target dataset. The method also has the advantages of a probabilistic model, namely that it allows for the incorporation of prior information, handles missing data, and can be generalized to different distributional assumptions. We describe several possible variations of the model and demonstrate the application of the technique to de-noising, feature selection, and subgroup discovery settings.
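
To make the contrastive idea concrete, here is a minimal sketch using contrastive PCA, a non-probabilistic relative of the paper's latent variable model: it finds directions whose variance is enriched in the target data relative to the background. The trade-off parameter `alpha` and the toy data are illustrative assumptions, not the paper's estimator.

```python
import numpy as np

def contrastive_directions(target, background, n_components=2, alpha=1.0):
    """Directions whose variance is enriched in `target` vs `background`."""
    ct = np.cov(target, rowvar=False)      # target covariance
    cb = np.cov(background, rowvar=False)  # background covariance
    # Top eigenvectors of the contrast matrix capture structure present
    # in the target set but absent from the background set.
    evals, evecs = np.linalg.eigh(ct - alpha * cb)
    order = np.argsort(evals)[::-1]
    return evecs[:, order[:n_components]]

rng = np.random.default_rng(0)
background = rng.normal(size=(500, 10))
target = rng.normal(size=(500, 10))
target[:, :2] += rng.normal(scale=3.0, size=(500, 2))  # enriched signal
W = contrastive_directions(target, background)
latent = target @ W  # 2-D latent representation of the target set
```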

2021 ◽  
Author(s):  
Mouna Hakami

This thesis presents two studies on non-intrusive speech quality assessment methods. The first applies supervised learning methods to speech quality assessment, a common approach in machine-learning-based quality assessment. To outperform existing methods, we concentrate on enhancing the feature set. In the second study, we analyse quality assessment from a different point of view inspired by the biological brain and present the first unsupervised-learning-based non-intrusive quality assessment method, which removes the need for labelled training data.

Supervised-learning-based non-intrusive quality predictors generally involve the development of a regressor that maps signal features to a representation of perceived quality. The performance of the predictor largely depends on 1) how sensitive the features are to the different types of distortion, and 2) how well the model learns the relation between the features and the quality score. We improve the performance of the quality estimation by enhancing the feature set and using a contemporary machine learning model that fits this objective. We propose an augmented feature set that includes raw features that are presumably redundant. The speech quality assessment system benefits from this redundancy, as it reduces the impact of unwanted noise in the input. Feature set augmentation generally leads to the inclusion of features with non-smooth distributions, so we introduce a new pre-processing method that re-distributes the features to facilitate training. Evaluation of the system on the ITU-T Supplement 23 database shows that the proposed system outperforms the popular standards and contemporary methods in the literature.

The unsupervised learning quality assessment approach presented in this thesis is based on a model learnt from clean speech signals. Consequently, it does not need to learn the statistics of any corruption present in degraded speech signals and is trained only with unlabelled clean speech samples. Quality is given a new definition based on the divergence between 1) the distribution of the spectrograms of test signals, and 2) a pre-existing model that represents the distribution of the spectrograms of good-quality speech. The distribution of speech spectrograms is complex, and hence comparing such distributions is not trivial. To tackle this problem, we propose to map the spectrograms of speech signals to a simple latent space.

Generative models that map simple latent distributions into complex distributions are excellent platforms for our work. Generative models trained on the spectrograms of clean speech learn to map a latent variable $Z$ from a simple distribution $P_Z$ into a spectrogram $X$ from the distribution of good-quality speech.

Consequently, an inference model is developed by inverting the pre-trained generator, mapping the spectrogram of the signal under test, $X_t$, into its corresponding latent variable, $Z_t$, in the latent space. We postulate that the divergence between the distribution of the latent variable and the prior distribution $P_Z$ is a good measure of the quality of speech.

Generative adversarial networks (GANs) provide an effective training method and work well in this application; the proposed system is a novel application of a GAN. Experimental results with the TIMIT and NOIZEUS databases show that the proposed measure correlates positively with objective quality scores.
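
A rough sketch of the inference step described above: invert a pretrained generator by gradient descent on $Z$, then compare the recovered latents with the prior $P_Z$. The generator `G`, the spectrogram shapes, and the moment-based divergence below are illustrative assumptions, not the thesis's exact setup.

```python
import torch

def invert_generator(G, x_t, latent_dim=100, steps=500, lr=1e-2):
    """Find z_t such that G(z_t) approximates the test spectrogram x_t."""
    z = torch.randn(x_t.shape[0], latent_dim, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = torch.mean((G(z) - x_t) ** 2)  # reconstruction error
        loss.backward()
        opt.step()
    return z.detach()

def latent_quality_score(z_t):
    """Moment-based divergence of recovered latents from the N(0, I) prior;
    larger values suggest the signal lies off the clean-speech manifold."""
    mean_term = z_t.mean(dim=0).pow(2).sum()
    var_term = (z_t.var(dim=0) - 1.0).pow(2).sum()
    return (mean_term + var_term).item()

# Stand-in generator for illustration only (a real one would be a trained GAN).
G = torch.nn.Sequential(torch.nn.Linear(100, 257), torch.nn.Tanh())
x_t = torch.randn(8, 257)  # batch of test spectrogram frames
z_t = invert_generator(G, x_t)
print(latent_quality_score(z_t))
```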


2021 ◽  
Vol 9 ◽  
pp. 1180-1196
Author(s):  
Gašper Beguš

This paper models unsupervised learning of an identity-based pattern (or copying) in speech called reduplication from raw continuous data with deep convolutional neural networks. We use the ciwGAN architecture (Beguš, 2021a) in which learning of meaningful representations in speech emerges from a requirement that the CNNs generate informative data. We propose a technique to wug-test CNNs trained on speech and, based on four generative tests, argue that the network learns to represent an identity-based pattern in its latent space. By manipulating only two categorical variables in the latent space, we can actively turn an unreduplicated form into a reduplicated form with no other substantial changes to the output in the majority of cases. We also argue that the network extends the identity-based pattern to unobserved data. Exploration of how meaningful representations of identity-based patterns emerge in CNNs, and how latent space variables outside of the training range correlate with identity-based patterns in the output, has general implications for neural network interpretability.
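
A sketch of the latent-manipulation test described above might look as follows; the generator `G`, its latent layout (a categorical code concatenated with continuous noise), and all sizes are stand-in assumptions, not the trained ciwGAN from the paper.

```python
import torch

def flip_code(G, noise, code_a, code_b):
    """Generate with two categorical codes while keeping `noise` fixed."""
    x_a = G(torch.cat([code_a, noise], dim=1))  # e.g., unreduplicated code
    x_b = G(torch.cat([code_b, noise], dim=1))  # e.g., reduplicated code
    return x_a, x_b

noise = torch.randn(1, 98)           # fixed continuous latents
code_a = torch.tensor([[1.0, 0.0]])  # one-hot: class A
code_b = torch.tensor([[0.0, 1.0]])  # one-hot: class B
# Stand-in generator; a real ciwGAN maps latents to raw waveforms.
G = torch.nn.Sequential(torch.nn.Linear(100, 16384), torch.nn.Tanh())
wave_a, wave_b = flip_code(G, noise, code_a, code_b)
# Comparing wave_a and wave_b isolates the effect of the categorical code.
```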


Author(s):  
Wilfried Wöber ◽  
Papius Tibihika ◽  
Cristina Olaverri-Monreal ◽  
Lars Mehnen ◽  
Peter Sykacek ◽  
...  

For computer vision based approaches such as image classification (Krizhevsky et al. 2012), object detection (Ren et al. 2015) or pixel-wise weed classification (Milioto et al. 2017), machine learning is used for both feature extraction and processing (e.g. classification or regression). Historically, feature extraction (e.g. PCA; Ch. 12.1 in Bishop 2006) and processing were sequential and independent tasks (Wöber et al. 2013). Since the rise in 2012 (Krizhevsky et al. 2012) of convolutional neural networks (LeCun et al. 1989), a deep machine learning approach optimized for images, feature extraction for image analysis has become an automated procedure. A convolutional neural network uses a deep architecture of artificial neurons (Goodfellow 2016) for both feature extraction and processing. Based on prior information such as image classes, the parameters of the network are adjusted via supervised learning procedures; this is known as the learning process. In parallel, geometric morphometrics (Tibihika et al. 2018, Cadrin and Friedland 1999) is used in biodiversity research for association analysis. These approaches use deterministic two-dimensional locations on digital images (landmarks; Mitteroecker et al. 2013), where each position corresponds to a biologically relevant region of interest. Since this methodology is grounded in prior scientific results and compresses image content into deterministic landmarks, no uncertainty in the landmark positions is taken into account, which leads to information loss (Pearl 1988). Both the reduction of this loss and the discovery of novel knowledge can be addressed with machine learning. Supervised learning methods (e.g. neural networks or support vector machines; Ch. 5 and 6 in Bishop 2006) map data onto prior information (e.g. labels). This increases classification or regression performance but affects the latent representation of the data itself. Unsupervised learning (e.g. latent variable models) instead relies on assumptions about the data structure to extract latent representations without prior information. These representations do not have to be useful for downstream processing such as classification, so the choice between supervised learning, unsupervised learning, or a combination of both must be made carefully, according to the application and the data. In this work, we discuss unsupervised learning algorithms in terms of explainability, performance and theoretical restrictions in the context of known deep learning limitations (Marcus 2018, Szegedy et al. 2014, Su et al. 2017). We analyse features extracted from multiple image datasets and discuss shortcomings and performance for processing (e.g. reconstruction error or complexity measurement; Pincus 1997) using principal component analysis (Wöber et al. 2013), independent component analysis (Stone 2004), deep neural networks (autoencoders; Ch. 14 in Goodfellow 2016) and Gaussian process latent variable models (Titsias and Lawrence 2010, Lawrence 2005).
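
A minimal sketch of the kind of comparison described above: fit two of the unsupervised models (PCA and ICA) on an image dataset and compare reconstruction error; autoencoders and GPLVMs would slot into the same loop. The scikit-learn digits dataset is a stand-in for the image datasets in the study.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA, FastICA

X = load_digits().data.astype(float)
X -= X.mean(axis=0)  # center once so reconstructions are comparable

for name, model in [("PCA", PCA(n_components=10)),
                    ("ICA", FastICA(n_components=10, random_state=0))]:
    Z = model.fit_transform(X)          # latent representation
    X_hat = model.inverse_transform(Z)  # back-projection to pixel space
    err = np.mean((X - X_hat) ** 2)     # reconstruction error
    print(f"{name}: mean squared reconstruction error = {err:.3f}")
```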


2021 ◽  
Vol 13 (2) ◽  
pp. 51
Author(s):  
Lili Sun ◽  
Xueyan Liu ◽  
Min Zhao ◽  
Bo Yang

Variational graph autoencoders, which encode structural and attribute information of a graph into low-dimensional representations, have become a powerful method for studying graph-structured data. However, most existing methods based on variational (graph) autoencoders assume that the prior over the latent variables is the standard normal distribution, which encourages all nodes to gather around 0 and prevents the latent space from being fully utilized. Choosing a suitable prior without incorporating additional expert knowledge therefore becomes a challenge. Given this, we propose a novel noninformative-prior-based interpretable variational graph autoencoder (NPIVGAE). Specifically, we exploit a noninformative prior as the prior distribution of the latent variables, which enables the posterior distribution parameters to be learned almost entirely from the sample data. Furthermore, we regard each dimension of a latent variable as the probability that the node belongs to each block, thereby improving the interpretability of the model. The correlation within and between blocks is described by a block–block correlation matrix. We compare our model with state-of-the-art methods on three real datasets, verifying its effectiveness and superiority.
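
To locate where the prior enters, here is a compact sketch of a standard VGAE (Kipf and Welling, 2016); the KL term marked below is the piece NPIVGAE would replace with a noninformative prior. Layer sizes, the toy graph, and the single-layer encoder are illustrative simplifications, not the paper's architecture.

```python
import torch
import torch.nn.functional as F

class VGAE(torch.nn.Module):
    def __init__(self, n_feats, n_hidden, n_latent):
        super().__init__()
        self.lin = torch.nn.Linear(n_feats, n_hidden)
        self.mu = torch.nn.Linear(n_hidden, n_latent)
        self.logvar = torch.nn.Linear(n_hidden, n_latent)

    def encode(self, A_norm, X):
        h = F.relu(A_norm @ self.lin(X))  # one graph-convolution step
        return A_norm @ self.mu(h), A_norm @ self.logvar(h)

    def forward(self, A_norm, X):
        mu, logvar = self.encode(A_norm, X)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        return torch.sigmoid(z @ z.t()), mu, logvar  # inner-product decoder

def loss_fn(A_hat, A, mu, logvar):
    recon = F.binary_cross_entropy(A_hat, A)
    # KL(q(z|X,A) || N(0, I)): the term that pulls all nodes toward 0,
    # and the one a noninformative prior would replace.
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl

# Tiny random graph for demonstration.
A = (torch.rand(20, 20) < 0.1).float()
A = ((A + A.t()) > 0).float()
A_norm = A + torch.eye(20)
d = A_norm.sum(1).pow(-0.5)
A_norm = d[:, None] * A_norm * d[None, :]  # symmetric normalization
model = VGAE(8, 16, 4)
A_hat, mu, logvar = model(A_norm, torch.randn(20, 8))
print(loss_fn(A_hat, A, mu, logvar).item())
```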


Author(s):  
Alireza Vafaei Sadr ◽  
Bruce A. Bassett ◽  
M. Kunz

Anomaly detection is challenging, especially for large datasets in high dimensions. Here, we explore a general anomaly detection framework based on dimensionality reduction and unsupervised clustering. DRAMA is released as a general-purpose Python package that implements this framework with a wide range of built-in options. The approach identifies the primary prototypes in the data, with anomalies detected by their large distances from the prototypes, either in the latent space or in the original high-dimensional space. DRAMA is tested on a wide variety of simulated and real datasets, in up to 3000 dimensions, and is found to be robust and highly competitive with commonly used anomaly detection algorithms, especially in high dimensions. The flexibility of the DRAMA framework allows for significant optimization once some examples of anomalies are available, making it ideal for online anomaly detection, active learning, and highly unbalanced datasets. In addition, DRAMA naturally provides a clustering of outliers for subsequent analysis.
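
The DRAMA package's own API is not reproduced here; instead, the following sketch illustrates the framework's core loop with scikit-learn stand-ins (PCA for dimensionality reduction, K-means for prototypes). All parameter choices and the planted anomalies are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

def prototype_anomaly_scores(X, n_components=5, n_prototypes=10):
    z = PCA(n_components=n_components).fit_transform(X)  # latent space
    km = KMeans(n_clusters=n_prototypes, n_init=10, random_state=0).fit(z)
    # Distance from each point to its assigned prototype is the anomaly score.
    return np.linalg.norm(z - km.cluster_centers_[km.labels_], axis=1)

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 50))
X[:5] += 8.0                    # plant a few anomalies
scores = prototype_anomaly_scores(X)
print(np.argsort(scores)[-5:])  # indices of the most anomalous points
```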


Energies ◽  
2021 ◽  
Vol 14 (11) ◽  
pp. 3137
Author(s):  
Amine Tadjer ◽  
Reider B. Bratvold ◽  
Remus G. Hanea

Production forecasting is the basis for decision making in the oil and gas industry and can be quite challenging, especially in terms of complex geological modeling of the subsurface. To help solve this problem, assisted history matching built on ensemble-based analysis, such as the ensemble smoother and the ensemble Kalman filter, is useful for estimating models that preserve geological realism and have predictive capabilities. These methods tend, however, to be computationally demanding, as they require a large ensemble size for stable convergence. In this paper, we propose a novel method of uncertainty quantification and reservoir model calibration with much-reduced computation time. The approach sequentially combines nonlinear dimensionality reduction (t-distributed stochastic neighbor embedding or the Gaussian process latent variable model) and K-means clustering with the data assimilation method ensemble smoother with multiple data assimilation. The cluster analysis with t-distributed stochastic neighbor embedding or the Gaussian process latent variable model is used to reduce the number of initial geostatistical realizations and to select a set of optimal reservoir models with production performance similar to the reference model. We then apply the ensemble smoother with multiple data assimilation to provide reliable assimilation results. Experimental results based on the Brugge field case data verify the efficiency of the proposed approach.
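
A rough sketch of the realization-selection step, assuming a synthetic matrix of production responses: t-SNE embeds the realizations, K-means groups them, and one representative per cluster is kept for the subsequent ES-MDA run (not shown). Ensemble size, cluster count, and the response data are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.manifold import TSNE

rng = np.random.default_rng(2)
responses = rng.normal(size=(200, 120))  # 200 realizations x 120 time steps

emb = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(responses)
km = KMeans(n_clusters=10, n_init=10, random_state=0).fit(emb)

# Keep, per cluster, the realization nearest the centroid (a medoid-like choice).
selected = []
for k, c in enumerate(km.cluster_centers_):
    members = np.flatnonzero(km.labels_ == k)
    nearest = members[np.argmin(np.linalg.norm(emb[members] - c, axis=1))]
    selected.append(int(nearest))
print(sorted(selected))  # indices of the representative realizations
```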


2021 ◽  
Vol 11 (3) ◽  
pp. 1013
Author(s):  
Zvezdan Lončarević ◽  
Rok Pahič ◽  
Aleš Ude ◽  
Andrej Gams

Autonomous robot learning in unstructured environments often faces the problem that the dimensionality of the search space is too large for practical applications. Dimensionality reduction techniques have been developed to address this problem and describe motor skills in low-dimensional latent spaces. Most of these techniques require a sufficiently large database of example task executions to compute the latent space. However, generating many example task executions on a real robot is tedious and prone to errors and equipment failures. The main result of this paper is a new approach for efficient database gathering: a small number of task executions are performed with a real robot, and statistical generalization, e.g., Gaussian process regression, is applied to generate more data. Our experiments show that the data generated this way can be used for dimensionality reduction with autoencoder neural networks. The resulting latent spaces can be exploited to implement robot learning more efficiently. The proposed approach has been evaluated on the problem of robotic throwing at a target. Simulation and real-world results with the humanoid robot TALOS are provided. They confirm the effectiveness of generalization-based database acquisition and the efficiency of learning in a low-dimensional latent space.
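
The database-gathering idea might be sketched as follows, with a hypothetical one-dimensional task parameterization (throwing distance) and stand-in trajectories; the real pipeline would then train an autoencoder on the synthetic database.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(3)
targets = rng.uniform(0.5, 2.0, size=(15, 1))  # 15 real robot executions
# Stand-in demonstrations: one 50-sample trajectory per executed throw.
trajectories = np.sin(targets * np.linspace(0, np.pi, 50))

# Generalize the few real executions across the task parameter space.
gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.5)).fit(targets, trajectories)

# Densely sample task parameters and predict trajectories, yielding a large
# synthetic database from only 15 real executions.
new_targets = np.linspace(0.5, 2.0, 500).reshape(-1, 1)
synthetic_db = gp.predict(new_targets)  # 500 x 50 generated trajectories
print(synthetic_db.shape)
```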


Author(s):  
Baoying Wang ◽  
Imad Rahal ◽  
Richard Leipold

Data clustering is a discovery process that partitions a data set into groups (clusters) such that data points within the same group are highly similar to one another while being very dissimilar to points in other groups (Han & Kamber, 2001). The ultimate goal of data clustering is to discover natural groupings in a set of patterns, points, or objects without prior knowledge of any class labels. In the machine-learning literature, data clustering is therefore typically regarded as a form of unsupervised learning, as opposed to supervised learning: there is no training phase driven by labelled examples. Applications of data clustering include, but are not limited to, pattern recognition, data analysis, data compression, image processing, understanding genomic data, and market-basket research.
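
As a minimal illustration of the definition, the following groups unlabelled points with K-means; the two-blob data set is fabricated for the example.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(4)
# Two natural groupings, but no labels are given to the algorithm.
X = np.vstack([rng.normal(0, 1, size=(50, 2)),
               rng.normal(5, 1, size=(50, 2))])
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(np.bincount(labels))  # roughly 50 points per discovered cluster
```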

