Projections as visual aids for classification system design

Dimensionality reduction is a compelling alternative for high-dimensional data visualization. This method provides insight into high-dimensional feature spaces by mapping relationships between observations (high-dimensional vectors) to low (two or three) dimensional spaces. These low-dimensional representations support tasks such as outlier and group detection based on direct visualization. Supervised learning, a subfield of machine learning, is also concerned with observations. A key task in supervised learning consists of assigning class labels to observations based on generalization from previous experience. Effective development of such classification systems depends on many choices, including features descriptors, learning algorithms, and hyperparameters. These choices are not trivial, and there is no simple recipe to improve classification systems that perform poorly. In this context, we first propose the use of visual representations based on dimensionality reduction (projections) for predictive feedback on classification efficacy. Second, we propose a projection-based visual analytics methodology, and supportive tooling, that can be used to improve classification systems through feature selection. We evaluate our proposal through experiments involving four datasets and three representative learning algorithms.

Download Full-text

Unsupervised Text Feature Learning via Deep Variational Auto-encoder

Information Technology And Control ◽

10.5755/j01.itc.49.3.25918 ◽

2020 ◽

Vol 49 (3) ◽

pp. 421-437

Author(s):

Genggeng Liu ◽

Lin Xie ◽

Chi-Hua Chen

Keyword(s):

Dimensionality Reduction ◽

High Dimensional Data ◽

Image Data ◽

Original Data ◽

Feature Representation ◽

High Dimensional ◽

Learning To Learn ◽

Text Feature ◽

Reduction Methods ◽

Low Dimensional

Dimensionality reduction plays an important role in the data processing of machine learning and data mining, which makes the processing of high-dimensional data more efficient. Dimensionality reduction can extract the low-dimensional feature representation of high-dimensional data, and an effective dimensionality reduction method can not only extract most of the useful information of the original data, but also realize the function of removing useless noise. The dimensionality reduction methods can be applied to all types of data, especially image data. Although the supervised learning method has achieved good results in the application of dimensionality reduction, its performance depends on the number of labeled training samples. With the growing of information from internet, marking the data requires more resources and is more difficult. Therefore, using unsupervised learning to learn the feature of data has extremely important research value. In this paper, an unsupervised multilayered variational auto-encoder model is studied in the text data, so that the high-dimensional feature to the low-dimensional feature becomes efficient and the low-dimensional feature can retain mainly information as much as possible. Low-dimensional feature obtained by different dimensionality reduction methods are used to compare with the dimensionality reduction results of variational auto-encoder (VAE), and the method can be significantly improved over other comparison methods.

Download Full-text

Facetto: Combining Unsupervised and Supervised Learning for Hierarchical Phenotype Analysis in Multi-Channel Image Data

10.1101/722918 ◽

2019 ◽

Cited By ~ 1

Author(s):

Robert Krueger ◽

Johanna Beyer ◽

Won-Dong Jang ◽

Nam Wook Kim ◽

Artem Sokolov ◽

...

Keyword(s):

Supervised Learning ◽

Visual Analytics ◽

Cancer Biology ◽

Large Scale ◽

Hierarchical Structures ◽

Image Data ◽

Automated Analysis ◽

Cell Types ◽

High Dimensional ◽

Exploration Process

AbstractFacetto is a scalable visual analytics application that is used to discover single-cell phenotypes in high-dimensional multi-channel microscopy images of human tumors and tissues. Such images represent the cutting edge of digital histology and promise to revolutionize how diseases such as cancer are studied, diagnosed, and treated. Highly multiplexed tissue images are complex, comprising 109or more pixels, 60-plus channels, and millions of individual cells. This makes manual analysis challenging and error-prone. Existing automated approaches are also inadequate, in large part, because they are unable to effectively exploit the deep knowledge of human tissue biology available to anatomic pathologists. To overcome these challenges, Facetto enables a semi-automated analysis of cell types and states. It integrates unsupervised and supervised learning into the image and feature exploration process and offers tools for analytical provenance. Experts can cluster the data to discover new types of cancer and immune cells and use clustering results to train a convolutional neural network that classifies new cells accordingly. Likewise, the output of classifiers can be clustered to discover aggregate patterns and phenotype subsets. We also introduce a new hierarchical approach to keep track of analysis steps and data subsets created by users; this assists in the identification of cell types. Users can build phenotype trees and interact with the resulting hierarchical structures of both high-dimensional feature and image spaces. We report on use-cases in which domain scientists explore various large-scale fluorescence imaging datasets. We demonstrate how Facetto assists users in steering the clustering and classification process, inspecting analysis results, and gaining new scientific insights into cancer biology.

Download Full-text

Capturing discrete latent structures: choose LDs over PCs

Biostatistics ◽

10.1093/biostatistics/kxab030 ◽

2021 ◽

Author(s):

Theresa A Alexander ◽

Rafael A Irizarry ◽

Héctor Corrada Bravo

Keyword(s):

Dimensionality Reduction ◽

Biological Data ◽

Reduction Technique ◽

Latent Structure ◽

High Dimensional ◽

Underlying Structure ◽

Linear Transformations ◽

Latent Structures ◽

Low Dimensional ◽

Discriminatory Information

Summary High-dimensional biological data collection across heterogeneous groups of samples has become increasingly common, creating high demand for dimensionality reduction techniques that capture underlying structure of the data. Discovering low-dimensional embeddings that describe the separation of any underlying discrete latent structure in data is an important motivation for applying these techniques since these latent classes can represent important sources of unwanted variability, such as batch effects, or interesting sources of signal such as unknown cell types. The features that define this discrete latent structure are often hard to identify in high-dimensional data. Principal component analysis (PCA) is one of the most widely used methods as an unsupervised step for dimensionality reduction. This reduction technique finds linear transformations of the data which explain total variance. When the goal is detecting discrete structure, PCA is applied with the assumption that classes will be separated in directions of maximum variance. However, PCA will fail to accurately find discrete latent structure if this assumption does not hold. Visualization techniques, such as t-Distributed Stochastic Neighbor Embedding (t-SNE) and Uniform Manifold Approximation and Projection (UMAP), attempt to mitigate these problems with PCA by creating a low-dimensional space where similar objects are modeled by nearby points in the low-dimensional embedding and dissimilar objects are modeled by distant points with high probability. However, since t-SNE and UMAP are computationally expensive, often a PCA reduction is done before applying them which makes it sensitive to PCAs downfalls. Also, tSNE is limited to only two or three dimensions as a visualization tool, which may not be adequate for retaining discriminatory information. The linear transformations of PCA are preferable to non-linear transformations provided by methods like t-SNE and UMAP for interpretable feature weights. Here, we propose iterative discriminant analysis (iDA), a dimensionality reduction technique designed to mitigate these limitations. iDA produces an embedding that carries discriminatory information which optimally separates latent clusters using linear transformations that permit post hoc analysis to determine features that define these latent structures.

Download Full-text

Extracting Low-Dimensional Latent Structure from Time Series in the Presence of Delays

Neural Computation ◽

10.1162/neco_a_00759 ◽

2015 ◽

Vol 27 (9) ◽

pp. 1825-1856 ◽

Cited By ~ 19

Author(s):

Karthik C. Lakshmanan ◽

Patrick T. Sadtler ◽

Elizabeth C. Tyler-Kabara ◽

Aaron P. Batista ◽

Byron M. Yu

Keyword(s):

Time Series ◽

Time Delay ◽

Motor Cortex ◽

Dimensionality Reduction ◽

Gaussian Process ◽

Latent Variables ◽

Time Delays ◽

High Dimensional ◽

Latent Space ◽

Low Dimensional

Noisy, high-dimensional time series observations can often be described by a set of low-dimensional latent variables. Commonly used methods to extract these latent variables typically assume instantaneous relationships between the latent and observed variables. In many physical systems, changes in the latent variables manifest as changes in the observed variables after time delays. Techniques that do not account for these delays can recover a larger number of latent variables than are present in the system, thereby making the latent representation more difficult to interpret. In this work, we introduce a novel probabilistic technique, time-delay gaussian-process factor analysis (TD-GPFA), that performs dimensionality reduction in the presence of a different time delay between each pair of latent and observed variables. We demonstrate how using a gaussian process to model the evolution of each latent variable allows us to tractably learn these delays over a continuous domain. Additionally, we show how TD-GPFA combines temporal smoothing and dimensionality reduction into a common probabilistic framework. We present an expectation/conditional maximization either (ECME) algorithm to learn the model parameters. Our simulations demonstrate that when time delays are present, TD-GPFA is able to correctly identify these delays and recover the latent space. We then applied TD-GPFA to the activity of tens of neurons recorded simultaneously in the macaque motor cortex during a reaching task. TD-GPFA is able to better describe the neural activity using a more parsimonious latent space than GPFA, a method that has been used to interpret motor cortex data but does not account for time delays. More broadly, TD-GPFA can help to unravel the mechanisms underlying high-dimensional time series data by taking into account physical delays in the system.

Download Full-text

SCDRHA: A scRNA-Seq Data Dimensionality Reduction Algorithm Based on Hierarchical Autoencoder

Frontiers in Genetics ◽

10.3389/fgene.2021.733906 ◽

2021 ◽

Vol 12 ◽

Author(s):

Jianping Zhao ◽

Na Wang ◽

Haiyun Wang ◽

Chunhou Zheng ◽

Yansen Su

Keyword(s):

Dimensionality Reduction ◽

Data Visualization ◽

State Of The Art ◽

Dimensional Space ◽

High Dimensional ◽

Reduction Algorithm ◽

Cell Clustering ◽

Data Dimensionality Reduction ◽

Single Cell Rna Sequencing ◽

Low Dimensional

Dimensionality reduction of high-dimensional data is crucial for single-cell RNA sequencing (scRNA-seq) visualization and clustering. One prominent challenge in scRNA-seq studies comes from the dropout events, which lead to zero-inflated data. To address this issue, in this paper, we propose a scRNA-seq data dimensionality reduction algorithm based on a hierarchical autoencoder, termed SCDRHA. The proposed SCDRHA consists of two core modules, where the first module is a deep count autoencoder (DCA) that is used to denoise data, and the second module is a graph autoencoder that projects the data into a low-dimensional space. Experimental results demonstrate that SCDRHA has better performance than existing state-of-the-art algorithms on dimension reduction and noise reduction in five real scRNA-seq datasets. Besides, SCDRHA can also dramatically improve the performance of data visualization and cell clustering.

Download Full-text

Deep learning for predicting disease status using genomic data

10.7287/peerj.preprints.27123 ◽

2018 ◽

Cited By ~ 1

Author(s):

Qianfan Wu ◽

Adel Boueiz ◽

Alican Bozkurt ◽

Arya Masoomi ◽

Allan Wang ◽

...

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Rapid Development ◽

Learning Algorithms ◽

Genomic Data ◽

Disease Status ◽

Machine Learning Algorithms ◽

High Dimensional ◽

Learning Approach ◽

Low Dimensional

Predicting disease status for a complex human disease using genomic data is an important, yet challenging, step in personalized medicine. Among many challenges, the so-called curse of dimensionality problem results in unsatisfied performances of many state-of-art machine learning algorithms. A major recent advance in machine learning is the rapid development of deep learning algorithms that can efficiently extract meaningful features from high-dimensional and complex datasets through a stacked and hierarchical learning process. Deep learning has shown breakthrough performance in several areas including image recognition, natural language processing, and speech recognition. However, the performance of deep learning in predicting disease status using genomic datasets is still not well studied. In this article, we performed a review on the four relevant articles that we found through our thorough literature review. All four articles used auto-encoders to project high-dimensional genomic data to a low dimensional space and then applied the state-of-the-art machine learning algorithms to predict disease status based on the low-dimensional representations. This deep learning approach outperformed existing prediction approaches, such as prediction based on probe-wise screening and prediction based on principal component analysis. The limitations of the current deep learning approach and possible improvements were also discussed.

Download Full-text

A novel metric reveals previously unrecognized distortion in dimensionality reduction of scRNA-Seq data

10.1101/689851 ◽

2019 ◽

Cited By ~ 9

Author(s):

Shamus M. Cooley ◽

Timothy Hamilton ◽

J. Christian J. Ray ◽

Eric J. Deeds

Keyword(s):

Dimensionality Reduction ◽

Single Cells ◽

High Dimensional Data ◽

Three Dimensional ◽

Principal Component ◽

Cell Types ◽

High Dimensional ◽

Before And After ◽

Downstream Analysis ◽

High Level

AbstractHigh-dimensional data are becoming increasingly common in nearly all areas of science. Developing approaches to analyze these data and understand their meaning is a pressing issue. This is particularly true for the rapidly growing field of single-cell RNA-Seq (scRNA-Seq), a technique that simultaneously measures the expression of tens of thousands of genes in thousands to millions of single cells. The emerging consensus for analysis workflows reduces the dimensionality of the dataset before performing downstream analysis, such as assignment of cell types. One problem with this approach is that dimensionality reduction can introduce substantial distortion into the data; consider the familiar example of trying to represent the three-dimensional earth as a two-dimensional map. It is currently unclear if such distortion affects analysis of scRNA-Seq data sets. Here, we introduce a straightforward approach to quantifying this distortion by comparing the local neighborhoods of points before and after dimensionality reduction. We found that popular techniques like t-SNE and UMAP introduce substantial distortion even for relatively simple geometries such as simulated hyperspheres. For scRNA-Seq data, we found the distortion in local neighborhoods was often greater than 95% in the representations typically used for downstream analysis. This high level of distortion can readily introduce important errors into cell type identification, pseudotime ordering, and other analyses that rely on local relationships. We found that principal component analysis can generate accurate embeddings of the data, but only when using dimensionalities that are much higher than typically used in scRNA-Seq analysis. We suggest approaches to take these findings into account and call for a new generation of dimensional reduction algorithms that can accurately embed high dimensional data in its true latent dimension.

Download Full-text

Explaining three-dimensional dimensionality reduction plots

Information Visualization ◽

10.1177/1473871615600010 ◽

2015 ◽

Vol 15 (2) ◽

pp. 154-172 ◽

Cited By ~ 11

Author(s):

Danilo B Coimbra ◽

Rafael M Martins ◽

Tácito TAT Neves ◽

Alexandru C Telea ◽

Fernando V Paulovich

Keyword(s):

Dimensionality Reduction ◽

Dimensional Space ◽

Three Dimensional ◽

Original Data ◽

Reduction Technique ◽

High Dimensional ◽

Dimensionality Reduction Technique ◽

Visualization Techniques ◽

High Dimensional Datasets ◽

Three Dimensional Space

Understanding three-dimensional projections created by dimensionality reduction from high-variate datasets is very challenging. In particular, classical three-dimensional scatterplots used to display such projections do not explicitly show the relations between the projected points, the viewpoint used to visualize the projection, and the original data variables. To explore and explain such relations, we propose a set of interactive visualization techniques. First, we adapt and enhance biplots to show the data variables in the projected three-dimensional space. Next, we use a set of interactive bar chart legends to show variables that are visible from a given viewpoint and also assist users to select an optimal viewpoint to examine a desired set of variables. Finally, we propose an interactive viewpoint legend that provides an overview of the information visible in a given three-dimensional projection from all possible viewpoints. Our techniques are simple to implement and can be applied to any dimensionality reduction technique. We demonstrate our techniques on the exploration of several real-world high-dimensional datasets.

Download Full-text

High Dimensional Correspondences from Low Dimensional Manifolds – An Empirical Comparison of Graph-Based Dimensionality Reduction Algorithms

Computer Vision – ACCV 2010 Workshops - Lecture Notes in Computer Science ◽

10.1007/978-3-642-22819-3_34 ◽

2011 ◽

pp. 334-343

Author(s):

Ribana Roscher ◽

Falko Schindler ◽

Wolfgang Förstner

Keyword(s):

Dimensionality Reduction ◽

High Dimensional ◽

Empirical Comparison ◽

Low Dimensional

Download Full-text

Study on dual peg-in-hole insertion using of constraints formed in the environment

Industrial Robot the international journal of robotics research and application ◽

10.1108/ir-07-2016-0186 ◽

2017 ◽

Vol 44 (6) ◽

pp. 730-740 ◽

Cited By ~ 3

Author(s):

Jianhua Su ◽

Rui Li ◽

Hong Qiao ◽

Jing Xu ◽

Qinglin Ai ◽

...

Keyword(s):

Configuration Space ◽

Dimensional Space ◽

Three Dimensional ◽

Force Sensor ◽

High Dimensional ◽

Content Type ◽

Attractive Region ◽

Low Dimensional ◽

The Cost ◽

Yaw Angle

Purpose The purpose of this paper is to develop a dual peg-in-hole insertion strategy. Dual peg-in-hole insertion is the most common task in manufacturing. Most of the previous work develop the insertion strategy in a two- or three-dimensional space, in which they suppose the initial yaw angle is zero and only concern the roll and pitch angles. However, in some case, the yaw angle could not be ignored due to the pose uncertainty of the peg on the gripper. Therefore, there is a need to design the insertion strategy in a higher-dimensional configuration space. Design/methodology/approach In this paper, the authors handle the insertion problem by converting it into several sub-problems based on the attractive region formed by the constraints. The existence of the attractive region in the high-dimensional configuration space is first discussed. Then, the construction of the high-dimensional attractive region with its sub-attractive region in the low-dimensional space is proposed. Therefore, the robotic insertion strategy can be designed in the subspace to eliminate some uncertainties between the dual pegs and dual holes. Findings Dual peg-in-hole insertion is realized without using of force sensors. The proposed strategy is also used to demonstrate the precision dual peg-in-hole insertion, where the clearance between the dual-peg and dual-hole is about 0.02 mm. Practical implications The sensor-less insertion strategy will not increase the cost of the assembly system and also can be used in the dual peg-in-hole insertion. Originality/value The theoretical and experimental analyses for dual peg-in-hole insertion are proposed without using of force sensor.

Download Full-text