MultiMAP: Dimensionality Reduction and Integration of Multimodal Data

Abstract: Multimodal data is rapidly growing in many fields of science and engineering, including single-cell biology. We introduce MultiMAP, an approach for dimensionality reduction and integration of multiple datasets. MultiMAP recovers a single manifold on which all of the data resides and then projects the data into a single low-dimensional space so as to preserve the structure of the manifold. It is based on a framework of Riemannian geometry and algebraic topology, and generalizes the popular UMAP algorithm [1] to the multimodal setting. MultiMAP can be used for visualization of multimodal data, and as an integration approach that enables joint analyses. MultiMAP has several advantages over existing integration strategies for single-cell data, including that MultiMAP can integrate any number of datasets, leverages features that are not present in all datasets (i.e., datasets can be of different dimensionalities), is not restricted to a linear mapping, can control the influence of each dataset on the embedding, and is extremely scalable to large datasets. We apply MultiMAP to the integration of a variety of single-cell transcriptomics, chromatin accessibility, methylation, and spatial data, and show that it outperforms current approaches in preservation of high-dimensional structure, alignment of datasets, visual separation of clusters, transfer learning, and runtime. On a newly generated single-cell Assay for Transposase-Accessible Chromatin using sequencing (scATAC-seq) and single-cell RNA-seq (scRNA-seq) dataset of the human thymus, we use MultiMAP to integrate cells along a temporal trajectory. This enables the quantitative comparison of transcription factor expression and binding site accessibility over the course of T cell differentiation, revealing patterns of transcription factor kinetics.

MultiMAP: dimensionality reduction and integration of multimodal data

Genome Biology ◽

10.1186/s13059-021-02565-y ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Mika Sarkin Jain ◽

Krzysztof Polanski ◽

Cecilia Dominguez Conde ◽

Xi Chen ◽

Jongeun Park ◽

...

Keyword(s):

Dimensionality Reduction ◽

Single Cell ◽

Binding Site ◽

Spatial Data ◽

Cell Biology ◽

Chromatin Accessibility ◽

Linear Mapping ◽

Multimodal Data ◽

Transcription Factor Expression ◽

Temporal Trajectory

Abstract: Multimodal data is rapidly growing in many fields of science and engineering, including single-cell biology. We introduce MultiMAP, a novel algorithm for dimensionality reduction and integration. MultiMAP can integrate any number of datasets, leverages features not present in all datasets, is not restricted to a linear mapping, allows the user to specify the influence of each dataset, and is extremely scalable to large datasets. We apply MultiMAP to single-cell transcriptomics, chromatin accessibility, methylation, and spatial data and show that it outperforms current approaches. On a new thymus dataset, we use MultiMAP to integrate cells along a temporal trajectory. This enables quantitative comparison of transcription factor expression and binding site accessibility over the course of T cell differentiation, revealing patterns of expression versus binding site opening kinetics.

Identifying temporal and spatial patterns of variation from multimodal data using MEFISTO

Nature Methods ◽

10.1038/s41592-021-01343-9 ◽

2022 ◽

Author(s):

Britta Velten ◽

Jana M. Braunger ◽

Ricard Argelaguet ◽

Damien Arnol ◽

Jakob Wirbel ◽

...

Keyword(s):

Factor Analysis ◽

Dimensionality Reduction ◽

Single Cell ◽

Cell Biology ◽

Multimodal Data ◽

Spatially Resolved ◽

Temporal And Spatial ◽

Spatio Temporal ◽

Personalized Health ◽

Analysis Models

Abstract: Factor analysis is a widely used method for dimensionality reduction in genome biology, with applications from personalized health to single-cell biology. Existing factor analysis models assume independence of the observed samples, an assumption that fails in spatio-temporal profiling studies. Here we present MEFISTO, a flexible and versatile toolbox for modeling high-dimensional data when spatial or temporal dependencies between the samples are known. MEFISTO maintains the established benefits of factor analysis for multimodal data, but enables the performance of spatio-temporally informed dimensionality reduction, interpolation, and separation of smooth from non-smooth patterns of variation. Moreover, MEFISTO can integrate multiple related datasets by simultaneously identifying and aligning the underlying patterns of variation in a data-driven manner. To illustrate MEFISTO, we apply the model to different datasets with spatial or temporal resolution, including an evolutionary atlas of organ development, a longitudinal microbiome study, a single-cell multi-omics atlas of mouse gastrulation and spatially resolved transcriptomics.

scGAE: topology-preserving dimensionality reduction for single-cell RNA-seq data using graph autoencoder

10.1101/2021.02.16.431357 ◽

2021 ◽

Author(s):

Zixiang Luo ◽

Chenyu Xu ◽

Zhen Zhang ◽

Wenfei Jin

Keyword(s):

Dimensionality Reduction ◽

Single Cell ◽

Topological Structure ◽

Dimensional Space ◽

Simulated Data ◽

Oriented Graph ◽

Developmental Trajectory ◽

Structure Information ◽

Low Dimensional ◽

Cell Graph

ABSTRACT: Dimensionality reduction is crucial for the visualization and interpretation of high-dimensional single-cell RNA sequencing (scRNA-seq) data. However, preserving the topological structure among cells in low-dimensional space remains a challenge. Here, we present the single-cell graph autoencoder (scGAE), a dimensionality reduction method that preserves topological structure in scRNA-seq data. scGAE builds a cell graph and uses a multitask-oriented graph autoencoder to preserve topological structure information and feature information in scRNA-seq data simultaneously. We further extended scGAE for scRNA-seq data visualization, clustering, and trajectory inference. Analyses of simulated data showed that scGAE accurately reconstructs developmental trajectory and separates discrete cell clusters under different scenarios, outperforming recently developed deep learning methods. Furthermore, implementation of scGAE on empirical data showed that scGAE provided novel insights into cell developmental lineages and preserved inter-cluster distances.

Metric Multidimensional Scaling for Large Single-Cell Data Sets using Neural Networks

10.1101/2021.06.24.449725 ◽

2021 ◽

Author(s):

Stefan Canzar ◽

Van Hoan Do ◽

Slobodan Jelic ◽

Soeren Laue ◽

Domagoj Matijevic ◽

...

Keyword(s):

Multidimensional Scaling ◽

Single Cell ◽

State Of The Art ◽

Dimensional Space ◽

Linear Mapping ◽

Alternative Methods ◽

Dimensional Euclidean Space ◽

Data Sets ◽

Metric Multidimensional Scaling ◽

Low Dimensional

Metric multidimensional scaling is one of the classical methods for embedding data into low-dimensional Euclidean space. It creates the low-dimensional embedding by approximately preserving the pairwise distances between the input points. However, current state-of-the-art approaches only scale to a few thousand data points. For larger data sets such as those occurring in single-cell RNA sequencing experiments, the running time becomes prohibitively large and thus alternative methods such as PCA are widely used instead. Here, we propose a neural network based approach for solving the metric multidimensional scaling problem that is orders of magnitude faster than previous state-of-the-art approaches, and hence scales to data sets with up to a few million cells. At the same time, it provides a non-linear mapping between high- and low-dimensional space that can place previously unseen cells in the same embedding.
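The objective described above, preserving pairwise distances in the embedding, can be illustrated with a minimal sketch. This is plain gradient descent on the raw stress, not the neural-network approach the abstract proposes. The function names (`pairwise_distances`, `stress`, `metric_mds`) and the step size of 1/(2n) are choices made for this illustration only.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distance matrix between the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def stress(D_high, Y):
    """Raw stress: sum over pairs i<j of (embedded distance - input distance)^2."""
    D_low = pairwise_distances(Y)
    return ((D_low - D_high) ** 2).sum() / 2.0

def metric_mds(D_high, n_components=2, n_iter=300, seed=0):
    """Illustrative metric MDS by gradient descent on the raw stress.

    With step size 1/(2n) each update coincides with the classical
    majorization (SMACOF) step for unweighted stress.
    """
    rng = np.random.default_rng(seed)
    n = D_high.shape[0]
    Y = rng.normal(scale=1e-2, size=(n, n_components))
    lr = 1.0 / (2.0 * n)
    for _ in range(n_iter):
        diff = Y[:, None, :] - Y[None, :, :]
        D_low = np.sqrt((diff ** 2).sum(axis=-1))
        # Guard against division by zero (the diagonal is always zero).
        coeff = (D_low - D_high) / np.maximum(D_low, 1e-12)
        # Gradient of the stress with respect to each embedded point.
        grad = 2.0 * (coeff[:, :, None] * diff).sum(axis=1)
        Y -= lr * grad
    return Y
```

Calling `metric_mds(pairwise_distances(X))` on an `(n, d)` array `X` returns an `(n, 2)` embedding. The full `n × n` distance matrix held in memory makes the cost quadratic in the number of cells, which is the scaling limitation the abstract refers to.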

Ensemble dimensionality reduction and feature gene extraction for single-cell RNA-seq data

Nature Communications ◽

10.1038/s41467-020-19465-7 ◽

2020 ◽

Vol 11 (1) ◽

Author(s):

Xiaoxiao Sun ◽

Yiwen Liu ◽

Lingling An

Keyword(s):

Dimensionality Reduction ◽

Single Cell ◽

Dimensional Space ◽

Essential Feature ◽

Empirical Studies ◽

Expression Patterns ◽

Cell Types ◽

Stochastic Gradient Descent ◽

Reduction Techniques ◽

Low Dimensional

Abstract: Single-cell RNA sequencing (scRNA-seq) technologies allow researchers to uncover the biological states of a single cell at high resolution. For computational efficiency and easy visualization, dimensionality reduction is necessary to capture gene expression patterns in low-dimensional space. Here we propose an ensemble method for simultaneous dimensionality reduction and feature gene extraction (EDGE) of scRNA-seq data. Different from existing dimensionality reduction techniques, the proposed method implements an ensemble learning scheme that utilizes massive weak learners for an accurate similarity search. Based on the similarity matrix constructed by those weak learners, the low-dimensional embedding of the data is estimated and optimized through spectral embedding and stochastic gradient descent. Comprehensive simulation and empirical studies show that EDGE is well suited for searching for meaningful organization of cells, detecting rare cell types, and identifying essential feature genes associated with certain cell types.

Graph Drawing-based Dimensionality Reduction to Identify Hidden Communities in Single-Cell Sequencing Spatial Representation

10.1101/2020.05.05.078550 ◽

2020 ◽

Author(s):

Alireza Khodadadi-Jamayran ◽

Aristotelis Tsirigos

Keyword(s):

Dimensionality Reduction ◽

Single Cell ◽

Large Scale ◽

Graph Drawing ◽

Dimensional Space ◽

K Nearest Neighbor ◽

Network Graph ◽

Gene Expressions ◽

Single Cell Sequencing ◽

Spring Force

SUMMARY: With the rapid growth of single cell sequencing technologies, finding cell communities with high accuracy has become crucial for large scale projects. With commonly used dimensionality reduction techniques such as tSNE and UMAP, it is often difficult to clearly distinguish cell communities in high dimensional space. Usually cell communities with similar origin and trajectories cluster so closely to each other that their subtle but important differences do not become readily apparent. This creates a problem for clustering, as clustering is also performed on dimensionality reduction results. In order to identify such communities, scientists either perform broad clustering and then extract each cluster and re-cluster it to identify sub-populations, or they over-cluster the data and then merge the clusters with similar gene expression. This is an incredibly cumbersome and time-consuming process. To solve this problem, we propose K-nearest-neighbor-based Network graph drawing Layout (KNetL, pronounced like ‘nettle’) for dimensionality reduction. In our method, we use force-directed graph drawing, whereby the attractive force (analogous to a spring force) and the repulsive force (analogous to an electrical force in atomic particles) between the cells are evaluated, and the cell communities are organized in a structural visualization. The coordinates of the force-compacted nodes are then extracted, and we employ dimensionality reduction methods, such as tSNE and UMAP, to unpack the nodes. The final plot, a KNetL map, shows a visually appealing and distinctive separation between cell communities. Our results show that KNetL maps bring significant resolution to visualizing and identifying otherwise hidden cell communities. All the algorithms are implemented in the iCellR package and available through the CRAN repository.
The Single (i) Cell R package (iCellR) provides great flexibility at every step of the analysis pipeline, including normalization, clustering, dimensionality reduction, interactive 2D and 3D visualizations, batch alignment or data integration, imputation, and interactive cell gating tools, which allow users to manually gate around the cells.
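The two ingredients named in the summary, a K-nearest-neighbor graph and a force-directed layout with spring-like attraction and electrical-like repulsion, can be sketched as follows. This is a generic Fruchterman-Reingold-style layout written for illustration. It is not the iCellR implementation, and it omits the final tSNE/UMAP "unpacking" step. The function names, force formulas, and cooling schedule are all assumptions of this sketch.

```python
import numpy as np

def knn_edges(X, k):
    """Symmetric boolean adjacency linking each row of X to its k nearest rows."""
    d = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    np.fill_diagonal(d, np.inf)            # a cell is never its own neighbor
    idx = np.argsort(d, axis=1)[:, :k]     # indices of the k closest cells
    A = np.zeros(d.shape, dtype=bool)
    rows = np.repeat(np.arange(X.shape[0]), k)
    A[rows, idx.ravel()] = True
    return A | A.T                         # make the graph undirected

def force_layout(A, n_iter=200, seed=0):
    """Fruchterman-Reingold-style 2D layout of the graph with adjacency A."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    pos = rng.uniform(-1.0, 1.0, size=(n, 2))
    k_opt = 1.0 / np.sqrt(n)               # preferred edge length
    temp = 0.1                             # maximum step per iteration
    for _ in range(n_iter):
        delta = pos[:, None, :] - pos[None, :, :]
        dist = np.maximum(np.sqrt((delta ** 2).sum(axis=-1)), 1e-9)
        unit = delta / dist[:, :, None]
        repulse = k_opt ** 2 / dist        # all pairs push apart
        np.fill_diagonal(repulse, 0.0)
        attract = np.where(A, dist ** 2 / k_opt, 0.0)  # edges pull together
        disp = ((repulse - attract)[:, :, None] * unit).sum(axis=1)
        length = np.maximum(np.sqrt((disp ** 2).sum(axis=-1)), 1e-9)
        pos += disp / length[:, None] * np.minimum(length, temp)[:, None]
        temp *= 0.98                       # cool down so the layout settles
    return pos
```

For an `(n_cells, n_features)` matrix `X`, `force_layout(knn_edges(X, k=10))` returns one 2D coordinate per cell. Both functions build dense `n × n` arrays, so the sketch only suits small inputs.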

A topology-preserving dimensionality reduction method for single-cell RNA-seq data using graph autoencoder

Scientific Reports ◽

10.1038/s41598-021-99003-7 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Zixiang Luo ◽

Chenyu Xu ◽

Zhen Zhang ◽

Wenfei Jin

Keyword(s):

Dimensionality Reduction ◽

Single Cell ◽

Topological Structure ◽

Reduction Method ◽

Dimensional Space ◽

Oriented Graph ◽

Developmental Trajectory ◽

Structure Information ◽

Dimensionality Reduction Method ◽

Cell Graph

Abstract: Dimensionality reduction is crucial for the visualization and interpretation of high-dimensional single-cell RNA sequencing (scRNA-seq) data. However, preserving the topological structure among cells in low-dimensional space remains a challenge. Here, we present the single-cell graph autoencoder (scGAE), a dimensionality reduction method that preserves topological structure in scRNA-seq data. scGAE builds a cell graph and uses a multitask-oriented graph autoencoder to preserve topological structure information and feature information in scRNA-seq data simultaneously. We further extended scGAE for scRNA-seq data visualization, clustering, and trajectory inference. Analyses of simulated data showed that scGAE accurately reconstructs developmental trajectory and separates discrete cell clusters under different scenarios, outperforming recently developed deep learning methods. Furthermore, implementation of scGAE on empirical data showed that scGAE provided novel insights into cell developmental lineages and preserved inter-cluster distances.

Three Dimensional structure and organization of diploid chromosomes by optical section microscopy

Proceedings, annual meeting, Electron Microscopy Society of America ◽

10.1017/s0424820100127608 ◽

1987 ◽

Vol 45 ◽

pp. 633-635

Author(s):

David A. Agard ◽

Yasushi Hiraoka ◽

John W. Sedat

Keyword(s):

Dimensional Space ◽

Three Dimensional ◽

Developmental State ◽

Mitotic Cell ◽

Biological Properties ◽

Dimensional Structure ◽

Specific Gene ◽

Three Dimensional Structure ◽

Complementary Techniques ◽

Dynamic Structures

In an effort to understand the complex relationship between structure and biological function within the nucleus, we have embarked on a program to examine the three-dimensional structure and organization of Drosophila melanogaster embryonic chromosomes. Our overall goal is to determine how DNA and proteins are organized into complex and highly dynamic structures (chromosomes) and how these chromosomes are arranged in three-dimensional space within the cell nucleus. Further, we hope to be able to correlate structural data with such fundamental biological properties as stage in the mitotic cell cycle, developmental state and transcription at specific gene loci. Towards this end, we have been developing methodologies for the three-dimensional analysis of non-crystalline biological specimens using optical and electron microscopy. We feel that the combination of these two complementary techniques allows an unprecedented look at the structural organization of cellular components ranging in size from 100 Å to 100 microns.

Faculty Opinions recommendation of Single-cell sequencing in stem cell biology.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.726290250.793542739 ◽

2018 ◽

Author(s):

Catherine Verfaillie

Keyword(s):

Stem Cell ◽

Single Cell ◽

Cell Biology ◽

Stem Cell Biology ◽

Single Cell Sequencing

Molecular regulation of Snai2 in development and disease

Journal of Cell Science ◽

10.1242/jcs.235127 ◽

2019 ◽

Vol 132 (23) ◽

Author(s):

Wenhui Zhou ◽

Kayla M. Gross ◽

Charlotte Kuperwasser

Keyword(s):

Transcription Factor ◽

Cell Biology ◽

Molecular Mechanisms ◽

Epithelial To Mesenchymal Transition ◽

Zinc Finger Protein ◽

Main Role ◽

Biological Processes ◽

Developmental Defects ◽

Mesenchymal Transition ◽

Damage Repair

ABSTRACT The transcription factor Snai2, encoded by the SNAI2 gene, is an evolutionarily conserved C2H2 zinc finger protein that orchestrates biological processes critical to tissue development and tumorigenesis. Initially characterized as a prototypical epithelial-to-mesenchymal transition (EMT) transcription factor, Snai2 has been shown more recently to participate in a wider variety of biological processes, including tumor metastasis, stem and/or progenitor cell biology, cellular differentiation, vascular remodeling and DNA damage repair. The main role of Snai2 in controlling such processes involves facilitating the epigenetic regulation of transcriptional programs, and, as such, its dysregulation manifests in developmental defects, disruption of tissue homeostasis, and other disease conditions. Here, we discuss our current understanding of the molecular mechanisms regulating Snai2 expression, abundance and activity. In addition, we outline how these mechanisms contribute to disease phenotypes or how they may impact rational therapeutic targeting of Snai2 dysregulation in human disease.
