MultiMAP: Dimensionality Reduction and Integration of Multimodal Data

2021 ◽  
Author(s):  
Mika Sarkin Jain ◽  
Krzysztof Polanski ◽  
Cecilia Dominguez Conde ◽  
Xi Chen ◽  
Jongeun Park ◽  
...  

Abstract: Multimodal data is rapidly growing in many fields of science and engineering, including single-cell biology. We introduce MultiMAP, an approach for dimensionality reduction and integration of multiple datasets. MultiMAP recovers a single manifold on which all of the data resides and then projects the data into a single low-dimensional space so as to preserve the structure of the manifold. It is based on a framework of Riemannian geometry and algebraic topology, and generalizes the popular UMAP algorithm [1] to the multimodal setting. MultiMAP can be used for visualization of multimodal data, and as an integration approach that enables joint analyses. MultiMAP has several advantages over existing integration strategies for single-cell data: it can integrate any number of datasets, leverages features that are not present in all datasets (i.e., datasets can be of different dimensionalities), is not restricted to a linear mapping, can control the influence of each dataset on the embedding, and is extremely scalable to large datasets. We apply MultiMAP to the integration of a variety of single-cell transcriptomics, chromatin accessibility, methylation, and spatial data, and show that it outperforms current approaches in preservation of high-dimensional structure, alignment of datasets, visual separation of clusters, transfer learning, and runtime. On a newly generated single-cell Assay for Transposase-Accessible Chromatin using sequencing (scATAC-seq) and single-cell RNA-seq (scRNA-seq) dataset of the human thymus, we use MultiMAP to integrate cells along a temporal trajectory. This enables the quantitative comparison of transcription factor expression and binding site accessibility over the course of T cell differentiation, revealing patterns of transcription factor kinetics.

2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Mika Sarkin Jain ◽  
Krzysztof Polanski ◽  
Cecilia Dominguez Conde ◽  
Xi Chen ◽  
Jongeun Park ◽  
...  

Abstract: Multimodal data is rapidly growing in many fields of science and engineering, including single-cell biology. We introduce MultiMAP, a novel algorithm for dimensionality reduction and integration. MultiMAP can integrate any number of datasets, leverages features not present in all datasets, is not restricted to a linear mapping, allows the user to specify the influence of each dataset, and is extremely scalable to large datasets. We apply MultiMAP to single-cell transcriptomics, chromatin accessibility, methylation, and spatial data and show that it outperforms current approaches. On a new thymus dataset, we use MultiMAP to integrate cells along a temporal trajectory. This enables quantitative comparison of transcription factor expression and binding site accessibility over the course of T cell differentiation, revealing patterns of expression versus binding site opening kinetics.


2022 ◽  
Author(s):  
Britta Velten ◽  
Jana M. Braunger ◽  
Ricard Argelaguet ◽  
Damien Arnol ◽  
Jakob Wirbel ◽  
...  

Abstract: Factor analysis is a widely used method for dimensionality reduction in genome biology, with applications from personalized health to single-cell biology. Existing factor analysis models assume independence of the observed samples, an assumption that fails in spatio-temporal profiling studies. Here we present MEFISTO, a flexible and versatile toolbox for modeling high-dimensional data when spatial or temporal dependencies between the samples are known. MEFISTO maintains the established benefits of factor analysis for multimodal data, but enables the performance of spatio-temporally informed dimensionality reduction, interpolation, and separation of smooth from non-smooth patterns of variation. Moreover, MEFISTO can integrate multiple related datasets by simultaneously identifying and aligning the underlying patterns of variation in a data-driven manner. To illustrate MEFISTO, we apply the model to different datasets with spatial or temporal resolution, including an evolutionary atlas of organ development, a longitudinal microbiome study, a single-cell multi-omics atlas of mouse gastrulation and spatially resolved transcriptomics.


2021 ◽  
Author(s):  
Zixiang Luo ◽  
Chenyu Xu ◽  
Zhen Zhang ◽  
Wenfei Jin

Abstract: Dimensionality reduction is crucial for the visualization and interpretation of high-dimensional single-cell RNA sequencing (scRNA-seq) data. However, preserving the topological structure among cells in low-dimensional space remains a challenge. Here, we present the single-cell graph autoencoder (scGAE), a dimensionality reduction method that preserves topological structure in scRNA-seq data. scGAE builds a cell graph and uses a multitask-oriented graph autoencoder to preserve topological structure information and feature information in scRNA-seq data simultaneously. We further extended scGAE for scRNA-seq data visualization, clustering, and trajectory inference. Analyses of simulated data showed that scGAE accurately reconstructs developmental trajectory and separates discrete cell clusters under different scenarios, outperforming recently developed deep learning methods. Furthermore, implementation of scGAE on empirical data showed that scGAE provided novel insights into cell developmental lineages and preserved inter-cluster distances.


2021 ◽  
Author(s):  
Stefan Canzar ◽  
Van Hoan Do ◽  
Slobodan Jelic ◽  
Soeren Laue ◽  
Domagoj Matijevic ◽  
...  

Metric multidimensional scaling is one of the classical methods for embedding data into low-dimensional Euclidean space. It creates the low-dimensional embedding by approximately preserving the pairwise distances between the input points. However, current state-of-the-art approaches only scale to a few thousand data points. For larger data sets such as those occurring in single-cell RNA sequencing experiments, the running time becomes prohibitively large and thus alternative methods such as PCA are widely used instead. Here, we propose a neural network based approach for solving the metric multidimensional scaling problem that is orders of magnitude faster than previous state-of-the-art approaches, and hence scales to data sets with up to a few million cells. At the same time, it provides a non-linear mapping between high- and low-dimensional space that can place previously unseen cells in the same embedding.
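The objective behind metric multidimensional scaling, as described above, can be illustrated with a minimal sketch. It minimizes the raw stress (the sum over point pairs of the squared difference between input and embedded distances) by plain gradient descent in NumPy. This is not the neural network approach the authors propose; the function names, step size, and iteration count are illustrative assumptions.

```python
import numpy as np


def pairwise_distances(X):
    """Euclidean distance matrix for the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def stress(Y, D):
    """Raw stress: sum over pairs i<j of (d_ij(Y) - D_ij)^2."""
    # The full matrix counts every pair twice, hence the factor 0.5.
    return 0.5 * ((pairwise_distances(Y) - D) ** 2).sum()


def mds_gradient_descent(D, n_components=2, n_iter=300, lr=0.005, seed=0):
    """Embed points with target distances D by gradient descent on the stress.

    Hypothetical helper for illustration; lr and n_iter are assumed values
    that suit small inputs (a few dozen points).
    """
    rng = np.random.default_rng(seed)
    n = D.shape[0]
    Y = rng.normal(size=(n, n_components))
    history = []
    for _ in range(n_iter):
        diff = Y[:, None, :] - Y[None, :, :]
        d = np.sqrt((diff ** 2).sum(axis=-1))
        np.fill_diagonal(d, 1.0)          # avoid division by zero on the diagonal
        coeff = (d - D) / d
        np.fill_diagonal(coeff, 0.0)      # a point exerts no force on itself
        # Gradient of the stress with respect to each embedded point.
        grad = 2.0 * (coeff[:, :, None] * diff).sum(axis=1)
        Y -= lr * grad
        history.append(stress(Y, D))
    return Y, history


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(40, 10))         # 40 synthetic points in 10 dimensions
    D = pairwise_distances(X)
    Y, history = mds_gradient_descent(D)
    print("stress after first step:", history[0])
    print("stress after last step:", history[-1])
```

Each iteration touches all point pairs, so the cost grows quadratically with the number of points, which is consistent with the scaling limitation the abstract attributes to existing approaches.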


2020 ◽  
Vol 11 (1) ◽  
Author(s):  
Xiaoxiao Sun ◽  
Yiwen Liu ◽  
Lingling An

Abstract: Single-cell RNA sequencing (scRNA-seq) technologies allow researchers to uncover the biological states of a single cell at high resolution. For computational efficiency and easy visualization, dimensionality reduction is necessary to capture gene expression patterns in low-dimensional space. Here we propose an ensemble method for simultaneous dimensionality reduction and feature gene extraction (EDGE) of scRNA-seq data. Different from existing dimensionality reduction techniques, the proposed method implements an ensemble learning scheme that utilizes massive weak learners for an accurate similarity search. Based on the similarity matrix constructed by those weak learners, the low-dimensional embedding of the data is estimated and optimized through spectral embedding and stochastic gradient descent. Comprehensive simulation and empirical studies show that EDGE is well suited for searching for meaningful organization of cells, detecting rare cell types, and identifying essential feature genes associated with certain cell types.
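The idea of building a similarity matrix from many weak learners can be sketched as follows. The abstract does not specify what the weak learners are, so this example assumes one plausible choice: each learner picks a few random features and random thresholds, hashes every cell into a bucket, and two cells score as similar when they often share a bucket. The function name and all parameters are hypothetical and are not taken from the EDGE implementation; the spectral embedding and stochastic gradient descent steps are omitted.

```python
import numpy as np


def ensemble_similarity(X, n_learners=200, n_features=3, seed=0):
    """Fraction of weak learners that place each pair of rows in the same bucket.

    Assumed weak learner: random feature subset plus random thresholds.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    m = min(n_features, p)
    S = np.zeros((n, n))
    for _ in range(n_learners):
        feats = rng.choice(p, size=m, replace=False)
        sub = X[:, feats]
        # One random threshold per chosen feature, within its observed range.
        thr = rng.uniform(sub.min(axis=0), sub.max(axis=0))
        bits = (sub > thr).astype(int)
        # Pack the binary pattern of each row into a single bucket code.
        codes = bits @ (1 << np.arange(m))
        S += codes[:, None] == codes[None, :]
    return S / n_learners


if __name__ == "__main__":
    rng = np.random.default_rng(2)
    # Two synthetic groups of "cells" with shifted expression profiles.
    X = np.vstack([rng.normal(0, 1, (10, 20)), rng.normal(4, 1, (10, 20))])
    S = ensemble_similarity(X)
    print("mean similarity within first group:", S[:10, :10].mean())
    print("mean similarity across groups:", S[:10, 10:].mean())
```

Each individual learner is a crude partition of the cells; the similarity only becomes informative once many of them are averaged, which is the general motivation for an ensemble of weak learners.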


2020 ◽  
Author(s):  
Alireza Khodadadi-Jamayran ◽  
Aristotelis Tsirigos

Summary: With the rapid growth of single cell sequencing technologies, finding cell communities with high accuracy has become crucial for large scale projects. Employing the current commonly used dimensionality reduction techniques such as tSNE and UMAP, it is often difficult to clearly distinguish cell communities in high dimensional space. Usually cell communities with similar origin and trajectories cluster so closely to each other that their subtle but important differences do not become readily apparent. This creates a problem for clustering, as clustering is also performed on dimensionality reduction results. In order to identify such communities, scientists either perform broad clustering and then extract each cluster and perform re-clustering to identify sub-populations, or they over-cluster the data and then merge the clusters with similar gene expression. This is an incredibly cumbersome and time-consuming process. To solve this problem, we propose K-nearest-neighbor-based Network graph drawing Layout (KNetL, pronounced like ‘nettle’) for dimensionality reduction. In our method, we use force-directed graph drawing, whereby the attractive force (analogous to a spring force) and the repulsive force (analogous to an electrical force in atomic particles) between the cells are evaluated, and the cell communities are organized in a structural visualization. The coordinates of the force-compacted nodes are then extracted, and we employ dimensionality reduction methods, such as tSNE and UMAP, to unpack the nodes. The final plot, a KNetL map, shows a visually appealing and distinctive separation between cell communities. Our results show that KNetL maps bring significant resolution to visualizing and identifying otherwise hidden cell communities. All the algorithms are implemented in the iCellR package and available through the CRAN repository.
Single (i) Cell R package (iCellR) provides great flexibility at every step of the analysis pipeline, including normalization, clustering, dimensionality reduction, interactive 2D and 3D visualizations, batch alignment or data integration, imputation, and interactive cell gating tools, which allow users to manually gate around the cells.
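The force-directed step described in the KNetL summary above can be sketched in a few lines. This example builds a K-nearest-neighbor graph and lays it out with spring-like attraction along edges and repulsion between all node pairs, in the style of the Fruchterman-Reingold algorithm. It is an illustrative Python sketch under assumed parameters, not the iCellR implementation (which is an R package), and it omits the final tSNE or UMAP step that unpacks the compacted nodes.

```python
import numpy as np


def knn_adjacency(X, k):
    """Symmetric boolean adjacency matrix of the k-nearest-neighbor graph."""
    diff = X[:, None, :] - X[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(d, np.inf)                 # a cell is not its own neighbor
    nbrs = np.argsort(d, axis=1)[:, :k]
    A = np.zeros(d.shape, dtype=bool)
    rows = np.repeat(np.arange(X.shape[0]), k)
    A[rows, nbrs.ravel()] = True
    return A | A.T


def force_layout(A, n_iter=200, seed=0):
    """2D force-directed layout: attraction on edges, repulsion on all pairs."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    pos = rng.uniform(-1.0, 1.0, size=(n, 2))
    k_opt = 1.0 / np.sqrt(n)                    # assumed ideal edge length
    temp = 0.1                                  # maximum step size, decays over time
    for _ in range(n_iter):
        delta = pos[:, None, :] - pos[None, :, :]
        dist = np.sqrt((delta ** 2).sum(axis=-1))
        np.fill_diagonal(dist, 1.0)
        dist = np.maximum(dist, 1e-9)
        # Repulsion (electrical analogy): magnitude k_opt^2 / dist, pushing apart.
        rep = (k_opt ** 2 / dist ** 2)[:, :, None] * delta
        # Attraction (spring analogy): magnitude dist^2 / k_opt, edges only.
        att = (A * dist / k_opt)[:, :, None] * delta
        disp = rep.sum(axis=1) - att.sum(axis=1)
        length = np.maximum(np.linalg.norm(disp, axis=1, keepdims=True), 1e-9)
        pos += disp / length * np.minimum(length, temp)
        temp *= 0.98
    return pos


if __name__ == "__main__":
    rng = np.random.default_rng(4)
    # Two synthetic cell communities in a 15-dimensional feature space.
    X = np.vstack([rng.normal(0, 1, (20, 15)), rng.normal(6, 1, (20, 15))])
    A = knn_adjacency(X, k=5)
    coords = force_layout(A)
    print(coords.shape)
```

Because attraction acts only along graph edges while repulsion acts between every pair, densely connected communities contract and poorly connected ones drift apart, which is the visual effect the summary attributes to the layout.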


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Zixiang Luo ◽  
Chenyu Xu ◽  
Zhen Zhang ◽  
Wenfei Jin

Abstract: Dimensionality reduction is crucial for the visualization and interpretation of high-dimensional single-cell RNA sequencing (scRNA-seq) data. However, preserving the topological structure among cells in low-dimensional space remains a challenge. Here, we present the single-cell graph autoencoder (scGAE), a dimensionality reduction method that preserves topological structure in scRNA-seq data. scGAE builds a cell graph and uses a multitask-oriented graph autoencoder to preserve topological structure information and feature information in scRNA-seq data simultaneously. We further extended scGAE for scRNA-seq data visualization, clustering, and trajectory inference. Analyses of simulated data showed that scGAE accurately reconstructs developmental trajectory and separates discrete cell clusters under different scenarios, outperforming recently developed deep learning methods. Furthermore, implementation of scGAE on empirical data showed that scGAE provided novel insights into cell developmental lineages and preserved inter-cluster distances.


Author(s):  
David A. Agard ◽  
Yasushi Hiraoka ◽  
John W. Sedat

In an effort to understand the complex relationship between structure and biological function within the nucleus, we have embarked on a program to examine the three-dimensional structure and organization of Drosophila melanogaster embryonic chromosomes. Our overall goal is to determine how DNA and proteins are organized into complex and highly dynamic structures (chromosomes) and how these chromosomes are arranged in three-dimensional space within the cell nucleus. Further, we hope to be able to correlate structural data with such fundamental biological properties as stage in the mitotic cell cycle, developmental state and transcription at specific gene loci. Towards this end, we have been developing methodologies for the three-dimensional analysis of non-crystalline biological specimens using optical and electron microscopy. We feel that the combination of these two complementary techniques allows an unprecedented look at the structural organization of cellular components ranging in size from 100 Å to 100 microns.


2019 ◽  
Vol 132 (23) ◽  
Author(s):  
Wenhui Zhou ◽  
Kayla M. Gross ◽  
Charlotte Kuperwasser

ABSTRACT The transcription factor Snai2, encoded by the SNAI2 gene, is an evolutionarily conserved C2H2 zinc finger protein that orchestrates biological processes critical to tissue development and tumorigenesis. Initially characterized as a prototypical epithelial-to-mesenchymal transition (EMT) transcription factor, Snai2 has been shown more recently to participate in a wider variety of biological processes, including tumor metastasis, stem and/or progenitor cell biology, cellular differentiation, vascular remodeling and DNA damage repair. The main role of Snai2 in controlling such processes involves facilitating the epigenetic regulation of transcriptional programs, and, as such, its dysregulation manifests in developmental defects, disruption of tissue homeostasis, and other disease conditions. Here, we discuss our current understanding of the molecular mechanisms regulating Snai2 expression, abundance and activity. In addition, we outline how these mechanisms contribute to disease phenotypes or how they may impact rational therapeutic targeting of Snai2 dysregulation in human disease.

