MultiMAP: dimensionality reduction and integration of multimodal data

2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Mika Sarkin Jain ◽  
Krzysztof Polanski ◽  
Cecilia Dominguez Conde ◽  
Xi Chen ◽  
Jongeun Park ◽  
...  

Abstract: Multimodal data is rapidly growing in many fields of science and engineering, including single-cell biology. We introduce MultiMAP, a novel algorithm for dimensionality reduction and integration. MultiMAP can integrate any number of datasets, leverages features not present in all datasets, is not restricted to a linear mapping, allows the user to specify the influence of each dataset, and is extremely scalable to large datasets. We apply MultiMAP to single-cell transcriptomics, chromatin accessibility, methylation, and spatial data and show that it outperforms current approaches. On a new thymus dataset, we use MultiMAP to integrate cells along a temporal trajectory. This enables quantitative comparison of transcription factor expression and binding site accessibility over the course of T cell differentiation, revealing patterns of expression versus binding site opening kinetics.

2021 ◽  
Author(s):  
Mika Sarkin Jain ◽  
Krzysztof Polanski ◽  
Cecilia Dominguez Conde ◽  
Xi Chen ◽  
Jongeun Park ◽  
...  

Abstract: Multimodal data is rapidly growing in many fields of science and engineering, including single-cell biology. We introduce MultiMAP, an approach for dimensionality reduction and integration of multiple datasets. MultiMAP recovers a single manifold on which all of the data resides and then projects the data into a single low-dimensional space so as to preserve the structure of the manifold. It is based on a framework of Riemannian geometry and algebraic topology, and generalizes the popular UMAP algorithm [1] to the multimodal setting. MultiMAP can be used for visualization of multimodal data, and as an integration approach that enables joint analyses. MultiMAP has several advantages over existing integration strategies for single-cell data: it can integrate any number of datasets, leverages features that are not present in all datasets (i.e., datasets can be of different dimensionalities), is not restricted to a linear mapping, can control the influence of each dataset on the embedding, and is extremely scalable to large datasets. We apply MultiMAP to the integration of a variety of single-cell transcriptomics, chromatin accessibility, methylation, and spatial data, and show that it outperforms current approaches in preservation of high-dimensional structure, alignment of datasets, visual separation of clusters, transfer learning, and runtime. On a newly generated single-cell Assay for Transposase-Accessible Chromatin using sequencing (scATAC-seq) and single-cell RNA-seq (scRNA-seq) dataset of the human thymus, we use MultiMAP to integrate cells along a temporal trajectory. This enables the quantitative comparison of transcription factor expression and binding site accessibility over the course of T cell differentiation, revealing patterns of transcription factor kinetics.


2022 ◽  
Author(s):  
Britta Velten ◽  
Jana M. Braunger ◽  
Ricard Argelaguet ◽  
Damien Arnol ◽  
Jakob Wirbel ◽  
...  

Abstract: Factor analysis is a widely used method for dimensionality reduction in genome biology, with applications from personalized health to single-cell biology. Existing factor analysis models assume independence of the observed samples, an assumption that fails in spatio-temporal profiling studies. Here we present MEFISTO, a flexible and versatile toolbox for modeling high-dimensional data when spatial or temporal dependencies between the samples are known. MEFISTO maintains the established benefits of factor analysis for multimodal data, but also enables spatio-temporally informed dimensionality reduction, interpolation, and separation of smooth from non-smooth patterns of variation. Moreover, MEFISTO can integrate multiple related datasets by simultaneously identifying and aligning the underlying patterns of variation in a data-driven manner. To illustrate MEFISTO, we apply the model to different datasets with spatial or temporal resolution, including an evolutionary atlas of organ development, a longitudinal microbiome study, a single-cell multi-omics atlas of mouse gastrulation, and spatially resolved transcriptomics.


2021 ◽  
Author(s):  
Zhongli Xu ◽  
Elisa Heidrich-O'Hare ◽  
Wei Chen ◽  
Richard H. Duerr

The recently developed transcription, epitopes, and chromatin accessibility by sequencing (TEA-seq) and similar DOGMA-seq single-cell trimodal omics assays provide unprecedented opportunities for understanding cell biology, but independent optimization, benchmarking and evaluation are lacking. We explored the utility, pros and cons of DOGMA-seq compared to the bimodal cellular indexing of transcriptomes and epitopes by sequencing (CITE-seq) assay in activated and stimulated human peripheral blood T cells. We identified an optimal incubation time and concentration of digitonin (DIG) for cell permeabilization and found that single-cell trimodal omics measurements after DIG permeabilization were generally better than after an alternative low-loss lysis (LLL) permeabilization condition. Next, we found that DOGMA-seq with optimized DIG permeabilization and its ATAC library provides more information, even though its mRNA and cell surface protein antibody-derived tag (ADT) libraries have slightly inferior quality, compared to CITE-seq. Finally, we recognized the additional value of DOGMA-seq for studying lineage-specific T helper cells.


2021 ◽  
Author(s):  
Wolfgang Kopp ◽  
Altuna Akalin ◽  
Uwe Ohler

Advances in single-cell technologies enable the routine interrogation of chromatin accessibility for tens of thousands of single cells, shedding light on gene regulatory processes at an unprecedented resolution. Meanwhile, size, sparsity and high dimensionality of the resulting data continue to pose challenges for its computational analysis, and specifically the integration of data from different sources. We have developed a dedicated computational approach, a variational auto-encoder using a noise model specifically designed for single-cell ATAC-seq data, which facilitates simultaneous dimensionality reduction and batch correction via an adversarial learning strategy. We showcase both its individual advantages on carefully chosen real and simulated data sets, as well as the benefits for detailed cell type characterization via integrating multiple complex datasets.


2021 ◽  
Author(s):  
Tommaso Biancalani ◽  
Gabriele Scalia ◽  
Lorenzo Buffoni ◽  
Raghav Avasthi ◽  
Ziqing Lu ◽  
...  

Abstract: Charting an organ's biological atlas requires us to spatially resolve the entire single-cell transcriptome, and to relate such cellular features to the anatomical scale. Single-cell and single-nucleus RNA-seq (sc/snRNA-seq) can profile cells comprehensively, but lose spatial information. Spatial transcriptomics allows for spatial measurements, but at lower resolution and with limited sensitivity. Targeted in situ technologies solve both issues, but are limited in gene throughput. To overcome these limitations we present Tangram, a method that aligns sc/snRNA-seq data to various forms of spatial data collected from the same region, including MERFISH, STARmap, smFISH, Spatial Transcriptomics (Visium) and histological images. Tangram can map any type of sc/snRNA-seq data, including multimodal data such as those from SHARE-seq, which we used to reveal spatial patterns of chromatin accessibility. We demonstrate Tangram on healthy mouse brain tissue, by reconstructing a genome-wide anatomically integrated spatial map at single-cell resolution of the visual and somatomotor areas.


2020 ◽  
Author(s):  
Tommaso Biancalani ◽  
Gabriele Scalia ◽  
Lorenzo Buffoni ◽  
Raghav Avasthi ◽  
Ziqing Lu ◽  
...  

Charting a biological atlas of an organ, such as the brain, requires us to spatially resolve whole transcriptomes of single cells, and to relate such cellular features to the histological and anatomical scales. Single-cell and single-nucleus RNA-seq (sc/snRNA-seq) can map cells comprehensively [5,6], but relating those to their histological and anatomical positions in the context of an organ's common coordinate framework remains a major challenge and barrier to the construction of a cell atlas [7–10]. Conversely, Spatial Transcriptomics allows for in situ measurements [11–13] at the histological level, but at lower spatial resolution and with limited sensitivity. Targeted in situ technologies [1–3] solve both issues, but are limited in gene throughput, which impedes profiling of the entire transcriptome. Finally, as samples are collected for profiling, their registration to anatomical atlases often requires human supervision, which is a major obstacle to building pipelines at scale. Here, we demonstrate spatial mapping of cells, histology, and anatomy in the somatomotor area and the visual area of the healthy adult mouse brain. We devise Tangram, a method that aligns snRNA-seq data to various forms of spatial data collected from the same brain region, including MERFISH [1], STARmap [2], smFISH [3], and Spatial Transcriptomics [4] (Visium), as well as histological images and public atlases. Tangram can map any type of sc/snRNA-seq data, including multimodal data such as SHARE-seq data [5], which we used to reveal spatial patterns of chromatin accessibility. We equipped Tangram with a deep learning computer vision pipeline, which allows for automatic identification of anatomical annotations on histological images of mouse brain. By doing so, Tangram reconstructs a genome-wide, anatomically integrated spatial map of the visual and somatomotor areas with ∼30,000 genes at single-cell resolution, revealing spatial gene expression and chromatin accessibility patterning beyond the current limitations of in situ technologies.


2021 ◽  
Vol 118 (15) ◽  
pp. e2023070118
Author(s):  
Kevin E. Wu ◽  
Kathryn E. Yost ◽  
Howard Y. Chang ◽  
James Zou

Simultaneous profiling of multiomic modalities within a single cell is a grand challenge for single-cell biology. While there have been impressive technical innovations demonstrating feasibility—for example, generating paired measurements of single-cell transcriptome (single-cell RNA sequencing [scRNA-seq]) and chromatin accessibility (single-cell assay for transposase-accessible chromatin using sequencing [scATAC-seq])—widespread application of joint profiling is challenging due to its experimental complexity, noise, and cost. Here, we introduce BABEL, a deep learning method that translates between the transcriptome and chromatin profiles of a single cell. Leveraging an interoperable neural network model, BABEL can predict single-cell expression directly from a cell’s scATAC-seq and vice versa after training on relevant data. This makes it possible to computationally synthesize paired multiomic measurements when only one modality is experimentally available. Across several paired single-cell ATAC and gene expression datasets in human and mouse, we validate that BABEL accurately translates between these modalities for individual cells. BABEL also generalizes well to cell types within new biological contexts not seen during training. Starting from scATAC-seq of patient-derived basal cell carcinoma (BCC), BABEL generated single-cell expression that enabled fine-grained classification of complex cell states, despite having never seen BCC data. These predictions are comparable to analyses of experimental BCC scRNA-seq data for diverse cell types related to BABEL’s training data. We further show that BABEL can incorporate additional single-cell data modalities, such as protein epitope profiling, thus enabling translation across chromatin, RNA, and protein. BABEL offers a powerful approach for data exploration and hypothesis generation.


Author(s):  
Noa Liscovitch-Brauer ◽  
Antonino Montalbano ◽  
Jiale Deng ◽  
Alejandro Méndez-Mancilla ◽  
Hans-Hermann Wessels ◽  
...  

2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Sarah E. Pierce ◽  
Jeffrey M. Granja ◽  
William J. Greenleaf

Abstract: Chromatin accessibility profiling can identify putative regulatory regions genome wide; however, pooled single-cell methods for assessing the effects of regulatory perturbations on accessibility are limited. Here, we report a modified droplet-based single-cell ATAC-seq protocol for perturbing and evaluating dynamic single-cell epigenetic states. This method (Spear-ATAC) enables simultaneous read-out of chromatin accessibility profiles and integrated sgRNA spacer sequences from thousands of individual cells at once. Spear-ATAC profiling of 104,592 cells representing 414 sgRNA knock-down populations reveals the temporal dynamics of epigenetic responses to regulatory perturbations in cancer cells and the associations between transcription factor binding profiles.

