scholarly journals Modeling latent flows on single-cell data using the Hodge decomposition

2019 ◽  
Author(s):  
Kazumitsu Maehara ◽  
Yasuyuki Ohkawa

AbstractSingle-cell analysis is a powerful technique used to identify a specific cell population of interest during differentiation, aging, or oncogenesis. Individual cells occupy a particular transient state in the cell cycle, circadian rhythm, or during cell death. An appealing concept of pseudo-time trajectory analysis of single-cell RNA sequencing data was proposed in the software Monocle, and several methods of trajectory analysis have since been published to date. These aim to infer the ordering of cells and enable the tracing of gene expression profile trajectories in cell differentiation and reprogramming. However, the methods are restricted in terms of time structure because of the pre-specified structure of trajectories (linear, branched, tree or cyclic) which contrasts with the mixed state of single cells.Here, we propose a technique to extract underlying flows in single-cell data based on the Hodge decomposition (HD). HD is a theorem of vector fields on a manifold which guarantees that any given flow can decompose into three types of orthogonal component: gradient-flow (acyclic), curl-, and harmonic-flow (cyclic). HD is generalized on a simplicial complex (graph) and the discretized HD has only a weak assumption that the graph is directed. Therefore, in principle, HD can extract flows from any mixture of tree and cyclic time flows of observed cells. The decomposed flows provide intuitive interpretations about complex flow because of their linearity and orthogonality. Thus, each extracted flow can be focused on separately with no need to consider crosstalk.We developed ddhodge software, which aims to model the underlying flow structure that implies unobserved time or causal relations in the hodge-podge collection of data points. We demonstrated that the mathematical framework of HD is suitable to reconstruct a sparse graph representation of diffusion process as a candidate model of differentiation while preserving the divergence of the original fully-connected graph. The preserved divergence can be used as an indicator of the source and sink cells in the observed population. A sparse graph representation of the diffusion process transforms data analysis of the non-linear structure embedded in the high-dimensional space of single-cell data into inspection of the visible flow using graph algorithms. Hence, ddhodge is a suitable toolkit to visualize, inspect, and subsequently interpret large data sets including, but not limited to, high-throughput measurements of biological data.The beta version of ddhodge R package is available at:https://github.com/kazumits/ddhodge

2019 ◽  
Author(s):  
Arnav Moudgil ◽  
Michael N. Wilkinson ◽  
Xuhua Chen ◽  
June He ◽  
Alex J. Cammack ◽  
...  

AbstractIn situ measurements of transcription factor (TF) binding are confounded by cellular heterogeneity and represent averaged profiles in complex tissues. Single cell RNA-seq (scRNA-seq) is capable of resolving different cell types based on gene expression profiles, but no technology exists to directly link specific cell types to the binding pattern of TFs in those cell types. Here, we present self-reporting transposons (SRTs) and their use in single cell calling cards (scCC), a novel assay for simultaneously capturing gene expression profiles and mapping TF binding sites in single cells. First, we show how the genomic locations of SRTs can be recovered from mRNA. Next, we demonstrate that SRTs deposited by the piggyBac transposase can be used to map the genome-wide localization of the TFs SP1, through a direct fusion of the two proteins, and BRD4, through its native affinity for piggyBac. We then present the scCC method, which maps SRTs from scRNA-seq libraries, thus enabling concomitant identification of cell types and TF binding sites in those same cells. As a proof-of-concept, we show recovery of cell type-specific BRD4 and SP1 binding sites from cultured cells. Finally, we map Brd4 binding sites in the mouse cortex at single cell resolution, thus establishing a new technique for studying TF biology in situ.


2019 ◽  
Author(s):  
Shuoguo Wang ◽  
Constance Brett ◽  
Mohan Bolisetty ◽  
Ryan Golhar ◽  
Isaac Neuhaus ◽  
...  

AbstractMotivationThanks to technological advances made in the last few years, we are now able to study transcriptomes from thousands of single cells. These have been applied widely to study various aspects of Biology. Nevertheless, comprehending and inferring meaningful biological insights from these large datasets is still a challenge. Although tools are being developed to deal with the data complexity and data volume, we do not have yet an effective visualizations and comparative analysis tools to realize the full value of these datasets.ResultsIn order to address this gap, we implemented a single cell data visualization portal called Single Cell Viewer (SCV). SCV is an R shiny application that offers users rich visualization and exploratory data analysis options for single cell datasets.AvailabilitySource code for the application is available online at GitHub (http://www.github.com/neuhausi/single-cell-viewer) and there is a hosted exploration application using the same example dataset as this publication at http://periscopeapps.org/[email protected]; [email protected]


2021 ◽  
Author(s):  
Nathanael Andrews ◽  
Martin Enge

Abstract CIM-seq is a tool for deconvoluting RNA-seq data from cell multiplets (clusters of two or more cells) in order to identify physically interacting cell in a given tissue. The method requires two RNAseq data sets from the same tissue: one of single cells to be used as a reference, and one of cell multiplets to be deconvoluted. CIM-seq is compatible with both droplet based sequencing methods, such as Chromium Single Cell 3′ Kits from 10x genomics; and plate based methods, such as Smartseq2. The pipeline consists of three parts: 1) Dissociation of the target tissue, FACS sorting of single cells and multiplets, and conventional scRNA-seq 2) Feature selection and clustering of cell types in the single cell data set - generating a blueprint of transcriptional profiles in the given tissue 3) Computational deconvolution of multiplets through a maximum likelihood estimation (MLE) to determine the most likely cell type constituents of each multiplet.


2018 ◽  
Author(s):  
Yijie Wang ◽  
Jan Hoinka ◽  
Teresa M Przytycka

The identification of sub-populations of cells present in a sample and the comparison of such sub-populations across samples are among the most frequently performed analyzes of single-cell data. Current tools for these kinds of data, however, fall short in their ability to adequately perform these tasks. We introduce a novel method, PopCorn (single cell sub-Populations Comparison), allowing for the identification of sub-populations of cells present within individual experiments while simultaneously performing sub-populations mapping across these experiments. PopCorn utilizes several novel algorithmic solutions enabling the execution of these tasks with unprecedented precision. As such, PopCorn provides a much-needed tool for comparative analysis of populations of single cells.


2018 ◽  
Author(s):  
Martin Pirkl ◽  
Niko Beerenwinkel

AbstractMotivationNew technologies allow for the elaborate measurement of different traits of single cells. These data promise to elucidate intra-cellular networks in unprecedented detail and further help to improve treatment of diseases like cancer. However, cell populations can be very heterogeneous.ResultsWe developed a mixture of Nested Effects Models (M&NEM) for single-cell data to simultaneously identify different cellular sub-populations and their corresponding causal networks to explain the heterogeneity in a cell population. For inference, we assign each cell to a network with a certain probability and iteratively update the optimal networks and cell probabilities in an Expectation Maximization scheme. We validate our method in the controlled setting of a simulation study and apply it to three data sets of pooled CRISPR screens generated previously by two novel experimental techniques, namely Crop-Seq and Perturb-Seq.AvailabilityThe mixture Nested Effects Model (M&NEM) is available as the R-package mnem at https://github.com/cbgethz/mnem/[email protected], [email protected] informationSupplementary data are available.online.


2020 ◽  
Author(s):  
Jian Zhou ◽  
Olga G. Troyanskaya

AbstractScaling single-cell data exploratory analysis with the rapidly growing diversity and quantity of single-cell omics datasets demands more interpretable and robust data representation that is generalizable across datasets. To address this challenge, here we developed a novel ‘quasilinear’ framework that combines the interpretability and transferability of linear methods with the representational power of nonlinear methods. Within this framework, we introduce a data representation and visualization method, GraphDR, and a structure discovery method, StructDR, that unifies cluster, trajectory, and surface estimation and allows their confidence set inference. We applied both methods to diverse single-cell RNA-seq datasets from whole embryos and tissues. Unlike PCA and t-SNE, GraphDR and StructDR generated representations that both distinguished highly specific cell types and were comparable across datasets. In addition, GraphDR is at least an order of magnitude faster than commonly used nonlinear methods. Our visualizations of scRNA-seq data from developing zebrafish and Xenopus embryos revealed extruding branches of lineages from a continuum of cell states, suggesting that the current branch view of cell specification may be oversimplified. Moreover, StructDR identified a novel neuronal population using scRNA-seq data from mouse hippocampus. An open-source python library and a user-friendly graphical interface for 3D data visualization and analysis with these methods are available at https://github.com/jzthree/quasildr.


2022 ◽  
Vol 23 (1) ◽  
Author(s):  
Lucille Lopez-Delisle ◽  
Jean-Baptiste Delisle

Abstract Background The number of studies using single-cell RNA sequencing (scRNA-seq) is constantly growing. This powerful technique provides a sampling of the whole transcriptome of a cell. However, sparsity of the data can be a major hurdle when studying the distribution of the expression of a specific gene or the correlation between the expressions of two genes. Results We show that the main technical noise associated with these scRNA-seq experiments is due to the sampling, i.e., Poisson noise. We present a new tool named baredSC, for Bayesian Approach to Retrieve Expression Distribution of Single-Cell data, which infers the intrinsic expression distribution in scRNA-seq data using a Gaussian mixture model. baredSC can be used to obtain the distribution in one dimension for individual genes and in two dimensions for pairs of genes, in particular to estimate the correlation in the two genes’ expressions. We apply baredSC to simulated scRNA-seq data and show that the algorithm is able to uncover the expression distribution used to simulate the data, even in multi-modal cases with very sparse data. We also apply baredSC to two real biological data sets. First, we use it to measure the anti-correlation between Hoxd13 and Hoxa11, two genes with known genetic interaction in embryonic limb. Then, we study the expression of Pitx1 in embryonic hindlimb, for which a trimodal distribution has been identified through flow cytometry. While other methods to analyze scRNA-seq are too sensitive to sampling noise, baredSC reveals this trimodal distribution. Conclusion baredSC is a powerful tool which aims at retrieving the expression distribution of few genes of interest from scRNA-seq data.


2021 ◽  
Author(s):  
Xianjie Huang ◽  
Yuanhua Huang

AbstractSummarySingle-cell sequencing is an increasingly used technology and has promising applications in basic research and clinical translations. However, genotyping methods developed for bulk sequencing data have not been well adapted for single-cell data, in terms of both computational parallelization and simplified user interface. Here we introduce a software, cellsnp-lite, implemented in C/C++ and based on well supported package htslib, for genotyping in single-cell sequencing data for both droplet and well based platforms. On various experimental data sets, it shows substantial improvement in computational speed and memory efficiency with retaining highly concordant results compared to existing methods. Cellsnp-lite therefore lightens the genetic analysis for increasingly large single-cell data.AvailabilityThe source code is freely available at https://github.com/single-cell-genetics/[email protected]


2020 ◽  
Author(s):  
HARIPRIYA HARIKUMAR ◽  
Thomas P Quinn ◽  
Santu Rana ◽  
Sunil Gupta ◽  
Svetha Venkatesh

Abstract Background: The last decade has seen a major increase in the availability of genomic data. This includes expert-curated databases that describe the biological activity of genes, as well as high-throughput assays that measure gene expression in bulk tissue and single cells. Integrating these heterogeneous data sources can generate new hypotheses about biological systems. Our primary objective is to combine population-level drug-response data with patient-level single-cell expression data to predict how any gene will respond to any drug for any patient.Methods: We take 2 approaches to benchmarking a “dual-channel” random walk with restart (RWR) for data integration. First, we evaluate how well RWR can predict known gene functions from single-cell gene co-expression networks. Second, we evaluate how well RWR can predict known drug responses from individual cell networks. We then present two exploratory applications. In the first application, we combine the Gene Ontology database with glioblastoma single cells from 5 individual patients to identify genes whose functions differ between cancers. In the second application, we combine the LINCS drug-response database with the same glioblastoma data to identify genes that may exhibit patient-specific drug responses.Conclusions: Our manuscript introduces two innovations to the integration of heterogeneous biological data. First, we use a “dual-channel” method to predict up-regulation and down-regulation separately. Second, we use individualized single-cell gene co-expression networks to make personalized predictions. These innovations let us predict gene function and drug response for individual patients. Taken together, our work shows promise that single-cell co-expression data could be combined in heterogeneous information networks to facilitate precision medicine.


Author(s):  
Mohammad Lotfollahi ◽  
Mohsen Naghipourfar ◽  
Malte D. Luecken ◽  
Matin Khajavi ◽  
Maren Büttner ◽  
...  

AbstractLarge single-cell atlases are now routinely generated to serve as references for analysis of smaller-scale studies. Yet learning from reference data is complicated by batch effects between datasets, limited availability of computational resources and sharing restrictions on raw data. Here we introduce a deep learning strategy for mapping query datasets on top of a reference called single-cell architectural surgery (scArches). scArches uses transfer learning and parameter optimization to enable efficient, decentralized, iterative reference building and contextualization of new datasets with existing references without sharing raw data. Using examples from mouse brain, pancreas, immune and whole-organism atlases, we show that scArches preserves biological state information while removing batch effects, despite using four orders of magnitude fewer parameters than de novo integration. scArches generalizes to multimodal reference mapping, allowing imputation of missing modalities. Finally, scArches retains coronavirus disease 2019 (COVID-19) disease variation when mapping to a healthy reference, enabling the discovery of disease-specific cell states. scArches will facilitate collaborative projects by enabling iterative construction, updating, sharing and efficient use of reference atlases.


Sign in / Sign up

Export Citation Format

Share Document