DTFLOW: Inference and Visualization of Single-cell Pseudo-temporal Trajectories Using Diffusion Propagation

ABSTRACT: One of the major challenges in single-cell data analysis is determining cellular developmental trajectories. Although substantial work has been done in recent years, more effective methods are still needed to infer developmental processes accurately. In this work we devise a new method, named DTFLOW, for determining pseudo-temporal trajectories with multiple branches. The method consists of two major steps: a new dimension reduction method, Bhattacharyya kernel feature decomposition (BKFD), and a novel approach, Reverse Searching on kNN Graph (RSKG), which identifies the underlying multi-branching processes of cellular differentiation. In BKFD we first establish a stationary distribution for each cell, based on the random walk with restart algorithm, to represent the transition of cellular developmental states; we then propose a new distance metric for calculating the pseudo-times of single cells by introducing the Bhattacharyya kernel matrix. The effectiveness of DTFLOW is examined on four single-cell datasets, and its efficiency is compared with that of two state-of-the-art methods. Simulation results suggest that the proposed method has superior accuracy and strong robustness in constructing pseudo-time trajectories. Availability: DTFLOW is implemented in Python and available at https://github.com/statway/DTFLOW.
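The two ingredients named in the abstract, a random-walk-with-restart (RWR) stationary distribution per cell and a Bhattacharyya kernel between those distributions, can be illustrated with a minimal NumPy sketch. This is not the DTFLOW implementation: the function names, the restart probability of 0.5, the toy chain graph, and the use of the negative log of the kernel as a distance from a root cell are all assumptions made for illustration.

```python
import numpy as np

def rwr_distributions(affinity, restart=0.5):
    """Column i is the stationary RWR distribution of a walker restarting at cell i."""
    affinity = np.asarray(affinity, dtype=float)
    # Column-normalise so that each column is a transition distribution.
    transition = affinity / affinity.sum(axis=0, keepdims=True)
    n = affinity.shape[0]
    # Closed form of p = (1 - c) * P @ p + c * e_i, solved for every i at once.
    return restart * np.linalg.inv(np.eye(n) - (1.0 - restart) * transition)

def bhattacharyya_kernel(distributions):
    """K[i, j] = sum_k sqrt(p_i[k] * p_j[k]) for column distributions p_i, p_j."""
    root = np.sqrt(np.clip(distributions, 0.0, None))
    return root.T @ root

def pseudotime_from_root(kernel, root_cell):
    """Assumed distance: negative log of the kernel row of the root cell."""
    return -np.log(np.clip(kernel[root_cell], 1e-12, 1.0))

# Toy example (hypothetical data): five cells connected in a chain 0-1-2-3-4.
chain = np.diag(np.ones(4), 1) + np.diag(np.ones(4), -1)
kernel = bhattacharyya_kernel(rwr_distributions(chain))
pseudotime = pseudotime_from_root(kernel, root_cell=0)
```

In this toy chain the root cell receives a pseudo-time of approximately zero and the cell at the far end of the chain receives a larger value than the root's immediate neighbour.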


Integration of a microfluidic chip with a size-based cell bandpass filter for reliable isolation of single cells

Lab on a Chip ◽

10.1039/c5lc00904a ◽

2015 ◽

Vol 15 (21) ◽

pp. 4128-4132 ◽

Cited By ~ 22

Author(s):

Hojin Kim ◽

Sanghyun Lee ◽

Jae-hyung Lee ◽

Joonwon Kim

Keyword(s):

Single Cell ◽

Microfluidic Chip ◽

Bandpass Filter ◽

Single Cells ◽

Cell Array ◽

Novel Approach ◽

Reliable Isolation

A novel approach for reliable arraying of single cells is presented using a size-based cell bandpass filter integrated with a microfluidic single-cell array chip.


A highly-occupied, single-cell trapping microarray for determination of cell membrane permeability

Lab on a Chip ◽

10.1039/c7lc00883j ◽

2017 ◽

Vol 17 (23) ◽

pp. 4077-4088 ◽

Cited By ~ 27

Author(s):

Lindong Weng ◽

Felix Ellett ◽

Jon Edd ◽

Keith H. K. Wong ◽

Korkut Uygun ◽

...

Keyword(s):

Cell Membrane ◽

Single Cell ◽

Membrane Permeability ◽

Single Cells ◽

Cell Membrane Permeability ◽

Volumetric Change ◽

Cell Trapping ◽

Single Cell Trapping ◽

Passive Pumping

A passive pumping, single-cell trapping microarray was developed to monitor volumetric change of multiple, single cells following hypertonic exposure.


Probabilistic inference of bifurcations in single-cell data using a hierarchical mixture of factor analysers

10.1101/076547 ◽

2016 ◽

Author(s):

Kieran R. Campbell ◽

Christopher Yau

Keyword(s):

Single Cell ◽

Probabilistic Inference ◽

Bifurcation Structure ◽

Bayesian Hierarchical ◽

Mcmc Sampling ◽

Hierarchical Prior ◽

Transcriptomics Data ◽

Cell Data ◽

Bifurcation Process

Abstract: Modelling bifurcations in single-cell transcriptomics data has become an increasingly popular field of research. Several methods have been proposed to infer bifurcation structure from such data, but all rely on heuristic non-probabilistic inference. Here we propose the first generative, fully probabilistic model for such inference, based on a Bayesian hierarchical mixture of factor analysers. Our model exhibits competitive performance on large datasets despite implementing full MCMC sampling, and its unique hierarchical prior structure enables automatic determination of genes driving the bifurcation process.


Single Cell Viewer (SCV): An interactive visualization data portal for single cell RNA sequence data

10.1101/664789 ◽

2019 ◽

Cited By ~ 2

Author(s):

Shuoguo Wang ◽

Constance Brett ◽

Mohan Bolisetty ◽

Ryan Golhar ◽

Isaac Neuhaus ◽

...

Keyword(s):

Single Cell ◽

Sequence Data ◽

Single Cells ◽

Link Type ◽

Technological Advances ◽

R Shiny ◽

Data Volume ◽

Exploratory Data ◽

Cell Data ◽

Shiny Application

Abstract

Motivation: Thanks to technological advances made in the last few years, we are now able to study transcriptomes from thousands of single cells. These technologies have been applied widely to study various aspects of biology. Nevertheless, comprehending these large datasets and inferring meaningful biological insights from them is still a challenge. Although tools are being developed to deal with the data complexity and data volume, we do not yet have effective visualization and comparative analysis tools to realize the full value of these datasets.

Results: To address this gap, we implemented a single cell data visualization portal called Single Cell Viewer (SCV). SCV is an R Shiny application that offers users rich visualization and exploratory data analysis options for single cell datasets.

Availability: Source code for the application is available online at GitHub (http://www.github.com/neuhausi/single-cell-viewer) and there is a hosted exploration application using the same example dataset as this publication at http://periscopeapps.org/[email protected]; [email protected]


CIM-seq

10.21203/rs.3.pex-1365/v1 ◽

2021 ◽

Author(s):

Nathanael Andrews ◽

Martin Enge

Keyword(s):

Single Cell ◽

Single Cells ◽

Likelihood Estimation ◽

Cell Types ◽

Data Sets ◽

Target Tissue ◽

Data Set ◽

Rnaseq Data ◽

The Given ◽

Cell Data

Abstract: CIM-seq is a tool for deconvoluting RNA-seq data from cell multiplets (clusters of two or more cells) in order to identify physically interacting cells in a given tissue. The method requires two RNA-seq data sets from the same tissue: one of single cells, to be used as a reference, and one of cell multiplets, to be deconvoluted. CIM-seq is compatible with both droplet-based sequencing methods, such as Chromium Single Cell 3′ Kits from 10x Genomics, and plate-based methods, such as Smart-seq2. The pipeline consists of three parts: 1) dissociation of the target tissue, FACS sorting of single cells and multiplets, and conventional scRNA-seq; 2) feature selection and clustering of cell types in the single-cell data set, generating a blueprint of transcriptional profiles in the given tissue; 3) computational deconvolution of multiplets through maximum likelihood estimation (MLE) to determine the most likely cell type constituents of each multiplet.
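Step 3 of the pipeline, choosing the most likely cell type constituents of a multiplet by maximum likelihood, can be sketched as follows. This is an illustrative toy and not the CIM-seq algorithm: it assumes a Poisson count model, assumes that a multiplet's expected profile is the sum of its constituents' reference profiles rescaled to the observed depth, and searches exhaustively over small combinations. All function names and the example profiles are hypothetical.

```python
from itertools import combinations

import numpy as np

def poisson_log_likelihood(counts, rate):
    # The log(counts!) term is the same for every candidate, so it is dropped.
    return float(np.sum(counts * np.log(rate) - rate))

def deconvolve_multiplet(counts, profiles, max_cells=2):
    """Return the combination of reference cell types that best explains `counts`.

    `profiles` holds one strictly positive mean expression vector per cell type
    (for example, cluster means from the single-cell reference plus a pseudocount).
    """
    counts = np.asarray(counts, dtype=float)
    profiles = np.asarray(profiles, dtype=float)
    best_combo, best_ll = None, -np.inf
    for size in range(1, max_cells + 1):
        for combo in combinations(range(profiles.shape[0]), size):
            mixture = profiles[list(combo)].sum(axis=0)
            # Rescale the candidate profile to the sequencing depth of the multiplet.
            rate = mixture / mixture.sum() * counts.sum()
            ll = poisson_log_likelihood(counts, rate)
            if ll > best_ll:
                best_combo, best_ll = combo, ll
    return best_combo, best_ll

# Hypothetical reference: three cell types, four genes (one marker gene each).
profiles = np.array([[50, 1, 1, 10],
                     [1, 50, 1, 10],
                     [1, 1, 50, 10]], dtype=float)
combo, _ = deconvolve_multiplet(profiles[0] + profiles[2], profiles)
```

For the toy doublet built from types 0 and 2, the search returns `(0, 2)`, because the matching candidate reproduces the observed counts exactly.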


Accurate sub-population detection and mapping across single cell experiments with PopCorn

10.1101/485979 ◽

2018 ◽

Author(s):

Yijie Wang ◽

Jan Hoinka ◽

Teresa M Przytycka

Keyword(s):

Comparative Analysis ◽

Single Cell ◽

Single Cells ◽

Novel Method ◽

Cell Data

The identification of sub-populations of cells present in a sample and the comparison of such sub-populations across samples are among the most frequently performed analyses of single-cell data. Current tools for these kinds of data, however, fall short in their ability to perform these tasks adequately. We introduce a novel method, PopCorn (single cell sub-Populations Comparison), which identifies sub-populations of cells present within individual experiments while simultaneously mapping sub-populations across these experiments. PopCorn utilizes several novel algorithmic solutions that enable these tasks to be executed with unprecedented precision. As such, PopCorn provides a much-needed tool for comparative analysis of populations of single cells.


GPseudoClust: deconvolution of shared pseudo-profiles at single-cell resolution

10.1101/567115 ◽

2019 ◽

Author(s):

Magdalena E Strauss ◽

Paul DW Kirk ◽

John E Reid ◽

Lorenz Wernisch

Keyword(s):

Single Cell ◽

Time Course ◽

Gene Clusters ◽

Supplementary Information ◽

Clustering Methods ◽

Link Type ◽

Novel Approach ◽

Broad Array ◽

Recent Method ◽

Cell Data

Abstract

Motivation: Many methods have been developed to cluster genes on the basis of their changes in mRNA expression over time, using bulk RNA-seq or microarray data. However, single-cell data may present a particular challenge for these algorithms, since the temporal ordering of cells is not directly observed. One way to address this is to first use pseudotime methods to order the cells, and then apply clustering techniques for time course data. However, pseudotime estimates are subject to high levels of uncertainty, and failing to account for this uncertainty is liable to lead to erroneous and/or over-confident gene clusters.

Results: The proposed method, GPseudoClust, is a novel approach that jointly infers pseudotemporal ordering and gene clusters, and quantifies the uncertainty in both. GPseudoClust combines a recent method for pseudotime inference with nonparametric Bayesian clustering methods, efficient MCMC sampling, and novel subsampling strategies which aid computation. We consider a broad array of simulated and experimental datasets to demonstrate the effectiveness of GPseudoClust in a range of settings.

Availability: An implementation is available on GitHub: https://github.com/magStra/nonparametricSummaryPSM and https://github.com/magStra/[email protected]

Supplementary information: Supplementary materials are available.


Single cell network analysis with a mixture of Nested Effects Models

10.1101/258202 ◽

2018 ◽

Author(s):

Martin Pirkl ◽

Niko Beerenwinkel

Keyword(s):

Single Cell ◽

New Technologies ◽

Single Cells ◽

R Package ◽

Supplementary Information ◽

Data Sets ◽

Cell Network ◽

A Cell ◽

Supplementary Material ◽

Cell Data

Abstract

Motivation: New technologies allow for the elaborate measurement of different traits of single cells. These data promise to elucidate intra-cellular networks in unprecedented detail and further help to improve treatment of diseases like cancer. However, cell populations can be very heterogeneous.

Results: We developed a mixture of Nested Effects Models (M&NEM) for single-cell data to simultaneously identify different cellular sub-populations and their corresponding causal networks, in order to explain the heterogeneity in a cell population. For inference, we assign each cell to a network with a certain probability and iteratively update the optimal networks and cell probabilities in an Expectation Maximization scheme. We validate our method in the controlled setting of a simulation study and apply it to three data sets of pooled CRISPR screens generated previously by two novel experimental techniques, namely Crop-Seq and Perturb-Seq.

Availability: The mixture Nested Effects Model (M&NEM) is available as the R-package mnem at https://github.com/cbgethz/mnem/[email protected], [email protected]

Supplementary information: Supplementary data are available online.
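The soft assignment of cells to networks described above follows the usual Expectation Maximization pattern for mixtures. The sketch below shows only that generic pattern and is not the mnem package: it assumes the per-cell log-likelihood under each candidate network is already given as a matrix, and it updates only the mixture weights, whereas the method described in the abstract also re-optimises the networks in each iteration. Function names and the toy numbers are hypothetical.

```python
import numpy as np

def e_step(log_lik, weights):
    """Posterior probability (responsibility) of each component for each cell."""
    scores = log_lik + np.log(weights)
    # Subtract the row maximum before exponentiating for numerical stability.
    scores -= scores.max(axis=1, keepdims=True)
    resp = np.exp(scores)
    return resp / resp.sum(axis=1, keepdims=True)

def fit_mixture_weights(log_lik, n_iter=50):
    """Alternate the E-step with the closed-form M-step for the mixture weights."""
    log_lik = np.asarray(log_lik, dtype=float)
    n_components = log_lik.shape[1]
    weights = np.full(n_components, 1.0 / n_components)
    for _ in range(n_iter):
        resp = e_step(log_lik, weights)
        # M-step for the weights: average responsibility per component.
        weights = resp.mean(axis=0)
    return resp, weights

# Hypothetical log-likelihoods: four cells (rows) under two networks (columns).
log_lik = np.array([[-1.0, -21.0],
                    [-2.0, -22.0],
                    [-21.0, -1.0],
                    [-22.0, -2.0]])
responsibilities, weights = fit_mixture_weights(log_lik)
```

With these well-separated toy values, the first two cells are assigned almost entirely to the first network and the last two to the second, and the weights settle near one half each.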


Cellsnp-lite: an efficient tool for genotyping single cells

10.1101/2020.12.31.424913 ◽

2021 ◽

Author(s):

Xianjie Huang ◽

Yuanhua Huang

Keyword(s):

Single Cell ◽

Single Cells ◽

Basic Research ◽

Substantial Improvement ◽

Data Sets ◽

Sequencing Data ◽

Single Cell Sequencing ◽

Memory Efficiency ◽

Computational Speed ◽

Cell Data

Abstract

Summary: Single-cell sequencing is an increasingly used technology and has promising applications in basic research and clinical translation. However, genotyping methods developed for bulk sequencing data have not been well adapted to single-cell data, in terms of both computational parallelization and a simplified user interface. Here we introduce a software tool, cellsnp-lite, implemented in C/C++ and based on the well-supported package htslib, for genotyping single-cell sequencing data from both droplet-based and well-based platforms. On various experimental data sets, it shows substantial improvement in computational speed and memory efficiency while retaining results highly concordant with existing methods. Cellsnp-lite therefore lightens the genetic analysis of increasingly large single-cell data.

Availability: The source code is freely available at https://github.com/single-cell-genetics/[email protected]


Modeling latent flows on single-cell data using the Hodge decomposition

10.1101/592089 ◽

2019 ◽

Author(s):

Kazumitsu Maehara ◽

Yasuyuki Ohkawa

Keyword(s):

Diffusion Process ◽

Single Cell ◽

Trajectory Analysis ◽

Single Cells ◽

Hodge Decomposition ◽

Biological Data ◽

Graph Representation ◽

Specific Cell ◽

Sparse Graph ◽

Cell Data

Abstract: Single-cell analysis is a powerful technique used to identify a specific cell population of interest during differentiation, aging, or oncogenesis. Individual cells occupy a particular transient state in the cell cycle, circadian rhythm, or during cell death. An appealing concept of pseudo-time trajectory analysis of single-cell RNA sequencing data was proposed in the software Monocle, and several methods of trajectory analysis have since been published. These aim to infer the ordering of cells and enable the tracing of gene expression profile trajectories in cell differentiation and reprogramming. However, the methods are restricted in terms of time structure because of the pre-specified structure of trajectories (linear, branched, tree or cyclic), which contrasts with the mixed state of single cells.

Here, we propose a technique to extract underlying flows in single-cell data based on the Hodge decomposition (HD). HD is a theorem of vector fields on a manifold which guarantees that any given flow can be decomposed into three types of orthogonal component: gradient-flow (acyclic), curl-flow, and harmonic-flow (both cyclic). HD is generalized on a simplicial complex (graph), and the discretized HD makes only the weak assumption that the graph is directed. Therefore, in principle, HD can extract flows from any mixture of tree and cyclic time flows of observed cells. The decomposed flows provide intuitive interpretations of complex flow because of their linearity and orthogonality. Thus, each extracted flow can be examined separately with no need to consider crosstalk.

We developed the ddhodge software, which aims to model the underlying flow structure that implies unobserved time or causal relations in the hodge-podge collection of data points. We demonstrated that the mathematical framework of HD is suitable for reconstructing a sparse graph representation of a diffusion process as a candidate model of differentiation while preserving the divergence of the original fully-connected graph. The preserved divergence can be used as an indicator of the source and sink cells in the observed population. A sparse graph representation of the diffusion process transforms data analysis of the non-linear structure embedded in the high-dimensional space of single-cell data into inspection of the visible flow using graph algorithms. Hence, ddhodge is a suitable toolkit to visualize, inspect, and subsequently interpret large data sets including, but not limited to, high-throughput measurements of biological data.

The beta version of the ddhodge R package is available at: https://github.com/kazumits/ddhodge
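The discretized decomposition can be illustrated on a small directed graph with a least-squares projection. This is a simplified sketch and not the ddhodge package: it separates an edge flow into its gradient (acyclic) part and a single cyclic remainder, without further splitting the remainder into curl and harmonic components. The function names, the sign convention for net inflow, and the toy graph are assumptions.

```python
import numpy as np

def incidence_matrix(edges, n_nodes):
    """Edge-by-node matrix: -1 at the tail and +1 at the head of each edge."""
    B = np.zeros((len(edges), n_nodes))
    for row, (tail, head) in enumerate(edges):
        B[row, tail] = -1.0
        B[row, head] = 1.0
    return B

def split_flow(edges, flow, n_nodes):
    """Split an edge flow into a gradient part and a divergence-free remainder."""
    B = incidence_matrix(edges, n_nodes)
    flow = np.asarray(flow, dtype=float)
    # The potential minimises ||B @ potential - flow||; it is unique up to a constant.
    potential, *_ = np.linalg.lstsq(B, flow, rcond=None)
    gradient = B @ potential   # acyclic component
    cyclic = flow - gradient   # curl and harmonic components together
    # Net inflow per node: negative values mark sources, positive values mark sinks.
    net_inflow = B.T @ gradient
    return potential, gradient, cyclic, net_inflow

# Toy graph (hypothetical): a triangle 0 -> 1 -> 2 -> 0 with a tail 2 -> 3.
edges = [(0, 1), (1, 2), (2, 0), (2, 3)]
# Flow built as a unit circulation on the triangle plus the gradient of
# the potentials [0, 1, 3, 6].
flow = [2.0, 3.0, -2.0, 3.0]
potential, gradient, cyclic, net_inflow = split_flow(edges, flow, n_nodes=4)
```

On this toy graph the projection recovers the gradient part `[1, 2, -3, 3]` and the circulation `[1, 1, 1, 0]`; node 0 has the most negative net inflow (a source) and node 3 the most positive (a sink).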
