Mining single-cell time-series datasets with Time Course Inspector

Abstract Summary Thanks to recent advances in live cell imaging of biosensors, microscopy experiments can generate thousands of single-cell time-series. To identify sub-populations with distinct temporal behaviours that correspond to different cell fates, we developed Time Course Inspector (TCI)—a unique tool written in R/Shiny to combine time-series analysis with clustering. With TCI it is convenient to inspect time-series, plot different data views and remove outliers. TCI facilitates interactive exploration of various hierarchical clustering and cluster validation methods. We showcase TCI by analysing a single-cell signalling time-series dataset acquired using a fluorescent biosensor. Availability and implementation https://github.com/pertzlab/shiny-timecourse-inspector. Supplementary information Supplementary data are available at Bioinformatics online.

Download Full-text

Deep learning of gene relationships from single cell time-course expression data

10.1101/2020.09.21.306332 ◽

2020 ◽

Author(s):

Ye Yuan ◽

Ziv Bar-Joseph

Keyword(s):

Time Series ◽

Deep Learning ◽

Single Cell ◽

Time Course ◽

Expression Profiles ◽

Regulatory Gene ◽

Supplementary Information ◽

Expression Data ◽

Rna Seq ◽

Time Course Data

AbstractMotivationTime-course gene expression data has been widely used to infer regulatory and signaling relationships between genes. Most of the widely used methods for such analysis were developed for bulk expression data. Single cell RNA-Seq (scRNA-Seq) data offers several advantages including the large number of expression profiles available and the ability to focus on individual cells rather than averages. However, this data also raises new computational challenges.ResultsUsing a novel encoding for scRNA-Seq expression data we develop deep learning methods for interaction prediction from time-course data. Our methods use a supervised framework which represents the data as a 3D tensor and train convolutional and recurrent neural networks (CNN and RNN) for predicting interactions. We tested our Time-course Deep Learning (TDL) models on five different time series scRNA-Seq datasets. As we show, TDL can accurately identify causal and regulatory gene-gene interactions and can also be used to assign new function to genes. TDL improves on prior methods for the above tasks and can be generally applied to new time series scRNA-Seq data.Availability and ImplementationFreely available at https://github.com/xiaoyeye/[email protected] informationSupplementary data are available at XXX online.

Download Full-text

A unified approach for sparse dynamical system inference from temporal measurements

Bioinformatics ◽

10.1093/bioinformatics/btz065 ◽

2018 ◽

Vol 35 (18) ◽

pp. 3387-3396 ◽

Cited By ~ 5

Author(s):

Yannis Pantazis ◽

Ioannis Tsamardinos

Keyword(s):

Dynamical System ◽

Dynamical Systems ◽

Single Cell ◽

Time Course ◽

Cell Mass ◽

Supplementary Information ◽

Unified Approach ◽

Unknown Parameters ◽

Inference Problem ◽

Linear Dynamical Systems

Abstract Motivation Temporal variations in biological systems and more generally in natural sciences are typically modeled as a set of ordinary, partial or stochastic differential or difference equations. Algorithms for learning the structure and the parameters of a dynamical system are distinguished based on whether time is discrete or continuous, observations are time-series or time-course and whether the system is deterministic or stochastic, however, there is no approach able to handle the various types of dynamical systems simultaneously. Results In this paper, we present a unified approach to infer both the structure and the parameters of non-linear dynamical systems of any type under the restriction of being linear with respect to the unknown parameters. Our approach, which is named Unified Sparse Dynamics Learning (USDL), constitutes of two steps. First, an atemporal system of equations is derived through the application of the weak formulation. Then, assuming a sparse representation for the dynamical system, we show that the inference problem can be expressed as a sparse signal recovery problem, allowing the application of an extensive body of algorithms and theoretical results. Results on simulated data demonstrate the efficacy and superiority of the USDL algorithm under multiple interventions and/or stochasticity. Additionally, USDL’s accuracy significantly correlates with theoretical metrics such as the exact recovery coefficient. On real single-cell data, the proposed approach is able to induce high-confidence subgraphs of the signaling pathway. Availability and implementation Source code is available at Bioinformatics online. USDL algorithm has been also integrated in SCENERY (http://scenery.csd.uoc.gr/); an online tool for single-cell mass cytometry analytics. Supplementary information Supplementary data are available at Bioinformatics online.

Download Full-text

GPseudoClust: deconvolution of shared pseudo-profiles at single-cell resolution

10.1101/567115 ◽

2019 ◽

Author(s):

Magdalena E Strauss ◽

Paul DW Kirk ◽

John E Reid ◽

Lorenz Wernisch

Keyword(s):

Single Cell ◽

Time Course ◽

Gene Clusters ◽

Supplementary Information ◽

Clustering Methods ◽

Link Type ◽

Novel Approach ◽

Broad Array ◽

Recent Method ◽

Cell Data

AbstractMotivationMany methods have been developed to cluster genes on the basis of their changes in mRNA expression over time, using bulk RNA-seq or microarray data. However, single-cell data may present a particular challenge for these algorithms, since the temporal ordering of cells is not directly observed. One way to address this is to first use pseudotime methods to order the cells, and then apply clustering techniques for time course data. However, pseudotime estimates are subject to high levels of uncertainty, and failing to account for this uncertainty is liable to lead to erroneous and/or over-confident gene clusters.ResultsThe proposed method, GPseudoClust, is a novel approach that jointly infers pseudotem-poral ordering and gene clusters, and quantifies the uncertainty in both. GPseudoClust combines a recent method for pseudotime inference with nonparametric Bayesian clustering methods, efficient MCMC sampling, and novel subsampling strategies which aid computation. We consider a broad array of simulated and experimental datasets to demonstrate the effectiveness of GPseudoClust in a range of settings.AvailabilityAn implementation is available on GitHub: https://github.com/magStra/nonparametricSummaryPSM and https://github.com/magStra/[email protected] informationSupplementary materials are available.

Download Full-text

V-SVA: an R Shiny application for detecting and annotating hidden sources of variation in single-cell RNA-seq data

Bioinformatics ◽

10.1093/bioinformatics/btaa128 ◽

2020 ◽

Vol 36 (11) ◽

pp. 3582-3584

Author(s):

Nathan Lawlor ◽

Eladio J Marquez ◽

Donghyung Lee ◽

Duygu Ucar

Keyword(s):

Single Cell ◽

Gene Annotation ◽

Supplementary Information ◽

Surrogate Variable Analysis ◽

Batch Correction ◽

Surrogate Variable ◽

R Shiny ◽

Sources Of Variation ◽

Shiny Application ◽

Variable Analysis

Abstract Summary Single-cell RNA-sequencing (scRNA-seq) technology enables studying gene expression programs from individual cells. However, these data are subject to diverse sources of variation, including ‘unwanted’ variation that needs to be removed in downstream analyses (e.g. batch effects) and ‘wanted’ or biological sources of variation (e.g. variation associated with a cell type) that needs to be precisely described. Surrogate variable analysis (SVA)-based algorithms, are commonly used for batch correction and more recently for studying ‘wanted’ variation in scRNA-seq data. However, interpreting whether these variables are biologically meaningful or stemming from technical reasons remains a challenge. To facilitate the interpretation of surrogate variables detected by algorithms including IA-SVA, SVA or ZINB-WaVE, we developed an R Shiny application [Visual Surrogate Variable Analysis (V-SVA)] that provides a web-browser interface for the identification and annotation of hidden sources of variation in scRNA-seq data. This interactive framework includes tools for discovery of genes associated with detected sources of variation, gene annotation using publicly available databases and gene sets, and data visualization using dimension reduction methods. Availability and implementation The V-SVA Shiny application is publicly hosted at https://vsva.jax.org/ and the source code is freely available at https://github.com/nlawlor/V-SVA. Contact [email protected] or [email protected] Supplementary information Supplementary data are available at Bioinformatics online.

Download Full-text

Continuous State HMMs for Modeling Time Series Single Cell RNA-Seq Data

10.1101/380568 ◽

2018 ◽

Author(s):

Chieh Lin ◽

Ziv Bar-Joseph

Keyword(s):

Time Series ◽

Single Cell ◽

Developmental Process ◽

Developmental Trajectories ◽

Cell Types ◽

Supplementary Information ◽

Rna Seq ◽

Inference Algorithms ◽

Continuous State ◽

Efficient Learning

AbstractMotivationMethods for reconstructing developmental trajectories from time series single cell RNA-Seq (scRNA-Seq) data can be largely divided into two categories. The first, often referred to as pseudotime ordering methods, are deterministic and rely on dimensionality reduction followed by an ordering step. The second learns a probabilistic branching model to represent the developmental process. While both types have been successful, each suffers from shortcomings that can impact their accuracy.ResultsWe developed a new method based on continuous state HMMs (CSHMMs) for representing and modeling time series scRNA-Seq data. We define the CSHMM model and provide efficient learning and inference algorithms which allow the method to determine both the structure of the branching process and the assignment of cells to these branches. Analyzing several developmental single cell datasets we show that the CSHMM method accurately infers branching topology and correctly and continuously assign cells to paths, improving upon prior methods proposed for this task. Analysis of genes based on the continuous cell assignment identifies known and novel markers for different cell types.AvailabilitySoftware and Supporting website: www.andrew.cmu.edu/user/chiehll/CSHMM/[email protected] informationSupplementary data are available at Bioinformatics online.

Download Full-text

GPseudoClust: deconvolution of shared pseudo-profiles at single-cell resolution

Bioinformatics ◽

10.1093/bioinformatics/btz778 ◽

2019 ◽

Author(s):

Magdalena E Strauss ◽

Paul D W Kirk ◽

John E Reid ◽

Lorenz Wernisch

Keyword(s):

Single Cell ◽

Time Course ◽

Gene Clusters ◽

Supplementary Information ◽

Rna Seq ◽

Clustering Methods ◽

Novel Approach ◽

Broad Array ◽

Recent Method ◽

Cell Data

Abstract Motivation Many methods have been developed to cluster genes on the basis of their changes in mRNA expression over time, using bulk RNA-seq or microarray data. However, single-cell data may present a particular challenge for these algorithms, since the temporal ordering of cells is not directly observed. One way to address this is to first use pseudotime methods to order the cells, and then apply clustering techniques for time course data. However, pseudotime estimates are subject to high levels of uncertainty, and failing to account for this uncertainty is liable to lead to erroneous and/or over-confident gene clusters. Results The proposed method, GPseudoClust, is a novel approach that jointly infers pseudotemporal ordering and gene clusters, and quantifies the uncertainty in both. GPseudoClust combines a recent method for pseudotime inference with nonparametric Bayesian clustering methods, efficient MCMC sampling, and novel subsampling strategies which aid computation.We consider a broad array of simulated and experimental datasets to demonstrate the effectiveness of GPseudoClust in a range of settings. Availability An implementation is available on GitHub: https://github.com/magStra/nonparametricSummaryPSM and https://github.com/magStra/GPseudoClust. Supplementary Information Supplementary data are available at Bioinformatics online.

Download Full-text

Continuous-state HMMs for modeling time-series single-cell RNA-Seq data

Bioinformatics ◽

10.1093/bioinformatics/btz296 ◽

2019 ◽

Vol 35 (22) ◽

pp. 4707-4715 ◽

Cited By ~ 10

Author(s):

Chieh Lin ◽

Ziv Bar-Joseph

Keyword(s):

Time Series ◽

Single Cell ◽

Developmental Process ◽

Developmental Trajectories ◽

Cell Types ◽

Supplementary Information ◽

Rna Seq ◽

Inference Algorithms ◽

Continuous State ◽

Efficient Learning

Abstract Motivation Methods for reconstructing developmental trajectories from time-series single-cell RNA-Seq (scRNA-Seq) data can be largely divided into two categories. The first, often referred to as pseudotime ordering methods are deterministic and rely on dimensionality reduction followed by an ordering step. The second learns a probabilistic branching model to represent the developmental process. While both types have been successful, each suffers from shortcomings that can impact their accuracy. Results We developed a new method based on continuous-state HMMs (CSHMMs) for representing and modeling time-series scRNA-Seq data. We define the CSHMM model and provide efficient learning and inference algorithms which allow the method to determine both the structure of the branching process and the assignment of cells to these branches. Analyzing several developmental single-cell datasets, we show that the CSHMM method accurately infers branching topology and correctly and continuously assign cells to paths, improving upon prior methods proposed for this task. Analysis of genes based on the continuous cell assignment identifies known and novel markers for different cell types. Availability and implementation Software and Supporting website: www.andrew.cmu.edu/user/chiehl1/CSHMM/ Supplementary information Supplementary data are available at Bioinformatics online.

Download Full-text

0027 - Analyses of bacterioplankton single cell amplified genomes (SAG) from the model Hawaii Ocean Time-series (HOT) site, Station ALOHA

10.26226/morressier.5b5199bdb1b87b000ecee20b ◽

2018 ◽

Author(s):

Dominique Boeuf ◽

Paul Den Uyl

Keyword(s):

Time Series ◽

Single Cell ◽

Station Aloha

Download Full-text

ExperimentSubset: an R package to manage subsets of Bioconductor Experiment objects

Bioinformatics ◽

10.1093/bioinformatics/btab179 ◽

2021 ◽

Author(s):

Irzam Sarfraz ◽

Muhammad Asif ◽

Joshua D Campbell

Keyword(s):

Single Cell ◽

R Package ◽

Poor Quality ◽

Data Matrix ◽

Supplementary Information ◽

Data Provenance ◽

Rna Seq ◽

Efficient Management ◽

The Matrix ◽

The Relationship

Abstract Motivation R Experiment objects such as the SummarizedExperiment or SingleCellExperiment are data containers for storing one or more matrix-like assays along with associated row and column data. These objects have been used to facilitate the storage and analysis of high-throughput genomic data generated from technologies such as single-cell RNA sequencing. One common computational task in many genomics analysis workflows is to perform subsetting of the data matrix before applying down-stream analytical methods. For example, one may need to subset the columns of the assay matrix to exclude poor-quality samples or subset the rows of the matrix to select the most variable features. Traditionally, a second object is created that contains the desired subset of assay from the original object. However, this approach is inefficient as it requires the creation of an additional object containing a copy of the original assay and leads to challenges with data provenance. Results To overcome these challenges, we developed an R package called ExperimentSubset, which is a data container that implements classes for efficient storage and streamlined retrieval of assays that have been subsetted by rows and/or columns. These classes are able to inherently provide data provenance by maintaining the relationship between the subsetted and parent assays. We demonstrate the utility of this package on a single-cell RNA-seq dataset by storing and retrieving subsets at different stages of the analysis while maintaining a lower memory footprint. Overall, the ExperimentSubset is a flexible container for the efficient management of subsets. Availability and implementation ExperimentSubset package is available at Bioconductor: https://bioconductor.org/packages/ExperimentSubset/ and Github: https://github.com/campbio/ExperimentSubset. Supplementary information Supplementary data are available at Bioinformatics online.

Download Full-text

Using single-cell cytometry to illustrate integrated multi-perspective evaluation of clustering algorithms using Pareto fronts

Bioinformatics ◽

10.1093/bioinformatics/btab038 ◽

2021 ◽

Author(s):

Givanna H Putri ◽

Irena Koprinska ◽

Thomas M Ashhurst ◽

Nicholas J C King ◽

Mark N Read

Keyword(s):

Single Cell ◽

Performance Metrics ◽

Clustering Algorithms ◽

Latin Hypercube Sampling ◽

Supplementary Information ◽

Sequencing Data ◽

Evaluation Protocol ◽

Benchmark Datasets ◽

Pareto Fronts ◽

Parameter Values

Abstract Motivation Many ‘automated gating’ algorithms now exist to cluster cytometry and single-cell sequencing data into discrete populations. Comparative algorithm evaluations on benchmark datasets rely either on a single performance metric, or a few metrics considered independently of one another. However, single metrics emphasize different aspects of clustering performance and do not rank clustering solutions in the same order. This underlies the lack of consensus between comparative studies regarding optimal clustering algorithms and undermines the translatability of results onto other non-benchmark datasets. Results We propose the Pareto fronts framework as an integrative evaluation protocol, wherein individual metrics are instead leveraged as complementary perspectives. Judged superior are algorithms that provide the best trade-off between the multiple metrics considered simultaneously. This yields a more comprehensive and complete view of clustering performance. Moreover, by broadly and systematically sampling algorithm parameter values using the Latin Hypercube sampling method, our evaluation protocol minimizes (un)fortunate parameter value selections as confounding factors. Furthermore, it reveals how meticulously each algorithm must be tuned in order to obtain good results, vital knowledge for users with novel data. We exemplify the protocol by conducting a comparative study between three clustering algorithms (ChronoClust, FlowSOM and Phenograph) using four common performance metrics applied across four cytometry benchmark datasets. To our knowledge, this is the first time Pareto fronts have been used to evaluate the performance of clustering algorithms in any application domain. Availability and implementation Implementation of our Pareto front methodology and all scripts and datasets to reproduce this article are available at https://github.com/ghar1821/ParetoBench. Supplementary information Supplementary data are available at Bioinformatics online.

Download Full-text