Gene regulatory network inference from single-cell data using multivariate information measures

While single-cell gene expression experiments present new challenges for data processing, the cell-to-cell variability observed also reveals statistical relationships that can be used by information theory. Here, we use multivariate information theory to explore the statistical dependencies between triplets of genes in single-cell gene expression datasets. We develop PIDC, a fast, efficient algorithm that uses partial information decomposition (PID) to identify regulatory relationships between genes. We thoroughly evaluate the performance of our algorithm and demonstrate that the higher order information captured by PIDC allows it to outperform pairwise mutual information-based algorithms when recovering true relationships present in simulated data. We also infer gene regulatory networks from three experimental single-cell data sets and illustrate how network context, choices made during analysis, and sources of variability affect network inference. PIDC tutorials and open-source software for estimating PID are available here: https://github.com/Tchanders/network_inference_tutorials. PIDC should facilitate the identification of putative functional relationships and mechanistic hypotheses from single-cell transcriptomic data.
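
For readers unfamiliar with the pairwise baseline mentioned in this abstract, the following is a minimal sketch of a plug-in mutual information estimate between two discretized genes. It is not the PIDC algorithm and does not compute a partial information decomposition; the function names `discretize` and `mutual_information`, the equal-width binning, and the toy expression values are all assumptions made for illustration.

```python
import math
from collections import Counter


def discretize(values, n_bins=3):
    """Equal-width binning of expression values (an assumed, simple scheme)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # Clamp so the maximum value falls in the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def mutual_information(x, y):
    """Plug-in estimate of I(X;Y) in bits for two discrete sequences."""
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")
    n = len(x)
    count_x = Counter(x)
    count_y = Counter(y)
    count_xy = Counter(zip(x, y))
    mi = 0.0
    for (a, b), c in count_xy.items():
        # p(a,b) * log2( p(a,b) / (p(a) * p(b)) ), written with raw counts.
        mi += (c / n) * math.log2(c * n / (count_x[a] * count_y[b]))
    return mi


# Toy data: one value per cell for three hypothetical genes.
gene_a = discretize([0.0, 1.0, 2.0, 3.0], n_bins=2)  # [0, 0, 1, 1]
gene_b = [0, 0, 1, 1]  # perfectly dependent on gene_a
gene_c = [0, 1, 0, 1]  # independent of gene_a in this sample

print(mutual_information(gene_a, gene_b))  # 1.0
print(mutual_information(gene_a, gene_c))  # 0.0
```

A pairwise score like this cannot separate direct from indirect dependencies, which is the limitation that, according to the abstract, the higher order information used by PIDC is meant to address.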

WASABI: a dynamic iterative framework for gene regulatory network inference

10.1101/292128 ◽ 2018 ◽ Cited By ~ 1

Author(s): Arnaud Bonnaffoux ◽ Ulysse Herbach ◽ Angélique Richard ◽ Anissa Guillemin ◽ Sandrine Giraud ◽ ...

Keyword(s): Gene Expression ◽ Single Cell ◽ Gene Regulatory Network ◽ Regulatory Network ◽ Network Inference ◽ Molecular Mechanisms ◽ Gene Regulatory Network Inference ◽ Piecewise Deterministic Markov Processes ◽ Gene Regulatory ◽ Cell Data

Inference of gene regulatory networks from gene expression data has been a long-standing and notoriously difficult task in systems biology. Recently, single-cell transcriptomic data have been widely used for gene regulatory network inference, with both successes and limitations. In the present work we propose an iterative algorithm called WASABI, dedicated to inferring a causal dynamical network from time-stamped single-cell data, which tackles some of the limitations associated with current approaches. We first introduce the concept of waves, which posits that the information provided by an external stimulus will affect genes one-by-one through a cascade, like waves spreading through a network. This concept allows us to infer the network one gene at a time, after genes have been ordered by their time of regulation. We then demonstrate the ability of WASABI to correctly infer small networks, which have been simulated in silico using a mechanistic model consisting of coupled piecewise-deterministic Markov processes for the proper description of gene expression at the single-cell level. We finally apply WASABI to in vitro generated data from an avian model of erythroid differentiation. The structure of the resulting gene regulatory network sheds a fascinating new light on the molecular mechanisms controlling this process. In particular, we find no evidence for hub genes and a much more distributed network structure than expected. Interestingly, we find that a majority of genes are under the direct control of the differentiation-inducing stimulus. In conclusion, WASABI is a versatile algorithm which should help biologists to fully exploit the power of time-stamped single-cell data.

Robust Lineage Reconstruction from High-Dimensional Single-Cell Data

10.1101/036533 ◽ 2016

Author(s): Gregory Giecold ◽ Eugenio Marco ◽ Lorenzo Trippa ◽ Guo-Cheng Yuan

Keyword(s): Gene Expression ◽ Single Cell ◽ Gene Expression Data ◽ Quantitative Estimate ◽ Cell Lineage ◽ Computational Method ◽ Expression Data ◽ Cell Gene Expression ◽ Cell Data ◽ Cell Gene

Single-cell gene expression data provide invaluable resources for systematic characterization of cellular hierarchy in multi-cellular organisms. However, cell lineage reconstruction is still often associated with significant uncertainty due to technological constraints. Such uncertainties have not been taken into account in current methods. We present ECLAIR, a novel computational method for the statistical inference of cell lineage relationships from single-cell gene expression data. ECLAIR uses an ensemble approach to improve the robustness of lineage predictions, and provides a quantitative estimate of the uncertainty of lineage branchings. We show that the application of ECLAIR to published datasets successfully reconstructs known lineage relationships and significantly improves the robustness of predictions. In conclusion, ECLAIR is a powerful bioinformatics tool for single-cell data analysis. It can be used for robust lineage reconstruction with a quantitative estimate of prediction accuracy.

Intrinsically Bayesian robust classifier for single-cell gene expression trajectories in gene regulatory networks

BMC Systems Biology ◽ 10.1186/s12918-018-0549-y ◽ 2018 ◽ Vol 12 (S3) ◽ Cited By ~ 6

Author(s): Alireza Karbalayghareh ◽ Ulisses Braga-Neto ◽ Edward R. Dougherty

Keyword(s): Gene Expression ◽ Single Cell ◽ Gene Regulatory Networks ◽ Regulatory Networks ◽ Cell Gene Expression ◽ Gene Regulatory ◽ Cell Gene

Scanpy for analysis of large-scale single-cell gene expression data

10.1101/174029 ◽ 2017 ◽ Cited By ~ 9

Author(s): F. Alexander Wolf ◽ Philipp Angerer ◽ Fabian J. Theis

Keyword(s): Gene Expression ◽ Single Cell ◽ Gene Expression Data ◽ Gene Regulatory Networks ◽ Regulatory Networks ◽ Large Scale ◽ Expression Data ◽ Cell Gene Expression ◽ Gene Regulatory ◽ Cell Gene

We present Scanpy, a scalable toolkit for analyzing single-cell gene expression data. It includes preprocessing, visualization, clustering, pseudotime and trajectory inference, differential expression testing and simulation of gene regulatory networks. The Python-based implementation efficiently deals with datasets of more than one million cells and enables easy interfacing of advanced machine learning packages. Code is available from https://github.com/theislab/scanpy.

Intrinsically Bayesian Robust Classifier for Single-Cell Gene Expression Time Series in Gene Regulatory Networks

Proceedings of the 8th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics - ACM-BCB '17 ◽ 10.1145/3107411.3110408 ◽ 2017

Author(s): Alireza Karbalayghareh ◽ Ulisses Braga-Neto ◽ Edward R. Dougherty

Keyword(s): Gene Expression ◽ Time Series ◽ Single Cell ◽ Gene Regulatory Networks ◽ Regulatory Networks ◽ Gene Expression Time Series ◽ Cell Gene Expression ◽ Gene Regulatory ◽ Cell Gene ◽ Expression Time

Evaluating methods of inferring gene regulatory networks highlights their lack of performance for single cell gene expression data

BMC Bioinformatics ◽ 10.1186/s12859-018-2217-z ◽ 2018 ◽ Vol 19 (1) ◽ Cited By ~ 59

Author(s): Shuonan Chen ◽ Jessica C. Mar

Keyword(s): Gene Expression ◽ Single Cell ◽ Gene Expression Data ◽ Gene Regulatory Networks ◽ Regulatory Networks ◽ Expression Data ◽ Cell Gene Expression ◽ Gene Regulatory ◽ Evaluating Methods ◽ Cell Gene

SOMSC: Self-Organization-Map for High-Dimensional Single-Cell Data of Cellular States and Their Transitions

10.1101/124693 ◽ 2017 ◽ Cited By ~ 1

Author(s): Tao Peng ◽ Qing Nie

Keyword(s): Gene Expression ◽ Single Cell ◽ Gene Expression Data ◽ Single Cells ◽ High Dimensional ◽ Expression Data ◽ Rna Seq ◽ Cell Gene Expression ◽ Cell Data ◽ Cell Gene

Measurement of gene expression levels for multiple genes in single cells provides a powerful approach to study heterogeneity of cell populations and cellular plasticity. While the expression levels of multiple genes in each cell are available in such data, the potential connections among the cells (e.g. the cellular state transition relationship) are not directly evident from the measurement. Classifying the cellular states, identifying the transitions among those states, and extracting the pseudotime ordering of cells are challenging due to the noise in the data and the high dimensionality arising from the number of genes. In this paper we adapt the classical self-organizing-map (SOM) approach for single-cell gene expression data (SOMSC), such as those based on single-cell qPCR and single-cell RNA-seq. In SOMSC, a cellular state map (CSM) is derived and employed to identify cellular states inherited in the population of the measured single cells. Cells located in the same basin of the CSM are considered to be in one cellular state, while barriers among the basins in the CSM provide information on transitions among the cellular states. A cellular state transition path (e.g. differentiation) and a temporal ordering of the measured single cells are consequently obtained. In addition, SOMSC can estimate the cellular state replication probability and transition probabilities. Applied to a set of synthetic data, one single-cell qPCR data set on mouse early embryonic development and two single-cell RNA-seq data sets, SOMSC shows effectiveness in capturing cellular states and their transitions present in the high-dimensional single-cell data. This approach will have broader applications to analyzing cellular fate specification and cell lineages using single-cell gene expression data.

Benchmarking algorithms for gene regulatory network inference from single-cell transcriptomic data

10.1101/642926 ◽ 2019 ◽ Cited By ~ 5

Author(s): Aditya Pratapa ◽ Amogh P. Jalihal ◽ Jeffrey N. Law ◽ Aditya Bharadwaj ◽ T. M. Murali

Keyword(s): Gene Expression ◽ Single Cell ◽ Gene Expression Data ◽ Expression Data ◽ Boolean Models ◽ Transcriptomic Data ◽ Inference Algorithms ◽ Cell Gene Expression ◽ Gene Regulatory ◽ Cell Gene

We present a comprehensive evaluation of state-of-the-art algorithms for inferring gene regulatory networks (GRNs) from single-cell gene expression data. We develop a systematic framework called BEELINE for this purpose. We use synthetic networks with predictable cellular trajectories as well as curated Boolean models to serve as the ground truth for evaluating the accuracy of GRN inference algorithms. We develop a strategy to simulate single-cell gene expression data from these two types of networks that avoids the pitfalls of previously-used methods. We selected 12 representative GRN inference algorithms. We found that the accuracy of these methods (measured in terms of AUROC and AUPRC) was moderate, by and large, although the methods were better in recovering interactions in the synthetic networks than the Boolean models. Techniques that did not require pseudotime-ordered cells were more accurate, in general. The observation that the endpoints of many false positive edges were connected by paths of length two in the Boolean models suggested that indirect effects may be predominant in the outputs of the algorithms we tested. The predicted networks were considerably inconsistent with each other, indicating that combining GRN inference algorithms using ensembles is likely to be challenging. Based on the results, we present some recommendations to users of GRN inference algorithms, including suggestions on how to create simulated gene expression datasets for testing them. BEELINE, which is available at http://github.com/murali-group/BEELINE under an open-source license, will aid in the future development of GRN inference algorithms for single-cell transcriptomic data.
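
To make the two accuracy measures named in this abstract concrete, here is a minimal sketch of how a ranked list of predicted edges could be scored against a ground-truth network. It is not BEELINE's code: the function names `auroc` and `average_precision`, the toy scores, and the choice of average precision as the estimate of AUPRC are assumptions made for illustration.

```python
def auroc(scores, labels):
    """Probability that a random true edge outscores a random false edge.

    Ties between a true and a false edge count as half a win.
    """
    pos = [s for s, is_true in zip(scores, labels) if is_true]
    neg = [s for s, is_true in zip(scores, labels) if not is_true]
    if not pos or not neg:
        raise ValueError("need at least one true and one false edge")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def average_precision(scores, labels):
    """Average precision, one common step-wise estimate of AUPRC.

    Tied scores are not treated specially; they keep their input order.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    n_pos = sum(1 for _, is_true in ranked if is_true)
    if n_pos == 0:
        raise ValueError("need at least one true edge")
    hits = 0
    total = 0.0
    for rank, (_, is_true) in enumerate(ranked, start=1):
        if is_true:
            hits += 1
            total += hits / rank  # precision at this recall step
    return total / n_pos


# Toy example: confidence scores for four candidate edges and whether
# each edge is present in the (hypothetical) ground-truth network.
edge_scores = [0.9, 0.8, 0.7, 0.1]
edge_is_true = [1, 0, 1, 0]

print(auroc(edge_scores, edge_is_true))              # 0.75
print(average_precision(edge_scores, edge_is_true))  # 0.8333...
```

Because true edges are usually a small fraction of all candidate gene pairs, the precision-recall summary tends to be more sensitive to false positives near the top of the ranking than AUROC is.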

BGP: Branched Gaussian processes for identifying gene-specific branching dynamics in single cell data

10.1101/166868 ◽ 2017 ◽ Cited By ~ 3

Author(s): Alexis Boukouvalas ◽ James Hensman ◽ Magnus Rattray

Keyword(s): Gene Expression ◽ Single Cell ◽ Prior Information ◽ Synthetic Data ◽ Parametric Model ◽ Credible Region ◽ Cell Gene Expression ◽ Probabilistic Nature ◽ Cell Data ◽ Cell Gene

High-throughput single-cell gene expression experiments can be used to uncover branching dynamics in cell populations undergoing differentiation through use of pseudotime methods. We develop the branching Gaussian process (BGP), a non-parametric model that is able to identify branching dynamics for individual genes and provides an estimate of branching times for each gene with an associated credible region. We demonstrate the effectiveness of our method on both synthetic data and a published single-cell gene expression hematopoiesis study. The method requires prior information about pseudotime and global cellular branching for each cell but the probabilistic nature of the method means that it is robust to errors in these global branch labels and can be used to discover early branching genes which diverge before the inferred global cell branching. The code is open-source and available at https://github.com/ManchesterBioinference/BranchedGP.
