SPRING: a kinetic interface for visualizing high dimensional single-cell expression data

10.1101/090332 ◽

2016 ◽

Cited By ~ 10

Author(s):

Caleb Weinreb ◽

Samuel Wolock ◽

Allon Klein

Keyword(s):

Gene Expression ◽

Single Cell ◽

Nearest Neighbor ◽

High Dimensional ◽

K Nearest Neighbor ◽

Link Type ◽

Cell Gene Expression ◽

Graph Layouts ◽

Cell Expression ◽

Cell Data

Motivation: Single-cell gene expression profiling technologies can map the cell states in a tissue or organism. As these technologies become more common, there is a need for computational tools to explore the data they produce. In particular, existing data visualization approaches are imperfect for studying continuous gene expression topologies. Results: Force-directed layouts of k-nearest-neighbor graphs can visualize continuous gene expression topologies in a manner that preserves high-dimensional relationships and allows manual exploration of different stable two-dimensional representations of the same data. We implemented an interactive web tool, called SPRING, to visualize single-cell data using force-directed graph layouts. SPRING reveals more detailed biological relationships than existing approaches when applied to branching gene expression trajectories from hematopoietic progenitor cells. Visualizations from SPRING are also more reproducible than those of stochastic visualization methods such as tSNE, a state-of-the-art tool. Availability: https://kleintools.hms.harvard.edu/tools/spring.html, https://github.com/AllonKleinLab/SPRING/
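
The idea of laying out a k-nearest-neighbor graph with forces can be sketched in a few lines. The following is a minimal illustration only, not SPRING's implementation: the function names `knn_graph` and `force_layout`, the toy `cells` matrix, and the Fruchterman-Reingold-style force rule are all assumptions made for this example.

```python
import math
import random


def knn_graph(points, k):
    """Return undirected edges (i, j) with i < j linking each point to its k nearest neighbors."""
    edges = set()
    for i, p in enumerate(points):
        # Rank every other point by Euclidean distance to p.
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        for j in others[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges


def force_layout(n_nodes, edges, iterations=200, seed=0):
    """Toy 2-D layout: all node pairs repel, linked pairs also attract."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same layout
    pos = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(n_nodes)]
    ideal = 1.0 / math.sqrt(n_nodes)  # preferred edge length (assumed heuristic)
    for step in range(iterations):
        disp = [[0.0, 0.0] for _ in range(n_nodes)]
        for i in range(n_nodes):
            for j in range(i + 1, n_nodes):
                dx = pos[i][0] - pos[j][0]
                dy = pos[i][1] - pos[j][1]
                d = math.hypot(dx, dy) or 1e-9
                repulsion = ideal * ideal / d
                attraction = d * d / ideal if (i, j) in edges else 0.0
                f = (repulsion - attraction) / d
                disp[i][0] += dx * f
                disp[i][1] += dy * f
                disp[j][0] -= dx * f
                disp[j][1] -= dy * f
        temp = 0.1 * (1 - step / iterations)  # cap on movement, cooled over time
        for i in range(n_nodes):
            length = math.hypot(disp[i][0], disp[i][1]) or 1e-9
            scale = min(length, temp) / length
            pos[i][0] += disp[i][0] * scale
            pos[i][1] += disp[i][1] * scale
    return pos


# Hypothetical expression profiles: two groups of three cells, three genes each.
cells = [
    [0.0, 0.0, 0.0], [0.1, 0.0, 0.0], [0.0, 0.1, 0.0],
    [5.0, 5.0, 5.0], [5.1, 5.0, 5.0], [5.0, 5.1, 5.0],
]
edges = knn_graph(cells, k=2)
layout = force_layout(len(cells), edges)
```

Because the graph is built in the original high-dimensional space and only the layout is two-dimensional, the edges carry the high-dimensional neighbor relationships into the picture. In an interactive tool the user could drag nodes and let the forces settle again.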

SOMSC: Self-Organization-Map for High-Dimensional Single-Cell Data of Cellular States and Their Transitions

10.1101/124693 ◽

2017 ◽

Cited By ~ 1

Author(s):

Tao Peng ◽

Qing Nie

Keyword(s):

Gene Expression ◽

Single Cell ◽

Gene Expression Data ◽

Single Cells ◽

High Dimensional ◽

Expression Data ◽

Rna Seq ◽

Cell Gene Expression ◽

Cell Data ◽

Cell Gene

Measurement of gene expression levels for multiple genes in single cells provides a powerful approach to study heterogeneity of cell populations and cellular plasticity. While the expression levels of multiple genes in each cell are available in such data, the potential connections among the cells (e.g. the cellular state transition relationship) are not directly evident from the measurement. Classifying the cellular states, identifying transitions among those states, and extracting the pseudotime ordering of cells are challenging due to the noise in the data and the high dimensionality arising from the number of genes measured. In this paper we adapt the classical self-organizing-map (SOM) approach for single-cell gene expression data (SOMSC), such as those based on single-cell qPCR and single-cell RNA-seq. In SOMSC, a cellular state map (CSM) is derived and employed to identify cellular states inherited in the population of the measured single cells. Cells located in the same basin of the CSM are considered to be in one cellular state, while barriers among the basins in the CSM provide information on transitions among the cellular states. A cellular state transition path (e.g. differentiation) and a temporal ordering of the measured single cells are consequently obtained. In addition, SOMSC can estimate the cellular state replication probability and transition probabilities. Applied to a set of synthetic data, one single-cell qPCR data set on mouse early embryonic development and two single-cell RNA-seq data sets, SOMSC is effective in capturing cellular states and their transitions present in the high-dimensional single-cell data. This approach will have broader applications in analyzing cellular fate specification and cell lineages using single-cell gene expression data.
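
For readers unfamiliar with self-organizing maps, the sketch below trains a generic one-dimensional SOM. It is not the SOMSC method and does not build a cellular state map. The function names, the learning-rate and neighborhood schedules, and the toy single-gene `expression` values are assumptions chosen to keep the example short. Real data would use expression vectors and a Euclidean distance.

```python
import math


def best_matching_unit(weights, x):
    """Index of the map unit whose weight is closest to the sample x."""
    return min(range(len(weights)), key=lambda u: abs(weights[u] - x))


def train_som(data, n_units=4, epochs=50, lr0=0.5, sigma0=1.5):
    """Train a 1-D self-organizing map on 1-D samples and return the unit weights."""
    lo, hi = min(data), max(data)
    # Deterministic start: units evenly spaced over the observed range.
    weights = [lo + (hi - lo) * u / (n_units - 1) for u in range(n_units)]
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1 - frac)              # learning rate decays linearly
        sigma = sigma0 * (1 - frac) + 0.1  # neighborhood width shrinks over time
        for x in data:
            bmu = best_matching_unit(weights, x)
            for u in range(n_units):
                # Units near the winning unit on the map move more strongly toward x.
                h = math.exp(-((u - bmu) ** 2) / (2 * sigma ** 2))
                weights[u] += lr * h * (x - weights[u])
    return weights


# Hypothetical expression of one marker gene in six cells from two states.
expression = [0.0, 0.4, 0.2, 10.0, 9.6, 9.8]
som_weights = train_som(expression)
low_unit = best_matching_unit(som_weights, 0.0)
high_unit = best_matching_unit(som_weights, 10.0)
```

Cells that map to the same or neighboring units can be read as candidates for one state. SOMSC goes further by deriving basins and barriers from the trained map, which this sketch does not attempt.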

Differential abundance testing on single-cell data using k-nearest neighbor graphs

Nature Biotechnology ◽

10.1038/s41587-021-01033-z ◽

2021 ◽

Author(s):

Emma Dann ◽

Neil C. Henderson ◽

Sarah A. Teichmann ◽

Michael D. Morgan ◽

John C. Marioni

Keyword(s):

Single Cell ◽

Nearest Neighbor ◽

K Nearest Neighbor ◽

Differential Abundance ◽

Cell Data

Continuous visualization of differences between biological conditions in single-cell data

10.1101/337485 ◽

2018 ◽

Cited By ~ 1

Author(s):

Tyler J. Burns ◽

Garry P. Nolan ◽

Nikolay Samusik

Keyword(s):

Single Cell ◽

Nearest Neighbor ◽

Developmental Trajectory ◽

Functional Markers ◽

Mass Cytometry ◽

K Nearest Neighbor ◽

Cell Frequency ◽

Low Dimensional ◽

Marker Shift ◽

Cell Data

In high-dimensional single-cell data, comparing changes in functional markers between conditions is typically done across manual or algorithm-derived partitions based on population-defining markers. Visualization of these partitions is commonly done on low-dimensional embeddings (e.g. t-SNE), colored by per-partition changes. Here, we provide an analysis and visualization tool that performs these comparisons across overlapping k-nearest neighbor (KNN) groupings. This allows one to color low-dimensional embeddings by marker changes without hard boundaries imposed by partitioning. We devised an objective optimization of k based on minimizing functional marker KNN imputation error. Proof-of-concept work visualized the exact location of an IL-7 responsive subset in a B cell developmental trajectory on a t-SNE map independent of clustering. Per-condition cell frequency analysis revealed that KNN is sensitive to artifacts due to marker shift, and therefore can also be valuable in a quality control pipeline. Overall, we found that KNN groupings lead to useful multiple-condition visualizations and efficiently extract a large amount of information from mass cytometry data. Our software is publicly available through the Bioconductor package Sconify.
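
Choosing k by minimizing KNN imputation error can be illustrated as follows. This is a rough sketch under assumptions, not the Sconify implementation: neighbors are found on hypothetical surface-marker values, each cell's functional marker is imputed as the mean over its k neighbors, and the error measure (mean absolute error) and all names are choices made for this example.

```python
import math


def knn_indices(points, i, k):
    """Indices of the k points closest to points[i], excluding i itself."""
    others = sorted(
        (j for j in range(len(points)) if j != i),
        key=lambda j: math.dist(points[i], points[j]),
    )
    return others[:k]


def imputation_error(surface, functional, k):
    """Mean absolute error when each functional value is imputed from its k neighbors."""
    total = 0.0
    for i in range(len(surface)):
        neighbors = knn_indices(surface, i, k)
        imputed = sum(functional[j] for j in neighbors) / k
        total += abs(imputed - functional[i])
    return total / len(surface)


def best_k(surface, functional, candidates):
    """Candidate k with the smallest imputation error (first one wins on ties)."""
    return min(candidates, key=lambda k: imputation_error(surface, functional, k))


# Hypothetical data: one surface marker defines two populations of three cells,
# and one noisy functional marker differs between the populations.
surface = [[0.0], [1.0], [3.0], [20.0], [21.0], [23.0]]
functional = [1.0, 3.0, 2.0, 7.0, 9.0, 8.0]
chosen_k = best_k(surface, functional, candidates=[1, 2, 3])
```

In this toy data k = 1 copies a single noisy neighbor, k = 3 reaches into the other population, and k = 2 averages within a population, so it gives the lowest error.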

Robust Lineage Reconstruction from High-Dimensional Single-Cell Data

10.1101/036533 ◽

2016 ◽

Author(s):

Gregory Giecold ◽

Eugenio Marco ◽

Lorenzo Trippa ◽

Guo-Cheng Yuan

Keyword(s):

Gene Expression ◽

Single Cell ◽

Gene Expression Data ◽

Quantitative Estimate ◽

Cell Lineage ◽

Computational Method ◽

Expression Data ◽

Cell Gene Expression ◽

Cell Data ◽

Cell Gene

Single-cell gene expression data provide invaluable resources for systematic characterization of cellular hierarchy in multi-cellular organisms. However, cell lineage reconstruction is still often associated with significant uncertainty due to technological constraints. Such uncertainties have not been taken into account in current methods. We present ECLAIR, a novel computational method for the statistical inference of cell lineage relationships from single-cell gene expression data. ECLAIR uses an ensemble approach to improve the robustness of lineage predictions, and provides a quantitative estimate of the uncertainty of lineage branchings. We show that the application of ECLAIR to published datasets successfully reconstructs known lineage relationships and significantly improves the robustness of predictions. In conclusion, ECLAIR is a powerful bioinformatics tool for single-cell data analysis. It can be used for robust lineage reconstruction with a quantitative estimate of prediction accuracy.

destiny – diffusion maps for large-scale single-cell data in R

10.1101/023309 ◽

2015 ◽

Cited By ~ 6

Author(s):

Philipp Angerer ◽

Laleh Haghverdi ◽

Maren Büttner ◽

Fabian J. Theis ◽

Carsten Marr ◽

...

Keyword(s):

Single Cell ◽

Large Scale ◽

Cellular Reprogramming ◽

Noise Model ◽

Diffusion Maps ◽

Time Resolved ◽

Describing Functions ◽

Link Type ◽

Cell Expression ◽

Cell Data

Summary: Diffusion maps are a spectral method for non-linear dimension reduction and have recently been adapted for the visualization of single-cell expression data. Here we present destiny, an efficient R implementation of the diffusion map algorithm. Our package includes a single-cell specific noise model allowing for missing and censored values. In contrast to previous implementations, we further present an efficient nearest-neighbour approximation that allows for the processing of hundreds of thousands of cells, and a functionality for projecting new data on existing diffusion maps. As an example, we apply destiny to a recent time-resolved mass cytometry dataset of cellular reprogramming. Availability and implementation: destiny is an open-source R/Bioconductor package (http://bioconductor.org/packages/destiny), also available at https://www.helmholtz-muenchen.de/icb/destiny. A detailed vignette describing functions and workflows is provided with the package.

UCSC Cell Browser: Visualize Your Single-Cell Data

10.1101/2020.10.30.361162 ◽

2020 ◽

Cited By ~ 1

Author(s):

Matthew L Speir ◽

Aparna Bhaduri ◽

Nikolay S Markov ◽

Pablo Moreno ◽

Tomasz J Nowakowski ◽

...

Keyword(s):

Gene Expression ◽

Single Cell ◽

Link Type ◽

Multiple Datasets ◽

Cell Technologies ◽

Metadata Annotation ◽

Cell Data ◽

Python Package

Summary: As the use of single-cell technologies has grown, so has the need for tools to explore these large, complicated datasets. The UCSC Cell Browser is a tool that allows scientists to visualize gene expression and metadata annotation distribution throughout a single-cell dataset or multiple datasets. Availability and implementation: We provide the UCSC Cell Browser as a free website where users can explore a growing collection of single-cell datasets, and as a freely available Python package for scientists to create stable, self-contained visualizations for their own single-cell datasets.

Gene regulatory network inference from single-cell data using multivariate information measures

10.1101/082099 ◽

2016 ◽

Cited By ~ 4

Author(s):

Thalia E. Chan ◽

Michael P.H. Stumpf ◽

Ann C. Babtie

Keyword(s):

Gene Expression ◽

Information Theory ◽

Single Cell ◽

Network Inference ◽

Gene Regulatory Network Inference ◽

Functional Relationships ◽

Cell Gene Expression ◽

Gene Regulatory ◽

Cell Data ◽

Cell Gene

While single-cell gene expression experiments present new challenges for data processing, the cell-to-cell variability observed also reveals statistical relationships that can be used by information theory. Here, we use multivariate information theory to explore the statistical dependencies between triplets of genes in single-cell gene expression datasets. We develop PIDC, a fast, efficient algorithm that uses partial information decomposition (PID) to identify regulatory relationships between genes. We thoroughly evaluate the performance of our algorithm and demonstrate that the higher order information captured by PIDC allows it to outperform pairwise mutual information-based algorithms when recovering true relationships present in simulated data. We also infer gene regulatory networks from three experimental single-cell data sets and illustrate how network context, choices made during analysis, and sources of variability affect network inference. PIDC tutorials and open-source software for estimating PID are available here: https://github.com/Tchanders/network_inference_tutorials. PIDC should facilitate the identification of putative functional relationships and mechanistic hypotheses from single-cell transcriptomic data.
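
As background for the pairwise baseline mentioned above, the sketch below estimates mutual information between two discretized expression profiles from empirical frequencies. It is not PIDC and does not compute a partial information decomposition. The function name and the toy binary profiles are assumptions for illustration.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Plug-in estimate of mutual information (in bits) between two discrete sequences."""
    n = len(x)
    count_x = Counter(x)
    count_y = Counter(y)
    count_xy = Counter(zip(x, y))
    mi = 0.0
    for (a, b), c in count_xy.items():
        # p(a, b) * log2( p(a, b) / (p(a) * p(b)) ), written with raw counts.
        mi += (c / n) * math.log2(c * n / (count_x[a] * count_y[b]))
    return mi


# Hypothetical binarized expression (0 = off, 1 = on) of three genes in four cells.
gene_a = [0, 0, 1, 1]
gene_b = [0, 0, 1, 1]  # always matches gene_a
gene_c = [0, 1, 0, 1]  # independent of gene_a in this sample

mi_ab = mutual_information(gene_a, gene_b)  # 1.0 bit
mi_ac = mutual_information(gene_a, gene_c)  # 0.0 bits
```

A pairwise score like this cannot tell whether a dependency between two genes is explained by a third. That limitation motivates the triplet-based measures described in the abstract.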

scAMACE: Model-based approach to the joint analysis of single-cell data on chromatin accessibility, gene expression and methylation

10.1101/2021.03.29.437485 ◽

2021 ◽

Author(s):

Jiaxuan Wangwu ◽

Zexuan Sun ◽

Zhixiang Lin

Keyword(s):

Gene Expression ◽

Single Cell ◽

Cell Types ◽

Chromatin Accessibility ◽

Integrative Analysis ◽

Joint Analysis ◽

Data Types ◽

Link Type ◽

Complex Biological Process ◽

Cell Data

The advancement in technologies and the growth of available single-cell datasets motivate integrative analysis of multiple single-cell genomic datasets. Integrative analysis of multimodal single-cell datasets combines complementary information offered by single-omic datasets and can offer deeper insights into complex biological processes. Clustering methods that identify the unknown cell types are among the first few steps in the analysis of single-cell datasets, and they are important for downstream analysis built upon the identified cell types. We propose scAMACE for the integrative analysis and clustering of single-cell data on chromatin accessibility, gene expression and methylation. We demonstrate that cell types are better identified and characterized through analyzing the three data types jointly. We develop an efficient expectation-maximization (EM) algorithm to perform statistical inference, and evaluate our methods in both a simulation study and real data applications. We also provide a GPU implementation of scAMACE, making it scalable to large datasets. The software and datasets are available at https://github.com/cuhklinlab/scAMACE_py (Python implementation) and https://github.com/cuhklinlab/scAMACE (R implementation).
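
The expectation-maximization idea used for model-based clustering can be shown on a much simpler model than scAMACE. The sketch below fits a two-component one-dimensional Gaussian mixture. It is a generic textbook EM, not the scAMACE model, and the function names, the initialization, and the toy `values` are assumptions for this example.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_two_gaussians(data, iterations=50):
    """Fit a two-component 1-D Gaussian mixture by EM; returns weights, means, variances, responsibilities."""
    n = len(data)
    overall_mean = sum(data) / n
    overall_var = sum((x - overall_mean) ** 2 for x in data) / n
    # Assumed initialization: start the two means at the data extremes.
    means = [min(data), max(data)]
    variances = [overall_var, overall_var]
    weights = [0.5, 0.5]
    resp = []
    for _ in range(iterations):
        # E-step: posterior probability of each component for each data point.
        resp = []
        for x in data:
            dens = [weights[c] * normal_pdf(x, means[c], variances[c]) for c in range(2)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate parameters from the soft assignments.
        for c in range(2):
            nc = sum(r[c] for r in resp)
            weights[c] = nc / n
            means[c] = sum(r[c] * x for r, x in zip(resp, data)) / nc
            var = sum(r[c] * (x - means[c]) ** 2 for r, x in zip(resp, data)) / nc
            variances[c] = max(var, 1e-6)  # floor avoids a collapsing component
    return weights, means, variances, resp


# Hypothetical measurements from two cell types.
values = [0.0, 0.2, -0.2, 0.1, -0.1, 5.0, 5.2, 4.8, 5.1, 4.9]
weights, means, variances, resp = em_two_gaussians(values)
labels = [0 if r[0] >= r[1] else 1 for r in resp]
```

The cluster label of each cell is taken as the component with the larger posterior probability. A joint model over several data types would replace the single Gaussian likelihood with one likelihood term per modality.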

BGP: Branched Gaussian processes for identifying gene-specific branching dynamics in single cell data

10.1101/166868 ◽

2017 ◽

Cited By ~ 3

Author(s):

Alexis Boukouvalas ◽

James Hensman ◽

Magnus Rattray

Keyword(s):

Gene Expression ◽

Single Cell ◽

Prior Information ◽

Synthetic Data ◽

Parametric Model ◽

Credible Region ◽

Cell Gene Expression ◽

Probabilistic Nature ◽

Cell Data ◽

Cell Gene

High-throughput single-cell gene expression experiments can be used to uncover branching dynamics in cell populations undergoing differentiation through use of pseudotime methods. We develop the branching Gaussian process (BGP), a non-parametric model that is able to identify branching dynamics for individual genes and provides an estimate of branching times for each gene with an associated credible region. We demonstrate the effectiveness of our method on both synthetic data and a published single-cell gene expression hematopoiesis study. The method requires prior information about pseudotime and global cellular branching for each cell, but the probabilistic nature of the method means that it is robust to errors in these global branch labels and can be used to discover early branching genes which diverge before the inferred global cell branching. The code is open-source and available at https://github.com/ManchesterBioinference/BranchedGP.

SOMSC: Self-Organization-Map for High-Dimensional Single-Cell Data of Cellular States and Their Transitions

10.1101/124735 ◽

2017 ◽

Author(s):

Tao Peng ◽

Qing Nie

Keyword(s):

Gene Expression ◽

Single Cell ◽

Embryo Development ◽

Single Cells ◽

High Dimensional ◽

Data Sets ◽

Rna Seq ◽

Expression Levels ◽

Cell Lineages ◽

Cell Data

Measurements of gene expression levels for multiple genes in single cells provide a powerful approach to study heterogeneity of cell populations and cellular plasticity. While the expression levels of multiple genes in each cell are available in such data, the potential connections among the cells (e.g. the lineage relationship) are not directly evident from the measurement. Classifying cellular states and identifying transitions among those states are challenging due to many factors, including the small number of cells relative to the large number of genes collected in the data. In this paper we adapt a classical self-organizing-map approach to single-cell gene expression data, such as those based on qPCR and RNA-seq. In this method (SOMSC), a cellular state map (CSM) is derived and employed to identify cellular states inherited in a population of measured single cells. Cells located in the same basin of the CSM are considered to be in one cellular state, while barriers between the basins provide information on transitions among the cellular states. Consequently, paths of cellular state transitions (e.g. differentiation) and a temporal ordering of the measured single cells are obtained. Applied to a set of synthetic data from a simulated model of cell differentiation, and to two single-cell qPCR data sets and two single-cell RNA-seq data sets covering early embryo development, haematopoietic cell lineages, human preimplantation embryo development, and human skeletal muscle myoblast differentiation, SOMSC shows good capability in identifying cellular states and their transitions in high-dimensional single-cell data. This approach will have broad applications in studying cell lineages and cellular fate specification.