Continuous visualization of differences between biological conditions in single-cell data

Mapping Intimacies ◽

10.1101/337485 ◽

2018 ◽

Cited By ~ 1

Author(s):

Tyler J. Burns ◽

Garry P. Nolan ◽

Nikolay Samusik

Keyword(s):

Single Cell ◽

Nearest Neighbor ◽

Developmental Trajectory ◽

Functional Markers ◽

Mass Cytometry ◽

K Nearest Neighbor ◽

Cell Frequency ◽

Low Dimensional ◽

Marker Shift ◽

Cell Data

In high-dimensional single-cell data, changes in functional markers between conditions are typically compared across manual or algorithm-derived partitions based on population-defining markers. These partitions are commonly visualized on low-dimensional embeddings (e.g., t-SNE) colored by per-partition changes. Here, we provide an analysis and visualization tool that performs these comparisons across overlapping k-nearest neighbor (KNN) groupings. This allows one to color low-dimensional embeddings by marker changes without the hard boundaries imposed by partitioning. We devised an objective method for choosing k by minimizing the KNN imputation error of functional markers. As a proof of concept, we visualized the exact location of an IL-7-responsive subset in a B cell developmental trajectory on a t-SNE map, independent of clustering. Per-condition cell frequency analysis revealed that the KNN approach is sensitive enough to detect artifacts caused by marker shift, and can therefore also be valuable in a quality control pipeline. Overall, we found that KNN groupings lead to useful multiple-condition visualizations and efficiently extract a large amount of information from mass cytometry data. Our software is publicly available through the Bioconductor package Sconify.
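The two KNN ideas in this abstract can be sketched in NumPy. This is an illustration, not the Sconify code (Sconify is an R/Bioconductor package). Each cell's neighborhood is taken in population-defining (surface) marker space, and functional-marker differences between two conditions are computed inside that neighborhood. The value of k is scored by the error of imputing each cell's functional markers from its neighbors. The function names, the 0/1 condition coding, and the use of a plain mean difference, where the package may report another statistic, are assumptions.

```python
import numpy as np

def knn_indices(x, k):
    """Indices of the k nearest neighbors (Euclidean) of every row of x, excluding the row itself."""
    d2 = ((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)                 # a cell is never its own neighbor
    return np.argsort(d2, axis=1)[:, :k]

def knn_imputation_error(surface, functional, k):
    """Mean squared error when each cell's functional markers are imputed
    as the mean of its k nearest neighbors in surface-marker space."""
    nn = knn_indices(surface, k)
    imputed = functional[nn].mean(axis=1)        # (cells, markers)
    return float(((imputed - functional) ** 2).mean())

def per_cell_condition_shift(surface, functional, condition, k):
    """For every cell, return the fraction of its neighbors from condition 1 and the
    difference mean(condition-1 neighbors) - mean(condition-0 neighbors) per functional marker.
    `condition` is a float array of 0/1 labels (hypothetical coding)."""
    nn = knn_indices(surface, k)
    cond_nn = condition[nn]                      # (cells, k) labels of the neighbors
    frac_cond1 = cond_nn.mean(axis=1)
    vals = functional[nn]                        # (cells, k, markers)
    w1 = cond_nn[..., None]
    w0 = 1.0 - w1
    # NaN where a neighborhood contains only one condition
    with np.errstate(invalid="ignore", divide="ignore"):
        mean1 = (vals * w1).sum(axis=1) / w1.sum(axis=1)
        mean0 = (vals * w0).sum(axis=1) / w0.sum(axis=1)
    return frac_cond1, mean1 - mean0
```

In use, one would evaluate `knn_imputation_error` over a grid of k, keep the minimizer, and color an existing t-SNE map by the per-cell shift. Because neighborhoods overlap, the coloring varies smoothly rather than jumping at cluster boundaries.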

Differential abundance testing on single-cell data using k-nearest neighbor graphs

Nature Biotechnology ◽

10.1038/s41587-021-01033-z ◽

2021 ◽

Author(s):

Emma Dann ◽

Neil C. Henderson ◽

Sarah A. Teichmann ◽

Michael D. Morgan ◽

John C. Marioni

Keyword(s):

Single Cell ◽

Nearest Neighbor ◽

K Nearest Neighbor ◽

Differential Abundance ◽

Cell Data

SPRING: a kinetic interface for visualizing high dimensional single-cell expression data

10.1101/090332 ◽

2016 ◽

Cited By ~ 10

Author(s):

Caleb Weinreb ◽

Samuel Wolock ◽

Allon Klein

Keyword(s):

Gene Expression ◽

Single Cell ◽

Nearest Neighbor ◽

High Dimensional ◽

K Nearest Neighbor ◽

Link Type ◽

Cell Gene Expression ◽

Graph Layouts ◽

Cell Expression ◽

Cell Data

Motivation: Single-cell gene expression profiling technologies can map the cell states in a tissue or organism. As these technologies become more common, there is a need for computational tools to explore the data they produce. In particular, existing data visualization approaches are imperfect for studying continuous gene expression topologies.

Results: Force-directed layouts of k-nearest-neighbor graphs can visualize continuous gene expression topologies in a manner that preserves high-dimensional relationships and allows manual exploration of different stable two-dimensional representations of the same data. We implemented an interactive web tool, called SPRING, to visualize single-cell data using force-directed graph layouts. SPRING reveals more detailed biological relationships than existing approaches when applied to branching gene expression trajectories from hematopoietic progenitor cells. Visualizations from SPRING are also more reproducible than those of stochastic visualization methods such as tSNE, a state-of-the-art tool.

Availability: https://kleintools.hms.harvard.edu/tools/spring.html, https://github.com/AllonKleinLab/SPRING/
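The core idea of laying out a k-nearest-neighbor graph with attractive and repulsive forces can be sketched in NumPy. This is a generic Fruchterman-Reingold-style layout written for illustration, not SPRING's implementation. The symmetrized graph, the force laws (edges attract as d²/c, all pairs repel as c²/d), the ideal edge length c and the linear cooling schedule are all assumptions.

```python
import numpy as np

def knn_adjacency(x, k):
    """Symmetric 0/1 adjacency matrix of the k-nearest-neighbor graph (Euclidean distances)."""
    n = len(x)
    d2 = ((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)                       # no self-edges
    nn = np.argsort(d2, axis=1)[:, :k]
    adj = np.zeros((n, n))
    adj[np.repeat(np.arange(n), k), nn.ravel()] = 1.0
    return np.maximum(adj, adj.T)                      # keep an edge if either endpoint chose it

def force_layout(adj, n_iter=200, seed=0):
    """2-D force-directed layout: graph edges pull cells together, every pair pushes apart."""
    rng = np.random.default_rng(seed)
    n = adj.shape[0]
    pos = rng.normal(size=(n, 2))
    c = np.sqrt(1.0 / n)                               # ideal edge length for a unit-area layout
    for it in range(n_iter):
        temp = 0.1 * (1.0 - it / n_iter)               # maximum step size, cooled linearly
        delta = pos[:, None, :] - pos[None, :, :]      # delta[i, j] points from j towards i
        dist = np.linalg.norm(delta, axis=-1) + 1e-9
        # positive magnitude pushes i away from j, negative pulls i towards j
        magnitude = c ** 2 / dist - adj * dist ** 2 / c
        disp = (delta / dist[..., None] * magnitude[..., None]).sum(axis=1)
        step = np.linalg.norm(disp, axis=1, keepdims=True) + 1e-9
        pos = pos + disp / step * np.minimum(step, temp)
    return pos
```

Different random seeds give different but comparably stable layouts. This loosely mirrors the abstract's point that force-directed layouts can be explored as alternative two-dimensional representations of the same graph.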

SCOUT: Single-cell outlier analysis in cancer

10.1101/2020.03.25.007518 ◽

2020 ◽

Author(s):

Giovana Ravizzoni Onzi ◽

Juliano Luiz Faccioni ◽

Alvaro G. Alvarado ◽

Paula Andreghetto Bracco ◽

Harley I. Kornblum ◽

...

Keyword(s):

Data Analysis ◽

Single Cell ◽

Biological Markers ◽

Rna Seq ◽

Outlier Analysis ◽

Mass Cytometry ◽

Wide Range ◽

Cell Data

Outliers are often ignored or even removed from data analysis. In cancer, however, single outlier cells can be of major importance, since they have uncommon characteristics that may confer the capacity to invade, metastasize, or resist therapy. Here we present Single-Cell OUTlier analysis (SCOUT), a resource for single-cell data analysis focusing on outlier cells, and the SCOUT Selector (SCOUTS), an application that systematically applies SCOUT to a dataset across a wide range of biological markers. Using publicly available datasets of cancer samples obtained from mass cytometry and single-cell RNA-seq platforms, we identified outlier cells for the expression of proteins or RNAs and compared them to their non-outlier counterparts across different samples. Our results show that analyzing single-cell data using SCOUT can uncover key information not easily observed in analyses of the whole population.
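The outlier-versus-non-outlier comparison can be sketched as follows. The abstract does not state SCOUT's outlier criterion, so this sketch assumes one common choice: Tukey's fence, which flags cells whose expression of a chosen marker exceeds Q3 + 1.5 × IQR. The function names and the restriction to high-expression outliers are likewise assumptions.

```python
import numpy as np

def high_outliers(values, factor=1.5):
    """Assumed rule (Tukey's fence): flag values above Q3 + factor * IQR."""
    q1, q3 = np.percentile(values, [25, 75])
    return values > q3 + factor * (q3 - q1)

def compare_outliers(expr, marker_idx, factor=1.5):
    """Split cells (rows of expr, cells x markers) by outlier status for one marker and
    return the mask plus per-marker mean expression in outlier and non-outlier cells.
    If no cell is flagged, the outlier means are NaN (NumPy warns about an empty mean)."""
    mask = high_outliers(expr[:, marker_idx], factor)
    return mask, expr[mask].mean(axis=0), expr[~mask].mean(axis=0)
```

Looping `compare_outliers` over every marker index corresponds loosely to the "wide range of biological markers" that SCOUTS is described as covering.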

Optimal tuning of weighted kNN- and diffusion-based methods for denoising single cell genomics data

PLoS Computational Biology ◽

10.1371/journal.pcbi.1008569 ◽

2021 ◽

Vol 17 (1) ◽

pp. e1008569

Author(s):

Andreas Tjärnberg ◽

Omar Mahmood ◽

Christopher A. Jackson ◽

Giuseppe-Antonio Saldi ◽

Kyunghyun Cho ◽

...

Keyword(s):

Objective Function ◽

Single Cell ◽

Missing Values ◽

Nearest Neighbor ◽

Projection Methods ◽

Specific Gene ◽

K Nearest Neighbor ◽

Single Cell Genomics ◽

Nearest Neighbor Graph ◽

And Diffusion

The analysis of single-cell genomics data presents several statistical challenges, and extensive efforts have been made to produce methods for the analysis of this data that impute missing values, address sampling issues and quantify and correct for noise. In spite of such efforts, no consensus on best practices has been established and all current approaches vary substantially based on the available data and empirical tests. The k-Nearest Neighbor Graph (kNN-G) is often used to infer the identities of, and relationships between, cells and is the basis of many widely used dimensionality-reduction and projection methods. The kNN-G has also been the basis for imputation methods using, e.g., neighbor averaging and graph diffusion. However, due to the lack of an agreed-upon optimal objective function for choosing hyperparameters, these methods tend to oversmooth data, thereby resulting in a loss of information with regard to cell identity and the specific gene-to-gene patterns underlying regulatory mechanisms. In this paper, we investigate the tuning of kNN- and diffusion-based denoising methods with a novel non-stochastic method for optimally preserving biologically relevant informative variance in single-cell data. The framework, Denoising Expression data with a Weighted Affinity Kernel and Self-Supervision (DEWÄKSS), uses a self-supervised technique to tune its parameters. We demonstrate that denoising with optimal parameters selected by our objective function (i) is robust to preprocessing methods using data from established benchmarks, (ii) disentangles cellular identity and maintains robust clusters over dimension-reduction methods, (iii) maintains variance along several expression dimensions, unlike previous heuristic-based methods that tend to oversmooth data variance, and (iv) rarely involves diffusion but rather uses a fixed weighted kNN graph for denoising. Together, these findings provide a new understanding of kNN- and diffusion-based denoising methods. 
Code and example data for DEWÄKSS is available at https://gitlab.com/Xparx/dewakss/-/tree/Tjarnberg2020branch.
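The following compact sketch shows self-supervised tuning of kNN and diffusion denoising in the spirit of this abstract. It is not the DEWÄKSS code. Each cell is reconstructed only from its neighbors, so its own value is excluded. The diffusion step t is a matrix power of the row-normalized affinity matrix. The (k, t) pair with the lowest reconstruction error against the observed data is kept. Several details are assumptions: the Gaussian kernel with a per-cell bandwidth, zeroing the self-loops that diffusion re-creates, and building the graph directly on the expression matrix rather than on a PCA.

```python
import numpy as np

def knn_affinity(x, k):
    """Row-stochastic weighted kNN matrix; the diagonal is zero so no cell predicts itself."""
    d2 = ((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)
    nn = np.argsort(d2, axis=1)[:, :k]
    rows = np.arange(len(x))[:, None]
    sigma2 = d2[rows, nn[:, -1:]]                       # assumed bandwidth: distance^2 to k-th neighbor
    w = np.zeros_like(d2)
    w[rows, nn] = np.exp(-d2[rows, nn] / (sigma2 + 1e-12))
    return w / w.sum(axis=1, keepdims=True)

def denoise(x, p, t):
    """Diffuse for t steps, then drop self-loops (assumption) and renormalize rows."""
    m = np.linalg.matrix_power(p, t)
    np.fill_diagonal(m, 0.0)
    m = m / m.sum(axis=1, keepdims=True)
    return m @ x

def self_supervised_mse(x, ks, ts):
    """Score each (k, t) by how well neighbors alone reconstruct the observed data."""
    scores = {}
    for k in ks:
        p = knn_affinity(x, k)
        for t in ts:
            scores[(k, t)] = float(((denoise(x, p, t) - x) ** 2).mean())
    best = min(scores, key=scores.get)
    return best, scores
```

Keep k ≥ 2 in this sketch. With k = 1 and mutual nearest neighbors, all diffused mass returns to the cell itself, and the row renormalization divides by zero. Because the observed value is never used to predict itself, the score is not trivially minimized by "no smoothing". This mirrors the abstract's argument for a self-supervised objective over heuristic choices that tend to oversmooth.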

MOFA+: a probabilistic framework for comprehensive integration of structured single-cell data

10.1101/837104 ◽

2019 ◽

Cited By ~ 8

Author(s):

Ricard Argelaguet ◽

Damien Arnol ◽

Danila Bredikhin ◽

Yonatan Deloro ◽

Britta Velten ◽

...

Keyword(s):

Factor Analysis ◽

Single Cell ◽

Cell Fate ◽

Joint Analysis ◽

Experimental Conditions ◽

Joint Modelling ◽

Stochastic Variational Inference ◽

Technological Advances ◽

Low Dimensional ◽

Cell Data

Technological advances have enabled the joint analysis of multiple molecular layers at single cell resolution. At the same time, increased experimental throughput has facilitated the study of larger numbers of experimental conditions. While methods for analysing single-cell data that model the resulting structure of either of these dimensions are beginning to emerge, current methods do not account for complex experimental designs that include both multiple views (modalities or assays) and groups (conditions or experiments). Here we present Multi-Omics Factor Analysis v2 (MOFA+), a statistical framework for the comprehensive and scalable integration of structured single cell multi-modal data. MOFA+ builds upon a Bayesian Factor Analysis framework combined with fast GPU-accelerated stochastic variational inference. Similar to existing factor models, MOFA+ allows for interpreting variation in single-cell datasets by pooling information across cells and features to reconstruct a low-dimensional representation of the data. Uniquely, the model supports flexible group-level sparsity constraints that allow joint modelling of variation across multiple groups and views. To illustrate MOFA+, we applied it to single-cell data sets of different scales and designs, demonstrating practical advantages when analyzing datasets with complex group and/or view structure. In a multi-omics analysis of mouse gastrulation this joint modelling reveals coordinated changes between gene expression and epigenetic variation associated with cell fate commitment.
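The group-and-view factor model described here can be written in the usual notation for such models. The symbols below are assumed, not quoted from the paper. For group g (a condition or experiment) with N_g cells and view m (a modality or assay) with D_m features, the data matrix is approximated by group-specific factors and view-specific weights over K shared latent factors:

```latex
\mathbf{Y}^{g}_{m} \approx \mathbf{Z}^{g}\,\bigl(\mathbf{W}^{m}\bigr)^{\top} + \boldsymbol{\varepsilon}^{g}_{m},
\qquad
\mathbf{Y}^{g}_{m} \in \mathbb{R}^{N_g \times D_m},\quad
\mathbf{Z}^{g} \in \mathbb{R}^{N_g \times K},\quad
\mathbf{W}^{m} \in \mathbb{R}^{D_m \times K}
```

In this reading, the sparsity priors on the weights let a factor be active in some views and not others. The group-level sparsity mentioned in the abstract plays the same role for the factors across groups. Inference is by stochastic variational methods rather than a closed-form fit.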

scGAD: single-cell gene associating domain scores for exploratory analysis of scHi-C data

10.1101/2021.10.22.465520 ◽

2021 ◽

Author(s):

Siqi Shen ◽

Ye Zheng ◽

Sunduz Keles

Keyword(s):

Single Cell ◽

Cell Types ◽

Exploratory Analysis ◽

Analysis Tool ◽

Chromatin Conformation ◽

Chromatin Interactions ◽

Gene Level ◽

Low Dimensional ◽

Cell Data ◽

Cell Gene

Quantitative tools are needed to leverage the unprecedented resolution of single-cell high-throughput chromatin conformation (scHi-C) data and to integrate it with other single-cell data modalities. We present single-cell gene associating domain (scGAD) scores as a dimension reduction and exploratory analysis tool for scHi-C data. scGAD enables summarization at the gene level while accounting for inherent gene-level genomic biases. Low-dimensional projections with scGAD capture clustering of cells based on their 3D structures. scGAD enables identifying genes with significant chromatin interactions within and between cell types. We further show that scGAD facilitates the integration of scHi-C data with other single-cell data modalities by enabling its projection onto reference low-dimensional embeddings.

Efficient and precise single-cell reference atlas mapping with Symphony

10.1101/2020.11.18.389189 ◽

2020 ◽

Author(s):

Joyce B. Kang ◽

Aparna Nathan ◽

Nghia Millard ◽

Laurie Rumker ◽

D. Branch Moody ◽

...

Keyword(s):

Single Cell ◽

Fetal Liver ◽

Surface Protein ◽

Cell Types ◽

Developmental Trajectory ◽

Model Framework ◽

Pancreatic Cell ◽

Linear Mixture Model ◽

Sequencing Platforms ◽

Low Dimensional

Recent advances in single-cell technologies and integration algorithms make it possible to construct large, comprehensive reference atlases from multiple datasets encompassing many donors, studies, disease states, and sequencing platforms. Much like mapping sequencing reads to a reference genome, it is essential to be able to map new query cells onto complex, multimillion-cell reference atlases to rapidly identify relevant cell states and phenotypes. We present Symphony, a novel algorithm for building compressed, integrated reference atlases of ≥10⁶ cells and enabling efficient query mapping within seconds. Based on a linear mixture model framework, Symphony precisely localizes query cells within a low-dimensional reference embedding without the need to reintegrate the reference cells, facilitating the downstream transfer of many types of reference-defined annotations to the query cells. We demonstrate the power of Symphony by (1) mapping a query containing multiple levels of experimental design to predict pancreatic cell types in human and mouse, (2) localizing query cells along a smooth developmental trajectory of human fetal liver hematopoiesis, and (3) harnessing a multimodal CITE-seq reference atlas to infer query surface protein expression in memory T cells. Symphony will enable the sharing of comprehensive integrated reference atlases in a convenient, portable format that powers fast, reproducible querying and downstream analyses.
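Mapping query cells into a fixed reference embedding, without recomputing the reference, can be illustrated with a deliberately simplified sketch. Query cells are scaled with the reference's stored gene means and standard deviations. They are then projected onto the stored reference PCA loadings and softly assigned to reference cluster centroids. The mixture-model correction of query batch effects that Symphony applies on top of this is omitted. The dictionary layout, the squared-distance soft assignment and its `sigma` are assumptions.

```python
import numpy as np

def build_reference(ref, n_pcs):
    """Scale reference cells (cells x genes), run PCA, and keep what query mapping needs."""
    mu = ref.mean(axis=0)
    sd = ref.std(axis=0) + 1e-8                     # avoid division by zero for constant genes
    z = (ref - mu) / sd
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    loadings = vt[:n_pcs].T                         # genes x PCs
    return {"mu": mu, "sd": sd, "loadings": loadings, "embedding": z @ loadings}

def map_query(query, ref_model):
    """Project query cells with the *reference* scaling and loadings; the reference is untouched."""
    z = (query - ref_model["mu"]) / ref_model["sd"]
    return z @ ref_model["loadings"]

def soft_assign(emb, centroids, sigma=1.0):
    """Soft cluster membership of each cell: softmax of negative squared distance to centroids."""
    d2 = ((emb[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    logits = -d2 / sigma
    logits -= logits.max(axis=1, keepdims=True)     # numerical stability
    r = np.exp(logits)
    return r / r.sum(axis=1, keepdims=True)
```

Since only `mu`, `sd`, `loadings` and the centroids are needed at query time, a reference reduced to these arrays is compact and portable. That loosely matches the "compressed" reference idea in the abstract.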

SCIM: Universal Single-Cell Matching with Unpaired Feature Sets

10.1101/2020.06.11.146845 ◽

2020 ◽

Author(s):

Stefan G. Stark ◽

Joanna Ficek ◽

Francesco Locatello ◽

Ximena Bonilla ◽

Stéphane Chevrier ◽

...

Keyword(s):

Single Cell ◽

Underlying Structure ◽

Scalable Algorithms ◽

Melanoma Tumor ◽

Bipartite Matching ◽

Technological Advances ◽

Latent Space ◽

Latent Representations ◽

Low Dimensional ◽

Cell Data

Motivation: Recent technological advances have led to an increase in the production and availability of single-cell data. The ability to integrate a set of multi-technology measurements would allow the identification of biologically or clinically meaningful observations through the unification of the perspectives afforded by each technology. In most cases, however, profiling technologies consume the cells they measure, so pairwise correspondences between datasets are lost. Given the sheer size that single-cell datasets can reach, scalable algorithms are needed that can universally match the measurements of a cell in one technology to those of its corresponding sibling cell in another.

Results: We propose Single-Cell data Integration via Matching (SCIM), a scalable approach to recover such correspondences in two or more technologies. SCIM assumes that cells share a common (low-dimensional) underlying structure and that the underlying cell distribution is approximately constant across technologies. It constructs a technology-invariant latent space using an autoencoder framework with an adversarial objective. Multi-modal datasets are integrated by pairing cells across technologies using a bipartite matching scheme that operates on the low-dimensional latent representations. We evaluate SCIM on a simulated cellular branching process and show that the cell-to-cell matches derived by SCIM reflect the same pseudotime on the simulated dataset. Moreover, we apply our method to two real-world scenarios, a melanoma tumor sample and a human bone marrow sample, where we pair cells from a scRNA dataset to their sibling cells in a CyTOF dataset, achieving cell-matching accuracies of 93% and 84%, respectively.

Availability: https://github.com/ratschlab/scim
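Once both technologies are embedded in a shared latent space, pairing cells can be framed as a minimum-cost one-to-one assignment on squared latent distances. This sketch solves it with the classic Hungarian algorithm in pure Python. It is a stand-in for illustration only. SCIM's own bipartite matching scheme may differ, for example in how it scales to large datasets or handles unequal cell numbers. The latent codes are assumed to be given, and `match_latent` is a hypothetical helper.

```python
def hungarian(cost):
    """Minimum-cost assignment for an n x m cost matrix with n <= m, in O(n^2 m).
    Returns match[i] = column assigned to row i."""
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    u, v = [0.0] * (n + 1), [0.0] * (m + 1)   # row and column potentials (1-based)
    p, way = [0] * (m + 1), [0] * (m + 1)     # p[j]: row matched to column j, 0 = free
    for i in range(1, n + 1):
        p[0], j0 = i, 0
        minv, used = [inf] * (m + 1), [False] * (m + 1)
        while True:                           # grow a shortest augmenting path from row i
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):            # update potentials to keep reduced costs >= 0
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:                             # flip matches along the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    match = [-1] * n
    for j in range(1, m + 1):
        if p[j]:
            match[p[j] - 1] = j - 1
    return match

def match_latent(za, zb):
    """Pair cells of technology A (rows of za) with cells of technology B (rows of zb)
    by minimizing the total squared distance between their latent codes."""
    cost = [[sum((x - y) ** 2 for x, y in zip(a, b)) for b in zb] for a in za]
    return hungarian(cost)
```

A dense cost matrix is quadratic in the number of cells, so this exact one-to-one form is only practical for small examples. The abstract's emphasis on scalability suggests that the real method restricts or approximates this step.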

scGAE: topology-preserving dimensionality reduction for single-cell RNA-seq data using graph autoencoder

10.1101/2021.02.16.431357 ◽

2021 ◽

Author(s):

Zixiang Luo ◽

Chenyu Xu ◽

Zhen Zhang ◽

Wenfei Jin

Keyword(s):

Dimensionality Reduction ◽

Single Cell ◽

Topological Structure ◽

Dimensional Space ◽

Simulated Data ◽

Oriented Graph ◽

Developmental Trajectory ◽

Structure Information ◽

Low Dimensional ◽

Cell Graph

Dimensionality reduction is crucial for the visualization and interpretation of high-dimensional single-cell RNA sequencing (scRNA-seq) data. However, preserving the topological structure among cells when projecting into a low-dimensional space remains a challenge. Here, we present the single-cell graph autoencoder (scGAE), a dimensionality reduction method that preserves topological structure in scRNA-seq data. scGAE builds a cell graph and uses a multitask-oriented graph autoencoder to preserve topological structure information and feature information in scRNA-seq data simultaneously. We further extended scGAE for scRNA-seq data visualization, clustering, and trajectory inference. Analyses of simulated data showed that scGAE accurately reconstructs developmental trajectories and separates discrete cell clusters under different scenarios, outperforming recently developed deep learning methods. Furthermore, applying scGAE to empirical data showed that it provides novel insights into cell developmental lineages and preserves inter-cluster distances.
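The "multitask" idea of reconstructing both the cell graph and the expression features from one embedding can be shown with a forward-pass sketch of a generic graph autoencoder in NumPy. The following pieces are assumptions taken from a standard graph-autoencoder recipe: the graph-convolution rule H ← ReLU(Â H W) with Â = D^{-1/2}(A + I)D^{-1/2), the inner-product structure decoder, the linear feature decoder and the weight `alpha`. scGAE's own layers and loss may differ. Training, which would need an autodiff library, is omitted.

```python
import numpy as np

def normalized_adjacency(a):
    """Symmetric normalization with self-loops: D^-1/2 (A + I) D^-1/2."""
    a_hat = a + np.eye(a.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_encode(a_norm, x, weights):
    """Graph-convolution layers H <- ReLU(A_norm H W); the last layer stays linear."""
    h = x
    for i, w in enumerate(weights):
        h = a_norm @ h @ w
        if i < len(weights) - 1:
            h = np.maximum(h, 0.0)
    return h

def decode_structure(z):
    """Inner-product decoder: predicted edge probability sigmoid(z_i . z_j)."""
    return 1.0 / (1.0 + np.exp(-(z @ z.T)))

def decode_features(z, w_dec):
    """Linear decoder reconstructing the cells x genes matrix from the embedding."""
    return z @ w_dec

def multitask_loss(a, x, z, w_dec, alpha=1.0):
    """Graph reconstruction (binary cross-entropy) plus feature reconstruction (MSE)."""
    p = np.clip(decode_structure(z), 1e-7, 1 - 1e-7)
    bce = -np.mean(a * np.log(p) + (1 - a) * np.log(1 - p))
    mse = np.mean((decode_features(z, w_dec) - x) ** 2)
    return float(bce + alpha * mse)
```

Setting `alpha` to zero reduces the objective to pure graph (topology) reconstruction. Making it large emphasizes the expression features. The two-term loss is one way to read "preserve topological structure information and feature information ... simultaneously".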

Meeting the challenges of high-dimensional data analysis in immunology

10.1101/473215 ◽

2018 ◽

Cited By ~ 1

Author(s):

Subarna Palit ◽

Fabian J. Theis ◽

Christina E. Zielinski

Keyword(s):

Data Analysis ◽

Single Cell ◽

Immune Regulation ◽

Cellular Heterogeneity ◽

High Dimensionality ◽

High Dimensional ◽

Mass Cytometry ◽

Analysis Techniques ◽

The Impact ◽

Cell Data

Recent advances in cytometry have radically altered the fate of single-cell proteomics by allowing a more accurate understanding of complex biological systems. Mass cytometry (CyTOF) provides simultaneous single-cell measurements that are crucial for understanding cellular heterogeneity and identifying novel cellular subsets. High-dimensional CyTOF data were traditionally analyzed by gating on bivariate dot plots. This approach is laborious, because the number of plots increases quadratically with dimension, and it is also biased by manual gating. This review discusses the impact of new analysis techniques that provide in-depth insights into the dynamics of immune regulation from static snapshot data, and aims to provide immunologists with tools to address the high dimensionality of their single-cell data.