Exploring dimension-reduced embeddings with Sleepwalk

Mapping Intimacies, 2019. DOI: 10.1101/603589

Author(s): Svetlana Ovchinnikova, Simon Anders

Keyword(s): Big Data, Dimension Reduction, Single Cell, Single Cells, High Dimensional, RNA-seq, Mouse Cursor, Sample Data, Reduction Methods, Full Power

Dimension-reduction methods, such as t-SNE or UMAP, are widely used when exploring high-dimensional data describing many entities, e.g., RNA-seq data for many single cells. However, dimension reduction is commonly prone to introducing artefacts, and we hence need means to see where a dimension-reduced embedding is a faithful representation of the local neighbourhood and where it is not. We present Sleepwalk, a simple but powerful tool that allows the user to interactively explore an embedding, using colour to depict the original (or any other) distances from all points to the cell under the mouse cursor. We show how this approach not only highlights distortions but also reveals otherwise hidden characteristics of the data, and how Sleepwalk's comparative modes help integrate multi-sample data and understand differences between embedding and preprocessing methods. Sleepwalk is a versatile and intuitive tool that unlocks the full power of dimension reduction and will be of value not only in single-cell RNA-seq but also in any other area with matrix-shaped big data.
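
The core idea described above, colouring every point by its distance in the original feature space to the point under the cursor, can be sketched non-interactively in a few lines of NumPy. This is a minimal illustration under our own assumptions, not Sleepwalk's code or API: the names `hover_colours` and `knn_overlap`, the Euclidean metric, and the linear colour scale clipped at `max_dist` are hypothetical choices. The second function asks the same question numerically: how many of a point's original-space neighbours are still its neighbours in the embedding?

```python
import numpy as np


def hover_colours(features: np.ndarray, hovered: int, max_dist: float) -> np.ndarray:
    """Colour value in [0, 1] for every point: 0 for the hovered point,
    growing linearly with its original-space (Euclidean) distance and
    clipped at 1 for points farther away than max_dist."""
    dist = np.linalg.norm(features - features[hovered], axis=1)
    return np.clip(dist / max_dist, 0.0, 1.0)


def knn_overlap(features: np.ndarray, embedding: np.ndarray, hovered: int, k: int) -> float:
    """Fraction of the hovered point's k nearest neighbours in the original
    feature space that are also among its k nearest neighbours in the embedding."""
    d_orig = np.linalg.norm(features - features[hovered], axis=1)
    d_emb = np.linalg.norm(embedding - embedding[hovered], axis=1)
    # Exclude the hovered point itself from its own neighbour lists.
    d_orig[hovered] = np.inf
    d_emb[hovered] = np.inf
    nn_orig = set(np.argsort(d_orig)[:k].tolist())
    nn_emb = set(np.argsort(d_emb)[:k].tolist())
    return len(nn_orig & nn_emb) / k


# Hypothetical example: 200 random "cells" with 30 features, and a crude
# 2-D "embedding" that simply keeps the first two features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 30))
Y = X[:, :2]
colours = hover_colours(X, hovered=0, max_dist=8.0)
print(colours[0])  # the hovered point itself maps to 0.0
print(knn_overlap(X, Y, hovered=0, k=10))
```

In an interactive viewer, `hover_colours` would be re-evaluated whenever the cursor moves to a new point; a low `knn_overlap` for that point flags a neighbourhood that the embedding does not represent faithfully.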

Comparative Research of Different Dimension Reduction Methods Combined with RWR Network Smoothing in Single Cell RNA-seq Data

IOP Conference Series: Earth and Environmental Science, 2020, Vol 495, pp. 012043. DOI: 10.1088/1755-1315/495/1/012043

Author(s): Xuesong Xiao, Pengchao Ye, Wenbin Ye, Guoli Ji

Keyword(s): Dimension Reduction, Single Cell, Comparative Research, RNA-seq, Reduction Methods

Dimension Reduction and Clustering Models for Single-Cell RNA Sequencing Data: A Comparative Study

International Journal of Molecular Sciences, 2020, Vol 21 (6), pp. 2181. DOI: 10.3390/ijms21062181. Cited by ~3

Author(s): Chao Feng, Shufen Liu, Hao Zhang, Renchu Guan, Dan Li, ...

Keyword(s): Feature Extraction, Dimension Reduction, Single Cell, RNA Sequencing, Sparse Matrix, Component Analysis, Extraction Methods, High Dimensional, Single Cell RNA Sequencing, Reduction Methods

With recent advances in single-cell RNA sequencing, enormous transcriptome datasets have been generated. These datasets have furthered our understanding of cellular heterogeneity and its underlying mechanisms in homogeneous populations. Clustering of single-cell RNA sequencing (scRNA-seq) data can group cells belonging to the same cell type based on patterns embedded in gene expression. However, owing to the limitations of existing scRNA-seq technologies, scRNA-seq data are high-dimensional, noisy, and sparse. Traditional clustering methods are neither effective nor efficient for high-dimensional, sparse matrix computations, and several dimension reduction methods have therefore been introduced. To establish a reliable, standard research routine, we conducted a comprehensive review and evaluation of four classical dimension reduction methods and five clustering models. Four experiments were performed progressively on two large scRNA-seq datasets using 20 models. The results showed that feature selection contributed positively to the analysis of high-dimensional, sparse scRNA-seq data. Moreover, feature-extraction methods were able to improve clustering performance, although not consistently. Independent component analysis (ICA) performed well in small compressed feature spaces, whereas principal component analysis (PCA) was more stable than all the other feature-extraction methods. In addition, ICA was not well suited to fuzzy C-means clustering in scRNA-seq data analysis. K-means clustering combined with feature-extraction methods achieved good results.
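
As a rough illustration of the feature-extraction-then-clustering pipeline evaluated in this study, the sketch below pairs PCA (computed by singular value decomposition) with plain Lloyd's k-means, using NumPy only. The function names, the synthetic two-group data, and all parameter values are hypothetical; this is not the authors' code, and a real scRNA-seq analysis would normally add normalization and feature selection before this step.

```python
import numpy as np


def pca(X: np.ndarray, n_components: int) -> np.ndarray:
    """Project the rows of X onto the top principal components,
    computed from the SVD of the column-centred matrix."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


def kmeans(X: np.ndarray, k: int, n_iter: int = 100, seed: int = 0):
    """Lloyd's k-means: alternate nearest-centroid assignment and centroid update."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Distance from every point to every centroid, shape (n_points, k).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Keep a centroid in place if its cluster became empty.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical data: two groups of 50 "cells" x 500 "genes" with shifted means.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, size=(50, 500)),
               rng.normal(1.5, 1.0, size=(50, 500))])
Z = pca(X, n_components=10)
labels, _ = kmeans(Z, k=2)
print(labels)
```

Replacing `pca` with another feature-extraction step, or `kmeans` with another clustering model, gives the kind of method pairings the study compares.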

SOMSC: Self-Organization-Map for High-Dimensional Single-Cell Data of Cellular States and Their Transitions

2017. DOI: 10.1101/124693. Cited by ~1

Author(s): Tao Peng, Qing Nie

Keyword(s): Gene Expression, Single Cell, Gene Expression Data, Single Cells, High Dimensional, Expression Data, RNA-seq, Cell Gene Expression, Cell Data, Cell Gene

Measurement of gene expression levels for multiple genes in single cells provides a powerful approach to study the heterogeneity of cell populations and cellular plasticity. While the expression levels of multiple genes in each cell are available in such data, the potential connections among the cells (e.g., the cellular state transition relationship) are not directly evident from the measurement. Classifying cellular states, identifying the transitions among those states, and extracting the pseudotime ordering of cells are challenging because of noise in the data and its high dimensionality in the number of genes. In this paper we adapt the classical self-organizing map (SOM) approach to single-cell gene expression data (SOMSC), such as those based on single-cell qPCR and single-cell RNA-seq. In SOMSC, a cellular state map (CSM) is derived and employed to identify the cellular states inherent in the population of measured single cells. Cells located in the same basin of the CSM are considered to be in the same cellular state, while barriers among the basins in the CSM provide information on transitions among the cellular states. A cellular state transition path (e.g., differentiation) and a temporal ordering of the measured single cells are consequently obtained. In addition, SOMSC can estimate the cellular state replication probability and the transition probabilities. Applied to a set of synthetic data, one single-cell qPCR data set on mouse early embryonic development, and two single-cell RNA-seq data sets, SOMSC is effective at capturing the cellular states and transitions present in high-dimensional single-cell data. This approach will have broader applications in analyzing cellular fate specification and cell lineages using single-cell gene expression data.
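
SOMSC builds on the classical self-organizing map. The sketch below shows only that classical training loop (a best-matching-unit search followed by a Gaussian neighbourhood update with a decaying learning rate and radius) in NumPy. It does not construct SOMSC's cellular state map or its basins, and the function names, grid size, synthetic data, and decay schedules are hypothetical choices made here.

```python
import numpy as np


def train_som(data, grid_h, grid_w, n_epochs=20, lr0=0.5, sigma0=None, seed=0):
    """Train a classical SOM; returns node weights of shape (grid_h, grid_w, n_features)."""
    rng = np.random.default_rng(seed)
    n_nodes = grid_h * grid_w
    if sigma0 is None:
        sigma0 = max(grid_h, grid_w) / 2.0
    # Position of every node on the 2-D grid, shape (n_nodes, 2).
    coords = np.array([(i, j) for i in range(grid_h) for j in range(grid_w)], dtype=float)
    # Initialise node weights with randomly drawn samples.
    weights = data[rng.choice(len(data), size=n_nodes, replace=True)].astype(float)
    total_steps = n_epochs * len(data)
    step = 0
    for _ in range(n_epochs):
        for x in data[rng.permutation(len(data))]:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                  # learning rate decays linearly to 0
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # neighbourhood radius shrinks
            # Best-matching unit: the node whose weight vector is closest to x.
            bmu = np.argmin(np.linalg.norm(weights - x, axis=1))
            # Gaussian neighbourhood measured on the grid, not in feature space.
            grid_d2 = np.sum((coords - coords[bmu]) ** 2, axis=1)
            h = np.exp(-grid_d2 / (2.0 * sigma ** 2))
            weights += lr * h[:, None] * (x - weights)
            step += 1
    return weights.reshape(grid_h, grid_w, -1)


def best_matching_units(data, weights):
    """Row-major index of the best-matching node for every sample."""
    flat = weights.reshape(-1, weights.shape[-1])
    dists = np.linalg.norm(data[:, None, :] - flat[None, :, :], axis=2)
    return dists.argmin(axis=1)


# Hypothetical data: two groups of 30 "cells" measured on 5 "genes".
rng = np.random.default_rng(2)
cells = np.vstack([rng.normal(0.0, 1.0, size=(30, 5)),
                   rng.normal(6.0, 1.0, size=(30, 5))])
som = train_som(cells, grid_h=4, grid_w=4)
units = best_matching_units(cells, som)
print(som.shape)  # (4, 4, 5)
```

Because the neighbourhood is defined on the grid, cells with similar profiles end up on nearby nodes; SOMSC's further steps (the cellular state map, its basins and barriers) are not reproduced here.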

SOMSC: Self-Organization-Map for High-Dimensional Single-Cell Data of Cellular States and Their Transitions

2017. DOI: 10.1101/124735

Author(s): Tao Peng, Qing Nie

Keyword(s): Gene Expression, Single Cell, Embryo Development, Single Cells, High Dimensional, Data Sets, RNA-seq, Expression Levels, Cell Lineages, Cell Data

Measurements of gene expression levels for multiple genes in single cells provide a powerful approach to study the heterogeneity of cell populations and cellular plasticity. While the expression levels of multiple genes in each cell are available in such data, the potential connections among the cells (e.g., the lineage relationship) are not directly evident from the measurement. Classifying cellular states and identifying transitions among those states are challenging due to many factors, including the small number of cells relative to the large number of genes collected in the data. In this paper we adapt a classical self-organizing map approach to single-cell gene expression data, such as those based on qPCR and RNA-seq. In this method (SOMSC), a cellular state map (CSM) is derived and employed to identify the cellular states inherent in a population of measured single cells. Cells located in the same basin of the CSM are considered to be in the same cellular state, while barriers between the basins provide information on transitions among the cellular states. Consequently, paths of cellular state transitions (e.g., differentiation) and a temporal ordering of the measured single cells are obtained. Applied to a set of synthetic data from a simulated model of cell differentiation, as well as to two single-cell qPCR data sets and two single-cell RNA-seq data sets from systems of early embryo development, haematopoietic cell lineages, human preimplantation embryo development, and human skeletal muscle myoblast differentiation, SOMSC shows good capability in identifying cellular states and their transitions in high-dimensional single-cell data. This approach will have broad applications in studying cell lineages and cellular fate specification.

A Comparison for Dimensionality Reduction Methods of Single-Cell RNA-seq Data

Frontiers in Genetics, 2021, Vol 12. DOI: 10.3389/fgene.2021.646936

Author(s): Ruizhi Xiang, Wencan Wang, Lei Yang, Shiyuan Wang, Chaohan Xu, ...

Keyword(s): Dimensionality Reduction, Dimension Reduction, Single Cell, High Throughput Sequencing, Cellular Heterogeneity, Specific Situation, RNA-seq, Reduction Methods, The Stability, Downstream Analysis

Single-cell RNA sequencing (scRNA-seq) is a high-throughput sequencing technology performed at the level of individual cells, which has the potential to reveal cellular heterogeneity. However, scRNA-seq data are high-dimensional, noisy, and sparse. Dimension reduction is an important step in the downstream analysis of scRNA-seq data, and several dimension reduction methods have therefore been developed. We developed a strategy to evaluate the stability, accuracy, and computing cost of 10 dimensionality reduction methods using 30 simulation datasets and five real datasets. Additionally, we investigated the sensitivity of all the methods to hyperparameter tuning and give users appropriate suggestions. We found that t-distributed stochastic neighbor embedding (t-SNE) yielded the best overall performance, with the highest accuracy but also the highest computing cost. Meanwhile, uniform manifold approximation and projection (UMAP) exhibited the highest stability, moderate accuracy, and the second-highest computing cost. UMAP preserves the original cohesion and separation of cell populations well. In addition, users should set the hyperparameters according to the specific situation before using dimensionality reduction methods based on non-linear models and neural networks.

SUGAR-seq enables simultaneous detection of glycans, epitopes, and the transcriptome in single cells

Science Advances, 2021, Vol 7 (8), pp. eabe3610. DOI: 10.1126/sciadv.abe3610

Author(s): Conor J. Kearney, Stephin J. Vervoort, Kelly M. Ramsbottom, Izabela Todorovski, Emily J. Lelliott, ...

Keyword(s): Single Cell, Posttranslational Modification, Cellular Differentiation, Single Cells, Simultaneous Detection, Surface Protein, RNA-seq, Cell Level, Simultaneous Integration, Unique Surface

Multimodal single-cell RNA sequencing enables the precise mapping of transcriptional and phenotypic features of cellular differentiation states but does not allow for simultaneous integration of critical posttranslational modification data. Here, we describe SUrface-protein Glycan And RNA-seq (SUGAR-seq), a method that enables detection and analysis of N-linked glycosylation, extracellular epitopes, and the transcriptome at the single-cell level. Integrated SUGAR-seq and glycoproteome analysis identified tumor-infiltrating T cells with unique surface glycan properties that report their epigenetic and functional state.

Leveraging high-powered RNA-Seq datasets to improve inference of regulatory activity in single-cell RNA-Seq data

2019. DOI: 10.1101/553040. Cited by ~1

Author(s): Ning Wang, Andrew E. Teschendorff

Keyword(s): Transcription Factors, Single Cell, Cell Fate, Regulatory Networks, Large Scale, Single Cells, Differential Expression Analysis, Dropout Rate, RNA-seq, Regulatory Activity

Inferring the activity of transcription factors in single cells is a key task for improving our understanding of development and complex genetic diseases. This task is, however, challenging because of the relatively high dropout rate and noisy nature of single-cell RNA-Seq data. Here we present a novel statistical inference framework called SCIRA (Single Cell Inference of Regulatory Activity), which leverages the power of large-scale bulk RNA-Seq datasets to infer high-quality tissue-specific regulatory networks, from which regulatory activity estimates in single cells can subsequently be obtained. We show that SCIRA can correctly infer the regulatory activity of transcription factors affected by high technical dropout. In particular, SCIRA can improve sensitivity by as much as 70% compared to differential expression analysis and current state-of-the-art methods. Importantly, SCIRA can reveal novel regulators of cell fate in tissue development, even for cell types that make up only 5% of the tissue, and can identify key novel tumor suppressor genes in cancer at single-cell resolution. In summary, SCIRA will be an invaluable tool for single-cell studies aiming to accurately map the activity patterns of key transcription factors during development and how these patterns are altered in disease.
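
One common way to turn a regulatory network into a per-cell activity score is to regress a cell's standardized expression of a factor's target genes against the signs of the regulatory edges (+1 for activation, -1 for repression) and to use the t-statistic of the slope as the activity estimate. Whether SCIRA uses exactly this estimator is an assumption here, not something stated in the abstract; `regulon_activity` and the toy data are hypothetical.

```python
import numpy as np


def regulon_activity(target_expr: np.ndarray, signs: np.ndarray) -> float:
    """t-statistic of the slope from an ordinary least-squares regression of a
    cell's standardized target-gene expression on the regulon's edge signs.

    target_expr: z-scored expression of the factor's targets in one cell.
    signs: +1 for activated targets, -1 for repressed targets (must not all be equal).
    """
    x = signs.astype(float)
    y = target_expr.astype(float)
    n = len(x)
    # Centring both variables fits the intercept implicitly.
    xc = x - x.mean()
    yc = y - y.mean()
    sxx = xc @ xc
    slope = (xc @ yc) / sxx
    residuals = yc - slope * xc
    # Standard error of the slope with n - 2 residual degrees of freedom.
    se = np.sqrt((residuals @ residuals) / (n - 2) / sxx)
    return float(slope / se)


# Hypothetical regulon of 20 targets: 10 activated, 10 repressed.
rng = np.random.default_rng(3)
signs = np.array([1, -1] * 10)
active_cell = signs + rng.normal(0.0, 0.5, size=20)  # targets follow the signs
inactive_cell = rng.normal(0.0, 1.0, size=20)        # no relationship to the signs
print(regulon_activity(active_cell, signs), regulon_activity(inactive_cell, signs))
```

A large positive score indicates that activated targets are up and repressed targets are down in that cell; a score near zero indicates no consistent pattern.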

Characterizing Intercellular Communication of Pan-Cancer Reveals SPP1+ Tumor-Associated Macrophage Expanded in Hypoxia and Promoting Cancer Malignancy Through Single-Cell RNA-Seq Data

Frontiers in Cell and Developmental Biology, 2021, Vol 9. DOI: 10.3389/fcell.2021.749210

Author(s): Jinfen Wei, Zixi Chen, Meiling Hu, Ziqing He, Dawei Jiang, ...

Keyword(s): Single Cell, Cancer Cells, Cancer Progression, Epithelial Mesenchymal Transition, Single Cells, Matrix Remodeling, RNA-seq, Mesenchymal Transition, Tumor Associated Macrophage, Cancer Types

Hypoxia is a characteristic of the tumor microenvironment (TME) and a major contributor to tumor progression. Yet the identification of tumor-associated non-malignant cell subtypes at single-cell resolution, and how these subtypes influence cancer progression in a hypoxic TME, remain largely unexplored. Here, we used RNA-seq data of 424,194 single cells from 108 patients to identify the subtypes of cancer cells, stromal cells, and immune cells; to evaluate their hypoxia scores; and to uncover potential interaction signals between these cells in vivo across six cancer types. We identified an SPP1+ tumor-associated macrophage (TAM) subpopulation that potentially enhances epithelial–mesenchymal transition (EMT) by interacting with cancer cells in a paracrine manner. We prioritized SPP1 as a TAM-secreted factor acting on cancer cells and found significantly enhanced migration and invasion in A549 lung cancer cells induced by recombinant SPP1 protein. In addition, prognostic analysis indicated that higher SPP1 expression was associated with worse clinical outcomes in six cancer types. SPP1 expression was higher in hypoxia-high macrophages in the single-cell data, which was further validated in vitro: SPP1 was upregulated in macrophages cultured under hypoxic compared with normoxic conditions. Additionally, a differential analysis demonstrated that hypoxia potentially influences extracellular matrix remodeling, glycolysis, and interleukin-10 signal activation in various cancer types. Our work clarifies the mechanisms underlying the intricate interactions between different cell subtypes within the hypoxic TME and proposes guidelines for developing therapeutic targets specifically for patients with a high proportion of SPP1+ TAMs in hypoxic lesions.

DEsingle for detecting three types of differential expression in single-cell RNA-seq data

2017. DOI: 10.1101/173997. Cited by ~1

Author(s): Zhun Miao, Ke Deng, Xiaowo Wang, Xuegong Zhang

Keyword(s): Single Cell, Differential Expression, Negative Binomial, Single Cells, R Package, Supplementary Information, Binomial Model, Supplementary Data, RNA-seq, Real Zeros

Summary: The excessive number of zeros in single-cell RNA-seq data includes “real” zeros due to the on-off nature of gene transcription in single cells and “dropout” zeros due to technical reasons. Existing differential expression (DE) analysis methods cannot distinguish these two types of zeros. We developed an R package, DEsingle, which employs a zero-inflated negative binomial model to estimate the proportions of real and dropout zeros and to define and detect three types of DE genes in single-cell RNA-seq data with higher accuracy.

Availability and implementation: The R package DEsingle is freely available at https://github.com/miaozhun/DEsingle and is under Bioconductor’s consideration.

Supplementary information: Supplementary data are available at bioRxiv online.
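
The zero-inflated negative binomial (ZINB) model named above mixes a point mass at zero (weight pi) with a negative binomial (NB) distribution, so a zero can arise from either component: for an NB with mean mu and size r, P(Y = 0) = pi + (1 - pi) * (r / (r + mu))^r. The sketch below evaluates this in plain Python and splits the probability of a zero between the two components. It is not DEsingle's estimation code (no model fitting is shown), the function names are hypothetical, and which component the package interprets as "real" versus "dropout" zeros is not asserted here.

```python
import math


def nb_pmf(y: int, mu: float, r: float) -> float:
    """Negative binomial P(Y = y) with mean mu > 0 and size r > 0
    (variance mu + mu**2 / r), computed on the log scale for stability."""
    log_p = (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
             + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))
    return math.exp(log_p)


def zinb_pmf(y: int, pi: float, mu: float, r: float) -> float:
    """Zero-inflated NB: extra point mass pi at zero, NB with weight 1 - pi."""
    p = (1.0 - pi) * nb_pmf(y, mu, r)
    return pi + p if y == 0 else p


def zero_shares(pi: float, mu: float, r: float):
    """Return P(Y = 0) and the fractions of zeros that come from the
    point mass and from the NB component, respectively."""
    p0 = zinb_pmf(0, pi, mu, r)
    from_point_mass = pi / p0
    return p0, from_point_mass, 1.0 - from_point_mass


# Hypothetical gene: 30% zero inflation, NB mean 2 and size 2.
p0, share_inflated, share_nb = zero_shares(pi=0.3, mu=2.0, r=2.0)
print(p0, share_inflated, share_nb)
```

With mu = 2 and r = 2 the NB alone gives P(0) = (2/4)^2 = 0.25, so p0 = 0.3 + 0.7 * 0.25 = 0.475, and about 63% (0.3 / 0.475) of the zeros come from the point mass.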

Clustering single cells: a review of approaches on high- and low-depth single-cell RNA-seq data

Briefings in Functional Genomics, 2018, Vol 18 (6), pp. 434-434. DOI: 10.1093/bfgp/ely001. Cited by ~3

Author(s): Vilas Menon

Keyword(s): Single Cell, Single Cells, RNA-seq
