Laplacian eigenmaps and principal curves for high resolution pseudotemporal ordering of single-cell RNA-seq profiles

Advances in RNA-seq technologies provide unprecedented insight into the variability and heterogeneity of gene expression at the single-cell level. However, such data offer only a snapshot of the transcriptome, whereas it is often the progression of cells through dynamic biological processes that is of interest. As a result, one outstanding challenge is to infer such progressions by ordering gene expression profiles from single-cell data alone, a task known as the cell ordering problem. Here, we introduce a new method that constructs a low-dimensional non-linear embedding of the data using Laplacian eigenmaps before assigning each cell a pseudotime using principal curves. We characterise, on a theoretical level, why our method is more robust to the high levels of noise typical of single-cell RNA-seq data, before demonstrating its utility on two existing datasets of differentiating cells.
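The two-stage idea in this abstract (embed with Laplacian eigenmaps, then order cells along the embedding) can be illustrated with a minimal NumPy sketch. Everything here is an assumption made for illustration rather than the authors' implementation: the dense heat-kernel graph, the median-distance kernel width, the toy data, and the function names `laplacian_eigenmap` and `pseudotime_from_embedding`. In particular, the principal-curve fit is replaced by a much simpler stand-in that ranks cells along the first non-trivial embedding coordinate.

```python
import numpy as np


def laplacian_eigenmap(X, n_components=2, sigma=None):
    """Embed rows of X (cells x genes) using a dense heat-kernel graph.

    Solves the generalised problem L y = lambda D y via the symmetric
    normalised Laplacian and drops the trivial constant eigenvector.
    """
    X = np.asarray(X, dtype=float)
    sq = np.sum(X ** 2, axis=1)
    # Squared Euclidean distances between all pairs of cells.
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    if sigma is None:
        # Assumed heuristic: kernel width = median pairwise distance.
        sigma = np.sqrt(np.median(d2[np.triu_indices_from(d2, k=1)]))
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    deg = W.sum(axis=1)
    inv_sqrt = 1.0 / np.sqrt(deg)
    L_sym = np.eye(len(X)) - inv_sqrt[:, None] * W * inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L_sym)  # eigenvalues in ascending order
    # Map back to generalised eigenvectors and skip the trivial one.
    return inv_sqrt[:, None] * vecs[:, 1:n_components + 1]


def pseudotime_from_embedding(Y):
    """Stand-in for the principal-curve step: rank cells along coordinate 1."""
    order = np.argsort(Y[:, 0])
    pseudotime = np.empty(len(Y))
    pseudotime[order] = np.linspace(0.0, 1.0, len(Y))
    return pseudotime


# Toy data: 30 "cells" progressing along a curved path in 3 "genes".
rng = np.random.default_rng(0)
t_true = rng.permutation(np.linspace(0.0, 1.0, 30))
X = np.column_stack([t_true, t_true ** 2, np.sin(t_true)])
X += rng.normal(scale=0.01, size=X.shape)

pt = pseudotime_from_embedding(laplacian_eigenmap(X))
```

The sign of an eigenvector is arbitrary, so the recovered pseudotime may run in either direction; in practice the direction would be fixed using known marker genes or a known starting cell.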


Contrastive Cycle Adversarial Autoencoders for Single-cell Multi-omics Alignment and Integration

DOI: 10.1101/2021.12.12.472268, 2021

Author(s): Xuesong Wang, Zhihang Hu, Tingyang Yu, Ruijie Wang, Yumeng Wei, ...

Keyword(s): Single Cell, Simulated Data, Dimensional Manifold, Omics Data, Rna Seq, High Dimensions, Low Dimensional, Low Dimensional Manifold, Cell Data, Insight Into

Multi-modality data are ubiquitous in biology, especially now that we have entered the multi-omics era, in which we can measure the same biological object (cell) from different aspects (omics) to provide a more comprehensive insight into the cellular system. When dealing with such multi-omics data, the first step is to determine the correspondence among different modalities. In other words, we should match data from different spaces corresponding to the same object. This problem is particularly challenging in the single-cell multi-omics scenario. First, such data are very sparse with extremely high dimensions. Second, matched single-cell multi-omics data are rare and hard to collect. Furthermore, due to the limitations of the experimental environment, the data are usually highly noisy. To advance single-cell multi-omics research, we address the above challenges by proposing a novel framework to align and integrate single-cell RNA-seq data and single-cell ATAC-seq data. Our approach can efficiently map the above data, with their high sparsity and noise, from different spaces to a low-dimensional manifold in a unified space, making the downstream alignment and integration straightforward. Compared with other state-of-the-art methods, our method performs better on both simulated and real single-cell data. The proposed method is helpful for single-cell multi-omics research. The improvement in integration on the simulated data is significant.


A novel single-cell based method for breast cancer prognosis

DOI: 10.1101/2020.04.26.062794, 2020

Author(s): Xiaomei Li, Lin Liu, Greg Goodall, Andreas Schreiber, Taosheng Xu, ...

Keyword(s): Breast Cancer, Gene Expression, Single Cell, Tumor Heterogeneity, Breast Cancer Prognosis, Cancer Prognosis, Biological Processes, Expression Data, Rna Seq, Novel Method

Abstract: Breast cancer prognosis is challenging due to the heterogeneity of the disease. Various computational methods using bulk RNA-seq data have been proposed for breast cancer prognosis. However, these methods suffer from limited performance or ambiguous biological relevance, as a result of neglecting intra-tumor heterogeneity. Recently, single-cell RNA-sequencing (scRNA-seq) has emerged for studying tumor heterogeneity at the cellular level. In this paper, we propose a novel method, scPrognosis, to improve breast cancer prognosis with scRNA-seq data. scPrognosis uses the scRNA-seq data of the biological process Epithelial-to-Mesenchymal Transition (EMT). It first infers the EMT pseudotime and a dynamic gene co-expression network, then uses an integrative model to select genes important in EMT based on their expression variation and differentiation in different stages of EMT, and their roles in the dynamic gene co-expression network. To validate and apply the selected signatures to breast cancer prognosis, we use them as the features to build a prediction model with bulk RNA-seq data. The experimental results show that scPrognosis outperforms other benchmark breast cancer prognosis methods that use bulk RNA-seq data. Moreover, the dynamic changes in the expression of the selected signature genes in EMT may provide clues to the link between EMT and clinical outcomes of breast cancer. scPrognosis will also be useful when applied to scRNA-seq datasets of biological processes other than EMT.

Author summary: Various computational methods have been developed for breast cancer prognosis. However, those methods mainly use the gene expression data generated by bulk RNA sequencing techniques, which average the expression level of a gene across different cell types. As breast cancer is a heterogeneous disease, bulk gene expression may not be the ideal resource for cancer prognosis. In this study, we propose a novel method to improve breast cancer prognosis using scRNA-seq data. The proposed method has been applied to the EMT scRNA-seq dataset to identify breast cancer signatures for prognosis. In comparison with existing methods for breast cancer prognosis based on bulk expression data, our method shows better performance. Our single-cell-based signatures provide clues to the relation between EMT and clinical outcomes of breast cancer. In addition, the proposed method can also be useful when applied to scRNA-seq datasets of biological processes other than EMT.


Meta-Transcriptome Detector (MTD): a novel pipeline for metatranscriptome analysis of bulk and single-cell RNAseq data

DOI: 10.1101/2021.11.16.468881, 2021

Author(s): Fei Wu, Yaozhong Liu, Binhua Ling

Keyword(s): Gene Expression, Single Cell, Host Cells, Host Responses, Rna Seq, Cell Level, Gene Expressions, Rnaseq Data, Software Program, User Friendly

RNA-seq data contain not only host transcriptomes but also non-host information that comprises transcripts from active microbiota in the host cells. Therefore, metatranscriptomics can reveal gene expression of the entire microbial community in a given sample. However, there is no single tool that can simultaneously analyze host-microbiota interactions and quantify the microbiome at the single-cell level, particularly for users with limited expertise in bioinformatics. Here, we developed a novel software program that can comprehensively and synergistically analyze gene expression of the host and microbiome, as well as their association, using bulk and single-cell RNA-seq data. Our pipeline, named Meta-Transcriptome Detector (MTD), can identify and quantify the microbiome extensively, including viruses, bacteria, protozoa, fungi, plasmids, and vectors. MTD is easy to install and is user-friendly. This novel software program empowers researchers to study the interactions between microbiota and the host by analyzing gene expression and pathways, which provides further insights into host responses to microorganisms.


Biological Process Activity Transformation of Single Cell Gene Expression for Cross-Species Alignment

DOI: 10.1101/555268, 2019

Author(s): Hongxu Ding, Andrew Blair, Ying Yang, Joshua M. Stuart

Keyword(s): Gene Expression, Single Cell, Biological Process, Biological Processes, Rna Seq, Gene Set, Cell Gene Expression, Cell Gene

Abstract: The maintenance and transition of cellular states are controlled by biological processes. Here we present a gene set-based transformation of single-cell RNA-Seq data into biological process activities that provides a robust description of cellular states. Moreover, as these activities represent species-independent descriptors, they facilitate the alignment of single-cell states across different organisms.
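To make the notion of a gene set-based transformation concrete, the following minimal sketch converts a cells-by-genes matrix into a cells-by-processes activity matrix. The abstract does not say how activities are computed, so the scoring rule used here (mean per-gene z-score over the members of each set) is an assumption chosen for simplicity, and the gene names, gene sets, and the function name `process_activity` are hypothetical.

```python
import numpy as np


def process_activity(expr, genes, gene_sets):
    """Score each cell for each gene set as the mean per-gene z-score.

    expr: cells x genes array; genes: list of column names;
    gene_sets: dict mapping process name -> list of gene names.
    Genes in a set that are absent from `genes` are ignored.
    """
    expr = np.asarray(expr, dtype=float)
    mu = expr.mean(axis=0)
    sd = expr.std(axis=0)
    sd[sd == 0] = 1.0  # constant genes contribute a z-score of zero
    z = (expr - mu) / sd
    col = {g: i for i, g in enumerate(genes)}
    names = sorted(gene_sets)
    scores = np.zeros((expr.shape[0], len(names)))
    for j, name in enumerate(names):
        idx = [col[g] for g in gene_sets[name] if g in col]
        if idx:
            scores[:, j] = z[:, idx].mean(axis=1)
    return names, scores


# Hypothetical toy data: 4 cells, 4 genes, 2 processes.
genes = ["g1", "g2", "g3", "g4"]
expr = [
    [5, 4, 0, 1],
    [6, 5, 1, 0],
    [0, 1, 5, 6],
    [1, 0, 4, 5],
]
gene_sets = {
    "process_A": ["g1", "g2"],
    "process_B": ["g3", "g4", "g_missing"],
}
names, scores = process_activity(expr, genes, gene_sets)
```

Because the output columns are indexed by process rather than by gene, two species scored with their own orthologous gene sets would end up in the same coordinate system, which is the property the abstract relies on for cross-species alignment.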


A statistical framework for differential pseudotime analysis with multiple single-cell RNA-seq samples

DOI: 10.1101/2021.07.10.451910, 2021

Author(s): Wenpin Hou, Zhicheng Ji, Zeyu Chen, E John Wherry, Stephanie C Hicks, ...

Keyword(s): Gene Expression, Single Cell, Biological Processes, Rna Seq, Experimental Conditions, Computational Framework, Statistical Framework, Gene Regulatory, Multiple Samples, False Discoveries

Pseudotime analysis with single-cell RNA-sequencing (scRNA-seq) data has been widely used to study dynamic gene regulatory programs along continuous biological processes. While many computational methods have been developed to infer the pseudotemporal trajectories of cells within a biological sample, methods that compare pseudotemporal patterns across multiple samples (or replicates) from different experimental conditions are lacking. Lamian is a comprehensive and statistically rigorous computational framework for differential multi-sample pseudotime analysis. It can be used to identify changes in a biological process associated with sample covariates, such as different biological conditions, and also to detect changes in gene expression, cell density, and topology of a pseudotemporal trajectory. Unlike existing methods that ignore sample variability, Lamian draws statistical inference after accounting for cross-sample variability and hence substantially reduces sample-specific false discoveries that are not generalizable to new samples. Using both simulations and real scRNA-seq data, including an analysis of differential immune response programs between COVID-19 patients with different disease severity levels, we demonstrate the advantages of Lamian in decoding cellular gene expression programs in continuous biological processes.


Single Cell Transcriptome-Based Dissection of Lineage Fate Decisions in Myelopoiesis

Blood, DOI: 10.1182/blood.v124.21.1395.1395, 2014, Vol 124 (21), pp. 1395-1395

Author(s): Andre Olsson, H. Leighton Grimes, Virendra K Chaudhri, Philip Dexheimer, Bruce J Aronow, ...

Keyword(s): Single Cell, Cell Fate, Molecular Mechanisms, Conflicts Of Interest, Expression Patterns, Rna Seq, Cell Transcriptome, Single Cell Transcriptome, Cell Data, Insight Into

Abstract: In spite of tremendous advances in the analysis of hematopoietic progenitors and transcription factors that give rise to different lineages, molecular insight into the mechanisms that underlie cell fate choice at the level of individual cells is lacking. We utilized single-cell RNA sequencing of murine granulocyte-monocyte progenitors (GMPs) to analyze the molecular basis of cell fate choice. Over 200 libraries were generated with average read depths of 4 million per library and an expressed gene call of over 3,800 genes with FPKM >3. Our data reveal a varied but coherent spectrum of gene expression patterns in individual murine GMPs. The majority of cells could be clustered into ones expressing either granulocytic or monocytic genes, suggesting that they were primed for lineage determination. A minority of GMPs expressed a mixed-lineage pattern of genes. The single-cell data suggested an antagonistic transcription factor circuit involving Gfi1 and IRF8 that was validated with both loss- and gain-of-function experiments in GMPs. Our data highlight the utility of single-cell RNA-Seq analysis to reveal molecular mechanisms controlling lineage fate decisions in hematopoiesis.

Disclosures: No relevant conflicts of interest to declare.


Visualization and analysis of single-cell RNA-seq data by kernel-based similarity learning

DOI: 10.1101/052225, 2016, Cited By ~ 7

Author(s): Bo Wang, Junjie Zhu, Emma Pierson, Daniele Ramazzotti, Serafim Batzoglou

Keyword(s): Gene Expression, Single Cell, High Throughput, Cell Populations, Gene Expression Measurement, Data Sets, Similarity Learning, Rna Seq, High Level, Cell Data

Abstract: Single-cell RNA-seq technologies enable high-throughput gene expression measurement of individual cells, and allow the discovery of heterogeneity within cell populations. Measurement of cell-to-cell gene expression similarity is critical to identification, visualization and analysis of cell populations. However, single-cell data introduce challenges to conventional measures of gene expression similarity because of the high level of noise, outliers and dropouts. Here, we propose a novel similarity-learning framework, SIMLR (single-cell interpretation via multi-kernel learning), which learns an appropriate distance metric from the data for dimension reduction, clustering and visualization applications. Benchmarking against state-of-the-art methods for these applications, we used SIMLR to re-analyse seven representative single-cell data sets, including high-throughput droplet-based data sets with tens of thousands of cells. We show that SIMLR greatly improves clustering sensitivity and accuracy, as well as the visualization and interpretability of the data.


Automatic identification of relevant genes from low-dimensional embeddings of single-cell RNA-seq data

Bioinformatics, DOI: 10.1093/bioinformatics/btaa198, 2020, Vol 36 (15), pp. 4291-4295

Author(s): Philipp Angerer, David S Fischer, Fabian J Theis, Antonio Scialdone, Carsten Marr

Keyword(s): Single Cell, Principal Component, R Package, Ease Of Use, Supplementary Information, Automatic Identification, Biological Processes, Rna Seq, Sequencing Data, Low Dimensional

Abstract

Motivation: Dimensionality reduction is a key step in the analysis of single-cell RNA-sequencing data. It produces a low-dimensional embedding for visualization and as a calculation base for downstream analysis. Nonlinear techniques are most suitable to handle the intrinsic complexity of large, heterogeneous single-cell data. However, with no linear relation between gene and embedding coordinate, there is no way to extract the identity of genes driving any cell’s position in the low-dimensional embedding, making it difficult to characterize the underlying biological processes.

Results: In this article, we introduce the concepts of local and global gene relevance to compute an equivalent of principal component analysis loadings for non-linear low-dimensional embeddings. Global gene relevance identifies drivers of the overall embedding, while local gene relevance identifies those of a defined sub-region. We apply our method to single-cell RNA-seq datasets from different experimental protocols and to different low-dimensional embedding techniques. This shows our method’s versatility to identify key genes for a variety of biological processes.

Availability and implementation: To ensure reproducibility and ease of use, our method is released as part of destiny 3.0, a popular R package for building diffusion maps from single-cell transcriptomic data. It is readily available through Bioconductor.

Supplementary information: Supplementary data are available at Bioinformatics online.


SOMSC: Self-Organization-Map for High-Dimensional Single-Cell Data of Cellular States and Their Transitions

DOI: 10.1101/124693, 2017, Cited By ~ 1

Author(s): Tao Peng, Qing Nie

Keyword(s): Gene Expression, Single Cell, Gene Expression Data, Single Cells, High Dimensional, Expression Data, Rna Seq, Cell Gene Expression, Cell Data, Cell Gene

Abstract: Measurement of gene expression levels for multiple genes in single cells provides a powerful approach to study heterogeneity of cell populations and cellular plasticity. While the expression levels of multiple genes in each cell are available in such data, the potential connections among the cells (e.g. the cellular state transition relationship) are not directly evident from the measurement. Classifying the cellular states, identifying the transitions among those states, and extracting the pseudotime ordering of cells are challenging due to the noise in the data and the high dimensionality in the number of genes. In this paper we adapt the classical self-organizing-map (SOM) approach for single-cell gene expression data (SOMSC), such as those based on single-cell qPCR and single-cell RNA-seq. In SOMSC, a cellular state map (CSM) is derived and employed to identify cellular states inherited in the population of the measured single cells. Cells located in the same basin of the CSM are considered to be in one cellular state, while barriers among the basins in the CSM provide information on transitions among the cellular states. A cellular state transition path (e.g. differentiation) and a temporal ordering of the measured single cells are consequently obtained. In addition, SOMSC can estimate the cellular state replication probability and transition probabilities. Applied to a set of synthetic data, one single-cell qPCR data set on mouse early embryonic development and two single-cell RNA-seq data sets, SOMSC is effective in capturing the cellular states and transitions present in high-dimensional single-cell data. This approach will have broader applications to analyzing cellular fate specification and cell lineages using single-cell gene expression data.


SDImpute: A statistical block imputation method based on cell-level and gene-level information for dropouts in single-cell RNA-seq data

PLoS Computational Biology, DOI: 10.1371/journal.pcbi.1009118, 2021, Vol 17 (6), pp. e1009118

Author(s): Jing Qi, Yang Zhou, Zicen Zhao, Shuilin Jin

Keyword(s): Gene Expression, Single Cell, Differential Expression Analysis, Cell Types, Rna Seq, Cell Level, Gene Level, Level Information, Downstream Analysis, Gene Expression Levels

Single-cell RNA sequencing (scRNA-seq) technologies obtain gene expression at single-cell resolution and provide a tool for exploring cell heterogeneity and cell types. Owing to the low amount of mRNA extracted per cell, scRNA-seq data exhibit a large number of dropouts, which hinders downstream analysis of the data. We propose a statistical method, SDImpute (Single-cell RNA-seq Dropout Imputation), to implement block imputation for dropout events in scRNA-seq data. SDImpute automatically identifies the dropout events based on the gene expression levels and the variations of gene expression across similar cells and similar genes, and it implements block imputation for dropouts by utilizing gene expression unaffected by dropouts from similar cells. In the experiments, the results on simulated and real datasets suggest that SDImpute is an effective tool to recover the data and preserve the heterogeneity of gene expression across cells. Compared with state-of-the-art imputation methods, SDImpute improves the accuracy of downstream analysis, including clustering, visualization, and differential expression analysis.
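The general idea of borrowing expression from similar cells can be shown with a deliberately simple sketch. This is not SDImpute's procedure: the abstract does not give its dropout-identification statistics or block structure, so the toy rule below treats every zero as a candidate dropout and fills it with the mean non-zero value of that gene among the k most correlated cells. The function name `impute_zeros_from_similar_cells`, the choice of Pearson correlation, and the toy matrix are all assumptions.

```python
import numpy as np


def impute_zeros_from_similar_cells(expr, k=2):
    """Fill zeros with the mean non-zero value of the same gene among each
    cell's k most correlated cells (toy rule, not SDImpute's statistics)."""
    expr = np.asarray(expr, dtype=float)
    n_cells = expr.shape[0]
    corr = np.corrcoef(expr)         # cell-by-cell Pearson correlation
    np.fill_diagonal(corr, -np.inf)  # a cell is not its own neighbour
    imputed = expr.copy()
    for i in range(n_cells):
        neighbours = np.argsort(corr[i])[::-1][:k]
        for g in np.flatnonzero(expr[i] == 0):
            vals = expr[neighbours, g]
            vals = vals[vals > 0]
            if vals.size:            # keep the zero if no neighbour expresses g
                imputed[i, g] = vals.mean()
    return imputed


# Hypothetical toy matrix: rows are cells, columns are genes.
# Cells 0-2 resemble one type, cells 3-5 another; two entries are dropouts.
expr = np.array([
    [10, 8, 1, 2],
    [9, 0, 2, 1],    # gene 1 dropped out
    [11, 9, 1, 2],
    [1, 2, 10, 9],
    [2, 1, 9, 0],    # gene 3 dropped out
    [1, 2, 11, 10],
], dtype=float)

imputed = impute_zeros_from_similar_cells(expr, k=2)
```

A rule this naive would also overwrite genuine biological zeros, which is why a method such as the one described above first decides which zeros are dropout events before imputing them.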
