Continuous-state HMMs for modeling time-series single-cell RNA-Seq data

Abstract Motivation Methods for reconstructing developmental trajectories from time-series single-cell RNA-Seq (scRNA-Seq) data can be largely divided into two categories. The first, often referred to as pseudotime ordering methods, are deterministic and rely on dimensionality reduction followed by an ordering step. The second learns a probabilistic branching model to represent the developmental process. While both types have been successful, each suffers from shortcomings that can impact their accuracy. Results We developed a new method based on continuous-state HMMs (CSHMMs) for representing and modeling time-series scRNA-Seq data. We define the CSHMM model and provide efficient learning and inference algorithms that allow the method to determine both the structure of the branching process and the assignment of cells to these branches. Analyzing several developmental single-cell datasets, we show that the CSHMM method accurately infers branching topology and correctly and continuously assigns cells to paths, improving upon prior methods proposed for this task. Analysis of genes based on the continuous cell assignment identifies known and novel markers for different cell types. Availability and implementation Software and Supporting website: www.andrew.cmu.edu/user/chiehl1/CSHMM/ Supplementary information Supplementary data are available at Bioinformatics online.
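
The two-step recipe attributed above to pseudotime ordering methods (dimensionality reduction, then an ordering step) can be illustrated with a minimal sketch. It assumes PCA as the reduction and ordering by the first principal component, with the sampling times used only to fix the arbitrary sign of that component; `pca_pseudotime` is a hypothetical helper and is not taken from CSHMM or any other published tool.

```python
import numpy as np


def pca_pseudotime(expr, times=None):
    """Toy pseudotime: project cells onto the first principal component.

    expr  : cells x genes matrix (e.g. log-normalized counts)
    times : optional sampling time of each cell, used only to orient the axis
    Returns one value per cell, rescaled to [0, 1].
    """
    centered = expr - expr.mean(axis=0)
    # Dimensionality reduction: the first right-singular vector is PC1.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    pc1 = centered @ vt[0]
    # The sign of a principal component is arbitrary; flip it so that
    # pseudotime increases with the experimental sampling time.
    if times is not None and np.corrcoef(pc1, times)[0, 1] < 0:
        pc1 = -pc1
    # Ordering step: rescale the 1D coordinate to [0, 1].
    return (pc1 - pc1.min()) / (pc1.max() - pc1.min())


# Synthetic time series: one gene rises, one falls, one rises slowly.
rng = np.random.default_rng(0)
times = np.linspace(0.0, 1.0, 50)
expr = np.column_stack([times, 1.0 - times, 0.5 * times])
expr = expr + rng.normal(0.0, 0.02, size=expr.shape)
pt = pca_pseudotime(expr, times)
order = np.argsort(pt)  # cells sorted along the inferred trajectory
```

Because the result is a single deterministic ordering, this kind of approach has no notion of branches or assignment uncertainty, which is the gap the probabilistic branching models in the second category try to fill.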

Continuous State HMMs for Modeling Time Series Single Cell RNA-Seq Data

10.1101/380568 ◽

2018 ◽

Author(s):

Chieh Lin ◽

Ziv Bar-Joseph

Keyword(s):

Time Series ◽

Single Cell ◽

Developmental Process ◽

Developmental Trajectories ◽

Cell Types ◽

Supplementary Information ◽

Rna Seq ◽

Inference Algorithms ◽

Continuous State ◽

Efficient Learning

Abstract Motivation Methods for reconstructing developmental trajectories from time series single cell RNA-Seq (scRNA-Seq) data can be largely divided into two categories. The first, often referred to as pseudotime ordering methods, are deterministic and rely on dimensionality reduction followed by an ordering step. The second learns a probabilistic branching model to represent the developmental process. While both types have been successful, each suffers from shortcomings that can impact their accuracy. Results We developed a new method based on continuous state HMMs (CSHMMs) for representing and modeling time series scRNA-Seq data. We define the CSHMM model and provide efficient learning and inference algorithms that allow the method to determine both the structure of the branching process and the assignment of cells to these branches. Analyzing several developmental single cell datasets, we show that the CSHMM method accurately infers branching topology and correctly and continuously assigns cells to paths, improving upon prior methods proposed for this task. Analysis of genes based on the continuous cell assignment identifies known and novel markers for different cell types. Availability Software and Supporting website: www.andrew.cmu.edu/user/chiehll/CSHMM/ Supplementary information Supplementary data are available at Bioinformatics online.
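
The idea of continuously assigning a cell to a path can be sketched as choosing, for each cell, the edge of a branching structure and a position t in [0, 1] along that edge. This is a simplified sketch, not the published CSHMM: it assumes expected expression moves linearly between the mean of an edge's start state and end state and that noise is isotropic Gaussian, so the most likely position is a clipped least-squares projection. The state names, means and `assign_cell` are hypothetical, and structure learning (which CSHMM performs) is not shown.

```python
import numpy as np


def assign_cell(x, node_means, edges):
    """Assign one cell to the best edge and a continuous position t in [0, 1].

    Simplifying assumptions (not the published CSHMM emission model):
    expected expression moves linearly from the start-state mean to the
    end-state mean along an edge, and noise is isotropic Gaussian, so the
    maximum-likelihood position is the least-squares projection onto the edge.
    """
    best = None
    for a, b in edges:
        mu_a, mu_b = node_means[a], node_means[b]
        d = mu_b - mu_a
        # Closed-form least-squares position, clipped to the edge.
        t = float(np.clip(np.dot(x - mu_a, d) / np.dot(d, d), 0.0, 1.0))
        resid = x - (mu_a + t * d)
        sq_err = float(np.dot(resid, resid))  # lower = more likely under the Gaussian assumption
        if best is None or sq_err < best[2]:
            best = ((a, b), t, sq_err)
    return best


# Hypothetical branching structure: S0 -> S1, then S1 branches to S2 or S3.
node_means = {
    "S0": np.array([0.0, 0.0]),
    "S1": np.array([1.0, 1.0]),
    "S2": np.array([3.0, 1.0]),
    "S3": np.array([1.0, 3.0]),
}
edges = [("S0", "S1"), ("S1", "S2"), ("S1", "S3")]
edge, t, err = assign_cell(np.array([2.0, 1.1]), node_means, edges)
print(edge, t)  # ('S1', 'S2') 0.5
```

The cell lands halfway along the S1 to S2 branch rather than in a discrete state, which is the kind of continuous assignment the abstract contrasts with discrete-state models.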

scAIDE: clustering of large-scale single-cell RNA-seq data reveals putative and rare cell types

NAR Genomics and Bioinformatics ◽

10.1093/nargab/lqaa082 ◽

2020 ◽

Vol 2 (4) ◽

Author(s):

Kaikun Xie ◽

Yu Huang ◽

Feng Zeng ◽

Zehua Liu ◽

Ting Chen

Keyword(s):

Single Cell ◽

Large Scale ◽

Developmental Trajectories ◽

Cell Types ◽

Random Projection ◽

Good Representation ◽

Rna Seq ◽

Unsupervised Deep Learning ◽

High Level ◽

Computational Resources

Abstract Recent advancements in both single-cell RNA-sequencing technology and computational resources facilitate the study of cell types on global populations. Up to millions of cells can now be sequenced in one experiment; thus, accurate and efficient computational methods are needed for clustering and for the downstream assignment of putative and rare cell types. Here, we present a novel unsupervised deep learning clustering framework that is robust and highly scalable. To overcome the high level of noise, scAIDE first incorporates an autoencoder-imputation network with a distance-preserved embedding network (AIDE) to learn a good representation of data, and then applies a random projection hashing based k-means algorithm to accommodate the detection of rare cell types. We analyzed a 1.3 million neural cell dataset within 30 min, obtaining 64 clusters which were mapped to 19 putative cell types. In particular, we further identified three different neural stem cell developmental trajectories in these clusters. We also classified two subpopulations of malignant cells in a small glioblastoma dataset using scAIDE. We anticipate that scAIDE will provide a more in-depth understanding of cell development and diseases.
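
To give a rough feel for combining random projection hashing with k-means, here is a toy sketch under stated assumptions: each cell gets a binary code from the signs of random projections, every non-empty hash bucket seeds one centroid (so a small, distinct group can start with its own centroid), and ordinary Lloyd iterations refine the result. This is not scAIDE's actual RPH-kmeans algorithm, whose details differ; `rp_hash_kmeans`, `n_bits` and the synthetic data are hypothetical.

```python
import numpy as np


def _nearest(X, centroids):
    # Squared Euclidean distance from every cell to every centroid.
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)


def rp_hash_kmeans(X, n_bits=6, n_iter=20, seed=0):
    """Toy k-means whose initial centroids come from random projection hashing."""
    rng = np.random.default_rng(seed)
    centered = X - X.mean(axis=0)
    # Sign of each random projection gives one bit of the hash code.
    planes = rng.normal(size=(X.shape[1], n_bits))
    bits = (centered @ planes > 0).astype(int)
    codes = bits @ (2 ** np.arange(n_bits))  # integer bucket id per cell
    # One initial centroid per non-empty bucket (at most 2 ** n_bits).
    centroids = np.array([X[codes == c].mean(axis=0) for c in np.unique(codes)])
    for _ in range(n_iter):
        labels = _nearest(X, centroids)
        for k in range(len(centroids)):
            members = X[labels == k]
            if len(members):  # keep an emptied centroid where it was
                centroids[k] = members.mean(axis=0)
    return _nearest(X, centroids), centroids


# Two large groups and one rare group of 6 cells (synthetic 2D embedding).
rng = np.random.default_rng(1)
X = np.vstack([
    rng.normal([0.0, 0.0], 0.3, size=(200, 2)),
    rng.normal([8.0, 0.0], 0.3, size=(200, 2)),
    rng.normal([0.0, 8.0], 0.3, size=(6, 2)),
])
labels, centroids = rp_hash_kmeans(X)
```

In this toy the number of clusters equals the number of non-empty buckets rather than a user-chosen k, so it tends to over-cluster; the abstract's 64 clusters mapped to 19 putative cell types suggests a later mapping step, which is not shown here.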

Asc-Seurat – Analytical single-cell Seurat-based web application

10.1101/2021.03.19.436196 ◽

2021 ◽

Author(s):

WJ Pereira ◽

FM Almeida ◽

KM Balmant ◽

DC Rodriguez ◽

PM Triozzi ◽

...

Keyword(s):

Single Cell ◽

Web Application ◽

Iterative Process ◽

Developmental Trajectories ◽

Cell Types ◽

Biological Information ◽

Supplementary Information ◽

Popular Approach ◽

Single Cell Rna Sequencing ◽

User Friendly

Abstract Summary Single-cell RNA sequencing (scRNA-seq) has become a popular approach for studying the transcriptome, providing a powerful tool for discovering and characterizing cell types and their developmental trajectories. However, scRNA-seq analysis is complex, requiring a continuous, iterative process to refine the data processing and uncover relevant biological information. We present Asc-Seurat, a feature-rich workbench, providing a user-friendly and easy-to-install web application encapsulating the necessary tools for an all-encompassing and fluid scRNA-seq data analysis. Availability and implementation Asc-Seurat is available at https://github.com/KirstLab/asc_seurat/ and released under GNU 3. Supplementary information Supplementary data are available at Bioinformatics online.

scTIM: seeking cell-type-indicative marker from single cell RNA-seq data by consensus optimization

Bioinformatics ◽

10.1093/bioinformatics/btz936 ◽

2019 ◽

Vol 36 (8) ◽

pp. 2474-2485 ◽

Cited By ~ 2

Author(s):

Zhanying Feng ◽

Xianwen Ren ◽

Yuan Fang ◽

Yining Yin ◽

Chutian Huang ◽

...

Keyword(s):

Single Cell ◽

Large Scale ◽

Cell Types ◽

Mouse Cell ◽

Supplementary Information ◽

Rna Seq ◽

Cell Type ◽

Robust Solution ◽

Development Trajectory ◽

Consensus Optimization

Abstract Motivation Single cell RNA-seq data offer new resources and resolution for studying cell type identity and its conversion. However, data analyses are challenging because of noise, sparsity and poor annotation at single cell resolution. Detecting cell-type-indicative markers is a promising way to aid denoising, clustering and cell type annotation. Results We developed a new method, scTIM, to reveal cell-type-indicative markers. scTIM is based on a multi-objective optimization framework that simultaneously maximizes gene specificity by considering the gene–cell relationship, maximizes genes’ ability to reconstruct the cell–cell relationship, and minimizes gene redundancy by considering the gene–gene relationship. Furthermore, consensus optimization is introduced to obtain a robust solution. Experimental results on three diverse single cell RNA-seq datasets show scTIM’s advantages in identifying cell types (clustering), annotating cell types and reconstructing cell development trajectories. Applying scTIM to the large-scale mouse cell atlas data identifies critical markers for 15 tissues as a ‘mouse cell marker atlas’, which allows us to investigate the identities of different tissues and subtle cell types within a tissue. scTIM will serve as a useful method for single cell RNA-seq data mining. Availability and implementation scTIM is freely available at https://github.com/Frank-Orwell/scTIM. Supplementary information Supplementary data are available at Bioinformatics online.
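
Two of the three objectives named above (high gene specificity, low gene–gene redundancy) can be illustrated with a simple greedy trade-off. This sketch deliberately swaps in a different technique: a greedy weighted-sum selection in the spirit of minimum-redundancy feature selection, not scTIM's multi-objective consensus optimization, and it omits the cell–cell reconstruction objective. Specificity is assumed here to be one minus the normalized entropy of a gene's expression across cells; `specificity`, `greedy_markers` and `lam` are hypothetical.

```python
import numpy as np


def specificity(expr):
    """1 - normalized entropy of each gene's expression across cells.

    Assumes every gene (column) has a nonzero total; higher = more specific.
    """
    n_cells = expr.shape[0]
    scores = []
    for g in range(expr.shape[1]):
        p = expr[:, g] / expr[:, g].sum()
        nz = p[p > 0]  # 0 * log(0) is treated as 0
        entropy = -(nz * np.log(nz)).sum()
        scores.append(1.0 - entropy / np.log(n_cells))
    return np.array(scores)


def greedy_markers(expr, k, lam=1.0):
    """Greedily pick k genes maximizing specificity - lam * redundancy."""
    spec = specificity(expr)
    selected = []
    for _ in range(k):
        best_g, best_score = None, -np.inf
        for g in range(expr.shape[1]):
            if g in selected:
                continue
            # Redundancy: strongest absolute correlation with a chosen gene.
            redundancy = max(
                (abs(np.corrcoef(expr[:, g], expr[:, s])[0, 1]) for s in selected),
                default=0.0,
            )
            score = spec[g] - lam * redundancy
            if score > best_score:
                best_g, best_score = g, score
        selected.append(best_g)
    return selected


# Gene 1 duplicates gene 0; gene 2 marks a different group; gene 3 is broad.
expr = np.zeros((20, 4))
expr[0:5, 0] = 5.0
expr[:, 1] = expr[:, 0]
expr[5:10, 2] = 5.0
expr[:, 3] = np.tile([1.0, 2.0], 10)
print(greedy_markers(expr, k=2))  # [0, 2]
```

The redundancy penalty is what keeps the duplicated gene 1 out of the panel even though it is just as specific as gene 0.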

Comprehensive evaluation of transcriptome-based cell-type quantification methods for immuno-oncology

Bioinformatics ◽

10.1093/bioinformatics/btz363 ◽

2019 ◽

Vol 35 (14) ◽

pp. i436-i445 ◽

Cited By ~ 71

Author(s):

Gregor Sturm ◽

Francesca Finotello ◽

Florent Petitprez ◽

Jitao David Zhang ◽

Jan Baumbach ◽

...

Keyword(s):

Single Cell ◽

Computational Methods ◽

Immune Cell ◽

Comprehensive Evaluation ◽

Cell Types ◽

R Package ◽

Supplementary Information ◽

Rna Seq ◽

Cell Type ◽

Real World Datasets

Abstract Motivation The composition and density of immune cells in the tumor microenvironment (TME) profoundly influence tumor progression and the success of anti-cancer therapies. Flow cytometry, immunohistochemistry staining or single-cell sequencing are often unavailable, so we rely on computational methods to estimate the immune-cell composition from bulk RNA-sequencing (RNA-seq) data. Various methods have been proposed recently, yet their capabilities and limitations have not been evaluated systematically. A general guideline leading the research community through cell type deconvolution is missing. Results We developed a systematic approach for benchmarking such computational methods and assessed the accuracy of tools at estimating nine different immune and stromal cell types from bulk RNA-seq samples. We used a single-cell RNA-seq dataset of ∼11 000 cells from the TME to simulate bulk samples of known cell type proportions, and validated the results using independent, publicly available gold-standard estimates. This allowed us to analyze and condense the results of more than a hundred thousand predictions to provide an exhaustive evaluation across seven computational methods over nine cell types and ∼1800 samples from five simulated and real-world datasets. We demonstrate that computational deconvolution performs with high accuracy for well-defined cell-type signatures and propose how fuzzy cell-type signatures can be improved. We suggest that future efforts should be dedicated to refining cell population definitions and finding reliable signatures. Availability and implementation A snakemake pipeline to reproduce the benchmark is available at https://github.com/grst/immune_deconvolution_benchmark. An R package allows the community to perform integrated deconvolution using different methods (https://grst.github.io/immunedeconv). Supplementary information Supplementary data are available at Bioinformatics online.
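
The benchmarking idea of summing single cells into pseudo-bulk samples with known cell type proportions, then checking whether a deconvolution method recovers those proportions, can be sketched as follows. The deconvolution step here is a stand-in: non-negative least squares against a cell-type signature matrix, which is not claimed to be any of the seven benchmarked methods. The profiles, function names and proportions are hypothetical, and the cells are noise-free so the recovery is exact.

```python
import numpy as np
from scipy.optimize import nnls


def simulate_bulk(cells, labels, proportions, n_cells, rng):
    """Pseudo-bulk sample: sum single cells drawn in known type proportions."""
    counts = np.round(np.asarray(proportions) * n_cells).astype(int)
    drawn = []
    for cell_type, n in enumerate(counts):
        pool = np.flatnonzero(labels == cell_type)
        drawn.append(cells[rng.choice(pool, size=n, replace=True)])
    return np.vstack(drawn).sum(axis=0)


def deconvolve(bulk, signature):
    """Estimate proportions by non-negative least squares: bulk ~ signature @ w."""
    w, _ = nnls(signature, bulk)
    return w / w.sum()


# Hypothetical profiles (rows = cell types, columns = genes), 20 cells per type.
profiles = np.array([[10.0, 0.0, 1.0, 0.0],
                     [0.0, 8.0, 1.0, 1.0],
                     [1.0, 1.0, 6.0, 3.0]])
labels = np.repeat(np.arange(3), 20)
cells = profiles[labels]  # noise-free cells keep the example exact
# Signature matrix: genes x cell types, mean expression of each type.
signature = np.vstack([cells[labels == k].mean(axis=0) for k in range(3)]).T
rng = np.random.default_rng(0)
truth = [0.5, 0.3, 0.2]
bulk = simulate_bulk(cells, labels, truth, n_cells=100, rng=rng)
estimate = deconvolve(bulk, signature)
```

With real single cells the drawn cells vary around their type means, so the estimate only approximates the known proportions; the size of that gap is what such a benchmark measures.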

Deep learning of gene relationships from single cell time-course expression data

10.1101/2020.09.21.306332 ◽

2020 ◽

Author(s):

Ye Yuan ◽

Ziv Bar-Joseph

Keyword(s):

Time Series ◽

Deep Learning ◽

Single Cell ◽

Time Course ◽

Expression Profiles ◽

Regulatory Gene ◽

Supplementary Information ◽

Expression Data ◽

Rna Seq ◽

Time Course Data

Abstract Motivation Time-course gene expression data has been widely used to infer regulatory and signaling relationships between genes. Most of the widely used methods for such analysis were developed for bulk expression data. Single cell RNA-Seq (scRNA-Seq) data offers several advantages, including the large number of expression profiles available and the ability to focus on individual cells rather than averages. However, this data also raises new computational challenges. Results Using a novel encoding for scRNA-Seq expression data, we develop deep learning methods for interaction prediction from time-course data. Our methods use a supervised framework that represents the data as a 3D tensor, and they train convolutional and recurrent neural networks (CNN and RNN) for predicting interactions. We tested our Time-course Deep Learning (TDL) models on five different time series scRNA-Seq datasets. As we show, TDL can accurately identify causal and regulatory gene-gene interactions and can also be used to assign new function to genes. TDL improves on prior methods for the above tasks and can be generally applied to new time series scRNA-Seq data. Availability and implementation Freely available at https://github.com/xiaoyeye/ Supplementary information Supplementary data are available at XXX online.
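
One plausible way to turn a gene pair in time-series scRNA-Seq data into a 3D tensor is to compute, for each sampling time, a normalized 2D histogram of the two genes' joint expression across the cells sampled then, and stack the histograms along a time axis. This is an assumption about the encoding, not TDL's published implementation; the bin count, normalization, `pair_tensor` and the synthetic genes are hypothetical. The resulting tensor would be the input to a CNN or RNN, which is not shown.

```python
import numpy as np


def pair_tensor(expr_a, expr_b, time_labels, bins=8):
    """Encode one gene pair as a (time points x bins x bins) tensor.

    Each slice is a normalized 2D histogram of the two genes' joint
    expression across the cells sampled at one time point.
    """
    lo = min(expr_a.min(), expr_b.min())
    hi = max(expr_a.max(), expr_b.max())
    if hi == lo:
        hi = lo + 1.0  # avoid zero-width bins for constant genes
    edges = np.linspace(lo, hi, bins + 1)  # shared grid across time points
    slices = []
    for t in np.unique(time_labels):
        mask = time_labels == t
        hist, _, _ = np.histogram2d(expr_a[mask], expr_b[mask], bins=[edges, edges])
        slices.append(hist / mask.sum())  # fraction of this time point's cells
    return np.stack(slices)


rng = np.random.default_rng(0)
time_labels = np.repeat([0, 1, 2], 30)  # three sampling times, 30 cells each
gene_a = rng.gamma(2.0, 1.0, size=90) + time_labels  # hypothetical regulator
gene_b = 0.8 * gene_a + rng.normal(0.0, 0.3, size=90)  # hypothetical target
tensor = pair_tensor(gene_a, gene_b, time_labels)
```

Using one shared bin grid for all time points keeps the slices comparable, so a shift of the joint distribution over time shows up as movement across the stacked images.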

Integrating Spatial Transcriptomics and Single-Cell RNA-seq Reveals the Gene Expression Profiling of the Human Embryonic Liver

Frontiers in Cell and Developmental Biology ◽

10.3389/fcell.2021.652408 ◽

2021 ◽

Vol 9 ◽

Author(s):

Xianliang Hou ◽

Yane Yang ◽

Ping Li ◽

Zhipeng Zeng ◽

Wenlong Hu ◽

...

Keyword(s):

Liver Disease ◽

Liver Regeneration ◽

Single Cell ◽

Developmental Process ◽

Fetal Liver ◽

Cell Types ◽

Developmental Time ◽

Clear Understanding ◽

Rna Seq ◽

Liver Replacement

The liver is one of the vital organs of the human body and plays an important role in metabolism and detoxification. Moreover, the fetal liver is one of the sites of hematopoiesis during ontogeny. Understanding how this complex organ develops during embryogenesis will yield insights into how functional liver replacement tissue can be engineered and how liver regeneration can be promoted. Here, we combine the advantages of single-cell RNA sequencing and Spatial Transcriptomics (ST) technology for unbiased analysis of fetal livers over developmental time from 8 post-conception weeks (PCW) and 17 PCW in humans. We systematically identified nine cell types and defined the developmental pathways of the major cell types. The results showed that blood cells in human fetal livers underwent rapid growth and immigration during the period studied in our experiments. We also identified the differentially expressed genes and metabolic changes in the developmental process of erythroid cells. In addition, we focused on the expression of liver disease-related genes and found that 17 published genes linked to liver disease are mainly expressed in megakaryocytes and endothelial cells and are hardly expressed in any other cell type. Together, our findings provide a comprehensive and clear understanding of the differentiation processes of all main cell types in the human fetal liver, which may provide reference data and information for liver disease treatment and liver regeneration.

Single-cell RNA-seq analysis maps the development of human fetal retina

10.1101/423830 ◽

2018 ◽

Cited By ~ 7

Author(s):

Yufeng Lu ◽

Wenyang Yi ◽

Qian Wu ◽

Suijuan Zhong ◽

Zhentao Zuo ◽

...

Keyword(s):

Single Cell ◽

Visual Information ◽

Developmental Trajectories ◽

Retinal Development ◽

Neuronal Cell ◽

Cell Types ◽

Retinal Cell ◽

Rna Seq ◽

Fetal Retina ◽

The Impact

Abstract Vision starts with image formation at the retina, which contains diverse neuronal cell types that extract, process, and relay visual information to higher order processing centers in the brain. Though there has been steady progress in defining retinal cell types, very little is known about retinal development in humans, which starts well before birth. In this study, we performed transcriptomic profiling of developing human fetal retina from gestational weeks 12 to 27 using single-cell RNA-seq (scRNA-seq) and used pseudotime analysis to reconstruct the developmental trajectories of retinogenesis. Our analysis reveals transcriptional programs driving differentiation toward four different cell types and suggests that Müller glia (MG) can serve as embryonic progenitors in early retinal development. In addition, we show that transcriptional differences separate retinal progenitor cells (RPCs) into distinct subtypes and use this information to reconstruct RPC developmental trajectories and cell fate. Our results support a hierarchical program of differentiation governing cell-type diversity in the developing human retina. In summary, our work details a comprehensive molecular classification of retinal cells, reconstructs their relationships, and paves the way for future mechanistic studies on the impact of gene regulation upon human retinogenesis.

Rejoinder for “Exponential-Family Embedding With Application to Cell Developmental Trajectories for Single-Cell RNA-Seq Data”

Journal of the American Statistical Association ◽

10.1080/01621459.2021.1892701 ◽

2021 ◽

Vol 116 (534) ◽

pp. 478-480

Author(s):

Kevin Z. Lin ◽

Jing Lei ◽

Kathryn Roeder

Keyword(s):

Single Cell ◽

Exponential Family ◽

Developmental Trajectories ◽

Rna Seq

ExperimentSubset: an R package to manage subsets of Bioconductor Experiment objects

Bioinformatics ◽

10.1093/bioinformatics/btab179 ◽

2021 ◽

Author(s):

Irzam Sarfraz ◽

Muhammad Asif ◽

Joshua D Campbell

Keyword(s):

Single Cell ◽

R Package ◽

Poor Quality ◽

Data Matrix ◽

Supplementary Information ◽

Data Provenance ◽

Rna Seq ◽

Efficient Management ◽

The Matrix ◽

The Relationship

Abstract Motivation R Experiment objects such as the SummarizedExperiment or SingleCellExperiment are data containers for storing one or more matrix-like assays along with associated row and column data. These objects have been used to facilitate the storage and analysis of high-throughput genomic data generated from technologies such as single-cell RNA sequencing. One common computational task in many genomics analysis workflows is to subset the data matrix before applying downstream analytical methods. For example, one may need to subset the columns of the assay matrix to exclude poor-quality samples or subset the rows of the matrix to select the most variable features. Traditionally, a second object is created that contains the desired subset of the assay from the original object. However, this approach is inefficient as it requires the creation of an additional object containing a copy of the original assay and leads to challenges with data provenance. Results To overcome these challenges, we developed an R package called ExperimentSubset, which is a data container that implements classes for efficient storage and streamlined retrieval of assays that have been subsetted by rows and/or columns. These classes are able to inherently provide data provenance by maintaining the relationship between the subsetted and parent assays. We demonstrate the utility of this package on a single-cell RNA-seq dataset by storing and retrieving subsets at different stages of the analysis while maintaining a lower memory footprint. Overall, ExperimentSubset is a flexible container for the efficient management of subsets. Availability and implementation ExperimentSubset package is available at Bioconductor: https://bioconductor.org/packages/ExperimentSubset/ and Github: https://github.com/campbio/ExperimentSubset. Supplementary information Supplementary data are available at Bioinformatics online.
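
ExperimentSubset itself is an R/Bioconductor package; the storage idea described above (keep one parent assay, record subsets as row and column indices instead of copies, and keep a parent pointer for provenance) can be illustrated language-neutrally with a small Python toy. The class and method names below are hypothetical and are not the package's API.

```python
from dataclasses import dataclass, field

import numpy as np


@dataclass
class SubsetContainer:
    """Toy container: one parent assay, subsets stored as index pairs."""
    assay: np.ndarray
    subsets: dict = field(default_factory=dict)

    def _indices(self, name):
        if name is None:  # the full parent assay
            return np.arange(self.assay.shape[0]), np.arange(self.assay.shape[1])
        s = self.subsets[name]
        return s["rows"], s["cols"]

    def create_subset(self, name, rows=None, cols=None, parent=None):
        # Resolve indices relative to the parent subset so that every stored
        # subset points straight back into the original assay (no copy made).
        base_rows, base_cols = self._indices(parent)
        rows = base_rows if rows is None else base_rows[np.asarray(rows)]
        cols = base_cols if cols is None else base_cols[np.asarray(cols)]
        self.subsets[name] = {"rows": rows, "cols": cols, "parent": parent}

    def get_subset(self, name):
        rows, cols = self._indices(name)
        return self.assay[np.ix_(rows, cols)]  # materialized only on request

    def lineage(self, name):
        # Provenance: walk parent pointers back to the original assay.
        chain = []
        while name is not None:
            chain.append(name)
            name = self.subsets[name]["parent"]
        return chain


assay = np.arange(20).reshape(4, 5)
c = SubsetContainer(assay)
c.create_subset("qc", cols=[0, 2, 4])  # e.g. drop poor-quality samples
c.create_subset("hvg", rows=[1, 3], parent="qc")  # e.g. keep variable features
print(c.get_subset("hvg").tolist())  # [[5, 7, 9], [15, 17, 19]]
print(c.lineage("hvg"))  # ['hvg', 'qc']
```

Storing only index vectors means each additional subset costs memory proportional to its dimensions rather than to its full matrix, which is the kind of saving the abstract reports.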
