Model-based branching point detection in single-cell data by K-branches clustering

Motivation: The identification of heterogeneities in cell populations using single-cell technologies such as single-cell RNA-Seq enables inference of cellular development and lineage trees. Several methods have been proposed for such inference from high-dimensional single-cell data. They typically assign each cell to a branch in a differentiation trajectory. However, they commonly assume specific geometries, such as tree-like developmental hierarchies, and lack statistically sound methods to decide on the number of branching events. Results: We present K-Branches, a solution to the above problem that locally fits half-lines to single-cell data using a clustering algorithm similar to K-Means. These half-lines are proxies for branches in the differentiation trajectory of cells. We propose a modified version of the GAP statistic for model selection, in order to decide on the number of lines that best describe the data locally. In this manner, we identify the location and number of subgroups of cells that are associated with branching events and full differentiation, respectively. We evaluate the performance of our method on single-cell RNA-Seq data describing the differentiation of myeloid progenitors during hematopoiesis, single-cell qPCR data of mouse blastocyst development, and artificial data. Availability: An R implementation of K-Branches is freely available at https://github.com/theislab/[email protected]
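
The idea of fitting half-lines with a K-Means-like loop can be illustrated with a minimal Python sketch. This is not the authors' R implementation: the function names, the fixed shared origin, the farthest-first initialisation, and the mean-direction update are all assumptions made for illustration, and the GAP-statistic model selection step is not covered.

```python
import numpy as np

def halfline_distance(points, origin, direction):
    """Distance from each row of `points` to the half-line origin + t * direction, t >= 0."""
    diff = points - origin
    norm = np.linalg.norm(direction)
    if norm == 0:
        # Degenerate direction: the half-line collapses to its origin.
        return np.linalg.norm(diff, axis=1)
    u = direction / norm
    # Projections that fall behind the origin are clipped to the origin itself.
    t = np.clip(diff @ u, 0.0, None)
    return np.linalg.norm(diff - np.outer(t, u), axis=1)

def init_directions(points, origin, k):
    """Deterministic farthest-first initialisation (an assumption of this sketch)."""
    diff = points - origin
    dirs = [diff[np.linalg.norm(diff, axis=1).argmax()]]
    while len(dirs) < k:
        d = np.min([halfline_distance(points, origin, v) for v in dirs], axis=0)
        dirs.append(diff[d.argmax()])
    return np.array(dirs, dtype=float)

def assign(points, origin, dirs):
    """Label each point with the index of its nearest half-line."""
    d = np.stack([halfline_distance(points, origin, v) for v in dirs], axis=1)
    return d.argmin(axis=1)

def fit_halflines(points, origin, k, n_iter=20):
    """Alternate between assigning points to half-lines and updating directions."""
    points = np.asarray(points, dtype=float)
    origin = np.asarray(origin, dtype=float)
    dirs = init_directions(points, origin, k)
    for _ in range(n_iter):
        labels = assign(points, origin, dirs)
        for j in range(k):
            members = points[labels == j] - origin
            # Simplified update: point the half-line at the mean of its members.
            if len(members) and np.linalg.norm(members.mean(axis=0)) > 0:
                dirs[j] = members.mean(axis=0)
    return assign(points, origin, dirs), dirs

# Toy data: three branches leaving a common origin in two dimensions.
branch_a = [(t, 0.0) for t in (1.0, 2.0, 3.0)]
branch_b = [(0.0, t) for t in (1.0, 2.0, 3.0)]
branch_c = [(-t, -t) for t in (1.0, 2.0, 3.0)]
toy_points = np.array(branch_a + branch_b + branch_c)
toy_labels, toy_dirs = fit_halflines(toy_points, origin=np.zeros(2), k=3)
```

In this toy example each branch should end up with its own label. Choosing the number of half-lines would additionally require a model selection criterion such as the modified GAP statistic mentioned in the abstract.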

Model-Based Approach to the Joint Analysis of Single-Cell Data on Chromatin Accessibility and Gene Expression

Statistical Science ◽ 10.1214/19-sts714 ◽ 2020 ◽ Vol 35 (1) ◽ pp. 2-13 ◽ Cited By ~ 2

Author(s): Zhixiang Lin ◽ Mahdi Zamanighomi ◽ Timothy Daley ◽ Shining Ma ◽ Wing Hung Wong

Keyword(s): Gene Expression ◽ Single Cell ◽ Chromatin Accessibility ◽ Joint Analysis ◽ Model Based ◽ Cell Data

MAAPER: model-based analysis of alternative polyadenylation using 3′ end-linked reads

Genome Biology ◽ 10.1186/s13059-021-02429-5 ◽ 2021 ◽ Vol 22 (1)

Author(s): Wei Vivian Li ◽ Dinghai Zheng ◽ Ruijia Wang ◽ Bin Tian

Keyword(s): Single Cell ◽ Robust Statistics ◽ Alternative Polyadenylation ◽ Model Based ◽ Eukaryotic Genes ◽ Different Types ◽ Cell Transcriptome ◽ Single Cell Transcriptome ◽ Cell Data ◽ Model Based Analysis

Abstract. Most eukaryotic genes express alternative polyadenylation (APA) isoforms. A growing number of RNA sequencing methods, especially those used for single-cell transcriptome analysis, generate reads close to the polyadenylation site (PAS), termed nearSite reads, which therefore inherently contain information about APA isoform abundance. Here, we present a probabilistic model-based method named MAAPER that uses nearSite reads for APA analysis. MAAPER predicts PASs with high accuracy and sensitivity and examines different types of APA events with robust statistics. We show MAAPER’s performance with both bulk and single-cell data and its applicability in unpaired or paired experimental designs.

Faculty Opinions recommendation of Systems biology. Conditional density-based analysis of T cell signaling in single-cell data.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽ 10.3410/f.723891088.793520867 ◽ 2016

Author(s): Anuj Kumar

Keyword(s): Systems Biology ◽ T Cell ◽ Cell Signaling ◽ Single Cell ◽ Conditional Density ◽ T Cell Signaling ◽ Cell Data

Model-based autoencoders for imputing discrete single-cell RNA-seq data

Methods ◽ 10.1016/j.ymeth.2020.09.010 ◽ 2020

Author(s): Tian Tian ◽ Martin Renqiang Min ◽ Zhi Wei

Keyword(s): Single Cell ◽ Rna Seq ◽ Model Based

Prioritization of cell types responsive to biological perturbations in single-cell data with Augur

Nature Protocols ◽ 10.1038/s41596-021-00561-x ◽ 2021

Author(s): Jordan W. Squair ◽ Michael A. Skinnider ◽ Matthieu Gautier ◽ Leonard J. Foster ◽ Grégoire Courtine

Keyword(s): Single Cell ◽ Cell Types ◽ Cell Data

Identifying cell types from single-cell data based on similarities and dissimilarities between cells

BMC Bioinformatics ◽ 10.1186/s12859-020-03873-z ◽ 2021 ◽ Vol 22 (S3)

Author(s): Yuanyuan Li ◽ Ping Luo ◽ Yi Lu ◽ Fang-Xiang Wu

Keyword(s): Gene Expression ◽ Single Cell ◽ Spectral Clustering ◽ Incidence Matrix ◽ Expression Patterns ◽ Cell Types ◽ Clustering Method ◽ Different Types ◽ Cell Data ◽ Spectral Clustering Method

Abstract. Background: With the development of single-cell sequencing technology, revealing homogeneity and heterogeneity between cells has become a new area of computational systems biology research. However, clustering cell types becomes more complex because of the mutual penetration between different types of cells and the instability of gene expression. One way of overcoming this problem is to group similar, related single cells together by means of clustering analysis methods. Although some methods, such as spectral clustering, perform well in identifying cell types, they consider only the similarities between cells and ignore the influence of dissimilarities on clustering results. This may limit the performance of most conventional clustering algorithms in identifying clusters, so special methods need to be developed for high-dimensional sparse categorical data. Results: Inspired by the observation that cells of the same type have similar gene expression patterns, whereas cells of different types have dissimilar gene expression patterns, we improve the existing spectral clustering method for single-cell data so that it is based on both similarities and dissimilarities between cells. The method first measures the similarity and dissimilarity among cells, then constructs the incidence matrix by fusing the similarity matrix with the dissimilarity matrix, and finally uses the eigenvalues of the incidence matrix to perform dimensionality reduction and applies the K-means algorithm in the low-dimensional space to achieve clustering. The proposed improved spectral clustering method is compared with the conventional spectral clustering method in recognizing cell types on several real single-cell RNA-seq datasets.
Conclusions: In summary, we show that adding intercellular dissimilarity can effectively improve accuracy and achieve robustness, and that the improved spectral clustering method outperforms the traditional spectral clustering method in grouping cells.
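
The pipeline described in this abstract (measure similarity and dissimilarity, fuse them, reduce dimensionality spectrally, then run K-means) can be sketched in Python as follows. The abstract does not specify how the matrices are measured or fused, so everything here is an assumption for illustration: the Gaussian similarity, the correlation-based dissimilarity, the element-wise fusion rule, and the function names are hypothetical, and standard normalised-Laplacian spectral clustering is swapped in for the paper's incidence-matrix construction.

```python
import numpy as np

def kmeans(X, n_clusters, n_iter=50):
    """Plain Lloyd's algorithm with deterministic farthest-first initialisation."""
    centers = [X[0]]
    while len(centers) < n_clusters:
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(n_clusters):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def fused_spectral_clustering(expr, n_clusters):
    """Cluster cells (rows of `expr`) from a fused similarity/dissimilarity graph."""
    expr = np.asarray(expr, dtype=float)
    # Similarity: Gaussian kernel on Euclidean distance (assumed choice).
    dist = np.linalg.norm(expr[:, None, :] - expr[None, :, :], axis=2)
    sigma = np.median(dist[np.triu_indices_from(dist, k=1)])
    sim = np.exp(-dist ** 2 / (2 * sigma ** 2))
    # Dissimilarity: correlation distance rescaled to [0, 1] (assumed choice).
    dissim = (1.0 - np.corrcoef(expr)) / 2.0
    # Fusion: down-weight similar pairs that are also dissimilar (hypothetical rule).
    fused = sim * (1.0 - dissim)
    np.fill_diagonal(fused, 0.0)
    # Symmetric normalised graph Laplacian of the fused matrix.
    inv_sqrt_deg = 1.0 / np.sqrt(fused.sum(axis=1))
    lap = np.eye(len(expr)) - inv_sqrt_deg[:, None] * fused * inv_sqrt_deg[None, :]
    # eigh returns eigenvalues in ascending order; keep the smallest n_clusters.
    _, vecs = np.linalg.eigh(lap)
    emb = vecs[:, :n_clusters]
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    return kmeans(emb, n_clusters)

# Toy data: two groups of five cells with opposite expression patterns over four genes.
rng = np.random.default_rng(0)
group_1 = np.array([5.0, 5.0, 0.0, 0.0]) + 0.1 * rng.standard_normal((5, 4))
group_2 = np.array([0.0, 0.0, 5.0, 5.0]) + 0.1 * rng.standard_normal((5, 4))
toy_cell_labels = fused_spectral_clustering(np.vstack([group_1, group_2]), n_clusters=2)
```

On this toy input the two groups should receive different labels. Real single-cell RNA-seq data would need normalisation and feature selection first, and the fusion rule would have to follow the paper's actual definition.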
