Model-based branching point detection in single-cell data by K-Branches clustering

2016
Author(s):
Nikolaos K. Chlis
F. Alexander Wolf
Fabian J. Theis

Motivation: The identification of heterogeneities in cell populations by utilizing single-cell technologies such as single-cell RNA-Seq enables inference of cellular development and lineage trees. Several methods have been proposed for such inference from high-dimensional single-cell data. They typically assign each cell to a branch in a differentiation trajectory. However, they commonly assume specific geometries such as tree-like developmental hierarchies and lack statistically sound methods to decide on the number of branching events. Results: We present K-Branches, which addresses this problem by locally fitting half-lines to single-cell data with a clustering algorithm similar to K-Means. These half-lines are proxies for branches in the differentiation trajectory of cells. We propose a modified version of the GAP statistic for model selection, in order to decide on the number of lines that best describe the data locally. In this manner, we identify the location and number of subgroups of cells that are associated with branching events and full differentiation, respectively. We evaluate the performance of our method on single-cell RNA-Seq data describing the differentiation of myeloid progenitors during hematopoiesis, single-cell qPCR data of mouse blastocyst development and artificial data. Availability: An R implementation of K-Branches is freely available at https://github.com/theislab/[email protected]
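To make the half-line idea in this abstract concrete, the following is a minimal Python sketch of the geometric step only: the distance from a cell to a half-line, and a K-Means-style assignment of each cell to its closest half-line. The function names, the single shared origin, and the nearest-half-line rule are assumptions made for illustration. The abstract does not describe how the half-lines are updated or how the modified GAP statistic is computed, and this sketch is not the authors' R implementation.

```python
import math


def dist_to_half_line(point, origin, direction):
    """Euclidean distance from `point` to the half-line origin + t * direction, t >= 0.

    `direction` must be a non-zero vector (hypothetical helper, not from the paper).
    """
    norm = math.sqrt(sum(d * d for d in direction))
    unit = [d / norm for d in direction]
    rel = [p - o for p, o in zip(point, origin)]
    # Project onto the direction; clamp at 0 so the line only extends one way.
    t = max(0.0, sum(r * u for r, u in zip(rel, unit)))
    return math.sqrt(sum((r - t * u) ** 2 for r, u in zip(rel, unit)))


def assign_to_half_lines(points, origin, directions):
    """Return, for each point, the index of the closest half-line (assignment step only)."""
    return [
        min(
            range(len(directions)),
            key=lambda k: dist_to_half_line(p, origin, directions[k]),
        )
        for p in points
    ]


# Two half-lines leaving a common origin along the x and y axes.
labels = assign_to_half_lines(
    [[2.0, 0.1], [0.1, 3.0]], [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]]
)
```

Clamping the projection at zero is what distinguishes a half-line from a full line: a point lying "behind" the origin is measured against the origin itself rather than against the line's backward extension.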

2017
Vol 33 (20)
pp. 3211-3219
Author(s):
Nikolaos K Chlis
F Alexander Wolf
Fabian J Theis

2019
Author(s):
Anna Danese
Maria L. Richter
David S. Fischer
Fabian J. Theis
Maria Colomé-Tatché

Epigenetic single-cell measurements reveal a layer of regulatory information not accessible to single-cell transcriptomics; however, single-cell omics analysis tools mainly focus on gene expression data. To address this issue, we present epiScanpy, a computational framework for the analysis of single-cell DNA methylation and single-cell ATAC-seq data. EpiScanpy makes the many existing RNA-seq workflows from scanpy available to large-scale single-cell data from other -omics modalities. We introduce and compare multiple feature space constructions for epigenetic data and show the feasibility of common clustering, dimension reduction and trajectory learning techniques. We benchmark epiScanpy by interrogating different single-cell mouse brain atlases of DNA methylation, ATAC-seq and transcriptomics. We find that differentially methylated and differentially open markers between cell clusters enrich transcriptome-based cell type labels by orthogonal epigenetic information.


2020
Vol 21 (1)
Author(s):
Chunxiang Wang
Xin Gao
Juntao Liu

Background: Advances in single-cell RNA-seq technology have led to great opportunities for the quantitative characterization of cell types, and many clustering algorithms have been developed based on single-cell gene expression. However, we found that different data preprocessing methods show quite different effects on clustering algorithms. Moreover, there is no specific preprocessing method that is applicable to all clustering algorithms, and even for the same clustering algorithm, the best preprocessing method depends on the input data. Results: We designed a graph-based algorithm, SC3-e, specifically for discriminating the best data preprocessing method for SC3, which is currently the most widely used clustering algorithm for single-cell clustering. When tested on eight frequently used single-cell RNA-seq data sets, SC3-e always accurately selects the best data preprocessing method for SC3 and therefore greatly enhances the clustering performance of SC3. Conclusion: The SC3-e algorithm is practically powerful for discriminating the best data preprocessing method, and therefore largely enhances the performance of cell-type clustering of SC3. It is expected to play a crucial role in the related studies of single-cell clustering, such as the studies of human complex diseases and discoveries of new cell types.


2020
Author(s):
Julia Eve Olivieri
Roozbeh Dehghannasiri
Julia Salzman

To date, the field of single-cell genomics has viewed robust splicing analysis as completely out of reach in droplet-based platforms, preventing biological discovery of single-cell regulated splicing. Here, we introduce a novel, robust, and computationally efficient statistical method, the Splicing Z Score (SZS), to detect differential alternative splicing in single-cell RNA-Seq technologies including 10x Chromium. We applied the SZS to primary human cells to discover new regulated, cell type-specific splicing patterns. Illustrating the power of the SZS method, splicing of a small set of genes has high predictive power for tissue compartment in the human lung, and the SZS identifies un-annotated, conserved splicing regulation in human spermatogenesis. The SZS is a method that can rapidly identify regulated splicing events from single-cell data and prioritize genes predicted to have functionally significant splicing programs.


2020
Author(s):
Giovana Ravizzoni Onzi
Juliano Luiz Faccioni
Alvaro G. Alvarado
Paula Andreghetto Bracco
Harley I. Kornblum
...

Outliers are often ignored or even removed from data analysis. In cancer, however, single outlier cells can be of major importance, since they have uncommon characteristics that may confer the capacity to invade, metastasize, or resist therapy. Here we present the Single-Cell OUTlier analysis (SCOUT), a resource for single-cell data analysis focusing on outlier cells, and the SCOUT Selector (SCOUTS), an application to systematically apply SCOUT to a dataset over a wide range of biological markers. Using publicly available datasets of cancer samples obtained from mass cytometry and single-cell RNA-seq platforms, outlier cells for the expression of proteins or RNAs were identified and compared to their non-outlier counterparts among different samples. Our results show that analyzing single-cell data using SCOUT can uncover key information not easily observed in the analysis of the whole population.


2021
Author(s):
James Anibal
Alexandre Day
Erol Bahadiroglu
Liam O'Neill
Long Phan
...

Data clustering plays a significant role in biomedical sciences, particularly in single-cell data analysis. Researchers use clustering algorithms to group individual cells into populations that can be evaluated across different levels of disease progression, drug response, and other clinical statuses. In many cases, multiple sets of clusters must be generated to assess varying levels of cluster specificity. For example, there are many subtypes of leukocytes (e.g. T cells), whose individual preponderance and phenotype must be assessed for statistical/functional significance. In this report, we introduce a novel hierarchical density clustering algorithm (HAL-x) that uses supervised linkage methods to build a cluster hierarchy on raw single-cell data. With this new approach, HAL-x can quickly predict multiple sets of labels for immense datasets, achieving a considerable improvement in computational efficiency on large datasets compared to existing methods. We also show that cell clusters generated by HAL-x yield near-perfect F1-scores when classifying different clinical statuses based on single-cell profiles. Our hierarchical density clustering algorithm achieves high accuracy in single-cell classification in a scalable, tunable and rapid manner. We make HAL-x publicly available at: https://pypi.org/project/hal-x/


2019
Author(s):
Joshua Batson
Loïc Royer
James Webber

Single-cell RNA sequencing enables researchers to study the gene expression of individual cells. However, in high-throughput methods the portrait of each individual cell is noisy, representing thousands of the hundreds of thousands of mRNA molecules originally present. While many methods for denoising single-cell data have been proposed, a principled procedure for selecting and calibrating the best method for a given dataset has been lacking. We present “molecular cross-validation,” a statistically principled and data-driven approach for estimating the accuracy of any denoising method without the need for ground-truth. We validate this approach for three denoising methods—principal component analysis, network diffusion, and a deep autoencoder—on a dataset of deeply-sequenced neurons. We show that molecular cross-validation correctly selects the optimal parameters for each method and identifies the best method for the dataset.
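The abstract above does not spell out how the validation data are obtained. As a hedged sketch, assume that each cell's observed molecules are randomly partitioned into a training subset and a held-out subset, so that a denoiser fitted on one subset can be scored against the other. The function name `split_molecules`, the `train_fraction` parameter, and the per-molecule coin flip are assumptions made for illustration, not the authors' code.

```python
import random


def split_molecules(counts, train_fraction=0.5, seed=0):
    """Randomly partition each gene's observed molecule count into two subsets.

    Every molecule independently lands in the training subset with
    probability `train_fraction`; the remainder goes to the held-out subset.
    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    train, held_out = [], []
    for c in counts:
        kept = sum(1 for _ in range(c) if rng.random() < train_fraction)
        train.append(kept)
        held_out.append(c - kept)
    return train, held_out


# One cell with four genes; the two subsets always add back up to the input.
train, held_out = split_molecules([5, 0, 12, 3])
```

Under this assumption, a denoising method and its parameters would be chosen by fitting on `train` and comparing the denoised output with `held_out`, which requires no ground-truth expression values.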


2022
Vol 23 (1)
Author(s):
Huijian Feng
Lihui Lin
Jiekai Chen

Background: Single-cell RNA sequencing is becoming a powerful tool to identify cell states, reconstruct developmental trajectories, and deconvolute spatial expression. The rapid development of computational methods promotes insight into heterogeneous single-cell data. An increasing number of tools are available to biological analysts, and two programming languages, R and Python, are widely used among researchers. R and Python are complementary, as many methods are implemented specifically in one or the other. However, the different platforms create a data sharing and transformation problem, especially for Scanpy, Seurat, and SingleCellExperiment. Currently, there is no efficient and user-friendly software to perform data transformation of single-cell omics between platforms, so users spend excessive time on data input and output (IO), significantly reducing the efficiency of data analysis. Results: We developed scDIOR for single-cell data transformation between platforms of R and Python based on Hierarchical Data Format Version 5 (HDF5). We have created a data IO ecosystem between three R packages (Seurat, SingleCellExperiment, Monocle) and a Python package (Scanpy). Importantly, scDIOR accommodates a variety of data types across programming languages and platforms in an ultrafast way, including single-cell RNA-seq and spatially resolved transcriptomics data, using only a few lines of code in an IDE or command-line interface. For large-scale datasets, users can partially load the needed information, e.g., cell annotation without the gene expression matrices. scDIOR connects the analytical tasks of different platforms, which makes it easy to compare the performance of algorithms between them. Conclusions: scDIOR contains two modules, dior in R and diopy in Python. scDIOR is a versatile and user-friendly tool that implements single-cell data transformation between R and Python rapidly and stably. The software is freely accessible at https://github.com/JiekaiLab/scDIOR.


2019
Author(s):
Duoduo Wu
Joe Yeong Poh Sheng
Grace Tan Su-En
Marion Chevrier
Josh Loh Jie Hua
...

Using human hepatocellular carcinoma (HCC) tissue samples stained with seven immune markers including one nuclear counterstain, we compared and evaluated the use of a new dimensionality reduction technique called Uniform Manifold Approximation and Projection (UMAP), as an alternative to t-Distributed Stochastic Neighbor Embedding (t-SNE) in analysing multiplex-immunofluorescence (mIF) derived single-cell data. We adopted an unsupervised clustering algorithm called FlowSOM to identify eight major cell types present in human HCC tissues. UMAP and t-SNE were run independently on the dataset to qualitatively compare the distribution of clustered cell types in both reduced dimensions. Our comparison shows that UMAP is superior in runtime. Both techniques provide similar arrangements of cell clusters, with the key difference being UMAP’s extensive characteristic branching. Most interestingly, UMAP’s branching was able to highlight biological lineages, especially in identifying potential hybrid tumour cells (HTC). Survival analysis shows that patients with a higher proportion of HTC have a worse prognosis (p-value = 0.019). We conclude that both techniques are similar in their visualisation capabilities, but UMAP has a clear advantage over t-SNE in runtime, making it highly plausible to employ UMAP as an alternative to t-SNE in mIF data analysis.


2021
Author(s):
Klebea Carvalho
Elisabeth Rebboah
Camden Jansen
Katherine Williams
Andrew Dowey
...

Gene regulatory networks (GRNs) provide a powerful framework for studying cellular differentiation. However, it is less clear how GRNs encode cellular responses to everyday microenvironmental cues. Macrophages can be polarized and potentially repolarized based on environmental signaling. In order to identify the GRNs that drive macrophage polarization and the heterogeneous single-cell subpopulations that are present in the process, we used a high-resolution time course of bulk and single-cell RNA-seq and ATAC-seq assays of HL-60-derived macrophages polarized towards M1 or M2 over 24 hours. We identified transient M1 and M2 markers, including the main transcription factors that underlie polarization, and subpopulations of naive, transitional, and terminally polarized macrophages. We built bulk and single-cell polarization GRNs to compare the recovered interactions and found that each technology recovered only a subset of known interactions. Our data provide a resource to study the GRN of cellular maturation in response to microenvironmental stimuli in a variety of contexts in homeostasis and disease.

