Statistical test of structured continuous trees based on discordance matrix

2019 ◽  
Vol 35 (23) ◽  
pp. 4962-4970
Author(s):  
Xiangqi Bai ◽  
Liang Ma ◽  
Lin Wan

Abstract Motivation Cell fate determination is a continuous process in which one cell type diversifies to other cell types following a hierarchical path. Advancements in single-cell technologies provide the opportunity to reveal the continuum of cell progression which forms a structured continuous tree (SCTree). Computational algorithms, which are usually based on a priori assumptions on the hidden structures, have previously been proposed as a means of recovering pseudo trajectory along cell differentiation process. However, there still lack of statistical framework on the assessments of intrinsic structure embedded in high-dimensional gene expression profile. Inherit noise and cell-to-cell variation underlie the single-cell data, however, pose grand challenges to testing even basic structures, such as linear versus bifurcation. Results In this study, we propose an adaptive statistical framework, termed SCTree, to test the intrinsic structure of a high-dimensional single-cell dataset. SCTree test is conducted based on the tools derived from metric geometry and random matrix theory. In brief, by extending the Gromov–Farris transform and utilizing semicircular law, we formulate the continuous tree structure testing problem into a signal matrix detection problem. We show that the SCTree test is most powerful when the signal-to-noise ratio exceeds a moderate value. We also demonstrate that SCTree is able to robustly detect linear, single and multiple branching events with simulated datasets and real scRNA-seq datasets. Overall, the SCTree test provides a unified statistical assessment of the significance of the hidden structure of single-cell data. Availability and implementation SCTree software is available at https://github.com/XQBai/SCTree-test. Supplementary information Supplementary data are available at Bioinformatics online.

Author(s):  
Samuel Melton ◽  
Sharad Ramanathan

Abstract Motivation Recent technological advances produce a wealth of high-dimensional descriptions of biological processes, yet extracting meaningful insight and mechanistic understanding from these data remains challenging. For example, in developmental biology, the dynamics of differentiation can now be mapped quantitatively using single-cell RNA sequencing, yet it is difficult to infer molecular regulators of developmental transitions. Here, we show that discovering informative features in the data is crucial for statistical analysis as well as making experimental predictions. Results We identify features based on their ability to discriminate between clusters of the data points. We define a class of problems in which linear separability of clusters is hidden in a low-dimensional space. We propose an unsupervised method to identify the subset of features that define a low-dimensional subspace in which clustering can be conducted. This is achieved by averaging over discriminators trained on an ensemble of proposed cluster configurations. We then apply our method to single-cell RNA-seq data from mouse gastrulation, and identify 27 key transcription factors (out of 409 total), 18 of which are known to define cell states through their expression levels. In this inferred subspace, we find clear signatures of known cell types that eluded classification prior to discovery of the correct low-dimensional subspace. Availability and implementation https://github.com/smelton/SMD. Supplementary information Supplementary data are available at Bioinformatics online.


2020 ◽  
Vol 36 (11) ◽  
pp. 3585-3587
Author(s):  
Lin Wang ◽  
Francisca Catalan ◽  
Karin Shamardani ◽  
Husam Babikir ◽  
Aaron Diaz

Abstract Summary Single-cell data are being generated at an accelerating pace. How best to project data across single-cell atlases is an open problem. We developed a boosted learner that overcomes the greatest challenge with status quo classifiers: low sensitivity, especially when dealing with rare cell types. By comparing novel and published data from distinct scRNA-seq modalities that were acquired from the same tissues, we show that this approach preserves cell-type labels when mapping across diverse platforms. Availability and implementation https://github.com/diazlab/ELSA Contact [email protected] Supplementary information Supplementary data are available at Bioinformatics online.


2020 ◽  
Author(s):  
Ivan Croydon Veleslavov ◽  
Michael P.H. Stumpf

AbstractSingle cell transcriptomics has laid bare the heterogeneity of apparently identical cells at the level of gene expression. For many cell-types we now know that there is variability in the abundance of many transcripts, and that average transcript abun-dance or average gene expression can be a unhelpful concept. A range of clustering and other classification methods have been proposed which use the signal in single cell data to classify, that is assign cell types, to cells based on their transcriptomic states. In many cases, however, we would like to have not just a classifier, but also a set of interpretable rules by which this classification occurs. Here we develop and demonstrate the interpretive power of one such approach, which sets out to establish a biologically interpretable classification scheme. In particular we are interested in capturing the chain of regulatory events that drive cell-fate decision making across a lineage tree or lineage sequence. We find that suitably defined decision trees can help to resolve gene regulatory programs involved in shaping lineage trees. Our approach combines predictive power with interpretabilty and can extract logical rules from single cell data.


Author(s):  
Tamim Abdelaal ◽  
Paul de Raadt ◽  
Boudewijn P.F. Lelieveldt ◽  
Marcel J.T. Reinders ◽  
Ahmed Mahfouz

AbstractMotivationSingle cell data measures multiple cellular markers at the single-cell level for thousands to millions of cells. Identification of distinct cell populations is a key step for further biological understanding, usually performed by clustering this data. Dimensionality reduction based clustering tools are either not scalable to large datasets containing millions of cells, or not fully automated requiring an initial manual estimation of the number of clusters. Graph clustering tools provide automated and reliable clustering for single cell data, but suffer heavily from scalability to large datasets.ResultsWe developed SCHNEL, a scalable, reliable and automated clustering tool for high-dimensional single-cell data. SCHNEL transforms large high-dimensional data to a hierarchy of datasets containing subsets of data points following the original data manifold. The novel approach of SCHNEL combines this hierarchical representation of the data with graph clustering, making graph clustering scalable to millions of cells. Using seven different cytometry datasets, SCHNEL outperformed three popular clustering tools for cytometry data, and was able to produce meaningful clustering results for datasets of 3.5 and 17.2 million cells within workable timeframes. In addition, we show that SCHNEL is a general clustering tool by applying it to single-cell RNA sequencing data, as well as a popular machine learning benchmark dataset MNIST.Availability and ImplementationImplementation is available on GitHub (https://github.com/paulderaadt/HSNE-clustering)[email protected] informationSupplementary data are available at Bioinformatics online.


Author(s):  
Michael A. Skinnider ◽  
Jordan W. Squair ◽  
Claudia Kathe ◽  
Mark A. Anderson ◽  
Matthieu Gautier ◽  
...  

We present a machine-learning method to prioritize the cell types most responsive to biological perturbations within high-dimensional single-cell data. We validate our method, Augur (https://github.com/neurorestore/Augur), on a compendium of single-cell RNA-seq, chromatin accessibility, and imaging transcriptomics datasets. We apply Augur to expose the neural circuits that enable walking after paralysis in response to spinal cord neurostimulation.


2020 ◽  
Vol 36 (Supplement_2) ◽  
pp. i849-i856
Author(s):  
Tamim Abdelaal ◽  
Paul de Raadt ◽  
Boudewijn P F Lelieveldt ◽  
Marcel J T Reinders ◽  
Ahmed Mahfouz

Abstract Motivation Single cell data measures multiple cellular markers at the single-cell level for thousands to millions of cells. Identification of distinct cell populations is a key step for further biological understanding, usually performed by clustering this data. Dimensionality reduction based clustering tools are either not scalable to large datasets containing millions of cells, or not fully automated requiring an initial manual estimation of the number of clusters. Graph clustering tools provide automated and reliable clustering for single cell data, but suffer heavily from scalability to large datasets. Results We developed SCHNEL, a scalable, reliable and automated clustering tool for high-dimensional single-cell data. SCHNEL transforms large high-dimensional data to a hierarchy of datasets containing subsets of data points following the original data manifold. The novel approach of SCHNEL combines this hierarchical representation of the data with graph clustering, making graph clustering scalable to millions of cells. Using seven different cytometry datasets, SCHNEL outperformed three popular clustering tools for cytometry data, and was able to produce meaningful clustering results for datasets of 3.5 and 17.2 million cells within workable time frames. In addition, we show that SCHNEL is a general clustering tool by applying it to single-cell RNA sequencing data, as well as a popular machine learning benchmark dataset MNIST. Availability and implementation Implementation is available on GitHub (https://github.com/biovault/SCHNELpy). All datasets used in this study are publicly available. Supplementary information Supplementary data are available at Bioinformatics online.


2020 ◽  
Author(s):  
Shashank Jatav ◽  
Saksham Malhotra ◽  
Freda D Miller ◽  
Abhishek Jha ◽  
Sidhartha Goyal

AbstractMetabolism is intricately linked with cell fate changes. Much of this understanding comes from detailed metabolomics studies averaged across a population of cells which may be composed of multiple cell types. Currently, there are no quantitative techniques sensitive enough to assess metabolomics broadly at the single cell level. Here we present scMetNet, a technique that interrogates metabolic rewiring at the single cell resolution and we apply it to murine embryonic development. Our method first confirms the key metabolic pathways, categorized into bioenergetic, epigenetic and biosynthetic, that change as embryonic neural stem cells differentiate and age. It then goes beyond to identify specific sub-networks, such as the cholesterol and mevalonate biosynthesis pathway, that drive the global metabolic changes during neural cortical development. Having such contextual information about metabolic rewiring provides putative mechanisms driving stem cell differentiation and identifies potential targets for regulating neural stem cell and neuronal biology.


2021 ◽  
Author(s):  
Tejas Guha ◽  
Elana Fertig ◽  
Atul Deshpande

Summary: Color is often used as a primary differentiating factor in visualization of single-cell and multi-omics analyses. However, color-based visualizations are extremely limiting and require additional considerations to account for the wide range of color perceptions in the population. The scatterHatch package provides software for accessible single-cell visualizations that use patterns in conjunction with colors to amplify the distinction between different cell types, states, and groups. Availability: scatterHatch is available on Github at https://github.com/FertigLab/scatterHatch. Supplementary information: Supplementary figures are provided in the attached document.


2021 ◽  
Author(s):  
Jordan W. Squair ◽  
Michael A. Skinnider ◽  
Matthieu Gautier ◽  
Leonard J. Foster ◽  
Grégoire Courtine
Keyword(s):  

2021 ◽  
Vol 22 (S3) ◽  
Author(s):  
Yuanyuan Li ◽  
Ping Luo ◽  
Yi Lu ◽  
Fang-Xiang Wu

Abstract Background With the development of the technology of single-cell sequence, revealing homogeneity and heterogeneity between cells has become a new area of computational systems biology research. However, the clustering of cell types becomes more complex with the mutual penetration between different types of cells and the instability of gene expression. One way of overcoming this problem is to group similar, related single cells together by the means of various clustering analysis methods. Although some methods such as spectral clustering can do well in the identification of cell types, they only consider the similarities between cells and ignore the influence of dissimilarities on clustering results. This methodology may limit the performance of most of the conventional clustering algorithms for the identification of clusters, it needs to develop special methods for high-dimensional sparse categorical data. Results Inspired by the phenomenon that same type cells have similar gene expression patterns, but different types of cells evoke dissimilar gene expression patterns, we improve the existing spectral clustering method for clustering single-cell data that is based on both similarities and dissimilarities between cells. The method first measures the similarity/dissimilarity among cells, then constructs the incidence matrix by fusing similarity matrix with dissimilarity matrix, and, finally, uses the eigenvalues of the incidence matrix to perform dimensionality reduction and employs the K-means algorithm in the low dimensional space to achieve clustering. The proposed improved spectral clustering method is compared with the conventional spectral clustering method in recognizing cell types on several real single-cell RNA-seq datasets. Conclusions In summary, we show that adding intercellular dissimilarity can effectively improve accuracy and achieve robustness and that improved spectral clustering method outperforms the traditional spectral clustering method in grouping cells.


Sign in / Sign up

Export Citation Format

Share Document