scholarly journals GPseudoClust: deconvolution of shared pseudo-profiles at single-cell resolution

2019 ◽  
Author(s):  
Magdalena E Strauss ◽  
Paul DW Kirk ◽  
John E Reid ◽  
Lorenz Wernisch

AbstractMotivationMany methods have been developed to cluster genes on the basis of their changes in mRNA expression over time, using bulk RNA-seq or microarray data. However, single-cell data may present a particular challenge for these algorithms, since the temporal ordering of cells is not directly observed. One way to address this is to first use pseudotime methods to order the cells, and then apply clustering techniques for time course data. However, pseudotime estimates are subject to high levels of uncertainty, and failing to account for this uncertainty is liable to lead to erroneous and/or over-confident gene clusters.ResultsThe proposed method, GPseudoClust, is a novel approach that jointly infers pseudotem-poral ordering and gene clusters, and quantifies the uncertainty in both. GPseudoClust combines a recent method for pseudotime inference with nonparametric Bayesian clustering methods, efficient MCMC sampling, and novel subsampling strategies which aid computation. We consider a broad array of simulated and experimental datasets to demonstrate the effectiveness of GPseudoClust in a range of settings.AvailabilityAn implementation is available on GitHub: https://github.com/magStra/nonparametricSummaryPSM and https://github.com/magStra/[email protected] informationSupplementary materials are available.

2019 ◽  
Author(s):  
Magdalena E Strauss ◽  
Paul D W Kirk ◽  
John E Reid ◽  
Lorenz Wernisch

Abstract Motivation Many methods have been developed to cluster genes on the basis of their changes in mRNA expression over time, using bulk RNA-seq or microarray data. However, single-cell data may present a particular challenge for these algorithms, since the temporal ordering of cells is not directly observed. One way to address this is to first use pseudotime methods to order the cells, and then apply clustering techniques for time course data. However, pseudotime estimates are subject to high levels of uncertainty, and failing to account for this uncertainty is liable to lead to erroneous and/or over-confident gene clusters. Results The proposed method, GPseudoClust, is a novel approach that jointly infers pseudotemporal ordering and gene clusters, and quantifies the uncertainty in both. GPseudoClust combines a recent method for pseudotime inference with nonparametric Bayesian clustering methods, efficient MCMC sampling, and novel subsampling strategies which aid computation.We consider a broad array of simulated and experimental datasets to demonstrate the effectiveness of GPseudoClust in a range of settings. Availability An implementation is available on GitHub: https://github.com/magStra/nonparametricSummaryPSM and https://github.com/magStra/GPseudoClust. Supplementary Information Supplementary data are available at Bioinformatics online.


Author(s):  
Tamim Abdelaal ◽  
Paul de Raadt ◽  
Boudewijn P.F. Lelieveldt ◽  
Marcel J.T. Reinders ◽  
Ahmed Mahfouz

AbstractMotivationSingle cell data measures multiple cellular markers at the single-cell level for thousands to millions of cells. Identification of distinct cell populations is a key step for further biological understanding, usually performed by clustering this data. Dimensionality reduction based clustering tools are either not scalable to large datasets containing millions of cells, or not fully automated requiring an initial manual estimation of the number of clusters. Graph clustering tools provide automated and reliable clustering for single cell data, but suffer heavily from scalability to large datasets.ResultsWe developed SCHNEL, a scalable, reliable and automated clustering tool for high-dimensional single-cell data. SCHNEL transforms large high-dimensional data to a hierarchy of datasets containing subsets of data points following the original data manifold. The novel approach of SCHNEL combines this hierarchical representation of the data with graph clustering, making graph clustering scalable to millions of cells. Using seven different cytometry datasets, SCHNEL outperformed three popular clustering tools for cytometry data, and was able to produce meaningful clustering results for datasets of 3.5 and 17.2 million cells within workable timeframes. In addition, we show that SCHNEL is a general clustering tool by applying it to single-cell RNA sequencing data, as well as a popular machine learning benchmark dataset MNIST.Availability and ImplementationImplementation is available on GitHub (https://github.com/paulderaadt/HSNE-clustering)[email protected] informationSupplementary data are available at Bioinformatics online.


2020 ◽  
Vol 36 (Supplement_2) ◽  
pp. i849-i856
Author(s):  
Tamim Abdelaal ◽  
Paul de Raadt ◽  
Boudewijn P F Lelieveldt ◽  
Marcel J T Reinders ◽  
Ahmed Mahfouz

Abstract Motivation Single cell data measures multiple cellular markers at the single-cell level for thousands to millions of cells. Identification of distinct cell populations is a key step for further biological understanding, usually performed by clustering this data. Dimensionality reduction based clustering tools are either not scalable to large datasets containing millions of cells, or not fully automated requiring an initial manual estimation of the number of clusters. Graph clustering tools provide automated and reliable clustering for single cell data, but suffer heavily from scalability to large datasets. Results We developed SCHNEL, a scalable, reliable and automated clustering tool for high-dimensional single-cell data. SCHNEL transforms large high-dimensional data to a hierarchy of datasets containing subsets of data points following the original data manifold. The novel approach of SCHNEL combines this hierarchical representation of the data with graph clustering, making graph clustering scalable to millions of cells. Using seven different cytometry datasets, SCHNEL outperformed three popular clustering tools for cytometry data, and was able to produce meaningful clustering results for datasets of 3.5 and 17.2 million cells within workable time frames. In addition, we show that SCHNEL is a general clustering tool by applying it to single-cell RNA sequencing data, as well as a popular machine learning benchmark dataset MNIST. Availability and implementation Implementation is available on GitHub (https://github.com/biovault/SCHNELpy). All datasets used in this study are publicly available. Supplementary information Supplementary data are available at Bioinformatics online.


Author(s):  
Ming Tang ◽  
Yasin Kaymaz ◽  
Brandon Logeman ◽  
Stephen Eichhorn ◽  
ZhengZheng S. Liang ◽  
...  

AbstractMotivationOne major goal of single-cell RNA sequencing (scRNAseq) experiments is to identify novel cell types. With increasingly large scRNAseq datasets, unsupervised clustering methods can now produce detailed catalogues of transcriptionally distinct groups of cells in a sample. However, the interpretation of these clusters is challenging for both technical and biological reasons. Popular clustering algorithms are sensitive to parameter choices, and can produce different clustering solutions with even small changes in the number of principal components used, the k nearest neighbor, and the resolution parameters, among others.ResultsHere, we present a set of tools to evaluate cluster stability by subsampling, which can guide parameter choice and aid in biological interpretation. The R package scclusteval and the accompanying Snakemake workflow implement all steps of the pipeline: subsampling the cells, repeating the clustering with Seurat, and estimation of cluster stability using the Jaccard similarity index. The Snakemake workflow takes advantage of high-performance computing clusters and dispatches jobs in parallel to available CPUs to speed up the analysis. The scclusteval package provides functions to facilitate the analysis of the output, including a series of rich visualizations.AvailabilityR package scclusteval: https://github.com/crazyhottommy/scclusteval Snakemake workflow: https://github.com/crazyhottommy/[email protected], [email protected] informationSupplementary data are available at Bioinformatics online.


2016 ◽  
Author(s):  
Stephen G. Gaffney ◽  
Jeffrey P. Townsend

ABSTRACTSummaryPathScore quantifies the level of enrichment of somatic mutations within curated pathways, applying a novel approach that identifies pathways enriched across patients. The application provides several user-friendly, interactive graphic interfaces for data exploration, including tools for comparing pathway effect sizes, significance, gene-set overlap and enrichment differences between projects.Availability and ImplementationWeb application available at pathscore.publichealth.yale.edu. Site implemented in Python and MySQL, with all major browsers supported. Source code available at github.com/sggaffney/pathscore with a GPLv3 [email protected] InformationAdditional documentation can be found at http://pathscore.publichealth.yale.edu/faq.


2018 ◽  
Vol 35 (18) ◽  
pp. 3387-3396 ◽  
Author(s):  
Yannis Pantazis ◽  
Ioannis Tsamardinos

Abstract Motivation Temporal variations in biological systems and more generally in natural sciences are typically modeled as a set of ordinary, partial or stochastic differential or difference equations. Algorithms for learning the structure and the parameters of a dynamical system are distinguished based on whether time is discrete or continuous, observations are time-series or time-course and whether the system is deterministic or stochastic, however, there is no approach able to handle the various types of dynamical systems simultaneously. Results In this paper, we present a unified approach to infer both the structure and the parameters of non-linear dynamical systems of any type under the restriction of being linear with respect to the unknown parameters. Our approach, which is named Unified Sparse Dynamics Learning (USDL), constitutes of two steps. First, an atemporal system of equations is derived through the application of the weak formulation. Then, assuming a sparse representation for the dynamical system, we show that the inference problem can be expressed as a sparse signal recovery problem, allowing the application of an extensive body of algorithms and theoretical results. Results on simulated data demonstrate the efficacy and superiority of the USDL algorithm under multiple interventions and/or stochasticity. Additionally, USDL’s accuracy significantly correlates with theoretical metrics such as the exact recovery coefficient. On real single-cell data, the proposed approach is able to induce high-confidence subgraphs of the signaling pathway. Availability and implementation Source code is available at Bioinformatics online. USDL algorithm has been also integrated in SCENERY (http://scenery.csd.uoc.gr/); an online tool for single-cell mass cytometry analytics. Supplementary information Supplementary data are available at Bioinformatics online.


2020 ◽  
Vol 36 (11) ◽  
pp. 3585-3587
Author(s):  
Lin Wang ◽  
Francisca Catalan ◽  
Karin Shamardani ◽  
Husam Babikir ◽  
Aaron Diaz

Abstract Summary Single-cell data are being generated at an accelerating pace. How best to project data across single-cell atlases is an open problem. We developed a boosted learner that overcomes the greatest challenge with status quo classifiers: low sensitivity, especially when dealing with rare cell types. By comparing novel and published data from distinct scRNA-seq modalities that were acquired from the same tissues, we show that this approach preserves cell-type labels when mapping across diverse platforms. Availability and implementation https://github.com/diazlab/ELSA Contact [email protected] Supplementary information Supplementary data are available at Bioinformatics online.


Author(s):  
Andres M Cifuentes-Bernal ◽  
Vu Vh Pham ◽  
Xiaomei Li ◽  
Lin Liu ◽  
Jiuyong Li ◽  
...  

Abstract Motivation microRNAs (miRNAs) are important gene regulators and they are involved in many biological processes, including cancer progression. Therefore, correctly identifying miRNA–mRNA interactions is a crucial task. To this end, a huge number of computational methods has been developed, but they mainly use the data at one snapshot and ignore the dynamics of a biological process. The recent development of single cell data and the booming of the exploration of cell trajectories using ‘pseudotime’ concept have inspired us to develop a pseudotime-based method to infer the miRNA–mRNA relationships characterizing a biological process by taking into account the temporal aspect of the process. Results We have developed a novel approach, called pseudotime causality, to find the causal relationships between miRNAs and mRNAs during a biological process. We have applied the proposed method to both single cell and bulk sequencing datasets for Epithelia to Mesenchymal Transition, a key process in cancer metastasis. The evaluation results show that our method significantly outperforms existing methods in finding miRNA–mRNA interactions in both single cell and bulk data. The results suggest that utilizing the pseudotemporal information from the data helps reveal the gene regulation in a biological process much better than using the static information. Availability and implementation R scripts and datasets can be found at https://github.com/AndresMCB/PTC. Supplementary information Supplementary data are available at Bioinformatics online.


2019 ◽  
Vol 35 (19) ◽  
pp. 3642-3650 ◽  
Author(s):  
Ruiqing Zheng ◽  
Min Li ◽  
Zhenlan Liang ◽  
Fang-Xiang Wu ◽  
Yi Pan ◽  
...  

Abstract Motivation The development of single-cell RNA-sequencing (scRNA-seq) provides a new perspective to study biological problems at the single-cell level. One of the key issues in scRNA-seq analysis is to resolve the heterogeneity and diversity of cells, which is to cluster the cells into several groups. However, many existing clustering methods are designed to analyze bulk RNA-seq data, it is urgent to develop the new scRNA-seq clustering methods. Moreover, the high noise in scRNA-seq data also brings a lot of challenges to computational methods. Results In this study, we propose a novel scRNA-seq cell type detection method based on similarity learning, called SinNLRR. The method is motivated by the self-expression of the cells with the same group. Specifically, we impose the non-negative and low rank structure on the similarity matrix. We apply alternating direction method of multipliers to solve the optimization problem and propose an adaptive penalty selection method to avoid the sensitivity to the parameters. The learned similarity matrix could be incorporated with spectral clustering, t-distributed stochastic neighbor embedding for visualization and Laplace score for prioritizing gene markers. In contrast to other scRNA-seq clustering methods, our method achieves more robust and accurate results on different datasets. Availability and implementation Our MATLAB implementation of SinNLRR is available at, https://github.com/zrq0123/SinNLRR. Supplementary information Supplementary data are available at Bioinformatics online.


2019 ◽  
Author(s):  
Shuoguo Wang ◽  
Constance Brett ◽  
Mohan Bolisetty ◽  
Ryan Golhar ◽  
Isaac Neuhaus ◽  
...  

AbstractMotivationThanks to technological advances made in the last few years, we are now able to study transcriptomes from thousands of single cells. These have been applied widely to study various aspects of Biology. Nevertheless, comprehending and inferring meaningful biological insights from these large datasets is still a challenge. Although tools are being developed to deal with the data complexity and data volume, we do not have yet an effective visualizations and comparative analysis tools to realize the full value of these datasets.ResultsIn order to address this gap, we implemented a single cell data visualization portal called Single Cell Viewer (SCV). SCV is an R shiny application that offers users rich visualization and exploratory data analysis options for single cell datasets.AvailabilitySource code for the application is available online at GitHub (http://www.github.com/neuhausi/single-cell-viewer) and there is a hosted exploration application using the same example dataset as this publication at http://periscopeapps.org/[email protected]; [email protected]


Sign in / Sign up

Export Citation Format

Share Document