GPseudoClust: deconvolution of shared pseudo-profiles at single-cell resolution

AbstractMotivationMany methods have been developed to cluster genes on the basis of their changes in mRNA expression over time, using bulk RNA-seq or microarray data. However, single-cell data may present a particular challenge for these algorithms, since the temporal ordering of cells is not directly observed. One way to address this is to first use pseudotime methods to order the cells, and then apply clustering techniques for time course data. However, pseudotime estimates are subject to high levels of uncertainty, and failing to account for this uncertainty is liable to lead to erroneous and/or over-confident gene clusters.ResultsThe proposed method, GPseudoClust, is a novel approach that jointly infers pseudotem-poral ordering and gene clusters, and quantifies the uncertainty in both. GPseudoClust combines a recent method for pseudotime inference with nonparametric Bayesian clustering methods, efficient MCMC sampling, and novel subsampling strategies which aid computation. We consider a broad array of simulated and experimental datasets to demonstrate the effectiveness of GPseudoClust in a range of settings.AvailabilityAn implementation is available on GitHub: https://github.com/magStra/nonparametricSummaryPSM and https://github.com/magStra/[email protected] informationSupplementary materials are available.

Download Full-text

GPseudoClust: deconvolution of shared pseudo-profiles at single-cell resolution

Bioinformatics ◽

10.1093/bioinformatics/btz778 ◽

2019 ◽

Author(s):

Magdalena E Strauss ◽

Paul D W Kirk ◽

John E Reid ◽

Lorenz Wernisch

Keyword(s):

Single Cell ◽

Time Course ◽

Gene Clusters ◽

Supplementary Information ◽

Rna Seq ◽

Clustering Methods ◽

Novel Approach ◽

Broad Array ◽

Recent Method ◽

Cell Data

Abstract Motivation Many methods have been developed to cluster genes on the basis of their changes in mRNA expression over time, using bulk RNA-seq or microarray data. However, single-cell data may present a particular challenge for these algorithms, since the temporal ordering of cells is not directly observed. One way to address this is to first use pseudotime methods to order the cells, and then apply clustering techniques for time course data. However, pseudotime estimates are subject to high levels of uncertainty, and failing to account for this uncertainty is liable to lead to erroneous and/or over-confident gene clusters. Results The proposed method, GPseudoClust, is a novel approach that jointly infers pseudotemporal ordering and gene clusters, and quantifies the uncertainty in both. GPseudoClust combines a recent method for pseudotime inference with nonparametric Bayesian clustering methods, efficient MCMC sampling, and novel subsampling strategies which aid computation.We consider a broad array of simulated and experimental datasets to demonstrate the effectiveness of GPseudoClust in a range of settings. Availability An implementation is available on GitHub: https://github.com/magStra/nonparametricSummaryPSM and https://github.com/magStra/GPseudoClust. Supplementary Information Supplementary data are available at Bioinformatics online.

Download Full-text

SCHNEL: Scalable clustering of high dimensional single-cell data

10.1101/2020.03.30.015925 ◽

2020 ◽

Cited By ~ 1

Author(s):

Tamim Abdelaal ◽

Paul de Raadt ◽

Boudewijn P.F. Lelieveldt ◽

Marcel J.T. Reinders ◽

Ahmed Mahfouz

Keyword(s):

Single Cell ◽

Graph Clustering ◽

Original Data ◽

Large Datasets ◽

Supplementary Information ◽

High Dimensional ◽

Sequencing Data ◽

Novel Approach ◽

Cellular Markers ◽

Cell Data

AbstractMotivationSingle cell data measures multiple cellular markers at the single-cell level for thousands to millions of cells. Identification of distinct cell populations is a key step for further biological understanding, usually performed by clustering this data. Dimensionality reduction based clustering tools are either not scalable to large datasets containing millions of cells, or not fully automated requiring an initial manual estimation of the number of clusters. Graph clustering tools provide automated and reliable clustering for single cell data, but suffer heavily from scalability to large datasets.ResultsWe developed SCHNEL, a scalable, reliable and automated clustering tool for high-dimensional single-cell data. SCHNEL transforms large high-dimensional data to a hierarchy of datasets containing subsets of data points following the original data manifold. The novel approach of SCHNEL combines this hierarchical representation of the data with graph clustering, making graph clustering scalable to millions of cells. Using seven different cytometry datasets, SCHNEL outperformed three popular clustering tools for cytometry data, and was able to produce meaningful clustering results for datasets of 3.5 and 17.2 million cells within workable timeframes. In addition, we show that SCHNEL is a general clustering tool by applying it to single-cell RNA sequencing data, as well as a popular machine learning benchmark dataset MNIST.Availability and ImplementationImplementation is available on GitHub (https://github.com/paulderaadt/HSNE-clustering)[email protected] informationSupplementary data are available at Bioinformatics online.

Download Full-text

SCHNEL: scalable clustering of high dimensional single-cell data

Bioinformatics ◽

10.1093/bioinformatics/btaa816 ◽

2020 ◽

Vol 36 (Supplement_2) ◽

pp. i849-i856

Author(s):

Tamim Abdelaal ◽

Paul de Raadt ◽

Boudewijn P F Lelieveldt ◽

Marcel J T Reinders ◽

Ahmed Mahfouz

Keyword(s):

Single Cell ◽

Graph Clustering ◽

Original Data ◽

Large Datasets ◽

Supplementary Information ◽

High Dimensional ◽

Sequencing Data ◽

Novel Approach ◽

Cellular Markers ◽

Cell Data

Abstract Motivation Single cell data measures multiple cellular markers at the single-cell level for thousands to millions of cells. Identification of distinct cell populations is a key step for further biological understanding, usually performed by clustering this data. Dimensionality reduction based clustering tools are either not scalable to large datasets containing millions of cells, or not fully automated requiring an initial manual estimation of the number of clusters. Graph clustering tools provide automated and reliable clustering for single cell data, but suffer heavily from scalability to large datasets. Results We developed SCHNEL, a scalable, reliable and automated clustering tool for high-dimensional single-cell data. SCHNEL transforms large high-dimensional data to a hierarchy of datasets containing subsets of data points following the original data manifold. The novel approach of SCHNEL combines this hierarchical representation of the data with graph clustering, making graph clustering scalable to millions of cells. Using seven different cytometry datasets, SCHNEL outperformed three popular clustering tools for cytometry data, and was able to produce meaningful clustering results for datasets of 3.5 and 17.2 million cells within workable time frames. In addition, we show that SCHNEL is a general clustering tool by applying it to single-cell RNA sequencing data, as well as a popular machine learning benchmark dataset MNIST. Availability and implementation Implementation is available on GitHub (https://github.com/biovault/SCHNELpy). All datasets used in this study are publicly available. Supplementary information Supplementary data are available at Bioinformatics online.

Download Full-text

Evaluating single-cell cluster stability using the Jaccard similarity index

10.1101/2020.05.26.116640 ◽

2020 ◽

Cited By ~ 1

Author(s):

Ming Tang ◽

Yasin Kaymaz ◽

Brandon Logeman ◽

Stephen Eichhorn ◽

ZhengZheng S. Liang ◽

...

Keyword(s):

Single Cell ◽

Clustering Algorithms ◽

Similarity Index ◽

R Package ◽

Supplementary Information ◽

Clustering Methods ◽

K Nearest Neighbor ◽

Jaccard Similarity ◽

Cluster Stability ◽

Link Type

AbstractMotivationOne major goal of single-cell RNA sequencing (scRNAseq) experiments is to identify novel cell types. With increasingly large scRNAseq datasets, unsupervised clustering methods can now produce detailed catalogues of transcriptionally distinct groups of cells in a sample. However, the interpretation of these clusters is challenging for both technical and biological reasons. Popular clustering algorithms are sensitive to parameter choices, and can produce different clustering solutions with even small changes in the number of principal components used, the k nearest neighbor, and the resolution parameters, among others.ResultsHere, we present a set of tools to evaluate cluster stability by subsampling, which can guide parameter choice and aid in biological interpretation. The R package scclusteval and the accompanying Snakemake workflow implement all steps of the pipeline: subsampling the cells, repeating the clustering with Seurat, and estimation of cluster stability using the Jaccard similarity index. The Snakemake workflow takes advantage of high-performance computing clusters and dispatches jobs in parallel to available CPUs to speed up the analysis. The scclusteval package provides functions to facilitate the analysis of the output, including a series of rich visualizations.AvailabilityR package scclusteval: https://github.com/crazyhottommy/scclusteval Snakemake workflow: https://github.com/crazyhottommy/[email protected], [email protected] informationSupplementary data are available at Bioinformatics online.

Download Full-text

PathScore: a web tool for identifying altered pathways in cancer data

10.1101/067090 ◽

2016 ◽

Cited By ~ 2

Author(s):

Stephen G. Gaffney ◽

Jeffrey P. Townsend

Keyword(s):

Web Application ◽

Somatic Mutations ◽

Supplementary Information ◽

Web Tool ◽

Cancer Data ◽

Link Type ◽

Novel Approach ◽

Supplementary Material ◽

User Friendly ◽

Pathway Effect

ABSTRACTSummaryPathScore quantifies the level of enrichment of somatic mutations within curated pathways, applying a novel approach that identifies pathways enriched across patients. The application provides several user-friendly, interactive graphic interfaces for data exploration, including tools for comparing pathway effect sizes, significance, gene-set overlap and enrichment differences between projects.Availability and ImplementationWeb application available at pathscore.publichealth.yale.edu. Site implemented in Python and MySQL, with all major browsers supported. Source code available at github.com/sggaffney/pathscore with a GPLv3 [email protected] InformationAdditional documentation can be found at http://pathscore.publichealth.yale.edu/faq.

Download Full-text

A unified approach for sparse dynamical system inference from temporal measurements

Bioinformatics ◽

10.1093/bioinformatics/btz065 ◽

2018 ◽

Vol 35 (18) ◽

pp. 3387-3396 ◽

Cited By ~ 5

Author(s):

Yannis Pantazis ◽

Ioannis Tsamardinos

Keyword(s):

Dynamical System ◽

Dynamical Systems ◽

Single Cell ◽

Time Course ◽

Cell Mass ◽

Supplementary Information ◽

Unified Approach ◽

Unknown Parameters ◽

Inference Problem ◽

Linear Dynamical Systems

Abstract Motivation Temporal variations in biological systems and more generally in natural sciences are typically modeled as a set of ordinary, partial or stochastic differential or difference equations. Algorithms for learning the structure and the parameters of a dynamical system are distinguished based on whether time is discrete or continuous, observations are time-series or time-course and whether the system is deterministic or stochastic, however, there is no approach able to handle the various types of dynamical systems simultaneously. Results In this paper, we present a unified approach to infer both the structure and the parameters of non-linear dynamical systems of any type under the restriction of being linear with respect to the unknown parameters. Our approach, which is named Unified Sparse Dynamics Learning (USDL), constitutes of two steps. First, an atemporal system of equations is derived through the application of the weak formulation. Then, assuming a sparse representation for the dynamical system, we show that the inference problem can be expressed as a sparse signal recovery problem, allowing the application of an extensive body of algorithms and theoretical results. Results on simulated data demonstrate the efficacy and superiority of the USDL algorithm under multiple interventions and/or stochasticity. Additionally, USDL’s accuracy significantly correlates with theoretical metrics such as the exact recovery coefficient. On real single-cell data, the proposed approach is able to induce high-confidence subgraphs of the signaling pathway. Availability and implementation Source code is available at Bioinformatics online. USDL algorithm has been also integrated in SCENERY (http://scenery.csd.uoc.gr/); an online tool for single-cell mass cytometry analytics. Supplementary information Supplementary data are available at Bioinformatics online.

Download Full-text

Ensemble learning for classifying single-cell data and projection across reference atlases

Bioinformatics ◽

10.1093/bioinformatics/btaa137 ◽

2020 ◽

Vol 36 (11) ◽

pp. 3585-3587

Author(s):

Lin Wang ◽

Francisca Catalan ◽

Karin Shamardani ◽

Husam Babikir ◽

Aaron Diaz

Keyword(s):

Single Cell ◽

Cell Types ◽

Status Quo ◽

Supplementary Information ◽

Published Data ◽

Supplementary Data ◽

Cell Type ◽

Low Sensitivity ◽

Project Data ◽

Cell Data

Abstract Summary Single-cell data are being generated at an accelerating pace. How best to project data across single-cell atlases is an open problem. We developed a boosted learner that overcomes the greatest challenge with status quo classifiers: low sensitivity, especially when dealing with rare cell types. By comparing novel and published data from distinct scRNA-seq modalities that were acquired from the same tissues, we show that this approach preserves cell-type labels when mapping across diverse platforms. Availability and implementation https://github.com/diazlab/ELSA Contact [email protected] Supplementary information Supplementary data are available at Bioinformatics online.

Download Full-text

A pseudotemporal causality approach to identifying miRNA–mRNA interactions during biological processes

Bioinformatics ◽

10.1093/bioinformatics/btaa899 ◽

2020 ◽

Author(s):

Andres M Cifuentes-Bernal ◽

Vu Vh Pham ◽

Xiaomei Li ◽

Lin Liu ◽

Jiuyong Li ◽

...

Keyword(s):

Single Cell ◽

Cancer Progression ◽

Cancer Metastasis ◽

Biological Process ◽

Supplementary Information ◽

Biological Processes ◽

Mesenchymal Transition ◽

Novel Approach ◽

Static Information ◽

Temporal Aspect

Abstract Motivation microRNAs (miRNAs) are important gene regulators and they are involved in many biological processes, including cancer progression. Therefore, correctly identifying miRNA–mRNA interactions is a crucial task. To this end, a huge number of computational methods has been developed, but they mainly use the data at one snapshot and ignore the dynamics of a biological process. The recent development of single cell data and the booming of the exploration of cell trajectories using ‘pseudotime’ concept have inspired us to develop a pseudotime-based method to infer the miRNA–mRNA relationships characterizing a biological process by taking into account the temporal aspect of the process. Results We have developed a novel approach, called pseudotime causality, to find the causal relationships between miRNAs and mRNAs during a biological process. We have applied the proposed method to both single cell and bulk sequencing datasets for Epithelia to Mesenchymal Transition, a key process in cancer metastasis. The evaluation results show that our method significantly outperforms existing methods in finding miRNA–mRNA interactions in both single cell and bulk data. The results suggest that utilizing the pseudotemporal information from the data helps reveal the gene regulation in a biological process much better than using the static information. Availability and implementation R scripts and datasets can be found at https://github.com/AndresMCB/PTC. Supplementary information Supplementary data are available at Bioinformatics online.

Download Full-text

SinNLRR: a robust subspace clustering method for cell type detection by non-negative and low-rank representation

Bioinformatics ◽

10.1093/bioinformatics/btz139 ◽

2019 ◽

Vol 35 (19) ◽

pp. 3642-3650 ◽

Cited By ~ 13

Author(s):

Ruiqing Zheng ◽

Min Li ◽

Zhenlan Liang ◽

Fang-Xiang Wu ◽

Yi Pan ◽

...

Keyword(s):

Single Cell ◽

Low Rank ◽

Supplementary Information ◽

Similarity Matrix ◽

Similarity Learning ◽

Clustering Methods ◽

Cell Type ◽

Gene Markers ◽

Adaptive Penalty ◽

New Perspective

Abstract Motivation The development of single-cell RNA-sequencing (scRNA-seq) provides a new perspective to study biological problems at the single-cell level. One of the key issues in scRNA-seq analysis is to resolve the heterogeneity and diversity of cells, which is to cluster the cells into several groups. However, many existing clustering methods are designed to analyze bulk RNA-seq data, it is urgent to develop the new scRNA-seq clustering methods. Moreover, the high noise in scRNA-seq data also brings a lot of challenges to computational methods. Results In this study, we propose a novel scRNA-seq cell type detection method based on similarity learning, called SinNLRR. The method is motivated by the self-expression of the cells with the same group. Specifically, we impose the non-negative and low rank structure on the similarity matrix. We apply alternating direction method of multipliers to solve the optimization problem and propose an adaptive penalty selection method to avoid the sensitivity to the parameters. The learned similarity matrix could be incorporated with spectral clustering, t-distributed stochastic neighbor embedding for visualization and Laplace score for prioritizing gene markers. In contrast to other scRNA-seq clustering methods, our method achieves more robust and accurate results on different datasets. Availability and implementation Our MATLAB implementation of SinNLRR is available at, https://github.com/zrq0123/SinNLRR. Supplementary information Supplementary data are available at Bioinformatics online.

Download Full-text

Single Cell Viewer (SCV): An interactive visualization data portal for single cell RNA sequence data

10.1101/664789 ◽

2019 ◽

Cited By ~ 2

Author(s):

Shuoguo Wang ◽

Constance Brett ◽

Mohan Bolisetty ◽

Ryan Golhar ◽

Isaac Neuhaus ◽

...

Keyword(s):

Single Cell ◽

Sequence Data ◽

Single Cells ◽

Link Type ◽

Technological Advances ◽

R Shiny ◽

Data Volume ◽

Exploratory Data ◽

Cell Data ◽

Shiny Application

AbstractMotivationThanks to technological advances made in the last few years, we are now able to study transcriptomes from thousands of single cells. These have been applied widely to study various aspects of Biology. Nevertheless, comprehending and inferring meaningful biological insights from these large datasets is still a challenge. Although tools are being developed to deal with the data complexity and data volume, we do not have yet an effective visualizations and comparative analysis tools to realize the full value of these datasets.ResultsIn order to address this gap, we implemented a single cell data visualization portal called Single Cell Viewer (SCV). SCV is an R shiny application that offers users rich visualization and exploratory data analysis options for single cell datasets.AvailabilitySource code for the application is available online at GitHub (http://www.github.com/neuhausi/single-cell-viewer) and there is a hosted exploration application using the same example dataset as this publication at http://periscopeapps.org/[email protected]; [email protected]

Download Full-text