Faculty Opinions recommendation of Challenges in unsupervised clustering of single-cell RNA-seq data.

Author(s):  
Sushmita Roy
2019 ◽  
Vol 20 (5) ◽  
pp. 310-310 ◽  
Author(s):  
Vladimir Yu Kiselev ◽  
Tallulah S. Andrews ◽  
Martin Hemberg

Author(s):  
Davide Risso ◽  
Stefano Maria Pagnotta

Abstract. Motivation: Data transformations are an important step in the analysis of RNA-seq data. Nonetheless, the impact of transformation on the outcome of unsupervised clustering procedures is still unclear. Results: Here, we present an Asymmetric Winsorization per Sample Transformation (AWST), which is robust to data perturbations and removes the need for selecting the most informative genes prior to sample clustering. Our procedure leads to robust and biologically meaningful clusters both in bulk and in single-cell applications. Availability: The AWST method is available at https://github.com/drisso/awst. The code to reproduce the analyses is available at https://github.com/drisso/awst_analysis. Supplementary information: Supplementary data are available at Bioinformatics online.
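To illustrate what a per-sample winsorization does in general, here is a minimal sketch in plain Python. It is not the AWST algorithm from the paper or the `awst` package: it only clips each sample's values to that sample's own empirical quantiles, with different lower and upper limits so the clipping is asymmetric. The function names, the quantile rule, and the default limits are all assumptions made for this example.

```python
def winsorize_sample(values, lower_q=0.0, upper_q=0.95):
    """Clip one sample's values to its own empirical quantiles.

    Using different lower_q and upper_q makes the clipping asymmetric.
    The defaults are illustrative assumptions, not values from the paper.
    """
    if not values:
        return []
    ordered = sorted(values)
    last = len(ordered) - 1
    low = ordered[int(lower_q * last)]    # lower clipping bound
    high = ordered[int(upper_q * last)]   # upper clipping bound
    return [min(max(v, low), high) for v in values]


def winsorize_per_sample(matrix, lower_q=0.0, upper_q=0.95):
    """Apply the clipping independently to each sample (one row per sample)."""
    return [winsorize_sample(row, lower_q, upper_q) for row in matrix]


# Two toy samples, each with one extreme value in the upper tail.
samples = [
    [0, 1, 2, 3, 100],
    [5, 5, 6, 7, 5000],
]
clipped = winsorize_per_sample(samples, lower_q=0.0, upper_q=0.75)
print(clipped)
```

Because the bounds are computed per row, an outlier in one sample does not affect how another sample is clipped.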


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Tian Tian ◽  
Jie Zhang ◽  
Xiang Lin ◽  
Zhi Wei ◽  
Hakon Hakonarson

Abstract. Clustering is a critical step in single cell-based studies. Most existing methods support unsupervised clustering without the a priori exploitation of any domain knowledge. When confronted by the high dimensionality and pervasive dropout events of scRNA-seq data, purely unsupervised clustering methods may not produce biologically interpretable clusters, which complicates cell type assignment. In such cases, the only recourse is for the user to manually and repeatedly tweak clustering parameters until acceptable clusters are found. Consequently, the path to obtaining biologically meaningful clusters can be ad hoc and laborious. Here we report a principled clustering method named scDCC that integrates domain knowledge into the clustering step. Experiments on various scRNA-seq datasets ranging from thousands to tens of thousands of cells show that scDCC can significantly improve clustering performance, facilitating the interpretability of clusters and downstream analyses, such as cell type assignment.


Author(s):  
Lili Blumenberg ◽  
Kelly V. Ruggles

Abstract. Unsupervised clustering is a common and exceptionally useful tool for large biological datasets. However, clustering requires upfront algorithm and hyperparameter selection, which can introduce bias into the final clustering labels. It is therefore advisable to obtain a range of clustering results from multiple models and hyperparameters, which can be cumbersome and slow. To streamline this process, we present hypercluster, a Python package and SnakeMake pipeline for flexible and parallelized clustering evaluation and selection. Hypercluster is available on bioconda; installation, documentation and example workflows can be found at https://github.com/ruggleslab/hypercluster.

Author summary. Unsupervised clustering is a technique for grouping similar samples within a dataset. It is extremely common when analyzing big data from patient samples or from high-throughput techniques like single-cell RNA-seq. When researchers use unsupervised clustering, they have to select parameters that affect the final result, for instance how many groups they expect to find or what the smallest group is allowed to be. Some methods require setting even less intuitive parameters. For most applications, it is extremely challenging to guess what the values of these parameters should be; therefore, to prevent introducing bias into the final results, researchers should test many different parameters and methods to find the best groups. This process is cumbersome, slow and challenging to perform in a reproducible way. We developed hypercluster, a tool that automates this process, makes it much faster, and presents the results in a reproducible and helpful manner.
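The idea of comparing clustering results across several hyperparameter values can be sketched in a few lines of plain Python. This example does not use or reproduce the hypercluster API: it runs a toy one-dimensional k-means for several candidate values of k and keeps the value with the highest mean silhouette score. All function names, the deterministic initialization, and the toy data are assumptions made for illustration.

```python
def kmeans_1d(points, k, n_iter=20):
    """Toy 1-D k-means with a deterministic, evenly spaced initialization."""
    ordered = sorted(points)
    last = len(ordered) - 1
    if k == 1:
        centers = [ordered[last // 2]]
    else:
        centers = [ordered[i * last // (k - 1)] for i in range(k)]

    def assign():
        # Label each point with the index of its nearest center.
        return [min(range(k), key=lambda c: abs(p - centers[c])) for p in points]

    for _ in range(n_iter):
        labels = assign()
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = sum(members) / len(members)
    return assign()


def mean_silhouette(points, labels):
    """Mean silhouette score using absolute difference as the distance."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        return -1.0
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # convention for singleton clusters
            continue
        a = sum(abs(p - q) for q in own) / (len(own) - 1)
        b = min(
            sum(abs(p - q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


def sweep(points, candidate_ks):
    """Score every candidate k and return the best one with all scores."""
    results = {k: mean_silhouette(points, kmeans_1d(points, k)) for k in candidate_ks}
    best_k = max(results, key=results.get)
    return best_k, results


# Toy data with three well-separated groups.
points = [1.0, 1.1, 0.9, 5.0, 5.1, 4.9, 9.0, 9.1, 8.9]
best_k, scores = sweep(points, [2, 3, 4])
print(best_k, scores)
```

A real sweep would typically cover several algorithms and evaluation metrics and run the candidates in parallel; the single metric here only keeps the sketch short.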


2020 ◽  
Author(s):  
Davide Risso ◽  
Stefano M. Pagnotta

Abstract. Motivation: Data transformations are an important step in the analysis of RNA-seq data. Nonetheless, the impact of transformations on the outcome of unsupervised clustering procedures is still unclear. Results: Here, we present an Asymmetric Winsorization per Sample Transformation (AWST), which is robust to data perturbations and removes the need for selecting the most informative genes prior to sample clustering. Our procedure leads to robust and biologically meaningful clusters both in bulk and in single-cell applications. Availability: The AWST method is available at https://github.com/drisso/awst. The code to reproduce the analyses is available at https://github.com/drisso/awst_analysis.


2019 ◽  
Vol 20 (5) ◽  
pp. 273-282 ◽  
Author(s):  
Vladimir Yu Kiselev ◽  
Tallulah S. Andrews ◽  
Martin Hemberg

2017 ◽  
Author(s):  
Vladimir Yu Kiselev ◽  
Andrew Yiu ◽  
Martin Hemberg

Abstract. Single-cell RNA-seq (scRNA-seq) is widely used to investigate the composition of complex tissues [1–9] since the technology allows researchers to define cell-types using unsupervised clustering of the transcriptome [8,10]. However, due to differences in experimental methods and computational analyses, it is often challenging to directly compare the cells identified in two different experiments. Here, we present scmap (http://bioconductor.org/packages/scmap), a method for projecting cells from a scRNA-seq experiment onto the cell-types or individual cells identified in other experiments (the application can be run for free, without restrictions, from http://www.hemberg-lab.cloud/scmap).
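Projecting a cell onto previously identified cell-types can be pictured as a nearest-centroid lookup. The sketch below is a simplified illustration in plain Python, not the scmap implementation: it compares a cell's expression vector with one centroid per reference cell-type using cosine similarity and leaves the cell unassigned when no centroid is similar enough. The function names, the toy centroids, and the 0.7 threshold are assumptions for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def project_cell(cell, centroids, threshold=0.7):
    """Return the label of the most similar centroid, or "unassigned".

    `centroids` maps a cell-type label to its reference expression vector.
    The threshold is a hypothetical cutoff chosen for this sketch.
    """
    best_label, best_sim = max(
        ((label, cosine(cell, centroid)) for label, centroid in centroids.items()),
        key=lambda pair: pair[1],
    )
    return best_label if best_sim >= threshold else "unassigned"


# Toy reference: one expression centroid per cell-type over three genes.
centroids = {
    "T cell": [5.0, 0.0, 1.0],
    "B cell": [0.0, 5.0, 1.0],
}

print(project_cell([4.0, 0.0, 1.0], centroids))  # close to the "T cell" centroid
print(project_cell([0.0, 0.0, 1.0], centroids))  # resembles neither centroid
```

Allowing an "unassigned" outcome matters when the query experiment contains cell-types that are absent from the reference.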

