Protocol for executing and benchmarking eight computational doublet-detection methods in single-cell RNA sequencing data analysis

2021 ◽  
Vol 2 (3) ◽  
pp. 100699
Author(s):  
Nan Miles Xi ◽  
Jingyi Jessica Li
2020 ◽  
Vol 200 ◽  
pp. 108204 ◽  
Author(s):  
Andrew P. Voigt ◽  
S. Scott Whitmore ◽  
Nicholas D. Lessing ◽  
Adam P. DeLuca ◽  
Budd A. Tucker ◽  
...  

2020 ◽  
Author(s):  
Tamim Abdelaal ◽  
Jeroen Eggermont ◽  
Thomas Höllt ◽  
Ahmed Mahfouz ◽  
Marcel J.T. Reinders ◽  
...  

Summary: The ever-increasing number of cells analyzed in single-cell RNA sequencing (scRNA-seq) experiments imposes several challenges on data analysis. Current analysis methods do not scale to large datasets, which hampers interactive visual exploration of the data. We present Cytosplore-Transcriptomics, a framework for analyzing scRNA-seq data that covers data preprocessing, visualization, and downstream analysis. At its core, it uses a hierarchical, manifold-preserving representation of the data that allows scRNA-seq data to be inspected and annotated at different levels of detail. Consequently, Cytosplore-Transcriptomics provides interactive analysis using low-dimensional visualizations and scales to millions of cells. Availability: Cytosplore-Transcriptomics can be freely downloaded from [email protected]


2019 ◽  
Author(s):  
Alemu Takele Assefa ◽  
Jo Vandesompele ◽  
Olivier Thas

Abstract: Single-cell RNA sequencing (scRNA-seq) technologies profile gene expression patterns in individual cells. It is often of interest to test for differential expression (DE) between conditions (e.g., treatment vs. control) or between cell types. Simulation studies have shown that non-parametric tests, such as the Wilcoxon rank-sum test, can robustly detect significant DE, with better performance than many parametric tools developed specifically for scRNA-seq data analysis. However, these rank tests cannot be used for complex experimental designs involving multiple groups, multiple factors, and confounding variables. Further, rank-based tests do not provide an interpretable measure of effect size. We propose a semi-parametric approach based on probabilistic index models (PIMs), a flexible class of models that generalizes classical rank tests. Our method does not rely on strong distributional assumptions and allows confounding factors to be accounted for. Moreover, it allows the effect size to be estimated in terms of a probabilistic index. Real data analyses demonstrate that PIM is capable of identifying biologically meaningful DE. Our simulation studies also show that the resulting DE tests control the false discovery rate at its nominal level while maintaining good sensitivity compared with competing methods.
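
To make the effect-size notion in the abstract above concrete, the sketch below estimates a two-group probabilistic index, P(A < B) + 0.5 · P(A = B), by counting pairs across groups. This is the quantity that underlies the Wilcoxon rank-sum (Mann-Whitney) statistic; it is not the authors' PIM implementation, which fits a regression model with covariates. The function name and the toy expression values are hypothetical.

```python
from itertools import product


def probabilistic_index(group_a, group_b):
    """Estimate P(A < B) + 0.5 * P(A = B) over all cross-group pairs."""
    if not group_a or not group_b:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for a, b in product(group_a, group_b):
        if a < b:
            wins += 1.0   # B is strictly larger than A
        elif a == b:
            wins += 0.5   # ties count as half
    return wins / (len(group_a) * len(group_b))


# Hypothetical expression counts for one gene in two conditions.
control = [0, 1, 1, 2, 3]
treated = [1, 2, 4, 5, 6]
pi_hat = probabilistic_index(control, treated)
print(f"estimated probabilistic index: {pi_hat:.2f}")
```

A value of 0.5 means neither group tends to have higher expression; values near 1 mean a randomly chosen treated cell almost always shows higher expression than a randomly chosen control cell.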


2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Fenglin Liu ◽  
Yuanyuan Zhang ◽  
Lei Zhang ◽  
Ziyi Li ◽  
Qiao Fang ◽  
...  

Abstract. Background: Systematic interrogation of single-nucleotide variants (SNVs) is one of the most promising approaches to delineating cellular heterogeneity and phylogenetic relationships at the single-cell level. While SNV detection from abundant single-cell RNA sequencing (scRNA-seq) data is applicable and cost-effective for identifying expressed variants, inferring sub-clones, and deciphering genotype-phenotype linkages, there is a lack of computational methods developed specifically for SNV calling in scRNA-seq. Although variant callers for bulk RNA-seq have been used sporadically in scRNA-seq, the performance of the different tools has not been assessed. Results: Here, we perform a systematic comparison of seven tools, namely SAMtools, the GATK pipeline, CTAT, FreeBayes, MuTect2, Strelka2, and VarScan2, using both simulated and scRNA-seq datasets, and identify multiple elements that influence their performance. While specificities are generally high, and sensitivities exceed 90% for most tools when calling homozygous SNVs in high-confidence coding regions with sufficient read depth, sensitivities decrease dramatically when calling SNVs with low read depth, with low variant allele frequencies, or in specific genomic contexts. SAMtools shows the highest sensitivity in most cases, especially with few supporting reads, despite relatively low specificity in introns or high-identity regions. Strelka2 shows consistently good performance when sufficient supporting reads are provided, while FreeBayes performs well when variant allele frequencies are high. Conclusions: We recommend SAMtools, Strelka2, FreeBayes, or CTAT, depending on the specific conditions of usage. Our study provides the first benchmark evaluating the performance of different SNV detection tools for scRNA-seq data.


2021 ◽  
Author(s):  
Ke-Xu Xiong ◽  
Han-Lin Zhou ◽  
Jian-Hua Yin ◽  
Karsten Kristiansen ◽  
Huan-Ming Yang ◽  
...  

High-throughput single-cell RNA sequencing (scRNA-seq) is a popular method, but it is accompanied by doublets that disturb downstream analysis. Several computational approaches have been developed to detect doublets. However, most of these methods perform well on some datasets but lack stability on others; thus, it is difficult to regard a single method as the gold standard for every scenario, and choosing the most appropriate software is a difficult and time-consuming task for researchers. To address these issues, we propose Chord, which implements a machine learning algorithm that integrates multiple doublet detection methods. Chord had higher accuracy and stability than the individual approaches on different datasets containing real and synthetic data. Moreover, Chord was designed with a modular architecture port, which gives it high flexibility and adaptability for incorporating new tools. Chord is a general solution to the doublet detection problem.
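
The idea of integrating several doublet detectors, described in the abstract above, can be illustrated with a deliberately simple stand-in: rank-normalize each method's per-cell score and average the results. This is not Chord's algorithm (the abstract says Chord uses a machine learning model); unweighted rank averaging is swapped in here only to show why scores on different scales need a common footing before they are combined. The method names and scores are hypothetical.

```python
def rank_normalize(scores):
    """Map raw scores to (0, 1] by rank; tied scores share their average rank."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over the run of scores tied with position i.
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank / n
        i = j + 1
    return ranks


def ensemble_scores(method_scores):
    """Average rank-normalized scores across methods, one value per cell."""
    lengths = {len(s) for s in method_scores.values()}
    if len(lengths) != 1:
        raise ValueError("every method must score the same number of cells")
    normalized = [rank_normalize(s) for s in method_scores.values()]
    n_cells = lengths.pop()
    return [sum(col[i] for col in normalized) / len(normalized)
            for i in range(n_cells)]


# Hypothetical doublet scores for four cells from three detectors,
# each reporting on its own scale.
method_scores = {
    "method_a": [0.10, 0.80, 0.30, 0.95],
    "method_b": [2.0, 9.0, 1.0, 7.0],
    "method_c": [0.2, 0.6, 0.1, 0.9],
}
combined = ensemble_scores(method_scores)
most_suspect = max(range(len(combined)), key=combined.__getitem__)
print(combined, most_suspect)
```

A learned combiner, as in the abstract, would instead fit weights for each method against labeled or simulated doublets rather than treating all methods equally.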

