Confronting false discoveries in single-cell differential expression

2021 ◽  
Author(s):  
Jordan W. Squair ◽  
Matthieu Gautier ◽  
Claudia Kathe ◽  
Mark A. Anderson ◽  
Nicholas D. James ◽  
...  

Differential expression analysis in single-cell transcriptomics enables the dissection of cell-type-specific responses to perturbations such as disease, trauma, or experimental manipulation. While many statistical methods are available to identify differentially expressed genes, the principles that distinguish these methods and their performance remain unclear. Here, we show that the relative performance of these methods is contingent on their ability to account for variation between biological replicates. Methods that ignore this inevitable variation are biased and prone to false discoveries. Indeed, the most widely used methods can discover hundreds of differentially expressed genes in the absence of biological differences. Our results suggest an urgent need for a paradigm shift in the methods used to perform differential expression analysis in single-cell data.
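
The abstract's central point is that the biological replicate, not the cell, must be treated as the unit of replication. The sketch below uses the standard library and deterministic toy data to show why. One gene is measured in two hypothetical mice per condition. Testing all cells as if they were independent gives a large t statistic. Averaging cells within each mouse first and then testing gives a small one. Per-replicate averaging is a simplified stand-in for the pseudobulk approach; real pipelines usually sum raw counts per replicate and pass them to a bulk RNA-seq method such as edgeR, DESeq2 or limma. The function names, the "ctrl"/"inj" labels and all numbers are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two samples with unequal variances."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


def pseudobulk_means(cells):
    """Average one gene's expression within each biological replicate.

    `cells` is a list of (replicate_id, condition, value) tuples.
    Returns {condition: [one mean per replicate, ...]}.
    """
    per_replicate = {}
    for rep, cond, value in cells:
        per_replicate.setdefault((cond, rep), []).append(value)
    out = {}
    for (cond, _rep), values in per_replicate.items():
        out.setdefault(cond, []).append(mean(values))
    return out


def toy_cells(replicate_means, n_cells=50):
    """Deterministic toy data: each replicate's cells sit at its mean -1 or +1."""
    cells = []
    for (cond, rep), m in replicate_means.items():
        for i in range(n_cells):
            cells.append((rep, cond, m + (1.0 if i % 2 else -1.0)))
    return cells


# Hypothetical mice: the condition means differ by 2, but mice within the
# same condition differ by up to 4, so the gap is within mouse-to-mouse noise.
rep_means = {("ctrl", "m1"): 10.0, ("ctrl", "m2"): 6.0,
             ("inj", "m3"): 7.0, ("inj", "m4"): 5.0}
cells = toy_cells(rep_means)

# Cell-level test: each of the 100 cells per condition counts as independent.
ctrl = [v for _, c, v in cells if c == "ctrl"]
inj = [v for _, c, v in cells if c == "inj"]
t_cell = welch_t(ctrl, inj)

# Replicate-level test: one aggregated value per mouse (n = 2 per condition).
pb = pseudobulk_means(cells)
t_rep = welch_t(pb["ctrl"], pb["inj"])

print(f"cell-level t = {t_cell:.2f}, replicate-level t = {t_rep:.2f}")
```

Both statistics are computed from the same four mice. The cell-level statistic is several times larger because it counts 100 pseudo-independent observations per condition instead of two replicates.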

2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Jordan W. Squair ◽  
Matthieu Gautier ◽  
Claudia Kathe ◽  
Mark A. Anderson ◽  
Nicholas D. James ◽  
...  

Differential expression analysis in single-cell transcriptomics enables the dissection of cell-type-specific responses to perturbations such as disease, trauma, or experimental manipulations. While many statistical methods are available to identify differentially expressed genes, the principles that distinguish these methods and their performance remain unclear. Here, we show that the relative performance of these methods is contingent on their ability to account for variation between biological replicates. Methods that ignore this inevitable variation are biased and prone to false discoveries. Indeed, the most widely used methods can discover hundreds of differentially expressed genes in the absence of biological differences. To exemplify these principles, we exposed true and false discoveries of differentially expressed genes in the injured mouse spinal cord.


2019 ◽  
Author(s):  
Mahmoud M Ibrahim ◽  
Rafael Kramann

Marker genes identified in single cell experiments are expected to be highly specific to a certain cell type and highly expressed in that cell type. Detecting a gene by differential expression analysis does not necessarily satisfy those two conditions and is typically computationally expensive for large cell numbers. Here we present genesorteR, an R package that ranks features in single cell data in a manner consistent with the expected definition of marker genes in experimental biology research. We benchmark genesorteR using various data sets and show that it is distinctly more accurate in large single cell data sets compared to other methods. genesorteR is orders of magnitude faster than current implementations of differential expression analysis methods, can operate on data containing millions of cells and is applicable to both single cell RNA-Seq and single cell ATAC-Seq data. genesorteR is available at https://github.com/mahmoudibrahim/genesorteR.


2018 ◽  
Author(s):  
Jesse M. Zhang ◽  
Govinda M. Kamath ◽  
David N. Tse

Single-cell computational pipelines involve two critical steps: organizing cells (clustering) and identifying the markers driving this organization (differential expression analysis). State-of-the-art pipelines perform differential analysis after clustering on the same dataset. We observe that because clustering forces separation, reusing the same dataset generates artificially low p-values and hence false discoveries. We introduce a valid post-clustering differential analysis framework which corrects for this problem. We provide software at https://github.com/jessemzhang/tn_test.
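
The problem described above ("clustering forces separation") can be reproduced with a single gene that has no structure at all. The sketch below takes 100 evenly spaced quantiles of one normal distribution, splits them into two groups with 1-D k-means, and then tests that same gene between the two clusters. This illustrates only the double-dipping problem. It is not the authors' corrected test, which is provided at the repository linked above. The function names are hypothetical, and the data are deterministic toy values.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def welch_t(a, b):
    """Welch's t statistic for two samples with unequal variances."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


def two_means_1d(xs, max_iter=100):
    """Lloyd's k-means with k = 2 on one-dimensional data; returns 0/1 labels."""
    c0, c1 = min(xs), max(xs)  # initialize centroids at the extremes
    labels = []
    for _ in range(max_iter):
        labels = [0 if abs(x - c0) <= abs(x - c1) else 1 for x in xs]
        new0 = mean(x for x, lab in zip(xs, labels) if lab == 0)
        new1 = mean(x for x, lab in zip(xs, labels) if lab == 1)
        if (new0, new1) == (c0, c1):  # centroids stopped moving
            break
        c0, c1 = new0, new1
    return labels


# One gene in 100 cells from a single population (quantiles of N(0, 1)):
# there is no true cluster structure.
n = 100
expr = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]

labels = two_means_1d(expr)
cluster0 = [x for x, lab in zip(expr, labels) if lab == 0]
cluster1 = [x for x, lab in zip(expr, labels) if lab == 1]

# Double dipping: test the same gene, on the same cells, that defined the clusters.
t = welch_t(cluster1, cluster0)
print(f"cluster sizes {len(cluster0)}/{len(cluster1)}, t = {t:.1f}")
```

The resulting t statistic is very large even though every value comes from one distribution. The clustering step guarantees that the two groups differ in this gene, so a naive test cannot be interpreted as evidence of distinct populations.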


2014 ◽  
Author(s):  
Zong Hong Zhang ◽  
Dhanisha J. Jhaveri ◽  
Vikki M. Marshall ◽  
Denis C. Bauer ◽  
Janette Edson ◽  
...  

Recent advances in next-generation sequencing technology allow high-throughput cDNA sequencing (RNA-Seq) to be widely applied in transcriptomic studies, in particular for detecting differentially expressed genes between groups. Many software packages have been developed for the identification of differentially expressed genes (DEGs) between treatment groups based on RNA-Seq data. However, there is a lack of consensus on how to approach an optimal study design and choice of suitable software for the analysis. In this comparative study we evaluate the performance of three of the most frequently used software tools: Cufflinks-Cuffdiff2, DESeq and edgeR. A number of important parameters of RNA-Seq technology were taken into consideration, including the number of replicates, sequencing depth, and balanced vs. unbalanced sequencing depth within and between groups. We benchmarked results relative to sets of DEGs identified through either quantitative RT-PCR or microarray. We observed that edgeR performs slightly better than DESeq and Cuffdiff2 in terms of the ability to uncover true positives. Overall, DESeq or taking the intersection of DEGs from two or more tools is recommended if the number of false positives is a major concern in the study. In other circumstances, edgeR is slightly preferable for differential expression analysis at the expense of potentially introducing more false positives.
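
Taking the intersection of DEGs from two or more tools, as recommended above, is a set operation on each tool's list of significant genes. The sketch below assumes each tool's output has already been filtered to significant gene IDs. The function name, the placeholder gene IDs and the optional "called by at least k tools" variant are illustrative additions, not part of the study.

```python
from collections import Counter


def consensus_degs(calls, min_tools=None):
    """Combine DEG lists from several tools.

    `calls` maps tool name -> iterable of gene IDs called significant.
    With min_tools=None, return the strict intersection (genes called by every
    tool); otherwise return genes called by at least `min_tools` tools.
    """
    sets = [set(genes) for genes in calls.values()]
    if min_tools is None:
        min_tools = len(sets)
    counts = Counter(g for s in sets for g in s)
    return sorted(g for g, c in counts.items() if c >= min_tools)


# Hypothetical significant-gene lists (placeholder IDs, not real results).
calls = {
    "edgeR": ["g1", "g2", "g3", "g4", "g5"],
    "DESeq": ["g1", "g2", "g4"],
    "Cuffdiff2": ["g2", "g4", "g6"],
}
print(consensus_degs(calls))               # → ['g2', 'g4']
print(consensus_degs(calls, min_tools=2))  # → ['g1', 'g2', 'g4']
```

The strict intersection is the most conservative choice. Requiring agreement from only some of the tools trades a few more potential false positives for sensitivity.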


2021 ◽  
Author(s):  
Shaoheng Liang ◽  
Qingnan Liang ◽  
Rui Chen ◽  
Ken Chen

Analyzing single-cell sequencing data from large cohorts is challenging. Discrepancies across experiments and differences among participants often lead to omissions and false discoveries in differentially expressed genes. We find that the Van Elteren test, a stratified version of the widely used Wilcoxon rank-sum test, elegantly mitigates the problem. We also modified the common language effect size to supplement this test, further improving its utility. On both simulated and real patient data, we show the ability of the Van Elteren test to control for false positives and false negatives. A comprehensive assessment using receiver operating characteristic (ROC) curves shows that the Van Elteren test achieves higher sensitivity and specificity on simulated datasets than nine state-of-the-art differential expression analysis methods. The effect size also estimates the differences between cell types more accurately.
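
The sketch below implements the Van Elteren statistic in the standard library, assuming each stratum is one patient or sample. It uses the classical weights 1/(N_s + 1), where N_s is the number of cells in stratum s, so the statistic is V = sum over s of W_s / (N_s + 1), with W_s the within-stratum Wilcoxon rank sum of group A. Ranks are computed only among cells of the same stratum. A sample that is shifted as a whole, for example by a batch effect, therefore cannot create a difference on its own. The tie correction to the variance and the modified common-language effect size described in the abstract are omitted. The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def midranks(values):
    """Ranks 1..N, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def van_elteren(strata):
    """Van Elteren stratified rank-sum test (normal approximation).

    `strata` is a list of (group_a_values, group_b_values) pairs, one per stratum.
    Returns (z, two-sided p-value). Simplification: no tie correction in the variance.
    """
    v = expected = var = 0.0
    for a, b in strata:
        n, m = len(a), len(b)
        total = n + m
        ranks = midranks(list(a) + list(b))
        w = sum(ranks[:n])           # rank sum of group A within this stratum
        weight = 1.0 / (total + 1)   # Van Elteren weight 1 / (N_s + 1)
        v += weight * w
        expected += weight * n * (total + 1) / 2   # E[W_s] = n (N_s + 1) / 2
        var += weight ** 2 * n * m * (total + 1) / 12  # Var[W_s] without ties
    z = (v - expected) / sqrt(var)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


# Stratum 2 is shifted as a whole, but within each stratum group A ranks
# above group B, so the stratified statistic sees a consistent difference.
z_shift, p_shift = van_elteren([([4, 5, 6], [1, 2, 3]), ([14, 15, 16], [11, 12, 13])])
# Opposite directions in the two strata cancel out.
z_null, p_null = van_elteren([([4, 5, 6], [1, 2, 3]), ([1, 2, 3], [4, 5, 6])])
print(f"consistent shift: z = {z_shift:.2f}, p = {p_shift:.4f}")
print(f"opposite shifts:  z = {z_null:.2f}, p = {p_null:.4f}")
```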


2018 ◽  
Vol 34 (19) ◽  
pp. 3340-3348 ◽  
Author(s):  
Zhijin Wu ◽  
Yi Zhang ◽  
Michael L Stitzel ◽  
Hao Wu

2018 ◽  
Vol 19 (1) ◽  
Author(s):  
Wenan Chen ◽  
Yan Li ◽  
John Easton ◽  
David Finkelstein ◽  
Gang Wu ◽  
...  

2015 ◽  
Vol 9s3 ◽  
pp. BBI.S29470 ◽  
Author(s):  
Mikhail G. Dozmorov ◽  
Nicolas Dominguez ◽  
Krista Bean ◽  
Susan R. Macwana ◽  
Virginia Roberts ◽  
...  

Systemic lupus erythematosus (SLE) is an autoimmune disease characterized by a complex interplay among immune cell types. SLE activity is assessed experimentally by several blood tests, including gene expression profiling of heterogeneous populations of cells in peripheral blood. To better understand the contribution of different cell types to SLE pathogenesis, we applied two methods for cell-type-specific differential expression analysis, csSAM and DSection, to identify cell-type-specific gene expression differences in heterogeneous gene expression measurements obtained using RNA-seq. We identified B-cell-, monocyte-, and neutrophil-specific gene expression differences. Immunoglobulin-coding gene expression was altered in B cells, while a ribosomal signature was prominent in monocytes. In contrast, genes differentially expressed in the heterogeneous mixture of cells did not show any functional enrichment. Our results identify antigen binding and structural constituents of ribosomes as functions altered by B-cell- and monocyte-specific gene expression differences, respectively. Finally, these results position both csSAM and DSection as viable techniques for cell-type-specific differential expression analysis, which may help uncover pathogenic, cell-type-specific processes in SLE.
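
csSAM rests on a linear mixing model. Each bulk sample's expression of a gene is approximated as that sample's cell-type proportions multiplied by unknown cell-type-specific expression levels. These levels are estimated by least squares separately in each group and then contrasted. The pure-Python sketch below shows only this deconvolution step. It assumes the proportions are known (for example, from cell counts) and that the data are noise-free. csSAM's permutation-based FDR estimation and the Bayesian DSection model are not reproduced. All function names and numbers are hypothetical.

```python
def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def deconvolve_gene(proportions, bulk):
    """Least-squares estimate of per-cell-type expression for one gene.

    Model: bulk[j] ~= sum_k proportions[j][k] * h[k] for each sample j,
    solved through the normal equations (P^T P) h = P^T y.
    """
    k = len(proportions[0])
    PtP = [[sum(p[a] * p[b] for p in proportions) for b in range(k)] for a in range(k)]
    Pty = [sum(p[a] * y for p, y in zip(proportions, bulk)) for a in range(k)]
    return solve(PtP, Pty)


def mix(proportions, h):
    """Noise-free bulk profile implied by the proportions and cell-type expression h."""
    return [sum(pk * hk for pk, hk in zip(p, h)) for p in proportions]


# Hypothetical proportions (columns: cell type 1, cell type 2), 4 samples per group.
P_ctrl = [[0.8, 0.2], [0.6, 0.4], [0.3, 0.7], [0.5, 0.5]]
P_case = [[0.7, 0.3], [0.4, 0.6], [0.2, 0.8], [0.55, 0.45]]

# In this toy example the gene changes only in cell type 2 (2 -> 6).
h_ctrl = deconvolve_gene(P_ctrl, mix(P_ctrl, [10.0, 2.0]))
h_case = deconvolve_gene(P_case, mix(P_case, [10.0, 6.0]))
diff = [c - d for c, d in zip(h_case, h_ctrl)]  # per-cell-type group difference
print("ctrl:", h_ctrl, "case:", h_case, "difference:", diff)
```

The bulk values differ in every sample, but the deconvolved contrast attributes the change to cell type 2 alone. This is the kind of attribution that cell-type-specific methods aim for when only mixed samples are measured.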


Blood ◽  
2015 ◽  
Vol 126 (23) ◽  
pp. 5201-5201
Author(s):  
Chieh Lee Wong ◽  
Baoshan Ma ◽  
Gareth Gerrard ◽  
Martyna Adamowicz-Brice ◽  
Zainul Abidin Norziha ◽  
...  

Abstract
Background: The past decade has witnessed significant progress in understanding the molecular pathogenesis of myeloproliferative neoplasms (MPN). A large number of genes have now been implicated in the pathogenesis of MPN, but their relative importance, the mechanisms by which they cause different cell types to predominate, and their implications for prognosis remain unknown. We hypothesized that other genes may contribute to the pathogenesis of the different disease subtypes and that these are detectable only by cell-type-specific analysis.
Aim: The aim of this study was to perform gene expression profiling on different cell types from patients with MPN in order to identify novel variants and driver mutations, to elucidate the pathogenesis, and to identify predictors of survival in patients with MPN in a multiracial country.
Methods: We performed gene expression profiling on normal controls (NC) and on patients with MPN from 3 different races (Malay, Chinese, and Indian) in Malaysia who were diagnosed with essential thrombocythemia (ET), polycythemia vera (PV), or primary myelofibrosis (PMF) according to the 2008 WHO diagnostic criteria for MPN. Two cohorts of patients, the patient cohort and the validation cohort, were recruited prospectively over 3 years from 3 tertiary-level hospitals, and informed consent was obtained. Peripheral blood samples were taken and sorted into polymorphonuclear cells (PMNs), mononuclear cells (MNCs), and T cells. RNA was extracted from each cell population. Gene expression profiling was performed using the Illumina HumanHT-12 Expression Beadchip (microarray) for the patient cohort and the Illumina Nextera XT DNA Sample Preparation Kit (next-generation sequencing) for the validation cohort.
Results: Twenty-eight patients (10 ET, 11 PV, and 7 PMF) and 11 NC were recruited into the patient cohort. Twelve patients (4 ET, 4 PV, and 4 PMF) and 4 NC were recruited into the validation cohort. Gene expression levels for each cell type in each disease were compared with NC. In the patient cohort, the numbers of differentially expressed genes in ET, PV, and PMF were 0, 141, and 15, respectively, for PMNs (p < 0.05 after multiple testing correction) and 5, 170, and 562, respectively, for MNCs (p < 0.05). No differentially expressed genes were identified for T cells in any of the three disease groups. RNA-seq analysis of samples from the validation cohort was used to corroborate these findings. After combining the results, we were able to confirm differential expression of 0, 14, and 7 genes in ET, PV, and PMF, respectively, for PMNs (p < 0.05), and of 51 genes, in PMF only, for MNCs (p < 0.05). The validated differentially expressed genes for PMNs and MNCs were mutually exclusive except for one gene. The differentially expressed genes in PV and PMF for PMNs were involved in cellular processes and metabolic pathways, whereas the differentially expressed genes for PMF in MNCs were involved in regulation of the cytoskeleton, focal adhesion, and cell signaling pathways.
Conclusion: This is the first study to use microarray and next-generation sequencing techniques to compare cell-type-specific gene expression between different subtypes of MPN. The lack of differential expression in T cells validates the techniques used and indicates that T cells are not part of the neoplastic clone. Differential expression of genes in MNCs was seen only in PMF, which may be related to its more severe phenotype. Interestingly, there were fewer differentially expressed genes for PMNs in PMF than in PV. The lack of differential expression in ET may reflect either the relatively milder phenotype of the disease or that differential expression is limited to megakaryocytes and platelets, which were not studied. The lists of mutually exclusive cell-type-specific differentially expressed genes for PMNs and MNCs provide further insight into the pathogenesis of MPN and into the differences between its forms. The identified genes also indicate further routes for investigating pathogenesis and possible disease-specific targets for therapy.
Disclosures: Aitman: Illumina: Honoraria.
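
The abstract reports DEG counts at p < 0.05 "after multiple testing correction" but does not name the procedure. For illustration only, the sketch below assumes the Benjamini-Hochberg false discovery rate procedure. This is a common choice, but the abstract does not confirm it. The function name and the p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (step-up FDR control), in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted values are capped at 1
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for 8 genes.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
adj = benjamini_hochberg(pvals)
significant = [i for i, q in enumerate(adj) if q < 0.05]
print("raw p < 0.05:", sum(p < 0.05 for p in pvals), "genes; adjusted:", significant)
```

Five of these hypothetical genes pass an unadjusted 0.05 cutoff, but only the first two remain after adjustment. Counts reported "after correction" therefore depend on which procedure was used.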

