Confronting false discoveries in single-cell differential expression

2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Jordan W. Squair ◽  
Matthieu Gautier ◽  
Claudia Kathe ◽  
Mark A. Anderson ◽  
Nicholas D. James ◽  
...  

Abstract Differential expression analysis in single-cell transcriptomics enables the dissection of cell-type-specific responses to perturbations such as disease, trauma, or experimental manipulations. While many statistical methods are available to identify differentially expressed genes, the principles that distinguish these methods and their performance remain unclear. Here, we show that the relative performance of these methods is contingent on their ability to account for variation between biological replicates. Methods that ignore this inevitable variation are biased and prone to false discoveries. Indeed, the most widely used methods can discover hundreds of differentially expressed genes in the absence of biological differences. To exemplify these principles, we exposed true and false discoveries of differentially expressed genes in the injured mouse spinal cord.
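One widely used way to account for variation between biological replicates is to aggregate cells into per-replicate "pseudobulk" profiles before testing, so that the replicate, not the individual cell, becomes the unit of analysis. The abstract above does not name a specific method, so the following is only a minimal sketch under that assumption; the function name `pseudobulk` and the toy counts are hypothetical.

```python
def pseudobulk(cell_counts, cell_replicates):
    """Sum per-cell gene counts within each biological replicate.

    cell_counts     -- list of per-cell count vectors (one entry per gene)
    cell_replicates -- list of replicate labels, one per cell
    Returns a dict mapping replicate label -> summed count vector.
    """
    sums = {}
    for counts, replicate in zip(cell_counts, cell_replicates):
        if replicate not in sums:
            sums[replicate] = [0] * len(counts)
        accumulator = sums[replicate]
        for gene_index, count in enumerate(counts):
            accumulator[gene_index] += count
    return sums


# Toy example (hypothetical data): three cells, two genes, two replicates.
profiles = pseudobulk(
    [[1, 0], [2, 3], [4, 4]],
    ["mouse1", "mouse1", "mouse2"],
)
```

The per-replicate vectors could then be compared between conditions with a test designed for bulk RNA-seq, so that the sample size reflects the number of replicates and not the number of cells.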

2021 ◽  
Author(s):  
Jordan W. Squair ◽  
Matthieu Gautier ◽  
Claudia Kathe ◽  
Mark A. Anderson ◽  
Nicholas D. James ◽  
...  

Differential expression analysis in single-cell transcriptomics enables the dissection of cell-type-specific responses to perturbations such as disease, trauma, or experimental manipulation. While many statistical methods are available to identify differentially expressed genes, the principles that distinguish these methods and their performance remain unclear. Here, we show that the relative performance of these methods is contingent on their ability to account for variation between biological replicates. Methods that ignore this inevitable variation are biased and prone to false discoveries. Indeed, the most widely used methods can discover hundreds of differentially expressed genes in the absence of biological differences. Our results suggest an urgent need for a paradigm shift in the methods used to perform differential expression analysis in single-cell data.


Blood ◽  
2015 ◽  
Vol 126 (23) ◽  
pp. 5201-5201
Author(s):  
Chieh Lee Wong ◽  
Baoshan Ma ◽  
Gareth Gerrard ◽  
Martyna Adamowicz-Brice ◽  
Zainul Abidin Norziha ◽  
...  

Abstract Background The past decade has witnessed significant progress in the understanding of the molecular pathogenesis of myeloproliferative neoplasms (MPN). A large number of genes have now been implicated in the pathogenesis of MPN but their relative importance, the mechanisms by which they cause different cell types to predominate and their implications for prognosis remain unknown. We hypothesized that there are other genes which may contribute to the pathogenesis of the different disease subtypes detectable only by cell-type specific analysis. Aim The aim of this study was to perform gene expression profiling on different cell types from patients with MPN in order to identify novel variants and driver mutations, to elucidate the pathogenesis and to identify predictors of survival in patients with MPN in a multiracial country. Methods We performed gene expression profiling on normal controls (NC) and patients with MPN from 3 different races (Malay, Chinese and Indian) in Malaysia who were diagnosed with essential thrombocythemia (ET), polycythemia vera (PV) and primary myelofibrosis (PMF) according to the 2008 WHO diagnostic criteria for MPN. Two cohorts of patients, the patient and validation cohorts, from 3 tertiary-level hospitals were recruited prospectively over 3 years and informed consent was obtained. Peripheral blood samples were taken and sorted into polymorphonuclear cells (PMNs), mononuclear cells (MNCs) and T cells. RNA was extracted from each cell population. Gene expression profiling was performed using the Illumina HumanHT-12 Expression Beadchip for microarray and the Illumina Nextera XT DNA Sample Preparation Kit for next generation sequencing on the patient and validation cohorts respectively. Results Twenty-eight patients (10 ET, 11 PV and 7 PMF) and 11 NC were recruited into the patient cohort. Twelve patients (4 ET, 4 PV and 4 PMF) and 4 NC were recruited into the validation cohort.
Gene expression levels for each cell type in each disease were compared with NC. In the patient cohort, the number of differentially expressed genes in ET, PV and PMF was 0, 141 and 15 respectively for PMNs (p < 0.05 after multiple testing correction) and 5, 170 and 562 respectively for MNCs (p < 0.05). No differentially expressed genes were identified for T cells in any of the three disease groups. RNA-seq analysis of samples from the validation cohort was used to corroborate these findings. After combination, we were able to confirm differential expression of 0, 14 and 7 genes in ET, PV and PMF respectively for PMNs (p < 0.05) and 51 genes in only PMF for MNCs (p < 0.05). The validated differentially expressed genes for PMNs and MNCs were mutually exclusive except for one gene. The differentially expressed genes in PV and PMF for PMNs were involved in cellular processes and metabolic pathways whereas the differentially expressed genes for PMF in MNCs were involved in regulation of cytoskeleton, focal adhesion and cell signaling pathways. Conclusion This is the first study to use microarray and next generation sequencing techniques to compare cell type-specific expression of genes between different subtypes of MPN. The lack of differential expression in T cells validates the techniques used and indicates that they are not part of the neoplastic clone. Differential expression of genes for MNCs was seen only in PMF which may be related to their more severe phenotype. Interestingly, there were fewer differentially expressed genes in PMF compared to PV for PMNs. The lack of differential expression in ET may either reflect the relatively milder phenotype of the disease or that differential expression is limited to megakaryocytes-platelets which were not studied. The lists of mutually exclusive cell type-specific differentially expressed genes for PMNs and MNCs provide further insight into the pathogenesis of MPN and into the differences between its different forms. 
The identified genes also indicate further routes for investigation of pathogenesis and possible disease-specific targets for therapy. Disclosures Aitman: Illumina: Honoraria.


2020 ◽  
Author(s):  
Hongyu Li ◽  
Zhichao Xu ◽  
Taylor Adams ◽  
Naftali Kaminski ◽  
Hongyu Zhao

Abstract Background: Recent development of single cell sequencing technologies has made it possible to identify genes with different expression (DE) levels at the cell type level between different groups of samples. However, the often-low sample size of single cell data limits the statistical power to identify DE genes. In this article, we propose to borrow information through known biological networks. Results: We develop MRFscRNAseq, which is based on a Markov Random Field (MRF) model to appropriately accommodate gene network information as well as dependencies among cell types to identify cell-type specific DE genes. We implement an Expectation-Maximization (EM) algorithm with mean field-like approximation to estimate model parameters and a Gibbs sampler to infer DE status. Simulation study shows that our method has better power to detect cell-type specific DE genes than conventional methods while appropriately controlling type I error rate. The usefulness of our method is demonstrated through its application to study the pathogenesis and biological processes of idiopathic pulmonary fibrosis (IPF) using a single-cell RNA-sequencing (scRNA-seq) data set, which contains 18,150 protein-coding genes across 38 cell types on lung tissues from 32 IPF patients and 28 normal controls. Conclusions: The proposed MRF model is implemented in the R package MRFscRNAseq available on GitHub. By utilizing gene-gene and cell-cell networks, our method provides differential expression analysis for scRNA-seq data with increased statistical power.


2021 ◽  
Author(s):  
Shaoheng Liang ◽  
Qingnan Liang ◽  
Rui Chen ◽  
Ken Chen

Analyzing single-cell sequencing data from large cohorts is challenging. Discrepancies across experiments and differences among participants often lead to omissions and false discoveries in differentially expressed genes. We find that the Van Elteren test, a stratified version of the widely used Wilcoxon rank-sum test, elegantly mitigates the problem. We also modified the common language effect size to supplement this test, further improving its utility. On both simulated and real patient data we show the ability of the Van Elteren test to control for false positives and false negatives. A comprehensive assessment using receiver operating characteristic (ROC) curves shows that the Van Elteren test achieves higher sensitivity and specificity on simulated datasets, compared with nine state-of-the-art differential expression analysis methods. The effect size also estimates the differences between cell types more accurately.
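To make the idea of a stratified Wilcoxon rank-sum test concrete, here is a minimal sketch of the classical Van Elteren statistic with a normal approximation. It is an illustration only, not the authors' implementation: the function names, the choice of the classical weights 1/(N_h + 1) per stratum, the tie correction, and the toy data are all assumptions, and the modified effect size mentioned in the abstract is not covered.

```python
import math
from collections import Counter


def midranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def van_elteren(strata):
    """Stratified rank-sum test.

    strata -- list of (group_a_values, group_b_values), one pair per stratum
              (for example, one pair per participant or batch).
    Returns (z, two_sided_p) using a normal approximation.
    """
    statistic = expectation = variance = 0.0
    for group_a, group_b in strata:
        m, n = len(group_a), len(group_b)
        if m == 0 or n == 0:
            continue  # a stratum with only one group carries no information
        total = m + n
        pooled = list(group_a) + list(group_b)
        ranks = midranks(pooled)
        weight = 1.0 / (total + 1)  # classical Van Elteren weight (assumed)
        rank_sum_a = sum(ranks[:m])
        tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
        stratum_var = m * n / 12.0 * ((total + 1) - tie_term / (total * (total - 1)))
        statistic += weight * rank_sum_a
        expectation += weight * m * (total + 1) / 2.0
        variance += weight * weight * stratum_var
    if variance == 0:
        return 0.0, 1.0
    z = (statistic - expectation) / math.sqrt(variance)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return z, p


# Toy example (hypothetical data): group A is higher than B in both strata.
z, p = van_elteren([([5, 6, 7, 8], [1, 2, 3, 4]),
                    ([5, 6, 7, 8], [1, 2, 3, 4])])
```

Because ranks are computed within each stratum, a shift that affects all cells from one participant or experiment does not inflate the statistic, which is the property the abstract relies on.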


2020 ◽  
Vol 11 ◽  
Author(s):  
Lori Garman ◽  
Richard C. Pelikan ◽  
Astrid Rasmussen ◽  
Caleb A. Lareau ◽  
Kathryn A. Savoy ◽  
...  

Sarcoidosis is a systemic inflammatory disease characterized by infiltration of immune cells into granulomas. Previous gene expression studies using heterogeneous cell mixtures lack insight into cell-type-specific immune dysregulation. We performed the first single-cell RNA-sequencing study of sarcoidosis in peripheral immune cells in 48 patients and controls. Following unbiased clustering, differentially expressed genes were identified for 18 cell types and bioinformatically assessed for function and pathway enrichment. Our results reveal persistent activation of circulating classical monocytes with subsequent upregulation of trafficking molecules. Specifically, classical monocytes upregulated distinct markers of activation including adhesion molecules, pattern recognition receptors, and chemokine receptors, as well as enrichment of immunoregulatory pathways HMGB1, mTOR, and ephrin receptor signaling. Predictive modeling implicated TGFβ and mTOR signaling as drivers of persistent monocyte activation. Additionally, sarcoidosis T cell subsets displayed patterns of dysregulation. CD4 naïve T cells were enriched for markers of apoptosis and Th17/Treg differentiation, while effector T cells showed enrichment of anergy-related pathways. Differentially expressed genes in regulatory T cells suggested dysfunctional p53, cell death, and TNFR2 signaling. Using more sensitive technology and more precise units of measure, we identify cell-type specific, novel inflammatory and regulatory pathways. Based on our findings, we suggest a novel model involving four convergent arms of dysregulation: persistent hyperactivation of innate and adaptive immunity via classical monocytes and CD4 naïve T cells, regulatory T cell dysfunction, and effector T cell anergy. We further our understanding of the immunopathology of sarcoidosis and point to novel therapeutic targets.


2020 ◽  
Author(s):  
Dustin J. Sokolowski ◽  
Mariela Faykoo-Martinez ◽  
Lauren Erdman ◽  
Huayun Hou ◽  
Cadia Chan ◽  
...  

Abstract RNA sequencing (RNA-seq) is widely used to identify differentially expressed genes (DEGs) and reveal biological mechanisms underlying complex biological processes. RNA-seq is often performed on heterogeneous samples and the resulting DEGs do not necessarily indicate the cell types where the differential expression occurred. While single-cell RNA-seq (scRNA-seq) methods solve this problem, technical and cost constraints currently limit their widespread use. Here we present single cell Mapper (scMappR), a method that assigns cell-type specificity scores to DEGs obtained from bulk RNA-seq by integrating cell-type expression data generated by scRNA-seq and existing deconvolution methods. After benchmarking scMappR using RNA-seq data obtained from sorted blood cells, we asked if scMappR could reveal known cell-type specific changes that occur during kidney regeneration. We found that scMappR appropriately assigned DEGs to cell-types involved in kidney regeneration, including a relatively small proportion of immune cells. While scMappR can work with any user supplied scRNA-seq data, we curated scRNA-seq expression matrices for ∼100 human and mouse tissues to facilitate its use with bulk RNA-seq data alone. Overall, scMappR is a user-friendly R package, available at CRAN, that complements traditional differential expression analysis. Highlights: scMappR integrates scRNA-seq and bulk RNA-seq to re-calibrate bulk differentially expressed genes (DEGs). scMappR correctly identified immune-cell expressed DEGs from a bulk RNA-seq analysis of mouse kidney regeneration. scMappR is deployed as a user-friendly R package available at CRAN.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Hongyu Li ◽  
Biqing Zhu ◽  
Zhichao Xu ◽  
Taylor Adams ◽  
Naftali Kaminski ◽  
...  

Abstract Background Recent development of single cell sequencing technologies has made it possible to identify genes with different expression (DE) levels at the cell type level between different groups of samples. In this article, we propose to borrow information through known biological networks to increase statistical power to identify differentially expressed genes (DEGs). Results We develop MRFscRNAseq, which is based on a Markov random field (MRF) model to appropriately accommodate gene network information as well as dependencies among cell types to identify cell-type specific DEGs. We implement an Expectation-Maximization (EM) algorithm with mean field-like approximation to estimate model parameters and a Gibbs sampler to infer DE status. Simulation study shows that our method has better power to detect cell-type specific DEGs than conventional methods while appropriately controlling type I error rate. The usefulness of our method is demonstrated through its application to study the pathogenesis and biological processes of idiopathic pulmonary fibrosis (IPF) using a single-cell RNA-sequencing (scRNA-seq) data set, which contains 18,150 protein-coding genes across 38 cell types on lung tissues from 32 IPF patients and 28 normal controls. Conclusions The proposed MRF model is implemented in the R package MRFscRNAseq available on GitHub. By utilizing gene-gene and cell-cell networks, our method increases statistical power to detect differentially expressed genes from scRNA-seq data.


2015 ◽  
Vol 9s3 ◽  
pp. BBI.S29470 ◽  
Author(s):  
Mikhail G. Dozmorov ◽  
Nicolas Dominguez ◽  
Krista Bean ◽  
Susan R. Macwana ◽  
Virginia Roberts ◽  
...  

Systemic lupus erythematosus (SLE) is an autoimmune disease characterized by complex interplay among immune cell types. SLE activity is experimentally assessed by several blood tests, including gene expression profiling of heterogeneous populations of cells in peripheral blood. To better understand the contribution of different cell types in SLE pathogenesis, we applied two methods for cell-type-specific differential expression analysis, csSAM and DSection, to identify cell-type-specific gene expression differences in heterogeneous gene expression measures obtained using RNA-seq technology. We identified B-cell-, monocyte-, and neutrophil-specific gene expression differences. Immunoglobulin-coding gene expression was altered in B-cells, while a ribosomal signature was prominent in monocytes. In contrast, genes differentially expressed in the heterogeneous mixture of cells did not show any functional enrichment. Our results identify antigen binding and structural constituents of ribosomes as functions altered by B-cell- and monocyte-specific gene expression differences, respectively. Finally, these results position both csSAM and DSection methods as viable techniques for cell-type-specific differential expression analysis, which may help uncover pathogenic, cell-type-specific processes in SLE.


2021 ◽  
Vol 12 ◽  
Author(s):  
Zhe Wang ◽  
Daofu Cheng ◽  
Chengang Fan ◽  
Cong Zhang ◽  
Chao Zhang ◽  
...  

Background: As Oryza sativa ssp. indica and Oryza sativa ssp. japonica are the two major subspecies of Asian cultivated rice, the adaptative evolution of these varieties in divergent environments is an important topic in both theoretical and practical studies. However, the cell type-specific differentiation between indica and japonica rice varieties in response to divergent habitat environments, which facilitates an understanding of the genetic basis underlying differentiation and environmental adaptation between rice subspecies at the cellular level, is poorly understood. Methods: We analyzed a published single-cell RNA sequencing dataset to explore the differentially expressed genes between indica and japonica rice varieties in each cell type. To estimate the relationship between cell type-specific differentiation and environmental adaptation, we focused on genes in the WRKY, NAC, and BZIP transcription factor families, which are closely related to abiotic stress responses. In addition, we integrated five bulk RNA sequencing datasets obtained under conditions of abiotic stress, including cold, drought and salinity, in this study. Furthermore, we analyzed quiescent center cells in rice root tips based on orthologous markers in Arabidopsis. Results: We found differentially expressed genes between indica and japonica rice varieties with cell type-specific patterns, which were enriched in the pathways related to root development and stress responses. Some of these genes were members of the WRKY, NAC, and BZIP transcription factor families and were differentially expressed under cold, drought or salinity stress.
In addition, LOC_Os01g16810, LOC_Os01g18670, LOC_Os04g52960, and LOC_Os08g09350 may be potential markers of quiescent center cells in rice root tips. Conclusion: These results identified cell type-specific differentially expressed genes between indica-japonica rice varieties that were related to various environmental stresses and provided putative markers of quiescent center cells. This study provides new clues for understanding the development and physiology of plants during the process of adaptative divergence, in addition to identifying potential target genes for the improvement of stress tolerance in rice breeding applications.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Bobby Ranjan ◽  
Florian Schmidt ◽  
Wenjie Sun ◽  
Jinyu Park ◽  
Mohammad Amin Honardoost ◽  
...  

Abstract Background Clustering is a crucial step in the analysis of single-cell data. Clusters identified in an unsupervised manner are typically annotated to cell types based on differentially expressed genes. In contrast, supervised methods use a reference panel of labelled transcriptomes to guide both clustering and cell type identification. Supervised and unsupervised clustering approaches have their distinct advantages and limitations. Therefore, they can lead to different but often complementary clustering results. Hence, a consensus approach leveraging the merits of both clustering paradigms could result in a more accurate clustering and a more precise cell type annotation. Results We present scConsensus, an R framework for generating a consensus clustering by (1) integrating results from both unsupervised and supervised approaches and (2) refining the consensus clusters using differentially expressed genes. The value of our approach is demonstrated on several existing single-cell RNA sequencing datasets, including data from sorted PBMC sub-populations. Conclusions scConsensus combines the merits of unsupervised and supervised approaches to partition cells with better cluster separation and homogeneity, thereby increasing our confidence in detecting distinct cell types. scConsensus is implemented in R and is freely available on GitHub at https://github.com/prabhakarlab/scConsensus.

