Faculty Opinions recommendation of Identifying differentially expressed genes using false discovery rate controlling procedures.

Author(s):  
Jurg Ott
2007 ◽  
Vol 3 ◽  
pp. 117693510700300
Author(s):  
Akihiro Hirakawa ◽  
Yasunori Sato ◽  
Takashi Sozu ◽  
Chikuma Hamada ◽  
Isao Yoshimura

The recent development of DNA microarray technology allows us to measure the expression levels of thousands of genes simultaneously and to identify, from many candidate genes, those truly correlated with anticancer drug response (differentially expressed genes). Significance Analysis of Microarrays (SAM) is often used to estimate the false discovery rate (FDR), an index for optimizing the identifiability of differentially expressed genes, but the accuracy of the FDR estimated by SAM has not necessarily been confirmed. We propose a new method for estimating the FDR that assumes a mixed normal distribution for the test statistic, and we examine the performance of the proposed method and SAM using simulated data. The simulation results indicate that the accuracy of the FDR estimated by the proposed method and by SAM varied depending on the experimental conditions. We applied both methods to actual data comprising the expression levels of 12,625 genes in 10 responders and 14 non-responders to docetaxel for breast cancer. The proposed method identified 280 differentially expressed genes correlated with docetaxel response using a cut-off value achieving FDR < 0.01 to prevent false-positive genes, although 92 genes were previously thought to be correlated with docetaxel response.
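To make the idea of a mixture-based FDR estimate concrete, the following is a minimal sketch and not the authors' method. It assumes a two-component model in which null statistics follow N(0, 1) and differentially expressed genes follow N(mu1, sigma1^2), with the mixing proportion pi0 and the alternative parameters already fitted. The function names, the two-sided rejection rule |z| >= c, and all parameter values are illustrative assumptions.

```python
import math


def normal_sf(x, mu=0.0, sigma=1.0):
    """Upper-tail probability P(Z >= x) for Z ~ N(mu, sigma^2)."""
    return 0.5 * math.erfc((x - mu) / (sigma * math.sqrt(2.0)))


def two_sided_tail(c, mu, sigma):
    """P(|Z| >= c) for Z ~ N(mu, sigma^2)."""
    upper = normal_sf(c, mu, sigma)           # P(Z >= c)
    lower = 1.0 - normal_sf(-c, mu, sigma)    # P(Z <= -c)
    return upper + lower


def mixture_fdr(c, pi0, mu1, sigma1):
    """Tail-area FDR at cut-off c under the assumed two-component mixture.

    FDR(c) = pi0 * P0(|Z| >= c) / P(|Z| >= c), where the denominator is
    the mixture tail probability. All parameters are assumed known here.
    """
    null_tail = pi0 * two_sided_tail(c, 0.0, 1.0)
    alt_tail = (1.0 - pi0) * two_sided_tail(c, mu1, sigma1)
    return null_tail / (null_tail + alt_tail)


if __name__ == "__main__":
    # Hypothetical parameters: 90% null genes, alternatives centred at 3.
    for cutoff in (1.0, 2.0, 3.0):
        print(cutoff, round(mixture_fdr(cutoff, 0.9, 3.0, 1.0), 4))
```

Under these assumptions, raising the cut-off lowers the estimated FDR, which is how a cut-off achieving a target such as FDR < 0.01 could be chosen in principle.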


2019 ◽  
Vol 12 (1) ◽  
Author(s):  
Alassane Thiam ◽  
Michel Sanka ◽  
Rokhaya Ndiaye Diallo ◽  
Magali Torres ◽  
Babacar Mbengue ◽  
...  

Abstract Background Plasmodium falciparum malaria remains a major health problem in Africa. The mechanisms of pathogenesis are not fully understood. Transcriptomic studies may provide new insights into molecular pathways involved in the severe form of the disease. Methods Blood transcriptional levels were assessed in patients with cerebral malaria, non-cerebral malaria, or mild malaria by using microarray technology to look for gene expression profiles associated with clinical status. Multi-way ANOVA was used to extract differentially expressed genes. Network and pathways analyses were used to detect enrichment for biological pathways. Results We identified a set of 443 genes that were differentially expressed in the three patient groups after applying a false discovery rate of 10%. Since the cerebral patients displayed a particular transcriptional pattern, we focused our analysis on the differences between cerebral malaria patients and mild malaria patients. We further found 842 differentially expressed genes after applying a false discovery rate of 10%. Unsupervised hierarchical clustering of cerebral malaria-informative genes led to clustering of the cerebral malaria patients. The support vector machine method allowed us to correctly classify five out of six cerebral malaria patients and six of six mild malaria patients. Furthermore, the products of the differentially expressed genes were mapped onto a human protein-protein network. This led to the identification of the proteins with the highest number of interactions, including GSK3B, RELA, and APP. The enrichment analysis of the gene functional annotation indicates that genes involved in immune signalling pathways play a role in the occurrence of cerebral malaria. These include BCR-, TCR-, TLR-, cytokine-, FcεRI-, and FCGR- signalling pathways and natural killer cell cytotoxicity pathways, which are involved in the activation of immune cells. 
In addition, our results revealed an enrichment of genes involved in Alzheimer’s disease. Conclusions In the present study, we examine a set of genes whose expression differed in cerebral malaria patients and mild malaria patients. Moreover, our results provide new insights into the potential effect of the dysregulation of gene expression in immune pathways. Host genetic variation may partly explain such alteration of gene expression. Further studies are required to investigate this in African populations.


Scientifica ◽  
2012 ◽  
Vol 2012 ◽  
pp. 1-9 ◽  
Author(s):  
Emily Hansen ◽  
Kathleen F. Kerr

The goal of many microarray studies is to identify genes that are differentially expressed between two classes or populations. Many data analysts choose to estimate the false discovery rate (FDR) associated with the list of genes declared differentially expressed. Estimating an FDR largely reduces to estimating π1, the proportion of differentially expressed genes among all analyzed genes. Estimating π1 is usually done through P-values, but computing P-values can be viewed as a nuisance and potentially problematic step. We evaluated methods for estimating π1 directly from test statistics, circumventing the need to compute P-values. We adapted existing methodology for estimating π1 from t- and z-statistics so that π1 could be estimated from other statistics. We compared the quality of these estimates to estimates generated by two established methods for estimating π1 from P-values. Overall, methods varied widely in bias and variability. The least biased and least variable estimates of π1, the proportion of differentially expressed genes, were produced by applying the “convest” mixture model method to P-values computed from a pooled permutation null distribution. Estimates computed directly from test statistics rather than P-values did not reliably perform well.
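For readers unfamiliar with estimating π1 from P-values, here is a minimal sketch of one widely used threshold estimator (usually attributed to Storey). It is not the "convest" method named in the abstract, and the function name and the default threshold `lam=0.5` are illustrative assumptions. The idea is that P-values above the threshold come mostly from null genes, which are uniformly distributed, so their count estimates the null proportion π0 = 1 − π1.

```python
def estimate_pi1(p_values, lam=0.5):
    """Estimate pi1, the proportion of differentially expressed genes.

    Uses the threshold estimator pi0_hat = #{p > lam} / (m * (1 - lam)),
    clipped to at most 1, and returns pi1_hat = 1 - pi0_hat.
    """
    m = len(p_values)
    if m == 0:
        raise ValueError("p_values must not be empty")
    if not 0.0 <= lam < 1.0:
        raise ValueError("lam must lie in [0, 1)")
    n_above = sum(1 for p in p_values if p > lam)
    pi0_hat = min(1.0, n_above / (m * (1.0 - lam)))
    return 1.0 - pi0_hat


if __name__ == "__main__":
    # Hypothetical P-values: 3 of 10 exceed 0.5, so pi0_hat = 0.6.
    example = [0.01, 0.02, 0.03, 0.04, 0.05, 0.1, 0.2, 0.6, 0.7, 0.9]
    print(estimate_pi1(example))
```

The estimate depends on the choice of threshold, which is one source of the bias and variability that comparisons of such methods examine.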


2008 ◽  
Vol 2 ◽  
pp. BBI.S473 ◽  
Author(s):  
Akihiro Hirakawa ◽  
Yasunori Sato ◽  
Chikuma Hamada ◽  
Isao Yoshimura

Choosing an appropriate statistic and precisely evaluating the false discovery rate (FDR) are both essential for devising an effective method for identifying differentially expressed genes in microarray data. The t-type score proposed by Pan et al. (2003) succeeded in suppressing false positives by controlling the underestimation of variance but left the overestimation uncontrolled. For controlling the overestimation, we devised a new test statistic (variance stabilized t-type score) by placing shrunken sample variances of the James-Stein type in the denominator of the t-type score. Since the relative superiority of the mean and median FDRs was unclear in the widely adopted Significance Analysis of Microarrays (SAM), we conducted simulation studies to examine the performance of the variance stabilized t-type score and the characteristics of the two FDRs. The variance stabilized t-type score was generally better than or at least as good as the t-type score, irrespective of the sample size and proportion of differentially expressed genes. In terms of accuracy, the median FDR was superior to the mean FDR when the proportion of differentially expressed genes was large. The variance stabilized t-type score with the median FDR was applied to actual colorectal cancer data and yielded a reasonable result.


Author(s):  
Jun Wang ◽  
Cheng Peng ◽  
Catherine Guranich ◽  
Yujing J Heng ◽  
Gabrielle M Baker ◽  
...  

Abstract Background Cumulative epidemiologic evidence has shown that early-life adiposity is strongly inversely associated with breast cancer risk throughout life, independent of adult obesity. However, the molecular mechanisms remain poorly understood. Methods We assessed the association of early-life adiposity, defined as self-reported body size during ages 10-20 years from a validated 9-level pictogram, with the transcriptome of breast tumor (N = 835) and tumor-adjacent histologically normal tissue (N = 663) in the Nurses’ Health Study. We conducted multivariable linear regression analysis to identify differentially expressed genes in tumor and tumor-adjacent tissue, respectively. Molecular pathway analysis using Hallmark gene sets (N = 50) was further performed to gain biological insights. Analysis was stratified by tumor estrogen receptor (ER) protein expression status (n = 673 for ER+ and 162 for ER− tumors). Results No gene was statistically significantly differentially expressed by early-life body size after multiple comparison adjustment. However, pathway analysis revealed several statistically significantly (false discovery rate < 0.05) upregulated or downregulated gene sets. In stratified analyses by tumor ER status, larger body size during ages 10-20 years was associated with decreased cellular proliferation pathways, including MYC target genes, in both ER+ and ER− tumors. In ER+ tumors, larger body size was also associated with upregulation in genes involved in TNFα/NFkB signaling. In ER− tumors, larger body size was additionally associated with downregulation in genes involved in interferon α and interferon γ immune response and Phosphatidylinositol 3-kinase (PI3K)/AKT/mammalian target of rapamycin (mTOR) signaling; the IFNγ response pathway was also downregulated in ER− tumor-adjacent tissue, though at borderline statistical significance (false discovery rate = 0.1). 
Conclusions These findings provide new insights into the biological and pathological underpinnings of the early-life adiposity and breast cancer association.


2005 ◽  
Vol 45 (8) ◽  
pp. 859 ◽  
Author(s):  
G. J. McLachlan ◽  
R. W. Bean ◽  
L. Ben-Tovim Jones ◽  
J. X. Zhu

An important and common problem in microarray experiments is the detection of genes that are differentially expressed in a given number of classes. As this problem concerns the selection of significant genes from a large pool of candidate genes, it needs to be carried out within the framework of multiple hypothesis testing. In this paper, we focus on the use of mixture models to handle the multiplicity issue. With this approach, a measure of the local false discovery rate is provided for each gene, and it can be implemented so that the implied global false discovery rate is bounded as with the Benjamini-Hochberg methodology based on tail areas. The latter procedure is too conservative, unless it is modified according to the prior probability that a gene is not differentially expressed. An attractive feature of the mixture model approach is that it provides a framework for the estimation of this probability and its subsequent use in forming a decision rule. The rule can also be formed to take the false negative rate into account.
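As a point of reference for the tail-area approach mentioned above, the following is a minimal sketch of the Benjamini-Hochberg step-up procedure, with an optional adjustment by the prior null probability. The function name and the `pi0` parameter are illustrative assumptions; with `pi0=1.0` it is the standard procedure, and a smaller value (for instance, one estimated from a mixture model) relaxes the thresholds in the way the abstract describes. This is not the authors' mixture-model implementation.

```python
def benjamini_hochberg(p_values, q=0.05, pi0=1.0):
    """Return a list of booleans, True where the hypothesis is rejected.

    Finds the largest rank k such that p_(k) <= k * q / (m * pi0) and
    rejects the k hypotheses with the smallest P-values.
    """
    if not 0.0 < pi0 <= 1.0:
        raise ValueError("pi0 must lie in (0, 1]")
    m = len(p_values)
    # Indices of the P-values in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / (m * pi0):
            k_max = rank
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


if __name__ == "__main__":
    # Hypothetical P-values for five genes.
    pvals = [0.001, 0.008, 0.039, 0.041, 0.9]
    print(benjamini_hochberg(pvals, q=0.05))           # standard procedure
    print(benjamini_hochberg(pvals, q=0.05, pi0=0.5))  # adjusted by pi0
```

In this hypothetical example the unadjusted procedure rejects fewer hypotheses than the version adjusted by `pi0`, illustrating the conservativeness noted in the abstract.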


Blood ◽  
2018 ◽  
Vol 132 (Supplement 1) ◽  
pp. 4246-4246 ◽  
Author(s):  
Hee-Don Chae ◽  
Lara C. Murphy ◽  
Michele Donato ◽  
Alex G. Lee ◽  
E. Alejandro Sweet-Cordero ◽  
...  

Abstract Introduction Pediatric chronic myeloid leukemia (CML) accounts for 10 to 15% of children with myeloid leukemia and 2 to 9% of all pediatric leukemias. Prior to the discovery of tyrosine kinase inhibitors (TKI) such as imatinib, stem cell transplantation was the only curative treatment for both adults and children with CML. However, due to the small numbers of patients, standardized treatment approaches for pediatric CML have not been established. There are several unique characteristics of CML diagnosed in children and adolescents, and young adults (AYA; 16-29 years), compared to adults. Children and AYA with CML present with a higher white blood count and have larger spleens, higher peripheral blast counts, and lower hemoglobin levels, suggesting that the biology of pediatric CML is different than adult CML. In addition, potential side effects of TKIs unique to pediatric CML patients include impaired bone growth, fertility and immune function, however none have been extensively studied. We hypothesize that the differences in clinical presentation of pediatric CML patients are due to unique molecular characteristics that are absent in adult CML patients. To test this hypothesis, we studied the transcriptomic signature of pediatric CD34+ CML cells compared to adult CML and normal age-matched bone marrow CD34+ cells. Methods CD34+ cells were isolated from pediatric CML (n=7), adult CML (n=8), pediatric normal (n=2) and adult normal (n=3) bone marrow samples. Total RNA was isolated from cells, and then cDNA libraries were generated. Prepared libraries were sequenced on the Illumina HiSeq 4000 instrument. We aligned reads using the HISAT2 alignment software, and mapped to genes with HT-Seq. We removed genes that had zero reads across all the samples, resulting in a set of 4,696 genes that were detected in one or more samples. In case of technical replicates, we used mean of replicates. 
We performed three differential expression comparisons with edgeR: (1) Pediatric CML vs Adult CML, (2) Adult CML vs Adult Normal, and (3) Pediatric CML vs Pediatric Normal. We used a False Discovery Rate (FDR) of ≤ 20% and absolute log2 fold-change ≥ 1 for selecting differentially expressed genes in each comparison. We used Fisher's exact test to identify significant KEGG pathways for the differentially expressed genes in each comparison. Results Pediatric CML vs Adult CML We found 24 differentially expressed genes (15 over- and 9 under-expressed). Though no pathway was found to be significant at the false discovery rate (FDR) ≤ 20%, we identified a number of sub-pathways that are relevant. For example, the Chemokine Signaling pathway shows at the top of the list (ordered by raw p-value) because of two genes, XCR1 and HCK, associated with VEGF and MAPK pathways involved in cell proliferation, angiogenesis, DNA repair, and cancer pathogenesis. Adult CML vs Adult Normal We found 60 genes (30 over- and 30 under-expressed) differentially expressed when comparing adult CML patients to normal adults. Ten genes overlapped with 24 genes we identified when comparing pediatric and adult CML patients. We found 11 pathways as significant at FDR ≤ 10%. Multiple pathways, including Cell adhesion, allograft rejection, Graft versus Host Disease, and Type I diabetes pathways, showed downregulation of MHC, with subsequent downstream reduction in expression of apoptosis-related genes. The IL-17 pathway makes sense, as MAPK, well-known to be associated with various cancers, is down-regulated. Lastly, in the NK pathway the gene DAP12 is up-regulated. This gene is known as a tyrosine kinase binding protein, and although tyrosine kinase inhibitors are the standard treatment for CML, the role of DAP12 in relation to leukemia has not yet been described. 
Pediatric CML vs Pediatric Normal We found 509 genes (350 over- and 159 under-expressed) differentially expressed in pediatric CML patients compared to normal. Interestingly, transcriptional regulators are differentially enriched in the hematopoietic stem cell differentiation function group including GATA1, GATA2, KLF1 and KLF2. RFC is down-regulated. RFC is a mismatch repair gene known to be involved in colorectal cancer. Many of the significant pathways are involved in glucose and fatty acid metabolism. Our pilot study identified novel molecular features of pediatric CML bone marrow stem cells, providing new insights into the novel biomarkers and pathogenesis of pediatric CML. Disclosures Gotlib: Blueprint Medicines: Consultancy, Honoraria, Research Funding; Promedior: Research Funding; Deciphera: Consultancy, Honoraria, Research Funding; Incyte: Consultancy, Honoraria, Research Funding; Kartos: Consultancy; Celgene: Consultancy, Honoraria, Research Funding; Gilead: Consultancy, Research Funding; Novartis: Consultancy, Honoraria, Research Funding.
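The gene-selection rule stated in the Methods of this abstract (an FDR of at most 20% and an absolute log2 fold-change of at least 1) can be sketched as a simple filter. This is an illustration only: the input layout, the function name, and the placeholder gene labels are assumptions, and the sketch does not reproduce the edgeR analysis itself.

```python
def select_de_genes(results, fdr_max=0.20, min_abs_log2fc=1.0):
    """Split genes passing both thresholds into over- and under-expressed.

    `results` maps a gene label to a (log2 fold-change, FDR) pair, a
    hypothetical layout for a table of per-gene test results.
    """
    selected = {"over": [], "under": []}
    for gene, (log2fc, fdr) in results.items():
        if fdr <= fdr_max and abs(log2fc) >= min_abs_log2fc:
            direction = "over" if log2fc > 0 else "under"
            selected[direction].append(gene)
    return selected


if __name__ == "__main__":
    # Placeholder labels and values, not data from the study.
    table = {
        "geneA": (2.3, 0.01),    # passes, over-expressed
        "geneB": (-1.5, 0.15),   # passes, under-expressed
        "geneC": (0.4, 0.001),   # fold-change too small
        "geneD": (3.0, 0.35),    # FDR too high
    }
    print(select_de_genes(table))
```

Requiring both thresholds keeps genes whose change is statistically supported and also large enough in magnitude to count as over- or under-expressed.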

