Estimating the False Discovery Rate Using Mixed Normal Distribution for Identifying Differentially Expressed Genes in Microarray Data Analysis

2007 ◽  
Vol 3 ◽  
pp. 117693510700300
Author(s):  
Akihiro Hirakawa ◽  
Yasunori Sato ◽  
Takashi Sozu ◽  
Chikuma Hamada ◽  
Isao Yoshimura

The recent development of DNA microarray technology allows us to measure the expression levels of thousands of genes simultaneously and to identify, from many candidate genes, those truly correlated with anticancer drug response (differentially expressed genes). Significance Analysis of Microarrays (SAM) is often used to estimate the false discovery rate (FDR), an index for optimizing the identifiability of differentially expressed genes, although the accuracy of the FDR estimated by SAM has not necessarily been confirmed. We propose a new method for estimating the FDR that assumes a mixed normal distribution for the test statistic, and we examine the performance of the proposed method and SAM using simulated data. The simulation results indicate that the accuracy of the FDR estimated by the proposed method and by SAM varied depending on the experimental conditions. We applied both methods to actual data comprising the expression levels of 12,625 genes from 10 responders and 14 non-responders to docetaxel for breast cancer. Using a cut-off value that achieves FDR < 0.01 to prevent false-positive genes, the proposed method identified 280 differentially expressed genes correlated with docetaxel response, whereas 92 genes had previously been thought to be correlated with docetaxel response.
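The idea of reading an FDR off a fitted mixture can be sketched as follows. This is only an illustration, not the authors' estimator: it assumes a two-component mixture in which null genes have a standard normal test statistic and differentially expressed genes follow a second normal distribution, and it takes the mixture parameters as given rather than estimating them. The names `mixture_fdr`, `smallest_cutoff`, `pi0`, `mu1`, and `sigma1` are hypothetical.

```python
from statistics import NormalDist


def mixture_fdr(cutoff, pi0, mu1, sigma1):
    """Model-based FDR for the rule "call a gene when |z| > cutoff".

    Assumed model (not taken from the abstract):
    z ~ pi0 * N(0, 1) + (1 - pi0) * N(mu1, sigma1^2).
    """
    null = NormalDist(0.0, 1.0)
    alt = NormalDist(mu1, sigma1)
    # Two-sided tail probabilities under each component.
    tail_null = 2.0 * null.cdf(-cutoff)
    tail_alt = alt.cdf(-cutoff) + (1.0 - alt.cdf(cutoff))
    false_pos = pi0 * tail_null
    true_pos = (1.0 - pi0) * tail_alt
    total = false_pos + true_pos
    # Guard against 0/0 when both tails underflow at extreme cutoffs.
    return false_pos / total if total > 0.0 else 0.0


def smallest_cutoff(target_fdr, pi0, mu1, sigma1, step=0.01, max_cutoff=10.0):
    """Scan cutoffs upward and return the first whose model FDR is below target."""
    for i in range(int(max_cutoff / step) + 1):
        cutoff = i * step
        if mixture_fdr(cutoff, pi0, mu1, sigma1) < target_fdr:
            return cutoff
    return None  # no cutoff in the scanned range reaches the target
```

At a cutoff of zero every gene is called, so the model FDR equals `pi0`; raising the cutoff trades detections for a lower FDR, which is the trade-off a target such as FDR < 0.01 resolves.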

2008 ◽  
Vol 2 ◽  
pp. BBI.S473 ◽  
Author(s):  
Akihiro Hirakawa ◽  
Yasunori Sato ◽  
Chikuma Hamada ◽  
Isao Yoshimura

Choosing an appropriate statistic and precisely evaluating the false discovery rate (FDR) are both essential for devising an effective method for identifying differentially expressed genes in microarray data. The t-type score proposed by Pan et al. (2003) succeeded in suppressing false positives by controlling the underestimation of variance but left the overestimation uncontrolled. For controlling the overestimation, we devised a new test statistic (variance stabilized t-type score) by placing shrunken sample variances of the James-Stein type in the denominator of the t-type score. Since the relative superiority of the mean and median FDRs was unclear in the widely adopted Significance Analysis of Microarrays (SAM), we conducted simulation studies to examine the performance of the variance stabilized t-type score and the characteristics of the two FDRs. The variance stabilized t-type score was generally better than or at least as good as the t-type score, irrespective of the sample size and proportion of differentially expressed genes. In terms of accuracy, the median FDR was superior to the mean FDR when the proportion of differentially expressed genes was large. The variance stabilized t-type score with the median FDR was applied to actual colorectal cancer data and yielded a reasonable result.
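The effect of placing shrunken variances in the denominator can be illustrated with a simplified sketch. It is not the variance stabilized t-type score itself: it uses plain linear shrinkage of each gene's sample variance toward the across-gene mean, with a hand-picked `weight` standing in for the data-driven James-Stein-type weight, and an ordinary two-sample t-like score. All function and parameter names are hypothetical.

```python
from statistics import mean, variance


def shrink_variances(variances, weight):
    """Pull each gene's variance toward the across-gene mean.

    weight = 0 keeps the raw variances; weight = 1 replaces them all by the
    mean. A James-Stein-type rule would estimate the weight from the data;
    here it is a fixed, assumed input.
    """
    target = mean(variances)
    return [weight * target + (1.0 - weight) * v for v in variances]


def t_type_scores(group_a, group_b, weight):
    """Per-gene two-sample score with shrunken variances in the denominator.

    group_a, group_b: one list of sample values per gene (same gene order,
    at least two samples per gene).
    """
    var_a = shrink_variances([variance(g) for g in group_a], weight)
    var_b = shrink_variances([variance(g) for g in group_b], weight)
    scores = []
    for ga, gb, va, vb in zip(group_a, group_b, var_a, var_b):
        # Standard error of the mean difference (assumed to be positive).
        standard_error = (va / len(ga) + vb / len(gb)) ** 0.5
        scores.append((mean(ga) - mean(gb)) / standard_error)
    return scores
```

Shrinkage raises variances that were underestimated and lowers those that were overestimated, so a gene whose sample variance is small by chance no longer receives an inflated score.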


2019 ◽  
Vol 12 (1) ◽  
Author(s):  
Alassane Thiam ◽  
Michel Sanka ◽  
Rokhaya Ndiaye Diallo ◽  
Magali Torres ◽  
Babacar Mbengue ◽  
...  

Abstract Background: Plasmodium falciparum malaria remains a major health problem in Africa. The mechanisms of pathogenesis are not fully understood. Transcriptomic studies may provide new insights into molecular pathways involved in the severe form of the disease. Methods: Blood transcriptional levels were assessed in patients with cerebral malaria, non-cerebral malaria, or mild malaria by using microarray technology to look for gene expression profiles associated with clinical status. Multi-way ANOVA was used to extract differentially expressed genes. Network and pathways analyses were used to detect enrichment for biological pathways. Results: We identified a set of 443 genes that were differentially expressed in the three patient groups after applying a false discovery rate of 10%. Since the cerebral patients displayed a particular transcriptional pattern, we focused our analysis on the differences between cerebral malaria patients and mild malaria patients. We further found 842 differentially expressed genes after applying a false discovery rate of 10%. Unsupervised hierarchical clustering of cerebral malaria-informative genes led to clustering of the cerebral malaria patients. The support vector machine method allowed us to correctly classify five out of six cerebral malaria patients and six of six mild malaria patients. Furthermore, the products of the differentially expressed genes were mapped onto a human protein-protein network. This led to the identification of the proteins with the highest number of interactions, including GSK3B, RELA, and APP. The enrichment analysis of the gene functional annotation indicates that genes involved in immune signalling pathways play a role in the occurrence of cerebral malaria. These include BCR-, TCR-, TLR-, cytokine-, FcεRI-, and FCGR-signalling pathways and natural killer cell cytotoxicity pathways, which are involved in the activation of immune cells. In addition, our results revealed an enrichment of genes involved in Alzheimer’s disease. Conclusions: In the present study, we examine a set of genes whose expression differed between cerebral malaria patients and mild malaria patients. Moreover, our results provide new insights into the potential effect of the dysregulation of gene expression in immune pathways. Host genetic variation may partly explain such alteration of gene expression. Further studies are required to investigate this in African populations.


Author(s):  
Glenn Heller ◽  
Jing Qin

An objective of microarray data analysis is to identify gene expressions that are associated with a disease-related outcome. For each gene, a test statistic is computed to determine if an association exists, and this statistic generates a marginal p-value. In an effort to pool this information across genes, a p-value density function is derived. The p-value density is modeled as a mixture of a uniform (0,1) density and a scaled ratio of normal densities derived from the asymptotic normality of the test statistic. The p-values are assumed to be weakly dependent, and a quasi-likelihood is used to estimate the parameters in the mixture density. The quasi-likelihood and the weak dependence assumption enable estimation and asymptotic inference on the false discovery rate for a given rejection region, and on its inverse, the p-value threshold parameter for a fixed false discovery rate. A false discovery rate analysis on a localized prostate cancer data set is used to illustrate the methodology. Simulations are performed to assess the performance of this methodology.
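A minimal sketch of such a mixture, under simplifying assumptions that are not taken from the abstract: the test is one-sided, the statistic is N(0, 1) under the null and N(delta, 1) with delta > 0 under the alternative, and the parameters `pi0` and `delta` are known rather than estimated by quasi-likelihood. Under these assumptions the alternative p-value density is the ratio of two normal densities evaluated at the corresponding quantile. All names are hypothetical.

```python
from statistics import NormalDist

STD = NormalDist()  # standard normal


def pvalue_density(p, pi0, delta):
    """Mixture density: pi0 * Uniform(0, 1) + (1 - pi0) * phi(z - delta) / phi(z)."""
    z = -STD.inv_cdf(p)  # upper-tail quantile belonging to p-value p
    return pi0 + (1.0 - pi0) * NormalDist(delta, 1.0).pdf(z) / STD.pdf(z)


def fdr_at_threshold(t, pi0, delta):
    """FDR for the rejection region p <= t under the assumed mixture."""
    z = -STD.inv_cdf(t)
    power = STD.cdf(delta - z)  # P(p <= t) for a non-null gene
    return pi0 * t / (pi0 * t + (1.0 - pi0) * power)


def threshold_for_fdr(target, pi0, delta, iterations=100):
    """Invert fdr_at_threshold by bisection.

    Relies on the FDR increasing with t, which holds here for delta > 0.
    The target must also exceed the FDR at the lower search bound.
    """
    if not 0.0 < target < pi0:
        raise ValueError("target must lie strictly between 0 and pi0")
    lo, hi = 1e-12, 1.0 - 1e-12
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if fdr_at_threshold(mid, pi0, delta) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

The two functions mirror the two quantities named in the abstract: `fdr_at_threshold` gives the FDR for a given rejection region, and `threshold_for_fdr` gives the p-value threshold for a fixed FDR.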


Genetics ◽  
2002 ◽  
Vol 161 (2) ◽  
pp. 905-914 ◽  
Author(s):  
Hakkyo Lee ◽  
Jack C M Dekkers ◽  
M Soller ◽  
Massoud Malek ◽  
Rohan L Fernando ◽  
...  

Abstract Controlling the false discovery rate (FDR) has been proposed as an alternative to controlling the genomewise error rate (GWER) for detecting quantitative trait loci (QTL) in genome scans. The objective here was to implement FDR in the context of regression interval mapping for multiple traits. Data on five traits from an F2 swine breed cross were used. FDR was implemented using tests at every 1 cM (FDR1) and using tests with the highest test statistic for each marker interval (FDRm). For the latter, a method was developed to predict comparison-wise error rates. At low error rates, FDR1 behaved erratically; FDRm was more stable but gave similar significance thresholds and number of QTL detected. At the same error rate, methods to control FDR gave less stringent significance thresholds and more QTL detected than methods to control GWER. Although testing across traits had limited impact on FDR, single-trait testing was recommended because there is no theoretical reason to pool tests across traits for FDR. FDR based on FDRm was recommended for QTL detection in interval mapping because it provides significance tests that are meaningful, yet not overly stringent, such that a more complete picture of QTL is revealed.
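Why FDR control yields less stringent thresholds than genomewise control can be seen in a small sketch. The abstract does not say which procedures were used, so this example assumes the Bonferroni correction as a stand-in for GWER control and the Benjamini-Hochberg step-up rule as a stand-in for FDR control; the p-values are made up for illustration.

```python
def bonferroni_threshold(num_tests, alpha):
    """Per-test threshold that bounds the family-wise error rate by alpha."""
    return alpha / num_tests


def bh_threshold(p_values, q):
    """Benjamini-Hochberg threshold: the largest sorted p_(j) with p_(j) <= q * j / m.

    Returns 0.0 when no test qualifies.
    """
    m = len(p_values)
    threshold = 0.0
    for j, p in enumerate(sorted(p_values), start=1):
        if p <= q * j / m:
            threshold = p  # keep the largest qualifying p-value
    return threshold


# Made-up p-values for ten tests.
example_p = [0.001, 0.008, 0.012, 0.019, 0.2, 0.5, 0.7, 0.9, 0.95, 0.99]
strict = bonferroni_threshold(len(example_p), 0.05)
lenient = bh_threshold(example_p, 0.05)
detected_strict = sum(p <= strict for p in example_p)
detected_lenient = sum(p <= lenient for p in example_p)
```

In this constructed example the Bonferroni threshold admits one test while the Benjamini-Hochberg threshold admits four, matching the qualitative pattern reported for FDR versus GWER.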


2006 ◽  
Vol 45 (9) ◽  
pp. 1181-1189 ◽  
Author(s):  
D. S. Wilks

Abstract The conventional approach to evaluating the joint statistical significance of multiple hypothesis tests (i.e., “field,” or “global,” significance) in meteorology and climatology is to count the number of individual (or “local”) tests yielding nominally significant results and then to judge the unusualness of this integer value in the context of the distribution of such counts that would occur if all local null hypotheses were true. The sensitivity (i.e., statistical power) of this approach is potentially compromised both by the discrete nature of the test statistic and by the fact that the approach ignores the confidence with which locally significant tests reject their null hypotheses. An alternative global test statistic that has neither of these problems is the minimum p value among all of the local tests. Evaluation of field significance using the minimum local p value as the global test statistic, which is also known as the Walker test, has strong connections to the joint evaluation of multiple tests in a way that controls the “false discovery rate” (FDR, or the expected fraction of local null hypothesis rejections that are incorrect). In particular, using the minimum local p value to evaluate field significance at a level α_global is nearly equivalent to the slightly more powerful global test based on the FDR criterion. An additional advantage shared by Walker’s test and the FDR approach is that both are robust to spatial dependence within the field of tests. The FDR method not only provides a more broadly applicable and generally more powerful field significance test than the conventional counting procedure but also allows better identification of locations with significant differences, because fewer than α_global × 100% (on average) of apparently significant local tests will have resulted from local null hypotheses that are true.
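The two global tests can be compared in a short sketch. The formulas are not given in the abstract: the Walker critical value below is the standard form for K independent local tests, 1 - (1 - alpha_global)^(1/K), and the FDR criterion is taken to be the Benjamini-Hochberg step-up rule with q set to the global level (`alpha_global` in the code). Function names and the example p-values are hypothetical.

```python
def walker_threshold(num_tests, alpha_global):
    """Critical value for the smallest local p-value (independence form)."""
    return 1.0 - (1.0 - alpha_global) ** (1.0 / num_tests)


def walker_field_significant(p_values, alpha_global):
    """Walker test: declare field significance if the minimum p-value is small enough."""
    return min(p_values) <= walker_threshold(len(p_values), alpha_global)


def fdr_field_significant(p_values, alpha_global):
    """FDR criterion: significant if any sorted p_(j) satisfies p_(j) <= alpha * j / K."""
    k = len(p_values)
    return any(
        p <= alpha_global * j / k
        for j, p in enumerate(sorted(p_values), start=1)
    )


# Made-up field of 100 local tests: the smallest p-value narrowly misses the
# Walker critical value, but the second smallest meets the FDR criterion.
local_p = [0.00052, 0.0009] + [0.5] * 98
```

For 100 tests at a global level of 0.05 the Walker critical value is only slightly larger than 0.05 / 100, the first FDR step, which is why the two tests usually agree; the FDR criterion can additionally respond to several moderately small p-values, as in the constructed field above.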

