A New Test Statistic Based on Shrunken Sample Variance for Identifying Differentially Expressed Genes in Small Microarray Experiments

2008 ◽  
Vol 2 ◽  
pp. BBI.S473 ◽  
Author(s):  
Akihiro Hirakawa ◽  
Yasunori Sato ◽  
Chikuma Hamada ◽  
Isao Yoshimura

Choosing an appropriate statistic and precisely evaluating the false discovery rate (FDR) are both essential for devising an effective method for identifying differentially expressed genes in microarray data. The t-type score proposed by Pan et al. (2003) succeeded in suppressing false positives by controlling the underestimation of variance but left the overestimation uncontrolled. For controlling the overestimation, we devised a new test statistic (variance stabilized t-type score) by placing shrunken sample variances of the James-Stein type in the denominator of the t-type score. Since the relative superiority of the mean and median FDRs was unclear in the widely adopted Significance Analysis of Microarrays (SAM), we conducted simulation studies to examine the performance of the variance stabilized t-type score and the characteristics of the two FDRs. The variance stabilized t-type score was generally better than or at least as good as the t-type score, irrespective of the sample size and proportion of differentially expressed genes. In terms of accuracy, the median FDR was superior to the mean FDR when the proportion of differentially expressed genes was large. The variance stabilized t-type score with the median FDR was applied to actual colorectal cancer data and yielded a reasonable result.
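The idea of putting shrunken variances in the denominator of a t-type score can be illustrated with a minimal Python sketch. The abstract does not give the estimator, so everything here is an assumption: the shrinkage is a simple linear pull of each per-gene variance toward the across-gene mean, the `weight` parameter is a hypothetical fixed shrinkage factor (a James-Stein-type estimator would derive it from the data), and the score uses a Welch-style denominator.

```python
from math import sqrt
from statistics import mean, variance


def shrink_variances(variances, weight):
    """Pull each per-gene variance toward the across-gene mean.

    weight=0 leaves the variances unchanged; weight=1 replaces every
    variance with the common mean. The fixed weight is an assumption.
    """
    target = mean(variances)
    return [(1.0 - weight) * v + weight * target for v in variances]


def stabilized_t_scores(group1, group2, weight=0.5):
    """Compute a t-type score per gene using shrunken variances.

    group1 and group2 are lists of per-gene sample lists, in the same
    gene order, with at least two samples per gene in each group.
    """
    n1, n2 = len(group1[0]), len(group2[0])
    v1 = shrink_variances([variance(g) for g in group1], weight)
    v2 = shrink_variances([variance(g) for g in group2], weight)
    return [
        (mean(a) - mean(b)) / sqrt(s1 / n1 + s2 / n2)
        for a, b, s1, s2 in zip(group1, group2, v1, v2)
    ]


# Toy data: two genes, three samples per group (illustrative values only).
group_a = [[1.0, 1.1, 0.9], [5.0, 7.0, 6.0]]
group_b = [[1.0, 1.0, 1.2], [2.0, 2.5, 1.5]]
scores = stabilized_t_scores(group_a, group_b, weight=0.5)
```

Shrinking toward a common value raises unusually small variances and lowers unusually large ones, which is the sense in which both underestimation and overestimation are tempered.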

2007 ◽  
Vol 3 ◽  
pp. 117693510700300
Author(s):  
Akihiro Hirakawa ◽  
Yasunori Sato ◽  
Takashi Sozu ◽  
Chikuma Hamada ◽  
Isao Yoshimura

The recent development of DNA microarray technology allows us to measure the expression levels of thousands of genes simultaneously and to identify, from many candidate genes, those truly correlated with anticancer drug response (differentially expressed genes). Significance Analysis of Microarrays (SAM) is often used to estimate the false discovery rate (FDR), which is an index for optimizing the identifiability of differentially expressed genes, although the accuracy of the FDR estimated by SAM has not necessarily been confirmed. We propose a new method for estimating the FDR that assumes a mixed normal distribution for the test statistic, and we examine the performance of the proposed method and SAM using simulated data. The simulation results indicate that the accuracy of the FDR estimated by the proposed method and by SAM varied depending on the experimental conditions. We applied both methods to actual data comprising expression levels of 12,625 genes in 10 responders and 14 non-responders to docetaxel for breast cancer. The proposed method identified 280 differentially expressed genes correlated with docetaxel response using a cut-off value chosen to achieve FDR <0.01 and so limit false-positive genes, whereas 92 genes were previously thought to be correlated with docetaxel response.
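As a rough illustration of how an FDR follows from a normal mixture on the test statistic, the sketch below assumes a two-component model: a null component and a single alternative component, with known parameters. The abstract does not specify the number of components or how they are fitted, so the function names, the two-sided rejection rule, and all numeric values are assumptions.

```python
from statistics import NormalDist


def two_sided_tail(dist, cutoff):
    """Probability that |Z| exceeds cutoff under the given normal."""
    return dist.cdf(-cutoff) + (1.0 - dist.cdf(cutoff))


def mixture_fdr(cutoff, pi0, null, alt):
    """FDR of the rule |Z| > cutoff under pi0*null + (1 - pi0)*alt."""
    null_mass = pi0 * two_sided_tail(null, cutoff)
    alt_mass = (1.0 - pi0) * two_sided_tail(alt, cutoff)
    return null_mass / (null_mass + alt_mass)


# Hypothetical parameters: 90% null genes, alternative centered at 3.
null_component = NormalDist(mu=0.0, sigma=1.0)
alt_component = NormalDist(mu=3.0, sigma=1.0)
fdr_loose = mixture_fdr(1.0, 0.9, null_component, alt_component)
fdr_strict = mixture_fdr(3.0, 0.9, null_component, alt_component)
```

Raising the cutoff removes null mass faster than alternative mass in this setup, so the stricter rule has the lower FDR. In practice the mixture parameters would have to be estimated from the observed statistics.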


Scientifica ◽  
2012 ◽  
Vol 2012 ◽  
pp. 1-9 ◽  
Author(s):  
Emily Hansen ◽  
Kathleen F. Kerr

The goal of many microarray studies is to identify genes that are differentially expressed between two classes or populations. Many data analysts choose to estimate the false discovery rate (FDR) associated with the list of genes declared differentially expressed. Estimating an FDR largely reduces to estimating π1, the proportion of differentially expressed genes among all analyzed genes. Estimating π1 is usually done through P-values, but computing P-values can be viewed as a nuisance and potentially problematic step. We evaluated methods for estimating π1 directly from test statistics, circumventing the need to compute P-values. We adapted existing methodology for estimating π1 from t- and z-statistics so that π1 could be estimated from other statistics. We compared the quality of these estimates to estimates generated by two established methods for estimating π1 from P-values. Overall, methods varied widely in bias and variability. The least biased and least variable estimates of π1, the proportion of differentially expressed genes, were produced by applying the “convest” mixture model method to P-values computed from a pooled permutation null distribution. Estimates computed directly from test statistics rather than P-values did not reliably perform well.
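To make the role of π1 concrete, here is a minimal sketch of one simple way to estimate it from P-values. It is not the “convest” method named in the abstract; it swaps in the widely used threshold estimator attributed to Storey, which assumes null P-values are uniform so that the count above a threshold λ reflects the null proportion. The function name and the default λ are assumptions.

```python
def estimate_pi1(p_values, lam=0.5):
    """Estimate the proportion of differentially expressed genes.

    Uses pi0 = #{p > lam} / (m * (1 - lam)), then pi1 = 1 - pi0,
    clamped to the interval [0, 1].
    """
    m = len(p_values)
    above = sum(1 for p in p_values if p > lam)
    pi0 = above / (m * (1.0 - lam))
    return min(1.0, max(0.0, 1.0 - pi0))


# Toy input: eight very small P-values and two large ones.
toy_p = [0.001] * 8 + [0.7, 0.9]
pi1_hat = estimate_pi1(toy_p)
```

With an estimate of π1 in hand, the null proportion 1 − π1 scales the expected number of false positives in an FDR calculation.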


2019 ◽  
Author(s):  
Lulu Chen ◽  
Yingzhou Lu ◽  
Guoqiang Yu ◽  
Robert Clarke ◽  
Jennifer E. Van Eyk ◽  
...  

Tissue or cell subtype-specific and differentially-expressed genes (SDEGs) are defined as being differentially expressed in a particular tissue or cell subtype among multiple subtypes. Detecting SDEGs plays a critical role in molecularly characterizing and identifying tissue or cell subtypes, and in facilitating supervised deconvolution of complex tissues. Unfortunately, classic differential analysis assumes a convenient null hypothesis and an associated test statistic that are subtype-non-specific, thus resulting in a high false positive rate and/or lower detection power with respect to particular subtypes. Here we introduce the One-Versus-Everyone Fold Change (OVE-FC) test for detecting SDEGs. To assess the statistical significance of such a test, we also propose the scaled test statistic OVE-sFC together with a mixture null distribution model and a tailored permutation scheme. Validated with realistic synthetic data sets on both type 1 error and detection power, the OVE-FC/sFC test applied to two benchmark gene expression data sets detects many known and de novo SDEGs. Subsequent supervised deconvolution results, obtained using the SDEGs detected by the OVE-FC/sFC test, showed superior performance in deconvolution accuracy when compared with popular peer methods.
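The abstract does not define the OVE-FC statistic, so the following sketch is only one literal reading of "one versus everyone": for a single gene, the subtype with the highest mean log-expression is compared against the highest mean among all remaining subtypes. The function name, the input layout, and the use of log-scale differences are assumptions, and the scaled OVE-sFC statistic and its permutation scheme are not covered.

```python
from statistics import mean


def ove_log_fold_change(log_expr_by_subtype):
    """Return (top_subtype, log fold change versus the runner-up).

    log_expr_by_subtype maps a subtype label to the gene's
    log-expression values in that subtype (at least two subtypes).
    """
    means = {k: mean(v) for k, v in log_expr_by_subtype.items()}
    ranked = sorted(means, key=means.get, reverse=True)
    top, runner_up = ranked[0], ranked[1]
    return top, means[top] - means[runner_up]


# Toy gene measured in three hypothetical subtypes.
gene = {"A": [8.0, 8.0], "B": [5.0, 5.0], "C": [6.0, 6.0]}
subtype, log_fc = ove_log_fold_change(gene)
```

Comparing against the strongest competitor, rather than the pooled remainder, means a gene only scores highly when it exceeds every other subtype.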


2010 ◽  
Vol 44-47 ◽  
pp. 905-909
Author(s):  
Yuan Tian ◽  
Gui Xia Liu ◽  
Chun Guang Zhou

One of the main purposes of analyzing microarray experiments is to identify genes that are differentially expressed under two experimental conditions. One meta-analysis method, the rank product meta-analysis approach, is considered a powerful tool for identifying differentially expressed genes. However, the rank product meta-analysis approach uses each dataset in the computation of the fold changes, which reduces computational efficiency. Here we modified the rank product meta-analysis approach to obtain an improved model for identifying differential gene expression. The new model, the grouping rank product approach, adds competitive classification of samples to group datasets before the computation of the fold changes. We applied the grouping rank product approach to two simulated datasets and two breast datasets and showed that it is not only as accurate as the rank product meta-analysis approach, but also more computationally efficient in identifying differentially expressed genes.
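For orientation, the baseline rank product statistic can be sketched as follows: genes are ranked by fold change within each dataset, and each gene's ranks are combined by a geometric mean across datasets. This covers only the standard rank product, not the grouping step proposed in the abstract, whose details are not given. The function names and toy values are assumptions, and ties are not handled.

```python
from math import prod


def ranks_descending(values):
    """Rank values so that the largest gets rank 1 (no tie handling)."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks


def rank_product(fold_changes_by_dataset):
    """Geometric mean of each gene's fold-change rank across datasets.

    fold_changes_by_dataset is a list of per-dataset lists, each
    holding one fold change per gene in the same gene order.
    """
    per_dataset = [ranks_descending(fc) for fc in fold_changes_by_dataset]
    k = len(per_dataset)
    return [prod(gene_ranks) ** (1.0 / k) for gene_ranks in zip(*per_dataset)]


# Toy input: three genes measured in two hypothetical datasets.
toy_fold_changes = [[3.0, 1.0, 2.0], [2.5, 0.5, 1.5]]
rp = rank_product(toy_fold_changes)
```

A small rank product marks a gene that sits near the top of the list in every dataset, which is why the cost grows with the number of datasets being ranked.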


2003 ◽  
Vol 19 (6) ◽  
pp. 694-703 ◽  
Author(s):  
T. Park ◽  
S.-G. Yi ◽  
S. Lee ◽  
S. Y. Lee ◽  
D.-H. Yoo ◽  
...  
