Prioritizing hypothesis tests for high throughput data

2015 ◽  
Vol 32 (6) ◽  
pp. 850-858 ◽  
Author(s):  
Sangjin Kim ◽  
Paul Schliekelman

Abstract Motivation: The advent of high throughput data has led to a massive increase in the number of hypothesis tests conducted in many types of biological studies and a concomitant increase in stringency of significance thresholds. Filtering methods, which use independent information to eliminate less promising tests and thus reduce multiple testing, have been widely and successfully applied. However, key questions remain about how best to apply them: When is filtering beneficial and when is it detrimental? How good does the independent information need to be in order for filtering to be effective? How should one choose the filter cutoff that separates tests that pass the filter from those that don't? Results: We quantify the effect of the quality of the filter information, the filter cutoff and other factors on the effectiveness of the filter and show a number of results: If the filter has a high probability (e.g. 70%) of ranking true positive features highly (e.g. top 10%), then filtering can lead to a dramatic increase (e.g. 10-fold) in discovery probability when there is high redundancy in information between hypothesis tests. Filtering is less effective when there is low redundancy between hypothesis tests, and its benefit decreases rapidly as the quality of the filter information decreases. Furthermore, the outcome is highly dependent on the choice of filter cutoff. Choosing the cutoff without reference to the data will often lead to a large loss in discovery probability. However, naïve optimization of the cutoff using the data will lead to inflated type I error. We introduce a data-based method for choosing the cutoff that maintains control of the family-wise error rate via a correction factor to the significance threshold. Application of this approach offers as much as a several-fold advantage in discovery probability relative to no filtering, while maintaining type I error control. We also introduce a closely related method of P-value weighting that further improves performance. Availability and implementation: R code for calculating the correction factor is available at http://www.stat.uga.edu/people/faculty/paul-schliekelman. Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.
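As a hedged illustration of the basic idea (not the authors' correction-factor code, which is available at the URL above), the following R sketch ranks features by an independent filter statistic and applies a Bonferroni threshold only to the tests that pass the filter. As the abstract warns, tuning the cutoff on the data without a correction factor inflates type I error.

```r
# Minimal sketch of independent filtering on hypothetical data; this is
# NOT the authors' correction-factor implementation.
set.seed(1)
m <- 10000                        # total number of features
filter_stat <- rexp(m)            # independent filter information, e.g. variance
pvals <- runif(m)                 # per-feature test p-values (all null here)
cutoff <- 0.10                    # pass the top 10% by filter statistic
keep <- filter_stat >= quantile(filter_stat, 1 - cutoff)
m_pass <- sum(keep)               # number of tests surviving the filter
alpha <- 0.05
# Bonferroni now corrects for m_pass tests instead of m:
discoveries <- which(keep & pvals < alpha / m_pass)
length(discoveries)               # count of rejections
```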

2021 ◽  
Author(s):  
Megha Joshi ◽  
James E Pustejovsky ◽  
S. Natasha Beretvas

The most common and well-known meta-regression models work under the assumption that there is only one effect size estimate per study and that the estimates are independent. However, meta-analytic reviews of social science research often include multiple effect size estimates per primary study, leading to dependence in the estimates. Some meta-analyses also include multiple studies conducted by the same lab or investigator, creating another potential source of dependence. An increasingly popular method to handle dependence is robust variance estimation (RVE), but this method can result in inflated Type I error rates when the number of studies is small. Small-sample correction methods for RVE have been shown to control Type I error rates adequately but may be overly conservative, especially for tests of multiple-contrast hypotheses. We evaluated an alternative method for handling dependence, cluster wild bootstrapping, which has been examined in the econometrics literature but not in the context of meta-analysis. Results from two simulation studies indicate that cluster wild bootstrapping maintains adequate Type I error rates and provides more power than extant small-sample correction methods, particularly for multiple-contrast hypothesis tests. We recommend using cluster wild bootstrapping to conduct hypothesis tests for meta-analyses with a small number of studies. We have also created an R package that implements such tests.
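For intuition, the sketch below runs a cluster wild bootstrap test for a meta-regression slope on simulated data, drawing Rademacher weights once per study so that dependence within studies is preserved; the data, model and variable names are illustrative assumptions, and the authors' R package implements the full method.

```r
# Hedged sketch of a cluster wild bootstrap for a meta-regression slope.
set.seed(1)
k <- 12                                  # small number of studies (clusters)
study <- rep(1:k, each = 3)              # 3 effect sizes per study
x <- rnorm(length(study))                # a moderator
y <- 0.2 * x + rnorm(k)[study] +         # study-level random shifts
     rnorm(length(study), sd = 0.5)      # estimate-level noise
fit0 <- lm(y ~ 1)                        # null model imposing slope = 0
res0 <- resid(fit0)
t_obs <- coef(summary(lm(y ~ x)))["x", "t value"]
B <- 1999
t_boot <- replicate(B, {
  w <- sample(c(-1, 1), k, replace = TRUE)[study]  # one weight per cluster
  y_star <- fitted(fit0) + w * res0                # wild-bootstrap outcome
  coef(summary(lm(y_star ~ x)))["x", "t value"]
})
p_boot <- mean(abs(t_boot) >= abs(t_obs))          # bootstrap p-value
```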


2019 ◽  
Vol 35 (24) ◽  
pp. 5155-5162 ◽  
Author(s):  
Chengzhong Ye ◽  
Terence P Speed ◽  
Agus Salim

Abstract Motivation Dropout is a common phenomenon in single-cell RNA-seq (scRNA-seq) data, and when left unaddressed it affects the validity of the statistical analyses. Despite this, few current methods for differential expression (DE) analysis of scRNA-seq data explicitly model the process that gives rise to the dropout events. We develop DECENT, a method for DE analysis of scRNA-seq data that explicitly and accurately models the molecule capture process in scRNA-seq experiments. Results We show that DECENT demonstrates improved DE performance over existing DE methods that do not explicitly model dropout. This improvement is consistently observed across several public scRNA-seq datasets generated using different technological platforms. The gain is especially large when the capture process is overdispersed. DECENT controls type I error well while achieving better sensitivity. Its performance without spike-ins is almost as good as when spike-ins are used to calibrate the capture model. Availability and implementation The method is implemented as a publicly available R package at https://github.com/cz-ye/DECENT. Supplementary information Supplementary data are available at Bioinformatics online.
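As a toy illustration of the capture process that generates dropout (DECENT's actual model is richer, e.g. it accommodates overdispersed capture), the R sketch below thins true molecule counts through a small binomial capture probability, so genuinely expressed genes are frequently observed as zeros; all numbers are invented.

```r
# Toy binomial-thinning model of dropout; an illustrative assumption,
# not DECENT's full capture model.
set.seed(1)
true_counts <- rnbinom(5000, mu = 5, size = 2)   # pre-capture molecule counts
capture_prob <- 0.1                              # per-molecule capture probability
observed <- rbinom(length(true_counts), size = true_counts, prob = capture_prob)
mean(true_counts > 0 & observed == 0)            # fraction of dropout events
```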


2019 ◽  
Vol 35 (20) ◽  
pp. 3898-3905 ◽  
Author(s):  
Ziyi Li ◽  
Zhijin Wu ◽  
Peng Jin ◽  
Hao Wu

Abstract Motivation Samples from clinical practices are often mixtures of different cell types. The high-throughput data obtained from these samples are thus mixed signals. The cell mixture brings complications to data analysis, and will lead to biased results if not properly accounted for. Results We develop a method to model the high-throughput data from mixed, heterogeneous samples, and to detect differential signals. Our method allows flexible statistical inference for detecting a variety of cell-type-specific changes. Extensive simulation studies and analyses of two real datasets demonstrate the favorable performance of our proposed method compared with existing methods serving a similar purpose. Availability and implementation The proposed method is implemented as an R package and is freely available on GitHub (https://github.com/ziyili20/TOAST). Supplementary information Supplementary data are available at Bioinformatics online.
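To make the mixed-signal setting concrete, here is a hedged R sketch (simulated data and illustrative names, not TOAST's API): each sample's measurement is a proportion-weighted sum of cell-type profiles, so a cell-type-specific group effect shows up as a proportion-by-group interaction.

```r
# Sketch of detecting a cell-type-specific differential signal in mixed
# samples; all names and numbers are invented for illustration.
set.seed(1)
n <- 60
p1 <- runif(n, 0.2, 0.8)                 # proportion of cell type 1
p2 <- 1 - p1                             # proportion of cell type 2
group <- rep(0:1, each = n / 2)          # case/control indicator
mu <- c(5, 8)                            # cell-type baseline means
delta <- c(2, 0)                         # group effect only in cell type 1
y <- p1 * (mu[1] + delta[1] * group) +
     p2 * (mu[2] + delta[2] * group) + rnorm(n, sd = 0.5)
# The cell-type-1 effect is the p1:group interaction coefficient:
fit <- lm(y ~ 0 + p1 + p2 + p1:group + p2:group)
summary(fit)$coefficients
```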


2018 ◽  
Vol 20 (6) ◽  
pp. 2055-2065 ◽  
Author(s):  
Johannes Brägelmann ◽  
Justo Lorenzo Bermejo

Abstract Technological advances and reduced costs of high-density methylation arrays have led to an increasing number of association studies on the possible relationship between human disease and epigenetic variability. DNA samples from peripheral blood or other tissue types are analyzed in epigenome-wide association studies (EWAS) to detect methylation differences related to a particular phenotype. Since information on the cell-type composition of the sample is generally not available and methylation profiles are cell-type specific, statistical methods have been developed for adjustment of cell-type heterogeneity in EWAS. In this study we systematically compared five popular adjustment methods: the factored spectrally transformed linear mixed model (FaST-LMM-EWASher), the sparse principal component analysis algorithm ReFACTor, surrogate variable analysis (SVA), independent SVA (ISVA) and an optimized version of SVA (SmartSVA). We used real data and applied a multilayered simulation framework to assess the type I error rate, the statistical power and the quality of estimated methylation differences according to major study characteristics. While all five adjustment methods improved false-positive rates compared with unadjusted analyses, FaST-LMM-EWASher resulted in the lowest type I error rate at the expense of low statistical power. SVA efficiently corrected for cell-type heterogeneity in EWAS up to 200 cases and 200 controls, but did not control type I error rates in larger studies. Results based on real data sets confirmed simulation findings, with the strongest control of type I error rates by FaST-LMM-EWASher and SmartSVA. Overall, ReFACTor, ISVA and SmartSVA showed comparable performance and the best statistical power, quality of estimated methylation differences and runtime.
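As one concrete example of this class of adjustments, the hedged sketch below applies the Bioconductor sva package to simulated methylation data containing a latent cell-composition factor; the dimensions and data are invented for illustration.

```r
# Hedged sketch: surrogate variable analysis for cell-type heterogeneity
# on simulated data (requires the Bioconductor package 'sva').
library(sva)
set.seed(1)
n <- 100; g <- 2000
pheno <- rep(0:1, each = n / 2)                  # case/control phenotype
cellcomp <- rnorm(n)                             # unobserved cell-composition factor
meth <- matrix(rnorm(g * n), g, n) +             # CpGs x samples matrix
        outer(rnorm(g), cellcomp)                # heterogeneity affecting all CpGs
mod  <- model.matrix(~ pheno)                    # full model with phenotype
mod0 <- model.matrix(~ 1, data = data.frame(pheno))  # null model
sv <- sva(meth, mod, mod0)                       # estimate surrogate variables
mod_adj <- cbind(mod, sv$sv)                     # include SVs as covariates in tests
```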


2009 ◽  
Vol 27 (15_suppl) ◽  
pp. 9513-9513
Author(s):  
C. L. Loprinzi ◽  
R. Qin ◽  
P. J. Stella ◽  
K. M. Rowland ◽  
D. L. Graham ◽  
...  

9513 Background: Hot flashes are a major problem in many women for whom better treatment options are needed. Given the known efficacy of gabapentin for decreasing hot flashes, it was decided to evaluate pregabalin, with the hope that it would work better and/or with fewer toxicities. Methods: A three-arm, double-blinded, placebo-controlled randomized trial was developed. Women with bothersome hot flashes (at least 28/week) were randomized to receive either a placebo or target pregabalin oral doses of 75 mg bid or 150 mg bid (starting at 50 mg/d and then increasing the dose at weekly intervals to 50 mg bid, then 75 mg bid, and then, in the higher dose arm, 150 mg bid); patients were treated for 6 weeks. Hot flash numbers and scores (hot flash number times mean severity) were measured using a validated daily hot flash diary. A one-week baseline period preceded initiation of study tablets. The primary endpoint was the average intra-patient difference in hot flash score between baseline and week six, comparing the higher dose pregabalin arm and the placebo arm. With the planned sample size of 55 patients per arm, the study had 80% power, at a two-sided 5% type I error rate, to detect a difference of 0.54 standard deviations (1.08 hot flashes per day, or 2.7 units of hot flash score per day). Results: 207 patients were randomized between 6/20/2008 and 8/21/2008. The study arms were well balanced. Mean/median daily hot flash scores and frequencies for all patients at baseline were 15.7/13.4 and 8.3/7.7, respectively. The table shows the decreases in hot flashes from baseline to the sixth treatment week. Larger numbers indicate greater hot flash reductions. Toxicity information, quality of life information, and information regarding the effects of hot flashes on subjective symptoms will be available at the meeting time. Conclusions: Pregabalin reduces hot flashes in women. The effects appear to be similar at both studied doses. [Table: see text] No significant financial relationships to disclose.
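The stated design can be checked with a standard two-sample power calculation in R; with sd = 1, the delta argument is the effect in standard-deviation units.

```r
# Verifying the reported design: n = 55 per arm, two-sided alpha = 0.05,
# effect of 0.54 standard deviations.
power.t.test(n = 55, delta = 0.54, sd = 1, sig.level = 0.05,
             type = "two.sample", alternative = "two.sided")
# power comes out at approximately 0.80, matching the abstract
```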


2017 ◽  
Vol 35 (15_suppl) ◽  
pp. 3544-3544 ◽  
Author(s):  
Julia Quidde ◽  
Hans-Joachim Schmoll ◽  
Benjamin Garlipp ◽  
Christian Junghanss ◽  
Malte Leithaeuser ◽  
...  

3544 Background: FOLFOXIRI/bev is a highly efficacious first-line regimen in MCRC. Despite higher rates of neutropenia, diarrhea and stomatitis, FOLFOXIRI/bev is tolerable and feasible in MCRC patients. To date, nothing is known about the impact of this regimen on HRQOL. Methods: 250 patients were randomized to FOLFOX/bev (arm A) or FOLFOXIRI/bev (arm B). HRQOL was assessed at baseline, every 8 weeks during induction treatment (6 months) and every 12 weeks during maintenance treatment, using the EORTC QLQ-C30, QLQ-CR29 and QLQ-CIPN20. The mean value of each score was calculated as the average of the week 8, 16 and 24 assessments. Tests concerning mean values were performed as t-tests, with the global type I error set at 0.05. HRQOL deterioration and improvement rates were analyzed and compared between treatment groups using chi² tests. Results: For HRQOL analysis, 237 patients were eligible (arm A: 118; arm B: 119). Compliance rate with the HRQOL questionnaires was 95.4% at baseline, 72.6% at week 8, 59.5% at week 16 and 43.5% at week 24. Whereas the mean global quality of life score (GHS/QOL) was similar between arms A and B (59.8 vs. 58.8; p = 0.726), mean scores for nausea/vomiting (9.4 vs. 16.0; p = 0.015) and diarrhea (23.7 vs. 32.1; p = 0.051) significantly or borderline significantly favored arm A during the induction period. Furthermore, at week 8 the scores for nausea/vomiting (9.2 vs. 17.3; p = 0.006), appetite loss (19.5 vs. 29.4; p = 0.035) and financial problems (18.3 vs. 29.5; p = 0.021), and at the end of treatment physical functioning (75.0 vs. 65.8; p = 0.048), were significantly better for arm A than for arm B. No significant differences were observed in the remaining EORTC scores. Rates of deterioration and improvement of at least 10 points in the EORTC scores between baseline and week 8 were similar (e.g. GHS/QOL deterioration rate 21.5% vs. 26.5% for arms A vs. B; p = 0.461). Conclusions: Although no remarkable detriment in HRQOL was noted, the better efficacy of FOLFOXIRI/bev compared to FOLFOX/bev is associated with a decrease mainly in gastrointestinal QOL scores. Further subgroup analyses will be presented at the meeting. Clinical trial information: NCT01321957.
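For illustration, a deterioration-rate comparison of the kind reported can be run as a chi-squared test on a 2x2 table; the counts below are hypothetical round numbers chosen only to echo the reported rates, not the trial's actual respondent counts.

```r
# Hypothetical 2x2 table of GHS/QOL deterioration (>= 10-point drop) by arm;
# counts are invented for illustration.
deterioration <- matrix(c(20, 73,    # arm A: deteriorated, not deteriorated
                          24, 67),   # arm B: deteriorated, not deteriorated
                        nrow = 2, byrow = TRUE,
                        dimnames = list(arm = c("A", "B"),
                                        status = c("worse", "stable")))
chisq.test(deterioration)            # test for a difference in rates
```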


2020 ◽  
Vol 36 (10) ◽  
pp. 3099-3106
Author(s):  
Burim Ramosaj ◽  
Lubna Amro ◽  
Markus Pauly

Abstract Motivation Imputation procedures have become standard statistical practice in biomedical fields, since downstream analyses can then be conducted as if no values had been missing. In particular, non-parametric imputation schemes like the random forest have shown favorable imputation performance compared to the more traditionally used MICE procedure. However, their effect on valid statistical inference has not been analyzed so far. This article closes this gap by investigating their validity for inferring mean differences in incompletely observed pairs, while contrasting them with a recent approach that works only with the observations at hand. Results Our findings indicate that machine-learning schemes for (multiply) imputing missing values may inflate type I error or result in comparably low power in small-to-moderate samples of matched pairs, even after modifying the test statistics using Rubin's multiple imputation rule. In addition to an extensive simulation study, an illustrative data example from a breast cancer gene study has been considered. Availability and implementation The corresponding R code can be accessed through the authors and the gene expression data can be downloaded at www.gdac.broadinstitute.org. Supplementary information Supplementary data are available at Bioinformatics online.
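To spell out the pooling step the abstract refers to, here is a hedged R sketch of Rubin's combining rule for a mean difference in matched pairs; the imputation scheme is a deliberately crude placeholder, not the random-forest or MICE procedures under study.

```r
# Hedged sketch of Rubin's rule for a matched-pairs mean difference after
# multiple imputation; the hot-deck-style imputation is a placeholder only.
set.seed(1)
n <- 30
x1 <- rnorm(n, mean = 10)                  # first measurement of each pair
x2 <- x1 + rnorm(n, mean = 0.3)            # second measurement, shifted
x2[sample(n, 8)] <- NA                     # introduce missing values
m <- 20                                    # number of imputations
est <- var_w <- numeric(m)
for (i in seq_len(m)) {
  x2i <- x2
  miss <- is.na(x2i)
  x2i[miss] <- sample(x2[!miss], sum(miss), replace = TRUE)  # crude imputation
  d <- x2i - x1                            # within-pair differences
  est[i]   <- mean(d)                      # per-imputation estimate
  var_w[i] <- var(d) / n                   # per-imputation variance
}
qbar  <- mean(est)                         # pooled point estimate
ubar  <- mean(var_w)                       # within-imputation variance
b     <- var(est)                          # between-imputation variance
t_var <- ubar + (1 + 1 / m) * b            # Rubin's total variance
z <- qbar / sqrt(t_var)                    # Wald statistic (normal approximation;
                                           # exact df would follow Barnard-Rubin)
2 * pnorm(-abs(z))                         # two-sided p-value
```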


2001 ◽  
Vol 95 (5) ◽  
pp. 1068-1073 ◽  
Author(s):  
Hwee Leng Pua ◽  
Jerrold Lerman ◽  
Mark W. Crawford ◽  
James G. Wright

Background The authors evaluated the quality of clinical trials published in four anesthesia journals during the 20-yr period from 1981 to 2000. Methods Trials published in four major anesthesia journals during the periods 1981-1985, 1991-1995, and the first 6 months of 2000 were grouped according to journal and year. Using random number tables, four trials were selected from all of the eligible clinical trials in each journal in each year for the periods 1981-1985 and 1991-1995, and five trials were selected from all of the trials in each journal in the first 6 months of 2000. Methods and results sections from the 160 trials from 1981-1985 and 1991-1995 were randomly ordered and distributed to three of the authors for blinded review of the quality of the study design according to 10 predetermined criteria (weighted equally, maximum score of 10): informed consent and ethics approval, eligibility criteria, sample size calculation, random allocation, method of randomization, blind assessment of outcome, adverse outcomes, statistical analysis, type I error, and type II error. After these trials were evaluated, 20 trials from the first 6 months of 2000 were randomly ordered, distributed, and evaluated as described. Results The mean (± SD) analysis scores pooled for the four journals increased from 5.5 ± 1.4 in 1981-1985 to 7.0 ± 1.1 in 1991-1995 (P < 0.00001) and to 7.8 ± 1.5 in 2000. For 7 of the 10 criteria, the percentage of trials from the four journals that fulfilled the criteria increased significantly between 1981-1985 and 1991-1995. During the 20-yr period, the reporting of sample size calculation and method of randomization increased threefold to fourfold, whereas the frequency of type I statistical errors remained unchanged. Conclusion Although the quality of clinical trials in four major anesthesia journals has increased steadily during the past two decades, specific areas of trial methodology require further attention.


2015 ◽  
Vol 23 (4) ◽  
pp. 471-487 ◽  
Author(s):  
Bear F. Braumoeller

The various methodological techniques that fall under the umbrella description of qualitative comparative analysis (QCA) are increasingly popular for modeling causal complexity and necessary or sufficient conditions in medium-N settings. Because QCA methods are not designed as statistical techniques, however, there is no way to assess the probability that the patterns they uncover are the result of chance. Moreover, the implications of the multiple hypothesis tests inherent in these techniques for the false positive rate of the results are not widely understood. This article fills both gaps by tailoring a simple permutation test to the needs of QCA users and adjusting the Type I error rate of the test to take into account the multiple hypothesis tests inherent in QCA. An empirical application, a reexamination of a study of protest-movement success in the Arab Spring, highlights the need for such a test by showing that even very strong QCA results may plausibly be the result of chance.
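A minimal version of such a permutation test is easy to sketch in R for a crisp-set sufficiency consistency score; the data and single-configuration setup below are toy assumptions, and the article's test additionally adjusts for the multiple configurations a QCA examines.

```r
# Minimal permutation test for a crisp-set sufficiency consistency score
# (toy data; illustrative only).
set.seed(1)
n <- 40
cond <- rbinom(n, 1, 0.5)                  # membership in a configuration
out  <- rbinom(n, 1, 0.5 + 0.2 * cond)     # outcome, weakly related to cond
consistency <- function(cond, out) sum(cond & out) / sum(cond)
obs <- consistency(cond, out)              # observed consistency
perm <- replicate(5000, consistency(cond, sample(out)))  # null distribution
mean(perm >= obs)                          # one-sided permutation p-value
```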

