Making Decisions with Data: Understanding Hypothesis Testing & Statistical Significance

2019 ◽  
Vol 81 (8) ◽  
pp. 535-542
Author(s):  
Robert A. Cooper

Statistical methods are indispensable to the practice of science. But statistical hypothesis testing can seem daunting, with P-values, null hypotheses, and the concept of statistical significance. This article explains the concepts associated with statistical hypothesis testing using the story of “the lady tasting tea,” then walks the reader through an application of the independent-samples t-test using data from Peter and Rosemary Grant's investigations of Darwin's finches. Understanding how scientists use statistics is an important component of scientific literacy, and students should have opportunities to use statistical methods like these in their science classes.
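For readers who want a concrete starting point, below is a minimal sketch of an independent-samples t-test in Python with scipy, in the spirit of the article's finch example; the beak-depth values are invented placeholders, not the Grants' actual measurements.

```python
# A minimal independent-samples t-test in Python (scipy).
# The beak-depth values below are invented placeholders, not
# the Grants' actual finch measurements.
from scipy import stats

# Hypothetical beak depths (mm) for two independent samples of
# medium ground finches, e.g. before and after a drought year.
before = [8.6, 9.2, 9.8, 8.9, 9.5, 10.1, 9.0, 9.4]
after = [9.7, 10.2, 10.6, 9.9, 10.4, 10.8, 10.0, 10.3]

# Welch's two-sided t-test (does not assume equal variances).
t_stat, p_value = stats.ttest_ind(before, after, equal_var=False)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")

# If p is below the chosen significance level (say 0.05), reject the
# null hypothesis that the two population means are equal.
```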

Author(s):  
Sach Mukherjee

A number of important problems in data mining can be usefully addressed within the framework of statistical hypothesis testing. However, while the conventional treatment of statistical significance deals with error probabilities at the level of a single variable, practical data mining tasks tend to involve thousands, if not millions, of variables. This chapter looks at some of the issues that arise in applying hypothesis tests to multi-variable data mining problems, and describes two computationally efficient procedures by which these issues can be addressed.
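The chapter's two procedures are not reproduced here. As a baseline illustration of the multi-variable problem it describes, the sketch below applies the standard Benjamini-Hochberg step-up procedure, which controls the false discovery rate across many simultaneous tests; this is a common reference method, not necessarily one of the chapter's own.

```python
# With thousands of variables, uncorrected per-variable p-values
# yield many false positives. The Benjamini-Hochberg (BH) step-up
# procedure controls the expected false discovery rate (FDR).
import numpy as np

def benjamini_hochberg(pvals, alpha=0.05):
    """Return a boolean array marking hypotheses rejected at FDR alpha."""
    p = np.asarray(pvals)
    m = len(p)
    order = np.argsort(p)
    ranked = p[order]
    # Find the largest k with p_(k) <= (k/m) * alpha, then reject
    # the k smallest p-values.
    thresholds = (np.arange(1, m + 1) / m) * alpha
    below = ranked <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])  # largest passing rank (0-based)
        reject[order[: k + 1]] = True
    return reject

# Example: 10,000 variables, mostly null, with 50 genuine signals.
rng = np.random.default_rng(0)
pvals = np.concatenate([rng.uniform(size=9950),
                        rng.uniform(0, 1e-4, size=50)])
print(benjamini_hochberg(pvals).sum(), "rejections at FDR 0.05")
```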



2019 ◽  
Vol 35 (19) ◽  
pp. 3592-3598 ◽  
Author(s):  
Justin G Chitpin ◽  
Aseel Awdeh ◽  
Theodore J Perkins

Abstract

Motivation: Chromatin Immunoprecipitation (ChIP)-seq is used extensively to identify sites of transcription factor binding or regions of epigenetic modifications to the genome. A key step in ChIP-seq analysis is peak calling, where genomic regions enriched for ChIP versus control reads are identified. Many programs have been designed to solve this task, but nearly all fall into the statistical trap of using the data twice: once to determine candidate enriched regions, and again to assess enrichment by classical statistical hypothesis testing. This double use of the data invalidates the statistical significance assigned to enriched regions; thus, the true significance or reliability of peak calls remains unknown.

Results: Using simulated and real ChIP-seq data, we show that three well-known peak callers, MACS, SICER and diffReps, output biased P-values and false discovery rate estimates that can be many orders of magnitude too optimistic. We propose a wrapper algorithm, RECAP, that uses resampling of ChIP-seq and control data to estimate a monotone transform correcting for biases built into peak calling algorithms. When applied to null hypothesis data, where there is no enrichment between ChIP-seq and control, P-values recalibrated by RECAP are approximately uniformly distributed. On data where there is genuine enrichment, RECAP P-values give a better estimate of the true statistical significance of candidate peaks and better false discovery rate estimates, which correlate better with empirical reproducibility. RECAP is a powerful new tool for assessing the true statistical significance of ChIP-seq peak calls.

Availability and implementation: The RECAP software is available through www.perkinslab.ca or on GitHub at https://github.com/theodorejperkins/RECAP.

Supplementary information: Supplementary data are available at Bioinformatics online.
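As a conceptual illustration of the resampling idea, and not the authors' implementation, the sketch below shows only the recalibration step: p-values obtained by running a peak caller on resampled data with no true enrichment define an empirical null distribution, and each observed p-value is replaced by its quantile under that null, which is a monotone transform.

```python
# Conceptual sketch of RECAP-style p-value recalibration (not the
# authors' code). The peak-calling and resampling pipeline that
# produces `null_pvals` is elided; only the monotone empirical
# recalibration is shown.
import numpy as np

def recalibrate(observed_pvals, null_pvals):
    """Map each observed p-value to the fraction of null p-values
    at or below it (an empirical, monotone recalibration)."""
    null_sorted = np.sort(np.asarray(null_pvals))
    n = len(null_sorted)
    # For each observed p, count null p-values <= p; dividing by n
    # gives the recalibrated p-value.
    counts = np.searchsorted(null_sorted, observed_pvals, side="right")
    return np.maximum(counts, 1) / n  # avoid exact zeros

# Toy illustration: a caller whose raw p-values are biased low.
rng = np.random.default_rng(1)
null_p = rng.uniform(size=100_000) ** 3  # biased-low null p-values
obs_p = rng.uniform(size=5) ** 3
print(recalibrate(obs_p, null_p))  # approximately uniform again
```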


2015 ◽  
Vol 14 (2) ◽  
pp. 7-27
Author(s):  
BIRGIT C. AQUILONIUS ◽  
MARY E. BRENNER

Results from a study of 16 community college students are presented. The research question concerned how students reasoned about p-values. Students' approach to p-values in hypothesis testing was procedural: they viewed p-values as something one compares to alpha values in order to arrive at an answer, and they did not attach much meaning to p-values as an independent concept. It is therefore not surprising that students were often puzzled about how to translate their statistical answer into an answer to the question posed in the problem. Some reflections on how instruction in statistical hypothesis testing can be improved are given. First published November 2015 in the Statistics Education Research Journal Archives.
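The procedural reasoning the study describes reduces to a one-line comparison; the difficulty lay in translating the resulting decision back into the problem's own terms. A minimal illustration, with hypothetical numbers:

```python
# The procedural step the students did grasp: compare p to alpha.
# The numbers here are hypothetical.
p_value, alpha = 0.03, 0.05

if p_value < alpha:
    decision = "reject the null hypothesis"
else:
    decision = "fail to reject the null hypothesis"
print(decision)

# The step students struggled with: restating that decision in the
# problem's terms, e.g. "if the treatment had no effect, data this
# extreme would occur in about 3% of samples, so the data provide
# evidence that the treatment has an effect."
```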


2018 ◽  
Author(s):  
Justin G. Chitpin ◽  
Aseel Awdeh ◽  
Theodore J. Perkins

Abstract

Motivation: ChIP-seq is used extensively to identify sites of transcription factor binding or regions of epigenetic modifications to the genome. A key step in ChIP-seq analysis is peak calling, where genomic regions enriched for ChIP versus control reads are identified. Many programs have been designed to solve this task, but nearly all fall into the statistical trap of using the data twice: once to determine candidate enriched regions, and again to assess enrichment by classical statistical hypothesis testing. This double use of the data invalidates the statistical significance assigned to enriched regions and, as a consequence, invalidates false discovery rate estimates. Thus, the true significance or reliability of peak calls remains unknown.

Results: Using simulated and real ChIP-seq data sets, we show that three well-known peak callers, MACS, SICER and diffReps, output optimistically biased p-values, and therefore optimistic false discovery rate estimates, in some cases many orders of magnitude too optimistic. We propose a wrapper algorithm, RECAP, that uses resampling of ChIP-seq and control data to estimate and correct for biases built into peak calling algorithms. P-values recalibrated by RECAP are approximately uniformly distributed when applied to null hypothesis data, in which ChIP-seq and control come from the same genomic distributions. When applied to non-null data, RECAP p-values give a better estimate of the true statistical significance of candidate peaks and better false discovery rate estimates, which correlate better with empirical reproducibility. RECAP is a powerful new tool for assessing the true statistical significance of ChIP-seq peak calls.

Availability: The RECAP software is available on GitHub at https://github.com/theodorejperkins/RECAP.

Contact: [email protected]
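A simple way to probe the paper's uniformity claim on one's own null data is to test the recalibrated p-values against Uniform(0, 1), for instance with a Kolmogorov-Smirnov test; in this sketch the RECAP output is simulated with a stand-in array.

```python
# Checking whether recalibrated p-values are consistent with the
# Uniform(0, 1) distribution expected under the null hypothesis.
# `recalibrated_pvals` is a stand-in for p-values produced by
# running RECAP on null ChIP-seq/control data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
recalibrated_pvals = rng.uniform(size=10_000)  # stand-in for RECAP output

# Kolmogorov-Smirnov test against Uniform(0, 1): a large KS p-value
# means the sample is consistent with uniformity.
ks_stat, ks_p = stats.kstest(recalibrated_pvals, "uniform")
print(f"KS statistic = {ks_stat:.4f}, p = {ks_p:.3f}")
```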


F1000Research ◽  
2018 ◽  
Vol 7 ◽  
pp. 1176 ◽  
Author(s):  
Nicholas Graves ◽  
Adrian G. Barnett ◽  
Edward Burn ◽  
David Cook

Background: Clinical trials might be larger than needed because arbitrarily high levels of statistical confidence are sought in the results. Traditional sample size calculations ignore the marginal value of the information collected for decision making. The statistical hypothesis-testing objective is misaligned with the goal of generating the information necessary for decision-making. The aim of the present study was to show that a clinical trial designed to test a prior hypothesis against an arbitrary threshold of confidence may recruit too many participants, wasting scarce research dollars and exposing participants to research unnecessarily.

Methods: We used data from a recent RCT powered for traditional rules of statistical significance. The data were also used for an economic analysis, which showed that the intervention led to cost savings and improved health outcomes; adoption represented a good investment for decision-makers. We examined the effect of reducing the trial's sample size on the results of the statistical hypothesis-testing analysis and on the conclusions that would be drawn by decision-makers reading the economic analysis.

Results: As the sample size was reduced, it became more likely that the null hypothesis of no difference in the primary outcome between groups would fail to be rejected. For decision-makers reading the economic analysis, reducing the sample size had little effect on the conclusion about whether to adopt the intervention: there was always a high probability that the intervention reduced costs and improved health.

Conclusions: The conclusions of decision-makers managing health services are largely invariant to the sample size of the primary trial and to the arbitrary p-value threshold of 0.05. If the goal is to make a good decision about whether the intervention should be adopted widely, then that could have been achieved with a much smaller trial. It is plausible that hundreds of millions of research dollars are wasted each year recruiting more participants than required for RCTs.
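A toy simulation, with invented parameters rather than the trial's data, illustrates the paper's central contrast: as the sample size shrinks, the traditional t-test loses statistical significance well before the decision-relevant quantity, the probability that the intervention saves money, degrades.

```python
# Toy simulation (invented parameters, not the trial's data): as n
# shrinks, the t-test p-value tends to cross 0.05 long before the
# bootstrap probability that the intervention saves money drops.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
true_saving, sd = 200.0, 1000.0  # assumed mean cost saving and SD ($)

for n in [1000, 400, 100, 40]:
    control = rng.normal(0.0, sd, size=n)
    treated = rng.normal(-true_saving, sd, size=n)  # lower cost
    # Traditional hypothesis test on the cost difference.
    _, p = stats.ttest_ind(control, treated)
    # Decision-maker's view: bootstrap probability that mean cost
    # under treatment is lower than under control.
    boots = [
        np.mean(rng.choice(treated, n)) < np.mean(rng.choice(control, n))
        for _ in range(2000)
    ]
    print(f"n={n:4d}  p={p:.3f}  P(saves money)={np.mean(boots):.2f}")
```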


Author(s):  
Helena Kraemer

“As ye sow, so shall ye reap”: for almost 100 years, researchers have been taught that the be-all and end-all in data-based research is the p-value. The resulting problems have now generated concern, often voiced by those of us who have long taught researchers exactly that. We must bear a major responsibility for the present situation and must alter our teachings. Despite the fact that the Zhang and Hughes paper is titled “Beyond p-value”, its total focus remains on statistical hypothesis testing studies (HTS) and p-values (1). Instead, I would propose that there are three distinct, necessary, and important phases of research: 1) Hypothesis Generation Studies (HGS), or Exploratory Research (2-4); 2) Hypothesis Testing Studies (HTS); and 3) Replication and Application of Results. Of these, HTS is undoubtedly the most important, but without HGS, HTS is often weak and wasteful, and without Replication and Application, the results of HTS are often misleading.

