The use and abuse of hypothesis tests: how to present P values

2010 ◽  
Vol 25 (3) ◽  
pp. 107-112 ◽  
Author(s):  
C J Smith ◽  
Z V Fox

This overview highlights some of the key issues involved in performing and interpreting hypothesis tests. We describe the general approach taken in performing a hypothesis test with a focus on how to state the null and alternative hypotheses, and why two-sided tests are usually more appropriate than one-sided tests. We describe best practice techniques in performing and presenting the results of hypothesis tests. We recommend that, alongside any p-values, authors should also present estimates of the size of any treatment effects and their confidence intervals. Furthermore, they should specify the exact p-value rather than using terms such as 'NS' or the commonly used asterisk notation. We discuss other pitfalls that are encountered at the analysis stage, such as the use of repeated observations on individuals, the use of multiple tests on the data, and the erroneous use of parametric tests when data are not normally distributed and vice versa. We highlight these points using two different examples: one looking at the use of compression stockings for preventing the occurrence of DVT on long-haul flights and a second hypothetical study comparing laser versus surgery techniques for the removal of varicose veins.

Author(s):  
Jeffrey T Leek ◽  
John D. Storey

Simultaneously performing many hypothesis tests is a problem commonly encountered in high-dimensional biology. In this setting, a large set of p-values is calculated from many related features measured simultaneously. Classical statistics provides a criterion for defining what a “correct” p-value is when performing a single hypothesis test. We show here that even when each p-value is marginally correct under this single hypothesis criterion, it may be the case that the joint behavior of the entire set of p-values is problematic. On the other hand, there are cases where each p-value is marginally incorrect, yet the joint distribution of the set of p-values is satisfactory. Here, we propose a criterion defining a well behaved set of simultaneously calculated p-values that provides precise control of common error rates and we introduce diagnostic procedures for assessing whether the criterion is satisfied with simulations. Multiple testing p-values that satisfy our new criterion avoid potentially large study specific errors, but also satisfy the usual assumptions for strong control of false discovery rates and family-wise error rates. We utilize the new criterion and proposed diagnostics to investigate two common issues in high-dimensional multiple testing for genomics: dependent multiple hypothesis tests and pooled versus test-specific null distributions.


2018 ◽  
Vol 14 (31) ◽  
pp. 213
Author(s):  
Elsa Dhuli

This paper examines the differences that arise when means calculated from payroll records and personal revenues, used as secondary data, are compared with results from a business survey in an empirical panel study. The use of “secondary data” as a primary source for producing official indicators is a challenge worldwide. In past decades it has also been considered the way forward for raising productivity and reducing the burden on businesses. Whereas the Short Term Survey is a sample survey, the payroll records are administrative data. The purposes for which they are gathered differ, but both can be used to provide statistical indicators. This paper considers unweighted panel data in which the same business is analyzed from two related sources. The paired t-test is used to compare the means from the two related sources. Under those conditions the difference between the means of the two sources is unlikely to be exactly zero. In this study the hypothesis test is designed to answer the question "Is the observed difference sufficiently large to indicate that the alternative hypothesis is true?" What does the answer, which comes in the form of a probability (the p-value), mean in our case study? The paper shows some interesting findings about the difference in means between the two sources within a year. The differences found in the analysis result from the definitions used in the two sources for the same indicator, errors in reporting, the treatment of non-response in the survey and the administrative data source, and coding errors.
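As a rough illustration of the paired comparison described in this abstract, the sketch below computes a paired t statistic from the per-business differences between two sources. The function name `paired_t_statistic` and the numbers are hypothetical and are not taken from the paper; the p-value would then be read from a t distribution with n - 1 degrees of freedom, which this stdlib-only sketch does not compute.

```python
import math
import statistics


def paired_t_statistic(source_a, source_b):
    """Return (t, degrees_of_freedom) for paired observations.

    Each position in source_a and source_b must refer to the same unit
    (here, the same business observed in two data sources).
    """
    if len(source_a) != len(source_b):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(source_a, source_b)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two pairs")
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation
    if sd_diff == 0:
        raise ValueError("all differences are identical; t is undefined")
    standard_error = sd_diff / math.sqrt(n)
    return mean_diff / standard_error, n - 1


# Hypothetical values for the same four businesses in two sources.
payroll = [102.0, 98.0, 110.0, 95.0]
survey = [100.0, 97.0, 105.0, 96.0]
t_stat, dof = paired_t_statistic(payroll, survey)
print(f"t = {t_stat:.3f} on {dof} degrees of freedom")
```

Working on the differences rather than on the two means separately is what makes the test "paired": variation between businesses cancels out, and only the within-business disagreement between sources remains.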


2009 ◽  
Vol 33 (2) ◽  
pp. 81-86 ◽  
Author(s):  
Douglas Curran-Everett

Learning about statistics is a lot like learning about science: the learning is more meaningful if you can actively explore. This second installment of Explorations in Statistics delves into test statistics and P values, two concepts fundamental to the test of a scientific null hypothesis. The essence of a test statistic is that it compares what we observe in the experiment to what we expect to see if the null hypothesis is true. The P value associated with the magnitude of that test statistic answers this question: if the null hypothesis is true, what proportion of possible values of the test statistic are at least as extreme as the one I got? Although statisticians continue to stress the limitations of hypothesis tests, there are two realities we must acknowledge: hypothesis tests are ingrained within science, and the simple test of a null hypothesis can be useful. As a result, it behooves us to explore the notions of hypothesis tests, test statistics, and P values.
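The definition of the P value given in this abstract can be made concrete with an exact permutation test. The sketch below is only an illustration under assumptions not stated in the abstract: the test statistic is the absolute difference in group means, and the function name `permutation_p_value` and the data are hypothetical. It enumerates every way of relabelling the pooled observations and reports the proportion of test statistics at least as extreme as the observed one.

```python
from itertools import combinations


def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation P value for a difference in means."""
    if not group_a or not group_b:
        raise ValueError("both groups must be non-empty")
    pooled = list(group_a) + list(group_b)
    n_total = len(pooled)
    n_a = len(group_a)

    def abs_mean_diff(indices_a):
        chosen = set(indices_a)
        a = [pooled[i] for i in indices_a]
        b = [pooled[i] for i in range(n_total) if i not in chosen]
        return abs(sum(a) / len(a) - sum(b) / len(b))

    observed = abs_mean_diff(tuple(range(n_a)))
    extreme = 0
    total = 0
    # Under the null hypothesis every relabelling is equally likely.
    for indices in combinations(range(n_total), n_a):
        total += 1
        # Small tolerance guards against floating-point ties.
        if abs_mean_diff(indices) >= observed - 1e-12:
            extreme += 1
    return extreme / total


# Hypothetical measurements from two small groups.
print(permutation_p_value([1, 2, 3], [4, 5, 6]))
```

Full enumeration is only practical for small samples, because the number of relabellings grows combinatorially; for larger samples a random subset of relabellings is usually drawn instead.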


2015 ◽  
Vol 3 (3) ◽  
pp. 139-144 ◽  
Author(s):  
Stephanie L. Pugh ◽  
Annette Molinaro

Articles published in medical journals mention statistical tests and often support their results with a P value. What are these tests? What is a P value and what is its meaning? P values are used to interpret the result of a statistical test. Both are intrinsic parts of hypothesis testing, which is a decision-making tool based on probability. Most medical and epidemiological studies are designed using a hypothesis test, so understanding the key principles of a hypothesis test is crucial to interpreting the results of a study. From null and alternative hypotheses to the issue of multiple tests, this paper introduces concepts related to hypothesis testing that are crucial to its implementation and interpretation.


2018 ◽  
Vol 7 (3) ◽  
pp. 23 ◽  
Author(s):  
Patrick Breheny ◽  
Arnold Stromberg ◽  
Joshua Lambert

It is increasingly common for experiments in biology and medicine to involve large numbers of hypothesis tests. A natural graphical method for visualizing these tests is to construct a histogram from the p-values of these tests. In this article, we examine the shapes, both regular and irregular, that these histograms can take on, as well as present simple inferential procedures that help to interpret the shapes in terms of diagnosing potential problems with the experiment. We examine potential causes of these problems in detail, and discuss potential remedies. Throughout, examples of irregular-looking p-value histograms are provided and based on case studies involving real biological experiments.
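A p-value histogram of the kind described here can be sketched with a small simulation. The example below is an assumption-laden illustration, not the authors' procedure: it simulates many two-sided z-tests in which the null hypothesis is true (standard normal data, known variance) and counts the p-values in equal-width bins. All function names are hypothetical. When every null hypothesis is true and the test is valid, the p-values are uniformly distributed, so the bins should be roughly level.

```python
import math
import random


def two_sided_z_p_value(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def simulate_null_p_values(n_tests, n_per_test, seed=0):
    """Simulate p-values from z-tests in which the null is true."""
    rng = random.Random(seed)
    p_values = []
    for _ in range(n_tests):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n_per_test)]
        # z = sample mean / (sigma / sqrt(n)) with sigma = 1.
        z = sum(sample) / math.sqrt(n_per_test)
        p_values.append(two_sided_z_p_value(z))
    return p_values


def histogram_counts(p_values, n_bins=10):
    """Count p-values in n_bins equal-width bins on [0, 1]."""
    counts = [0] * n_bins
    for p in p_values:
        # p == 1.0 is placed in the last bin.
        counts[min(int(p * n_bins), n_bins - 1)] += 1
    return counts


counts = histogram_counts(simulate_null_p_values(2000, 5))
for i, count in enumerate(counts):
    print(f"[{i / 10:.1f}, {(i + 1) / 10:.1f}) {'#' * (count // 20)}")
```

A histogram from real data that departs from this level shape, other than through a spike near zero from genuine effects, is a prompt to look for problems of the sort the article discusses.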


2019 ◽  
Author(s):  
Daniel Lakens

Due to the strong overreliance on p-values in the scientific literature, some researchers have argued that p-values should be abandoned or banned, and that we need to move beyond p-values and embrace practical alternatives. When proposing alternatives to p-values statisticians often commit the ‘Statistician’s Fallacy’, where they declare which statistic researchers really ‘want to know’. Instead of telling researchers what they want to know, statisticians should teach researchers which questions they can ask. In some situations, the answer to the question they are most interested in will be the p-value. For as long as null-hypothesis tests have been criticized, researchers have suggested including minimum-effect tests and equivalence tests in our statistical toolbox, and these tests (even though they return p-values) have the potential to greatly improve the questions researchers ask. It is clear there is room for improvement in how we teach p-values. If anyone really believes p-values are an important cause of problems in science, preventing the misinterpretation of p-values by developing better evidence-based education and user-centered statistical software should be a top priority. Telling researchers which statistic they should use has distracted us from examining more important questions, such as asking researchers what they want to know when they do scientific research. Before we can improve our statistical inferences, we need to improve our statistical questions.


2021 ◽  
pp. 088506662110537
Author(s):  
Sarah Nostedt ◽  
Ari R. Joffe

Background: Misinterpretations of the p-value in null-hypothesis statistical testing are common. We aimed to determine the implications of observed p-values in critical care randomized controlled trials (RCTs). Methods: We included three cohorts of published RCTs: Adult-RCTs reporting a mortality outcome, Pediatric-RCTs reporting a mortality outcome, and recent Consecutive-RCTs reporting p-value ≤.10 in six higher-impact journals. We recorded descriptive information from RCTs. Reverse Bayesian implications of obtained p-values were calculated, reported as percentages with inter-quartile ranges. Results: Obtained p-value was ≤.005 in 11/216 (5.1%) Adult-RCTs, 2/120 (1.7%) Pediatric-RCTs, and 37/90 (41.1%) Consecutive-RCTs. An obtained p-value .05–.0051 had high False Positive Rates; in Adult-RCTs, minimum (assuming prior probability of the alternative hypothesis was 50%) and realistic (assuming prior probability of the alternative hypothesis was 10%) False Positive Rates were 16.7% [11.2, 21.8] and 64.3% [53.2, 71.4]. An obtained p-value ≤.005 had lower False Positive Rates; in Adult-RCTs the realistic False Positive Rate was 7.7% [7.7, 16.0]. The realistic probability of the alternative hypothesis for obtained p-value .05–.0051 (ie, Positive Predictive Value) was 28.0% [24.1, 34.8], 30.6% [27.7, 48.5], 29.3% [24.3, 41.0], and 32.7% [24.1, 43.5] for Adult-RCTs, Pediatric-RCTs, Consecutive-RCTs primary and secondary outcome, respectively. The maximum Positive Predictive Value for p-value category .05–.0051 was median 77.8%, 79.8%, 78.8%, and 81.4% respectively. To have maximum or realistic Positive Predictive Value >90% or >80%, RCTs needed to have obtained p-value ≤.005. The credibility of p-value .05–.0051 findings was easy to challenge, and the credibility to rule-out an effect with p-value >.05 to .10 was low. The probability that a replication study would obtain p-value ≤.05 did not approach 90% unless the obtained p-value was ≤.005.
Conclusions: Unless the obtained p-value was ≤.005, the False Positive Rate was high, and the Positive Predictive Value and probability of replication of “statistically significant” findings were low.
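The abstract does not state the formula behind its False Positive Rates, so the sketch below should not be read as the authors' method. It shows one widely used calibration, the Sellke, Bayarri and Berger bound, in which the Bayes factor in favour of the null hypothesis is at least -e · p · ln(p) for p < 1/e. Combined with a prior probability for the alternative hypothesis, this gives a lower bound on the probability that a "significant" result is a false positive. The function names are hypothetical.

```python
import math


def min_bayes_factor(p):
    """Sellke-Bayarri-Berger lower bound on the Bayes factor for H0."""
    if not 0 < p < 1 / math.e:
        raise ValueError("the bound applies only for 0 < p < 1/e")
    return -math.e * p * math.log(p)


def min_false_positive_rate(p, prior_h1):
    """Lower bound on P(H0 | data) given a prior probability for H1."""
    if not 0 < prior_h1 < 1:
        raise ValueError("prior_h1 must lie strictly between 0 and 1")
    prior_odds_h0 = (1 - prior_h1) / prior_h1
    posterior_odds_h0 = prior_odds_h0 * min_bayes_factor(p)
    return posterior_odds_h0 / (1 + posterior_odds_h0)


# The two priors named in the abstract: 50% and 10% for the alternative.
for prior in (0.5, 0.1):
    for p in (0.05, 0.005):
        rate = min_false_positive_rate(p, prior)
        print(f"prior H1 = {prior:.0%}, p = {p}: at least {rate:.1%}")
```

Under this bound the false positive probability depends strongly on the prior: the same p-value is far less convincing when the alternative hypothesis was unlikely to begin with.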


Author(s):  
Patrick Breheny ◽  
Arnold Stromberg ◽  
Joshua Lambert

It is increasingly common for experiments in biology and medicine to involve large numbers of hypothesis tests. A natural graphical method for visualizing these tests is to construct a histogram from the p-values of these tests. In this article, we examine the shapes, both normal and abnormal, that these histograms can take on, as well as present simple inferential procedures that help to interpret the shapes in terms of diagnosing potential problems with the experiment. We examine potential causes of these problems in detail, and discuss potential remedies. Throughout, examples of abnormal-looking p-value histograms are provided and based on case studies involving real biological experiments.


Author(s):  
María Dolores VELASCO-PALACIOS ◽  
Gabriel SUYO-CRUZ

Currently, the education environment faces great challenges; in particular, education in Peru, especially Regular Basic Education (EBR), presents a series of difficulties in the writing of administrative and academic documents. The objective of the study is to establish the relationship between the level of written communication, linguistic deficiencies, and the achievement of competencies in students entering a public university in Cusco. The study approach is quantitative and applied, with a causal correlational cross-sectional design. The population comprises 1406 students entering the National University of San Antonio Abad del Cusco, of whom 142 were selected for the study through non-probability convenience sampling; information was collected using a Likert scale whose validity and reliability had been examined. The fundamental conclusion is that there is a significant association between the level of written communication, linguistic deficiencies, and the achievement of competencies in students entering a public university in Cusco. This is observed in the hypothesis test, whose "P-value" = 0.000 is less than the significance level of 0.05. Evidence from the Spearman's rho statistical test rejects the null hypothesis, and we accept the alternative hypothesis.


Phlebologie ◽  
2010 ◽  
Vol 39 (03) ◽  
pp. 133-137
Author(s):  
H. Partsch

Summary. Background: Compression stockings are widely used in patients with varicose veins. Methods: Based on published literature three main points are discussed: 1. the rationale of compression therapy in primary varicose veins, 2. the prescription of compression stockings in daily practice, 3. studies required in the future. Results: The main objective of prescribing compression stockings for patients with varicose veins is to improve subjective leg complaints and to prevent swelling after sitting and standing. No convincing data are available concerning prevention of progression or of complications. In daily practice varicose veins are the most common indication to prescribe compression stockings. The compliance depends on the severity of the disorder and is rather poor in less severe stages. Long-term studies are needed to prove the cost-effectiveness of compression stockings concerning subjective symptoms and objective signs of varicose veins adjusted to their clinical severity. Conclusion: Compression stockings in primary varicose veins are able to improve leg complaints and to prevent swelling.

