ASA Statement on Statistical Significance and p-Values

Author(s):  
Ronald L. Wasserstein ◽  
Nicole A. Lazar


Mathematics ◽  
2021 ◽  
Vol 9 (6) ◽  
pp. 603
Author(s):  
Leonid Hanin

I uncover previously underappreciated systematic sources of false and irreproducible results in the natural, biomedical and social sciences that are rooted in statistical methodology. They include the inevitable deviations from the basic assumptions behind statistical analyses and the use of various approximations. I show through a number of examples that (a) arbitrarily small deviations from distributional homogeneity can lead to arbitrarily large deviations in the outcomes of statistical analyses; (b) samples of random size may violate the Law of Large Numbers and are thus generally unsuitable for conventional statistical inference; (c) the same is true, in particular, when the random sample size and the observations are stochastically dependent; and (d) the use of the Gaussian approximation based on the Central Limit Theorem has dramatic implications for p-values and statistical significance, essentially making the pursuit of small significance levels and p-values for a fixed sample size meaningless. The latter is proven rigorously for the one-sided Z test. This article could serve as cautionary guidance for scientists and practitioners who employ statistical methods in their work.
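Point (d) can be illustrated numerically; this is not Hanin's proof, only a sketch under stated assumptions. Assuming a hypothetical binomial null with n = 50 trials and success probability 0.1, the code compares the exact upper-tail probability with the one-sided Z-test p-value from the Gaussian approximation (no continuity correction). For this fixed sample size, the ratio between the two grows as the observed count moves further into the tail, which is exactly where small p-values are computed. The function names are illustrative.

```python
from math import comb, sqrt
from statistics import NormalDist

def exact_upper_tail(x: int, n: int, p: float) -> float:
    """Exact P(X >= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x, n + 1))

def z_test_upper_tail(x: int, n: int, p: float) -> float:
    """One-sided Z-test p-value: Gaussian (CLT) approximation to P(X >= x)."""
    z = (x - n * p) / sqrt(n * p * (1 - p))
    # cdf(-z) equals 1 - cdf(z) but avoids cancellation far in the tail.
    return NormalDist().cdf(-z)

n, p = 50, 0.1  # hypothetical null: 50 Bernoulli trials with success rate 0.1
for x in (8, 10, 12, 15):
    exact = exact_upper_tail(x, n, p)
    approx = z_test_upper_tail(x, n, p)
    print(f"x={x:2d}  exact={exact:.2e}  normal={approx:.2e}  ratio={exact / approx:.1f}")
```

Because the binomial null is right-skewed, the normal approximation understates the tail, and the relative error is largest for the smallest p-values.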


2014 ◽  
Vol 99 (6) ◽  
pp. 729-733 ◽  
Author(s):  
Tasiopoulos Konstantinos ◽  
Komnos Apostolos ◽  
Paraforos Georgios ◽  
Tepetes Konstantinos

Studies on surgical patients provide some evidence of prompt detection of enteric ischemia with microdialysis. The purpose of the study was to measure intraperitoneal microdialysis values (glucose, glycerol, pyruvate, and lactate) in patients hospitalized in an intensive care unit (ICU) with an underlying abdominal surgical condition and to correlate these values with patients' outcomes. Twenty-one patients, 10 of them female, were enrolled in the study. The intraperitoneal metabolite values were measured for 3 consecutive days, starting from the first day of ICU hospitalization. Descriptive and inferential statistics were performed: the t-test, repeated measures analysis, Holm's test, and a logistic regression model were applied. The level of statistical significance was set at P = 0.05. The mean age of participants was 68.10 ± 8.02 years. Survivors exhibited statistically significantly higher glucose values on day 3 (6.61 ± 2.01 vs. 3.67 ± 1.62; P = 0.002). The mean lactate/pyruvate (L/P) ratio was above 20 (35.35 ± 27.11). All non-survivors had mean three-day L/P values greater than 25.94. Low L/P values were associated with a greater likelihood of survival. High microdialysis glucose concentration, high L/P ratio and low glucose concentration were the major findings during the first three days of ICU hospitalization in non-survivors. Intraperitoneal microdialysis may serve as a useful tool for understanding the pathophysiology of enteric ischemia.
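The abstract lists Holm's test among the methods but does not say which comparisons it adjusted. As a minimal sketch, assuming Holm's step-down procedure was applied to a family of comparisons at the study's level of 0.05, the rule can be written as follows; the function name `holm_reject` and the four p-values are hypothetical, not taken from the study.

```python
def holm_reject(p_values: list[float], alpha: float = 0.05) -> list[bool]:
    """Holm's step-down procedure: return which hypotheses are rejected
    while controlling the family-wise error rate at alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    reject = [False] * m
    for rank, i in enumerate(order):
        # Compare the (rank + 1)-th smallest p-value with alpha / (m - rank).
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # stop at the first non-rejection; all larger p-values are retained
    return reject

# Hypothetical p-values for four metabolite comparisons (illustrative only).
print(holm_reject([0.01, 0.04, 0.03, 0.005]))
```

Here the smallest p-value is tested at 0.05/4, the next at 0.05/3, and so on; the procedure stops at 0.03, which exceeds 0.05/2.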


Stroke ◽  
2021 ◽  
Vol 52 (Suppl_1) ◽  
Author(s):  
Sarah E Wetzel-Strong ◽  
Shantel M Weinsheimer ◽  
Jeffrey Nelson ◽  
Ludmila Pawlikowska ◽  
Dewi Clark ◽  
...  

Objective: Circulating plasma protein profiling may aid in the identification of cerebrovascular disease signatures. This study aimed to identify circulating angiogenic and inflammatory proteins that may serve as biomarkers to differentiate patients with sporadic brain arteriovenous malformation (bAVM) from patients with other conditions involving brain AVMs, including hereditary hemorrhagic telangiectasia (HHT).
Methods: The Quantibody Human Angiogenesis Array 1000 (Raybiotech), an ELISA multiplex panel, was used to assess the levels of 60 proteins related to angiogenesis and inflammation in heparin plasma samples from 13 patients with sporadic unruptured bAVM (69% male, mean age 51 years) and 37 patients with HHT (40% male, mean age 47 years; n = 19 (51%) with bAVM). The Quantibody Q-Analyzer tool was used to calculate biomarker concentrations from the standard curve for each marker, and log-transformed marker levels were evaluated for associations with disease state using a multivariable interval regression model adjusted for age, sex, ethnicity and collection site. Statistical significance was based on a Bonferroni correction for multiple testing of 60 biomarkers (P < 8.3 × 10⁻⁴).
Results: Circulating levels of two plasma proteins differed significantly between sporadic bAVM and HHT patients: PDGF-BB (P = 2.6 × 10⁻⁴, PI = 3.37, 95% CI 1.76-6.46) and CCL5 (P = 6.0 × 10⁻⁶, PI = 3.50, 95% CI 2.04-6.03). When considering markers with a nominal p-value below 0.01, MMP1 and angiostatin levels also differed between patients with sporadic bAVM and HHT. Markers with nominal p-values below 0.05 in the comparison of sporadic bAVM and HHT patients also included angiostatin, IL2, VEGF, GRO, CXCL16, ITAC, and TGFB3. Among HHT patients, circulating levels of UPAR and IL6 were elevated in patients with documented bAVMs when considering markers with nominal p-values below 0.05.
Conclusions: This study identified differential expression of two promising plasma biomarkers that differentiate patients with sporadic bAVM from patients with HHT. Furthermore, this study allowed us to evaluate markers associated with the presence of bAVMs in HHT patients, which may offer insight into the mechanisms underlying bAVM pathophysiology.
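The Bonferroni cutoff quoted in the Methods follows from dividing the overall significance level by the number of markers tested. A short sketch, assuming an overall level of 0.05 (the abstract states only the resulting threshold), reproduces the cutoff and checks the two reported p-values against it. The constant names are illustrative.

```python
ALPHA = 0.05      # assumed family-wise significance level
N_MARKERS = 60    # number of biomarkers tested, as stated in the abstract

# Bonferroni: each marker is tested at alpha / number of tests.
threshold = ALPHA / N_MARKERS
print(f"Bonferroni threshold: {threshold:.2e}")

# P-values reported in the abstract for the two significant markers.
reported = {"PDGF-BB": 2.6e-4, "CCL5": 6.0e-6}
for marker, p in reported.items():
    verdict = "significant" if p < threshold else "not significant"
    print(f"{marker}: P = {p:.1e} -> {verdict} after Bonferroni correction")

# A marker at the nominal 0.01 level does not pass the corrected threshold.
print("nominal 0.01 passes:", 0.01 < threshold)
```

This is why markers such as MMP1 and angiostatin are described only in terms of nominal p-values.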


Author(s):  
Valentin Amrhein ◽  
Fränzi Korner-Nievergelt ◽  
Tobias Roth

The widespread use of 'statistical significance' as a license for making a claim of a scientific finding leads to considerable distortion of the scientific process (American Statistical Association, Wasserstein & Lazar 2016). We review why degrading p-values into 'significant' and 'nonsignificant' contributes to making studies irreproducible, or to making them seem irreproducible. A major problem is that we tend to take small p-values at face value but mistrust results with larger p-values. In either case, p-values tell little about the reliability of research, because they are hardly replicable even if an alternative hypothesis is true. Significance (p ≤ 0.05) is also hardly replicable: at a realistic statistical power of 40%, given that there is a true effect, only one in six studies will significantly replicate the significant result of another study. Even at a good power of 80%, results from two studies will conflict, in terms of significance, in one third of the cases if there is a true effect. This means that a replication cannot be interpreted as having failed only because it is nonsignificant. Many apparent replication failures may thus reflect faulty judgement based on significance thresholds rather than a crisis of unreplicable research. Reliable conclusions on the replicability and practical importance of a finding can only be drawn using cumulative evidence from multiple independent studies. However, applying significance thresholds makes cumulative knowledge unreliable. One reason is that with anything but ideal statistical power, significant effect sizes will be biased upwards. Interpreting inflated significant results while ignoring nonsignificant results will thus lead to wrong conclusions. But current incentives to hunt for significance lead to publication bias against nonsignificant findings. Data dredging, p-hacking and publication bias should be addressed by removing fixed significance thresholds.
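The two replication figures can be reproduced with elementary probability, assuming that both studies test the same true effect, are independent, and each reaches significance with probability equal to its power. With 40% power, both studies are significant with probability 0.4 × 0.4 = 0.16, roughly one in six; with 80% power, exactly one of the two is significant with probability 2 × 0.8 × 0.2 = 0.32, roughly one third. The helper names below are illustrative.

```python
def both_significant(power: float) -> float:
    """Probability that two independent studies of a true effect are both significant."""
    return power * power

def conflicting(power: float) -> float:
    """Probability that exactly one of two independent studies is significant."""
    return 2 * power * (1 - power)

for power in (0.4, 0.8):
    print(f"power={power:.0%}: both significant={both_significant(power):.2f}, "
          f"conflicting={conflicting(power):.2f}")
```

The same arithmetic shows that at 40% power the two studies conflict more often than they agree on significance.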
Consistent with the recommendations of the late Ronald Fisher, p-values should be interpreted as graded measures of the strength of evidence against the null hypothesis. Larger p-values also offer some evidence against the null hypothesis; they cannot be interpreted as supporting the null hypothesis, which would falsely lead to the conclusion that 'there is no effect'. Information on possible true effect sizes that are compatible with the data must be obtained from the observed effect size, e.g., a sample average, and from a measure of uncertainty, such as a confidence interval. We review how confusion about the interpretation of larger p-values can be traced back to historical disputes among the founders of modern statistics. We further discuss potential arguments against removing significance thresholds, such as 'we need more stringent decision rules', 'sample sizes will decrease' or 'we need to get rid of p-values'.
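A minimal sketch of reporting an observed effect with a confidence interval instead of a significant/nonsignificant verdict, using hypothetical data from ten subjects. The interval includes zero, so a test at the 0.05 level would be 'nonsignificant', yet it is also compatible with effects of nearly 1; reading the interval avoids concluding that 'there is no effect'. The function name `mean_with_ci` is illustrative, and the t critical value is passed in because the Python standard library has no t quantile function.

```python
from math import sqrt
from statistics import mean, stdev

def mean_with_ci(data: list[float], t_crit: float) -> tuple[float, float, float]:
    """Sample mean with a t-based confidence interval.

    t_crit is the two-sided critical value of Student's t with len(data) - 1
    degrees of freedom (about 2.262 for 95% and 9 degrees of freedom).
    """
    m = mean(data)
    half_width = t_crit * stdev(data) / sqrt(len(data))
    return m, m - half_width, m + half_width

# Hypothetical within-subject differences (illustrative data only).
diffs = [0.8, -0.3, 1.5, 0.2, -0.9, 1.1, 0.4, -0.2, 1.3, 0.6]
est, lo, hi = mean_with_ci(diffs, t_crit=2.262)
print(f"mean = {est:.2f}, 95% CI = ({lo:.2f}, {hi:.2f})")
```

Both the point estimate and the whole range of compatible values are reported, rather than a single binary label.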


2019 ◽  
Vol 101-B (10) ◽  
pp. 1179-1183
Author(s):  
Nick Parsons ◽  
Richard Carey-Smith ◽  
Melina Dritsaki ◽  
Xavier Griffin ◽  
David Metcalfe ◽  
...  

2019 ◽  
Vol 81 (8) ◽  
pp. 535-542
Author(s):  
Robert A. Cooper

Statistical methods are indispensable to the practice of science. But statistical hypothesis testing can seem daunting, with P-values, null hypotheses, and the concept of statistical significance. This article explains the concepts associated with statistical hypothesis testing using the story of “the lady tasting tea,” then walks the reader through an application of the independent-samples t-test using data from Peter and Rosemary Grant's investigations of Darwin's finches. Understanding how scientists use statistics is an important component of scientific literacy, and students should have opportunities to use statistical methods like this in their science classes.
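The abstract does not give the details of the tea-tasting experiment. As a sketch, assuming Fisher's classic design of eight cups, four with milk poured first and four with tea poured first, the exact null probabilities behind the story follow from counting. If the taster is only guessing, the number of milk-first cups she identifies correctly is hypergeometric. The function name is illustrative.

```python
from fractions import Fraction
from math import comb

def tea_tasting_p_value(correct: int, cups_each: int = 4) -> Fraction:
    """Exact one-sided p-value for Fisher's tea-tasting design.

    There are 2 * cups_each cups, cups_each of them milk-first. Under the null
    hypothesis the taster picks cups_each cups as 'milk-first' at random, so the
    p-value is the probability of at least `correct` correct picks.
    """
    total = comb(2 * cups_each, cups_each)  # ways to choose the 'milk-first' set
    favourable = sum(
        comb(cups_each, k) * comb(cups_each, cups_each - k)
        for k in range(correct, cups_each + 1)
    )
    return Fraction(favourable, total)

print(tea_tasting_p_value(4))  # all four identified: 1/70
print(tea_tasting_p_value(3))  # at least three identified: 17/70
```

Only a perfect score (p = 1/70, about 0.014) falls below 0.05 in this design; three correct out of four is quite likely by chance.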


2011 ◽  
Vol 57 (Special Issue) ◽  
pp. S1-S6
Author(s):  
R. Gálik ◽  
Z. Poláková ◽  
Š. Boďo ◽  
M. Denker

The paper discusses the relations between some physical indicators of market eggs from laying hens housed in conventional and enriched cage batteries. The measured results were evaluated by multiple regression. They show that in both the conventional and the enriched cages a statistically significant dependence exists between eggshell deflection (dependent variable) and shell thickness or the force needed to break the eggshell (independent variables). The respective P values are given in parentheses (0.002 < 0.05; 0.03 < 0.05; 1.16 × 10⁻¹⁰ < 0.05; 8.31 × 10⁻⁴ < 0.05). In both the conventional and the enriched cage, a statistically significant dependence also existed (3.81 × 10⁻⁹¹ < 0.05; 3.86 × 10⁻⁸¹ < 0.05; 1.27 × 10⁻⁹⁷ < 0.05; 3.46 × 10⁻⁵⁷ < 0.05) between shell weight (dependent variable) and shell thickness or egg weight (independent variables). In the conventional cage, a statistical dependence also occurred between eggshell weight and the egg shape index (1.07 × 10⁻⁶ < 0.05); in the enriched cage this was on the verge of statistical significance (0.062 > 0.05). In the conventional cage, when eggshell thickness increased by 1 mm, shell deflection decreased by 0.08 mm, and when the force necessary to break the eggshell increased by 1 N, shell deflection decreased by 0.0003 mm; when shell thickness increased by 1 mm, shell weight increased by 15.509 g, and when egg weight increased by 1 g, shell weight increased by 0.061 g. Our work brings further knowledge concerning the monitored characteristics and their mutual relations.
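The slopes quoted at the end of the abstract read as partial regression coefficients: each gives the predicted change in the dependent variable per unit increase in one predictor, holding the other fixed. A minimal sketch, assuming both predictors enter a single linear model, fits a two-predictor least-squares regression with the standard library only. The measurements are hypothetical and the responses are generated without noise from assumed slopes of -0.08 mm per mm and -0.0003 mm per N, so the fit should recover them; the data, intercept, and function names are illustrative, not the paper's.

```python
def solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve a small linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def ols(y: list[float], *predictors: list[float]) -> list[float]:
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    rows = [[1.0, *xs] for xs in zip(*predictors)]
    p = len(rows[0])
    xtx = [[sum(r[j] * r[k] for r in rows) for k in range(p)] for j in range(p)]
    xty = [sum(r[j] * yi for r, yi in zip(rows, y)) for j in range(p)]
    return solve(xtx, xty)

# Hypothetical eggs: shell thickness (mm) and breaking force (N).
thickness = [0.30, 0.32, 0.35, 0.33, 0.38, 0.36, 0.31, 0.34]
force = [35.0, 40.0, 38.0, 45.0, 42.0, 50.0, 37.0, 44.0]
# Deflection (mm) generated from assumed slopes, not from the paper's data.
deflection = [0.12 - 0.08 * t - 0.0003 * f for t, f in zip(thickness, force)]

intercept, b_thickness, b_force = ols(deflection, thickness, force)
print(f"+1 mm thickness -> deflection changes by {b_thickness:.4f} mm")
print(f"+1 N force      -> deflection changes by {b_force:.5f} mm")
```

With real, noisy measurements the coefficients would come with standard errors, and the P values reported in the abstract would be tests of whether each slope differs from zero.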

