Understanding significance and p-values

Correction, 7th May 2016: On p. 523, column 1, second paragraph, line 10, "the quantiles at the extremes - say 5%, 1%, which were useful" was changed to "the quantiles at the extremes - say 10%, 5%, 2%, and 1%, which were useful". On p. 523, column 2, second paragraph, line 3, "this is misconstrued with 'statistical significance'" was changed to "mistaken with 'statistical significance'".



Nepal Journal of Epidemiology ◽

10.3126/nje.v1i1.14732 ◽

2016 ◽

Vol 6 (1) ◽

pp. 522

Author(s):

Shrikant I Bangdiwala

Keyword(s):

Confidence Interval ◽

Scientific Reasoning ◽

P Value ◽

P Values ◽

Single Index ◽

Measure Of Uncertainty

Because the p-value is a single index, and in line with the ASA's statement, we strongly agree that it cannot and should not be considered the sole basis for scientific reasoning. Given the misuses and misconceptions concerning p-values, the recommendation is to present the estimate of the effect, provide a measure of uncertainty of the estimate (e.g., a confidence interval), and interpret the results in terms of scientific importance.
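To illustrate why a single p-value is not enough, here is a minimal Python sketch (all numbers hypothetical) that reports an effect estimate together with a Wald-type 95% confidence interval and a two-sided p-value. It assumes an approximately normal sampling distribution with a known standard error; `wald_summary` is an illustrative helper, not code from the article. The two hypothetical studies have the same z statistic and therefore the same p-value, yet very different effect sizes and precision.

```python
from statistics import NormalDist

def wald_summary(estimate, se, level=0.95):
    """Return the estimate, a Wald confidence interval, and a two-sided
    p-value for H0: effect = 0, assuming approximate normality."""
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    ci = (estimate - z_crit * se, estimate + z_crit * se)
    z = estimate / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return {"estimate": estimate, "ci": ci, "p": p}

# Two hypothetical studies with z = 2, hence the same p-value (about 0.0455)
study_a = wald_summary(0.50, 0.25)  # large effect, imprecise
study_b = wald_summary(0.02, 0.01)  # tiny effect, very precise

for name, s in [("A", study_a), ("B", study_b)]:
    lo, hi = s["ci"]
    print(f"{name}: estimate={s['estimate']:.2f}, 95% CI=({lo:.3f}, {hi:.3f}), p={s['p']:.4f}")
```

Reporting only the p-value would make the two studies look identical; the estimates and intervals show that they are not, and whether either effect matters is a question of scientific importance rather than of the p-value.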


Visualization Strategies for Regression Estimates with Randomization Inference

10.31235/osf.io/bsd7g ◽

2019 ◽

Author(s):

Marshall A. Taylor

Keyword(s):

Confidence Interval ◽

Confidence Intervals ◽

Regression Models ◽

Statistical Significance ◽

Permutation Tests ◽

P Value ◽

P Values ◽

Alpha Level ◽

Significance Levels ◽

Nonprobability Sample

Coefficient plots are a popular tool for visualizing regression estimates. The appeal of these plots is that they visualize confidence intervals around the estimates and generally center the plot around zero, meaning that any estimate whose confidence interval crosses zero is statistically nonsignificant at least at the alpha level around which the confidence intervals are constructed. For models whose statistical significance levels are determined via randomization models of inference, and for which there is no standard error or confidence interval for the estimate itself, these plots appear less useful. In this paper, I illustrate a variant of the coefficient plot for regression models with p-values constructed using permutation tests. These visualizations plot each estimate's p-value and its associated confidence interval in relation to a specified alpha level. These plots can help the analyst interpret and report both the statistical and the substantive significance of their models. Illustrations are provided using a nonprobability sample of activists and participants at a 1962 anti-Communism school.
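The rule that coefficient plots rely on, that an estimate whose (1 - alpha) confidence interval excludes zero is significant at level alpha, holds exactly for Wald intervals and Wald tests, because both are built from the same estimate-to-standard-error ratio. A small Python sketch with hypothetical coefficients checks this equivalence; it assumes normal-theory standard errors, and the helper names are illustrative rather than taken from the paper.

```python
from statistics import NormalDist

def ci_excludes_zero(estimate, se, alpha=0.05):
    """True if the (1 - alpha) Wald interval estimate +/- z*se does not contain 0."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    lower, upper = estimate - z_crit * se, estimate + z_crit * se
    return lower > 0 or upper < 0

def two_sided_p(estimate, se):
    """Two-sided Wald p-value for H0: coefficient = 0."""
    z = estimate / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical (estimate, standard error) pairs for three coefficients
coefs = {"x1": (0.80, 0.30), "x2": (0.10, 0.20), "x3": (-0.45, 0.15)}
for name, (b, se) in coefs.items():
    print(name, "CI excludes 0:", ci_excludes_zero(b, se), "p =", round(two_sided_p(b, se), 4))
```

For each coefficient, "CI excludes 0" agrees with "p < 0.05", which is why a plot centered on zero can be read as a significance display.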


Visualization strategies for regression estimates with randomization inference

The Stata Journal Promoting communications on statistics and Stata ◽

10.1177/1536867x20930999 ◽

2020 ◽

Vol 20 (2) ◽

pp. 309-335

Author(s):

Marshall A. Taylor

Keyword(s):

Confidence Interval ◽

Confidence Intervals ◽

Regression Models ◽

Statistical Significance ◽

Permutation Tests ◽

P Value ◽

P Values ◽

Alpha Level ◽

Significance Levels ◽

Nonprobability Sample

Coefficient plots are a popular tool for visualizing regression estimates. The appeal of these plots is that they visualize confidence intervals around the estimates and generally center the plot around zero, meaning that any estimate whose confidence interval crosses zero is statistically nonsignificant at least at the alpha level around which the confidence intervals are constructed. For models whose statistical significance levels are determined via randomization models of inference, and for which there is no standard error or confidence interval for the estimate itself, these plots appear less useful. In this article, I illustrate a variant of the coefficient plot for regression models with p-values constructed using permutation tests. These visualizations plot each estimate’s p-value and its associated confidence interval in relation to a specified alpha level. These plots can help the analyst interpret and report both the statistical and the substantive significance of their models. I illustrate using a nonprobability sample of activists and participants at a 1962 anticommunism school.
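Without a standard error, a permutation test still yields a p-value, and because that p-value is estimated from a finite number of random permutations it carries Monte Carlo uncertainty that a confidence interval can summarize. The following is a minimal Python sketch of that idea for a simple regression slope, assuming the test statistic is the absolute OLS slope and using a Wilson score interval for the p-value; the function names, the data, and the choice of the Wilson interval are assumptions for illustration, not the Stata implementation described in the article.

```python
import random
from statistics import NormalDist

def ols_slope(x, y):
    """Least-squares slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def wilson_ci(successes, trials, level=0.95):
    """Wilson score interval for a binomial proportion (here, the permutation p-value)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    p = successes / trials
    denom = 1 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half = z * (p * (1 - p) / trials + z * z / (4 * trials ** 2)) ** 0.5 / denom
    return max(0.0, center - half), min(1.0, center + half)

def permutation_pvalue(x, y, reps=999, seed=1962):
    """Two-sided permutation p-value for the slope: the share of random
    re-orderings of y whose |slope| is at least the observed |slope|."""
    rng = random.Random(seed)
    observed = abs(ols_slope(x, y))
    y_perm = list(y)
    hits = 0
    for _ in range(reps):
        rng.shuffle(y_perm)
        if abs(ols_slope(x, y_perm)) >= observed:
            hits += 1
    return hits / reps, wilson_ci(hits, reps)

# Hypothetical data with a strong positive slope
x = list(range(20))
y = [2 * v + (1 if v % 2 else -1) for v in x]
p_hat, (lo, hi) = permutation_pvalue(x, y)
print(f"permutation p = {p_hat:.3f}, 95% CI for p = ({lo:.4f}, {hi:.4f})")
```

Comparing the whole interval with the chosen alpha level shows whether Monte Carlo error could change the significance call; more permutations narrow the interval.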


Abstract MP11: Circulating Plasma Biomarkers Associated With Brain Arteriovenous Malformations

Stroke ◽

10.1161/str.52.suppl_1.mp11 ◽

2021 ◽

Vol 52 (Suppl_1) ◽

Author(s):

Sarah E Wetzel-Strong ◽

Shantel M Weinsheimer ◽

Jeffrey Nelson ◽

Ludmila Pawlikowska ◽

Dewi Clark ◽

...

Keyword(s):

Multiple Testing ◽

Statistical Significance ◽

Protein Profiling ◽

P Value ◽

P Values ◽

Plasma Biomarkers ◽

Standard Curve ◽

Disease States ◽

Heparin Plasma ◽

Circulating Levels

Objective: Circulating plasma protein profiling may aid in the identification of cerebrovascular disease signatures. This study aimed to identify circulating angiogenic and inflammatory proteins that may serve as biomarkers to differentiate patients with sporadic brain arteriovenous malformation (bAVM) from patients with other conditions involving brain AVMs, including hereditary hemorrhagic telangiectasia (HHT).
Methods: The Quantibody Human Angiogenesis Array 1000 (Raybiotech), an ELISA multiplex panel, was used to assess the levels of 60 proteins related to angiogenesis and inflammation in heparin plasma samples from 13 patients with sporadic unruptured bAVM (69% male, mean age 51 years) and 37 patients with HHT (40% male, mean age 47 years; n = 19 (51%) with bAVM). The Quantibody Q-Analyzer tool was used to calculate biomarker concentrations from the standard curve for each marker, and log-transformed marker levels were evaluated for associations with disease state using a multivariable interval regression model adjusted for age, sex, ethnicity and collection site. Statistical significance was based on a Bonferroni correction for multiple testing of 60 biomarkers (P < 8.3 × 10⁻⁴).
Results: Circulating levels of two plasma proteins differed significantly between sporadic bAVM and HHT patients: PDGF-BB (P = 2.6 × 10⁻⁴, PI = 3.37, 95% CI 1.76-6.46) and CCL5 (P = 6.0 × 10⁻⁶, PI = 3.50, 95% CI 2.04-6.03). When considering markers with a nominal p-value below 0.01, MMP1 and angiostatin levels also differed between patients with sporadic bAVM and HHT. Markers with nominal p-values below 0.05 in the comparison of sporadic bAVM and HHT patients also included angiostatin, IL2, VEGF, GRO, CXCL16, ITAC, and TGFB3. Among HHT patients, circulating levels of UPAR and IL6 were elevated in those with documented bAVMs when considering markers with nominal p-values below 0.05.
Conclusions: This study identified differential expression of two promising plasma biomarkers that differentiate sporadic bAVMs from patients with HHT. Furthermore, this study allowed us to evaluate markers that are associated with the presence of bAVMs in HHT patients, which may offer insight into mechanisms underlying bAVM pathophysiology.
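The Bonferroni threshold quoted in the abstract is 0.05 divided by the number of tests. A minimal Python sketch, using the two p-values reported for PDGF-BB and CCL5 plus one hypothetical marker at a nominal p of 0.005, shows which values clear the corrected threshold; the helper name is illustrative.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test threshold that keeps the family-wise error rate at or below alpha."""
    return alpha / n_tests

threshold = bonferroni_threshold(0.05, 60)  # 60 biomarkers -> about 8.3e-4

# P-values reported in the abstract for sporadic bAVM vs HHT
reported = {"PDGF-BB": 2.6e-4, "CCL5": 6.0e-6}
# Add a hypothetical marker that is only nominally significant (p < 0.01)
reported_and_hypothetical = {**reported, "hypothetical marker": 0.005}

for marker, p in reported_and_hypothetical.items():
    verdict = "passes" if p < threshold else "does not pass"
    print(f"{marker}: P = {p:.1e} {verdict} the threshold {threshold:.2e}")
```

A marker with a nominal p of 0.005 is well below 0.05 and below 0.01, yet still about six times larger than the corrected threshold, which is why such markers are reported only as nominal findings.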


The Conundrum of P-Values: Statistical Significance is Unavoidable but Need Medical Significance Too

Journal of Biostatistics and Epidemiology ◽

10.18502/jbe.v5i4.3862 ◽

2020 ◽

Author(s):

Abhaya Indrayan

Keyword(s):

Type I Error ◽

Dominant Role ◽

Statistical Significance ◽

Empirical Studies ◽

P Value ◽

Selective Reporting ◽

Type I ◽

Practical Application ◽

P Values ◽

Zero Effect

Background: Small P-values have conventionally been considered evidence for rejecting a null hypothesis in empirical studies. However, P-values now face widespread criticism, and the threshold used for statistical significance has been questioned.
Methods: This communication presents a contrarian view and explains why the P-value and its threshold are still useful for ruling out sampling fluctuation as a source of the findings.
Results: The problem lies not with P-values themselves but with their misuse, abuse, and overuse, including the dominant role they have assumed in reporting empirical results. False results may arise mostly from errors in design, invalid data, inadequate analysis, inappropriate interpretation, accumulation of Type I error, and selective reporting, rather than from P-values per se.
Conclusion: A P-value threshold such as 0.05 for statistical significance is helpful for making a binary inference in the practical application of a result. However, a lower threshold could be adopted to reduce the chance of false results. The emphasis should also be on detecting a medically significant effect rather than merely rejecting a zero effect.
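The accumulation of Type I error can be made concrete: if k independent tests are each run at level alpha and every null hypothesis is true, the probability of at least one false positive is 1 - (1 - alpha)^k. The Python sketch below assumes independent tests; `familywise_error_rate` is a hypothetical helper, and 0.005 is used only as an example of a lower threshold.

```python
def familywise_error_rate(alpha, k):
    """Probability of at least one false positive among k independent tests,
    each run at level alpha, when every null hypothesis is true."""
    return 1 - (1 - alpha) ** k

for alpha in (0.05, 0.005):
    for k in (1, 5, 20):
        print(f"alpha={alpha}, k={k}: FWER={familywise_error_rate(alpha, k):.3f}")
```

With 20 tests, the chance of at least one false positive is about 64% at alpha = 0.05 but under 10% at alpha = 0.005.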


P values: from suggestion to superstition

Journal of Investigative Medicine ◽

10.1136/jim-2016-000206 ◽

2016 ◽

Vol 64 (7) ◽

pp. 1166-1171 ◽

Cited By ~ 13

Author(s):

John Concato ◽

John A Hartigan

Keyword(s):

Clinical Research ◽

Statistical Significance ◽

Historical Context ◽

P Value ◽

Threshold Probability ◽

Conceptual Approach ◽

P Values ◽

Clinical Investigations ◽

Current Usage ◽

Genomic Studies

A threshold probability value of ‘p≤0.05’ is commonly used in clinical investigations to indicate statistical significance. To help clinicians better understand evidence generated by research studies, this review defines the p value, summarizes the historical origins of the p value approach to hypothesis testing, describes various applications of p≤0.05 in the context of clinical research and discusses the emergence of p≤5×10⁻⁸ and other values as thresholds for genomic statistical analyses. Related issues include the conceptual approach of evaluating whether data are inconsistent with a null hypothesis (i.e., no exposure–outcome association). Importantly, in the historical context in which p≤0.05 was first proposed, the 1-in-20 chance of a false-positive inference (i.e., falsely concluding that an exposure–outcome association exists) was offered only as a suggestion. In current usage, however, p≤0.05 is often misunderstood as a rigid threshold, sometimes with a misguided ‘win’ (p≤0.05) or ‘lose’ (p>0.05) approach. Also, in contemporary genomic studies, a threshold of p≤10⁻⁸ has been endorsed as a boundary for statistical significance when analyzing numerous genetic comparisons for each participant. A value of p≤0.05, or any other threshold, should not be employed reflexively to determine whether a clinical research investigation is trustworthy from a scientific perspective. Rather, and in parallel with conceptual issues of validity and generalizability, quantitative results should be interpreted using a combined assessment of strength of association, p values, CIs, and sample size.
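The 1-in-20 figure is the long-run rate of p ≤ 0.05 when the null hypothesis is true; it is not the probability that a particular significant result is a false positive. A small Python simulation makes this concrete, assuming two groups of 20 observations drawn from the same standard normal population and a two-sample z-test with known variance; all settings, including the seed, are hypothetical.

```python
import random
from statistics import NormalDist, fmean

def simulate_null_rejection_rate(n_per_group=20, n_sims=20000, alpha=0.05, seed=2016):
    """Fraction of two-sample z-tests with p <= alpha when both groups come
    from the same N(0, 1) population, i.e. when the null hypothesis is true."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    se = (2 / n_per_group) ** 0.5  # known sigma = 1 in both groups
    rejections = 0
    for _ in range(n_sims):
        a = [rng.gauss(0, 1) for _ in range(n_per_group)]
        b = [rng.gauss(0, 1) for _ in range(n_per_group)]
        z = (fmean(a) - fmean(b)) / se
        p = 2 * (1 - std_normal.cdf(abs(z)))
        if p <= alpha:
            rejections += 1
    return rejections / n_sims

rate = simulate_null_rejection_rate()
print(f"share of null experiments with p <= 0.05: {rate:.3f}")
```

The rate should come out close to 0.05. By the same reasoning, a Bonferroni correction over about one million independent tests gives 0.05 / 10⁶ = 5 × 10⁻⁸, one common rationale for the genome-wide threshold.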


StaTips Part VIII: Confidence interval for the sample mean

South European Journal of Orthodontics and Dentofacial Research ◽

10.5937/sejodr7-27200 ◽

2020 ◽

Vol 7 (1) ◽

pp. 2-3

Author(s):

Giuseppe Perinetti

Keyword(s):

Confidence Interval ◽

Small Group ◽

Statistical Significance ◽

P Value ◽

Sample Mean ◽

Type Population ◽

The Difference ◽

Conducting Research

When conducting research on a given type of patient, it is generally impossible to examine all existing subjects of that type (the population) to derive the true mean of the parameter of interest. More realistically, by investigating a small group of subjects (a sample) from the whole population, researchers can estimate an interval that is likely, at a stated confidence level, to contain the true population mean. In statistics, such an interval is referred to as a confidence interval (CI). Calculating the CI from a sample mean is simple and gives important information, not only regarding the true mean of the population, but also on the statistical significance of the difference between the groups being compared. For these reasons, reporting CIs is preferred over reporting the p value alone.
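A minimal Python sketch of the calculation: the CI is the sample mean plus or minus a t critical value times the standard error s/√n. To stay within the standard library, the t quantile is passed in rather than computed (2.262 is the 0.975 quantile of the t distribution with 9 degrees of freedom, appropriate for a 95% CI with n = 10); the helper name and the measurements are hypothetical.

```python
import math
import statistics

def mean_confidence_interval(sample, t_crit):
    """Two-sided CI for the population mean: mean +/- t_crit * s / sqrt(n).
    t_crit must be the t quantile for n - 1 degrees of freedom."""
    n = len(sample)
    mean = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)  # stdev uses the n - 1 denominator
    return mean - t_crit * se, mean + t_crit * se

# Hypothetical measurements (e.g. in mm) from 10 patients
sample = [2.1, 2.5, 1.9, 2.8, 2.2, 2.4, 2.0, 2.6, 2.3, 2.7]
low, high = mean_confidence_interval(sample, t_crit=2.262)
print(f"mean = {statistics.mean(sample):.2f}, 95% CI = ({low:.2f}, {high:.2f})")
```

For larger samples the t quantile approaches the normal value of about 1.96; for small samples like this one, using 1.96 would make the interval too narrow.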


P543 Cardiac resynchronization therapy in left ventricular non-compaction: long-term results in a series of 40 patients

EP Europace ◽

10.1093/europace/euaa162.155 ◽

2020 ◽

Vol 22 (Supplement_1) ◽

Author(s):

D A Radu ◽

C N Iorgulescu ◽

S N Bogdan ◽

A I Deaconu ◽

A Badiul ◽

...

Keyword(s):

Statistical Significance ◽

Systolic Dysfunction ◽

Serum Levels ◽

Left Ventricular ◽

Long Term Results ◽

P Value ◽

P Values ◽

Mean Differences

Background: Left ventricular non-compaction (LVNC) is a structural cardiomyopathy (SC) with a high probability of LV systolic dysfunction. Left bundle branch block (LBBB) frequently occurs in SCs.
Purpose: We sought to analyse the evolution of LVNC-CRT (LC) patients in general and to compare it with that of the non-LVNC-CRT group (nLC).
Methods: We analysed 40 patients with contrast-MRI-documented LVNC (concomitant positive Petersen and Jacquier criteria) implanted with CRT devices in CEHB. Follow-up included 7 hospital visits for each patient (between baseline and 3 years). Demographics, risk factors, usual serum levels, pre-procedural planning factors, and clinical, ECG, TTE and biochemical markers were recorded. Statistical analysis was performed using software. Notable differences were reported either as p-values from crosstabs (discrete variables) or as mean differences, p-values and confidence intervals from t-tests (continuous variables). A p-value threshold of .05 was chosen for statistical significance (SS).
Results: LC patients were younger (-7.52 years; <.001; (-11.440; -3.617)), showed no sex predominance, were more often obese (45.9 vs. 28.3%; <0.24) and had less ischaemic disease (17.9 vs. 39.7%; <.007). LC implants were usually CRT-Ds (91 vs. 49.5%; <.001) and were more frequently MPP-ready (35.8 vs. 8.4%; <.001). At baseline, sinus rhythm was predominant in LC (97.4 vs. 79.8%; <.007) and permitted frequent use of optimal fusion CRT (75.5 vs. 46.6%; <.002). Although initial LVEFs were similar, LCs had much larger EDVs (+48.91 ml; <.020; (+7.705; +90.124)) and ESVs (+34.91 ml; <.05; (+1.657; +71.478)). After an encouraging course over approximately the first year, the performance of the LC group deteriorated sharply in terms of both LVEF and volumes. Thus, at 1-year follow-up, compared with nLCs, LCs had far lower LVEFs (-22.02%; <.001; (-32.29; -11.76)) and much higher EDVs (+70.8 ml; <.037; (+49.27; +189.65)) and ESVs (+100.13 ml; <.039; (+5.25; +195)), despite similarly corrected dyssynchrony. The mean mitral regurgitation (MR) degree at 1 year was much higher in LCs (+1.8 classes; <.002; (+0.69; +2.97)), which certainly contributed to the poor results. The cumulative super-responder/responder (SR/R) rates were consistently lower in LCs and declined over time in that group: 37.5 vs. 72.4% at 1 year (<.040) and 10.1 vs. 80% at 2 years of follow-up (NS).
Conclusions: CRT candidates with LVNC have significantly more severe disease at the time of implant. After an initial short-term improvement (probably due to acute correction of dyssynchrony), most patients fail to respond in the long term. Severe dilation with marked secondary MR probably plays an important role.
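Statistical software often displays very small p-values as .000 after rounding; because a p-value cannot be exactly zero, such values are conventionally reported as p < .001. A minimal Python helper (hypothetical, following the common APA-style convention of omitting the leading zero) shows the formatting rule.

```python
def format_p(p):
    """Format a p-value without a leading zero, reporting values below .001
    as '<.001' instead of the misleading '.000'."""
    if not 0 <= p <= 1:
        raise ValueError("p must be between 0 and 1")
    if p < 0.001:
        return "<.001"
    text = f"{p:.3f}"
    # Drop the leading zero ("0.037" -> ".037"); keep "1.000" as is
    return text[1:] if text.startswith("0") else text

for p in (0.0002, 0.037, 0.2):
    print(p, "->", format_p(p))
```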


Can texture analysis of pre-immunotherapy CT imaging predict clinical outcomes for patients with advanced NSCLC treated with Nivolumab?

Journal of Clinical Oncology ◽

10.1200/jco.2019.37.15_suppl.e20720 ◽

2019 ◽

Vol 37 (15_suppl) ◽

pp. e20720-e20720 ◽

Cited By ~ 1

Author(s):

Benjamin Oren Spieler ◽

Diana Saravia ◽

Gilberto Lopes ◽

Gregory Azzam ◽

Deukwoo Kwon ◽

...

Keyword(s):

Texture Analysis ◽

Clinical Outcomes ◽

Multiple Testing ◽

Advanced Nsclc ◽

Statistical Significance ◽

Texture Features ◽

Patient Characteristics ◽

Ct Imaging ◽

P Value ◽

P Values

Background: Targeted therapies are ineffective in most NSCLC patients, and response rates remain < 20% for patients with advanced NSCLC on immuno-monotherapy. Predictive models that distinguish responders from non-responders to immunotherapy could help guide clinical practice. Texture analysis is a data-mining tool used to identify intensity patterns in diagnostic imaging. We hypothesized that texture features on pre-immunotherapy CT imaging are associated with clinical outcomes for patients with advanced NSCLC treated with Nivolumab.
Methods: From an IRB-approved database of 159 patients with advanced NSCLC treated with Nivolumab monotherapy, the 20 patients with the longest overall survival (OS) and the 20 with the shortest were selected for retrospective analysis. Patient characteristics were compared using paired t-tests. The last pre-immunotherapy PET/CT for each patient was transferred to MIM software for segmentation. All FDG-avid intrathoracic tumors were delineated on the CT scan per RTOG contouring guidelines. Ninety-two texture features within each tumor were analyzed for association with the primary endpoint, OS. OS time was dichotomized as less than 1 year vs. more than 1 year. A univariate logistic regression model was used to estimate the odds ratio (OR), 95% confidence interval and p-value for each feature. Multiple testing adjustments were performed using the false discovery rate.
Results: Eleven of the 92 texture features showed a significant association with OS time (p-values from 0.009 to 0.044), of which 7 exhibited a large effect (OR < 0.5 or > 1.5). Fifteen additional texture features trended toward statistical significance, with p-values from .05 to .10. In all, 26 of the 92 texture features showed a significant association or a trend toward significance with duration of OS.
Conclusions: This preliminary study suggests that texture features on pre-immunotherapy CT imaging may help in predicting OS duration for patients with advanced NSCLC treated with Nivolumab monotherapy. We are in the process of validating a multivariate predictive model. Future directions include expansion of this study across the full database, survival analyses and correlation of texture features with tissue biology.
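The abstract mentions false discovery rate adjustment across 92 texture features. One standard way to do this is the Benjamini-Hochberg procedure; the abstract does not say which FDR method was used, so treat this Python sketch as an illustration with hypothetical p-values, not a reconstruction of the study's analysis.

```python
def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values (step-up FDR procedure).

    Each sorted p-value p_(i) is scaled by m / i, then a running minimum is
    taken from the largest p-value downwards so adjusted values stay monotone.
    """
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw p-values for four features
raw = [0.01, 0.04, 0.03, 0.005]
adj = bh_adjust(raw)
print([round(a, 3) for a in adj])
print("significant at FDR 5%:", [i for i, a in enumerate(adj) if a <= 0.05])
```

Unlike Bonferroni, which controls the chance of any false positive, this procedure controls the expected proportion of false positives among the features declared significant, so it is less strict when many features are tested.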


H-Tuple Approach to Evaluate Statistical Significance of Biological Sequence Comparison with Gaps

Statistical Applications in Genetics and Molecular Biology ◽

10.2202/1544-6115.1272 ◽

2007 ◽

Vol 6 (1) ◽

Cited By ~ 1

Author(s):

Afshin Fayyaz Movaghar ◽

Sabine Mercier ◽

Louis Ferré

Keyword(s):

Sequence Comparison ◽

Numerical Experiments ◽

Statistical Significance ◽

P Value ◽

Biological Sequence ◽

P Values ◽

Local Score ◽

Approximate Distribution ◽

Biological Sequence Comparison ◽

New Scoring

We propose an approximate distribution for the gapped local score of a two-sequence comparison. Our method is based on combining an adapted scoring scheme that includes gaps with an approximate distribution of the ungapped local score of two independent sequences of i.i.d. random variables. The new scoring scheme is defined on h-tuples of the sequences, using the gapped global score. The influence of h and the accuracy of the p-value are studied numerically and compared with the p-values obtained from BLAST. The numerical experiments show that our approximate p-values outperform those of BLAST, particularly for short sequences, both simulated and real.
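For background, here is a small Python sketch of two ungapped ingredients this line of work builds on; it is not the authors' h-tuple method. `local_score` computes the local score (maximal segment sum) of a score sequence via the Lindley recursion, and `karlin_altschul_pvalue` evaluates the classical Karlin-Altschul approximation P(S ≥ s) ≈ 1 - exp(-K·m·n·e^(-λs)) used by BLAST for ungapped alignments. The values λ = 0.3 and K = 0.1 are arbitrary placeholders; in practice they depend on the scoring scheme and residue frequencies.

```python
import math

def local_score(scores):
    """Maximal segment score (ungapped local score) of a score sequence,
    computed with the Lindley recursion U_k = max(0, U_{k-1} + X_k)."""
    best = current = 0
    for x in scores:
        current = max(0, current + x)
        best = max(best, current)
    return best

def karlin_altschul_pvalue(s, m, n, lam, K):
    """Approximate P(local score >= s) for random sequences of lengths m and n."""
    expected_hits = K * m * n * math.exp(-lam * s)  # the E-value
    return 1 - math.exp(-expected_hits)

print(local_score([1, -2, 3, -1, 2, -5, 1]))
# Placeholder parameters (lam=0.3, K=0.1) for two sequences of length 100
for s in (30, 50):
    print(s, karlin_altschul_pvalue(s, 100, 100, lam=0.3, K=0.1))
```

Higher scores give smaller p-values, and for short sequences the asymptotic approximation is least reliable, which is the regime where the abstract reports the largest gains.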
