An observational analysis of the trope "A p-value of less-than 0.05 was considered statistically significant" and other cut-and-paste statistical methods

Appropriate descriptions of statistical methods are essential for evaluating research quality and reproducibility. Despite continued efforts to improve reporting in publications, inadequate descriptions of statistical methods persist. At times, reading statistical methods sections can conjure feelings of déjà vu, with content resembling cut-and-pasted or "boilerplate" text from already published work. We analyzed text extracted from published statistical methods sections to evaluate the amount of recycled text. Topic modeling was applied to analyze data from 111,731 papers published in PLOS ONE and 9,632 studies from the Australian and New Zealand Clinical Trials Registry (ANZCTR). PLOS ONE topics emphasized definitions of statistical significance, software and descriptive statistics. One in three PLOS ONE papers contained at least one sentence that was a direct copy from another paper. In 12,498 papers (11%), text closely matched the sentence "a p-value < 0.05 was considered statistically significant". Common topics across ANZCTR studies differentiated between study designs and analysis methods, with matching text found in approximately 3% of records. Our findings quantify a serious problem affecting the reporting of statistical methods and shed light on perceptions about the communication of statistics as part of the scientific process. Results further emphasize the importance of rigorous statistical review to ensure that adequate descriptions of methods are prioritized over relatively minor details such as p-values and software when reporting research outcomes.
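
As a rough illustration of how a sentence can be flagged as a close match to the boilerplate quoted above, the sketch below compares token sets with Jaccard similarity. This is not the matching procedure used in the paper, which the abstract does not describe; the function names and the 0.8 cut-off are hypothetical choices made for the example.

```python
import re

# Reference sentence quoted in the abstract above.
BOILERPLATE = "A p-value < 0.05 was considered statistically significant"

def tokens(sentence: str) -> frozenset:
    """Lowercase the sentence and return its set of word, number and symbol tokens."""
    raw = re.findall(r"[a-z0-9.<>=]+", sentence.lower())
    # Drop sentence-final periods but keep decimals such as 0.05 intact.
    return frozenset(t.rstrip(".") for t in raw if t.rstrip("."))

def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of the two token sets (1.0 = identical sets)."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)

def closely_matches(sentence: str, threshold: float = 0.8) -> bool:
    """Hypothetical rule: flag a sentence whose similarity reaches the threshold."""
    return jaccard(sentence, BOILERPLATE) >= threshold

if __name__ == "__main__":
    for s in ["A P-value < 0.05 was considered statistically significant.",
              "Data were analysed using Stata version 14."]:
        print(f"{jaccard(s, BOILERPLATE):.2f}  {s}")
```

A token-set measure ignores word order, so it is only a crude screen; a real analysis would need a more careful definition of "closely matched".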


Abstract MP11: Circulating Plasma Biomarkers Associated With Brain Arteriovenous Malformations

Stroke ◽

10.1161/str.52.suppl_1.mp11 ◽

2021 ◽

Vol 52 (Suppl_1) ◽

Author(s):

Sarah E Wetzel-Strong ◽

Shantel M Weinsheimer ◽

Jeffrey Nelson ◽

Ludmila Pawlikowska ◽

Dewi Clark ◽

...

Keyword(s):

Multiple Testing ◽

Statistical Significance ◽

Protein Profiling ◽

P Value ◽

P Values ◽

Plasma Biomarkers ◽

Standard Curve ◽

Disease States ◽

Heparin Plasma ◽

Circulating Levels

Objective: Circulating plasma protein profiling may aid in the identification of cerebrovascular disease signatures. This study aimed to identify circulating angiogenic and inflammatory biomarkers that may differentiate sporadic brain arteriovenous malformation (bAVM) patients from patients with other conditions involving brain AVMs, including hereditary hemorrhagic telangiectasia (HHT). Methods: The Quantibody Human Angiogenesis Array 1000 (Raybiotech), an ELISA multiplex panel, was used to assess the levels of 60 proteins related to angiogenesis and inflammation in heparin plasma samples from 13 sporadic unruptured bAVM patients (69% male, mean age 51 years) and 37 patients with HHT (40% male, mean age 47 years, n=19 (51%) with bAVM). The Quantibody Q-Analyzer tool was used to calculate biomarker concentrations based on the standard curve for each marker, and log-transformed marker levels were evaluated for associations between disease states using a multivariable interval regression model adjusted for age, sex, ethnicity and collection site. Statistical significance was based on Bonferroni correction for multiple testing of 60 biomarkers (P < 8.3×10^-4). Results: Circulating levels of two plasma proteins differed significantly between sporadic bAVM and HHT patients: PDGF-BB (P = 2.6×10^-4, PI = 3.37, 95% CI: 1.76-6.46) and CCL5 (P = 6.0×10^-6, PI = 3.50, 95% CI: 2.04-6.03). When considering markers with a nominal p-value of less than 0.01, MMP1 and angiostatin levels also differed between patients with sporadic bAVM and HHT. Markers with nominal p-values less than 0.05 when comparing sporadic brain AVM and HHT patients also included angiostatin, IL2, VEGF, GRO, CXCL16, ITAC, and TGFB3. Among HHT patients, the circulating levels of UPAR and IL6 were elevated in patients with documented bAVMs when considering markers with nominal p-values less than 0.05.
Conclusions: This study identified differential expression of two promising plasma biomarkers that differentiate sporadic bAVMs from patients with HHT. Furthermore, this study allowed us to evaluate markers that are associated with the presence of bAVMs in HHT patients, which may offer insight into mechanisms underlying bAVM pathophysiology.
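
The Bonferroni threshold quoted in the Methods can be reproduced with a one-line calculation. The sketch below assumes a family-wise alpha of 0.05, which the abstract does not state explicitly but which is consistent with the reported cut-off for 60 biomarkers; the function name is illustrative.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests

# Assumption: overall alpha of 0.05, spread across the 60 biomarkers tested.
threshold = bonferroni_threshold(0.05, 60)

# P-values reported in the Results for the two significant markers.
reported = {"PDGF-BB": 2.6e-4, "CCL5": 6.0e-6}
significant = {name: p < threshold for name, p in reported.items()}

if __name__ == "__main__":
    print(f"threshold = {threshold:.2e}")
    for name, flag in significant.items():
        print(name, "significant" if flag else "not significant")
```

Markers described as having "nominal" p-values below 0.01 or 0.05 would fall above this corrected threshold, which is why the abstract reports them separately.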


Making Decisions with Data: Understanding Hypothesis Testing & Statistical Significance

The American Biology Teacher ◽

10.1525/abt.2019.81.8.535 ◽

2019 ◽

Vol 81 (8) ◽

pp. 535-542

Author(s):

Robert A. Cooper

Keyword(s):

Hypothesis Testing ◽

Statistical Methods ◽

Scientific Literacy ◽

Statistical Significance ◽

Statistical Hypothesis ◽

Statistical Hypothesis Testing ◽

P Values ◽

Using Data ◽

Science Classes ◽

Practice Of Science

Statistical methods are indispensable to the practice of science. But statistical hypothesis testing can seem daunting, with P-values, null hypotheses, and the concept of statistical significance. This article explains the concepts associated with statistical hypothesis testing using the story of “the lady tasting tea,” then walks the reader through an application of the independent-samples t-test using data from Peter and Rosemary Grant's investigations of Darwin's finches. Understanding how scientists use statistics is an important component of scientific literacy, and students should have opportunities to use statistical methods like this in their science classes.
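
To make the "lady tasting tea" example concrete, the sketch below computes the chance of correct identifications by pure guessing. It assumes the classic design attributed to Fisher (eight cups, four prepared milk-first, and the taster must pick the four milk-first cups); the article itself may present the story with different details.

```python
from math import comb

def p_at_least(correct: int, cups: int = 8, milk_first: int = 4) -> float:
    """Probability of picking at least `correct` true milk-first cups by guessing.

    The taster selects `milk_first` cups out of `cups`; under the null
    hypothesis every selection is equally likely (hypergeometric model).
    """
    total = comb(cups, milk_first)
    favourable = sum(
        comb(milk_first, k) * comb(cups - milk_first, milk_first - k)
        for k in range(correct, milk_first + 1)
    )
    return favourable / total

if __name__ == "__main__":
    print(f"all 4 correct:      p = {p_at_least(4):.4f}")
    print(f"at least 3 correct: p = {p_at_least(3):.4f}")
```

Under these assumptions only a perfect score gives a p-value below 0.05 (1 in 70), while three or more correct cups would occur by chance about a quarter of the time.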


The Conundrum of P-Values: Statistical Significance is Unavoidable but Need Medical Significance Too

Journal of Biostatistics and Epidemiology ◽

10.18502/jbe.v5i4.3862 ◽

2020 ◽

Author(s):

Abhaya Indrayan

Keyword(s):

Type I Error ◽

Dominant Role ◽

Statistical Significance ◽

Empirical Studies ◽

P Value ◽

Selective Reporting ◽

Type I ◽

Practical Application ◽

P Values ◽

Zero Effect

Background: Small P-values have been conventionally considered as evidence to reject a null hypothesis in empirical studies. However, there is now widespread criticism of P-values, and the threshold used for statistical significance is questioned. Methods: This communication takes a contrarian view and explains why the P-value and its threshold are still useful for ruling out sampling fluctuation as a source of the findings. Results: The problem is not with P-values themselves but with their misuse, abuse, and over-use, including the dominant role they have assumed in empirical results. False results may be mostly because of errors in design, invalid data, inadequate analysis, inappropriate interpretation, accumulation of Type-I error, and selective reporting, and not because of P-values per se. Conclusion: A threshold of P-values such as 0.05 for statistical significance is helpful in making a binary inference for practical application of the result. However, a lower threshold can be suggested to reduce the chance of false results. Also, the emphasis should be on detecting a medically significant effect and not zero effect.


P values: from suggestion to superstition

Journal of Investigative Medicine ◽

10.1136/jim-2016-000206 ◽

2016 ◽

Vol 64 (7) ◽

pp. 1166-1171 ◽

Cited By ~ 13

Author(s):

John Concato ◽

John A Hartigan

Keyword(s):

Clinical Research ◽

Statistical Significance ◽

Historical Context ◽

P Value ◽

Threshold Probability ◽

Conceptual Approach ◽

P Values ◽

Clinical Investigations ◽

Current Usage ◽

Genomic Studies

A threshold probability value of ‘p≤0.05’ is commonly used in clinical investigations to indicate statistical significance. To allow clinicians to better understand evidence generated by research studies, this review defines the p value, summarizes the historical origins of the p value approach to hypothesis testing, describes various applications of p≤0.05 in the context of clinical research and discusses the emergence of p≤5×10^-8 and other values as thresholds for genomic statistical analyses. Corresponding issues include a conceptual approach of evaluating whether data do not conform to a null hypothesis (ie, no exposure–outcome association). Importantly, and in the historical context of when p≤0.05 was first proposed, the 1-in-20 chance of a false-positive inference (ie, falsely concluding the existence of an exposure–outcome association) was offered only as a suggestion. In current usage, however, p≤0.05 is often misunderstood as a rigid threshold, sometimes with a misguided ‘win’ (p≤0.05) or ‘lose’ (p>0.05) approach. Also, in contemporary genomic studies, a threshold of p≤10^-8 has been endorsed as a boundary for statistical significance when analyzing numerous genetic comparisons for each participant. A value of p≤0.05, or other thresholds, should not be employed reflexively to determine whether a clinical research investigation is trustworthy from a scientific perspective. Rather, and in parallel with conceptual issues of validity and generalizability, quantitative results should be interpreted using a combined assessment of strength of association, p values, CIs, and sample size.
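
The 1-in-20 figure applies to a single test. A short calculation shows why much smaller thresholds are used when many comparisons are made per participant. The sketch assumes independent tests, and it uses one million tests only as an illustrative count under which a Bonferroni-style division of 0.05 gives a per-test threshold of the genome-wide order of magnitude; the review's own rationale may differ.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Chance of at least one false positive among n independent true-null tests."""
    return 1.0 - (1.0 - alpha) ** n_tests

def per_test_threshold(alpha: float, n_tests: int) -> float:
    """Bonferroni-style per-test threshold that keeps the family-wise rate near alpha."""
    return alpha / n_tests

if __name__ == "__main__":
    for m in (1, 20, 100):
        print(f"{m:>4} tests at 0.05 -> P(any false positive) = "
              f"{familywise_error_rate(0.05, m):.3f}")
    # Illustrative assumption: about one million independent comparisons.
    print(f"per-test threshold for 1,000,000 tests: {per_test_threshold(0.05, 1_000_000):.0e}")
```

With 20 independent tests at 0.05, the chance of at least one false-positive inference is already close to two in three.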


Comparison of Six Statistical Methods for Interrupted time Series Studies: Empirical Evaluation of 190 Published Series

10.21203/rs.3.rs-118335/v1 ◽

2020 ◽

Author(s):

Simon Turner ◽

Amalia Karahalios ◽

Andrew Forbes ◽

Monica Taljaard ◽

Jeremy Grimshaw ◽

...

Keyword(s):

Time Series ◽

Statistical Method ◽

Statistical Methods ◽

Statistical Significance ◽

Empirical Evaluation ◽

Interrupted Time Series ◽

Series Data ◽

Standard Errors ◽

P Values ◽

The Impact

Background: The Interrupted Time Series (ITS) is a quasi-experimental design commonly used in public health to evaluate the impact of interventions or exposures. Multiple statistical methods are available to analyse data from ITS studies, but no empirical investigation has examined how the different methods compare when applied to real-world datasets. Methods: A random sample of 200 ITS studies identified in a previous methods review were included. Time series data from each of these studies were sought. Each dataset was re-analysed using six statistical methods. Point and confidence interval estimates for level and slope changes, standard errors, p-values and estimates of autocorrelation were compared between methods. Results: From the 200 ITS studies, including 230 time series, 190 datasets were obtained. We found that the choice of statistical method can importantly affect the level and slope change point estimates, their standard errors, width of confidence intervals and p-values. Statistical significance (categorised at the 5% level) often differed across the pairwise comparisons of methods, ranging from 4% to 25% disagreement. Estimates of autocorrelation differed depending on the method used and the length of the series. Conclusions: The choice of statistical method in ITS studies can lead to substantially different conclusions about the impact of the interruption. Pre-specification of the statistical method is encouraged, and naive conclusions based on statistical significance should be avoided.
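
For readers unfamiliar with "level" and "slope" changes, the sketch below fits a segmented regression by ordinary least squares to a made-up series. The abstract does not list the six methods compared, so plain OLS is an assumption chosen for brevity; it ignores autocorrelation, which is one reason different methods can disagree. All names and the example data are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def segmented_ols(y, interruption):
    """Fit y_t = b0 + b1*t + b2*post_t + b3*post_t*(t - interruption) by OLS.

    Returns [b0, b1, b2, b3]: baseline level, baseline slope,
    level change and slope change at the interruption.
    """
    rows = []
    for t in range(len(y)):
        post = 1.0 if t >= interruption else 0.0
        rows.append([1.0, float(t), post, post * (t - interruption)])
    k = 4
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

if __name__ == "__main__":
    # Made-up noiseless series: level jumps by 3 and slope rises by 0.2 at t = 12.
    series = [10 + 0.5 * t + (3 + 0.2 * (t - 12) if t >= 12 else 0) for t in range(24)]
    print([round(b, 3) for b in segmented_ols(series, 12)])
```

Standard errors, confidence intervals and p-values, which are the quantities the study compares across methods, depend on how the error structure is modelled and are deliberately left out of this sketch.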


Visualization Strategies for Regression Estimates with Randomization Inference

10.31235/osf.io/bsd7g ◽

2019 ◽

Author(s):

Marshall A. Taylor

Keyword(s):

Confidence Interval ◽

Confidence Intervals ◽

Regression Models ◽

Statistical Significance ◽

Permutation Tests ◽

P Value ◽

P Values ◽

Alpha Level ◽

Significance Levels ◽

Nonprobability Sample

Coefficient plots are a popular tool for visualizing regression estimates. The appeal of these plots is that they visualize confidence intervals around the estimates and generally center the plot around zero, meaning that any estimate that crosses zero is statistically non-significant at at least the alpha-level around which the confidence intervals are constructed. For models with statistical significance levels determined via randomization models of inference and for which there is no standard error or confidence intervals for the estimate itself, these plots appear less useful. In this paper, I illustrate a variant of the coefficient plot for regression models with p-values constructed using permutation tests. These visualizations plot each estimate's p-value and its associated confidence interval in relation to a specified alpha-level. These plots can help the analyst interpret and report both the statistical and substantive significance of their models. Illustrations are provided using a nonprobability sample of activists and participants at a 1962 anti-Communism school.
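
A minimal sketch of how a permutation p-value for a regression coefficient can be obtained is given below, assuming a simple one-predictor regression and a two-sided test in which the outcome is shuffled. The paper's models and plotting approach are not reproduced here; the function names, number of permutations and seed are hypothetical.

```python
import random

def slope(x, y):
    """Ordinary least squares slope of y on a single predictor x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx

def permutation_p_value(x, y, n_permutations=999, seed=0):
    """Two-sided permutation p-value for the slope.

    The outcome is shuffled to break any association with x; the p-value is
    the share of shuffles whose absolute slope is at least the observed one,
    with the usual +1 correction so the result is never exactly zero.
    """
    rng = random.Random(seed)
    observed = abs(slope(x, y))
    shuffled = list(y)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if abs(slope(x, shuffled)) >= observed:
            extreme += 1
    return (extreme + 1) / (n_permutations + 1)

if __name__ == "__main__":
    x = list(range(20))
    y = [2 * xi + (-1) ** xi * 0.1 for xi in x]  # made-up, strongly linear data
    p = permutation_p_value(x, y)
    alpha = 0.05
    print(f"p = {p:.3f}, below alpha: {p < alpha}")
```

Because no standard error is produced, the natural quantity to plot against the chosen alpha level is the p-value itself, which is the idea the abstract describes.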


P543 Cardiac resynchronization therapy in left ventricular non-compaction: long-term results in a series of 40 patients

EP Europace ◽

10.1093/europace/euaa162.155 ◽

2020 ◽

Vol 22 (Supplement_1) ◽

Author(s):

D A Radu ◽

C N Iorgulescu ◽

S N Bogdan ◽

A I Deaconu ◽

A Badiul ◽

...

Keyword(s):

Statistical Significance ◽

Systolic Dysfunction ◽

Serum Levels ◽

Left Ventricular ◽

Long Term Results ◽

P Value ◽

P Values ◽

Mean Differences

Background: Left ventricular non-compaction (LVNC) is a structural cardiomyopathy (SC) with a high probability of LV systolic dysfunction. Left bundle branch block (LBBB) frequently occurs in SCs. Purpose: We sought to analyse the evolution of LVNC-CRT (LC) patients in general and compare it with the non-LVNC-CRT group (nLC). Methods: We analysed 40 patients with contrast-MRI documented LVNC (concomitant positive Petersen and Jacquier criteria) implanted with CRT devices in CEHB. The follow-up included 7 hospital visits for each patient (between baseline and 3 years). Demographics, risk factors, usual serum levels, pre-procedural planning factors, clinical, ECG, TTE and biochemical markers were recorded. Statistical analysis was performed using software. Notable differences were reported as either p-values from crosstabs (discrete) or mean differences, p-values and confidence intervals from t-tests (continuous). A p-value of .05 was chosen for statistical significance (SS). Results: Subjects in LC were younger (-7.52 years; <.000; (-3.617;-11.440)), with no sex predominance, more obese (45.9 vs. 28.3%; <0.24) and had less ischaemic disease (17.9 vs. 39.7%; <.007). LC implants were usually CRT-Ds (91 vs. 49.5%; <.000) and more frequently MPP-ready (35.8 vs. 8.4%; <.000). At baseline, sinus rhythm was predominant in LC (97.4 vs. 79.8%; <.007) and permitted frequent use of optimal fusion CRT (75.5 vs. 46.6%; <.002). Although initial LVEFs were similar, LCs had much larger EDVs (+48.91 ml; <.020; (+7.705;+90.124)) and ESVs (+34.91; <.05; (+1.657;+71.478)). After an encouraging initial evolution of about 1 year, the performance of the LC-CRT group deteriorated sharply in terms of both LVEF and volumes. Thus, at 1 year follow-up, when compared to nLCs, LVEFs were far lower (-22.02%; <.000; (-32.29;-11.76)) while EDVs and ESVs were much higher, (+70.8 ml; <.037; (+49.27;+189.65)) and (+100.13; <.039; (+5.25;+195)) respectively, in LCs in spite of similarly corrected dyssynchrony. The mean mitral regurgitation (MR) degree at 1 year was much higher in LCs (+1.8 classes; <.002; (+0.69;+2.97)), certainly contributing to the poor results. The cumulated super-responder/responder (SR/R) rates were constantly lower and decreasing at both 1 year (37.5 vs. 72.4; <.040) and 2 years of follow-up (10.1 vs. 80%; NS). Conclusions: CRT candidates with LVNC have significantly more severe disease at the time of implant. After an initial short-term improvement (probably due to acute correction of dyssynchrony) most patients fail to respond in the long term. Severe dilation with important secondary MR probably plays an important role.


Can texture analysis of pre-immunotherapy CT imaging predict clinical outcomes for patients with advanced NSCLC treated with Nivolumab?

Journal of Clinical Oncology ◽

10.1200/jco.2019.37.15_suppl.e20720 ◽

2019 ◽

Vol 37 (15_suppl) ◽

pp. e20720-e20720 ◽

Cited By ~ 1

Author(s):

Benjamin Oren Spieler ◽

Diana Saravia ◽

Gilberto Lopes ◽

Gregory Azzam ◽

Deukwoo Kwon ◽

...

Keyword(s):

Texture Analysis ◽

Clinical Outcomes ◽

Multiple Testing ◽

Advanced Nsclc ◽

Statistical Significance ◽

Texture Features ◽

Patient Characteristics ◽

Ct Imaging ◽

P Value ◽

P Values

e20720 Background: Targeted therapies are ineffective in most NSCLC patients and response rates remain < 20% for patients with advanced NSCLC on immuno-monotherapy. Predictive models that distinguish responders from non-responders to immunotherapy could help guide clinical practice. Texture analysis is a data-mining tool used to identify intensity patterns in diagnostic imaging. We hypothesized that texture features on pre-immunotherapy CT imaging can be associated with clinical outcomes for patients with advanced NSCLC treated with Nivolumab. Methods: In an IRB-approved database containing 159 patients with advanced NSCLC treated with Nivolumab monotherapy, 20 patients with the longest overall survival (OS) and 20 with the shortest were selected for retrospective analysis. Patient characteristics were compared using paired t-tests. The last pre-immunotherapy PET CT for each patient was transferred to MIM software for segmentation. All FDG-avid intrathoracic tumors were delineated on the CT scan per RTOG contouring guidelines. Ninety-two texture features within each tumor were analyzed for association with the primary endpoint, OS. OS time was dichotomized to less than 1 year vs. more than 1 year. A univariate logistic regression model was used to estimate odds ratio (OR), 95% confidence interval and p-value for each feature. Multiple testing adjustments were performed using false discovery rate. Results: Eleven out of 92 texture features showed significant association with OS time (p-values from 0.009 to 0.044), of which 7 exhibited large effect (OR < 0.5 or > 1.5). Fifteen additional texture features trended toward statistical significance with p-values from .05 to .10. In all, 26 out of the 92 texture features showed significant association or trended toward significance with duration of OS. 
Conclusions: This preliminary study suggests that texture features on pre-immunotherapy CT imaging may help in predicting OS duration for patients with advanced NSCLC treated with Nivolumab monotherapy. We are in the process of validating a multivariate predictive model. Future directions include expansion of this study across the full database, survival analyses and correlation of texture features with tissue biology.
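
The abstract mentions a false discovery rate adjustment without naming the procedure. The sketch below shows the widely used Benjamini-Hochberg step-up adjustment as one plausible choice; it is an assumption, not necessarily what the authors applied, and the example p-values are made up.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order.

    Each sorted p-value p_(k) is scaled by m / k, then a running minimum
    from the largest rank downwards enforces monotonicity.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

if __name__ == "__main__":
    raw = [0.01, 0.04, 0.03, 0.005]  # made-up p-values for four features
    for p, q in zip(raw, benjamini_hochberg(raw)):
        print(f"raw p = {p:.3f} -> adjusted = {q:.3f}")
```

A feature is then declared significant when its adjusted value falls below the chosen false discovery rate, so with many features tested a raw p-value just under 0.05 will often not survive the adjustment.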


H-Tuple Approach to Evaluate Statistical Significance of Biological Sequence Comparison with Gaps

Statistical Applications in Genetics and Molecular Biology ◽

10.2202/1544-6115.1272 ◽

2007 ◽

Vol 6 (1) ◽

Cited By ~ 1

Author(s):

Afshin Fayyaz Movaghar ◽

Sabine Mercier ◽

Louis Ferré

Keyword(s):

Sequence Comparison ◽

Numerical Experiments ◽

Statistical Significance ◽

P Value ◽

Biological Sequence ◽

P Values ◽

Local Score ◽

Approximate Distribution ◽

Biological Sequence Comparison ◽

New Scoring

We propose an approximate distribution for the gapped local score of a two-sequence comparison. Our method combines an adapted scoring scheme that includes the gaps with an approximate distribution of the ungapped local score of two independent sequences of i.i.d. random variables. The new scoring scheme is defined on h-tuples of the sequences, using the gapped global score. The influence of h and the accuracy of the p-value are studied numerically and compared with the p-values obtained from BLAST. The numerical experiments show that our approximate p-values outperform the BLAST ones, particularly for short sequences, both simulated and real.


Falacias sobre el valor p compartidas por profesores y estudiantes universitarios

Universitas Psychologica ◽

10.11144/javeriana.upsy16-3.fvcp ◽

2017 ◽

Vol 16 (3) ◽

pp. 1

Author(s):

Laura Badenes-Ribera ◽

Dolores Frias-Navarro

Keyword(s):

College Students ◽

Statistical Significance ◽

Psychological Research ◽

Practical Significance ◽

P Value ◽

P Values ◽

Statistical Education ◽

The Mean ◽

Estudiantes Universitarios

"Evidence-Based Practice" requires professionals to critically assess the results of psychological research. However, incorrect interpretations of p values are abundant and recurrent. These misconceptions affect professional decisions and compromise the quality of interventions and the accumulation of valid scientific knowledge. Identifying the types of fallacies that underlie statistical decisions is fundamental for approaching and planning statistical education strategies designed to correct these interpretations. Therefore, the aim of this study is to analyze the interpretation of the p value among college students of psychology and academic psychologists. The sample was composed of 161 participants (43 academics and 118 students). The mean number of years as an academic was 16.7 (SD = 10.07). The mean age of the college students was 21.59 years (SD = 1.3). The findings suggest that college students and academics do not know the correct interpretation of p values. The inverse probability fallacy presents major problems of comprehension. In addition, statistical significance is confused with practical or clinical significance. These results highlight the need for statistical education and statistical re-education.
