Accounting for Multiple Comparisons in Statistical Analysis of the Extensive Bioassay Data on Glyphosate

2020 ◽  
Vol 175 (2) ◽  
pp. 156-167 ◽  
Author(s):  
Kenny Crump ◽  
Edmund Crouch ◽  
Daniel Zelterman ◽  
Casey Crump ◽  
Joseph Haseman

Abstract: Glyphosate is a widely used herbicide worldwide. In 2015, the International Agency for Research on Cancer (IARC) reviewed glyphosate cancer bioassays and human studies and declared that the evidence for carcinogenicity of glyphosate is sufficient in experimental animals. We analyzed 10 glyphosate rodent bioassays, including those in which IARC found evidence of carcinogenicity, using a multiresponse permutation procedure that adjusts for the large number of tumors eligible for statistical testing and provides valid false-positive probabilities. The test statistics for these permutation tests are functions of p values from a standard test for dose-response trend applied to each specific type of tumor. We evaluated 3 permutation tests, using as test statistics the smallest p value from a standard statistical test for dose-response trend and the number of such tests for which the p value is less than or equal to .05 or .01. The false-positive probabilities obtained from 2 implementations of these 3 permutation tests are: smallest p value: .26, .17; p values ≤ .05: .08, .12; and p values ≤ .01: .06, .08. In addition, we found more evidence for negative dose-response trends than for positive ones. Thus, we found no strong evidence that glyphosate is an animal carcinogen. The main cause for the discrepancy between IARC’s finding and ours appears to be that IARC did not account for the large number of tumor responses analyzed and the increased likelihood that several of these would show statistical significance simply by chance. This work provides a more comprehensive analysis of the animal carcinogenicity data for this important herbicide than previously available.
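The abstract does not include implementation details; the sketch below (hypothetical bioassay data, a simple normal-approximation Cochran-Armitage trend test) illustrates the kind of multiresponse permutation procedure described: dose labels are shuffled across animals, a trend-test p value is recomputed for every tumor type, and the smallest p value and the counts of p values at or below .05 and .01 are compared with their permutation distributions.

```python
# A minimal sketch (not the authors' code) of a multiresponse permutation test of
# the kind described above. The data set and dose scores below are hypothetical.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

def ca_trend_p(tumor, dose_scores):
    """One-sided Cochran-Armitage trend p value (normal approximation)."""
    scores = np.unique(dose_scores)
    n_g = np.array([np.sum(dose_scores == s) for s in scores])
    r_g = np.array([np.sum(tumor[dose_scores == s]) for s in scores])
    N, R = n_g.sum(), r_g.sum()
    if R == 0 or R == N:                      # no variation -> no evidence of trend
        return 1.0
    p_bar = R / N
    t = np.sum(scores * (r_g - n_g * p_bar))
    var = p_bar * (1 - p_bar) * (np.sum(scores**2 * n_g) - np.sum(scores * n_g)**2 / N)
    return float(norm.sf(t / np.sqrt(var)))  # positive dose-response trend only

def summary_stats(tumor_matrix, dose_scores):
    pvals = np.array([ca_trend_p(t, dose_scores) for t in tumor_matrix.T])
    return pvals.min(), np.sum(pvals <= .05), np.sum(pvals <= .01)

# Hypothetical bioassay: 200 animals, 4 dose groups, 40 eligible tumor types.
n_animals, n_tumors = 200, 40
dose = np.repeat([0, 1, 2, 3], n_animals // 4)              # dose scores per animal
tumors = rng.binomial(1, 0.1, size=(n_animals, n_tumors))   # null (no-effect) data

obs_min_p, obs_n05, obs_n01 = summary_stats(tumors, dose)

n_perm = 1000
perm = np.array([summary_stats(tumors, rng.permutation(dose)) for _ in range(n_perm)])

# False-positive probabilities: how often a random relabelling of dose groups
# looks at least as extreme as the observed data for each summary statistic.
fpp_min_p = np.mean(perm[:, 0] <= obs_min_p)
fpp_n05   = np.mean(perm[:, 1] >= obs_n05)
fpp_n01   = np.mean(perm[:, 2] >= obs_n01)
print(fpp_min_p, fpp_n05, fpp_n01)
```

Because each animal's full tumor profile is kept intact while only the dose labels are permuted, the procedure automatically accounts for the number of tumor types tested and their correlations.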

2019 ◽  
Author(s):  
Marshall A. Taylor

Coefficient plots are a popular tool for visualizing regression estimates. The appeal of these plots is that they visualize confidence intervals around the estimates and generally center the plot around zero, meaning that any estimate that crosses zero is statistically non-significant at least at the alpha level around which the confidence intervals are constructed. For models with statistical significance levels determined via randomization models of inference and for which there is no standard error or confidence interval for the estimate itself, these plots appear less useful. In this paper, I illustrate a variant of the coefficient plot for regression models with p-values constructed using permutation tests. These visualizations plot each estimate's p-value and its associated confidence interval in relation to a specified alpha-level. These plots can help the analyst interpret and report both the statistical and substantive significance of their models. Illustrations are provided using a nonprobability sample of activists and participants at a 1962 anti-Communism school.
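A minimal sketch of the kind of plot described above, assuming hypothetical coefficient names and permutation counts: each estimate's permutation p value (the standard (k+1)/(B+1) estimate) is drawn with an exact Clopper-Pearson Monte Carlo confidence interval and judged against a reference alpha level. This is an illustration only, not the author's implementation.

```python
# Sketch of a permutation p-value "coefficient" plot with Monte Carlo CIs.
# Coefficient labels and counts of extreme permuted estimates are hypothetical.
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import beta

def clopper_pearson(k, n, level=0.95):
    """Exact CI for a permutation p value estimated from k extreme draws out of n."""
    a = 1 - level
    lo = 0.0 if k == 0 else beta.ppf(a / 2, k, n - k + 1)
    hi = 1.0 if k == n else beta.ppf(1 - a / 2, k + 1, n - k)
    return lo, hi

alpha = 0.05
n_perm = 2000
# (coefficient label, number of permuted estimates at least as extreme as observed)
results = [("age", 12), ("education", 310), ("income", 95), ("church attendance", 3)]

labels, pvals, cis = [], [], []
for name, k in results:
    labels.append(name)
    pvals.append((k + 1) / (n_perm + 1))        # standard permutation p value
    cis.append(clopper_pearson(k, n_perm))

y = np.arange(len(labels))
err = np.array([[p - lo, hi - p] for p, (lo, hi) in zip(pvals, cis)]).T
plt.errorbar(pvals, y, xerr=err, fmt="o", capsize=3)
plt.axvline(alpha, linestyle="--", label=f"alpha = {alpha}")
plt.yticks(y, labels)
plt.xlabel("permutation p value")
plt.legend()
plt.tight_layout()
plt.show()
```

An estimate whose entire Monte Carlo interval sits to the left of the alpha line can be reported as significant at that level; an interval straddling the line signals that more permutations are needed before drawing that conclusion.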


Stroke ◽  
2021 ◽  
Vol 52 (Suppl_1) ◽  
Author(s):  
Sarah E Wetzel-Strong ◽  
Shantel M Weinsheimer ◽  
Jeffrey Nelson ◽  
Ludmila Pawlikowska ◽  
Dewi Clark ◽  
...  

Objective: Circulating plasma protein profiling may aid in the identification of cerebrovascular disease signatures. This study aimed to identify circulating angiogenic and inflammatory proteins that may serve as biomarkers to differentiate sporadic brain arteriovenous malformation (bAVM) patients from other conditions with brain AVMs, including hereditary hemorrhagic telangiectasia (HHT) patients. Methods: The Quantibody Human Angiogenesis Array 1000 (RayBiotech) is an ELISA multiplex panel that was used to assess the levels of 60 proteins related to angiogenesis and inflammation in heparin plasma samples from 13 sporadic unruptured bAVM patients (69% male, mean age 51 years) and 37 patients with HHT (40% male, mean age 47 years, n=19 (51%) with bAVM). The Quantibody Q-Analyzer tool was used to calculate biomarker concentrations based on the standard curve for each marker, and log-transformed marker levels were evaluated for associations between disease states using a multivariable interval regression model adjusted for age, sex, ethnicity and collection site. Statistical significance was based on Bonferroni correction for multiple testing of 60 biomarkers (P < 8.3×10−4). Results: Circulating levels of two plasma proteins differed significantly between sporadic bAVM and HHT patients: PDGF-BB (P = 2.6×10−4, PI = 3.37, 95% CI: 1.76-6.46) and CCL5 (P = 6.0×10−6, PI = 3.50, 95% CI: 2.04-6.03). When considering markers with a nominal p-value of less than 0.01, MMP1 and angiostatin levels also differed between patients with sporadic bAVM and HHT. Markers with nominal p-values less than 0.05 when comparing sporadic brain AVM and HHT patients also included angiostatin, IL2, VEGF, GRO, CXCL16, ITAC, and TGFB3. Among HHT patients, the circulating levels of UPAR and IL6 were elevated in patients with documented bAVMs when considering markers with nominal p-values less than 0.05. Conclusions: This study identified two promising plasma biomarkers whose circulating levels differentiate sporadic bAVM patients from patients with HHT. Furthermore, it allowed us to evaluate markers that are associated with the presence of bAVMs in HHT patients, which may offer insight into mechanisms underlying bAVM pathophysiology.
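As a small worked illustration of the Bonferroni rule quoted above (0.05 / 60 ≈ 8.3×10−4), the snippet below checks the two reported p values against that threshold. It is a sketch of the correction rule only, not the study's analysis code.

```python
# Bonferroni significance check for a panel of 60 biomarkers.
# Only the two p values reported in the abstract are used; no other study data.
raw_p = {"PDGF-BB": 2.6e-4, "CCL5": 6.0e-6}

n_tests = 60
threshold = 0.05 / n_tests            # Bonferroni-adjusted significance level (~8.3e-4)

for marker, p in raw_p.items():
    verdict = "significant" if p < threshold else "not significant"
    print(f"{marker}: p = {p:.1e} -> {verdict} at Bonferroni threshold {threshold:.1e}")
```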


2017 ◽  
Vol 20 (3) ◽  
pp. 257-259 ◽  
Author(s):  
Julian Hecker ◽  
Anna Maaser ◽  
Dmitry Prokopenko ◽  
Heide Loehlein Fier ◽  
Christoph Lange

VEGAS (versatile gene-based association study) is a popular methodological framework for performing gene-based tests based on summary statistics from single-variant analyses. The approach incorporates linkage disequilibrium information from reference panels to account for the correlation of test statistics. The framework offers three different types of gene-based tests. In 2015, the improved framework VEGAS2, which uses more detailed reference panels, was published. Both versions provide user-friendly web-based and offline tools for the analysis. However, the implementation of the popular top-percentage test is erroneous in both versions. The p values provided by VEGAS2 are deflated/anti-conservative. Based on real data examples, we demonstrate that this can substantially increase the rate of false-positive findings and can lead to inconsistencies between different test options. We also provide code that allows VEGAS users to compute correct p values.
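The following is a rough sketch of the general simulation-based top-percentage gene test that frameworks like VEGAS implement: correlated null Z scores are drawn from the gene's LD (correlation) matrix, and the gene statistic is the sum of the largest fraction of squared scores. It illustrates the approach with a hypothetical LD matrix; it is neither the VEGAS code nor the correction code the authors provide.

```python
# Simulation-based top-percentage gene test (illustrative only).
import numpy as np

rng = np.random.default_rng(1)

def top_pct_stat(z, pct=0.10):
    """Sum of the largest `pct` fraction of squared single-variant test statistics."""
    chi2 = np.sort(z**2)[::-1]
    k = max(1, int(np.ceil(pct * len(chi2))))
    return chi2[:k].sum()

def gene_p_value(z_obs, ld, pct=0.10, n_sim=20_000):
    """Empirical gene p value from multivariate-normal null draws with LD correlation."""
    obs = top_pct_stat(z_obs, pct)
    null_z = rng.multivariate_normal(np.zeros(len(z_obs)), ld, size=n_sim)
    null_stats = np.array([top_pct_stat(z, pct) for z in null_z])
    return (1 + np.sum(null_stats >= obs)) / (1 + n_sim)

# Hypothetical gene with 20 variants and exchangeable LD of 0.4,
# plus one modestly associated variant in the observed Z scores.
m = 20
ld = np.full((m, m), 0.4) + 0.6 * np.eye(m)
z_obs = rng.multivariate_normal(np.zeros(m), ld) + np.array([2.5] + [0.0] * (m - 1))
print(gene_p_value(z_obs, ld))
```

Because the null distribution is built from the same LD matrix used for the observed scores, the resulting p value is calibrated; errors in how the top-percentage cut is taken (the issue the authors report) would distort exactly this comparison.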


Author(s):  
Abhaya Indrayan

Background: Small P-values have conventionally been considered as evidence to reject a null hypothesis in empirical studies. However, there is now widespread criticism of P-values, and the threshold we use for statistical significance is questioned. Methods: This communication takes a contrarian view and explains why the P-value and its threshold are still useful for ruling out sampling fluctuation as a source of the findings. Results: The problem is not with P-values themselves but with their misuse, abuse, and over-use, including the dominant role they have assumed in empirical results. False results may arise mostly because of errors in design, invalid data, inadequate analysis, inappropriate interpretation, accumulation of Type-I error, and selective reporting, and not because of P-values per se. Conclusion: A threshold of P-values such as 0.05 for statistical significance is helpful in making a binary inference for practical application of the result. However, a lower threshold can be suggested to reduce the chance of false results. Also, the emphasis should be on detecting a medically significant effect and not zero effect.


2015 ◽  
Vol 105 (11) ◽  
pp. 1400-1407 ◽  
Author(s):  
L. V. Madden ◽  
D. A. Shah ◽  
P. D. Esker

The P value (significance level) is possibly the most widely used, and also misused, quantity in data analysis. P has been heavily criticized on philosophical and theoretical grounds, especially from a Bayesian perspective. In contrast, a properly interpreted P has been strongly defended as a measure of evidence against the null hypothesis, H0. We discuss the meaning of P and null-hypothesis statistical testing, and present some key arguments concerning their use. P is the probability of observing data as extreme as, or more extreme than, the data actually observed, conditional on H0 being true. However, P is often mistakenly equated with the posterior probability that H0 is true conditional on the data, which can lead to exaggerated claims about the effect of a treatment, experimental factor or interaction. Fortunately, a lower bound for the posterior probability of H0 can be approximated using P and the prior probability that H0 is true. When one is completely uncertain about the truth of H0 before an experiment (i.e., when the prior probability of H0 is 0.5), the posterior probability of H0 is much higher than P, which means that one needs P values lower than typically accepted for statistical significance (e.g., P = 0.05) for strong evidence against H0. When properly interpreted, we support the continued use of P as one component of a data analysis that emphasizes data visualization and estimation of effect sizes (treatment effects).
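A worked example of the kind of calibration referred to above, using the widely cited Sellke-Bayarri-Berger bound (one common choice; the article may use a different approximation): for a prior probability of H0 of 0.5, a P value is converted into a lower bound on the posterior probability of H0.

```python
# Lower bound on P(H0 | data) from a P value via the calibration
# BF01 >= -e * p * ln(p) (valid for p < 1/e). Illustrative, not the authors' code.
import numpy as np

def posterior_h0_lower_bound(p, prior_h0=0.5):
    if not 0 < p < 1 / np.e:
        raise ValueError("calibration requires 0 < p < 1/e")
    bf01 = -np.e * p * np.log(p)                 # minimum Bayes factor in favour of H0
    prior_odds = prior_h0 / (1 - prior_h0)
    post_odds = prior_odds * bf01
    return post_odds / (1 + post_odds)

for p in (0.05, 0.01, 0.001):
    print(p, round(posterior_h0_lower_bound(p), 3))
# P = 0.05 corresponds to a posterior probability of H0 of at least ~0.29,
# far higher than the P value itself.
```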


2016 ◽  
Vol 64 (7) ◽  
pp. 1166-1171 ◽  
Author(s):  
John Concato ◽  
John A Hartigan

A threshold probability value of ‘p≤0.05’ is commonly used in clinical investigations to indicate statistical significance. To allow clinicians to better understand evidence generated by research studies, this review defines the p value, summarizes the historical origins of the p value approach to hypothesis testing, describes various applications of p≤0.05 in the context of clinical research and discusses the emergence of p≤5×10−8 and other values as thresholds for genomic statistical analyses. Corresponding issues include a conceptual approach of evaluating whether data do not conform to a null hypothesis (ie, no exposure–outcome association). Importantly, and in the historical context of when p≤0.05 was first proposed, the 1-in-20 chance of a false-positive inference (ie, falsely concluding the existence of an exposure–outcome association) was offered only as a suggestion. In current usage, however, p≤0.05 is often misunderstood as a rigid threshold, sometimes with a misguided ‘win’ (p≤0.05) or ‘lose’ (p>0.05) approach. Also, in contemporary genomic studies, a threshold of p≤5×10−8 has been endorsed as a boundary for statistical significance when analyzing numerous genetic comparisons for each participant. A value of p≤0.05, or other thresholds, should not be employed reflexively to determine whether a clinical research investigation is trustworthy from a scientific perspective. Rather, and in parallel with conceptual issues of validity and generalizability, quantitative results should be interpreted using a combined assessment of strength of association, p values, CIs, and sample size.


PLoS ONE ◽  
2021 ◽  
Vol 16 (8) ◽  
pp. e0255784
Author(s):  
Igor Khorozyan

As human pressures on the environment continue to spread and intensify, effective conservation interventions are direly needed to prevent threats, reduce conflicts, and recover populations and landscapes in a liaison between science and conservation. It is practically important to discriminate between true and false (or misperceived) effectiveness of interventions, as false perceptions may shape a wrong conservation agenda and lead to inappropriate decisions and management actions. This study used the false positive risk (FPR) to estimate the rates of misperceived effectiveness of electric fences (overstated if reported as effective but actually ineffective based on FPR; understated otherwise), explain their causes and propose recommendations on how to improve the representation of true effectiveness. Electric fences are widely applied to reduce damage to fenced assets, such as livestock and beehives, or to increase survival of fenced populations. The analysis of 109 cases from 50 publications showed that the effectiveness of electric fences was overstated in at least one-third of cases, from 31.8% at FPR = 0.2 (20% risk) to 51.1% at FPR = 0.05 (5% risk, true effectiveness). In contrast, understatement decreased from 23.8% to 9.5% at these thresholds of FPR. This means that only 48.9% of all cases reported as effective were truly effective, whereas 90.5% of cases reported as ineffective were indeed ineffective, implying that the effectiveness of electric fences was heavily overstated. The main reasons for this bias were the lack of statistical testing or improper reporting of test results (63.3% of cases) and the interpretation of marginally significant results (p < 0.05, p < 0.1 and p around 0.05) as indicators of effectiveness (10.1%). In conclusion, FPR is an important tool for estimating the true effectiveness of conservation interventions, and its application is highly recommended to disentangle true and false effectiveness for planning appropriate conservation actions. Researchers are encouraged to calculate FPR, publish its constituent statistics (especially treatment and control sample sizes) and explicitly provide test results with p values. It is suggested to call the effectiveness “true” if FPR < 0.05, “suggestive” if 0.05 ≤ FPR < 0.2 and “false” if FPR ≥ 0.2.
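For readers unfamiliar with the quantity, the sketch below shows one common, simplified formulation of the false positive risk (in the spirit of Colquhoun-style FPR calculations); the study's own calculation may differ in detail, and the significance level, power and prior used here are illustrative only.

```python
# Simplified "p-less-than" false positive risk: the probability that a result
# declared significant reflects no real effect. Inputs are illustrative.
def false_positive_risk(alpha=0.05, power=0.8, prior_real=0.5):
    """P(intervention truly ineffective | test significant at level alpha)."""
    sig_if_no_effect = alpha * (1 - prior_real)   # expected false positives
    sig_if_effect = power * prior_real            # expected true positives
    return sig_if_no_effect / (sig_if_no_effect + sig_if_effect)

# With decent power and even prior odds, roughly 6% of 'significant' results are
# false; with low power and a sceptical prior the risk rises sharply.
print(round(false_positive_risk(0.05, 0.8, 0.5), 3))   # ~0.059
print(round(false_positive_risk(0.05, 0.3, 0.25), 3))  # ~0.333
```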


2018 ◽  
Author(s):  
Diana Domanska ◽  
Chakravarthi Kanduri ◽  
Boris Simovski ◽  
Geir Kjetil Sandve

Abstract Background: The difficulties associated with sequencing and assembling some regions of the DNA sequence result in gaps in the reference genomes that are typically represented as stretches of Ns. Although the presence of assembly gaps causes a slight reduction in the mapping rate in many experimental settings, that does not invalidate the typical statistical testing comparing read count distributions across experimental conditions. However, we hypothesize that not handling assembly gaps in the null model may confound statistical testing of co-localization of genomic features. Results: First, we performed a series of explorative analyses to understand whether and how the public genomic tracks intersect the assembly gaps track (hg19). The findings confirm that the genomic regions in public genomic tracks intersect very little with assembly gaps, and that the intersections observed are confined to the beginning and end regions of the assembly gaps rather than spanning whole gaps. Further, we simulated a set of query and reference genomic tracks in a way that nullified any dependence between them, to test our hypothesis that not avoiding assembly gaps in the null model would result in spurious inflation of statistical significance. We then contrasted the distributions of test statistics and p-values of Monte Carlo simulation-based permutation tests that either avoided or did not avoid assembly gaps in the null model when testing for significant co-localization between a pair of query and reference tracks. We observed that the statistical tests that did not account for the assembly gaps in the null model resulted in a distribution of the test statistic that is shifted to the right and a distribution of p-values that is shifted to the left (leading to inflated significance). Conclusion: Our results show that not accounting for assembly gaps in statistical testing of co-localization analysis may lead to false positives and over-optimistic findings.
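The sketch below reproduces the gist of the experiment described above on a toy, single-chromosome genome: the same Monte Carlo co-localization test is run with a null model that either ignores or avoids assembly gaps. The coordinates are hypothetical and the code is a simplified illustration, not the authors' analysis pipeline.

```python
# Monte Carlo co-localization test with and without gap-aware null models.
import numpy as np

rng = np.random.default_rng(2)
GENOME = 1_000_000
GAPS = [(200_000, 300_000), (600_000, 650_000)]        # assembly gaps (stretches of Ns)
ALLOWED = [(0, 200_000), (300_000, 600_000), (650_000, 1_000_000)]

def sample_starts(n, length, regions):
    """Place n intervals of a given length uniformly within the allowed regions."""
    sizes = np.array([e - s - length for s, e in regions])
    probs = sizes / sizes.sum()
    idx = rng.choice(len(regions), size=n, p=probs)
    return np.array([rng.integers(regions[i][0], regions[i][1] - length) for i in idx])

def n_overlaps(starts_a, starts_b, length):
    """Number of B intervals overlapping at least one A interval."""
    a = np.sort(starts_a)
    return sum(np.any((a < s + length) & (a + length > s)) for s in starts_b)

length, n_query, n_ref = 1_000, 200, 200
# Real annotations never fall inside gaps, so both tracks live in non-gap DNA,
# and the query is placed independently of the reference (true null).
ref = sample_starts(n_ref, length, ALLOWED)
query = sample_starts(n_query, length, ALLOWED)
obs = n_overlaps(ref, query, length)

def mc_p(regions, n_sim=1_000):
    null = [n_overlaps(ref, sample_starts(n_query, length, regions), length)
            for _ in range(n_sim)]
    return (1 + sum(s >= obs for s in null)) / (1 + n_sim)

print("null ignoring gaps :", mc_p([(0, GENOME)]))  # deflated p value (inflated significance)
print("null avoiding gaps :", mc_p(ALLOWED))        # calibrated p value
```

Randomized query intervals that are allowed to fall into gaps overlap the reference less often than the real data can, which shifts the null distribution down and the p value spuriously toward significance, exactly as the abstract reports.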


EP Europace ◽  
2020 ◽  
Vol 22 (Supplement_1) ◽  
Author(s):  
D A Radu ◽  
C N Iorgulescu ◽  
S N Bogdan ◽  
A I Deaconu ◽  
A Badiul ◽  
...  

Abstract Background: Left ventricular non-compaction (LVNC) is a structural cardiomyopathy (SC) with a high probability of LV systolic dysfunction. Left bundle branch block (LBBB) frequently occurs in SCs. Purpose: We sought to analyse the evolution of LVNC-CRT (LC) patients in general and compare it with the non-LVNC-CRT group (nLC). Methods: We analysed 40 patients with contrast-MRI documented LVNC (concomitant positive Petersen and Jacquier criteria) implanted with CRT devices in CEHB. The follow-up included 7 hospital visits for each patient (between baseline and 3 years). Demographics, risk factors, usual serum levels, pre-procedural planning factors, clinical, ECG, TTE and biochemical markers were recorded. Statistical analysis was performed using statistical software. Notable differences were reported as either p-values from crosstabs (discrete variables) or mean differences, p-values and confidence intervals from t-tests (continuous variables). A p-value of .05 was chosen for statistical significance (SS). Results: Subjects in LC were younger (-7.52 years; <.000; (-3.617;-11.440)), with no sex predominance, more obese (45.9 vs. 28.3%; <0.24) and had less ischaemic disease (17.9 vs. 39.7%; <.007). LC implants were usually CRT-Ds (91 vs. 49.5%; <.000) and more frequently MPP-ready (35.8 vs. 8.4%; <.000). At baseline, sinus rhythm was predominant in LC (97.4 vs. 79.8%; <.007) and permitted frequent use of optimal fusion CRT (75.5 vs. 46.6%; <.002). Although initial LVEFs were similar, LCs had much larger EDVs (+48.91 ml; <.020; (+7.705;+90.124)) and ESVs (+34.91; <.05; (+1.657;+71.478)). After an initially encouraging ~1-year evolution, the performance of the LC-CRT group deteriorated markedly in terms of both LVEF and volumes. Thus, at 1-year follow-up, when compared to nLCs, LVEFs were far lower (-22.02%; <.000; (-32.29;-11.76)) while EDVs and ESVs were much higher ((+70.8 ml; <.037; (+49.27;+189.65)) and (+100.13; <.039; (+5.25;+195)), respectively) in LCs, in spite of similarly corrected dyssynchrony. The mean mitral regurgitation (MR) degree at 1 year was much higher in LCs (+1.8 classes; <.002; (+0.69;+2.97)), certainly contributing to the poor results. The cumulative super-responder/responder (SR/R) rates were consistently lower and decreasing at both 1 year (37.5 vs. 72.4; <.040) and 2 years of follow-up (10.1 vs. 80%; NS). Conclusions: CRT candidates with LVNC are significantly more severe at the time of implant. After an initial short-term improvement (probably due to acute correction of dyssynchrony), most patients fail to respond in the long term. Severe dilation with important secondary MR probably plays an important role.
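The two kinds of comparison reported above (crosstab p values for discrete variables; mean differences with confidence intervals from t-tests for continuous ones) can be sketched as follows. The counts and distributions are hypothetical, not the CEHB registry data.

```python
# A minimal sketch of the reported analysis style: a chi-square test on a crosstab
# for a discrete variable and a Welch t test with a hand-computed 95% CI for a
# continuous one. All data below are made up for illustration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Discrete variable: ischaemic disease (yes/no) by group (LVNC-CRT vs non-LVNC-CRT).
crosstab = np.array([[7, 33],       # LVNC-CRT: 7 of 40 with ischaemic disease
                     [90, 137]])    # non-LVNC-CRT (hypothetical counts)
chi2, p_discrete, dof, expected = stats.chi2_contingency(crosstab)

# Continuous variable: end-diastolic volume (ml) in the two groups (simulated).
edv_lc = rng.normal(280, 60, 40)
edv_nlc = rng.normal(230, 55, 220)
t_stat, p_cont = stats.ttest_ind(edv_lc, edv_nlc, equal_var=False)

# Welch 95% confidence interval for the mean difference.
v1, v2 = edv_lc.var(ddof=1) / len(edv_lc), edv_nlc.var(ddof=1) / len(edv_nlc)
diff = edv_lc.mean() - edv_nlc.mean()
df = (v1 + v2) ** 2 / (v1 ** 2 / (len(edv_lc) - 1) + v2 ** 2 / (len(edv_nlc) - 1))
half_width = stats.t.ppf(0.975, df) * np.sqrt(v1 + v2)

print(f"ischaemic disease: chi-square p = {p_discrete:.3f}")
print(f"EDV: mean difference = {diff:.1f} ml, p = {p_cont:.3f}, "
      f"95% CI ({diff - half_width:.1f}; {diff + half_width:.1f})")
```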

