Accounting for Multiple Comparisons in Statistical Analysis of the Extensive Bioassay Data on Glyphosate

2020 ◽  
Vol 175 (2) ◽  
pp. 156-167 ◽  
Author(s):  
Kenny Crump ◽  
Edmund Crouch ◽  
Daniel Zelterman ◽  
Casey Crump ◽  
Joseph Haseman

Abstract: Glyphosate is a widely used herbicide worldwide. In 2015, the International Agency for Research on Cancer (IARC) reviewed glyphosate cancer bioassays and human studies and declared that the evidence for carcinogenicity of glyphosate is sufficient in experimental animals. We analyzed 10 glyphosate rodent bioassays, including those in which IARC found evidence of carcinogenicity, using a multiresponse permutation procedure that adjusts for the large number of tumors eligible for statistical testing and provides valid false-positive probabilities. The test statistics for these permutation tests are functions of p values from a standard test for dose-response trend applied to each specific type of tumor. We evaluated 3 permutation tests, using as test statistics the smallest p value from a standard statistical test for dose-response trend and the number of such tests for which the p value is less than or equal to .05 or .01. The false-positive probabilities obtained from 2 implementations of these 3 permutation tests are: smallest p value: .26, .17; p values ≤ .05: .08, .12; and p values ≤ .01: .06, .08. In addition, we found more evidence for negative dose-response trends than for positive ones. Thus, we found no strong evidence that glyphosate is an animal carcinogen. The main cause for the discrepancy between IARC’s finding and ours appears to be that IARC did not account for the large number of tumor responses analyzed and the increased likelihood that several of these would show statistical significance simply by chance. This work provides a more comprehensive analysis of the animal carcinogenicity data for this important herbicide than previously available.
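The abstract does not include implementation details; the sketch below (hypothetical bioassay data, a simple normal-approximation Cochran-Armitage trend test) illustrates the kind of multiresponse permutation procedure described: dose labels are shuffled across animals, a trend-test p value is recomputed for every tumor type, and the smallest p value and the counts of p values at or below .05 and .01 are compared with their permutation distributions.

```python
# A minimal sketch (not the authors' code) of a multiresponse permutation test of
# the kind described above. The data set and dose scores below are hypothetical.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

def ca_trend_p(tumor, dose_scores):
    """One-sided Cochran-Armitage trend p value (normal approximation)."""
    scores = np.unique(dose_scores)
    n_g = np.array([np.sum(dose_scores == s) for s in scores])
    r_g = np.array([np.sum(tumor[dose_scores == s]) for s in scores])
    N, R = n_g.sum(), r_g.sum()
    if R == 0 or R == N:                      # no variation -> no evidence of trend
        return 1.0
    p_bar = R / N
    t = np.sum(scores * (r_g - n_g * p_bar))
    var = p_bar * (1 - p_bar) * (np.sum(scores**2 * n_g) - np.sum(scores * n_g)**2 / N)
    return float(norm.sf(t / np.sqrt(var)))  # positive dose-response trend only

def summary_stats(tumor_matrix, dose_scores):
    pvals = np.array([ca_trend_p(t, dose_scores) for t in tumor_matrix.T])
    return pvals.min(), np.sum(pvals <= .05), np.sum(pvals <= .01)

# Hypothetical bioassay: 200 animals, 4 dose groups, 40 eligible tumor types.
n_animals, n_tumors = 200, 40
dose = np.repeat([0, 1, 2, 3], n_animals // 4)              # dose scores per animal
tumors = rng.binomial(1, 0.1, size=(n_animals, n_tumors))   # null (no-effect) data

obs_min_p, obs_n05, obs_n01 = summary_stats(tumors, dose)

n_perm = 1000
perm = np.array([summary_stats(tumors, rng.permutation(dose)) for _ in range(n_perm)])

# False-positive probabilities: how often a random relabelling of dose groups
# looks at least as extreme as the observed data for each summary statistic.
fpp_min_p = np.mean(perm[:, 0] <= obs_min_p)
fpp_n05   = np.mean(perm[:, 1] >= obs_n05)
fpp_n01   = np.mean(perm[:, 2] >= obs_n01)
print(fpp_min_p, fpp_n05, fpp_n01)
```

Because each animal's full tumor profile is kept intact while only the dose labels are permuted, the procedure automatically accounts for the number of tumor types tested and their correlations.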

2019 ◽  
Author(s):  
Marshall A. Taylor

Coefficient plots are a popular tool for visualizing regression estimates. The appeal of these plots is that they visualize confidence intervals around the estimates and generally center the plot around zero, meaning that any estimate that crosses zero is statistically non-significant at least at the alpha level around which the confidence intervals are constructed. For models with statistical significance levels determined via randomization models of inference and for which there is no standard error or confidence interval for the estimate itself, these plots appear less useful. In this paper, I illustrate a variant of the coefficient plot for regression models with p-values constructed using permutation tests. These visualizations plot each estimate's p-value and its associated confidence interval in relation to a specified alpha-level. These plots can help the analyst interpret and report both the statistical and substantive significance of their models. Illustrations are provided using a nonprobability sample of activists and participants at a 1962 anti-Communism school.
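A minimal sketch of the kind of plot described above, assuming hypothetical coefficient names and permutation counts: each estimate's permutation p value (the standard (k+1)/(B+1) estimate) is drawn with an exact Clopper-Pearson Monte Carlo confidence interval and judged against a reference alpha level. This is an illustration only, not the author's implementation.

```python
# Sketch of a permutation p-value "coefficient" plot with Monte Carlo CIs.
# Coefficient labels and counts of extreme permuted estimates are hypothetical.
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import beta

def clopper_pearson(k, n, level=0.95):
    """Exact CI for a permutation p value estimated from k extreme draws out of n."""
    a = 1 - level
    lo = 0.0 if k == 0 else beta.ppf(a / 2, k, n - k + 1)
    hi = 1.0 if k == n else beta.ppf(1 - a / 2, k + 1, n - k)
    return lo, hi

alpha = 0.05
n_perm = 2000
# (coefficient label, number of permuted estimates at least as extreme as observed)
results = [("age", 12), ("education", 310), ("income", 95), ("church attendance", 3)]

labels, pvals, cis = [], [], []
for name, k in results:
    labels.append(name)
    pvals.append((k + 1) / (n_perm + 1))        # standard permutation p value
    cis.append(clopper_pearson(k, n_perm))

y = np.arange(len(labels))
err = np.array([[p - lo, hi - p] for p, (lo, hi) in zip(pvals, cis)]).T
plt.errorbar(pvals, y, xerr=err, fmt="o", capsize=3)
plt.axvline(alpha, linestyle="--", label=f"alpha = {alpha}")
plt.yticks(y, labels)
plt.xlabel("permutation p value")
plt.legend()
plt.tight_layout()
plt.show()
```

An estimate whose entire Monte Carlo interval sits to the left of the alpha line can be reported as significant at that level; an interval straddling the line signals that more permutations are needed before drawing that conclusion.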


Stroke ◽  
2021 ◽  
Vol 52 (Suppl_1) ◽  
Author(s):  
Sarah E Wetzel-Strong ◽  
Shantel M Weinsheimer ◽  
Jeffrey Nelson ◽  
Ludmila Pawlikowska ◽  
Dewi Clark ◽  
...  

Objective: Circulating plasma protein profiling may aid in the identification of cerebrovascular disease signatures. This study aimed to identify circulating angiogenic and inflammatory proteins that may serve as biomarkers to differentiate sporadic brain arteriovenous malformation (bAVM) patients from other conditions with brain AVMs, including hereditary hemorrhagic telangiectasia (HHT) patients. Methods: The Quantibody Human Angiogenesis Array 1000 (RayBiotech) is an ELISA multiplex panel that was used to assess the levels of 60 proteins related to angiogenesis and inflammation in heparin plasma samples from 13 sporadic unruptured bAVM patients (69% male, mean age 51 years) and 37 patients with HHT (40% male, mean age 47 years, n=19 (51%) with bAVM). The Quantibody Q-Analyzer tool was used to calculate biomarker concentrations based on the standard curve for each marker, and log-transformed marker levels were evaluated for associations between disease states using a multivariable interval regression model adjusted for age, sex, ethnicity and collection site. Statistical significance was based on Bonferroni correction for multiple testing of 60 biomarkers (P < 8.3×10−4). Results: Circulating levels of two plasma proteins differed significantly between sporadic bAVM and HHT patients: PDGF-BB (P = 2.6×10−4, PI = 3.37, 95% CI: 1.76-6.46) and CCL5 (P = 6.0×10−6, PI = 3.50, 95% CI: 2.04-6.03). When considering markers with a nominal p-value of less than 0.01, MMP1 and angiostatin levels also differed between patients with sporadic bAVM and HHT. Markers with nominal p-values less than 0.05 when comparing sporadic brain AVM and HHT patients also included angiostatin, IL2, VEGF, GRO, CXCL16, ITAC, and TGFB3. Among HHT patients, the circulating levels of UPAR and IL6 were elevated in patients with documented bAVMs when considering markers with nominal p-values less than 0.05. Conclusions: This study identified two promising plasma biomarkers whose circulating levels differentiate sporadic bAVM patients from patients with HHT. Furthermore, it allowed us to evaluate markers that are associated with the presence of bAVMs in HHT patients, which may offer insight into mechanisms underlying bAVM pathophysiology.
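As a small worked illustration of the Bonferroni rule quoted above (0.05 / 60 ≈ 8.3×10−4), the snippet below checks the two reported p values against that threshold. It is a sketch of the correction rule only, not the study's analysis code.

```python
# Bonferroni significance check for a panel of 60 biomarkers.
# Only the two p values reported in the abstract are used; no other study data.
raw_p = {"PDGF-BB": 2.6e-4, "CCL5": 6.0e-6}

n_tests = 60
threshold = 0.05 / n_tests            # Bonferroni-adjusted significance level (~8.3e-4)

for marker, p in raw_p.items():
    verdict = "significant" if p < threshold else "not significant"
    print(f"{marker}: p = {p:.1e} -> {verdict} at Bonferroni threshold {threshold:.1e}")
```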


2017 ◽  
Vol 20 (3) ◽  
pp. 257-259 ◽  
Author(s):  
Julian Hecker ◽  
Anna Maaser ◽  
Dmitry Prokopenko ◽  
Heide Loehlein Fier ◽  
Christoph Lange

VEGAS (versatile gene-based association study) is a popular methodological framework for performing gene-based tests based on summary statistics from single-variant analyses. The approach incorporates linkage disequilibrium information from reference panels to account for the correlation of test statistics. The framework offers three different types of gene-based tests. In 2015, the improved framework VEGAS2, which uses more detailed reference panels, was published. Both versions provide user-friendly web-based and offline tools for the analysis. However, the implementation of the popular top-percentage test is erroneous in both versions. The p values provided by VEGAS2 are deflated/anti-conservative. Based on real data examples, we demonstrate that this can substantially increase the rate of false-positive findings and can lead to inconsistencies between different test options. We also provide code that allows VEGAS users to compute correct p values.
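The following is a rough sketch of the general simulation-based top-percentage gene test that frameworks like VEGAS implement: correlated null Z scores are drawn from the gene's LD (correlation) matrix, and the gene statistic is the sum of the largest fraction of squared scores. It illustrates the approach with a hypothetical LD matrix; it is neither the VEGAS code nor the correction code the authors provide.

```python
# Simulation-based top-percentage gene test (illustrative only).
import numpy as np

rng = np.random.default_rng(1)

def top_pct_stat(z, pct=0.10):
    """Sum of the largest `pct` fraction of squared single-variant test statistics."""
    chi2 = np.sort(z**2)[::-1]
    k = max(1, int(np.ceil(pct * len(chi2))))
    return chi2[:k].sum()

def gene_p_value(z_obs, ld, pct=0.10, n_sim=20_000):
    """Empirical gene p value from multivariate-normal null draws with LD correlation."""
    obs = top_pct_stat(z_obs, pct)
    null_z = rng.multivariate_normal(np.zeros(len(z_obs)), ld, size=n_sim)
    null_stats = np.array([top_pct_stat(z, pct) for z in null_z])
    return (1 + np.sum(null_stats >= obs)) / (1 + n_sim)

# Hypothetical gene with 20 variants and exchangeable LD of 0.4,
# plus one modestly associated variant in the observed Z scores.
m = 20
ld = np.full((m, m), 0.4) + 0.6 * np.eye(m)
z_obs = rng.multivariate_normal(np.zeros(m), ld) + np.array([2.5] + [0.0] * (m - 1))
print(gene_p_value(z_obs, ld))
```

Because the null distribution is built from the same LD matrix used for the observed scores, the resulting p value is calibrated; errors in how the top-percentage cut is taken (the issue the authors report) would distort exactly this comparison.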


Author(s):  
Abhaya Indrayan

Background: Small P-values have conventionally been considered as evidence to reject a null hypothesis in empirical studies. However, there is now widespread criticism of P-values, and the threshold we use for statistical significance is questioned. Methods: This communication takes a contrarian view and explains why the P-value and its threshold are still useful for ruling out sampling fluctuation as a source of the findings. Results: The problem is not with P-values themselves but with their misuse, abuse, and over-use, including the dominant role they have assumed in empirical results. False results may arise mostly because of errors in design, invalid data, inadequate analysis, inappropriate interpretation, accumulation of Type-I error, and selective reporting, and not because of P-values per se. Conclusion: A threshold of P-values such as 0.05 for statistical significance is helpful in making a binary inference for practical application of the result. However, a lower threshold can be suggested to reduce the chance of false results. Also, the emphasis should be on detecting a medically significant effect and not zero effect.


2015 ◽  
Vol 105 (11) ◽  
pp. 1400-1407 ◽  
Author(s):  
L. V. Madden ◽  
D. A. Shah ◽  
P. D. Esker

The P value (significance level) is possibly the most widely used, and also misused, quantity in data analysis. P has been heavily criticized on philosophical and theoretical grounds, especially from a Bayesian perspective. In contrast, a properly interpreted P has been strongly defended as a measure of evidence against the null hypothesis, H0. We discuss the meaning of P and null-hypothesis statistical testing, and present some key arguments concerning their use. P is the probability of observing data as extreme as, or more extreme than, the data actually observed, conditional on H0 being true. However, P is often mistakenly equated with the posterior probability that H0 is true conditional on the data, which can lead to exaggerated claims about the effect of a treatment, experimental factor or interaction. Fortunately, a lower bound for the posterior probability of H0 can be approximated using P and the prior probability that H0 is true. When one is completely uncertain about the truth of H0 before an experiment (i.e., when the prior probability of H0 is 0.5), the posterior probability of H0 is much higher than P, which means that one needs P values lower than typically accepted for statistical significance (e.g., P = 0.05) for strong evidence against H0. When properly interpreted, we support the continued use of P as one component of a data analysis that emphasizes data visualization and estimation of effect sizes (treatment effects).
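A worked example of the kind of calibration referred to above, using the widely cited Sellke-Bayarri-Berger bound (one common choice; the article may use a different approximation): for a prior probability of H0 of 0.5, a P value is converted into a lower bound on the posterior probability of H0.

```python
# Lower bound on P(H0 | data) from a P value via the calibration
# BF01 >= -e * p * ln(p) (valid for p < 1/e). Illustrative, not the authors' code.
import numpy as np

def posterior_h0_lower_bound(p, prior_h0=0.5):
    if not 0 < p < 1 / np.e:
        raise ValueError("calibration requires 0 < p < 1/e")
    bf01 = -np.e * p * np.log(p)                 # minimum Bayes factor in favour of H0
    prior_odds = prior_h0 / (1 - prior_h0)
    post_odds = prior_odds * bf01
    return post_odds / (1 + post_odds)

for p in (0.05, 0.01, 0.001):
    print(p, round(posterior_h0_lower_bound(p), 3))
# P = 0.05 corresponds to a posterior probability of H0 of at least ~0.29,
# far higher than the P value itself.
```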


2016 ◽  
Vol 64 (7) ◽  
pp. 1166-1171 ◽  
Author(s):  
John Concato ◽  
John A Hartigan

A threshold probability value of ‘p≤0.05’ is commonly used in clinical investigations to indicate statistical significance. To allow clinicians to better understand evidence generated by research studies, this review defines the p value, summarizes the historical origins of the p value approach to hypothesis testing, describes various applications of p≤0.05 in the context of clinical research and discusses the emergence of p≤5×10−8 and other values as thresholds for genomic statistical analyses. Corresponding issues include a conceptual approach of evaluating whether data do not conform to a null hypothesis (ie, no exposure–outcome association). Importantly, and in the historical context of when p≤0.05 was first proposed, the 1-in-20 chance of a false-positive inference (ie, falsely concluding the existence of an exposure–outcome association) was offered only as a suggestion. In current usage, however, p≤0.05 is often misunderstood as a rigid threshold, sometimes with a misguided ‘win’ (p≤0.05) or ‘lose’ (p>0.05) approach. Also, in contemporary genomic studies, a threshold of p≤5×10−8 has been endorsed as a boundary for statistical significance when analyzing numerous genetic comparisons for each participant. A value of p≤0.05, or other thresholds, should not be employed reflexively to determine whether a clinical research investigation is trustworthy from a scientific perspective. Rather, and in parallel with conceptual issues of validity and generalizability, quantitative results should be interpreted using a combined assessment of strength of association, p values, CIs, and sample size.


PLoS ONE ◽  
2021 ◽  
Vol 16 (8) ◽  
pp. e0255784
Author(s):  
Igor Khorozyan

As human pressures on the environment continue to spread and intensify, effective conservation interventions are direly needed to prevent threats, reduce conflicts, and recover populations and landscapes in a liaison between science and conservation. It is practically important to discriminate between true and false (or misperceived) effectiveness of interventions, as false perceptions may shape a wrong conservation agenda and lead to inappropriate decisions and management actions. This study used the false positive risk (FPR) to estimate the rates of misperceived effectiveness of electric fences (overstated if reported as effective but actually ineffective based on FPR; understated otherwise), explain their causes and propose recommendations on how to improve the representation of true effectiveness. Electric fences are widely applied to reduce damage to fenced assets, such as livestock and beehives, or to increase survival of fenced populations. The analysis of 109 cases from 50 publications showed that the effectiveness of electric fences was overstated in at least one-third of cases, from 31.8% at FPR = 0.2 (20% risk) to 51.1% at FPR = 0.05 (5% risk, true effectiveness). In contrast, understatement decreased from 23.8% to 9.5% at these thresholds of FPR. This means that only 48.9% of all cases reported as effective were truly effective, whereas 90.5% of cases reported as ineffective were indeed ineffective, implying that the effectiveness of electric fences was heavily overstated. The main reasons for this bias were the lack of statistical testing or improper reporting of test results (63.3% of cases) and the interpretation of marginally significant results (p < 0.05, p < 0.1 and p around 0.05) as indicators of effectiveness (10.1%). In conclusion, FPR is an important tool for estimating the true effectiveness of conservation interventions, and its application is highly recommended to disentangle true and false effectiveness for planning appropriate conservation actions. Researchers are encouraged to calculate FPR, publish its constituent statistics (especially treatment and control sample sizes) and explicitly provide test results with p values. It is suggested to call the effectiveness “true” if FPR < 0.05, “suggestive” if 0.05 ≤ FPR < 0.2 and “false” if FPR ≥ 0.2.
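For readers unfamiliar with the quantity, the sketch below shows one common, simplified formulation of the false positive risk (in the spirit of Colquhoun-style FPR calculations); the study's own calculation may differ in detail, and the significance level, power and prior used here are illustrative only.

```python
# Simplified "p-less-than" false positive risk: the probability that a result
# declared significant reflects no real effect. Inputs are illustrative.
def false_positive_risk(alpha=0.05, power=0.8, prior_real=0.5):
    """P(intervention truly ineffective | test significant at level alpha)."""
    sig_if_no_effect = alpha * (1 - prior_real)   # expected false positives
    sig_if_effect = power * prior_real            # expected true positives
    return sig_if_no_effect / (sig_if_no_effect + sig_if_effect)

# With decent power and even prior odds, roughly 6% of 'significant' results are
# false; with low power and a sceptical prior the risk rises sharply.
print(round(false_positive_risk(0.05, 0.8, 0.5), 3))   # ~0.059
print(round(false_positive_risk(0.05, 0.3, 0.25), 3))  # ~0.333
```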


2018 ◽  
Author(s):  
Diana Domanska ◽  
Chakravarthi Kanduri ◽  
Boris Simovski ◽  
Geir Kjetil Sandve

Abstract Background: The difficulties associated with sequencing and assembling some regions of the DNA sequence result in gaps in the reference genomes that are typically represented as stretches of Ns. Although the presence of assembly gaps causes a slight reduction in the mapping rate in many experimental settings, that does not invalidate the typical statistical testing comparing read count distributions across experimental conditions. However, we hypothesize that not handling assembly gaps in the null model may confound statistical testing of co-localization of genomic features. Results: First, we performed a series of explorative analyses to understand whether and how the public genomic tracks intersect the assembly gaps track (hg19). The findings confirm that the genomic regions in public genomic tracks intersect very little with assembly gaps, and that the intersections observed are confined to the beginning and end regions of the assembly gaps rather than spanning whole gaps. Further, we simulated a set of query and reference genomic tracks in a way that nullified any dependence between them, to test our hypothesis that not avoiding assembly gaps in the null model would result in spurious inflation of statistical significance. We then contrasted the distributions of test statistics and p-values of Monte Carlo simulation-based permutation tests that either avoided or did not avoid assembly gaps in the null model when testing for significant co-localization between a pair of query and reference tracks. We observed that the statistical tests that did not account for the assembly gaps in the null model resulted in a distribution of the test statistic that is shifted to the right and a distribution of p-values that is shifted to the left (leading to inflated significance). Conclusion: Our results show that not accounting for assembly gaps in statistical testing of co-localization analysis may lead to false positives and over-optimistic findings.
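The sketch below reproduces the gist of the experiment described above on a toy, single-chromosome genome: the same Monte Carlo co-localization test is run with a null model that either ignores or avoids assembly gaps. The coordinates are hypothetical and the code is a simplified illustration, not the authors' analysis pipeline.

```python
# Monte Carlo co-localization test with and without gap-aware null models.
import numpy as np

rng = np.random.default_rng(2)
GENOME = 1_000_000
GAPS = [(200_000, 300_000), (600_000, 650_000)]        # assembly gaps (stretches of Ns)
ALLOWED = [(0, 200_000), (300_000, 600_000), (650_000, 1_000_000)]

def sample_starts(n, length, regions):
    """Place n intervals of a given length uniformly within the allowed regions."""
    sizes = np.array([e - s - length for s, e in regions])
    probs = sizes / sizes.sum()
    idx = rng.choice(len(regions), size=n, p=probs)
    return np.array([rng.integers(regions[i][0], regions[i][1] - length) for i in idx])

def n_overlaps(starts_a, starts_b, length):
    """Number of B intervals overlapping at least one A interval."""
    a = np.sort(starts_a)
    return sum(np.any((a < s + length) & (a + length > s)) for s in starts_b)

length, n_query, n_ref = 1_000, 200, 200
# Real annotations never fall inside gaps, so both tracks live in non-gap DNA,
# and the query is placed independently of the reference (true null).
ref = sample_starts(n_ref, length, ALLOWED)
query = sample_starts(n_query, length, ALLOWED)
obs = n_overlaps(ref, query, length)

def mc_p(regions, n_sim=1_000):
    null = [n_overlaps(ref, sample_starts(n_query, length, regions), length)
            for _ in range(n_sim)]
    return (1 + sum(s >= obs for s in null)) / (1 + n_sim)

print("null ignoring gaps :", mc_p([(0, GENOME)]))  # deflated p value (inflated significance)
print("null avoiding gaps :", mc_p(ALLOWED))        # calibrated p value
```

Randomized query intervals that are allowed to fall into gaps overlap the reference less often than the real data can, which shifts the null distribution down and the p value spuriously toward significance, exactly as the abstract reports.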


EP Europace ◽  
2020 ◽  
Vol 22 (Supplement_1) ◽  
Author(s):  
D A Radu ◽  
C N Iorgulescu ◽  
S N Bogdan ◽  
A I Deaconu ◽  
A Badiul ◽  
...  

Abstract Background: Left ventricular non-compaction (LVNC) is a structural cardiomyopathy (SC) with a high probability of LV systolic dysfunction. Left bundle branch block (LBBB) frequently occurs in SCs. Purpose: We sought to analyse the evolution of LVNC-CRT (LC) patients in general and compare it with the non-LVNC-CRT group (nLC). Methods: We analysed 40 patients with contrast-MRI documented LVNC (concomitant positive Petersen and Jacquier criteria) implanted with CRT devices in CEHB. The follow-up included 7 hospital visits for each patient (between baseline and 3 years). Demographics, risk factors, usual serum levels, pre-procedural planning factors, clinical, ECG, TTE and biochemical markers were recorded. Statistical analysis was performed using statistical software. Notable differences were reported as either p-values from crosstabs (discrete variables) or mean differences, p-values and confidence intervals from t-tests (continuous variables). A p-value of .05 was chosen for statistical significance (SS). Results: Subjects in LC were younger (-7.52 years; <.000; (-3.617;-11.440)), with no sex predominance, more obese (45.9 vs. 28.3%; <0.24) and had less ischaemic disease (17.9 vs. 39.7%; <.007). LC implants were usually CRT-Ds (91 vs. 49.5%; <.000) and more frequently MPP-ready (35.8 vs. 8.4%; <.000). At baseline, sinus rhythm was predominant in LC (97.4 vs. 79.8%; <.007) and permitted frequent use of optimal fusion CRT (75.5 vs. 46.6%; <.002). Although initial LVEFs were similar, LCs had much larger EDVs (+48.91 ml; <.020; (+7.705;+90.124)) and ESVs (+34.91; <.05; (+1.657;+71.478)). After an initially encouraging ~1-year evolution, the performance of the LC-CRT group deteriorated markedly in terms of both LVEF and volumes. Thus, at 1-year follow-up, when compared to nLCs, LVEFs were far lower (-22.02%; <.000; (-32.29;-11.76)) while EDVs and ESVs were much higher ((+70.8 ml; <.037; (+49.27;+189.65)) and (+100.13; <.039; (+5.25;+195)), respectively) in LCs, in spite of similarly corrected dyssynchrony. The mean mitral regurgitation (MR) degree at 1 year was much higher in LCs (+1.8 classes; <.002; (+0.69;+2.97)), certainly contributing to the poor results. The cumulative super-responder/responder (SR/R) rates were consistently lower and decreasing at both 1 year (37.5 vs. 72.4; <.040) and 2 years of follow-up (10.1 vs. 80%; NS). Conclusions: CRT candidates with LVNC are significantly more severe at the time of implant. After an initial short-term improvement (probably due to acute correction of dyssynchrony), most patients fail to respond in the long term. Severe dilation with important secondary MR probably plays an important role.
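The two kinds of comparison reported above (crosstab p values for discrete variables; mean differences with confidence intervals from t-tests for continuous ones) can be sketched as follows. The counts and distributions are hypothetical, not the CEHB registry data.

```python
# A minimal sketch of the reported analysis style: a chi-square test on a crosstab
# for a discrete variable and a Welch t test with a hand-computed 95% CI for a
# continuous one. All data below are made up for illustration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Discrete variable: ischaemic disease (yes/no) by group (LVNC-CRT vs non-LVNC-CRT).
crosstab = np.array([[7, 33],       # LVNC-CRT: 7 of 40 with ischaemic disease
                     [90, 137]])    # non-LVNC-CRT (hypothetical counts)
chi2, p_discrete, dof, expected = stats.chi2_contingency(crosstab)

# Continuous variable: end-diastolic volume (ml) in the two groups (simulated).
edv_lc = rng.normal(280, 60, 40)
edv_nlc = rng.normal(230, 55, 220)
t_stat, p_cont = stats.ttest_ind(edv_lc, edv_nlc, equal_var=False)

# Welch 95% confidence interval for the mean difference.
v1, v2 = edv_lc.var(ddof=1) / len(edv_lc), edv_nlc.var(ddof=1) / len(edv_nlc)
diff = edv_lc.mean() - edv_nlc.mean()
df = (v1 + v2) ** 2 / (v1 ** 2 / (len(edv_lc) - 1) + v2 ** 2 / (len(edv_nlc) - 1))
half_width = stats.t.ppf(0.975, df) * np.sqrt(v1 + v2)

print(f"ischaemic disease: chi-square p = {p_discrete:.3f}")
print(f"EDV: mean difference = {diff:.1f} ml, p = {p_cont:.3f}, "
      f"95% CI ({diff - half_width:.1f}; {diff + half_width:.1f})")
```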

