The reporting of p values, confidence intervals and statistical significance in Preventive Veterinary Medicine (1997–2017)

PeerJ ◽  
2021 ◽  
Vol 9 ◽  
pp. e12453
Author(s):  
Locksley L. McV. Messam ◽  
Hsin-Yi Weng ◽  
Nicole W. Y. Rosenberger ◽  
Zhi Hao Tan ◽  
Stephanie D. M. Payet ◽  
...  

Background: Despite much discussion in the epidemiologic literature surrounding the use of null hypothesis significance testing (NHST) for inferences, the reporting practices of veterinary researchers have not been examined. We conducted a survey of articles published in Preventive Veterinary Medicine, a leading veterinary epidemiology journal, aimed at (a) estimating the frequency of reporting p values, confidence intervals and statistical significance between 1997 and 2017, (b) determining whether this varies by article section and (c) determining whether this varies over time.

Methods: We used systematic cluster sampling to select 985 original research articles from issues published in March, June, September and December of each year of the study period. Using the survey data analysis menu in Stata, we estimated overall and yearly proportions of article sections (abstracts, results-texts, results-tables and discussions) reporting p values, confidence intervals and statistical significance. Additionally, we estimated the proportion of p values less than 0.05 reported in each section, the proportion of article sections in which p values were reported as inequalities, and the proportion of article sections in which confidence intervals were interpreted as if they were significance tests. Finally, we used Generalised Estimating Equations to estimate prevalence odds ratios and 95% confidence intervals, comparing the occurrence of each of the above-mentioned reporting elements in one article section relative to another.

Results: Over the 20-year period, for every 100 published manuscripts, 31 abstracts (95% CI [28–35]), 65 results-texts (95% CI [61–68]), 23 sets of results-tables (95% CI [20–27]) and 59 discussion sections (95% CI [56–63]) reported statistical significance at least once. Only in the case of results-tables were the numbers reporting p values (48; 95% CI [44–51]) and confidence intervals (44; 95% CI [41–48]) higher than those reporting statistical significance. We also found that a substantial proportion of p values were reported as inequalities and that most were less than 0.05. The odds of a p value being less than 0.05 (OR = 4.5; 95% CI [2.3–9.0]) or of being reported as an inequality (OR = 3.2; 95% CI [1.3–7.6]) were higher in the abstracts than in the results-texts. Additionally, when confidence intervals were interpreted, on most occasions they were used as surrogates for significance tests. Overall, no time trends in reporting were observed for any of the three reporting elements over the study period.

Conclusions: Despite the availability of superior approaches to statistical inference and abundant criticism of its use in the epidemiologic literature, NHST is by far the most common means of inference in articles published in Preventive Veterinary Medicine. This pattern did not change substantially between 1997 and 2017.
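The per-100 estimates above are survey-weighted proportions with design-based confidence intervals. As a rough illustration of the interval arithmetic only, the sketch below computes a Wilson score interval for a binomial proportion under simple random sampling; the counts are hypothetical, and the paper's cluster design would generally widen such intervals.

```python
import math

def wilson_ci(x, n, z=1.96):
    """Wilson score 95% CI for a binomial proportion x/n (no design effect)."""
    p = x / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return p, centre - half, centre + half

# Hypothetical: 308 of 985 sampled abstracts reported statistical significance.
p, lo, hi = wilson_ci(308, 985)
print(f"{100 * p:.0f} per 100, 95% CI [{100 * lo:.0f}-{100 * hi:.0f}]")
```

Under simple random sampling this gives roughly 31 per 100 with a CI of a few points either side; a design-based analysis like the paper's accounts for the clustering of articles within issues.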

1988 ◽  
Vol 63 (1) ◽  
pp. 319-331 ◽  
Author(s):  
David Johnstone

A recent book by psychologist M. Oakes (1986) surveys the practice and logical foundations of statistical tests in the social and behavioral sciences. The book is aimed at producers and consumers of statistical research reports in these disciplines and has as its objective a shift in common practice from “significance” tests (however interpreted) to their more complete and informative analogs, confidence intervals. Much is made of the writings of the great English scientist and statistician Sir Ronald Fisher, to whom, most of all, the received theory of statistical tests is due. Oakes misrepresents Fisher's position on points of logic. There is also some overstatement of the case for confidence intervals. More interesting is the author's positive explanation for the widespread acceptance of significance tests among applied researchers, for there is no less settled logic or scheme of inference within theoretical statistics, as instantiated by the current papers of Casella and Berger (1987) and Berger and Sellke (1987) in the Journal of the American Statistical Association. That research workers in applied fields continue to use significance tests routinely may be explained by forces of supply and demand in the market for statistical evidence, where the commodity traded is not so much evidence, but “statistical significance.”


2013 ◽  
Vol 4 (4) ◽  
pp. 220-223 ◽  
Author(s):  
Eva Skovlund

Abstract

Background: Statistical analyses are used to help understand the practical significance of the findings in a clinical study. Many clinical researchers appear to have limited knowledge of how to perform appropriate statistical analysis, as well as of what the results actually mean.

Methods: This focal review is based on long experience in supervising clinicians on statistical analysis and in advising editors of scientific journals on the quality of the statistical analysis applied in scientific reports evaluated for publication.

Results: Basic facts on elementary statistical analyses are presented, and common misunderstandings are elucidated. Efficacy estimates, the effect of sample size, and confidence intervals for effect estimates are reviewed, and the difference between statistical significance and clinical relevance is highlighted. The weaknesses of p-values and misunderstandings in how to interpret them are illustrated with practical examples.

Conclusions and recommendations: Some very important questions need to be answered before initiating a clinical trial. What is the research question? To which patients should the result be generalised? Is the number of patients sufficient to draw a valid conclusion? When data are analysed, the number of (preplanned) significance tests should be kept small and post hoc analyses should be avoided. It should also be remembered that the clinical relevance of a finding cannot be assessed by the p-value. Thus, effect estimates and corresponding 95% confidence intervals should always be reported.
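The gap between statistical significance and clinical relevance can be seen in a minimal sketch (hypothetical numbers, normal approximation with a known common SD): the same clinically trivial mean difference is "non-significant" in a small trial and highly "significant" in a large one, while the effect estimate and its confidence interval convey the same information in both cases.

```python
import math
from statistics import NormalDist

def two_sample_p_and_ci(diff, sd, n_per_group, z=1.96):
    """Normal-approximation p value and 95% CI for a mean difference
    (equal group sizes, common SD treated as known)."""
    se = sd * math.sqrt(2 / n_per_group)
    zstat = diff / se
    p = 2 * (1 - NormalDist().cdf(abs(zstat)))
    return p, (diff - z * se, diff + z * se)

# Same hypothetical 1-unit difference (SD 10), two sample sizes:
for n in (50, 5000):
    p, ci = two_sample_p_and_ci(diff=1.0, sd=10.0, n_per_group=n)
    print(f"n={n}: p={p:.3f}, 95% CI [{ci[0]:.2f}, {ci[1]:.2f}]")
```

With n = 50 per group the difference is far from significant (p ≈ 0.62); with n = 5000 per group p is tiny, yet the estimated effect is the same small 1 unit, which a reader can judge directly from the interval.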


Author(s):  
Scott B. Morris ◽  
Arash Shokri

To understand and communicate research findings, it is important for researchers to consider two types of information provided by research results: the magnitude of the effect and the degree of uncertainty in the outcome. Statistical significance tests have long served as the mainstream method for statistical inferences. However, the widespread misinterpretation and misuse of significance tests has led critics to question their usefulness in evaluating research findings and to raise concerns about the far-reaching effects of this practice on scientific progress. An alternative approach involves reporting and interpreting measures of effect size along with confidence intervals. An effect size is an indicator of magnitude and direction of a statistical observation. Effect size statistics have been developed to represent a wide range of research questions, including indicators of the mean difference between groups, the relative odds of an event, or the degree of correlation among variables. Effect sizes play a key role in evaluating practical significance, conducting power analysis, and conducting meta-analysis. While effect sizes summarize the magnitude of an effect, the confidence intervals represent the degree of uncertainty in the result. By presenting a range of plausible alternate values that might have occurred due to sampling error, confidence intervals provide an intuitive indicator of how strongly researchers should rely on the results from a single study.
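As an illustration of reporting magnitude and uncertainty together, the sketch below computes Cohen's d from group summary statistics along with a large-sample (normal-approximation) 95% confidence interval; the group summaries are hypothetical.

```python
import math

def cohens_d(mean1, mean2, sd1, sd2, n1, n2, z=1.96):
    """Cohen's d (pooled SD) with an approximate large-sample 95% CI."""
    pooled_sd = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2)
                          / (n1 + n2 - 2))
    d = (mean1 - mean2) / pooled_sd
    # Common large-sample variance approximation for d:
    se = math.sqrt((n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2)))
    return d, (d - z * se, d + z * se)

# Hypothetical summaries for two groups of 40:
d, ci = cohens_d(10.5, 9.0, 3.0, 3.2, 40, 40)
print(f"d = {d:.2f}, 95% CI [{ci[0]:.2f}, {ci[1]:.2f}]")
```

Here the point estimate is a moderate standardized difference, but the wide interval makes the sampling uncertainty explicit, which is exactly the pairing the passage above recommends.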


BMJ Open ◽  
2021 ◽  
Vol 11 (7) ◽  
pp. e051821
Author(s):  
Lisa Bero ◽  
Rosa Lawrence ◽  
Louis Leslie ◽  
Kellia Chiu ◽  
Sally McDonald ◽  
...  

Objective: To compare results reporting and the presence of spin in COVID-19 study preprints with their finalised journal publications.

Design: Cross-sectional study.

Setting: International medical literature.

Participants: Preprints and final journal publications of 67 interventional and observational studies of COVID-19 treatment or prevention from the Cochrane COVID-19 Study Register published between 1 March 2020 and 30 October 2020.

Main outcome measures: Study characteristics and discrepancies in (1) results reporting (number of outcomes, outcome descriptor, measure, metric, assessment time point, data reported, reported statistical significance of result, type of statistical analysis, subgroup analyses (if any), whether outcome was identified as primary or secondary) and (2) spin (reporting practices that distort the interpretation of results so they are viewed more favourably).

Results: Of 67 included studies, 23 (34%) had no discrepancies in results reporting between preprints and journal publications. Fifteen (22%) studies had at least one outcome that was included in the journal publication but not the preprint; eight (12%) had at least one outcome that was reported in the preprint only. For outcomes that were reported in both preprints and journals, common discrepancies were differences in numerical values and statistical significance, additional statistical tests and subgroup analyses, and longer follow-up times for outcome assessment in journal publications. At least one instance of spin occurred in both preprints and journals in 23/67 (34%) studies, in the preprint only in 5 (7%), and in the journal publication only in 2 (3%). Spin was removed between the preprint and journal publication in 5/67 (7%) studies but added in 1/67 (1%).

Conclusions: The COVID-19 preprints and their subsequent journal publications were largely similar in reporting of study characteristics, outcomes and spin. All COVID-19 studies published as preprints and journal publications should be critically evaluated for discrepancies and spin.


2021 ◽  
Vol 80 (Suppl 1) ◽  
pp. 560.3-561
Author(s):  
E. F. Vicente-Rabaneda ◽  
J. De la Macorra ◽  
J. P. Baldivieso ◽  
F. Gutiérrez-Rodríguez ◽  
A. García-Vadillo ◽  
...  

Background: Interstitial lung disease (ILD) is a severe manifestation of rheumatoid arthritis (RA), linked to increased mortality. There is still no consensus on the best therapeutic strategy, as no randomized controlled trials have yet been conducted.

Objectives: To analyze the available scientific evidence on the efficacy and safety of rituximab (RTX) treatment of interstitial lung disease (ILD) associated with rheumatoid arthritis (RA).

Methods: A systematic search was carried out in PubMed up to April 2020, following the PRISMA recommendations. Studies were selected according to the following inclusion criteria: (1) original research, including case series, case/control studies, cohort studies, and clinical trials; (2) population with RA and associated ILD, either monographically or together with other connective tissue diseases (CTD), provided that individualized data on patients with RA were available; (3) patients treated with RTX; (4) objective and quantifiable results on the evolution of ILD after treatment, with available data on FVC, DLCO and/or HRCT.

Results: Of the 64 papers identified, 9 articles were selected. The studies showed great heterogeneity in design, both in the sample selection criteria and in the objectives of the analysis. Most were observational, retrospective (n = 6) or prospective (n = 2) studies, with only one open prospective experimental study. Those focused on RA predominated, but 3 of them also included patients with other CTDs. The mean age of the patients in the different studies ranged between 52 and 70 years, with a predominance of women. A history of smoking was present in 40–79% of patients, and most were positive for rheumatoid factor (83–100%) and anti-CCP (82–100%). The most frequent radiological patterns were NSIP, UIP and undefined. The outcome measures were diverse: changes in lung function tests (LFT) and HRCT, incidence of pulmonary dysfunction, mortality rates, effect on glucocorticoid sparing, delay in inclusion on the lung transplant list and/or serious adverse events. The initiation of RTX was motivated by pulmonary and/or joint pathology in patients who had failed other synthetic or biological DMARDs. A total of 393 treatment cycles were recorded in 114 patients, with a mean of 3.45 cycles per patient. The RTX regimen was 2 infusions of 1 g 2 weeks apart in all patients, except for 1 who received the lymphoma-like regimen. With regard to the efficacy of treatment with RTX, improvement and especially stabilization of HRCT and LFT predominated, with numerically greater improvement in DLCO than in FVC. There was also a favorable trend in the evolution of patients treated with RTX compared with controls, although it did not reach statistical significance, and a lower risk of deterioration of lung function in patients treated with RTX versus those who had received other DMARDs. The 5-year mortality rate was lower than that previously described for the disease, and was half as high in patients treated with RTX as in those treated with anti-TNF. The adverse events described in the studies did not raise safety alerts beyond those already described for RTX.

Conclusion: RTX appears to be a promising therapy for patients with ILD associated with RA, showing a stabilizing effect on lung function with an acceptable safety profile. However, further prospective studies of higher methodological quality are needed to confirm these favorable preliminary results.

Disclosure of Interests: None declared


1998 ◽  
Vol 21 (2) ◽  
pp. 221-222
Author(s):  
Louis G. Tassinary

Chow (1996) offers a reconceptualization of statistical significance that is reasoned and comprehensive. Despite a somewhat rough presentation, his arguments are compelling and deserve to be taken seriously by the scientific community. It is argued, however, that his characterizations of literal replication, types of research, effect size, and experimental control are in need of revision.


2018 ◽  
Vol 44 (4) ◽  
pp. 292-298 ◽  
Author(s):  
Erica Nishida Hasimoto ◽  
Daniele Cristina Cataneo ◽  
Tarcísio Albertin dos Reis ◽  
Antonio José Maria Cataneo

ABSTRACT Objective: To determine the prevalence of primary hyperhidrosis in the city of Botucatu, Brazil, and to evaluate how this disorder affects the quality of life in those suffering from it. Methods: A population survey was conducted in order to identify cases of hyperhidrosis among residents in the urban area of the city, selected by systematic cluster sampling. In accordance with the census maps of the city, the sample size should be at least 4,033 participants. Ten interviewers applied a questionnaire that evaluated the presence of excessive sweating and invited the subjects who reported hyperhidrosis to be evaluated by a physician in order to confirm the diagnosis. Results: A total of 4,133 residents, in 1,351 households, were surveyed. Excessive sweating was reported by 85 residents (prevalence = 2.07%), of whom 51 (60%) were female. Of those 85 respondents, 51 (60%) agreed to undergo medical evaluation to confirm the diagnosis and only 23 (45%) were diagnosed with primary hyperhidrosis (prevalence = 0.93%). Of the 23 subjects diagnosed with primary hyperhidrosis, 11 (48%) reported poor or very poor quality of life. Conclusions: Although the prevalence of self-reported excessive sweating was greater than 2%, the actual prevalence of primary hyperhidrosis in our sample was 0.93% and nearly 50% of the respondents with primary hyperhidrosis reported impaired quality of life.


2016 ◽  
Vol 21 (1) ◽  
pp. 102-115 ◽  
Author(s):  
Stephen Gorard

This paper reminds readers of the absurdity of statistical significance testing, despite its continued widespread use as a supposed method for analysing numeric data. There have been complaints about the poor quality of research employing significance tests for a hundred years, and repeated calls for researchers to stop using and reporting them. There have even been attempted bans. Many thousands of papers have now been written, in all areas of research, explaining why significance tests do not work. There are too many for all to be cited here. This paper summarises the logical problems as described in over 100 of these prior pieces. It then presents a series of demonstrations showing that significance tests do not work in practice. In fact, they are more likely to produce the wrong answer than a right one. The confused use of significance testing has practical and damaging consequences for people's lives. Ending the use of significance tests is a pressing ethical issue for research. Anyone knowing the problems, as described over one hundred years, who continues to teach, use or publish significance tests is acting unethically, and knowingly risking the damage that ensues.


2013 ◽  
Vol 12 (3) ◽  
pp. 345-351 ◽  
Author(s):  
Jessica Middlemis Maher ◽  
Jonathan C. Markey ◽  
Diane Ebert-May

Statistical significance testing is the cornerstone of quantitative research, but studies that fail to report measures of effect size are potentially missing a robust part of the analysis. We provide a rationale for why effect size measures should be included in quantitative discipline-based education research. Examples from both biological and educational research demonstrate the utility of effect size for evaluating practical significance. We also provide details about some effect size indices that are paired with common statistical significance tests used in educational research and offer general suggestions for interpreting effect size measures. Finally, we discuss some inherent limitations of effect size measures and provide further recommendations about reporting confidence intervals.
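For instance, a 2×2 comparison of group outcomes is commonly paired with the odds ratio as its effect size. A minimal sketch of the odds ratio with a Woolf (log-scale) 95% confidence interval, using hypothetical counts, shows how magnitude and uncertainty can be reported alongside any test result.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with a Woolf (log-scale) 95% CI for a 2x2 table
    [[a, b], [c, d]] = [[group-1 events, group-1 non-events],
                        [group-2 events, group-2 non-events]]."""
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lo = math.exp(math.log(or_) - z * se_log)
    hi = math.exp(math.log(or_) + z * se_log)
    return or_, lo, hi

# Hypothetical counts: 30/100 events in one group vs 15/100 in the other.
or_, lo, hi = odds_ratio_ci(30, 70, 15, 85)
print(f"OR = {or_:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```

The interval excluding 1 conveys the same binary message as a significance test, but the estimate and its width additionally indicate how large the association might plausibly be.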

