Significance tests and confidence intervals

2014 ◽  
pp. 142-168
Author(s):  
Simon Vaughan
2019 ◽  
Vol 12 (1) ◽  
pp. 205979911982651
Author(s):  
Michael Wood

In many fields of research, null hypothesis significance tests and p values are the accepted way of assessing the degree of certainty with which research results can be extrapolated beyond the sample studied. However, there are very serious concerns about the suitability of p values for this purpose. An alternative approach is to cite confidence intervals for a statistic of interest, but this does not directly tell readers how certain a hypothesis is. Here, I suggest how the framework used for confidence intervals could easily be extended to derive confidence levels, or “tentative probabilities,” for hypotheses. I also outline four quick methods for estimating these. This allows researchers to state their confidence in a hypothesis as a direct probability, instead of circuitously by p values referring to a hypothetical null hypothesis—which is usually not even stated explicitly. The inevitable difficulties of statistical inference mean that these probabilities can only be tentative, but probabilities are the natural way to express uncertainties, so, arguably, researchers using statistical methods have an obligation to estimate how probable their hypotheses are by the best available method. Otherwise, misinterpretations will fill the void.
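
The abstract does not spell out Wood's four quick methods. As an illustration of the underlying idea only, here is a minimal sketch. It assumes a large-sample normal approximation with a known standard error. The sampling distribution centred on the estimate, which a confidence interval already uses implicitly, is read as a distribution of confidence over hypotheses about the parameter. The function names and the numbers in the usage section are hypothetical. This is not presented as Wood's own procedure.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def confidence_in_range(estimate, std_error, lower=None, upper=None):
    """Confidence level that the true parameter lies between lower and upper.

    Sketch assumption: the interval machinery estimate +/- z * std_error is
    read as a Normal(estimate, std_error) distribution of confidence.
    None means the range is unbounded on that side.
    """
    if std_error <= 0:
        raise ValueError("std_error must be positive")
    dist = NormalDist(estimate, std_error)
    upper_cdf = 1.0 if upper is None else dist.cdf(upper)
    lower_cdf = 0.0 if lower is None else dist.cdf(lower)
    return upper_cdf - lower_cdf


def two_sided_p(estimate, std_error, null=0.0):
    """Conventional two-sided z-test p value for H0: parameter == null."""
    z = abs(estimate - null) / std_error
    return 2 * (1 - _STD_NORMAL.cdf(z))


if __name__ == "__main__":
    est, se = 0.5, 0.25  # hypothetical estimate and standard error
    print("two-sided p:", round(two_sided_p(est, se), 4))
    print("confidence effect > 0:",
          round(confidence_in_range(est, se, lower=0.0), 4))
    print("confidence 0 < effect < 1:",
          round(confidence_in_range(est, se, lower=0.0, upper=1.0), 4))
```

Under this approximation, when the estimate is positive, the confidence that the effect is positive equals 1 − p/2. The same information that a p value expresses indirectly is therefore stated as a direct confidence level for a named hypothesis. Any interval hypothesis, such as "the effect lies between 0 and 1", can be handled the same way.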


1988 ◽  
Vol 63 (1) ◽  
pp. 319-331 ◽  
Author(s):  
David Johnstone

A recent book by psychologist M. Oakes (1986) surveys the practice and logical foundations of statistical tests in the social and behavioral sciences. The book is aimed at producers and consumers of statistical research reports in these disciplines and has as its objective a shift in common practice from “significance” tests (however interpreted) to their more complete and informative analogs, confidence intervals. Much is made of the writings of the great English scientist and statistician Sir Ronald Fisher, to whom, most of all, the received theory of statistical tests is due. Oakes misrepresents Fisher's position on points of logic. There is also some overstatement of the case for confidence intervals. More interesting is the author's positive explanation for the widespread acceptance of significance tests among applied researchers, for there is no less settled logic or scheme of inference within theoretical statistics, as instantiated by the current papers of Casella and Berger (1987) and Berger and Sellke (1987) in the Journal of the American Statistical Association. That research workers in applied fields continue to use significance tests routinely may be explained by forces of supply and demand in the market for statistical evidence, where the commodity traded is not so much evidence, but “statistical significance.”


PeerJ ◽  
2021 ◽  
Vol 9 ◽  
pp. e12453
Author(s):  
Locksley L. McV. Messam ◽  
Hsin-Yi Weng ◽  
Nicole W. Y. Rosenberger ◽  
Zhi Hao Tan ◽  
Stephanie D. M. Payet ◽  
...  

Background: Despite much discussion in the epidemiologic literature surrounding the use of null hypothesis significance testing (NHST) for inferences, the reporting practices of veterinary researchers have not been examined. We conducted a survey of articles published in Preventive Veterinary Medicine, a leading veterinary epidemiology journal, aimed at (a) estimating the frequency of reporting p values, confidence intervals and statistical significance between 1997 and 2017, (b) determining whether this varies by article section and (c) determining whether this varies over time.

Methods: We used systematic cluster sampling to select 985 original research articles from issues published in March, June, September and December of each year of the study period. Using the survey data analysis menu in Stata, we estimated overall and yearly proportions of article sections (abstracts, results-texts, results-tables and discussions) reporting p values, confidence intervals and statistical significance. Additionally, we estimated the proportion of p values less than 0.05 reported in each section, the proportion of article sections in which p values were reported as inequalities, and the proportion of article sections in which confidence intervals were interpreted as if they were significance tests. Finally, we used Generalised Estimating Equations to estimate prevalence odds ratios and 95% confidence intervals, comparing the occurrence of each of the above-mentioned reporting elements in one article section relative to another.

Results: Over the 20-year period, for every 100 published manuscripts, 31 abstracts (95% CI [28–35]), 65 results-texts (95% CI [61–68]), 23 sets of results-tables (95% CI [20–27]) and 59 discussion sections (95% CI [56–63]) reported statistical significance at least once. Only in the case of results-tables were the numbers reporting p values (48; 95% CI [44–51]) and confidence intervals (44; 95% CI [41–48]) higher than those reporting statistical significance. We also found that a substantial proportion of p values were reported as inequalities and most were less than 0.05. The odds of a p value being less than 0.05 (OR = 4.5; 95% CI [2.3–9.0]) or being reported as an inequality (OR = 3.2; 95% CI [1.3–7.6]) were higher in the abstracts than in the results-texts. Additionally, when confidence intervals were interpreted, on most occasions they were used as surrogates for significance tests. Overall, no time trends in reporting were observed for any of the three reporting elements over the study period.

Conclusions: Despite the availability of superior approaches to statistical inference and abundant criticism of its use in the epidemiologic literature, NHST is substantially the most common means of inference in articles published in Preventive Veterinary Medicine. This pattern has not changed substantially between 1997 and 2017.
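
The survey found that confidence intervals were mostly read as surrogates for significance tests. For a Wald-type interval built on the log scale, "the 95% CI excludes 1" and "two-sided p < 0.05" are the same statement, so that reading throws away the interval's width. The sketch below back-calculates an approximate standard error and p value from a reported ratio and its 95% CI, using the two odds ratios quoted above. It assumes the intervals are symmetric on the log scale and ignores rounding in the reported figures. It is a back-of-envelope check, not the survey authors' analysis, and the function names are hypothetical.

```python
import math
from statistics import NormalDist

Z_95 = NormalDist().inv_cdf(0.975)  # about 1.96


def log_se_from_ci(lower, upper, z=Z_95):
    """SE of ln(ratio), assuming the CI was exp(ln(ratio) +/- z * SE)."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def p_from_ratio_ci(ratio, lower, upper):
    """Approximate two-sided Wald p value for H0: ratio == 1."""
    z = abs(math.log(ratio)) / log_se_from_ci(lower, upper)
    return 2 * (1 - NormalDist().cdf(z))


def ci_excludes_null(lower, upper, null=1.0):
    """The 'surrogate significance test' reading of an interval."""
    return not (lower <= null <= upper)


if __name__ == "__main__":
    # Odds ratios and 95% CIs as reported in the abstract above.
    reported = {
        "p < 0.05, abstract vs results-text": (4.5, 2.3, 9.0),
        "inequality, abstract vs results-text": (3.2, 1.3, 7.6),
    }
    for label, (ratio, lo, hi) in reported.items():
        p = p_from_ratio_ci(ratio, lo, hi)
        # upper/lower ratio: a scale-free summary of the interval's width,
        # which the excludes-1 reading ignores
        print(f"{label}: p ~ {p:.2g}, excludes 1: {ci_excludes_null(lo, hi)}, "
              f"upper/lower = {hi / lo:.1f}")
```

Both intervals exclude 1 and both recovered p values fall below 0.05. However, the second interval is noticeably wider relative to its point estimate. That difference in precision is exactly what a significance-test reading of the interval does not report.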


1994 ◽  
Vol 33 (02) ◽  
pp. 214-219 ◽  
Author(s):  
J. Izsák

Abstract: The sampling theory of common diversity indices is complex. Distribution-free methods, such as the jackknife, can easily be used to construct confidence intervals for diversity and to test it. Jackknife estimates and their variances for a number of different diversity indices are described in this paper. A simple numerical example is given to demonstrate the method. Discrimination based on confidence intervals is also discussed. It is assumed that there is a special correlation between the sensitivity parameter m and the relative width of confidence intervals in the Hurlbert index family. It is shown that the usual estimate of the Hurlbert index coincides with the corresponding jackknife estimate. For demonstration, diagnoses registered in a set of death certificates are used. Diagnostic diversity varies considerably among the different groups: it is largest in autopsy reports, whereas it is non-significant in GPs’ reports and in reports of physicians authorized to issue death certificates. Since autopsy reports tend to be fairly accurate, our research findings seem to confirm the hypothesis that there is a correlation between reliability and diversity of diagnoses.
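
The leave-one-out jackknife described in the abstract can be sketched for category counts as follows. Two indices are used as examples: the Shannon index, chosen here as an illustration, and Hurlbert's E(S_m), the expected number of distinct categories in a random subsample of m items. The abstract does not say exactly which indices or which interval quantiles Izsák used. This sketch therefore assumes a normal-approximation interval, and the counts and function names are hypothetical. Because E(S_m) is an average over all size-m subsamples, its jackknife estimate reproduces the direct estimate exactly. That agrees with the abstract's statement about the Hurlbert index.

```python
import math
from statistics import NormalDist


def shannon(counts):
    """Plug-in Shannon diversity H = -sum(p_i * ln p_i)."""
    n = sum(counts)
    return -sum(c / n * math.log(c / n) for c in counts if c > 0)


def hurlbert(counts, m):
    """Hurlbert's E(S_m): expected number of distinct categories in a
    random subsample of m items drawn without replacement."""
    n = sum(counts)
    if not 1 <= m <= n:
        raise ValueError("m must satisfy 1 <= m <= total count")
    total = math.comb(n, m)
    # math.comb(n - c, m) is 0 when m > n - c: category i must then appear.
    return sum(1 - math.comb(n - c, m) / total for c in counts if c > 0)


def jackknife(counts, index):
    """Leave-one-out jackknife estimate and variance of index(counts).

    Deleting any one of the c items in a category yields the same reduced
    sample, so each category contributes c identical pseudo-values."""
    n = sum(counts)
    if n < 2:
        raise ValueError("need at least two observations")
    full = index(counts)
    pseudo = []  # (pseudo-value, multiplicity)
    for i, c in enumerate(counts):
        if c == 0:
            continue
        reduced = list(counts)
        reduced[i] -= 1
        pseudo.append((n * full - (n - 1) * index(reduced), c))
    estimate = sum(v * c for v, c in pseudo) / n
    variance = sum(c * (v - estimate) ** 2 for v, c in pseudo) / (n * (n - 1))
    return estimate, variance


def jackknife_ci(counts, index, level=0.95):
    """Normal-approximation interval: estimate +/- z * jackknife SE."""
    estimate, variance = jackknife(counts, index)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * math.sqrt(variance)
    return estimate, estimate - half_width, estimate + half_width


if __name__ == "__main__":
    counts = [10, 5, 3, 1, 1]  # hypothetical diagnosis counts per category
    print("Shannon:", jackknife_ci(counts, shannon))
    for m in (2, 5, 10):  # m must not exceed sum(counts) - 1
        est, lo, hi = jackknife_ci(counts, lambda c, m=m: hurlbert(c, m))
        print(f"Hurlbert m={m}: {est:.4f} [{lo:.4f}, {hi:.4f}]",
              "direct:", round(hurlbert(counts, m), 4))
```

Comparing the interval widths printed for several values of m is one way to explore the assumed relation between the sensitivity parameter and the relative width of the confidence intervals. The sketch makes no claim about what that relation turns out to be.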
