Bayesian Interpretation of the Expression “Significant at .05”: A Discrete Example

1990 ◽  
Vol 71 (1) ◽  
pp. 307-320
Author(s):  
David J. Johnstone

The Bayesian inference based on the information “significant at .05” depends logically on the sample size, n. If n is sufficiently large, the locution “significant at .05,” taken by itself, implies not strong evidence against the null hypothesis but strong evidence in its favor. More particularly, for large n, a report which says merely “significant at .05,” without further information, should be interpreted as evidence against the null only if for some reason peculiar to the test in question it is considered subjectively that the sample observation x is very probably significant not only at 5% but at 1% or lower. This result holds for any “point” (simple) null hypothesis and is demonstrated here in the context of a simple example. Note that for the purpose of interpreting the expression “significant at .05” per se, it is supposed that the exact value of x is unknown.
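The dependence on n can be illustrated with a small numerical sketch. This is not Johnstone's discrete example. It assumes a one-sided z-test of a point null against a single fixed alternative with standardized effect `delta`, and it reads "significant at .05" as "the p-value lies between .01 and .05". The function name and all values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def likelihood_ratio(n, delta, upper=0.05, lower=0.01):
    """P(lower < p <= upper | H1) / P(lower < p <= upper | H0), one-sided z-test."""
    z_for_upper = STD_NORMAL.inv_cdf(1 - upper)  # z at which p equals `upper`
    z_for_lower = STD_NORMAL.inv_cdf(1 - lower)  # z at which p equals `lower`
    shift = delta * sqrt(n)  # mean of the z statistic under the assumed alternative
    # The report holds when z falls between the two critical values.
    prob_h1 = STD_NORMAL.cdf(z_for_lower - shift) - STD_NORMAL.cdf(z_for_upper - shift)
    prob_h0 = upper - lower  # p-values are uniform under a point null
    return prob_h1 / prob_h0


lr_small_n = likelihood_ratio(n=25, delta=0.2)
lr_large_n = likelihood_ratio(n=1000, delta=0.2)
print(lr_small_n, lr_large_n)
```

Under these assumptions the ratio is above 1 for n = 25 and far below 1 for n = 1000: the same report moves from favoring the alternative to favoring the null as n grows.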

2016 ◽  
Vol 11 (4) ◽  
pp. 551-554 ◽  
Author(s):  
Martin Buchheit

The first sport-science-oriented and comprehensive paper on magnitude-based inferences (MBI) was published 10 y ago in the first issue of this journal. While debate continues, MBI is today well established in sport science and in other fields, particularly clinical medicine, where practical/clinical significance often takes priority over statistical significance. In this commentary, some reasons why both academics and sport scientists should abandon null-hypothesis significance testing and embrace MBI are reviewed. Apparent limitations and future areas of research are also discussed. The following arguments are presented: P values and, in turn, study conclusions are sample-size dependent, irrespective of the size of the effect; significance does not inform on magnitude of effects, yet magnitude is what matters the most; MBI allows authors to be honest with their sample size and better acknowledge trivial effects; the examination of magnitudes per se helps provide better research questions; MBI can be applied to assess changes in individuals; MBI improves data visualization; and MBI is supported by spreadsheets freely available on the Internet. Finally, recommendations to define the smallest important effect and improve the presentation of standardized effects are presented.


2021 ◽  
Author(s):  
Ruslan Masharipov ◽  
Yaroslav Nikolaev ◽  
Alexander Korotkov ◽  
Michael Didur ◽  
Denis Cherednichenko ◽  
...  

Classical null hypothesis significance testing is limited to the rejection of the point-null hypothesis; it does not allow the interpretation of non-significant results. Moreover, studies with a sufficiently large sample size will find statistically significant results even when the effect is negligible and may be considered practically equivalent to the null effect. This leads to a publication bias against the null hypothesis. There are two main approaches to assess null effects: shifting from the point-null to the interval-null hypothesis and considering the practical significance in the frequentist approach; using the Bayesian parameter inference based on posterior probabilities, or the Bayesian model inference based on Bayes factors. Herein, we discuss these statistical methods with particular focus on the application of the Bayesian parameter inference, as it is conceptually connected to both frequentist and Bayesian model inferences. Although Bayesian methods have been theoretically elaborated and implemented in commonly used neuroimaging software, they are not widely used for null effect assessment. To demonstrate the advantages of using the Bayesian parameter inference, we compared it with classical null hypothesis significance testing for fMRI data group analysis. We also consider the problem of choosing a threshold for a practically significant effect and discuss possible applications of Bayesian parameter inference in fMRI studies. We argue that Bayesian inference, which directly provides evidence for both the null and alternative hypotheses, may be more intuitive and convenient for practical use than frequentist inference, which only provides evidence against the null hypothesis. Moreover, it may indicate that the obtained data are not sufficient to make a confident inference. 
Because interim analysis is easy to perform using Bayesian inference, one can evaluate the data as the sample size increases and decide to terminate the experiment if the obtained data are sufficient to make a confident inference. To facilitate the application of the Bayesian parameter inference to null effect assessment, scripts with a simple GUI were developed.
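The contrast between a point-null test and an interval-null check can be sketched numerically. The example below is a frequentist illustration only, not the authors' scripts or their Bayesian parameter inference. It assumes a one-sample design with known unit standard deviation, and the `margin` defining a practically negligible effect is a hypothetical choice.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def z_test_summary(effect, n, margin=0.1, alpha=0.05):
    """Two-sided z-test of a point null plus a check against an interval null.

    Assumes known unit standard deviation, so the standard error of the
    observed mean `effect` is 1 / sqrt(n).
    """
    se = 1 / sqrt(n)
    z = effect / se
    p_value = 2 * (1 - STD_NORMAL.cdf(abs(z)))
    half_width = STD_NORMAL.inv_cdf(1 - alpha / 2) * se
    ci = (effect - half_width, effect + half_width)
    # "Practically null" here means the whole interval lies inside the
    # hypothetical region (-margin, +margin).
    practically_null = -margin < ci[0] and ci[1] < margin
    return p_value, ci, practically_null


p_small, ci_small, null_small = z_test_summary(effect=0.02, n=100)
p_large, ci_large, null_large = z_test_summary(effect=0.02, n=100_000)
print(p_small, null_small)
print(p_large, null_large)
```

With the same tiny observed effect, the small sample is non-significant and inconclusive about the interval null, while the large sample is statistically significant yet lies entirely inside the negligible-effect region.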


Author(s):  
El-Housainy A. Rady ◽  
Mohamed R. Abonazel ◽  
Mariam H. Metawe’e

Goodness-of-fit (GOF) tests for logistic regression assess how well the model suits the data. The null hypothesis of every GOF test is that the model fits. R, a free software environment, offers many GOF tests across different packages. A Monte Carlo simulation was conducted to study two situations: first, the ability of each test, under its default settings, to accept the null hypothesis when the model truly fits; second, the power of these tests when the assumption of a sufficient linear combination of the explanatory variables is violated (by omitting a linear covariate term, a quadratic term, or an interaction term). We also checked whether the same test gave the same results in different R packages. Because sample size is expected to affect the simulation results, the pattern of change in GOF test results under different sample sizes and model settings was estimated. All tests accepted the null hypothesis (in more than 95% of simulation trials) when the model truly fitted, except the modified Hosmer-Lemeshow test in the "LogisticDx" package under all model settings and Osius and Rojek’s (OsRo) test when the true model had an interaction term between binary and categorical covariates. In addition, the le Cessie-van Houwelingen-Copas-Hosmer unweighted sum of squares (CHCH) test unexpectedly gave different results in different packages. In the power study, all tests had very low power when a covariate was missing. Generally, Stukel’s test (package "LogisticDx") and the CHCH test (package "rms") reached a power greater than 80% in detecting a missing quadratic term at lower sample sizes, while the OsRo test (package "LogisticDx") was better at detecting a missing interaction term. In addition to the simulation study, we evaluated the performance of the GOF tests using the breast cancer dataset.


2017 ◽  
pp. 234-351
Author(s):  
Kamelshewer Lohana Et al.,

The study assesses the role and contributions of cooperative societies in boosting agricultural production and entrepreneurship in Kebbi State, Nigeria. A total sample of 120 respondents was used. A cluster sampling technique was used to obtain information from sample respondents (members of farmers’ cooperative societies). Sixty (60) questionnaires were administered to sixty respondents in each of the Zuru and Yauri Local Government Areas. The data collected were analysed and interpreted using simple percentages and descriptive methods. The major conclusions drawn from this research were as follows. Regarding the effectiveness of cooperative societies in improving agricultural production and entrepreneurship, 33.3% and 25% of the respondents in the Zuru and Yauri Local Government Areas, respectively, reported promoting farmers’ participation in agriculture, while 25% and 46% reported boosting agricultural production in the study areas. About 36.6% and 35% believed in the effectiveness of cooperative societies in increasing food production. In the two Local Government Areas, 5% and 3.3% of sample respondents reported that all of the above indicators increase the effectiveness of cooperatives in agriculture. Regarding the role of cooperatives in boosting entrepreneurship in the study areas, 75% of respondents in Zuru and 88.3% in Yauri agreed that cooperatives have added value to boosting agricultural production and entrepreneurship, and only 15% and 11.6% disagreed. Many problems affecting the smooth functioning of cooperatives were identified, and solutions for addressing them were recommended. It was therefore concluded that the null hypothesis (H0) is rejected and the alternative hypothesis (HA) is accepted.


2019 ◽  
Author(s):  
Mark Andrews

The study of memory for texts has had a long tradition of research in psychology. According to most general accounts, the recognition or recall of items in a text is based on querying a memory representation that is built up on the basis of background knowledge. The objective of this paper is to describe and thoroughly test a Bayesian model of these general accounts. In particular, we present a model that describes how we use our background knowledge to form memories in terms of Bayesian inference of statistical patterns in the text, followed by posterior predictive inference of the words that are typical of those inferred patterns. This provides us with precise predictions about which words will be remembered, whether veridically or erroneously, from any given text. We tested these predictions using behavioural data from a memory experiment using a large sample of randomly chosen texts from a representative corpus of British English. The results show that the probability of remembering any given word in the text, whether falsely or veridically, is well predicted by the Bayesian model. Moreover, compared to nontrivial alternative models of text memory, by every measure used in the analyses, the predictions of the Bayesian model were superior, often overwhelmingly so. We conclude that these results provide strong evidence in favour of the Bayesian account of text memory that we have presented in this paper.


2017 ◽  
Vol 28 (4) ◽  
pp. 1019-1043 ◽  
Author(s):  
Shi-Fang Qiu ◽  
Xiao-Song Zeng ◽  
Man-Lai Tang ◽  
Wai-Yin Poon

Double sampling is usually applied to collect necessary information for situations in which an infallible classifier is available for validating a subset of the sample that has already been classified by a fallible classifier. Inference procedures have previously been developed based on the partially validated data obtained by the double-sampling process. However, it could happen in practice that such infallible classifier or gold standard does not exist. In this article, we consider the case in which both classifiers are fallible and propose asymptotic and approximate unconditional test procedures based on six test statistics for a population proportion and five approximate sample size formulas based on the recommended test procedures under two models. Our results suggest that both asymptotic and approximate unconditional procedures based on the score statistic perform satisfactorily for small to large sample sizes and are highly recommended. When sample size is moderate or large, asymptotic procedures based on the Wald statistic with the variance being estimated under the null hypothesis, likelihood rate statistic, log- and logit-transformation statistics based on both models generally perform well and are hence recommended. The approximate unconditional procedures based on the log-transformation statistic under Model I, Wald statistic with the variance being estimated under the null hypothesis, log- and logit-transformation statistics under Model II are recommended when sample size is small. In general, sample size formulae based on the Wald statistic with the variance being estimated under the null hypothesis, likelihood rate statistic and score statistic are recommended in practical applications. The applicability of the proposed methods is illustrated by a real-data example.


2005 ◽  
Vol 35 (1) ◽  
pp. 1-20 ◽  
Author(s):  
G. K. Huysamen

Criticisms of traditional null hypothesis significance testing (NHST) became more pronounced during the 1960s and reached a climax during the past decade. Among others, NHST says nothing about the size of the population parameter of interest and its result is influenced by sample size. Estimation of confidence intervals around point estimates of the relevant parameters, model fitting and Bayesian statistics represent some major departures from conventional NHST. Testing non-nil null hypotheses, determining optimal sample size to uncover only substantively meaningful effect sizes and reporting effect-size estimates may be regarded as minor extensions of NHST. Although there seems to be growing support for the estimation of confidence intervals around point estimates of the relevant parameters, it is unlikely that NHST-based procedures will disappear in the near future. In the meantime, it is widely accepted that effect-size estimates should be reported as a mandatory adjunct to conventional NHST results.


2021 ◽  
Vol 10 (1) ◽  
pp. 36
Author(s):  
Diyas Herdian Putra ◽  
Ikhsanudin Ikhsanudin ◽  
Eusabinus Bunau

This research, entitled “Correlation Between Vocabulary Mastery and Fluency in Speaking,” was carried out with fifth-semester students of the English Education Study Program. The population of this research is the fifth-semester students of the English Education Study Program of the Teacher Training and Education Faculty at Tanjungpura University, with a sample size of 30. The data analysis yielded a correlation coefficient (r) of 0.19 between the two variables in the sample. This value shows that vocabulary mastery has a low correlation with fluency in speaking. The contribution of vocabulary mastery to fluency in speaking is 3.6%, which is almost non-existent. The hypothesis was tested by comparing the r value with the r table value, with 28 degrees of freedom (df = n-2) and a 1% level of significance. The r value (0.19) is lower than the r table value (0.463). This means that the alternative hypothesis (Ha) is rejected and the null hypothesis (Ho) is accepted. Given these results, students should improve their speaking ability and learn more vocabulary to become more fluent and better speakers. The writer hopes this research may be beneficial to readers and might lead to further research on different aspects and with better concepts.
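The reported figures can be checked with a few lines of arithmetic. The sketch below assumes a two-tailed test and takes the critical t value for df = 28 at the 1% level (2.763) from a standard t table. The variable names are illustrative and not from the study.

```python
from math import sqrt

r = 0.19    # reported correlation coefficient
n = 30      # reported sample size
df = n - 2  # degrees of freedom, 28

# Tabulated two-tailed critical t for df = 28 at the 1% level
# (an assumption: the abstract does not say one- or two-tailed).
T_CRIT_01_DF28 = 2.763

# Coefficient of determination, as a percentage of shared variance.
r_squared_pct = r ** 2 * 100

# t statistic corresponding to the observed correlation.
t_stat = r * sqrt(df) / sqrt(1 - r ** 2)

# Critical r recovered from the critical t: r = t / sqrt(t^2 + df).
r_crit = T_CRIT_01_DF28 / sqrt(T_CRIT_01_DF28 ** 2 + df)

print(round(r_squared_pct, 1), round(t_stat, 2), round(r_crit, 3))
```

Under these assumptions the computed values agree with the reported 3.6% contribution and the r table value of 0.463, and the observed t statistic falls well below the critical value.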


2019 ◽  
Vol 3 (Supplement_1) ◽  
pp. S773-S773
Author(s):  
Christopher Brydges ◽  
Allison A Bielak

Abstract Objective: Non-significant p values derived from null hypothesis significance testing do not distinguish between true null effects or cases where the data are insensitive in distinguishing the hypotheses. This study aimed to investigate the prevalence of Bayesian analyses in gerontological psychology, a statistical technique that can distinguish between conclusive and inconclusive non-significant results, by using Bayes factors (BFs) to reanalyze non-significant results from published gerontological research. Method: Non-significant results mentioned in abstracts of articles published in 2017 volumes of ten top gerontological psychology journals were extracted (N = 409) and categorized based on whether Bayesian analyses were conducted. BFs were calculated from non-significant t-tests within this sample to determine how frequently the null hypothesis was strongly supported. Results: Non-significant results were directly tested with Bayes factors in 1.22% of studies. Bayesian reanalyses of 195 non-significant t-tests found that only 7.69% of the findings provided strong evidence in support of the null hypothesis. Conclusions: Bayesian analyses are rarely used in gerontological research, and a large proportion of null findings were deemed inconclusive when reanalyzed with BFs. Researchers are encouraged to use BFs to test the validity of non-significant results, and ensure that sufficient sample sizes are used so that the meaningfulness of null findings can be evaluated.


2020 ◽  
Vol 35 (4) ◽  
pp. 364-371 ◽  
Author(s):  
Richard J. Salway ◽  
Trenika Williams ◽  
Camilo Londono ◽  
Patricia Roblin ◽  
Kristi Koenig ◽  
...  

Abstract Introduction: Physicians’ management of hazardous material (HAZMAT) incidents requires personal protective equipment (PPE) utilization to ensure the safety of victims, facilities, and providers; therefore, providing effective and accessible training in its use is crucial. While an emphasis has been placed on the importance of PPE, there is debate about the most effective training methods. Circumstances may not allow for a traditional in-person demonstration; an accessible video training may provide a useful alternative. Hypothesis: Video training of Emergency Medicine (EM) residents in the donning and doffing of Level C PPE is more effective than in-person training. Null Hypothesis: Video training of EM residents in the donning and doffing of Level C PPE is equally effective compared with in-person training. Methods: A randomized, controlled pilot trial was performed with 20 EM residents as part of their annual Emergency Preparedness training. Residents were divided into four groups, with Group 1 and Group 2 viewing a demonstration video developed by the Emergency Preparedness Team (EPT) and Group 3 and Group 4 receiving the standard in-person demonstration training by an EPT member. The groups then separately performed a donning and doffing simulation while blinded evaluators assessed critical tasks utilizing a prepared evaluation tool. At the drill’s conclusion, all participants also completed a self-evaluation survey about their subjective interpretations of their respective trainings. Results: Both video and in-person training modalities showed significant overall improvement in participants’ confidence in doffing and donning PPE equipment (P <.05). However, no statistically significant difference was found in the number of failed critical tasks in donning or doffing between the training modalities (P >.05). Based on these results, the null hypothesis cannot be rejected. However, these results were limited by the small sample size and the study was not sufficiently powered to show a difference between training modalities. Conclusion: In this pilot study, video and in-person training were equally effective in training for donning and doffing Level C PPE, with similar error rates in both modalities. Further research into this subject with an appropriately powered study is warranted to determine whether this equivalence persists using a larger sample size.

