Significance Testing
Recently Published Documents

TOTAL DOCUMENTS: 695 (five years: 131)
H-INDEX: 54 (five years: 6)

Author(s): Xiaojuan Sun, Siyan Li, Julian X. L. Wang, Panxing Wang, Dong Guo

Author(s): Daniel Berner, Valentin Amrhein

A paradigm shift away from null hypothesis significance testing seems in progress. Based on simulations, we illustrate some of the underlying motivations. First, P-values vary strongly from study to study; hence, dichotomous inference using significance thresholds is usually unjustified. Second, statistically significant results overestimate effect sizes, a bias that declines with increasing statistical power. Third, statistically non-significant results underestimate effect sizes, and this bias grows stronger with higher statistical power. Fourth, the tested statistical hypotheses generally lack biological justification and are often uninformative. Despite these problems, a screen of 48 papers from the 2020 volume of the Journal of Evolutionary Biology shows that significance testing is still used almost universally in evolutionary biology. All screened studies tested the default null hypothesis of zero effect with the default significance threshold of p = 0.05, none presented a pre-planned alternative hypothesis, and none calculated statistical power or the probability of ‘false negatives’ (beta error). The papers reported 49 significance tests on average. Of 41 papers that contained verbal descriptions of a ‘statistically non-significant’ result, 26 (63%) falsely claimed the absence of an effect. We conclude that our studies in ecology and evolutionary biology are mostly exploratory and descriptive. We should thus shift from claiming to “test” specific hypotheses statistically to describing and discussing the many hypotheses (effect sizes) that are most compatible with our data, given our statistical model. We already have the means to do so, because we routinely present compatibility (“confidence”) intervals covering these hypotheses.
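A minimal Python sketch (ours, not the authors' simulation code; the true effect, per-group sample size, and number of replicate studies are arbitrary assumptions) reproduces the first three phenomena: p-values vary widely across replicate studies, significant results overestimate the true effect, and non-significant results underestimate it.

```python
# Simulate many replicate two-group studies with a fixed true effect and
# compare effect-size estimates conditional on statistical significance.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
true_effect = 0.3          # true standardized mean difference (assumed)
n_per_group = 50           # per-group sample size -> moderate power
n_studies = 10_000

pvals = np.empty(n_studies)
estimates = np.empty(n_studies)
for i in range(n_studies):
    a = rng.normal(0.0, 1.0, n_per_group)
    b = rng.normal(true_effect, 1.0, n_per_group)
    t, p = stats.ttest_ind(b, a)
    pvals[i] = p
    estimates[i] = b.mean() - a.mean()   # estimated effect size

sig = pvals < 0.05
print(f"power (share significant):       {sig.mean():.2f}")
print(f"mean estimate, all studies:      {estimates.mean():.3f}")
print(f"mean estimate, significant only: {estimates[sig].mean():.3f}")   # inflated
print(f"mean estimate, non-significant:  {estimates[~sig].mean():.3f}")  # deflated
```

With this configuration, power is roughly 0.3, so the conditional over- and underestimation are pronounced; increasing n_per_group shrinks the inflation among significant results, as the abstract notes.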


2021, Vol 15
Author(s): Ruslan Masharipov, Irina Knyazeva, Yaroslav Nikolaev, Alexander Korotkov, Michael Didur, et al.

Classical null hypothesis significance testing is limited to the rejection of the point-null hypothesis; it does not allow the interpretation of non-significant results. This leads to a bias against the null hypothesis. Herein, we discuss statistical approaches to ‘null effect’ assessment, focusing on Bayesian parameter inference (BPI). Although Bayesian methods have been theoretically elaborated and implemented in common neuroimaging software packages, they are not widely used for ‘null effect’ assessment. BPI considers the posterior probability of finding the effect within or outside the region of practical equivalence to the null value. With a single decision rule, it can be used to find both ‘activated/deactivated’ and ‘not activated’ voxels, or to indicate that the obtained data are insufficient. It also allows one to evaluate the data as the sample size increases and to stop the experiment if the obtained data are sufficient to make a confident inference. To demonstrate the advantages of using BPI for group analysis of fMRI data, we compare it with classical null hypothesis significance testing on empirical data. We also use simulated data to show how BPI performs under different effect sizes, noise levels, noise distributions and sample sizes. Finally, we consider the problem of defining the region of practical equivalence for BPI and discuss possible applications of BPI in fMRI studies. To facilitate ‘null effect’ assessment for fMRI practitioners, we provide a Statistical Parametric Mapping 12 (SPM12)-based toolbox for Bayesian inference.
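The ROPE-based decision rule described above can be sketched for a single effect with a normal posterior. This is a conceptual illustration only: the authors' toolbox is SPM12-based and operates voxel-wise, and the function name, ROPE half-width, posterior parameters, and 0.95 threshold below are assumptions made for the example.

```python
# Conceptual sketch of the single decision rule: classify an effect as
# 'activated'/'deactivated', 'not activated' (practically null), or
# 'insufficient data' from posterior mass inside/outside the ROPE.
from scipy import stats

def rope_decision(post_mean, post_sd, rope_halfwidth=0.1, threshold=0.95):
    """Decision from a normal posterior over a single effect (illustrative)."""
    post = stats.norm(post_mean, post_sd)
    p_inside = post.cdf(rope_halfwidth) - post.cdf(-rope_halfwidth)
    p_positive = 1.0 - post.cdf(rope_halfwidth)
    p_negative = post.cdf(-rope_halfwidth)
    if p_positive >= threshold:
        return "activated", p_positive
    if p_negative >= threshold:
        return "deactivated", p_negative
    if p_inside >= threshold:
        return "not activated (practically null)", p_inside
    return "data insufficient for confident inference", max(p_inside, p_positive, p_negative)

# A narrow posterior near zero supports the null; a wide one stays undecided.
for mean, sd in [(0.02, 0.03), (0.4, 0.1), (0.2, 0.2)]:
    label, prob = rope_decision(mean, sd)
    print(f"posterior N({mean}, {sd}): {label} (p = {prob:.3f})")
```

Because the "insufficient" outcome is explicit, the same rule supports the sequential use described in the abstract: keep collecting data until one of the three probabilities crosses the threshold.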


2021, Vol 1 (3)
Author(s): Rosie Twomey, Vanessa Yingling, Joe Warne, Christoph Schneider, Christopher McCrum, et al.

Scientists rely on an accurate scientific literature to build and test new theories about the natural world. In the past decade, observational studies of the scientific literature have indicated that numerous questionable research practices and poor reporting practices may be hindering scientific progress. In particular, three recent studies have indicated an implausibly high rate of studies with positive (i.e., hypothesis-confirming) results. In sports medicine, a field closely related to kinesiology, studies that tested a hypothesis indicated support for their primary hypothesis ~70% of the time. However, a study of journals covering the entire field of kinesiology has yet to be completed, and the quality of other reporting practices, such as clinical trial registration, has not been evaluated. In this study, we retrospectively evaluated 300 original research articles from the flagship journals of North America (Medicine and Science in Sports and Exercise), Europe (European Journal of Sport Science), and Australia (Journal of Science and Medicine in Sport). The hypothesis testing rate (~64%) and positive result rate (~81%) were much lower than what has been reported in other fields (e.g., psychology), and there was only weak evidence for our hypothesis that the positive result rate exceeded 80%. However, the positive result rate is still considered unreasonably high. Additionally, most studies did not report trial registration and rarely included accessible data, indicating rather poor reporting practices. The majority of studies (~92%) relied upon significance testing, but it was more concerning that a majority (~82%) of studies without a stated hypothesis still relied upon significance testing. Overall, the positive result rate in kinesiology is unacceptably high, despite being lower than in other fields such as psychology, and most published manuscripts demonstrated subpar reporting practices.
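As a hedged illustration of the "weak evidence" claim, a one-sided binomial test against an 80% baseline might look like the sketch below; the counts are inferred from the stated ~64% and ~81% rates, not taken from the paper's data.

```python
# One-sided binomial test: is the observed positive-result rate above 80%?
# Counts are approximations inferred from the abstract, not the exact data.
from scipy import stats

n_hypothesis_tests = 192    # ~64% of the 300 screened articles (assumed)
n_positive = 156            # ~81% of those reported support (assumed)

result = stats.binomtest(n_positive, n_hypothesis_tests, p=0.80,
                         alternative='greater')
print(f"observed positive rate: {n_positive / n_hypothesis_tests:.2%}")
# A large p-value here is consistent with the reported weak evidence.
print(f"one-sided p-value vs. 80% baseline: {result.pvalue:.3f}")
```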


2021, Vol 8 (12)
Author(s): David Robert Grimes, James Heathers

A concerning amount of biomedical research is not reproducible. Unreliable results impede empirical progress in medical science, ultimately putting patients at risk. Many proximal causes of this irreproducibility have been identified, a major one being inappropriate statistical methods and analytical choices by investigators. Within this context, we formally quantify the impact of inappropriate redaction of data beyond a threshold value in biomedical science. Such redaction is effectively truncation of a dataset by removal of extreme data points, and we elucidate its potential to accidentally or deliberately engineer a spurious result in significance testing. We demonstrate that the removal of a surprisingly small number of data points can dramatically alter a result. It is unknown how often redaction bias occurs in the broader literature, but given the risk of distortion involved, we suggest that it must be studiously avoided and mitigated with approaches that counteract any potential malign effects on the quality of medical research.
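A minimal simulation (ours, not the authors' analysis; group sizes and the random seed are arbitrary) shows how greedily redacting the most inconvenient observations under a true null effect can manufacture significance with only a handful of removals.

```python
# Redaction bias demo: under a true null, repeatedly drop the control
# observation that most opposes the current mean difference until the
# two-sample t-test crosses p < 0.05, and count the removals needed.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
control = rng.normal(0.0, 1.0, 40)
treated = rng.normal(0.0, 1.0, 40)      # the true effect is zero

removed = 0
p = stats.ttest_ind(treated, control).pvalue
print(f"full data: p = {p:.3f}")
while p >= 0.05 and control.size > 2:
    # Drop the control point that most opposes the current direction of
    # the difference, pushing the group means further apart.
    idx = np.argmax(control) if treated.mean() > control.mean() else np.argmin(control)
    control = np.delete(control, idx)
    removed += 1
    p = stats.ttest_ind(treated, control).pvalue
print(f"'significant' (p = {p:.3f}) after redacting {removed} of 40 control points")
```

Typically only a few percent of the sample must be "trimmed" before a null result turns nominally significant, which is the distortion the abstract warns against.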


2021
Author(s): Erik Otarola-Castillo, Meissa G. Torquato, Caitlin E. Buck

Archaeologists often use data and quantitative statistical methods to evaluate their ideas. Although there are various statistical frameworks for decision-making in archaeology and in science in general, in this chapter we provide a simple explanation of Bayesian statistics. To contextualize the Bayesian statistical framework, we briefly compare it to the more widespread null hypothesis significance testing (NHST) approach. We also provide a simple example to illustrate how archaeologists use data and the Bayesian framework to compare hypotheses and evaluate their uncertainty. We then review how archaeologists have applied Bayesian statistics to solve research problems related to radiocarbon dating and chronology, as well as to lithic, ceramic, zooarchaeological, bioarchaeological, and spatial analyses. Because recent work has already reviewed Bayesian applications in archaeology from the 1990s up to 2017, this chapter focuses on the relevant literature published since 2017.
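To make the contrast with NHST concrete, here is a toy beta-binomial example in the spirit of the chapter's approach; the sherd counts, the Beta(1, 1) prior, and the 0.25 cutoff are invented for illustration, not taken from the chapter.

```python
# Toy Bayesian hypothesis comparison: given 14 decorated sherds out of 40,
# compare H1 (decoration rate > 0.25) against H2 (rate <= 0.25) directly
# from the posterior, rather than rejecting a point null.
from scipy import stats

decorated, total = 14, 40                 # invented excavation counts
prior_a, prior_b = 1, 1                   # uniform Beta(1, 1) prior
posterior = stats.beta(prior_a + decorated, prior_b + total - decorated)

p_h1 = 1.0 - posterior.cdf(0.25)          # posterior probability of H1
print(f"posterior mean rate:        {posterior.mean():.3f}")
print(f"P(rate > 0.25 | data)  = {p_h1:.3f}")
print(f"P(rate <= 0.25 | data) = {1 - p_h1:.3f}")
lo, hi = posterior.ppf([0.025, 0.975])    # 95% credible interval
print(f"95% credible interval: ({lo:.3f}, {hi:.3f})")
```

Unlike an NHST p-value, the output is a direct probability statement about each hypothesis given the data, which is the interpretive advantage the chapter emphasizes.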


2021, pp. 161-210
Author(s): Alan Agresti, Maria Kateri
