Key steps to avoiding artistry with significance tests

2017 ◽  
Author(s):  
C Patrick Doncaster ◽  
Thomas H G Ezard

Statistical significance provides evidence for or against an explanation of a population of interest, not a description of data sampled from the population. This simple distinction gets ignored in hundreds of thousands of research publications yearly, which confuse statistical with biological significance by referring to hypothesis-testing analyses as demonstrating significant results. Here we identify three key steps to objective reporting of evidence-based analyses. Firstly, by interpreting P-values correctly as explanation not description, authors set their inferences in the context of the design of the study and its purpose to test for effects of biologically relevant size; nowhere in this process is it informative to use the word ‘significant’. Secondly, empirical effect sizes demand interpretation with respect to a size of relevance to the test hypothesis. Thirdly, even without an a priori expectation of biological relevance, authors can and should interpret significance tests with respect to effects of reliably detectable size.
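The third step, interpreting tests with respect to effects of reliably detectable size, can be sketched numerically. The function below is an illustration using the standard normal approximation for a two-sample comparison, not code from the paper; it assumes SciPy is available and the sample sizes are hypothetical.

```python
from scipy.stats import norm

def min_detectable_d(n_per_group, alpha=0.05, power=0.80):
    """Smallest standardized mean difference (Cohen's d) a two-sample
    test can reliably detect, via the normal approximation:
    d_min = (z_{1-alpha/2} + z_{power}) * sqrt(2 / n)."""
    z_alpha = norm.ppf(1 - alpha / 2)
    z_power = norm.ppf(power)
    return (z_alpha + z_power) * (2 / n_per_group) ** 0.5

# With 20 subjects per group, only fairly large effects are
# reliably detectable (d of roughly 0.89 at 80% power):
print(round(min_detectable_d(20), 2))
```

A reader can then judge whether a non-significant result reflects no effect or simply a design too small to detect effects of relevant size.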


Author(s):  
H. S. Styn ◽  
S. M. Ellis

The determination of the significance of differences in means and of relationships between variables is important in many empirical studies. Usually only statistical significance is reported, which does not necessarily indicate an important (practically significant) difference or relationship. In studies based on probability samples, effect size indices should be reported in addition to statistical significance tests in order to comment on practical significance. Where complete populations or convenience samples are used, the determination of statistical significance is, strictly speaking, no longer relevant, while effect size indices can still serve as a basis for judging significance. This article examines the use of effect size indices to establish practical significance, shows how these indices are utilized in a few fields of statistical application, and notes how they receive attention in the statistical literature and in computer packages. The use of effect sizes is illustrated with a few examples from the research literature.
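As an illustration of an effect size index used to judge practical significance, the sketch below computes Cohen's d with a pooled standard deviation. This is a generic example using Python's standard library, not code from the article, and the sample data are hypothetical.

```python
from statistics import mean, stdev

def cohens_d(group1, group2):
    """Standardized mean difference using the pooled standard
    deviation, a common effect size index for practical significance."""
    n1, n2 = len(group1), len(group2)
    s1, s2 = stdev(group1), stdev(group2)
    pooled_sd = (((n1 - 1) * s1**2 + (n2 - 1) * s2**2)
                 / (n1 + n2 - 2)) ** 0.5
    return (mean(group1) - mean(group2)) / pooled_sd

# Two hypothetical samples with similar spread but shifted means:
a = [4.1, 5.0, 4.6, 5.3, 4.8]
b = [3.6, 4.2, 3.9, 4.5, 4.0]
d = cohens_d(a, b)
```

Unlike a p-value, d does not shrink or grow with sample size, which is what makes it usable even for complete populations or convenience samples.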


1998 ◽  
Vol 15 (2) ◽  
pp. 103-118 ◽  
Author(s):  
Vinson H. Sutlive ◽  
Dale A. Ulrich

The unqualified use of statistical significance tests for interpreting the results of empirical research has been called into question by researchers in a number of behavioral disciplines. This paper reviews what statistical significance tells us and what it does not, with particular attention paid to criticisms of using the results of these tests as the sole basis for evaluating the overall significance of research findings. In addition, implications for adapted physical activity research are discussed. Based on the recent literature of other disciplines, several recommendations for evaluating and reporting research findings are made. They include calculating and reporting effect sizes, selecting an alpha level larger than the conventional .05 level, placing greater emphasis on replication of results, evaluating results in a sample size context, and employing simple research designs. Adapted physical activity researchers are encouraged to use specific modifiers when describing findings as significant.
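The recommendation to evaluate results in a sample-size context can be illustrated with a short sketch: the same raw difference and spread yield very different p-values at different sample sizes. The numbers are hypothetical and SciPy is assumed to be available.

```python
from scipy.stats import ttest_ind_from_stats

# Identical group means and SDs (the same raw effect), evaluated
# at two different per-group sample sizes:
pvals = {}
for n in (20, 500):
    t, p = ttest_ind_from_stats(mean1=10.3, std1=1.0, nobs1=n,
                                mean2=10.0, std2=1.0, nobs2=n)
    pvals[n] = p
    print(f"n = {n:3d} per group: p = {p:.4f}")
```

The small study is non-significant and the large one highly significant, even though the estimated effect is identical, which is why significance alone is a poor basis for judging findings.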


2005 ◽  
Vol 77 (1) ◽  
pp. 45-76 ◽  
Author(s):  
Lee-Ann C. Hayek ◽  
W. Ronald Heyer

Several analytic techniques have been used to determine sexual dimorphism in vertebrate morphological measurement data with no emergent consensus on which technique is superior. A further confounding problem for frog data is the existence of considerable measurement error. To determine dimorphism, we examine a single hypothesis (H0: equal means) for two groups (females and males). We demonstrate that frog measurement data meet assumptions for clearly defined statistical hypothesis testing with statistical linear models rather than those of exploratory multivariate techniques such as principal components, correlation, or correspondence analysis. In order to distinguish biological from statistical significance of hypotheses, we propose a new protocol that incorporates measurement error and effect size. Measurement error is evaluated with a novel measurement error index. Effect size, widely used in the behavioral sciences and in meta-analysis studies in biology, proves to be the most useful single metric to evaluate whether statistically significant results are biologically meaningful. Definitions for a range of small, medium, and large effect sizes specifically for frog measurement data are provided. Examples with measurement data for species of the frog genus Leptodactylus are presented. The new protocol is recommended not only to evaluate sexual dimorphism for frog data but for any animal measurement data for which the measurement error index and observed or a priori effect sizes can be calculated.
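For illustration only, a minimal classifier of effect-size magnitude. The cut-offs shown are Cohen's conventional behavioral-science benchmarks; the paper itself derives frog-specific small, medium, and large definitions that are not reproduced here.

```python
def effect_magnitude(d, benchmarks=(0.2, 0.5, 0.8)):
    """Classify a standardized effect size against small/medium/large
    cut-offs. Defaults are Cohen's conventional benchmarks; domain-
    specific cut-offs (as derived in the paper) can be passed instead."""
    small, medium, large = benchmarks
    d = abs(d)
    if d < small:
        return "negligible"
    if d < medium:
        return "small"
    if d < large:
        return "medium"
    return "large"

print(effect_magnitude(0.65))  # medium
```

Passing different benchmarks is the key design point: the classification only means something relative to cut-offs justified for the data at hand.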


2013 ◽  
Vol 36 (1) ◽  
pp. 33-36
Author(s):  
A. Martínez–Abraín
Hypothesis testing is commonly used in ecology and conservation biology as a tool to test statistical-population parameter properties against null hypotheses. The tool was first developed by lab biologists and statisticians to deal with experimental data for which the magnitude of biologically relevant effects was known beforehand. This often makes it inadequate in ecology, because field ecologists usually deal with observational data and seldom know the magnitude of biologically relevant effects. That gap precludes using hypothesis testing in the correct way, which is to pose informed null hypotheses and use a priori power tests to calculate the necessary sample sizes. Instead, we are forced to use null hypotheses of zero effect, which are of little use to field ecologists because we know beforehand that zero effects do not exist in nature. This is why ecologists seek only ‘positive’ (statistically significant) results: negative results always derive from a lack of power to detect small (usually biologically irrelevant) effects. Despite this, ‘negative’ results should be published, as they are important within the context of meta-analysis (which accounts for uncertainty by weighting individual studies by sample size) and allow proper decision-making. The use of multiple hypothesis testing and Bayesian statistics puts an end to this black-or-white dichotomy and moves us towards a more realistic continuum of grey tones.
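An a priori power test of the kind described, calculating the necessary sample size for the smallest biologically relevant effect, can be sketched as follows. This is a generic normal-approximation illustration assuming SciPy, not the author's code.

```python
from math import ceil
from scipy.stats import norm

def n_per_group(d, alpha=0.05, power=0.80):
    """A priori sample size per group for a two-sample comparison,
    given the smallest biologically relevant standardized effect d
    (normal approximation)."""
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    return ceil(2 * (z / d) ** 2)

# A medium effect (d = 0.5) needs about 63 observations per group
# at the conventional alpha = 0.05 and 80% power:
print(n_per_group(0.5))
```

The calculation only works when d comes from prior biological knowledge, which is exactly the information the abstract argues field ecologists usually lack.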


Beverages ◽  
2020 ◽  
Vol 6 (2) ◽  
pp. 35 ◽  
Author(s):  
Beth Desira ◽  
Shaun Watson ◽  
George Van Doorn ◽  
Justin Timora ◽  
Charles Spence

Our emotions influence our perception. To determine whether emotion influences the perception of beer, 32 participants watched either a scene from the movie WALL-E to induce joviality, or a short clip from The Shawshank Redemption to induce sadness. The participants were then required to sample up to 250 mL of Yenda Pale Ale beer and rate it on a variety of taste and flavor characteristics (e.g., bitterness), before completing the Positive and Negative Affect Schedule-X (PANAS-X). The data were analyzed using Bayesian t-tests and Null Hypothesis Significance Tests (NHSTs). After applying conservative corrections for multiple comparisons, NHSTs failed to reach statistical significance. However, the effect sizes suggested that inducing joviality, relative to inducing sadness, resulted in the beer being rated as (a) tasting more pleasant, (b) tasting sweeter, and (c) being of higher quality. Following the induction of joviality, participants were also willing to pay more for the beer. The Bayesian analyses indicated that induced emotion can influence flavor perception for complex taste stimuli. The effect sizes and Bayesian analyses are interpreted in terms of Feelings-as-Information theory. These preliminary findings can tentatively be applied to real-world environments such as venues that serve and/or market alcohol.
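One conservative correction for multiple comparisons of the kind applied in such studies is the Bonferroni adjustment. The sketch below uses hypothetical p-values, not the study's data.

```python
def bonferroni(p_values, alpha=0.05):
    """Conservative correction for multiple comparisons: each raw
    p-value is compared against alpha / m, where m is the number
    of tests performed."""
    m = len(p_values)
    threshold = alpha / m
    return [(p, p < threshold) for p in p_values]

# Hypothetical raw p-values for several rating scales; with four
# tests the threshold drops to 0.0125, so only the first comparison
# survives correction:
results = bonferroni([0.012, 0.034, 0.048, 0.210])
```

This illustrates how results that look significant one at a time can fail to reach significance after correction, which is why the abstract falls back on effect sizes and Bayesian analyses.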


Author(s):  
Scott B. Morris ◽  
Arash Shokri

To understand and communicate research findings, it is important for researchers to consider two types of information provided by research results: the magnitude of the effect and the degree of uncertainty in the outcome. Statistical significance tests have long served as the mainstream method for statistical inferences. However, the widespread misinterpretation and misuse of significance tests has led critics to question their usefulness in evaluating research findings and to raise concerns about the far-reaching effects of this practice on scientific progress. An alternative approach involves reporting and interpreting measures of effect size along with confidence intervals. An effect size is an indicator of magnitude and direction of a statistical observation. Effect size statistics have been developed to represent a wide range of research questions, including indicators of the mean difference between groups, the relative odds of an event, or the degree of correlation among variables. Effect sizes play a key role in evaluating practical significance, conducting power analysis, and conducting meta-analysis. While effect sizes summarize the magnitude of an effect, the confidence intervals represent the degree of uncertainty in the result. By presenting a range of plausible alternate values that might have occurred due to sampling error, confidence intervals provide an intuitive indicator of how strongly researchers should rely on the results from a single study.
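Reporting an effect size together with a confidence interval, as the abstract recommends, can be sketched as follows. This is an illustration assuming SciPy, with hypothetical group data.

```python
from statistics import mean, stdev
from scipy.stats import t

def mean_diff_ci(group1, group2, confidence=0.95):
    """Mean difference between two groups with a t-based confidence
    interval: the magnitude of the effect plus an intuitive range of
    plausible values under sampling error."""
    n1, n2 = len(group1), len(group2)
    diff = mean(group1) - mean(group2)
    pooled_var = ((n1 - 1) * stdev(group1)**2 +
                  (n2 - 1) * stdev(group2)**2) / (n1 + n2 - 2)
    se = (pooled_var * (1 / n1 + 1 / n2)) ** 0.5
    margin = t.ppf(1 - (1 - confidence) / 2, n1 + n2 - 2) * se
    return diff, (diff - margin, diff + margin)

diff, (lo, hi) = mean_diff_ci([5.1, 4.8, 5.6, 5.0],
                              [4.2, 4.5, 4.0, 4.4])
```

A wide interval signals that the single-study estimate should be relied on cautiously, which is the uncertainty information a bare p-value hides.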


1983 ◽  
Vol 20 (2) ◽  
pp. 122-133 ◽  
Author(s):  
Alan G. Sawyer ◽  
J. Paul Peter

Classical statistical significance testing is the primary method by which marketing researchers empirically test hypotheses and draw inferences about theories. The authors discuss the interpretation and value of classical statistical significance tests and suggest that classical inferential statistics may be misinterpreted and overvalued by marketing researchers in judging research results. Replication, Bayesian hypothesis testing, meta-analysis, and strong inference are examined as approaches for augmenting conventional statistical analyses.
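Meta-analysis, one of the augmenting approaches the authors examine, pools effect sizes across studies with inverse-variance weights so that more precise studies contribute more. A minimal fixed-effect sketch with hypothetical inputs, not code from the article:

```python
def fixed_effect_meta(effects, standard_errors):
    """Fixed-effect meta-analysis: pool study effect sizes with
    inverse-variance weights. Studies with smaller standard errors
    (typically larger samples) receive more weight."""
    weights = [1 / se**2 for se in standard_errors]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    pooled_se = (1 / sum(weights)) ** 0.5
    return pooled, pooled_se

# Three hypothetical replications of the same comparison:
pooled, se = fixed_effect_meta([0.40, 0.25, 0.55], [0.20, 0.10, 0.30])
```

The pooled standard error is smaller than any single study's, which is how replication plus meta-analysis yields stronger inference than one significance test alone.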

