Using confidence intervals to estimate the response of salmon populations (Oncorhynchus spp.) to experimental habitat alterations

2005 ◽  
Vol 62 (12) ◽  
pp. 2716-2726 ◽  
Author(s):  
Michael J Bradford ◽  
Josh Korman ◽  
Paul S Higgins

There is considerable uncertainty about the effectiveness of fish habitat restoration programs, and reliable monitoring programs are needed to evaluate them. Statistical power analysis based on traditional hypothesis tests is usually used for monitoring program design, but here we argue that effect size estimates and their associated confidence intervals are more informative because results can be compared with both the null hypothesis of no effect and effect sizes of interest, such as restoration goals. We used a stochastic simulation model to compare alternative monitoring strategies for a habitat alteration that would change the productivity and capacity of a coho salmon (Oncorhynchus kisutch)-producing stream. Estimates of the effect size using a freshwater stock–recruit model were more precise than those from monitoring the abundance of either spawners or smolts. Less-than-ideal monitoring programs can produce ambiguous results, that is, cases in which the confidence interval includes both the null hypothesis and the effect size of interest. Our model is a useful planning tool because it allows evaluation of the utility of different types of monitoring data, which should stimulate discussion of how the results will ultimately inform decision-making.
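The "ambiguous result" the authors describe can be sketched in a few lines: a confidence interval wide enough to cover both the null (zero effect) and the restoration goal supports neither conclusion. This is a minimal illustration, not the authors' simulation model; the function name and all numbers are hypothetical.

```python
def classify_ci(est, se, target, z=1.96):
    """Classify a monitoring result by its 95% confidence interval.

    A result is 'ambiguous' when the CI contains both the null
    (no effect, 0) and the effect size of interest (`target`,
    e.g. a restoration goal)."""
    lo, hi = est - z * se, est + z * se
    contains_null = lo <= 0.0 <= hi
    contains_target = lo <= target <= hi
    if contains_null and contains_target:
        return "ambiguous"
    if contains_target:
        return "supports restoration goal"
    if contains_null:
        return "consistent with no effect"
    return "effect differs from both null and goal"

# Hypothetical example: estimated 25% increase in smolt production,
# restoration goal of a 30% increase, but a noisy monitoring program
# (SE = 15%): the CI spans both 0 and 0.30, so the result is ambiguous.
print(classify_ci(0.25, 0.15, 0.30))
```

A more precise monitoring program (say SE = 2% for the same estimate) would instead exclude both the null and the goal, a conclusive if disappointing result.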


2005 ◽  
Vol 35 (1) ◽  
pp. 1-20 ◽  
Author(s):  
G. K. Huysamen

Criticisms of traditional null hypothesis significance testing (NHST) became more pronounced during the 1960s and reached a climax during the past decade. Among other criticisms, NHST says nothing about the size of the population parameter of interest, and its result is influenced by sample size. Estimation of confidence intervals around point estimates of the relevant parameters, model fitting, and Bayesian statistics represent some major departures from conventional NHST. Testing non-nil null hypotheses, determining optimal sample size to uncover only substantively meaningful effect sizes, and reporting effect-size estimates may be regarded as minor extensions of NHST. Although there seems to be growing support for the estimation of confidence intervals around point estimates of the relevant parameters, it is unlikely that NHST-based procedures will disappear in the near future. In the meantime, it is widely accepted that effect-size estimates should be reported as a mandatory adjunct to conventional NHST results.



2018 ◽  
Vol 108 (1) ◽  
pp. 15-22 ◽  
Author(s):  
David H. Gent ◽  
Paul D. Esker ◽  
Alissa B. Kriss

In null hypothesis testing, failure to reject a null hypothesis may have two potential interpretations. One interpretation is that the treatments being evaluated do not have a significant effect, and a correct conclusion was reached in the analysis. Alternatively, a treatment effect may have existed but the conclusion of the study was that there was none. This is termed a Type II error, which is most likely to occur when studies lack sufficient statistical power to detect a treatment effect. In basic terms, the power of a study is the ability to identify a true effect through a statistical test. The power of a statistical test is 1 – (the probability of Type II errors), and depends on the size of treatment effect (termed the effect size), variance, sample size, and significance criterion (the probability of a Type I error, α). Low statistical power is prevalent in scientific literature in general, including plant pathology. However, power is rarely reported, creating uncertainty in the interpretation of nonsignificant results and potentially underestimating small, yet biologically significant relationships. The appropriate level of power for a study depends on the impact of Type I versus Type II errors and no single level of power is acceptable for all purposes. Nonetheless, by convention 0.8 is often considered an acceptable threshold and studies with power less than 0.5 generally should not be conducted if the results are to be conclusive. The emphasis on power analysis should be in the planning stages of an experiment. Commonly employed strategies to increase power include increasing sample sizes, selecting a less stringent threshold probability for Type I errors, increasing the hypothesized or detectable effect size, including as few treatment groups as possible, reducing measurement variability, and including relevant covariates in analyses. 
Power analysis will lead to more efficient use of resources and more precisely structured hypotheses, and may even indicate that some studies should not be undertaken. Moreover, adequately powered studies are less prone to erroneous conclusions and inflated estimates of treatment effectiveness, especially when effect sizes are small.
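The relationships the authors list, power as a function of effect size, sample size, and the significance criterion α, can be sketched with the standard normal approximation for a two-sided two-sample test. This is a generic illustration, not code from the article; `power_two_sample` and `n_for_power` are hypothetical names.

```python
from math import sqrt
from statistics import NormalDist

def power_two_sample(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-sample test via the
    normal approximation: power ~ Phi(d * sqrt(n/2) - z_{1-alpha/2}).
    `d` is the standardized effect size (Cohen's d)."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    ncp = d * sqrt(n_per_group / 2)   # noncentrality of the test statistic
    return NormalDist().cdf(ncp - z_crit)

def n_for_power(d, target=0.8, alpha=0.05):
    """Smallest per-group sample size reaching the target power,
    found by simple search (planning-stage use of power analysis)."""
    n = 2
    while power_two_sample(d, n, alpha) < target:
        n += 1
    return n

# A medium effect (d = 0.5) with 64 subjects per group lands near the
# conventional 0.8 power threshold the authors mention.
print(power_two_sample(0.5, 64), n_for_power(0.5))
```

The search in `n_for_power` also makes the authors' trade-offs concrete: raising the detectable effect size `d` or relaxing `alpha` shrinks the required sample.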



2019 ◽  
Author(s):  
Alexander Bowring ◽  
Fabian Telschow ◽  
Armin Schwartzman ◽  
Thomas E. Nichols

The mass-univariate approach for functional magnetic resonance imaging (fMRI) analysis remains a widely used and fundamental statistical tool within neuroimaging. However, this method suffers from at least two fundamental limitations: First, with sample sizes growing to 4, 5, or even 6 digits, the entire approach is undermined by the null hypothesis fallacy, i.e., with sufficient sample size there is high enough statistical power to reject the null hypothesis everywhere, making it difficult if not impossible to localize effects of interest. Second, with any sample size, when cluster-size inference is used, a significant p-value only indicates that a cluster is larger than chance, and no notion of spatial uncertainty is provided. Therefore, no measure of confidence is available to express the size or location of a cluster that could be expected with repeated sampling from the population. In this work, we address these issues by extending a method proposed by Sommerfeld, Sain, and Schwartzman (2018) to develop spatial Confidence Sets (CSs) on clusters found in thresholded raw effect size maps. While hypothesis testing indicates where the null, i.e., a raw effect size of zero, can be rejected, the CSs give statements on the locations where raw effect sizes exceed, and fall short of, a non-zero threshold, providing both an upper and a lower CS. While the method can be applied to any parameter in a mass-univariate General Linear Model, we motivate the method in the context of BOLD fMRI contrast maps for inference on percentage BOLD change raw effects. We propose several theoretical and practical implementation advancements to the original method in order to deliver improved performance in small-sample settings. We validate the method with 3D Monte Carlo simulations that resemble fMRI data.
Finally, we compute CSs for the Human Connectome Project working memory task contrast images, illustrating the brain regions that show a reliable %BOLD change for a given %BOLD threshold.



2015 ◽  
Vol 8 (3) ◽  
pp. 27-46 ◽  
Author(s):  
Arnoldo Téllez ◽  
Cirilo H. García ◽  
Victor Corral-Verdugo


2015 ◽  
Author(s):  
Cyril R Pernet

Although thoroughly criticized, null hypothesis significance testing (NHST) is the statistical method of choice in the biological, biomedical, and social sciences for investigating whether an effect is likely. In this short tutorial, I first summarize the concepts behind the method while pointing to common interpretation errors. I then present the related concepts of confidence intervals, effect size, and the Bayes factor, and discuss what should be reported in which context. The goal is to clarify concepts, present statistical issues that researchers face using the NHST framework, and highlight good practices.
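A minimal sketch of reporting the complementary quantities the tutorial covers, a p-value, a confidence interval, and a standardized effect size, for two independent groups, using a pooled-SD normal approximation. This is illustrative only, not code from the tutorial; `report` and the data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def report(a, b):
    """Return (p, ci, d) for the difference in means of two groups:
    a two-sided p-value (normal approximation), a 95% CI for the
    mean difference, and Cohen's d."""
    na, nb = len(a), len(b)
    diff = mean(a) - mean(b)
    # Pooled standard deviation across the two groups.
    sp = sqrt(((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2)
              / (na + nb - 2))
    se = sp * sqrt(1 / na + 1 / nb)
    z = diff / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))   # two-sided p-value
    ci = (diff - 1.96 * se, diff + 1.96 * se)
    d = diff / sp                            # Cohen's d
    return p, ci, d

# Hypothetical data: the p-value alone says 'unlikely under the null';
# the CI and d additionally convey the magnitude of the difference.
p, ci, d = report([5.1, 4.9, 5.3, 5.0, 5.2], [4.5, 4.7, 4.4, 4.6, 4.8])
```

Reporting all three, rather than the p-value alone, is precisely the practice the tutorial argues for.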



2016 ◽  
Vol 77 (4) ◽  
pp. 673-689 ◽  
Author(s):  
Rand R. Wilcox ◽  
Sarfaraz Serang

The article provides perspectives on p values, null hypothesis testing, and alternative techniques in light of modern robust statistical methods. Null hypothesis testing and p values can provide useful information provided they are interpreted in a sound manner, which includes taking into account insights and advances that have occurred during the past 50 years. There are, of course, limitations to what null hypothesis testing and p values reveal about data. But modern advances make it clear that there are serious limitations and concerns associated with conventional confidence intervals, standard Bayesian methods, and commonly used measures of effect size. Many of these concerns can be addressed using modern robust methods.
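One of the simplest robust alternatives discussed in this literature is the trimmed mean, which limits the influence of outliers and heavy tails on a location estimate. A minimal stdlib sketch, illustrative and not from the article:

```python
def trimmed_mean(x, prop=0.2):
    """20% trimmed mean (by default): a robust location estimate that
    discards the lowest and highest `prop` fraction of sorted values
    before averaging, limiting the influence of outliers."""
    xs = sorted(x)
    g = int(prop * len(xs))          # number trimmed from each tail
    kept = xs[g:len(xs) - g]
    return sum(kept) / len(kept)

# One outlier pulls the ordinary mean to ~2.95, while the trimmed
# mean stays near the bulk of the data (~2.22).
data = [2.1, 2.3, 2.2, 2.4, 2.0, 2.2, 2.3, 2.1, 2.2, 9.7]
print(sum(data) / len(data), trimmed_mean(data))
```

Trimmed means also underlie robust tests: for example, SciPy's `scipy.stats.ttest_ind` accepts a `trim` argument that yields Yuen's trimmed-mean t-test.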



1998 ◽  
Vol 21 (2) ◽  
pp. 169-194 ◽  
Author(s):  
Siu L. Chow

The null-hypothesis significance-test procedure (NHSTP) is defended in the context of the theory-corroboration experiment, as well as the following contrasts: (a) substantive hypotheses versus statistical hypotheses, (b) theory corroboration versus statistical hypothesis testing, (c) theoretical inference versus statistical decision, (d) experiments versus nonexperimental studies, and (e) theory corroboration versus treatment assessment. The null hypothesis can be true because it is the hypothesis that errors are randomly distributed in data. Moreover, the null hypothesis is never used as a categorical proposition. Statistical significance means only that chance influences can be excluded as an explanation of data; it does not identify the nonchance factor responsible. The experimental conclusion is drawn with the inductive principle underlying the experimental design. A chain of deductive arguments gives rise to the theoretical conclusion via the experimental conclusion. The anomalous relationship between statistical significance and the effect size often used to criticize NHSTP is more apparent than real. The absolute size of the effect is not an index of evidential support for the substantive hypothesis. Nor is the effect size, by itself, informative as to the practical importance of the research result. Being a conditional probability, statistical power cannot be the a priori probability of statistical significance. The validity of statistical power is debatable because statistical significance is determined with a single sampling distribution of the test statistic based on H0, whereas it takes two distributions to represent statistical power or effect size. Sample size should not be determined in the mechanical manner envisaged in power analysis. It is inappropriate to criticize NHSTP for nonstatistical reasons. 
At the same time, neither effect size, nor confidence interval estimate, nor posterior probability can be used to exclude chance as an explanation of data. Neither can any of them fulfill the nonstatistical functions expected of them by critics.



1989 ◽  
Vol 46 (7) ◽  
pp. 1183-1187 ◽  
Author(s):  
Randall M. Peterman

Nickelson (1986; Can. J. Fish. Aquat. Sci. 43: 527–535) was unable to reject the null hypothesis (Ho) of density-independent marine survival rate for Oregon coho salmon (Oncorhynchus kisutch) when wild, private hatchery, and public hatchery stocks were analyzed separately. Thus, even though there appears to have been no consistent increase in adult abundance in recent years in spite of large increases in smolt abundance, Nickelson's analysis does not support the alternative hypothesis (HA) of density-dependent marine survival. Some fishery managers are using Nickelson's results to support proposals to increase smolt production further. I calculated statistical power for these cases, i.e. the probability that the null hypothesis of density-independence could have been rejected, even if marine survival were truly density-dependent. Power was below 0.19 for all cases, which meant that Nickelson (1986) had at least an 81% chance of making a Type II error (incorrectly accepting Ho), if Ho was actually false. Therefore, Oregon fishery managers should be cautious about making decisions on increased smolt production based on current data; they run a high risk of mistakenly assuming density-independent marine survival. More generally, managers should not take action based on a failure to reject a null hypothesis unless power is high.
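Peterman's central caution, that failing to reject a null hypothesis is uninformative when power is low, can be illustrated with a generic Monte Carlo power estimate. This is a simplified one-sample stand-in, not his density-dependence analysis; the function name and all numbers are hypothetical.

```python
import random
from math import sqrt
from statistics import mean, stdev

def mc_power(effect, n, sigma, alpha_z=1.96, reps=2000, seed=1):
    """Monte Carlo estimate of power: the fraction of simulated
    experiments in which the null of zero effect is rejected by a
    two-sided test (normal critical value, for simplicity)."""
    random.seed(seed)
    hits = 0
    for _ in range(reps):
        xs = [random.gauss(effect, sigma) for _ in range(n)]
        z = mean(xs) / (stdev(xs) / sqrt(n))
        if abs(z) > alpha_z:
            hits += 1
    return hits / reps

# With a weak effect and a small sample, power is low: a nonsignificant
# result here says almost nothing about whether the effect is real.
print(mc_power(0.1, 10, 1.0))
```

With power this low, the Type II error rate is high whenever the effect exists, which is exactly why Peterman warns against acting on a failure to reject.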


