Spatial Confidence Sets for Raw Effect Size Images

2019 ◽  
Author(s):  
Alexander Bowring ◽  
Fabian Telschow ◽  
Armin Schwartzman ◽  
Thomas E. Nichols

Abstract
The mass-univariate approach for functional magnetic resonance imaging (fMRI) analysis remains a widely used and fundamental statistical tool within neuroimaging. However, this method suffers from at least two fundamental limitations. First, with sample sizes growing to 4, 5 or even 6 digits, the entire approach is undermined by the null hypothesis fallacy: with a sufficiently large sample there is enough statistical power to reject the null hypothesis everywhere, making it difficult if not impossible to localize effects of interest. Second, at any sample size, when cluster-size inference is used a significant p-value only indicates that a cluster is larger than expected by chance, and no notion of spatial uncertainty is provided. Therefore, no statement of confidence is available about the size or location of a cluster that could be expected under repeated sampling from the population.

In this work, we address these issues by extending a method proposed by Sommerfeld, Sain, and Schwartzman (2018) to develop spatial Confidence Sets (CSs) on clusters found in thresholded raw effect size maps. While hypothesis testing indicates where the null, i.e. a raw effect size of zero, can be rejected, the CSs give statements on the locations where raw effect sizes exceed, and fall short of, a non-zero threshold, providing both an upper and a lower CS. While the method can be applied to any parameter in a mass-univariate General Linear Model, we motivate it in the context of BOLD fMRI contrast maps for inference on percentage BOLD change raw effects. We propose several theoretical and practical implementation advancements over the original method in order to deliver improved performance in small-sample settings. We validate the method with 3D Monte Carlo simulations that resemble fMRI data.
Finally, we compute CSs for the Human Connectome Project working memory task contrast images, illustrating the brain regions that show a reliable %BOLD change for a given %BOLD threshold.

2020 ◽  
Author(s):  
Alexander Bowring ◽  
Fabian Telschow ◽  
Armin Schwartzman ◽  
Thomas E. Nichols

Abstract
Current statistical inference methods for task-fMRI suffer from two fundamental limitations. First, the focus is solely on detection of non-zero signal or signal change, a problem that is exacerbated for large-scale studies (e.g. UK Biobank, N = 40,000+), where the ‘null hypothesis fallacy’ causes even trivial effects to be declared significant. Second, for any sample size, widely used cluster inference methods only indicate regions where a null hypothesis can be rejected, without providing any notion of spatial uncertainty about the activation.

In this work, we address these issues by developing spatial Confidence Sets (CSs) on clusters found in thresholded Cohen’s d effect size images. We produce an upper and a lower CS to make confidence statements about brain regions where Cohen’s d effect sizes have exceeded and fallen short of a non-zero threshold, respectively. The CSs convey information about the magnitude and reliability of effect sizes that is usually given separately in a t-statistic map and an effect estimate map. We expand the theory developed in our previous work on CSs for %BOLD change effect maps (Bowring et al., 2019) using recent results from the bootstrapping literature. By assessing the empirical coverage with 2D and 3D Monte Carlo simulations resembling fMRI data, we find our method is accurate in sample sizes as low as N = 60. We compute Cohen’s d CSs for the Human Connectome Project working memory task-fMRI data, illustrating the brain regions with a reliable Cohen’s d response for a given threshold. By comparing the CSs with results obtained from traditional voxelwise statistical inference, we highlight the improvement in activation localization that can be gained with the Confidence Sets.


2017 ◽  
Vol 6 (6) ◽  
pp. 158
Author(s):  
Louis Mutter ◽  
Steven B. Kim

There are numerous statistical hypothesis tests for categorical data, including Pearson's chi-square goodness-of-fit test and other discrete versions of goodness-of-fit tests. For these hypothesis tests, the null hypothesis is simple, and the alternative hypothesis is the composite negation of the simple null hypothesis. For a power calculation, a researcher specifies a significance level, a sample size, a simple null hypothesis, and a simple alternative hypothesis. In practice, there are cases when an experienced researcher has deep and broad scientific knowledge but suffers from a lack of statistical power because only a small sample size is available. In such a case, we may formulate the hypothesis test against a simple alternative hypothesis instead of the composite alternative hypothesis. In this article, we investigate how much statistical power can be gained via a correctly specified simple alternative hypothesis, and how much statistical power can be lost under a misspecified alternative hypothesis, particularly when the available sample size is small.
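The power gain studied here can be sketched with a small Monte Carlo experiment (a hypothetical setup, not the authors' code): under a multinomial model, a Neyman-Pearson likelihood-ratio test aimed at one simple alternative is compared against Pearson's chi-square test of the composite alternative. The category probabilities, sample size, significance level and replicate count below are all illustrative assumptions.

```python
import math
import random

random.seed(0)
p0 = [1/3, 1/3, 1/3]      # simple null hypothesis
p1 = [0.5, 0.3, 0.2]      # simple alternative the researcher believes in
n, alpha, reps = 20, 0.05, 5000
chi2_crit = 5.991          # chi-square 0.95 quantile, df = 2

def draw_counts(p):
    """One multinomial draw of n observations into 3 categories."""
    counts = [0, 0, 0]
    for _ in range(n):
        counts[random.choices(range(3), weights=p)[0]] += 1
    return counts

def chi2_stat(counts):
    """Pearson chi-square statistic against the simple null p0."""
    return sum((o - n * e) ** 2 / (n * e) for o, e in zip(counts, p0))

def lr_stat(counts):
    """Log-likelihood ratio log L(p1)/L(p0); large values favour p1."""
    return sum(o * math.log(q1 / q0) for o, q1, q0 in zip(counts, p1, p0))

# Calibrate the simple-alternative test's cutoff under the null by simulation.
null_lr = sorted(lr_stat(draw_counts(p0)) for _ in range(reps))
lr_crit = null_lr[int((1 - alpha) * reps)]

alt = [draw_counts(p1) for _ in range(reps)]
power_chi2 = sum(chi2_stat(c) > chi2_crit for c in alt) / reps
power_np = sum(lr_stat(c) > lr_crit for c in alt) / reps
print(f"chi-square power: {power_chi2:.3f}, simple-alternative power: {power_np:.3f}")
```

When the simple alternative is specified correctly, as here, the targeted test is appreciably more powerful at the same significance level; misspecifying p1 erodes that advantage, which is the trade-off the article quantifies.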


1990 ◽  
Vol 47 (1) ◽  
pp. 2-15 ◽  
Author(s):  
Randall M. Peterman

Ninety-eight percent of recently surveyed papers in fisheries and aquatic sciences that did not reject some null hypothesis (H0) failed to report β, the probability of making a type II error (not rejecting H0 when it should have been), or statistical power (1 – β). However, 52% of those papers drew conclusions as if H0 were true. A false H0 could have been missed because of a low-power experiment, caused by small sample size or large sampling variability. Costs of type II errors can be large (for example, for cases that fail to detect harmful effects of some industrial effluent or a significant effect of fishing on stock depletion). Past statistical power analyses show that abundance estimation techniques usually have high β and that only large effects are detectable. I review relationships among β, power, detectable effect size, sample size, and sampling variability. I show how statistical power analysis can help interpret past results and improve designs of future experiments, impact assessments, and management regulations. I make recommendations for researchers and decision makers, including routine application of power analysis, more cautious management, and reversal of the burden of proof to put it on industry, not management agencies.
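The relationships Peterman reviews among β, power, effect size and sample size can be illustrated with a standard normal-approximation power formula for a two-sided, two-sample comparison of means (a sketch, not from the paper; the effect sizes and per-group sample sizes are illustrative):

```python
from statistics import NormalDist

z = NormalDist()

def approx_power(d, n_per_group, alpha=0.05):
    """Normal-approximation power (1 - beta) of a two-sided two-sample test
    of means, for standardized effect size d and equal group sizes."""
    z_alpha = z.inv_cdf(1 - alpha / 2)
    noncentrality = d * (n_per_group / 2) ** 0.5
    return 1 - z.cdf(z_alpha - noncentrality)

for d in (0.2, 0.5, 0.8):            # small, medium, large effects
    for n in (10, 30, 100):
        print(f"d={d}, n={n} per group: power = {approx_power(d, n):.2f}")
```

The printout makes Peterman's point concrete: with small samples or small effects, β is large and only large effects are detectable, so a non-significant result is weak evidence that H0 is true.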


2005 ◽  
Vol 62 (12) ◽  
pp. 2716-2726 ◽  
Author(s):  
Michael J Bradford ◽  
Josh Korman ◽  
Paul S Higgins

There is considerable uncertainty about the effectiveness of fish habitat restoration programs, and reliable monitoring programs are needed to evaluate them. Statistical power analysis based on traditional hypothesis tests is usually used for monitoring program design, but here we argue that effect size estimates and their associated confidence intervals are more informative, because results can be compared with both the null hypothesis of no effect and effect sizes of interest, such as restoration goals. We used a stochastic simulation model to compare alternative monitoring strategies for a habitat alteration that would change the productivity and capacity of a coho salmon (Oncorhynchus kisutch) producing stream. Estimates of the effect size using a freshwater stock–recruit model were more precise than those from monitoring the abundance of either spawners or smolts. Less-than-ideal monitoring programs can produce ambiguous results, in which the confidence interval includes both the null hypothesis and the effect size of interest. Our model is a useful planning tool because it allows evaluation of the utility of different types of monitoring data, which should stimulate discussion on how the results will ultimately inform decision-making.


2019 ◽  
Author(s):  
Rob Cribbie ◽  
Nataly Beribisky ◽  
Udi Alter

Many bodies recommend that a sample planning procedure, such as a traditional NHST a priori power analysis, be conducted during the planning stages of a study. Power analysis allows the researcher to estimate how many participants are required to detect a minimally meaningful effect size at a specified level of power and Type I error rate. However, the procedure has several drawbacks that render it “a mess.” Specifically, identifying the minimally meaningful effect size is often difficult yet unavoidable if the procedure is to be conducted properly, the procedure is not precision oriented, and it does not guide the researcher to collect as many participants as feasibly possible. In this study, we explore how these three theoretical issues are reflected in applied psychological research, in order to better understand whether they are concerns in practice. To investigate how power analysis is currently used, we reviewed the reporting of 443 power analyses in high-impact psychology journals in 2016 and 2017. We found that researchers rarely use the minimally meaningful effect size as the rationale for the effect chosen in a power analysis. Further, precision-based approaches and collecting the maximum feasible sample size are almost never used in tandem with power analyses. In light of these findings, we suggest that researchers focus on tools beyond traditional power analysis when planning samples, such as collecting the maximum sample size feasible.
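The a priori power analysis being critiqued reduces to solving for the per-group sample size needed to detect a minimally meaningful standardized effect d at given power and α. A minimal sketch for a two-sided, two-sample comparison of means under a normal approximation (the d values are illustrative assumptions, not prescriptions):

```python
from math import ceil
from statistics import NormalDist

def required_n(d, power=0.80, alpha=0.05):
    """Per-group n for a two-sided two-sample test of means
    (normal approximation): n = 2 * ((z_{1-alpha/2} + z_{power}) / d)^2."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_beta) / d) ** 2)

# The whole calculation hinges on the chosen 'minimally meaningful' d:
for d in (0.2, 0.4, 0.8):
    print(f"d = {d}: n = {required_n(d)} per group")
```

The loop shows why misjudging the minimally meaningful effect size matters so much: halving d roughly quadruples the required sample, which is exactly the sensitivity that makes the procedure fragile in practice.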


2021 ◽  
Vol 3 (1) ◽  
pp. 61-89
Author(s):  
Stefan Geiß

Abstract
This study uses Monte Carlo simulation techniques to estimate the minimum required levels of intercoder reliability in content analysis data for testing correlational hypotheses, depending on sample size, effect size and coder behavior under uncertainty. The resulting procedure is analogous to power calculations for experimental designs. In the most widespread sample size/effect size settings, the rule of thumb that chance-adjusted agreement should be ≥ .800 or ≥ .667 corresponds to the simulation results, yielding acceptable α and β error rates. However, the simulation allows precise power calculations that consider the specifics of each study’s context, moving beyond one-size-fits-all recommendations. Studies with low sample sizes and/or low expected effect sizes may need coder agreement above .800 to test a hypothesis with sufficient statistical power; in studies with high sample sizes and/or high expected effect sizes, coder agreement below .667 may suffice. Such calculations can help both in evaluating and in designing studies. Particularly in pre-registered research, higher sample sizes may be used to compensate for low expected effect sizes and/or borderline coding reliability (e.g. when constructs are hard to measure). I supply equations, easy-to-use tables and R functions to facilitate use of this framework, along with example code as an online appendix.


2020 ◽  
Author(s):  
Chia-Lung Shih ◽  
Te-Yu Hung

Abstract
Background: A small sample size (n < 30 per treatment group) is usually enrolled to investigate differences in efficacy between treatments for knee osteoarthritis (OA). The objective of this study was to use simulation to compare the power of four statistical methods for the analysis of small samples in detecting differences in efficacy between two treatments for knee OA.
Methods: A total of 10,000 replicates at each of 5 sample sizes (n = 10, 15, 20, 25, and 30 per group) were generated based on previously reported measures of treatment efficacy. Four statistical methods were used to compare the differences in efficacy between treatments: the two-sample t-test (t-test), the Mann-Whitney U-test (M-W test), the Kolmogorov-Smirnov test (K-S test), and the permutation test (perm-test).
Results: The bias of the simulated parameter means decreased with sample size, but the CV% of the simulated parameter means varied with sample size for all parameters. At the largest sample size (n = 30), the CV% reached a small level (< 20%) for almost all parameters, but the bias did not. Among the non-parametric tests for the analysis of small samples, the perm-test had the highest statistical power, and its false positive rate was not affected by sample size. However, the power of the perm-test did not reach a high value (80%) even at the largest sample size (n = 30).
Conclusion: The perm-test is suggested for the analysis of small samples when comparing differences in efficacy between two treatments for knee OA.
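A minimal sketch of a two-sample permutation test of the kind the study recommends: the observed difference in group means is compared against its distribution under random reshuffling of the group labels. The efficacy scores below are made up for illustration, not the study's data.

```python
import random

random.seed(1)
treatment = [5.1, 6.3, 4.8, 7.0, 5.9, 6.4, 5.5, 6.8, 6.1, 5.7]  # n = 10
control   = [4.2, 5.0, 3.9, 4.6, 5.2, 4.4, 4.9, 4.1, 5.3, 4.5]  # n = 10

def perm_test(x, y, reps=10000):
    """Two-sided permutation p-value for the difference in means."""
    observed = abs(sum(x) / len(x) - sum(y) / len(y))
    pooled = x + y
    hits = 0
    for _ in range(reps):
        random.shuffle(pooled)                      # reassign group labels
        new_x, new_y = pooled[:len(x)], pooled[len(x):]
        diff = abs(sum(new_x) / len(x) - sum(new_y) / len(y))
        if diff >= observed:
            hits += 1
    return (hits + 1) / (reps + 1)                  # add-one correction

p = perm_test(treatment, control)
print(f"permutation p-value: {p:.4f}")
```

Because the null distribution is built from the data themselves, the test makes no normality assumption and its false positive rate stays near nominal even at these small group sizes, which is the property the study exploits.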


2005 ◽  
Vol 35 (1) ◽  
pp. 1-20 ◽  
Author(s):  
G. K. Huysamen

Criticisms of traditional null hypothesis significance testing (NHST) became more pronounced during the 1960s and reached a climax during the past decade. Among other shortcomings, NHST says nothing about the size of the population parameter of interest, and its result is influenced by sample size. Estimation of confidence intervals around point estimates of the relevant parameters, model fitting, and Bayesian statistics represent some major departures from conventional NHST. Testing non-nil null hypotheses, determining the optimal sample size to uncover only substantively meaningful effect sizes, and reporting effect-size estimates may be regarded as minor extensions of NHST. Although there seems to be growing support for the estimation of confidence intervals around point estimates of the relevant parameters, it is unlikely that NHST-based procedures will disappear in the near future. In the meantime, it is widely accepted that effect-size estimates should be reported as a mandatory adjunct to conventional NHST results.


2020 ◽  
pp. 28-63
Author(s):  
A. G. Vinogradov

The article belongs to a special modern genre of scholarly publication, the so-called tutorial: an article devoted to presenting the latest methods of design, modeling or analysis in an accessible format in order to disseminate best practices. The article acquaints Ukrainian psychologists with the basics of using the R programming language for the analysis of empirical research data. It discusses the current state of world psychology in connection with the crisis of confidence that arose due to the low reproducibility of empirical research. This problem is caused by the poor quality of psychological measurement tools, insufficient attention to adequate sample planning, typical statistical hypothesis testing practices, and so-called “questionable research practices.” The tutorial demonstrates methods for determining the sample size depending on the expected magnitude of the effect size and the desired statistical power, and for performing basic variable transformations and statistical analysis of psychological research data using the R language and environment. It presents the minimal set of R functions required to carry out a modern analysis of the reliability of measurement scales, sample size calculation, and point and interval estimation of effect size for the four designs most widespread in psychology for analyzing the interdependence of two variables. These typical problems include finding differences between the means and variances of two or more samples, and correlations between continuous and categorical variables. Practical information on data preparation, import, basic transformations, and the application of basic statistical methods in the cloud version of RStudio is provided.

