Including Data Analytical Stability in Cluster-based Inference

2019 ◽  
Author(s):  
Sanne P. Roels ◽  
Tom Loeys ◽  
Beatrijs Moerkerke

Abstract: In the statistical analysis of functional Magnetic Resonance Imaging (fMRI) brain data, it remains a challenge to account for simultaneously testing activation in over 100,000 volume units, or voxels. A popular method that reduces the dimensionality of this testing problem is cluster-based inference. We propose a new testing procedure that controls the family-wise error (FWE) rate at the cluster level but improves cluster-based test decisions in two ways: (1) it takes into account a measure of data analytical stability, and (2) it allows a more voxel-based interpretation of the results. For each voxel, we define the re-selection rate conditional on a given FWE-corrected threshold and incorporate this rate, which is a measure of stability, into the selection process. In our procedure, we set a more liberal and a more conservative FWE-controlling threshold. Clusters that survive the liberal but not the conservative threshold are retained if sufficient evidence of voxelwise stability is available. Clusters that survive the conservative threshold are retained regardless, and clusters that do not survive the liberal threshold are not considered further. Using Human Connectome Project data (Van Essen et al., 2012), we demonstrate how, in a group analysis, our method results not only in a higher number of selected voxels but also in a larger overlap between different test images. Additionally, we demonstrate the ability of our procedure to control the FWE rate, even in relatively small samples.
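To make the decision rule concrete, here is a minimal Python sketch of the dual-threshold selection described above, assuming the liberal and conservative FWE-corrected thresholds act on cluster extent and that voxelwise stability is summarized by the mean re-selection rate within a cluster. The function name, the extent-based interpretation, and the stability cutoff are illustrative assumptions, not the authors' implementation.

```python
# A sketch of the liberal/conservative cluster-retention rule, under the
# assumptions stated above; not the authors' code.
import numpy as np
from scipy import ndimage

def select_clusters(stat_map, forming_thr, liberal_extent, conservative_extent,
                    reselection_rate, stability_cutoff=0.5):
    """Apply the dual-threshold rule to a voxelwise statistic map.

    stat_map            : 3D array of test statistics
    forming_thr         : cluster-forming height threshold
    liberal_extent      : liberal FWE-corrected cluster-extent threshold
    conservative_extent : conservative FWE-corrected cluster-extent threshold
    reselection_rate    : 3D array of per-voxel re-selection rates (stability)
    stability_cutoff    : required mean stability for borderline clusters
                          (hypothetical value, for illustration only)
    """
    labels, n_clusters = ndimage.label(stat_map > forming_thr)
    keep = np.zeros(stat_map.shape, dtype=bool)
    for c in range(1, n_clusters + 1):
        voxels = labels == c
        size = voxels.sum()
        if size >= conservative_extent:
            keep |= voxels               # survives conservative: always retain
        elif (size >= liberal_extent
              and reselection_rate[voxels].mean() >= stability_cutoff):
            keep |= voxels               # liberal-only, but stable: retain
        # clusters below the liberal threshold are not considered further
    return keep
```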

2020 ◽  
Author(s):  
Philip A. Kragel ◽  
Xiaochun Han ◽  
Thomas Edward Kraynak ◽  
Peter J. Gianaros ◽  
Tor D. Wager

Elliot and colleagues (2020) systematically evaluated the reliability of individual differences in task-based fMRI activity and found reliability to be poor. Here we demonstrate that task-based fMRI can be quite reliable, and that the small sample sizes, task types, and dated region-of-interest measures used in Elliot et al. lead to an overly negative picture. We show evidence from recent studies using multivariate models in larger samples, which have short-term test-retest reliability in the “excellent” range (ICC > 0.75). These include 8 fMRI studies of pain and a large study of affective images (N > 300). In addition, while some use cases for biomarkers require reliable individual differences, others do not: they require only that fMRI measures serve as reliable indicators of the presence of a mental state or event, which we term ‘task reliability’. In a re-analysis of the Human Connectome Project data reported in Elliot et al., we show excellent task reliability across roughly 4 months. Despite difficulties with some experimental paradigms and measurement models, the future is bright for fMRI research focused on biomarker development.
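For readers unfamiliar with the reliability metric at issue, the following sketch computes a Shrout–Fleiss ICC(2,1), a standard two-way random-effects, absolute-agreement intraclass correlation of the kind cited above; the data layout (rows = subjects, columns = sessions) and the simulated example are assumptions for illustration.

```python
# A minimal ICC(2,1) implementation from the two-way ANOVA decomposition.
import numpy as np

def icc_2_1(scores):
    """ICC(2,1) for an (n_subjects, k_sessions) array of scores."""
    n, k = scores.shape
    grand = scores.mean()
    row_means = scores.mean(axis=1)                         # per subject
    col_means = scores.mean(axis=0)                         # per session
    bms = k * np.sum((row_means - grand) ** 2) / (n - 1)    # between subjects
    jms = n * np.sum((col_means - grand) ** 2) / (k - 1)    # between sessions
    sse = np.sum((scores - row_means[:, None]
                  - col_means[None, :] + grand) ** 2)
    ems = sse / ((n - 1) * (k - 1))                         # residual
    return (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)

# Example: ICC > 0.75 is conventionally labeled "excellent".
rng = np.random.default_rng(0)
true_scores = rng.normal(size=(100, 1))
sessions = true_scores + 0.3 * rng.normal(size=(100, 2))    # two test sessions
print(round(icc_2_1(sessions), 2))
```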


2018 ◽  
Author(s):  
Stephan Geuter ◽  
Guanghao Qi ◽  
Robert C. Welsh ◽  
Tor D. Wager ◽  
Martin A. Lindquist

Abstract: Multi-subject functional magnetic resonance imaging (fMRI) analysis is often concerned with determining whether there exists a significant population-wide ‘activation’ in a comparison between two or more conditions. Typically, this is assessed by testing the average value of a contrast of parameter estimates (COPE) against zero in a general linear model (GLM) analysis. In this work we investigate several aspects of this type of analysis. First, we study the effects of sample size on the sensitivity and reliability of the group analysis, allowing us to evaluate the ability of small-sample studies to effectively capture population-level effects of interest. Second, we assess the difference in sensitivity and reliability when using volumetric or surface-based data. Third, we investigate potential biases in estimating effect sizes as a function of sample size. To perform this analysis we utilize the task-based fMRI data from the 500-subject release of the Human Connectome Project (HCP). We treat the complete collection of subjects (N = 491) as our population of interest and perform a single-subject analysis on each subject in the population. We investigate the ability to recover population-level effects using a subset of the population and standard analytical techniques. Our study shows that sample sizes of 40 are generally able to detect regions with high effect sizes (Cohen’s d > 0.8), while sample sizes closer to 80 are required to reliably recover regions with medium effect sizes (0.5 < d < 0.8). We find little difference in results when using volumetric or surface-based data with respect to standard mass-univariate group analysis. Finally, we conclude that special care is needed when estimating effect sizes, particularly for small sample sizes.
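The core resampling logic can be sketched as follows: draw subsets of a given size from a "population" of subject-level contrast estimates, test each voxel against zero, and compare the apparent effect size at significant voxels with the population value. The array shapes, threshold, and function name below are illustrative assumptions, not the HCP pipeline itself.

```python
# A sketch of subsampling a COPE population and measuring effect-size bias
# at significant voxels; simulated data stand in for real contrasts.
import numpy as np
from scipy import stats

def subsample_effects(copes, n_sub, alpha=0.001, n_draws=100, rng=None):
    """copes: (n_population, n_voxels) array of per-subject contrasts."""
    rng = rng or np.random.default_rng(0)
    pop_d = copes.mean(0) / copes.std(0, ddof=1)    # population Cohen's d
    biases = []
    for _ in range(n_draws):
        idx = rng.choice(len(copes), size=n_sub, replace=False)
        sub = copes[idx]
        t, p = stats.ttest_1samp(sub, 0.0)          # per-voxel one-sample test
        d = sub.mean(0) / sub.std(0, ddof=1)        # sample Cohen's d
        sig = p < alpha
        if sig.any():                               # bias among detected voxels
            biases.append((d[sig] - pop_d[sig]).mean())
    return np.mean(biases)

rng = np.random.default_rng(0)
copes = rng.normal(0.2, 1.0, size=(491, 5000))      # weak true effect everywhere
print(round(subsample_effects(copes, n_sub=40, rng=rng), 3))
```

In small subsamples, only voxels whose sample effect happens to be inflated clear the significance threshold, so the printed bias is positive, consistent with the caution about effect-size estimation in the abstract.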


Econometrics ◽  
2019 ◽  
Vol 7 (1) ◽  
pp. 12
Author(s):  
Karl-Heinz Schild ◽  
Karsten Schweikert

This paper investigates the properties of tests for asymmetric long-run adjustment, which are often applied in empirical studies of asymmetric price transmission. We show that substantial size distortions are caused by preconditioning the test on finding sufficient evidence for cointegration in a first step. The extent to which the test for long-run asymmetry is oversized depends inversely on the power of the primary cointegration test. Hence, tests for long-run asymmetry become invalid in cases of small sample sizes or slow speeds of adjustment. Further, we provide simulation evidence that tests for long-run asymmetry are generally oversized if the threshold parameter is estimated by conditional least squares, and we show that bootstrap techniques can be used to obtain the correct size.
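The preconditioning problem can be illustrated with a small Monte Carlo sketch: under a symmetric data-generating process, an F-test for asymmetric adjustment in a threshold (TAR) error-correction equation is applied only to samples that first pass an Engle-Granger cointegration test, and the conditional rejection rate is recorded. The DGP parameters, sample size, and nominal levels below are illustrative assumptions, not the paper's simulation design.

```python
# A sketch of the two-step testing sequence under a symmetric DGP.
import numpy as np
import statsmodels.api as sm
from statsmodels.tsa.stattools import coint

rng = np.random.default_rng(1)
T, reps = 100, 500
passed = rejected = 0
for _ in range(reps):
    x = np.cumsum(rng.normal(size=T))              # I(1) regressor
    u = np.zeros(T)
    for t in range(1, T):                          # symmetric AR(1) errors,
        u[t] = 0.9 * u[t - 1] + rng.normal()       # i.e. slow adjustment
    y = x + u                                      # cointegrated, no asymmetry
    if coint(y, x)[1] > 0.05:                      # step 1: cointegration pretest
        continue
    passed += 1
    resid = sm.OLS(y, sm.add_constant(x)).fit().resid
    du, lag = np.diff(resid), resid[:-1]
    pos = (lag > 0).astype(float)                  # TAR split at threshold 0
    X = np.column_stack([pos * lag, (1 - pos) * lag])
    fit = sm.OLS(du, X).fit()
    if fit.f_test("x1 = x2").pvalue < 0.05:        # step 2: asymmetry F-test
        rejected += 1

# Conditional size of the asymmetry test; by the paper's argument this
# exceeds the nominal 5% when the pretest has low power.
print(rejected / max(passed, 1))
```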


2018 ◽  
Author(s):  
Prathiba Natesan ◽  
Smita Mehta

Single case experimental designs (SCEDs) have become an indispensable methodology where randomized controlled trials may be impossible or even inappropriate. However, the nature of SCED data presents challenges for both visual and statistical analyses. Small sample sizes, autocorrelation, data types, and design types render many parametric statistical analyses and maximum likelihood approaches ineffective. The presence of autocorrelation also decreases interrater reliability in visual analysis. The purpose of the present study is to demonstrate a newly developed model, the Bayesian unknown change-point (BUCP) model, which overcomes all of the above-mentioned data analytic challenges. This is the first study to formulate and demonstrate a rate ratio effect size for autocorrelated data, which had remained an open question in SCED research until now. This expository study also compares and contrasts the results from the BUCP model with visual analysis, and the rate ratio effect size with the nonoverlap of all pairs (NAP) effect size. Data from a comprehensive behavioral intervention are used for the demonstration.
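As a gloss on the change-point machinery (not the authors' BUCP model, which additionally handles autocorrelation), here is a minimal conjugate Gamma-Poisson sketch: it places a posterior over the unknown change-point in a count series and reports a plug-in rate ratio effect size. The priors and the example series are assumptions for illustration.

```python
# A simplified Bayesian unknown-change-point sketch for count data.
import numpy as np
from scipy.special import gammaln

def log_marginal(y, a=1.0, b=1.0):
    """Log marginal likelihood of Poisson counts under a Gamma(a, b) prior."""
    n, s = len(y), y.sum()
    return (a * np.log(b) - gammaln(a) + gammaln(a + s)
            - (a + s) * np.log(b + n) - gammaln(y + 1).sum())

def change_point_posterior(y):
    """Posterior over the change-point tau (first index of phase 2)."""
    taus = np.arange(1, len(y))
    logp = np.array([log_marginal(y[:t]) + log_marginal(y[t:]) for t in taus])
    p = np.exp(logp - logp.max())
    return taus, p / p.sum()

# Hypothetical baseline vs. intervention counts from a SCED-like series:
y = np.array([8, 7, 9, 8, 10, 3, 2, 4, 3, 2])
taus, post = change_point_posterior(y)
tau = taus[post.argmax()]
rate_ratio = y[tau:].mean() / y[:tau].mean()   # plug-in rate ratio effect size
print(tau, round(rate_ratio, 2))
```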


2018 ◽  
Author(s):  
Christopher Chabris ◽  
Patrick Ryan Heck ◽  
Jaclyn Mandart ◽  
Daniel Jacob Benjamin ◽  
Daniel J. Simons

Williams and Bargh (2008) reported that holding a hot cup of coffee caused participants to judge a person’s personality as warmer, and that holding a therapeutic heat pad caused participants to choose rewards for other people rather than for themselves. These experiments featured large effects (r = .28 and .31), small sample sizes (41 and 53 participants), and barely statistically significant results. We attempted to replicate both experiments in field settings with more than triple the sample sizes (128 and 177) and double-blind procedures, but found near-zero effects (r = –.03 and .02). In both cases, Bayesian analyses suggest there is substantially more evidence for the null hypothesis of no effect than for the original physical warmth priming hypothesis.
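One generic way to quantify such evidence for the null is the BIC approximation to the Bayes factor for a correlation (Wagenmakers, 2007); the sketch below uses that approximation, not the specific Bayesian analysis reported in the study.

```python
# BIC-approximate Bayes factor favoring the null (rho = 0) over rho != 0.
import numpy as np

def bf01_correlation(r, n):
    # BIC1 - BIC0 = n*ln(1 - r^2) + ln(n); BF01 = exp((BIC1 - BIC0) / 2),
    # which simplifies to sqrt(n) * (1 - r^2)^(n/2).
    return np.sqrt(n) * (1 - r ** 2) ** (n / 2)

# Near-zero replication effects in moderate samples favor the null:
print(round(bf01_correlation(-0.03, 128), 1))   # replication of experiment 1
print(round(bf01_correlation(0.02, 177), 1))    # replication of experiment 2
```

Both values come out around 10 or more in favor of the null, illustrating the direction of the Bayesian result described in the abstract.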


Animals ◽  
2021 ◽  
Vol 11 (1) ◽  
pp. 75
Author(s):  
Álvaro Navarro-Castilla ◽  
Mario Garrido ◽  
Hadas Hawlena ◽  
Isabel Barja

The study of endocrine status can be useful for understanding wildlife responses to a changing environment. Here, we validated an enzyme immunoassay (EIA) to non-invasively monitor adrenocortical activity by measuring fecal corticosterone metabolites (FCM) in three sympatric gerbil species (Gerbillus andersoni, G. gerbillus, and G. pyramidum) from the sands of the Northwestern Negev Desert (Israel). Animals in the treatment groups were injected with adrenocorticotropic hormone (ACTH) to stimulate adrenocortical activity, while control groups received a saline solution. Feces were collected at different intervals, and FCM were quantified by EIA. Basal FCM levels were similar in the three species. The ACTH effect was evident, but the timing of peak FCM concentrations differed between the species (6–24 h post-injection). Furthermore, FCM peaks were observed sooner in G. andersoni females than in males (6 h and 18 h post-injection, respectively). G. andersoni and G. gerbillus males in the control groups also showed increased FCM levels (18 h and 48 h post-injection, respectively). Despite the small sample sizes, our results confirmed the suitability of the EIA for analyzing FCM in these species as a reliable indicator of adrenocortical activity. This study also revealed that closely related species, and individuals within a species, can respond differently to the same stressor.


2021 ◽  
Vol 224 ◽  
pp. 108731
Author(s):  
Guangfei Li ◽  
Yu Chen ◽  
Thang M. Le ◽  
Simon Zhornitsky ◽  
Wuyi Wang ◽  
...  

2021 ◽  
Vol 11 (6) ◽  
pp. 497
Author(s):  
Yoonsuk Jung ◽  
Eui Im ◽  
Jinhee Lee ◽  
Hyeah Lee ◽  
Changmo Moon

Previous studies have evaluated the effects of antithrombotic agents on the performance of fecal immunochemical tests (FITs) for the detection of colorectal cancer (CRC), but the results were inconsistent and based on small sample sizes. We studied this topic using a large-scale population-based database. Using the Korean National Cancer Screening Program Database, we compared the performance of FITs for CRC detection between users and non-users of antiplatelet agents and warfarin. Non-users were matched according to age and sex. Among 5,426,469 eligible participants, 768,733 used antiplatelet agents (mono/dual/triple therapy, n = 701,683/63,211/3,839), and 19,569 used warfarin, while 4,638,167 were non-users. Among antiplatelet agents, aspirin, clopidogrel, and cilostazol ranked first, second, and third, respectively, in prescription rates. Users of antiplatelet agents (3.62% vs. 4.45%; relative risk (RR): 0.83; 95% confidence interval (CI): 0.78–0.88), aspirin (3.66% vs. 4.13%; RR: 0.90; 95% CI: 0.83–0.97), and clopidogrel (3.48% vs. 4.88%; RR: 0.72; 95% CI: 0.61–0.86) had lower positive predictive values (PPVs) for CRC detection than non-users. However, there were no significant differences in PPV between cilostazol users and non-users, or between warfarin users and non-users. For PPV, the RR (users vs. non-users) for antiplatelet monotherapy was 0.86, while the RRs for dual and triple antiplatelet therapies (excluding cilostazol) were 0.67 and 0.22, respectively. For all antithrombotic agents, the sensitivity for CRC detection did not differ between users and non-users. Use of antiplatelet agents other than cilostazol may thus increase false positives without improving the sensitivity of FITs for CRC detection.
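The comparison reported above reduces to PPVs in two groups and their ratio with a standard log-normal 95% confidence interval. The sketch below computes these quantities; the counts are hypothetical, chosen only to roughly match the reported PPVs, and are not the study's data.

```python
# PPV of a positive FIT in users vs. non-users and the relative risk (RR).
import numpy as np

def ppv_rr(cancer_u, pos_u, cancer_n, pos_n):
    """PPV among FIT-positive users/non-users and RR (users vs. non-users)."""
    ppv_u, ppv_n = cancer_u / pos_u, cancer_n / pos_n
    rr = ppv_u / ppv_n
    # standard error of log(RR) for a ratio of two proportions
    se = np.sqrt((1 - ppv_u) / cancer_u + (1 - ppv_n) / cancer_n)
    lo, hi = np.exp(np.log(rr) + np.array([-1.96, 1.96]) * se)
    return ppv_u, ppv_n, rr, (lo, hi)

# Hypothetical counts: CRC cases among FIT-positive users vs. non-users
print(ppv_rr(cancer_u=362, pos_u=10_000, cancer_n=445, pos_n=10_000))
```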

