Statistical power: implications for planning MEG studies

2019 ◽  
Author(s):  
Maximilien Chaumon ◽  
Aina Puce ◽  
Nathalie George

Abstract
Statistical power is key for robust, replicable science. Here, we systematically explored how the numbers of trials and subjects affect statistical power in MEG sensor-level data. More specifically, we simulated “experiments” using the MEG resting-state dataset of the Human Connectome Project (HCP). We divided the data into two conditions, injected a dipolar source at a known anatomical location in the “signal condition” but not in the “noise condition”, and detected significant differences at the sensor level with classical paired t-tests across subjects. Group-level detectability of these simulated effects varied drastically with anatomical origin. We therefore examined in detail which spatial properties of the sources affected detectability, looking specifically at the distance from the closest sensor and the orientation of the source, and at the variability of these parameters across subjects. In line with previous single-subject studies, we found that the most detectable effects originate from source locations that are closest to the sensors and oriented tangentially with respect to the head surface. In addition, cross-subject variability in orientation also affected group-level detectability, boosting detection in regions where this variability was small and hindering detection in regions where it was large. Incidentally, we observed considerable covariation of source position, orientation, and their cross-subject variability in individual brain anatomical space, making it difficult to assess the impact of each of these variables independently of the others. We therefore also performed simulations in which we controlled the spatial properties independently of individual anatomy. These additional simulations confirmed the strong impact of distance and orientation and further showed that orientation variability across subjects affects detectability, whereas position variability does not.
Importantly, our study indicates that strict, unequivocal recommendations as to the ideal number of trials and subjects for any experiment cannot realistically be provided for neurophysiological studies. Rather, it highlights the importance of considering the spatial constraints on the expected sources of activity when designing experiments.
Highlights
Adequate sample size (number of subjects and trials) is key to robust neuroscience
We simulated evoked MEG experiments and examined sensor-level detectability
Statistical power varied with source distance, orientation, and between-subject variability
Consider source detectability at the sensor level when designing MEG studies
Sample size for MEG studies? Consider the source with the lowest expected statistical power
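The detection logic described in this abstract (inject an effect in one condition, compare conditions with paired t-tests across subjects, count how often the effect is detected) can be sketched in a few lines. The snippet below is an editorial illustration under simplified assumptions: a single sensor, Gaussian noise, and an `snr` parameter standing in for the combined effect of source depth, orientation, and their between-subject variability. It is not the authors' HCP pipeline, and all parameter values are invented.

```python
# Minimal sketch: simulate a "signal" and a "noise" condition per subject at one
# sensor, run a paired t-test across subjects, and estimate power as the fraction
# of simulated experiments reaching significance.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def simulated_power(n_subjects=25, n_trials=50, snr=0.3, n_sims=1000, alpha=0.05):
    hits = 0
    for _ in range(n_sims):
        # Trial-averaged sensor amplitude per subject, per condition.
        noise_cond = rng.normal(0.0, 1.0 / np.sqrt(n_trials), n_subjects)
        signal_cond = rng.normal(snr, 1.0 / np.sqrt(n_trials), n_subjects)
        t, p = stats.ttest_rel(signal_cond, noise_cond)
        hits += p < alpha
    return hits / n_sims

# Power rises with the number of subjects, the number of trials, and the
# effective SNR of the source at the sensors.
print(simulated_power(n_subjects=25), simulated_power(n_subjects=50))
```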

2016 ◽  
Author(s):  
Joke Durnez ◽  
Jasper Degryse ◽  
Beatrijs Moerkerke ◽  
Ruth Seurinck ◽  
Vanessa Sochat ◽  
...  

Highlights
The manuscript presents a method to calculate sample sizes for fMRI experiments
The power analysis is based on the estimation of the mixture distribution of null and active peaks
The methodology is validated with simulated and real data

Abstract
Mounting evidence over the last few years suggests that published neuroscience research suffers from low power, especially published fMRI experiments. Not only does low power decrease the chance of detecting a true effect, it also reduces the chance that a statistically significant result indicates a true effect (Ioannidis, 2005). Put another way, findings with the least power will be the least reproducible, and thus a (prospective) power analysis is a critical component of any paper. In this work we present a simple way to characterize the spatial signal in an fMRI study with just two parameters, and a direct way to estimate these two parameters from an existing study. Specifically, using just (1) the proportion of the brain activated and (2) the average effect size in activated brain regions, we can produce closed-form power calculations for a given sample size, brain volume, and smoothness. This procedure allows one to minimize the cost of an fMRI experiment while preserving a predefined statistical power. The method is evaluated and illustrated using simulations and real neuroimaging data from the Human Connectome Project. The procedures presented in this paper are made publicly available in an online web-based toolbox at www.neuropowertools.org.
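As a rough illustration of how an average effect size plus a multiple-testing threshold can drive a closed-form power estimate, the sketch below uses a Gaussian voxel-wise approximation with a Bonferroni-style correction. This is a simplified stand-in, not the peak-based mixture model implemented at www.neuropowertools.org, and the effect size and number of effective tests are assumed values.

```python
# Approximate average per-voxel power at truly active locations, given an
# effect size, a sample size, and a crude multiple-testing correction.
import numpy as np
from scipy import stats

def approx_power(effect_size, n_subjects, n_independent_tests, alpha=0.05):
    # Bonferroni-style threshold as a proxy for a smoothness/RFT correction.
    z_crit = stats.norm.isf(alpha / n_independent_tests)
    # Non-centrality of a one-sample group test at an active voxel.
    ncp = effect_size * np.sqrt(n_subjects)
    return stats.norm.sf(z_crit - ncp)

# Example: average d = 0.5 in active voxels, ~5000 effective tests.
for n in (20, 40, 80):
    print(n, round(approx_power(0.5, n, 5000), 3))
```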


2019 ◽  
Author(s):  
Ivan Alvarez ◽  
Andrew J. Parker ◽  
Holly Bridge

Abstract
Studies of changes in cerebral neocortical thickness often rely on small control samples for comparison with specific populations with abnormal visual systems. We present a normative dataset for FreeSurfer-derived cortical thickness across 25 human visual areas, derived from 960 participants in the Human Connectome Project. Cortical thickness varies systematically across visual areas, in broad agreement with canonical visual system hierarchies in the dorsal and ventral pathways. In addition, cortical thickness estimates show consistent within-subject variability and reliability. Importantly, cortical thickness estimates in visual areas are well described by a normal distribution, making them amenable to direct statistical comparison.
Highlights
Normative neocortical thickness values for human visual areas measured with FreeSurfer
A gradient of increasing neocortical thickness with visual area hierarchy
Consistent within- and between-subject variability in neocortical thickness across visual areas
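A hedged example of how such a normative table could be used in practice: compare an individual's FreeSurfer thickness estimate for a visual area to the normative mean and SD via a z-score, relying on the reported normality of the estimates. The area labels are illustrative and the numbers are placeholders, not values from the paper.

```python
# Compare an observed cortical thickness to hypothetical normative statistics.
import numpy as np
from scipy import stats

normative = {"V1": (1.95, 0.12), "V3A": (2.30, 0.15)}  # placeholder mean, SD in mm

def thickness_z(area, observed_mm):
    mean, sd = normative[area]
    z = (observed_mm - mean) / sd
    # Two-sided normal p-value for deviation from the normative distribution.
    return z, 2 * stats.norm.sf(abs(z))

print(thickness_z("V1", 1.70))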


2018 ◽  
Author(s):  
Stephan Geuter ◽  
Guanghao Qi ◽  
Robert C. Welsh ◽  
Tor D. Wager ◽  
Martin A. Lindquist

Abstract
Multi-subject functional magnetic resonance imaging (fMRI) analysis is often concerned with determining whether there exists a significant population-wide ‘activation’ in a comparison between two or more conditions. Typically, this is assessed by testing the average value of a contrast of parameter estimates (COPE) against zero in a general linear model (GLM) analysis. In this work we investigate several aspects of this type of analysis. First, we study the effects of sample size on the sensitivity and reliability of the group analysis, allowing us to evaluate the ability of small-sample studies to effectively capture population-level effects of interest. Second, we assess the difference in sensitivity and reliability when using volumetric or surface-based data. Third, we investigate potential biases in estimating effect sizes as a function of sample size. To perform this analysis we use the task-based fMRI data from the 500-subject release of the Human Connectome Project (HCP). We treat the complete collection of subjects (N = 491) as our population of interest and perform a single-subject analysis on each subject in the population. We then investigate the ability to recover population-level effects from subsets of the population using standard analytical techniques. Our study shows that sample sizes of 40 are generally able to detect regions with high effect sizes (Cohen’s d > 0.8), while sample sizes closer to 80 are required to reliably recover regions with medium effect sizes (0.5 < d < 0.8). We find little difference between volumetric and surface-based data with respect to standard mass-univariate group analysis. Finally, we conclude that special care is needed when estimating effect sizes, particularly for small sample sizes.
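The subsampling logic described here can be reproduced in miniature: treat a large set of subject-level COPEs as the "population", draw subsamples of size n, and record how often a one-sample t-test detects a voxel with a given true Cohen's d. The sketch below uses synthetic COPEs at a single voxel, not the HCP task data, and the true d of 0.5 is an assumption for illustration.

```python
# Estimate detection rates for a "medium" effect (d = 0.5) at different sample sizes.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
true_d = 0.5
population = rng.normal(true_d, 1.0, size=491)   # synthetic COPEs, one voxel, 491 subjects

def detection_rate(n, n_draws=2000, alpha=0.05):
    hits = 0
    for _ in range(n_draws):
        sample = rng.choice(population, size=n, replace=False)
        hits += stats.ttest_1samp(sample, 0.0).pvalue < alpha
    return hits / n_draws

for n in (20, 40, 80):
    print(n, detection_rate(n))
```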


2019 ◽  
Author(s):  
Alexander Bowring ◽  
Fabian Telschow ◽  
Armin Schwartzman ◽  
Thomas E. Nichols

Abstract
The mass-univariate approach for functional magnetic resonance imaging (fMRI) analysis remains a widely used and fundamental statistical tool within neuroimaging. However, this method suffers from at least two fundamental limitations. First, with sample sizes growing to 4, 5 or even 6 digits, the entire approach is undermined by the null hypothesis fallacy: with a sufficient sample size, there is enough statistical power to reject the null hypothesis everywhere, making it difficult if not impossible to localize effects of interest. Second, at any sample size, when cluster-size inference is used a significant p-value only indicates that a cluster is larger than expected by chance, and no notion of spatial uncertainty is provided; no confidence statement is available about the size or location of the cluster that could be expected with repeated sampling from the population.
In this work, we address these issues by extending a method proposed by Sommerfeld, Sain, and Schwartzman (2018) to develop spatial Confidence Sets (CSs) on clusters found in thresholded raw effect-size maps. While hypothesis testing indicates where the null hypothesis, i.e. a raw effect size of zero, can be rejected, the CSs make statements about the locations where raw effect sizes exceed, and fall short of, a non-zero threshold, providing both an upper and a lower CS.
While the method can be applied to any parameter in a mass-univariate General Linear Model, we motivate it in the context of BOLD fMRI contrast maps for inference on percentage BOLD change raw effects. We propose several theoretical and practical implementation advancements to the original method in order to deliver improved performance in small-sample settings. We validate the method with 3D Monte Carlo simulations that resemble fMRI data. Finally, we compute CSs for the Human Connectome Project working memory task contrast images, illustrating the brain regions that show a reliable %BOLD change for a given %BOLD threshold.
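To convey the upper/lower confidence-set idea in code: threshold a raw effect-size map at a level c, then form an outer (upper) set where the estimate exceeds c minus a margin and an inner (lower) set where it exceeds c plus a margin. The margin below is a naive pointwise normal bound on a toy 1D map; it is not the simultaneous procedure of Sommerfeld, Sain, and Schwartzman (2018) that the paper extends, and the numbers are invented.

```python
# Crude pointwise illustration of lower/point/upper sets around a threshold c.
import numpy as np
from scipy import stats

def confidence_sets(effect_map, se_map, c, alpha=0.05):
    k = stats.norm.isf(alpha / 2)
    upper_cs = effect_map >= c - k * se_map   # plausibly >= c
    lower_cs = effect_map >= c + k * se_map   # confidently >= c
    point_set = effect_map >= c               # plug-in thresholded map
    return lower_cs, point_set, upper_cs

# Toy 1D "map": %BOLD change estimates with a standard error of 0.1 everywhere.
effect = np.array([0.1, 0.4, 0.6, 0.9, 0.3])
lower, point, upper = confidence_sets(effect, np.full(5, 0.1), c=0.5)
print(lower, point, upper, sep="\n")
```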


2019 ◽  
Author(s):  
Ritu Bhandari ◽  
Valeria Gazzola ◽  
Christian Keysers

Abstract
Multiband (MB) acceleration of functional magnetic resonance imaging has become more widely available to neuroscientists. Here we compare MB factors of 1, 2 and 4 while participants view complex hand actions vs. simpler hand movements to localize the action observation network. While a previous study of ours showed that MB4 yields moderate improvements in group-level statistics, here we explore its impact on single-subject statistics. We find that MB4 provides an increase in p values at the first level that is of medium effect size compared to MB1, providing moderate evidence across a number of voxels that MB4 indeed improves single-subject statistics. This effect was localized mostly within regions belonging to the action observation network. In parallel, we find that Cohen’s d at the single-subject level actually decreases with MB4 compared to MB1. Intriguingly, we find that subsampling the MB4 sequences, by considering only every fourth acquired volume, also leads to increased Cohen’s d values. This suggests that the FAST algorithm we used to correct for temporal autocorrelation may over-penalize sequences with higher temporal autocorrelation, thereby underestimating the potential gains in single-subject statistics offered by MB acceleration; alternative methods should be explored. In summary, considering the moderate gains in statistical values observed both at the group level in our previous study and at the single-subject level in this study, we believe that MB technology is now ripe for neuroscientists to start using MB4 acceleration for their studies, be it to accurately map activity in single subjects of interest (e.g. for presurgical planning or to explore rare patients) or for the purpose of group studies.
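The two single-subject quantities discussed above, a GLM t-statistic for the contrast and a corresponding Cohen's d, and the "every fourth volume" subsampling can be sketched for one voxel as follows. The data are synthetic, the block design and noise level are assumptions, and no FAST prewhitening or realistic autocorrelation is modelled, so this only shows the mechanics, not the paper's result.

```python
# Fit a two-column GLM to a single voxel time series and report t and d,
# for the full series and for every fourth volume.
import numpy as np

rng = np.random.default_rng(2)
n_vol = 400
design = (np.arange(n_vol) % 40 < 20).astype(float)      # crude on/off block design
voxel_ts = 0.5 * design + rng.normal(0, 1, n_vol)        # signal + white noise

def contrast_stats(y, x):
    X = np.column_stack([np.ones_like(x), x])
    beta, res, *_ = np.linalg.lstsq(X, y, rcond=None)
    dof = len(y) - X.shape[1]
    sigma = np.sqrt(res[0] / dof)
    se = sigma * np.sqrt(np.linalg.inv(X.T @ X)[1, 1])
    t = beta[1] / se
    d = beta[1] / sigma          # one common single-subject effect-size convention
    return t, d

print("full series:", contrast_stats(voxel_ts, design))
print("every 4th volume:", contrast_stats(voxel_ts[::4], design[::4]))
```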


2021 ◽  
Author(s):  
Philip Griffiths ◽  
Joel Sims ◽  
Abi Williams ◽  
Nicola Williamson ◽  
David Cella ◽  
...  

Abstract
Purpose: Treatment benefit, as assessed using clinical outcome assessments (COAs), is a key endpoint in many clinical trials at both the individual and the group level. Anchor-based methods can aid interpretation of COA change scores beyond statistical significance and help derive a meaningful change threshold (MCT). However, evidence-based guidance on the selection of appropriately related anchors is lacking.
Methods: A simulation study varied sample size, change-score variability and anchor correlation strength to assess the impact of these variables on recovering the true simulated MCT at both the individual and the group level. At the individual level, Receiver Operating Characteristic (ROC) curve and Predictive Modelling (PM) anchor analyses were conducted. At the group level, the means of the ‘not-improved’ and ‘improved’ groups were compared.
Results: Sample size, change-score variability and the magnitude of the anchor correlation all affected the accuracy of the estimated MCT. At the individual level, ROC curves were less accurate than PM methods at recovering the true MCT. For both methods, smaller samples led to higher variability in the returned MCT, with variability higher still for ROC. Anchors with weaker correlations with COA change scores increased the variability of the estimated MCT. An anchor correlation of 0.50-0.60 identified the true MCT cut-point under certain conditions using ROC, whereas anchor correlations as low as 0.30 were adequate when using PM under certain conditions. At the group level, the MCT was consistently underestimated regardless of the anchor correlation.
Conclusion: The findings show that the chosen method, the sample size and the variability in change scores influence the anchor correlation strength necessary to identify a true individual-level MCT; often this needs to be higher than the commonly accepted threshold of 0.30. Correlations stronger than 0.30 are also required at the group level, but a specific recommendation is not provided. These results can assist researchers in selecting and assessing the quality of anchors.
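A hedged sketch of the two individual-level anchor methods compared here: an ROC cut-point (Youden index) and a simple predictive-modelling variant (logistic regression solved for the change score at which the predicted probability of improvement is 0.5). The data are simulated; with equal group sizes and variances, both methods should land near the midpoint of 5, which stands in for the "true" MCT. The exact PM formulation used in the paper may differ.

```python
# ROC- and regression-based MCT estimates from simulated anchor data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve

rng = np.random.default_rng(3)
n = 300
improved = rng.integers(0, 2, n)                  # anchor: 0 = not improved, 1 = improved
change = rng.normal(0, 6, n) + 10 * improved      # COA change scores; group means 0 and 10

# ROC method: cut-point maximizing sensitivity + specificity - 1 (Youden index).
fpr, tpr, thresholds = roc_curve(improved, change)
roc_mct = thresholds[np.argmax(tpr - fpr)]

# Predictive-modelling variant: change score at which P(improved) = 0.5.
model = LogisticRegression().fit(change.reshape(-1, 1), improved)
pm_mct = -model.intercept_[0] / model.coef_[0][0]

print(round(float(roc_mct), 1), round(float(pm_mct), 1))
```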


2019 ◽  
Author(s):  
Alec P. Christie ◽  
Tatsuya Amano ◽  
Philip A. Martin ◽  
Gorm E. Shackelford ◽  
Benno I. Simmons ◽  
...  

Abstract
Ecologists use a wide range of study designs to estimate the impact of interventions or threats, but there are no quantitative comparisons of their accuracy. For example, while it is accepted that simpler designs, such as After (sampling sites post-impact without a control), Before-After (BA) and Control-Impact (CI), are less robust than Randomised Controlled Trials (RCT) and Before-After Control-Impact (BACI) designs, it is not known how much less accurate they are.
We simulate a step-change response of a population to an environmental impact using empirically derived estimates of the major parameters. We use five ecological study designs to estimate the effect of this impact and evaluate each one by determining the percentage of simulations in which it accurately estimates the direction and magnitude of the environmental impact. We also simulate different numbers of replicates and assess several accuracy thresholds.
We demonstrate that BACI designs could be 1.1-1.5 times more accurate than RCTs, 2.9-4.1 times more accurate than BA, 3.8-5.6 times more accurate than CI, and 6.8-10.8 times more accurate than After designs when estimating to within ±30% of the true effect (depending on the sample size). We also found that increasing sample size substantially increases the accuracy of BACI designs but only increases the precision of simpler designs around a biased estimate; only by using more robust designs can accuracy increase. Modestly increasing replication of both control and impact sites also increased the accuracy of BACI designs more than substantially increasing replicates in just one of these groups.
We argue that investment in using more robust designs in ecology, where possible, is extremely worthwhile given the inaccuracy of simpler designs, even with large sample sizes. Based on our results we propose a weighting system that quantitatively ranks the accuracy of studies based on their study design and the number of replicates used. We hope these ‘accuracy weights’ enable researchers to better account for study design in evidence synthesis when assessing the reliability of a range of studies using a variety of designs.
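A minimal version of the design comparison described above: simulate control and impact sites before and after a step-change impact, estimate the effect under BA, CI, and BACI designs, and score each design by how often its estimate falls within ±30% of the truth. The parameter values below are illustrative placeholders, not the paper's empirically derived ones, and the After and RCT designs are omitted for brevity.

```python
# Compare BA, CI, and BACI estimators on simulated data with a confounding trend.
import numpy as np

rng = np.random.default_rng(4)
true_impact = -10.0          # step change at the impact sites
trend = -3.0                 # background change affecting all sites
site_sd, noise_sd = 5.0, 4.0

def one_simulation(n_sites=10):
    site_means = rng.normal(50, site_sd, 2 * n_sites)
    control, impact = site_means[:n_sites], site_means[n_sites:]
    cb = control + rng.normal(0, noise_sd, n_sites)                       # control, before
    ca = control + trend + rng.normal(0, noise_sd, n_sites)               # control, after
    ib = impact + rng.normal(0, noise_sd, n_sites)                        # impact, before
    ia = impact + trend + true_impact + rng.normal(0, noise_sd, n_sites)  # impact, after
    return {
        "BA": ia.mean() - ib.mean(),                                 # confounded by the trend
        "CI": ia.mean() - ca.mean(),                                 # confounded by site differences
        "BACI": (ia.mean() - ib.mean()) - (ca.mean() - cb.mean()),   # differences-in-differences
    }

n_sims = 2000
within = {k: 0 for k in ("BA", "CI", "BACI")}
for _ in range(n_sims):
    for design, est in one_simulation().items():
        within[design] += abs(est - true_impact) <= 0.3 * abs(true_impact)
print({k: v / n_sims for k, v in within.items()})
```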


2019 ◽  
Author(s):  
Eirini Messaritaki ◽  
Stavros I. Dimitriadis ◽  
Derek K. Jones

Abstract
Structural brain networks derived from diffusion magnetic resonance imaging data have been used extensively to describe the human brain, and graph theory has allowed the quantification of their network properties. Schemes used to construct the graphs that represent structural brain networks differ in the metrics they use as edge weights and in the algorithms they use to define the network topology. In this work, twenty graph construction schemes were considered. The schemes use the number of streamlines, the fractional anisotropy, the mean diffusivity or other attributes of the tracts to define the edge weights, and either an absolute threshold or a data-driven algorithm to define the graph topology. The test-retest data of the Human Connectome Project were used to compare the reproducibility of the graphs and their various attributes (edges, topologies, graph theoretical metrics) derived with these schemes, for diffusion images acquired with three different diffusion weightings. The impact of the scheme on the statistical power of a study, and on the number of participants required to detect a difference between populations or an effect of an intervention, was also calculated.
The reproducibility of the graphs and their attributes depended heavily on the graph construction scheme. Graph reproducibility was higher for schemes that used thresholding to define the graph topology, while data-driven schemes performed better at topology reproducibility. Additionally, schemes that used thresholding resulted in better reproducibility for local graph theoretical metrics, while data-driven schemes performed better for global metrics. Crucially, the number of participants required to detect a difference between populations or an effect of an intervention could change by a large factor depending on the scheme used, affecting the power of studies to reveal the effects of interest.
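Two ingredients of the abstract can be sketched directly: (1) defining a graph topology from a weighted connectivity matrix with either an absolute threshold or a density-based rule (a simple stand-in for the paper's data-driven algorithms), and (2) translating a between-group effect size for a graph metric into a required sample size. The matrix and effect sizes below are synthetic placeholders.

```python
# Threshold a synthetic connectivity matrix two ways, then show how the required
# sample size depends on the effect size a given scheme yields.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n_nodes = 90
w = np.abs(rng.normal(0, 1, (n_nodes, n_nodes)))
w = np.triu(w, 1) + np.triu(w, 1).T              # symmetric weights, e.g. streamline counts

def absolute_threshold(weights, thr):
    return (weights > thr).astype(float)

def density_threshold(weights, density=0.2):
    # Keep the strongest edges up to a target density (a simple data-driven rule).
    cutoff = np.quantile(weights[np.triu_indices_from(weights, 1)], 1 - density)
    return (weights >= cutoff).astype(float)

def n_per_group(effect_size_d, alpha=0.05, power=0.8):
    # Normal-approximation sample size for a two-sample comparison.
    z = stats.norm.isf(alpha / 2) + stats.norm.isf(1 - power)
    return int(np.ceil(2 * (z / effect_size_d) ** 2))

print("mean degree (absolute):", absolute_threshold(w, 1.5).sum(1).mean())
print("mean degree (density):", density_threshold(w).sum(1).mean())
# A scheme yielding d = 0.5 vs. one yielding d = 0.25 changes the required n a lot:
print(n_per_group(0.5), n_per_group(0.25))
```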


1996 ◽  
Vol 115 (5) ◽  
pp. 422-428
Author(s):  
Roy E. Shore

A number of topics are discussed related to the potential for, and pitfalls in, undertaking epidemiologic studies of the late effects of nasopharyngeal radium irradiation. The available evidence indicates that linear extrapolation of risk estimates from high-dose studies is a reasonable basis for estimating risk from radium exposure or other situations in which the radiation exposures were fairly low and fractionated. Epidemiologic study of populations given nasopharyngeal radium irradiation is worthwhile scientifically if several criteria can be met. It is very important that any such study has adequate statistical power, which is a function of the doses to the organs of interest and the radiation risk coefficients for those organs, as well as the available sample size. If the organ doses are low, a prohibitively large sample size would be required. Other problems with low-dose studies include the likelihood of false-positive results when a number of health end points are evaluated and the impact of dose uncertainties, small biases, and confounding factors that make interpretation uncertain. Cluster studies or studies of self-selected cohorts of irradiated patients are not recommended because of the potential for severe bias with such study designs. The ability to define subgroups of the population who have heightened genetic susceptibility may become a reality in the next few years as genes conferring susceptibility to brain cancers or other head and neck tumors are identified; this scientific advance would have the potential to greatly alter the prospects and approaches of epidemiologic studies.
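A back-of-the-envelope illustration of the power argument in this article: under a linear extrapolation, the excess relative risk scales with organ dose, and the cohort size needed to detect it grows roughly with the inverse square of the resulting risk difference. The baseline risk, risk coefficient, and doses below are assumed values, not figures from the article.

```python
# Required cohort size (per group) to detect a dose-dependent excess risk,
# using a standard two-proportion sample-size approximation.
import numpy as np
from scipy import stats

def required_cohort(baseline_risk, err_per_gy, dose_gy, alpha=0.05, power=0.8):
    p0 = baseline_risk
    p1 = p0 * (1 + err_per_gy * dose_gy)          # linear excess relative risk
    z = stats.norm.isf(alpha / 2) + stats.norm.isf(1 - power)
    pbar = (p0 + p1) / 2
    # Exposed cohort compared against an equal-sized unexposed cohort.
    return int(np.ceil(2 * pbar * (1 - pbar) * (z / (p1 - p0)) ** 2))

for dose in (0.05, 0.5, 2.0):
    print(dose, "Gy:", required_cohort(baseline_risk=0.002, err_per_gy=0.5, dose_gy=dose))
```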

