Individual differences and test-retest reliability in neural and mood effects of tACS

2019 ◽  
Vol 12 (2) ◽  
pp. 534
Author(s):  
K. Clancy ◽  
N. Kartvelishvili ◽  
W. Li
2021 ◽  
Author(s):  
Xiaochun Han ◽  
Yoni K. Ashar ◽  
Philip Kragel ◽  
Bogdan Petre ◽  
Victoria Schelkun ◽  
...  

Identifying biomarkers that predict mental states with large effect sizes and high test-retest reliability is a growing priority for fMRI research. We examined a well-established multivariate brain measure that tracks pain induced by nociceptive input, the Neurologic Pain Signature (NPS). In N = 295 participants across eight studies, NPS responses showed a very large effect size in predicting within-person single-trial pain reports (d = 1.45) and a medium effect size in predicting individual differences in pain reports (d = 0.49, average r = 0.20). The NPS showed excellent short-term (within-day) test-retest reliability (ICC = 0.84, with an average of 69.5 trials/person). Reliability scaled with the number of trials within-person, with ≥60 trials required for excellent test-retest reliability. Reliability was comparable in two additional studies across 5-day (N = 29, ICC = 0.74, 30 trials/person) and 1-month (N = 40, ICC = 0.46, 5 trials/person) test-retest intervals. The combination of strong within-person correlations and only modest between-person correlations between the NPS and pain reports indicates that the two measures have different sources of between-person variance. The NPS is not a surrogate for individual differences in pain reports, but it can serve as a reliable measure of pain-related physiology and a mechanistic target for interventions.
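The ICC values reported above can, in principle, be reproduced from a subjects × sessions matrix of per-person mean responses. Below is a minimal sketch of ICC(2,1) (two-way random effects, absolute agreement, single measurement) in NumPy; the data are simulated for illustration (a stable trait plus session noise), not the NPS responses themselves.

```python
import numpy as np

def icc_2_1(y):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.
    y: (n_subjects, k_sessions) array of scores."""
    n, k = y.shape
    grand = y.mean()
    rows = y.mean(axis=1)                                   # per-subject means
    cols = y.mean(axis=0)                                   # per-session means
    msr = k * np.sum((rows - grand) ** 2) / (n - 1)         # between-subject MS
    msc = n * np.sum((cols - grand) ** 2) / (k - 1)         # between-session MS
    resid = y - rows[:, None] - cols[None, :] + grand
    mse = np.sum(resid ** 2) / ((n - 1) * (k - 1))          # residual MS
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Simulated example: stable trait + independent session noise -> high ICC
rng = np.random.default_rng(0)
trait = rng.normal(50, 10, size=200)                        # true scores
scores = trait[:, None] + rng.normal(0, 3, size=(200, 2))   # two sessions
print(round(icc_2_1(scores), 2))
```

Because averaging more trials per person shrinks the within-person error term, reliability computed this way rises with trial count, consistent with the ≥60-trial threshold reported above.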


2020 ◽  
Author(s):  
Nathaniel Haines ◽  
Peter D. Kvam ◽  
Louis H. Irving ◽  
Colin Smith ◽  
Theodore P. Beauchaine ◽  
...  

Behavioral tasks (e.g., Stroop task) that produce replicable group-level effects (e.g., Stroop effect) often fail to reliably capture individual differences between participants (e.g., low test-retest reliability). This “reliability paradox” has led many researchers to conclude that most behavioral tasks cannot be used to develop and advance theories of individual differences. However, these conclusions are derived from statistical models that provide only superficial summary descriptions of behavioral data, thereby ignoring theoretically relevant data-generating mechanisms that underlie individual-level behavior. More generally, such descriptive methods lack the flexibility to test and develop increasingly complex theories of individual differences. To resolve this theory-description gap, we present generative modeling approaches, which involve using background knowledge to specify how behavior is generated at the individual level, and in turn how the distributions of individual-level mechanisms are characterized at the group level—all in a single joint model. Generative modeling shifts our focus away from estimating descriptive statistical “effects” toward estimating psychologically meaningful parameters, while simultaneously accounting for measurement error that would otherwise attenuate individual difference correlations. Using simulations and empirical data from the Implicit Association Test and Stroop, Flanker, Posner Cueing, and Delay Discounting tasks, we demonstrate how generative models yield (1) higher test-retest reliability estimates, and (2) more theoretically informative parameter estimates relative to traditional statistical approaches. Our results reclaim optimism regarding the utility of behavioral paradigms for testing and advancing theories of individual differences, and emphasize the importance of formally specifying and checking model assumptions to reduce theory-description gaps and facilitate principled theory development.
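The attenuation of individual-difference correlations by measurement error has a classical closed-form analogue: Spearman's correction for attenuation, which estimates the latent correlation from an observed correlation and the reliabilities of the two measures. This is only the textbook approximation that the hierarchical generative models above generalize, not the authors' actual method; a sketch with hypothetical numbers:

```python
def disattenuate(r_obs, rel_x, rel_y):
    """Spearman's correction for attenuation: r_true ≈ r_obs / sqrt(rel_x * rel_y)."""
    return r_obs / (rel_x * rel_y) ** 0.5

# A hypothetical observed r = .25 between two tasks, each with test-retest
# reliability .50, is consistent with a latent correlation of .50.
print(disattenuate(0.25, 0.50, 0.50))  # -> 0.5
```

Generative models achieve the same disattenuation implicitly, by estimating trial-level noise and person-level parameters jointly rather than correcting summary statistics after the fact.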


2020 ◽  
pp. 174702182092919 ◽  
Author(s):  
Alasdair DF Clarke ◽  
Jessica L Irons ◽  
Warren James ◽  
Andrew B Leber ◽  
Amelia R Hunt

A striking range of individual differences has recently been reported in three different visual search tasks. These differences in performance can be attributed to strategy, that is, the efficiency with which participants control their search to complete the task quickly and accurately. Here, we ask whether an individual’s strategy and performance in one search task are correlated with how they perform in the other two. We tested 64 observers and found that even though the test–retest reliability of the tasks was high, an observer’s performance and strategy in one task were not predictive of their behaviour in the other two. These results suggest search strategies are stable over time, but context-specific. To understand visual search, we therefore need to account not only for differences between individuals but also for how individuals interact with the search task and context.


2019 ◽  
Vol 116 (12) ◽  
pp. 5472-5477 ◽  
Author(s):  
A. Zeynep Enkavi ◽  
Ian W. Eisenberg ◽  
Patrick G. Bissett ◽  
Gina L. Mazza ◽  
David P. MacKinnon ◽  
...  

The ability to regulate behavior in service of long-term goals is a widely studied psychological construct known as self-regulation. This wide interest is in part due to the putative relations between self-regulation and a range of real-world behaviors. Self-regulation is generally viewed as a trait, and individual differences are quantified using a diverse set of measures, including self-report surveys and behavioral tasks. Accurate characterization of individual differences requires measurement reliability, a property frequently characterized in self-report surveys, but rarely assessed in behavioral tasks. We remedy this gap by (i) providing a comprehensive literature review on an extensive set of self-regulation measures and (ii) empirically evaluating test–retest reliability of this battery in a new sample. We find that dependent variables (DVs) from self-report surveys of self-regulation have high test–retest reliability, while DVs derived from behavioral tasks do not. This holds both in the literature and in our sample, although the test–retest reliability estimates in the literature are highly variable. We confirm that this is due to differences in between-subject variability. We also compare different types of task DVs (e.g., model parameters vs. raw response times) in their suitability as individual difference DVs, finding that certain model parameters are as stable as raw DVs. Our results provide greater psychometric footing for the study of self-regulation and provide guidance for future studies of individual differences in this domain.
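The role of between-subject variability noted above can be illustrated with a small simulation (the numbers are hypothetical, not the study's data): holding measurement error fixed, shrinking the spread of true scores across people drives the test-retest correlation down, which is the pattern reported for behavioral task DVs relative to surveys.

```python
import numpy as np

def two_session_reliability(sd_between, sd_error, n=5000, seed=0):
    """Test-retest correlation of simulated two-session scores:
    each score = stable trait + independent session error."""
    rng = np.random.default_rng(seed)
    trait = rng.normal(0, sd_between, n)          # true between-subject spread
    s1 = trait + rng.normal(0, sd_error, n)       # session 1
    s2 = trait + rng.normal(0, sd_error, n)       # session 2
    return float(np.corrcoef(s1, s2)[0, 1])

# Same error SD in both cases; only the between-subject spread differs
print(round(two_session_reliability(10, 5), 2))   # survey-like: wide spread
print(round(two_session_reliability(2, 5), 2))    # task-like: narrow spread
```

Analytically, the expected correlation is sd_between² / (sd_between² + sd_error²), so a task that compresses individual differences is unreliable even when its group-level effect is robust.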


2017 ◽  
Vol 48 (2) ◽  
pp. 293-300 ◽  
Author(s):  
Marta Malesza ◽  
Maria Maczuga

Recent research introduced the Discounting Inventory (DI), which allows the measurement of individual differences in delay, probabilistic, effort, and social discounting rates. The goal of this investigation was to determine several aspects of the reliability of the DI using the responses of 385 participants (200 non-smokers and 185 current smokers). Two types of reliability are of interest: internal consistency and test-retest stability. A secondary aim was to extend such reliability measures beyond non-clinical participants; accordingly, the current study measured the reliability of the DI in both nicotine-dependent and non-nicotine-dependent individuals. It is concluded that the internal consistency of the DI is excellent, and the test-retest results suggest that items intended to measure three of the four types of discounting were likely tapping trait, rather than state, factors, regardless of whether non-smokers were included in, or excluded from, the analyses (probabilistic discounting scale scores being the exception). With these cautions in mind, the psychometric properties of the DI appear to be very good.
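Internal consistency of the kind reported for the DI is conventionally quantified with Cronbach's alpha. A self-contained sketch on simulated item scores (a shared trait plus item-specific noise; not the DI data):

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_respondents, k_items) score matrix:
    alpha = k/(k-1) * (1 - sum(item variances) / variance of total score)."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

# Simulated 10-item scale: each item = shared trait + independent noise
rng = np.random.default_rng(0)
trait = rng.normal(size=2000)
items = trait[:, None] + rng.normal(size=(2000, 10))
print(round(cronbach_alpha(items), 2))
```

Alpha rises with the number of items and with the average inter-item correlation, so "excellent" internal consistency reflects items that covary strongly around the same underlying discounting rate.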


2021 ◽  
Author(s):  
Alina Tetereva ◽  
Jean Li ◽  
Jeremiah Deng ◽  
Argyris Stringaris ◽  
Narun Pat

Capturing individual differences in cognitive abilities is central to human neuroscience. Yet our ability to estimate cognitive abilities via brain MRI is still poor in both prediction and reliability. Our study tested whether this inability was partly due to the over-reliance on 1) non-task MRI modalities and 2) single modalities. We directly compared predictive models comprising different sets of MRI modalities (e.g., task vs. non-task). Using the Human Connectome Project (n=873 humans, 473 females, after exclusions), we integrated task-based functional MRI (tfMRI) across seven tasks along with other non-task MRI modalities (structural MRI, resting-state functional connectivity) via a machine-learning stacking approach. The model integrating all modalities provided unprecedented prediction (r=.581) and excellent test-retest reliability (ICC>.75) in capturing general cognitive abilities. Importantly, compared with the model integrating only non-task modalities (r=.367), integrating tfMRI across tasks led to significantly higher prediction (r=.544) while still providing excellent test-retest reliability (ICC>.75). The model integrating tfMRI across tasks was driven by areas in the frontoparietal network and by tasks that are cognition-related (working-memory, relational processing, and language). This result is consistent with the parieto-frontal integration theory of intelligence. Accordingly, our results sharply contradict the recently popular notion that tfMRI is not appropriate for capturing individual differences in cognition. Instead, our study suggests that tfMRI, when used appropriately (i.e., by drawing information across the whole brain and across tasks and by integrating with other modalities), provides predictive and reliable sources of information for individual differences in cognitive abilities, more so than non-task modalities.
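The stacking idea can be sketched in a few lines of NumPy: one first-level model per modality produces out-of-fold predictions of the target, and a second-level model learns how to weight those predictions. The feature blocks below are synthetic stand-ins for MRI modalities, and the models are plain ridge and least squares, not the study's actual pipeline.

```python
import numpy as np

def ridge_fit_predict(X_tr, y_tr, X_te, alpha=1.0):
    """Closed-form ridge regression: w = (X'X + aI)^-1 X'y."""
    d = X_tr.shape[1]
    w = np.linalg.solve(X_tr.T @ X_tr + alpha * np.eye(d), X_tr.T @ y_tr)
    return X_te @ w

rng = np.random.default_rng(0)
n, k = 200, 5                                   # subjects, CV folds
g = rng.normal(size=n)                          # latent target ("g")
# Hypothetical feature blocks standing in for MRI modalities (g + noise)
blocks = [g[:, None] + rng.normal(0, s, (n, 20)) for s in (1.0, 2.0, 2.0)]

folds = np.array_split(rng.permutation(n), k)
level1 = np.zeros((n, len(blocks)))             # out-of-fold predictions
for te in folds:
    tr = np.setdiff1d(np.arange(n), te)
    for j, X in enumerate(blocks):
        level1[te, j] = ridge_fit_predict(X[tr], g[tr], X[te])
# Level 2: combine modality-wise predictions with a least-squares stacker
# (fit on the out-of-fold predictions; a nested CV would be used in practice)
w2, *_ = np.linalg.lstsq(level1, g, rcond=None)
stacked = level1 @ w2
print(round(float(np.corrcoef(stacked, g)[0, 1]), 2))
```

The stacker can down-weight a noisy modality rather than discard it, which is why integrating modalities tends to outperform the best single modality.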


2021 ◽  
Author(s):  
Kait Clark ◽  
Kayley Birch-Hurst ◽  
Charlotte Rebecca Pennington ◽  
Austin C P Petrie ◽  
Joshua Lee ◽  
...  

Research in perception and attention has typically sought to evaluate cognitive mechanisms according to the average response to a manipulation. Recently, there has been a shift toward appreciating the value of individual differences and the insight gained by exploring the impacts of between-participant variation on human cognition. However, a recent study suggests that many robust, well-established cognitive control tasks suffer from surprisingly low levels of test-retest reliability (Hedge et al., 2018b). We tested a large sample of undergraduate students (n = 160) in two sessions (separated by 1–3 weeks) on four commonly used tasks in vision science. We implemented measures that spanned the range of visual processing, including motion coherence (MoCo), useful field of view (UFOV), multiple-object tracking (MOT), and visual working memory (VWM). Intraclass correlations ranged from excellent to poor, suggesting that some task measures are more suitable for assessing individual differences than others. VWM capacity (ICC = 0.89), MoCo threshold (ICC = 0.60), UFOV middle accuracy (ICC = 0.60), and UFOV outer accuracy (ICC = 0.74) showed good-to-excellent reliability. Other measures, namely the maximum number of items tracked in MOT (ICC = 0.41) and UFOV number accuracy (ICC = 0.48), showed moderate reliability; the MOT threshold (ICC = 0.36) and UFOV inner accuracy (ICC = 0.30) showed poor reliability. In this paper, we present these results alongside a summary of reliabilities estimated previously for other vision science tasks. We then offer useful recommendations for evaluating test-retest reliability when considering a task for use in evaluating individual differences.


2020 ◽  
Author(s):  
Cindy Eckart ◽  
Dominik Kraft ◽  
Christian Fiebach

Affective flexibility refers to the flexible adaptation of behavior or thought given emotionally relevant stimuli, tasks, or contexts. Individual differences in affective flexibility have received increased interest in the past years, as they may relate to differences in the efficiency of emotion regulation and dealing with stress and adversity. One way to assess individual differences in affective flexibility is to assess the behavioral costs (in terms of increased response times or error rates) of switching between affective and neutral tasks. However, behavioral task measures like switch costs can only be treated as trait-like individual difference characteristics if their psychometric quality has been established. To this end, we developed an affective task switching paradigm and report an analysis of the test-retest reliability (two-week inter-session interval) as well as internal consistencies of affective switch costs. Our results show strong response time switch costs, both for switching from the neutral to the emotion task and vice versa, excellent internal consistency estimates for both measures from both sessions (Spearman-Brown corrected r = .92), and good test-retest reliabilities (ICC(2,1) of .78 and .82, respectively). Effect sizes and reliability estimates were substantially lower for switch costs calculated from error rates, which is consistent with previous literature discussing the psychometric properties of task-based cognitive measures. In conclusion, our results indicate that response time-based switch costs are well-suited as a measure of individual differences in the efficiency of affective flexibility, i.e., of dynamically adjusting behavior and thought in the context of emotional task demands.
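Spearman-Brown corrected split-half estimates of the kind reported here are commonly computed over many random splits of the trial set. A sketch on simulated per-trial switch costs (hypothetical numbers, not the authors' data):

```python
import numpy as np

def split_half_sb(trials, n_splits=200, seed=0):
    """Mean Spearman-Brown corrected split-half reliability over random splits.
    trials: (n_subjects, n_trials) array of per-trial scores."""
    rng = np.random.default_rng(seed)
    _, n_tr = trials.shape
    rs = []
    for _ in range(n_splits):
        idx = rng.permutation(n_tr)
        a = trials[:, idx[: n_tr // 2]].mean(axis=1)   # half 1 per subject
        b = trials[:, idx[n_tr // 2:]].mean(axis=1)    # half 2 per subject
        r = np.corrcoef(a, b)[0, 1]
        rs.append(2 * r / (1 + r))                     # Spearman-Brown step-up
    return float(np.mean(rs))

# Simulated switch costs: stable person effect + large trial-level noise
rng = np.random.default_rng(1)
person = rng.normal(100, 30, size=80)                      # true costs (ms)
trials = person[:, None] + rng.normal(0, 150, (80, 60))    # 60 noisy trials
print(round(split_half_sb(trials), 2))
```

The step-up correction compensates for each half containing only half the trials; without it, split-half estimates systematically understate the reliability of the full-length measure.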


2009 ◽  
Vol 12 (4) ◽  
pp. 372-380 ◽  
Author(s):  
Emma L. Meaburn ◽  
Cathy Fernandes ◽  
Ian W. Craig ◽  
Robert Plomin ◽  
Leonard C. Schalkwyk

Studying the causes and correlates of natural variation in gene expression in healthy populations assumes that individual differences in gene expression can be reliably and stably assessed across time. However, this is yet to be established. We examined 4-hour test–retest reliability and 10-month test–retest stability of individual differences in gene expression in ten 12-year-old children. Blood was collected on four occasions: 10 a.m. and 2 p.m. on Day 1 and, 10 months later, at 10 a.m. and 2 p.m. Total RNA was hybridized to Affymetrix U133 Plus 2.0 arrays. For each probeset, the correlation across individuals between 10 a.m. and 2 p.m. on Day 1 estimates test–retest reliability. We identified 3,414 variable and abundantly expressed probesets whose 4-hour test–retest reliability exceeded .70, a conventionally accepted level of reliability, which we had 80% power to detect. Of the 3,414 reliable probesets, 1,752 were also significantly reliable 10 months later. We assessed the long-term stability of individual differences in gene expression by correlating the average expression level for each probeset across the two 4-hour assessments on Day 1 with the average level of each probeset across the two 4-hour assessments 10 months later. 1,291 (73.7%) of the 1,752 probesets that reliably detected individual differences across 4 hours on two occasions, 10 months apart, also stably detected individual differences across 10 months. Heritability, as estimated from the MZ twin intraclass correlations, is twice as high for the 1,752 reliable probesets versus all present probesets on the array (0.68 vs 0.34), and is even higher (0.76) for the 1,291 reliable probesets that are also stable across 10 months.
The 1,291 probesets that reliably detect individual differences from a single peripheral blood collection and stably detect individual differences over 10 months are promising targets for research on the causes (e.g., eQTLs) and correlates (e.g., psychopathology) of individual differences in gene expression.


1980 ◽  
Vol 51 (2) ◽  
pp. 543-548 ◽  
Author(s):  
Brian L. Mishara ◽  
A. Harvey Baker

Kinesthetic aftereffect, commonly administered with alternate forms given on two separate occasions, has been used to assess individual differences in personality and perceptual style. Recent criticisms have questioned whether this task gives a worthwhile measure of individual differences. This paper represents a further response to such criticisms by extending to Petrie's alternate-form procedure our argument that lack of test-retest reliability is associated with practice effects which differentially bias second-session scores. Findings indicate that second-session bias does occur with this procedure, and, as expected, there was an inverse relationship between magnitude of retest reliability and differential bias. Although use of the alternate-form procedure is contraindicated, a single-session procedure remains a promising personality measure since it is not affected by differential bias.

