Learning from the Reliability Paradox: How Theoretically Informed Generative Models Can Advance the Social, Behavioral, and Brain Sciences

Author(s):  
Nathaniel Haines ◽  
Peter D. Kvam ◽  
Louis H. Irving ◽  
Colin Smith ◽  
Theodore P. Beauchaine ◽  
...  

Behavioral tasks (e.g., Stroop task) that produce replicable group-level effects (e.g., Stroop effect) often fail to reliably capture individual differences between participants (e.g., low test-retest reliability). This “reliability paradox” has led many researchers to conclude that most behavioral tasks cannot be used to develop and advance theories of individual differences. However, these conclusions are derived from statistical models that provide only superficial summary descriptions of behavioral data, thereby ignoring theoretically relevant data-generating mechanisms that underlie individual-level behavior. More generally, such descriptive methods lack the flexibility to test and develop increasingly complex theories of individual differences. To resolve this theory-description gap, we present generative modeling approaches, which involve using background knowledge to specify how behavior is generated at the individual level, and in turn how the distributions of individual-level mechanisms are characterized at the group level, all in a single joint model. Generative modeling shifts our focus away from estimating descriptive statistical “effects” toward estimating psychologically meaningful parameters, while simultaneously accounting for measurement error that would otherwise attenuate individual difference correlations. Using simulations and empirical data from the Implicit Association Test and Stroop, Flanker, Posner Cueing, and Delay Discounting tasks, we demonstrate how generative models yield (1) higher test-retest reliability estimates, and (2) more theoretically informative parameter estimates relative to traditional statistical approaches. Our results reclaim optimism regarding the utility of behavioral paradigms for testing and advancing theories of individual differences, and emphasize the importance of formally specifying and checking model assumptions to reduce theory-description gaps and facilitate principled theory development.
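The attenuation mentioned in this abstract can be illustrated with the classical Spearman formula from classical test theory. This is only a sketch of the general idea, not the authors' generative (hierarchical) model; the function names and the example values are assumptions chosen for illustration.

```python
import math

def attenuated_correlation(true_r, reliability_x, reliability_y):
    """Expected observed correlation between two noisy measures.

    Classical test theory: observed r = true r * sqrt(rel_x * rel_y).
    """
    return true_r * math.sqrt(reliability_x * reliability_y)

def disattenuated_correlation(observed_r, reliability_x, reliability_y):
    """Spearman's correction: estimate the true correlation from the observed one."""
    return observed_r / math.sqrt(reliability_x * reliability_y)

# Hypothetical values: a true correlation of 0.6 measured with two
# instruments that each have reliability 0.5.
observed = attenuated_correlation(0.6, 0.5, 0.5)
recovered = disattenuated_correlation(observed, 0.5, 0.5)
print(observed, recovered)
```

With these illustrative numbers, half of the true correlation is lost to measurement error, which is the kind of shrinkage that low task reliability produces in individual-difference studies.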

2019 ◽  
Vol 116 (12) ◽  
pp. 5472-5477 ◽  
Author(s):  
A. Zeynep Enkavi ◽  
Ian W. Eisenberg ◽  
Patrick G. Bissett ◽  
Gina L. Mazza ◽  
David P. MacKinnon ◽  
...  

The ability to regulate behavior in service of long-term goals is a widely studied psychological construct known as self-regulation. This wide interest is in part due to the putative relations between self-regulation and a range of real-world behaviors. Self-regulation is generally viewed as a trait, and individual differences are quantified using a diverse set of measures, including self-report surveys and behavioral tasks. Accurate characterization of individual differences requires measurement reliability, a property frequently characterized in self-report surveys, but rarely assessed in behavioral tasks. We remedy this gap by (i) providing a comprehensive literature review on an extensive set of self-regulation measures and (ii) empirically evaluating test–retest reliability of this battery in a new sample. We find that dependent variables (DVs) from self-report surveys of self-regulation have high test–retest reliability, while DVs derived from behavioral tasks do not. This holds both in the literature and in our sample, although the test–retest reliability estimates in the literature are highly variable. We confirm that this is due to differences in between-subject variability. We also compare different types of task DVs (e.g., model parameters vs. raw response times) in their suitability as individual difference DVs, finding that certain model parameters are as stable as raw DVs. Our results provide greater psychometric footing for the study of self-regulation and provide guidance for future studies of individual differences in this domain.
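The link between between-subject variability and test-retest reliability can be sketched with the usual variance-ratio definition of reliability. This is a simplified illustration under classical test theory assumptions, not the analysis reported in the paper; the function name and the variance values are hypothetical.

```python
def variance_ratio_reliability(between_subject_var, error_var):
    """Reliability as the share of observed variance due to stable
    between-subject differences (an ICC-style ratio)."""
    return between_subject_var / (between_subject_var + error_var)

# Same measurement error, different spread between participants
# (hypothetical values).
wide_spread = variance_ratio_reliability(between_subject_var=1.0, error_var=1.0)
narrow_spread = variance_ratio_reliability(between_subject_var=0.25, error_var=1.0)
print(wide_spread, narrow_spread)
```

Holding error variance fixed, a measure on which participants differ less yields lower reliability, even if its group-level effect is robust.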


2018 ◽  
Author(s):  
Ayse Zeynep Enkavi ◽  
Ian W Eisenberg ◽  
Patrick Bissett ◽  
Gina L. Mazza ◽  
David MacKinnon ◽  
...  

The ability to regulate behavior in service of long-term goals is a widely studied psychological construct known as self-regulation. This wide interest is in part due to the putative relations between self-regulation and a range of real-world behaviors. Self-regulation is generally viewed as a trait, and individual differences are quantified using a diverse set of measures including self-report surveys and behavioral tasks. Accurate characterization of individual differences requires measurement reliability, a property frequently characterized in self-report surveys, but rarely assessed in behavioral tasks. We remedy this gap by (1) providing a comprehensive literature review on an extensive set of self-regulation measures, and (2) empirically evaluating retest reliability in this battery of measures in a new sample. We find that self-report survey measures of self-regulation have high test-retest reliability while measures derived from behavioral tasks do not. This holds both in the literature and in our sample. We confirm that this is due to differences in between-subjects variability. We also compare different types of task measures (e.g., model parameters vs. raw response times) in their suitability as individual difference measures, finding that certain model parameters are as stable as raw measures. Our results provide greater psychometric footing for the study of self-regulation and provide guidance for future studies of individual differences in this domain.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Zahra Barakchian ◽  
Anjali Raja Beharelle ◽  
Todd A. Hare

Food choice paradigms are commonly used to study decision mechanisms, individual differences, and intervention efficacy. Here, we measured behavior from twenty-three healthy young adults who completed five repetitions of a cued-attribute food choice paradigm over two weeks. This task includes cues prompting participants to explicitly consider the healthiness of the food items before making a selection, or to choose naturally based on whatever freely comes to mind. We found that the average patterns of food choices following both cue types and ratings about the palatability (i.e., taste) and healthiness of the food items were similar across all five repetitions. At the individual level, the test-retest reliability for choices in both conditions and healthiness ratings was excellent. However, test-retest reliability for taste ratings was only fair, suggesting that estimates about palatability may vary more from day to day for the same individual.


2021 ◽  
Author(s):  
Xiaochun Han ◽  
Yoni K. Ashar ◽  
Philip Kragel ◽  
Bogdan Petre ◽  
Victoria Schelkun ◽  
...  

Identifying biomarkers that predict mental states with large effect sizes and high test-retest reliability is a growing priority for fMRI research. We examined a well-established multivariate brain measure that tracks pain induced by nociceptive input, the Neurologic Pain Signature (NPS). In N = 295 participants across eight studies, NPS responses showed a very large effect size in predicting within-person single-trial pain reports (d = 1.45) and medium effect size in predicting individual differences in pain reports (d = 0.49, average r = 0.20). The NPS showed excellent short-term (within-day) test-retest reliability (ICC = 0.84, with average 69.5 trials/person). Reliability scaled with the number of trials within-person, with ≥60 trials required for excellent test-retest reliability. Reliability was comparable in two additional studies across 5-day (N = 29, ICC = 0.74, 30 trials/person) and 1-month (N = 40, ICC = 0.46, 5 trials/person) test-retest intervals. The combination of strong within-person correlations and only modest between-person correlations between the NPS and pain reports indicates that the two measures have different sources of between-person variance. The NPS is not a surrogate for individual differences in pain reports, but can serve as a reliable measure of pain-related physiology and mechanistic target for interventions.
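The way reliability grows with the number of trials averaged per person can be sketched with the Spearman-Brown prophecy formula. This is a textbook approximation that assumes parallel, equally noisy trials; it is not the procedure used in the study, and the single-trial reliability below is a hypothetical value, not one reported in the abstract.

```python
import math

def spearman_brown(single_trial_reliability, n_trials):
    """Predicted reliability of the mean of n_trials parallel trials."""
    r = single_trial_reliability
    return n_trials * r / (1 + (n_trials - 1) * r)

def trials_needed(single_trial_reliability, target_reliability):
    """Smallest number of trials whose mean reaches the target reliability."""
    r, t = single_trial_reliability, target_reliability
    return math.ceil(t * (1 - r) / (r * (1 - t)))

# Hypothetical single-trial reliability of 0.5.
print(spearman_brown(0.5, 1))    # one trial
print(spearman_brown(0.5, 3))    # mean of three trials
print(trials_needed(0.5, 0.75))  # trials required to reach 0.75
```

Under these assumptions, averaging more trials per person raises reliability with diminishing returns, which is consistent with the abstract's observation that reliability depends on trial count.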


2020 ◽  
pp. 174702182092919 ◽  
Author(s):  
Alasdair DF Clarke ◽  
Jessica L Irons ◽  
Warren James ◽  
Andrew B Leber ◽  
Amelia R Hunt

A striking range of individual differences has recently been reported in three different visual search tasks. These differences in performance can be attributed to strategy, that is, the efficiency with which participants control their search to complete the task quickly and accurately. Here, we ask whether an individual’s strategy and performance in one search task is correlated with how they perform in the other two. We tested 64 observers and found that even though the test–retest reliability of the tasks was high, an observer’s performance and strategy in one task was not predictive of their behaviour in the other two. These results suggest search strategies are stable over time, but context-specific. To understand visual search, we therefore need to account not only for differences between individuals but also how individuals interact with the search task and context.


2017 ◽  
Vol 48 (2) ◽  
pp. 293-300 ◽  
Author(s):  
Marta Malesza ◽  
Maria Maczuga

Recent research introduced the Discounting Inventory (DI), which allows the measurement of individual differences in delay, probabilistic, effort, and social discounting rates. The goal of this investigation was to determine several aspects of the reliability of the DI using the responses of 385 participants (200 non-smokers and 185 current smokers). Two types of reliability are of interest: internal consistency and test-retest stability. A secondary aim was to extend such reliability measures beyond non-clinical participants by measuring the reliability of the DI in both nicotine-dependent and non-nicotine-dependent individuals. It is concluded that the internal consistency of the DI is excellent, and that the test-retest reliability results suggest that items intended to measure three types of discounting were likely testing trait, rather than state, factors, regardless of whether “non-smokers” were included in, or excluded from, the analyses (probabilistic discounting scale scores being the exception). With these cautions in mind, however, the psychometric properties of the DI appear to be very good.


2021 ◽  
Author(s):  
Zahra Barakchian ◽  
Anjali Raja Beharelle ◽  
Todd A. Hare

Food choice paradigms are commonly used to study decision mechanisms, individual differences, and intervention efficacy. Here, we measured behavior from twenty-three healthy young adults who completed five repetitions of a cued-attribute food choice paradigm over two weeks. This task includes cues prompting participants to explicitly consider the healthiness of the food items before making a selection, or to choose naturally based on whatever freely comes to mind. We found that the average patterns of food choices following both cue types and ratings about the palatability (i.e., taste) and healthiness of the food items were similar across all five repetitions. At the individual level, the test-retest reliability for choices in both conditions and healthiness ratings was excellent. However, test-retest reliability for taste ratings was only fair, suggesting that estimates about palatability may vary more from day to day for the same individual.


2019 ◽  
Author(s):  
Bernhard Pastötter ◽  
Christian Frings

The forward testing effect refers to the finding that retrieval practice of previously studied information enhances learning and retention of subsequently studied other information. While most of the previous research on the forward testing effect examined group differences, the present study took an individual differences approach to investigate this effect. Experiment 1 examined whether the forward effect has test-retest reliability between two experimental sessions. Experiment 2 investigated whether the effect is related to participants’ working memory capacity. In both experiments (and each session of Experiment 1), participants studied three lists of items in anticipation of a final cumulative recall test. In the testing condition, participants were tested immediately on lists 1 and 2, whereas in the restudy condition, they restudied lists 1 and 2. In both conditions, participants were tested immediately on list 3. On the group level, the results of both experiments demonstrated a forward testing effect, with interim testing of lists 1 and 2 enhancing immediate recall of list 3. On the individual level, the results of Experiment 1 showed that the forward effect on list 3 recall has moderate test-retest reliability between two experimental sessions. In addition, the results of Experiment 2 showed that the forward effect on list 3 recall does not depend on participants’ working memory capacity. These findings suggest that the forward testing effect is reliable at the individual level and affects learners at a wide range of working memory capacities alike. The theoretical and practical implications of the findings are discussed.


2021 ◽  
Author(s):  
Alina Tetereva ◽  
Jean Li ◽  
Jeremiah Deng ◽  
Argyris Stringaris ◽  
Narun Pat

Capturing individual differences in cognitive abilities is central to human neuroscience. Yet our ability to estimate cognitive abilities via brain MRI is still poor in both prediction and reliability. Our study tested whether this inability was partly due to the over-reliance on 1) non-task MRI modalities and 2) single modalities. We directly compared predictive models comprising different sets of MRI modalities (e.g., task vs. non-task). Using the Human Connectome Project (n=873 humans, 473 females, after exclusions), we integrated task-based functional MRI (tfMRI) across seven tasks along with other non-task MRI modalities (structural MRI, resting-state functional connectivity) via a machine-learning stacking approach. The model integrating all modalities provided unprecedented prediction (r=.581) and excellent test-retest reliability (ICC>.75) in capturing general cognitive abilities. Importantly, compared with the model integrating across non-task modalities (r=.367), integrating tfMRI across tasks led to significantly higher prediction (r=.544) while still providing excellent test-retest reliability (ICC>.75). The model integrating tfMRI across tasks was driven by areas in the frontoparietal network and by tasks that are cognition-related (working memory, relational processing, and language). This result is consistent with the parieto-frontal integration theory of intelligence. Accordingly, our results sharply contradict the recently popular notion that tfMRI is not appropriate for capturing individual differences in cognition. Instead, our study suggests that tfMRI, when used appropriately (i.e., by drawing information across the whole brain and across tasks and by integrating with other modalities), provides predictive and reliable sources of information for individual differences in cognitive abilities, more so than non-task modalities.

