A Test-Retest Reliability Generalization Meta-Analysis of Judgments Via the Policy-Capturing Technique

Policy capturing is a widely used technique, but the temporal stability of policy-capturing judgments has long been a cause for concern. This article emphasizes the importance of reporting reliability, and in particular test-retest reliability, estimates in policy-capturing studies. We found that only 164 of 955 policy-capturing studies (i.e., 17.17%) reported a test-retest reliability estimate. We then conducted a reliability generalization meta-analysis on policy-capturing studies that did report test-retest reliability estimates—and we obtained an average reliability estimate of .78. We additionally examined 16 potential methodological and substantive antecedents to test-retest reliability (equivalent to moderators in validity generalization studies). We found that test-retest reliability was robust to variation in 14 of the 16 factors examined but that reliability was higher in paper-and-pencil studies than in web-based studies and was higher for behavioral intention judgments than for other (e.g., attitudinal and perceptual) judgments. We provide an agenda for future research. Finally, we provide several best-practice recommendations for researchers (and journal reviewers) with regard to (a) reporting test-retest reliability, (b) designing policy-capturing studies for appropriate reportage, and (c) properly interpreting test-retest reliability in policy-capturing studies.
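As a rough illustration of the kind of pooling a reliability generalization study performs, the sketch below averages test-retest correlations after a Fisher z transformation, weighting each estimate by n - 3. This is an assumed, simplified fixed-effect scheme, not the procedure reported in the article; the function name and the input values are hypothetical. The last line rechecks the reporting rate quoted in the abstract.

```python
from math import atanh, tanh

def pooled_reliability(estimates):
    """Pool (r, n) pairs via Fisher z with n - 3 weights (assumed scheme)."""
    weighted_z = sum((n - 3) * atanh(r) for r, n in estimates)
    total_weight = sum(n - 3 for _, n in estimates)
    # Back-transform the weighted mean z to the correlation metric.
    return tanh(weighted_z / total_weight)

# Hypothetical test-retest estimates: (correlation, sample size)
example = [(0.72, 40), (0.81, 120), (0.78, 65)]
print(round(pooled_reliability(example), 3))

# Reporting rate from the abstract: 164 of 955 studies
print(round(164 / 955 * 100, 2))  # 17.17
```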


The Fear of COVID-19 Scale: A Reliability Generalization Meta-Analysis

Assessment ◽

10.1177/1073191121994164 ◽

2021 ◽

pp. 107319112199416

Author(s):

Desirée Blázquez-Rincón ◽

Juan I. Durán ◽

Juan Botella

Keyword(s):

Predictive Model ◽

Meta Analysis ◽

Classical Test Theory ◽

Measurement Model ◽

Rasch Measurement ◽

Test Theory ◽

Reliability Estimate ◽

Reliability Generalization ◽

Reliability Estimates ◽

Total Variability

A reliability generalization meta-analysis was carried out to estimate the average reliability of the seven-item, 5-point Likert-type Fear of COVID-19 Scale (FCV-19S), one of the most widespread scales developed around the COVID-19 pandemic. Different reliability coefficients from classical test theory and the Rasch Measurement Model were meta-analyzed, heterogeneity among the most reported reliability estimates was examined by searching for moderators, and a predictive model to estimate the expected reliability was proposed. At least one reliability estimate was available for 44 independent samples from 42 studies, and Cronbach’s alpha was the most frequently reported coefficient. The coefficients exhibited pooled estimates ranging from .85 to .90. The moderator analyses led to a predictive model in which the standard deviation of scores explained 36.7% of the total variability among alpha coefficients. The FCV-19S has been shown to be consistently reliable regardless of the moderator variables examined.
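For readers unfamiliar with the coefficient most often pooled here, the following is a minimal sketch of how Cronbach's alpha is computed from an item-score matrix under classical test theory. The function name and the toy responses are hypothetical and are not data from the FCV-19S studies.

```python
from statistics import variance

def cronbach_alpha(scores):
    """scores: one list of k item scores per respondent."""
    k = len(scores[0])
    # Sum of the sample variances of the individual items.
    item_variance_sum = sum(variance(item) for item in zip(*scores))
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)

# Hypothetical responses: 3 respondents x 2 items
toy = [[1, 2], [2, 1], [3, 3]]
print(round(cronbach_alpha(toy), 3))
```

Because alpha depends on the ratio of item variances to total-score variance, samples with a larger spread of scores tend to yield larger coefficients, which is consistent with the moderator result described above.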


Reliability, and Convergent and Discriminant Validity of Gaming Disorder Scales: A Meta-Analysis

Frontiers in Psychology ◽

10.3389/fpsyg.2021.764209 ◽

2021 ◽

Vol 12 ◽

Author(s):

Seowon Yoon ◽

Yeji Yang ◽

Eunbin Ro ◽

Woo-Young Ahn ◽

Jueun Kim ◽

...

Keyword(s):

Systematic Review ◽

Internet Addiction ◽

Discriminant Validity ◽

Meta Analysis ◽

Effect Sizes ◽

Gaming Disorder ◽

Reliability Generalization ◽

Retest Reliability ◽

Convergent And Discriminant Validity ◽

Test Retest Reliability

Background: The association between gaming disorder (GD) and the symptoms of common mental disorders has yet to be unraveled. In this preregistered study, we quantitatively synthesized the reliability and the convergent and discriminant validity of GD scales to examine the association between GD and other constructs. Methods: Five representative GD instruments (GAS-7, AICA, IGDT-10, Lemmens IGD-9, and IGDS9-SF) were chosen based on the recommendations of a previous systematic review to conduct correlation meta-analyses and reliability generalization. A systematic literature search was conducted through Pubmed, Proquest, Embase, and Google Scholar to identify studies that reported information on either reliability or correlation with related variables. A total of 2,124 studies were assessed in full text as of October 2020, and 184 were quantitatively synthesized. The conventional Hedges two-level meta-analytic method was used. Results: The reliability generalization yielded a mean coefficient alpha of 0.86 (95% CI = 0.85–0.87) and a mean test-retest estimate of 0.86 (95% CI = 0.81–0.89). Estimated effect sizes for the correlation between GD and the other variables were as follows (k = number of effect sizes): 0.33 with depression (k = 45), 0.29 with anxiety (k = 37), 0.30 with aggression (k = 19), –0.22 with quality of life (k = 18), 0.29 with loneliness (k = 18), 0.56 with internet addiction (k = 20), and 0.40 with game playtime (k = 53). Results of moderator analyses, funnel and forest plots, and publication bias analyses are also presented. Discussion and Conclusion: All five GD instruments have good internal consistency and test-retest reliability, although relatively few studies reported test-retest reliability. The correlation meta-analysis revealed that GD scores were only moderately associated with game playtime. Common psychological problems such as depression and anxiety were found to have a slightly smaller association with GD than gaming behavior did. GD scores were strongly correlated with internet addiction. Further studies should adopt rigorous methodological procedures to unravel the bidirectional relationship between GD and other psychopathologies. Limitations: The current study did not include gray literature. The representativeness of the five tools included in the current study could be questioned. High heterogeneity is another limitation of the study. Systematic Review Registration: [https://www.crd.york.ac.uk/PROSPERO/], identifier [CRD42020219781].


A Reliability Generalization Meta-Analysis of the Padua Inventory of Obsessions and Compulsions

The Spanish Journal of Psychology ◽

10.1017/sjp.2017.65 ◽

2017 ◽

Vol 20 ◽

Cited By ~ 2

Author(s):

Julio Sánchez-Meca ◽

María Rubio-Aparicio ◽

Rosa María Núñez-Núñez ◽

José López-Pina ◽

Fulgencio Marín-Martínez ◽

...

Keyword(s):

Standard Deviation ◽

Literature Search ◽

Meta Analysis ◽

Reliability Generalization ◽

Obsessive Compulsive ◽

Mental Control ◽

Reliability Estimates ◽

Average Coefficient ◽

Test Retest Reliability ◽

Practical Implications

The Padua Inventory (PI) of obsessions and compulsions is one of the most usually applied tests to assess obsessive-compulsive symptomatology in research contexts as well as for clinical and screening purposes. A reliability generalization meta-analysis was accomplished to estimate the average reliability of the PI scores and to search for characteristics of the samples and studies that can explain the variability among reliability estimates. An exhaustive literature search enabled us to select 39 studies (53 independent samples) that reported alpha and/or test-retest coefficients with the data at hand for the PI total score and subscales. An excellent average coefficient alpha was found for the PI total score (M = .935; 95% CI = .922–.949) and for Impaired Mental Control subscale (M = .911; 95% CI = .897–.924), being good for Contamination (M = .861; 95% CI = .841–.882) and Checking (M = .880; 95% CI = .856–.903), and fair for Urges and Worries (M = .783; 95% CI = .745–.822). The average test-retest reliability for PI total score was also satisfactory (M = .835; 95% CI = .782–.877). Moderator analyses showed larger coefficients alpha for larger standard deviation of the PI total scores (p = .0005; R² = .46), for adapted versions of the test (p = .002; R² = .32), and for samples composed of clinical participants (p = .066; R² = .10). The practical implications of these results are discussed as well as the need for researchers to report reliability estimates with the data at hand.


Affective and cognitive components of students’ attitudes towards communication learning - validation of the Communication Skills Attitude Scale in a cohort of Polish medical students

BMC Medical Education ◽

10.1186/s12909-021-02626-7 ◽

2021 ◽

Vol 21 (1) ◽

Author(s):

Piotr Przymuszała ◽

Magdalena Cerbin-Koczorowska ◽

Patrycja Marciniak-Stępak ◽

Łucja Zielińska-Tomczak ◽

Martyna Piszczek ◽

...

Keyword(s):

Medical Students ◽

Communication Skills ◽

Behavioral Control ◽

Attitude Scale ◽

Future Research ◽

Perceived Behavioral Control ◽

Negative Attitudes ◽

Retest Reliability ◽

Cognitive Components ◽

Test Retest Reliability

Background: The Communication Skills Attitude Scale (CSAS) is a recognized tool for assessment of attitudes towards communication learning. In the original version, it consists of 26 items divided on theoretical assumptions into two subscales: Positive and Negative Attitudes Scales. However, the evidence for its structure seems unsatisfactory, and a simple division into positive and negative attitudes may be insufficient to describe attitudes of medical students towards communication learning. Moreover, the existing evidence of the test-retest reliability of the CSAS seems limited. Consequently, this study aimed to provide more evidence on its psychometric properties while validating the CSAS questionnaire in a cohort of Polish medical students. Methods: The CSAS was translated, adapted into Polish, and validated in a cohort of 389 Polish medical students. Statistical analysis involved, among others, parallel analysis to determine the number of factors, confirmatory factor analysis to compare the proposed model with theory-based ones, and test-retest reliability analysis. Results: The analysis revealed that, in the examined population, the CSAS should consist of four rather than two subscales. The proposed four subscales addressed perceived outcomes of communication learning, positive and negative attitudes towards it (affective components), and factors motivating students to learn communication (a cognitive component of attitudes). Results of test-retest reliability were satisfactory for individual items and subscales. Conclusions: This study presented a valid and reliable version of the Communication Skills Attitude Scale for Polish medical students and confirmed previous assumptions that the CSAS may also be appropriate for assessment of affective and cognitive components of attitudes. Future research should, based on Ajzen’s Theory of Planned Behavior, make attempts to develop a tool assessing not only attitudes but also subjective norms and perceived behavioral control.


The Reliability, Validity and Normative Scores of the Bene-Anthony Family Relations Test for Use With Arab Children

Frontiers in Psychology ◽

10.3389/fpsyg.2021.548493 ◽

2021 ◽

Vol 12 ◽

Author(s):

Abdrabo Soliman ◽

Abdel-Salam G. Abdel-Salam ◽

Mervat Ahmed

Keyword(s):

Family Relations ◽

Clinical Status ◽

Psychological Disorder ◽

Normative Values ◽

Future Research ◽

Retest Reliability ◽

Arab Children ◽

Starting Point ◽

Test Retest Reliability ◽

The Impact

Background: The Bene-Anthony Family Relations Test (BAFRT) is one of the most widely used measures of family dynamics seen from a child’s perspective. However, the most common issue surrounding this test is the lack of accurate normative scores for use with non-white ethnic groups. The purpose of this study was to examine the BAFRT’s reliability and validity for use with Arab children, as well as to provide normative data for this group. Methods: The BAFRT was translated into Arabic and back-translated to ensure accuracy. The test was administered to a cohort of 394 Arab children, consisting of both cognitively normal children (n = 269) and children diagnosed with a psychological disorder (n = 125), all aged 5–8 years old. Test-retest reliability was assessed using a sub-set of children and validity was tested against clinical status as well as CBCL and SDQ measures. Normative measures were calculated after examining the impact of influencing variables such as age and gender. Results: Statistical analyses showed that in our cohort of Arab children the BAFRT has good test-retest reliability, correlates well with measures of emotional and behavioral adjustment, and discriminates accurately between clinical and non-clinical children. Age, gender, and clinical status all significantly impacted upon BAFRT scores and therefore normative values are presented from our cohort when considering these variables. Conclusion: The normative scores we present will provide researchers and clinicians an appropriate reference point for the comparison of scores from Arab children and a starting point for future research into this area.


Career Indecision Profile-65 Scores: Test–Retest Reliability and Measurement Equivalence in College and Noncollege Samples

Journal of Career Assessment ◽

10.1177/1069072718775692 ◽

2018 ◽

Vol 27 (3) ◽

pp. 510-526 ◽

Cited By ~ 1

Author(s):

Casey J. Zobell ◽

Margaret M. Nauta ◽

Matthew S. Hesson-McInnis

Keyword(s):

Career Counseling ◽

Measurement Equivalence ◽

Career Indecision ◽

Future Research ◽

Confirmatory Factor Analyses ◽

Student Sample ◽

Retest Reliability ◽

College Student Sample ◽

Two Samples ◽

Test Retest Reliability

The Career Indecision Profile-65 (CIP-65) is a relatively new measure of career indecision that appears to have promise for use in career counseling and research. We sought to expand the information available to those evaluating the CIP-65 for potential use by assessing its measurement equivalence in college (N = 529) and noncollege (N = 472) samples and its scores’ test–retest reliability in a subset of the college–student sample (n = 107). Six-week test–retest reliability coefficients ranged from .58 (interpersonal conflicts) to .85 (choice/commitment anxiety) for the subscale scores. Confirmatory factor analyses revealed that the CIP-65’s four-factor structure fit the data well in both the college and noncollege samples. The CIP-65 scores were configurally invariant in the two samples, but we did not find support for metric invariance. We offer explanations for these findings, discuss implications for practice, and present ideas for future research.


Purdue Pegboard: Test-Retest Estimates

Perceptual and Motor Skills ◽

10.2466/pms.1988.66.2.503 ◽

1988 ◽

Vol 66 (2) ◽

pp. 503-506 ◽

Cited By ~ 58

Author(s):

John R. Reddon ◽

David M. Gill ◽

Stephen E. Gauk ◽

Marita D. Maerz

Keyword(s):

Temporal Stability ◽

Retest Reliability ◽

Left Hand ◽

Right Hand ◽

The Right ◽

Test Retest Reliability ◽

Right And Left Hand ◽

Over Time

26 normal, self-reported dextral subjects (12 men, 14 women) were assessed with a Purdue Pegboard 5 times at weekly intervals to evaluate temporal stability and efficacy of lateralization with this test. There was a statistically significant increase in performance over time for men on the right- and left-hand placing subtests and for women on the assemblies subtest. For men/women the test-retest reliability over the 5 sessions averaged .63/.76 for the right-hand, .64/.79 for the left-hand, .67/.81 for both-hands, .81/.83 for assemblies, and .33/.22 for the right/left-hand ratio.


Gait Asymmetry Post-Stroke: Determining Valid and Reliable Methods Using a Single Accelerometer Located on the Trunk

Sensors ◽

10.3390/s20010037 ◽

2019 ◽

Vol 20 (1) ◽

pp. 37 ◽

Cited By ~ 5

Author(s):

Christopher Buckley ◽

M. Encarna Micó-Amigo ◽

Michael Dunne-Willows ◽

Alan Godfrey ◽

Aodhán Hickey ◽

...

Keyword(s):

Concurrent Validity ◽

Intraclass Correlation ◽

Future Research ◽

Validity And Reliability ◽

Gait Symmetry ◽

Retest Reliability ◽

Post Stroke ◽

Gait Asymmetry ◽

Test Retest Reliability ◽

Cardinal Symptom

Asymmetry is a cardinal symptom of gait post-stroke that is targeted during rehabilitation. Technological developments have allowed accelerometers to be a feasible tool to provide digital gait variables. Many acceleration-derived variables are proposed to measure gait asymmetry. Despite a need for accurate calculation, no consensus exists for what is the most valid and reliable variable. Using an instrumented walkway (GaitRite) as the reference standard, this study compared the validity and reliability of multiple acceleration-derived asymmetry variables. Twenty-five post-stroke participants performed repeated walks over GaitRite whilst wearing a tri-axial accelerometer (Axivity AX3) on their lower back, on two occasions, one week apart. Harmonic ratio, autocorrelation, gait symmetry index, phase plots, acceleration, and jerk root mean square were calculated from the acceleration signals. Test–retest reliability was calculated, and concurrent validity was estimated by comparison with GaitRite. The strongest concurrent validity was obtained from step regularity from the vertical signal, which also recorded excellent test–retest reliability (Spearman’s rank correlation coefficient (rho) = 0.87 and intraclass correlation coefficient (ICC(2,1)) = 0.98, respectively). Future research should test the responsiveness of this and other step asymmetry variables to quantify change during recovery and the effect of rehabilitative interventions for consideration as digital biomarkers to quantify gait asymmetry.
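The intraclass correlation reported above can be illustrated with a minimal sketch of the two-way random-effects, absolute-agreement, single-measurement form, commonly written ICC(2,1). That the abstract's coefficient is of exactly this form is an assumption, and the function name and toy values below are hypothetical rather than taken from the study.

```python
def icc_2_1(data):
    """data: n subjects (rows) x k sessions (columns)."""
    n, k = len(data), len(data[0])
    grand = sum(map(sum, data)) / (n * k)
    row_means = [sum(row) / k for row in data]
    col_means = [sum(col) / n for col in zip(*data)]
    # Two-way ANOVA sums of squares: subjects, sessions, residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_error / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Hypothetical scores: session 2 is session 1 shifted up by one unit.
shifted = [[1, 2], [2, 3], [3, 4]]
print(round(icc_2_1(shifted), 3))
```

In the toy data the rank order of participants is perfectly preserved, yet the coefficient falls below 1 because this form of the ICC penalizes a systematic shift between sessions, unlike a rank or product-moment correlation.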


Test–retest reliability and practice effects of Expanded Halstead–Reitan Neuropsychological Test Battery

Journal of the International Neuropsychological Society ◽

10.1017/s1355617799544056 ◽

1999 ◽

Vol 5 (4) ◽

pp. 346-356 ◽

Cited By ~ 266

Author(s):

SUREYYA S. DIKMEN ◽

ROBERT K. HEATON ◽

IGOR GRANT ◽

NANCY R. TEMKIN

Keyword(s):

Standard Deviation ◽

Neuropsychological Test ◽

Clinical Work ◽

Practice Effects ◽

Future Research ◽

Neuropsychological Measures ◽

Reliability Estimates ◽

Median Test ◽

Test Retest Reliability ◽

Competency Level

Test–retest reliabilities and practice effects of a broad range of neuropsychological measures were examined in 384 normal or neurologically stable adults. Median test–retest interval was 11 months (range 3–16 months). The reliability estimates for most of the measures are reasonably good, ranging from .70 to low .90s. An exception is the relatively poor reliabilities of most memory measures. For all test measures, the value on initial testing is a strong determinant of the value on the second examination. Practice effects are seen on most measures. The magnitude of the practice effects, however, varies as a function of type of measure, test–retest interval, age, and overall competency level of the participant. This study provides several types of retest information that may be useful for future research and clinical work: comparative reliabilities of the various measures, estimate of error variability associated with each administration, standard deviation of the change, and comparative magnitude of practice effects on various tests. (JINS, 1999, 5, 346–356.)


Test-Retest Reliability of the Idea Inventory

Psychological Reports ◽

10.2466/pr0.1984.54.3.873 ◽

1984 ◽

Vol 54 (3) ◽

pp. 873-874 ◽

Cited By ~ 5

Author(s):

Norris D. Vestre

Keyword(s):

College Students ◽

Temporal Stability ◽

Stability Test ◽

Retest Reliability ◽

Irrational Thinking ◽

Rational Emotive ◽

Test Retest Reliability ◽

Changes Over Time ◽

Over Time ◽

Reliability Coefficients

The Idea Inventory, proposed as a measure of irrational thinking as defined by rational-emotive theory, was administered to two independent samples of college students on two occasions. Sample 1 (n = 135) provided a test-retest interval of 4 wk.; Sample 2 (n = 114), an interval of 4 to 6 wk. Indices of temporal stability, test-retest reliability coefficients (product-moment) and group changes over time, indicated satisfactory reliability for the Idea Inventory.
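The two indices of temporal stability named above can be sketched as follows: a product-moment (Pearson) correlation between the two administrations, and the mean change between them. The function names and the scores are hypothetical and are not data from the study.

```python
from math import sqrt

def retest_correlation(time1, time2):
    """Pearson product-moment correlation between two administrations."""
    n = len(time1)
    mean1, mean2 = sum(time1) / n, sum(time2) / n
    cross = sum((a - mean1) * (b - mean2) for a, b in zip(time1, time2))
    ss1 = sum((a - mean1) ** 2 for a in time1)
    ss2 = sum((b - mean2) ** 2 for b in time2)
    return cross / sqrt(ss1 * ss2)

def mean_change(time1, time2):
    """Average within-person change from the first to the second occasion."""
    return sum(b - a for a, b in zip(time1, time2)) / len(time1)

# Hypothetical scores for five respondents on two occasions
first = [10, 12, 15, 18, 20]
second = [11, 12, 16, 17, 21]
print(round(retest_correlation(first, second), 3), mean_change(first, second))
```

The two indices answer different questions: the correlation reflects whether respondents keep their relative standing, while the mean change reflects whether the group as a whole drifts over time.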
