A Test-Retest Reliability Generalization Meta-Analysis of Judgments Via the Policy-Capturing Technique

2021 ◽  
pp. 109442812110115
Author(s):  
Ze Zhu ◽  
Alan J. Tomassetti ◽  
Reeshad S. Dalal ◽  
Shannon W. Schrader ◽  
Kevin Loo ◽  
...  

Policy capturing is a widely used technique, but the temporal stability of policy-capturing judgments has long been a cause for concern. This article emphasizes the importance of reporting reliability, and in particular test-retest reliability, estimates in policy-capturing studies. We found that only 164 of 955 policy-capturing studies (i.e., 17.17%) reported a test-retest reliability estimate. We then conducted a reliability generalization meta-analysis on policy-capturing studies that did report test-retest reliability estimates—and we obtained an average reliability estimate of .78. We additionally examined 16 potential methodological and substantive antecedents to test-retest reliability (equivalent to moderators in validity generalization studies). We found that test-retest reliability was robust to variation in 14 of the 16 factors examined but that reliability was higher in paper-and-pencil studies than in web-based studies and was higher for behavioral intention judgments than for other (e.g., attitudinal and perceptual) judgments. We provide an agenda for future research. Finally, we provide several best-practice recommendations for researchers (and journal reviewers) with regard to (a) reporting test-retest reliability, (b) designing policy-capturing studies for appropriate reportage, and (c) properly interpreting test-retest reliability in policy-capturing studies.
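
As a rough illustration of the two quantities quoted above, the sketch below recomputes the reporting rate (164 of 955 studies) and pools a few test-retest coefficients into one average. The pooling rule (Fisher z-transform with n - 3 weights) is a common textbook choice and an assumption here, since the abstract does not state which estimator the authors used. The `studies` values and the function name `pooled_reliability` are hypothetical.

```python
import math

def pooled_reliability(estimates):
    """Pool (r, n) pairs: average Fisher z values weighted by n - 3.

    Assumption: a fixed-effect style average; not necessarily the
    estimator used in the article.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for r, n in estimates:
        z = math.atanh(r)   # Fisher z-transform of the correlation
        w = n - 3           # inverse of the sampling variance of z
        weighted_sum += w * z
        weight_total += w
    return math.tanh(weighted_sum / weight_total)  # back to the r metric

# Hypothetical (test-retest r, sample size) pairs, not taken from the article.
studies = [(0.72, 40), (0.81, 65), (0.78, 120)]
pooled = pooled_reliability(studies)

# Share of policy-capturing studies reporting a test-retest estimate.
reporting_rate = 164 / 955 * 100

print(f"pooled r = {pooled:.3f}, reporting rate = {reporting_rate:.2f}%")
```

Working in the z metric keeps the average from being distorted by the skewed sampling distribution of correlations near 1, which is why it is often preferred for reliability coefficients.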

Assessment ◽  
2021 ◽  
pp. 107319112199416
Author(s):  
Desirée Blázquez-Rincón ◽  
Juan I. Durán ◽  
Juan Botella

A reliability generalization meta-analysis was carried out to estimate the average reliability of the seven-item, 5-point Likert-type Fear of COVID-19 Scale (FCV-19S), one of the most widespread scales developed around the COVID-19 pandemic. Different reliability coefficients from classical test theory and the Rasch Measurement Model were meta-analyzed, heterogeneity among the most reported reliability estimates was examined by searching for moderators, and a predictive model to estimate the expected reliability was proposed. At least one reliability estimate was available for a total of 44 independent samples from 42 studies, with Cronbach’s alpha the most frequently reported coefficient. The coefficients exhibited pooled estimates ranging from .85 to .90. The moderator analyses led to a predictive model in which the standard deviation of scores explained 36.7% of the total variability among alpha coefficients. The FCV-19S has been shown to be consistently reliable regardless of the moderator variables examined.


2021 ◽  
Vol 12 ◽  
Author(s):  
Seowon Yoon ◽  
Yeji Yang ◽  
Eunbin Ro ◽  
Woo-Young Ahn ◽  
Jueun Kim ◽  
...  

Background: The association between gaming disorder (GD) and the symptoms of common mental disorders has not yet been unraveled. In this preregistered study, we quantitatively synthesized the reliability and the convergent and discriminant validity of GD scales to examine the association between GD and other constructs. Methods: Five representative GD instruments (GAS-7, AICA, IGDT-10, Lemmens IGD-9, and IGDS9-SF) were chosen based on recommendations from a previous systematic review to conduct correlation meta-analyses and reliability generalization. A systematic literature search was conducted through Pubmed, Proquest, Embase, and Google Scholar to identify studies that reported information on either reliability or correlation with related variables. A total of 2,124 studies were assessed in full text as of October 2020, and 184 were quantitatively synthesized. The conventional Hedges two-level meta-analytic method was used. Results: The reliability generalization yielded a mean coefficient alpha of 0.86 (95% CI = 0.85–0.87) and a mean test-retest estimate of 0.86 (95% CI = 0.81–0.89). Estimated effect sizes of the correlations between GD and the other variables were as follows: 0.33 with depression (k = 45, where k is the number of effect sizes), 0.29 with anxiety (k = 37), 0.30 with aggression (k = 19), –0.22 with quality of life (k = 18), 0.29 with loneliness (k = 18), 0.56 with internet addiction (k = 20), and 0.40 with game playtime (k = 53). Results of moderator analyses, funnel and forest plots, and publication bias analyses are also presented. Discussion and Conclusion: All five GD instruments have good internal consistency and test-retest reliability. Relatively few studies reported test-retest reliability. The correlation meta-analysis revealed that GD scores were only moderately associated with game playtime. Common psychological problems such as depression and anxiety were found to have a slightly smaller association with GD than gaming behavior did. GD scores were strongly correlated with internet addiction. Further studies should adopt a rigorous methodological procedure to unravel the bidirectional relationship between GD and other psychopathologies. Limitations: The current study did not include gray literature. The representativeness of the five tools included in the current study could be questioned. High heterogeneity is another limitation of the study. Systematic Review Registration: [https://www.crd.york.ac.uk/PROSPERO/], identifier [CRD42020219781].


2017 ◽  
Vol 20 ◽  
Author(s):  
Julio Sánchez-Meca ◽  
María Rubio-Aparicio ◽  
Rosa María Núñez-Núñez ◽  
José López-Pina ◽  
Fulgencio Marín-Martínez ◽  
...  

The Padua Inventory (PI) of obsessions and compulsions is one of the most commonly applied tests to assess obsessive-compulsive symptomatology in research contexts as well as for clinical and screening purposes. A reliability generalization meta-analysis was conducted to estimate the average reliability of the PI scores and to search for characteristics of the samples and studies that can explain the variability among reliability estimates. An exhaustive literature search enabled us to select 39 studies (53 independent samples) that reported alpha and/or test-retest coefficients with the data at hand for the PI total score and subscales. An excellent average coefficient alpha was found for the PI total score (M = .935; 95% CI = .922–.949) and for the Impaired Mental Control subscale (M = .911; 95% CI = .897–.924); it was good for Contamination (M = .861; 95% CI = .841–.882) and Checking (M = .880; 95% CI = .856–.903), and fair for Urges and Worries (M = .783; 95% CI = .745–.822). The average test-retest reliability for the PI total score was also satisfactory (M = .835; 95% CI = .782–.877). Moderator analyses showed larger alpha coefficients for larger standard deviations of the PI total scores (p = .0005; R² = .46), for adapted versions of the test (p = .002; R² = .32), and for samples composed of clinical participants (p = .066; R² = .10). The practical implications of these results are discussed as well as the need for researchers to report reliability estimates with the data at hand.


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Piotr Przymuszała ◽  
Magdalena Cerbin-Koczorowska ◽  
Patrycja Marciniak-Stępak ◽  
Łucja Zielińska-Tomczak ◽  
Martyna Piszczek ◽  
...  

Background: The Communication Skills Attitude Scale (CSAS) is a recognized tool for assessment of attitudes towards communication learning. In the original version, it consists of 26 items divided on theoretical assumptions into two subscales: Positive and Negative Attitudes Scales. However, the evidence for its structure seems unsatisfactory, and a simple division into positive and negative attitudes may be insufficient to describe attitudes of medical students towards communication learning. Moreover, the existing evidence of the test-retest reliability of the CSAS seems limited. Consequently, this study aimed to provide more evidence on its psychometric properties while validating the CSAS questionnaire in a cohort of Polish medical students. Methods: The CSAS was translated, adapted into Polish, and validated in a cohort of 389 Polish medical students. Statistical analysis involved, among others, parallel analysis to determine the number of factors, confirmatory factor analysis to compare the proposed model with theory-based ones, and test-retest reliability analysis. Results: The analysis revealed that, in the examined population, the CSAS should consist of four rather than two subscales. The four proposed subscales addressed perceived outcomes of communication learning, positive and negative attitudes towards it (affective components), and factors motivating students to learn communication (a cognitive component of attitudes). Results of test-retest reliability were satisfactory for individual items and subscales. Conclusions: This study presented a valid and reliable version of the Communication Skills Attitude Scale for Polish medical students and confirmed previous assumptions that the CSAS may also be appropriate for assessment of affective and cognitive components of attitudes. Future research should, based on Ajzen’s Theory of Planned Behavior, make attempts to develop a tool assessing not only attitudes but also subjective norms and perceived behavioral control.


2021 ◽  
Vol 12 ◽  
Author(s):  
Abdrabo Soliman ◽  
Abdel-Salam G. Abdel-Salam ◽  
Mervat Ahmed

Background: The Bene-Anthony Family Relations Test (BAFRT) is one of the most widely used measures of family dynamics seen from a child’s perspective. However, the most common issue surrounding this test is the lack of accurate normative scores for use with non-white ethnic groups. The purpose of this study was to examine the BAFRT’s reliability and validity for use with Arab children, as well as to provide normative data for this group. Methods: The BAFRT was translated into Arabic and back-translated to ensure accuracy. The test was administered to a cohort of 394 Arab children, consisting of both cognitively normal children (n = 269) and children diagnosed with a psychological disorder (n = 125), all aged 5–8 years old. Test-retest reliability was assessed using a sub-set of children and validity was tested against clinical status as well as CBCL and SDQ measures. Normative measures were calculated after examining the impact of influencing variables such as age and gender. Results: Statistical analyses showed that in our cohort of Arab children the BAFRT has good test-retest reliability, correlates well with measures of emotional and behavioral adjustment, and discriminates accurately between clinical and non-clinical children. Age, gender, and clinical status all significantly impacted upon BAFRT scores and therefore normative values are presented from our cohort when considering these variables. Conclusion: The normative scores we present will provide researchers and clinicians an appropriate reference point for the comparison of scores from Arab children and a starting point for future research into this area.


2018 ◽  
Vol 27 (3) ◽  
pp. 510-526 ◽  
Author(s):  
Casey J. Zobell ◽  
Margaret M. Nauta ◽  
Matthew S. Hesson-McInnis

The Career Indecision Profile-65 (CIP-65) is a relatively new measure of career indecision that appears to have promise for use in career counseling and research. We sought to expand the information available to those evaluating the CIP-65 for potential use by assessing its measurement equivalence in college (N = 529) and noncollege (N = 472) samples and its scores’ test–retest reliability in a subset of the college-student sample (n = 107). Six-week test–retest reliability coefficients ranged from .58 (interpersonal conflicts) to .85 (choice/commitment anxiety) for the subscale scores. Confirmatory factor analyses revealed that the CIP-65’s four-factor structure fit the data well in both the college and noncollege samples. The CIP-65 scores were configurally invariant in the two samples, but we did not find support for metric invariance. We offer explanations for these findings, discuss implications for practice, and present ideas for future research.


1988 ◽  
Vol 66 (2) ◽  
pp. 503-506 ◽  
Author(s):  
John R. Reddon ◽  
David M. Gill ◽  
Stephen E. Gauk ◽  
Marita D. Maerz

26 normal, self-reported dextral subjects (12 men, 14 women) were assessed with a Purdue Pegboard 5 times at weekly intervals to evaluate temporal stability and efficacy of lateralization with this test. There was a statistically significant increase in performance over time for men on the right- and left-hand placing subtests and for women on the assemblies subtest. For men/women the test-retest reliability over the 5 sessions averaged .63/.76 for the right-hand, .64/.79 for the left-hand, .67/.81 for both-hands, .81/.83 for assemblies, and .33/.22 for the right/left-hand ratio.


Sensors ◽  
2019 ◽  
Vol 20 (1) ◽  
pp. 37 ◽  
Author(s):  
Christopher Buckley ◽  
M. Encarna Micó-Amigo ◽  
Michael Dunne-Willows ◽  
Alan Godfrey ◽  
Aodhán Hickey ◽  
...  

Asymmetry is a cardinal symptom of gait post-stroke that is targeted during rehabilitation. Technological developments have allowed accelerometers to be a feasible tool to provide digital gait variables. Many acceleration-derived variables are proposed to measure gait asymmetry. Despite a need for accurate calculation, no consensus exists for what is the most valid and reliable variable. Using an instrumented walkway (GaitRite) as the reference standard, this study compared the validity and reliability of multiple acceleration-derived asymmetry variables. Twenty-five post-stroke participants performed repeated walks over GaitRite whilst wearing a tri-axial accelerometer (Axivity AX3) on their lower back, on two occasions, one week apart. Harmonic ratio, autocorrelation, gait symmetry index, phase plots, acceleration, and jerk root mean square were calculated from the acceleration signals. Test–retest reliability was calculated, and concurrent validity was estimated by comparison with GaitRite. The strongest concurrent validity was obtained from step regularity from the vertical signal, which also recorded excellent test–retest reliability (Spearman’s rank correlation coefficient (rho) = 0.87 and intraclass correlation coefficient (ICC(2,1)) = 0.98, respectively). Future research should test the responsiveness of this and other step asymmetry variables to quantify change during recovery and the effect of rehabilitative interventions for consideration as digital biomarkers to quantify gait asymmetry.
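
For readers unfamiliar with the intraclass correlation reported above, the following is a minimal sketch of one way to compute it. It reads the coefficient as the two-way random-effects, absolute-agreement, single-measure form, ICC(2,1) in the Shrout and Fleiss notation; that reading, the function name `icc_2_1`, and the example scores are assumptions and are not taken from the study.

```python
def icc_2_1(scores):
    """ICC(2,1) from a subjects-by-sessions table (list of equal-length rows)."""
    n = len(scores)        # number of subjects
    k = len(scores[0])     # number of sessions per subject
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Two-way ANOVA sums of squares (one observation per cell).
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)                # between-subjects mean square
    ms_cols = ss_cols / (k - 1)                # between-sessions mean square
    ms_error = ss_error / ((n - 1) * (k - 1))  # residual mean square

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )

# Hypothetical data: three participants, two sessions one week apart,
# with every score shifted up by one unit at retest.
shifted = [[1, 2], [2, 3], [3, 4]]
print(icc_2_1(shifted))
```

Because this form measures absolute agreement, a uniform shift between sessions lowers the coefficient even though the rank order of participants is unchanged, which is one reason it can differ from a correlation computed on the same data.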


1999 ◽  
Vol 5 (4) ◽  
pp. 346-356 ◽  
Author(s):  
SUREYYA S. DIKMEN ◽  
ROBERT K. HEATON ◽  
IGOR GRANT ◽  
NANCY R. TEMKIN

Test–retest reliabilities and practice effects of a broad range of neuropsychological measures were examined in 384 normal or neurologically stable adults. Median test–retest interval was 11 months (range 3–16 months). The reliability estimates for most of the measures are reasonably good, ranging from .70 to low .90s. An exception is the relatively poor reliabilities of most memory measures. For all test measures, the value on initial testing is a strong determinant of the value on the second examination. Practice effects are seen on most measures. The magnitude of the practice effects, however, varies as a function of type of measure, test–retest interval, age, and overall competency level of the participant. This study provides several types of retest information that may be useful for future research and clinical work: comparative reliabilities of the various measures, estimate of error variability associated with each administration, standard deviation of the change, and comparative magnitude of practice effects on various tests. (JINS, 1999, 5, 346–356.)


1984 ◽  
Vol 54 (3) ◽  
pp. 873-874 ◽  
Author(s):  
Norris D. Vestre

The Idea Inventory, proposed as a measure of irrational thinking as defined by rational-emotive theory, was administered to two independent samples of college students on two occasions. Sample 1 (n = 135) provided a test-retest interval of 4 wk.; Sample 2 (n = 114), an interval of 4 to 6 wk. Indices of temporal stability, namely test-retest reliability coefficients (product-moment) and group changes over time, indicated satisfactory reliability for the Idea Inventory.
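
The two stability indices named above can be sketched as follows: a product-moment (Pearson) correlation between the two administrations, plus the mean change between them. The function names `pearson_r` and `mean_change` and the scores are hypothetical, chosen only to illustrate the calculation.

```python
import math

def pearson_r(first, second):
    """Product-moment correlation between paired scores from two occasions."""
    n = len(first)
    if n != len(second) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 pairs")
    mean_1 = sum(first) / n
    mean_2 = sum(second) / n
    cross = sum((a - mean_1) * (b - mean_2) for a, b in zip(first, second))
    ss_1 = sum((a - mean_1) ** 2 for a in first)
    ss_2 = sum((b - mean_2) ** 2 for b in second)
    return cross / math.sqrt(ss_1 * ss_2)

def mean_change(first, second):
    """Average retest-minus-test difference (group change over time)."""
    return sum(b - a for a, b in zip(first, second)) / len(first)

# Hypothetical scores for five respondents tested twice.
time_1 = [52, 61, 47, 70, 58]
time_2 = [50, 63, 49, 68, 60]
print(pearson_r(time_1, time_2), mean_change(time_1, time_2))
```

Reporting both is useful because a high correlation only shows that respondents kept their relative standing; the mean change shows whether the group as a whole drifted between occasions.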

