An Empirical Comparison of Four Generalized Trust Scales: Test–Retest Reliability, Measurement Invariance, Predictive Validity, and Replicability

2021 ◽  
pp. 004912412110557
Author(s):  
Blaine G. Robbins

The Stranger Face Trust scale (SFT) and Imaginary Stranger Trust scale (IST) are two new self-report measures of generalized trust that assess trust in strangers—both real and imaginary—across four trust domains. Prior research has established the reliability and validity of SFT and IST, but a number of measurement validation tests remain. Across three separate studies, I assess the test–retest reliability, measurement invariance, predictive validity, and replicability of SFT and IST, with the misanthropy scale (MST) and generalized social trust scale (GST) serving as benchmarks. First, tests of internal consistency, test–retest reliability, and longitudinal measurement invariance established that all four generalized trust scales were acceptably reliable, with SFT and IST yielding greater overall reliability than MST and GST. Second, tests of multiple group measurement invariance revealed that SFT and IST were equivalent across gender, race, education, and age groups, while MST and GST were non-equivalent across the same sociodemographic groups. Third, an investment game established the predictive validity of SFT and MST, with IST and GST yielding poor predictive validity. Fourth, tests of factor structure and measurement invariance indicated that all four generalized trust scales replicated across samples. The present findings bolster the validity, reliability, and measurement equivalence of SFT and IST, while illustrating the compromised validity and measurement non-equivalence of MST and GST. Implications for the measurement of generalized trust are discussed.
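
As a rough illustration of the reliability checks described above, the sketch below computes Cronbach's alpha and a simple test-retest correlation for a simulated four-item scale measured at two waves; the data and item structure are hypothetical and not taken from the study.

```python
# Illustrative sketch (simulated data): internal consistency and test-retest
# reliability for a hypothetical four-item trust scale measured at two waves.
import numpy as np

rng = np.random.default_rng(0)
n = 500
true_trust = rng.normal(size=n)
wave1 = np.column_stack([true_trust + rng.normal(scale=0.7, size=n) for _ in range(4)])
wave2 = np.column_stack([true_trust + rng.normal(scale=0.7, size=n) for _ in range(4)])

def cronbach_alpha(items: np.ndarray) -> float:
    """alpha = k/(k-1) * (1 - sum of item variances / variance of the total score)."""
    k = items.shape[1]
    item_var = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_var / total_var)

alpha = cronbach_alpha(wave1)
# Test-retest reliability here is the correlation of total scores across waves.
retest_r = np.corrcoef(wave1.sum(axis=1), wave2.sum(axis=1))[0, 1]
print(f"Cronbach's alpha (wave 1): {alpha:.2f}, test-retest r: {retest_r:.2f}")
```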

2020 ◽  
Author(s):  
Blaine G Robbins

The Stranger Face Trust scale (SFT) and Imaginary Stranger Trust scale (IST) are two new self-report measures of generalized trust that assess trust in strangers—both real and imaginary—across four trust domains. Prior research has established the reliability and validity of SFT and IST, but a number of measurement validation tests remain. Across three separate studies, I assess the test-retest reliability, measurement invariance, predictive validity, and replicability of SFT and IST, with the misanthropy scale (MST) and generalized social trust scale (GST) serving as benchmarks. First, tests of internal consistency, test-retest reliability, and longitudinal measurement invariance established that all four generalized trust scales were acceptably reliable, with SFT and IST yielding greater overall reliability than MST and GST. Second, tests of multiple group measurement invariance revealed that SFT and IST were invariant across gender, race, education, and age groups, while MST and GST were nonequivalent across the same sociodemographic groups. Third, an investment game established the predictive validity of SFT and—to a lesser extent—MST, with IST and GST yielding poor predictive validity. Fourth, tests of factor structure and measurement invariance indicated that all four generalized trust scales replicated across samples. The present findings bolster the validity, reliability, and measurement invariance of SFT and IST, while illustrating the compromised validity and measurement non-invariance of MST and GST. Implications for the measurement of generalized trust are discussed.
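
The multiple-group invariance tests mentioned here compare nested multi-group confirmatory factor models. As a much cruder, purely illustrative proxy (simulated data, not the author's procedure), one can fit the same one-factor model separately in two groups and compare the item loadings:

```python
# Rough illustration only (simulated data), not a formal multi-group CFA:
# fit a one-factor model per group and compare item loadings.
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(7)

def simulate_group(n, loadings):
    factor = rng.normal(size=(n, 1))
    return factor @ np.array(loadings)[None, :] + rng.normal(scale=0.6, size=(n, len(loadings)))

group_a = simulate_group(400, [0.8, 0.8, 0.8, 0.8])  # invariant loading pattern
group_b = simulate_group(400, [0.8, 0.8, 0.4, 0.8])  # one weaker loading

for name, data in [("group A", group_a), ("group B", group_b)]:
    fa = FactorAnalysis(n_components=1).fit(data)
    # Sign of the loadings is arbitrary; only their relative pattern matters here.
    print(name, np.round(fa.components_[0], 2))
```

Published invariance tests would instead compare configural, metric, and scalar multi-group CFA models on fit statistics; this sketch only conveys the intuition.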


2004 ◽  
Vol 10 (6) ◽  
pp. 675-678 ◽  
Author(s):  
Ralph HB Benedict ◽  
Darcy Cox ◽  
Laetitia L Thompson ◽  
Fred Foley ◽  
Bianca Weinstock-Guttman ◽  
...  

In an earlier study, we developed the Multiple Sclerosis Neuropsychological Screening Questionnaire (MSNQ) to assist in screening for neuropsychological (NP) impairments. Self-report MSNQ scores correlated significantly with measures of depression, whereas informant-report MSNQ scores correlated with cognitive performance, but not depression. This study was criticized for its use of a small sample and lack of data regarding normal performance and test-retest reliability. The present study was designed to replicate the earlier work with a larger sample of patients and normal controls obtained from multiple sites. We also evaluated the test-retest reliability and predictive validity of the MSNQ. The sample included 85 multiple sclerosis (MS) patients and 40 normal controls, matched on demographic variables. All participants completed the MSNQ and underwent NP testing. Thirty-four patients were re-examined at one week. Pearson and ANOVA techniques were used for univariate comparisons. Bayesian statistics were calculated to assess predictive validity. Patient self- and informant-report MSNQ scores differed from those of normal controls, and test-retest reliability indices were high. Both self- and informant-reports were correlated with cognitive dysfunction and depression scales. Self-report MSNQ scores correlated more strongly with depression than with cognitive performance, whereas the opposite pattern was observed for informant-report scores. Bayesian statistics showed that informant-report MSNQ scores predict cognitive impairment and patient self-report scores identify patients with cognitive impairment or depression. It is concluded that the MSNQ is useful, although patient self-reports may be exaggerated in depressed patients or reduced in patients with severe cognitive impairment.
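
For readers unfamiliar with the Bayesian statistics referred to here, the predictive value of a screening score can be obtained from its sensitivity, specificity, and the base rate via Bayes' theorem; the sketch below uses hypothetical numbers, not the MSNQ results.

```python
# Illustrative sketch (hypothetical numbers): positive and negative predictive
# values of a screening cutoff, derived with Bayes' theorem.
def predictive_values(sensitivity: float, specificity: float, prevalence: float):
    ppv = (sensitivity * prevalence) / (
        sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
    )
    npv = (specificity * (1 - prevalence)) / (
        specificity * (1 - prevalence) + (1 - sensitivity) * prevalence
    )
    return ppv, npv

# Hypothetical operating characteristics for an informant-report cutoff score.
ppv, npv = predictive_values(sensitivity=0.80, specificity=0.90, prevalence=0.45)
print(f"PPV: {ppv:.2f}, NPV: {npv:.2f}")
```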


2022 ◽  
Author(s):  
Blaine G Robbins

Across the social sciences, generalized trust has been one of the most frequently examined constructs since researchers first introduced measures of it in the 1950s. Despite its significance, there is growing consensus that conventional measures of generalized trust are prone to measurement invalidity and non-equivalence, which places sociological knowledge about generalized trust in serious doubt. In this article, I advance trust research in sociology by (1) refining two new self-report measures of generalized trust—the Stranger Face Trust scale (SFT) and the Imaginary Stranger Trust scale (IST)—and (2) assessing their empirical performance on a nationally representative probability sample (N = 1,264). I compare the reliability and validity of SFT, IST, and traditional measures of generalized trust across a number of measurement validation tests (convergent validity, discriminant validity, known-groups validity, and predictive validity). Results suggest that SFT provides the most accurate and consistent measure of generalized trust. I conclude by discussing the implications of my findings for sociological knowledge.
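
As a hedged illustration of one of the validation tests listed (known-groups validity), the sketch below compares mean scale scores across hypothetical education groups with a one-way ANOVA; the group labels, means, and sample sizes are invented for the example.

```python
# Illustrative sketch (simulated data): known-groups validity asks whether mean
# scale scores differ across groups expected to differ in generalized trust.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Hypothetical SFT total scores for three education groups.
group_means = {"high_school": 18.0, "college": 20.0, "graduate": 21.0}
samples = {g: rng.normal(loc=m, scale=4.0, size=200) for g, m in group_means.items()}

f_stat, p_value = stats.f_oneway(*samples.values())
print(f"One-way ANOVA across education groups: F = {f_stat:.2f}, p = {p_value:.4f}")
```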


2000 ◽  
Vol 16 (1) ◽  
pp. 53-58 ◽  
Author(s):  
Hans Ottosson ◽  
Martin Grann ◽  
Gunnar Kullgren

Summary: Short-term stability or test-retest reliability of self-reported personality traits is likely to be biased if the respondent is affected by a depressive or anxiety state. However, in some studies, DSM-oriented self-report instruments have proved to be reasonably stable in the short term, regardless of co-occurring depressive or anxiety disorders. In the present study, we examined the short-term test-retest reliability of a new self-report questionnaire for personality disorder diagnosis (DIP-Q) in a clinical sample of 30 individuals who had either a depressive disorder, an anxiety disorder, or no axis-I disorder. Test-retest scores from subjects with depressive disorders were mostly unstable, with a significant change in fulfilled criteria between entry and retest for three out of ten personality disorders: borderline, avoidant, and obsessive-compulsive personality disorder. Scores from subjects with anxiety disorders were unstable only for cluster C and dependent personality disorder items. In the absence of co-morbid depressive or anxiety disorders, mean dimensional scores of the DIP-Q showed no significant differences between entry and retest. Overall, the effect of state on trait scores was moderate, and it is concluded that the test-retest reliability of the DIP-Q is acceptable.
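
A minimal sketch of the entry-versus-retest comparison described here, using simulated scores: a paired t-test (and a non-parametric alternative) tests whether fulfilled-criteria counts shift systematically between the two administrations. The numbers are hypothetical.

```python
# Illustrative sketch (simulated data): short-term stability as the paired
# difference between entry and one-week retest scores.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n = 30
entry = rng.normal(loc=4.0, scale=2.0, size=n)            # hypothetical criterion counts at entry
retest = entry + rng.normal(loc=-0.8, scale=1.5, size=n)  # simulated systematic drop at retest

t_stat, p_paired = stats.ttest_rel(entry, retest)          # parametric paired test
w_stat, p_wilcoxon = stats.wilcoxon(entry, retest)         # non-parametric alternative
print(f"Paired t-test: t = {t_stat:.2f}, p = {p_paired:.4f}")
print(f"Wilcoxon signed-rank: W = {w_stat:.1f}, p = {p_wilcoxon:.4f}")
```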


Author(s):  
Helmut Schröder ◽  
Isaac Subirana ◽  
Julia Wärnberg ◽  
María Medrano ◽  
Marcela González-Gross ◽  
...  

Background: Validation of self-reported tools, such as physical activity (PA) questionnaires, is crucial. The aim of this study was to determine the test-retest reliability, internal consistency, and the concurrent, construct, and predictive validity of the short semi-quantitative Physical Activity Unit 7 item Screener (PAU-7S), using accelerometry as the reference measurement. The effect of linear calibration on PAU-7S validity was tested. Methods: A randomized sample of 321 healthy children aged 8–16 years (149 boys, 172 girls) from the nationwide representative PASOS study completed the PAU-7S before and after wearing an accelerometer for at least 7 consecutive days. Weight, height, and waist circumference were measured. Cronbach alpha was calculated for internal consistency. Test-retest reliability was determined by intra-class correlation (ICC). Concurrent validity was assessed by the ICC and Spearman correlation coefficient between moderate to vigorous PA (MVPA) derived from the PAU-7S and from the accelerometer. Concordance between the two methods was analyzed by absolute agreement, weighted kappa, and Bland-Altman statistics. Multiple linear regression models were fitted for construct validity, and predictive validity was determined by leave-one-out cross-validation. Results: The PAU-7S overestimated MVPA by 18% compared to accelerometers (106.5 ± 77.0 vs 95.2 ± 33.2 min/day, respectively). A Cronbach alpha of 0.76 indicated acceptable internal consistency of the PAU-7S. Test-retest reliability was good (ICC 0.71, p < 0.001). Spearman correlation and ICC coefficients between MVPA derived from the PAU-7S and from accelerometers increased from 0.31 to 0.62 and from 0.20 to 0.62, respectively, after calibration of the PAU-7S. Between-methods concordance improved from a weighted kappa of 0.24 to 0.50 after calibration. A slight reduction in ICC, from 0.62 to 0.60, indicated good predictive validity. Multiple linear regression models showed an inverse association of MVPA with standardized body mass index (β = −0.162; p < 0.077) and waist-to-height ratio (β = −0.010; p < 0.014). All validity dimensions were somewhat stronger in boys than in girls. Conclusion: The PAU-7S shows good test-retest reliability and acceptable internal consistency. All dimensions of validity increased from poor/fair to moderate/good after calibration. The PAU-7S is a valid instrument for measuring MVPA in children and adolescents. Trial registration: ISRCTN34251612.
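
The sketch below illustrates two of the statistics named in the Methods, Bland-Altman agreement and a simple linear calibration, on simulated questionnaire and accelerometer MVPA; the variable names and numbers are hypothetical, and the study's actual calibration procedure may differ.

```python
# Illustrative sketch (simulated data, hypothetical variable names): Bland-Altman
# agreement between questionnaire and accelerometer MVPA, and a linear calibration
# that removes the questionnaire's systematic over-reporting.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(3)
n = 321
accel_mvpa = rng.normal(loc=95, scale=33, size=n).clip(min=5)
pau7s_mvpa = 1.18 * accel_mvpa + rng.normal(scale=60, size=n)  # over-reports and is noisier

def bland_altman(a, b):
    """Mean bias and 95% limits of agreement between two measurement methods."""
    diff = a - b
    bias = diff.mean()
    half_width = 1.96 * diff.std(ddof=1)
    return bias, bias - half_width, bias + half_width

print("Before calibration (bias, lower LoA, upper LoA):", bland_altman(pau7s_mvpa, accel_mvpa))

# Calibration: regress accelerometer MVPA on questionnaire MVPA, then rescale.
reg = LinearRegression().fit(pau7s_mvpa.reshape(-1, 1), accel_mvpa)
calibrated = reg.predict(pau7s_mvpa.reshape(-1, 1))
print("After calibration (bias, lower LoA, upper LoA): ", bland_altman(calibrated, accel_mvpa))
```

Note that a simple linear rescaling of the total score mainly removes systematic bias; the gains in correlation and kappa reported in the abstract presumably come from calibrating the questionnaire itself rather than from a single post hoc transform.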


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Adam Polnay ◽  
Helen Walker ◽  
Christopher Gallacher

Purpose: Relational dynamics between patients and staff in forensic settings can be complicated and demanding for both sides. Reflective practice groups (RPGs) bring clinicians together to reflect on these dynamics. To date, evaluation of RPGs has lacked quantitative focus and a suitable quantitative tool. Therefore, a self-report tool was designed. This paper aims to pilot The Relational Aspects of CarE (TRACE) scale with clinicians in a high-secure hospital and investigate its psychometric properties. Design/methodology/approach: A multi-professional sample of 80 clinicians was recruited; each completed the TRACE and the Attitude to Personality Disorder Questionnaire (APDQ). Exploratory factor analysis (EFA) determined the factor structure and internal consistency of the TRACE. A subset was selected to measure test–retest reliability. The TRACE was cross-validated against the APDQ. Findings: EFA found five factors underlying the 20 TRACE items: “awareness of common responses,” “discussing and normalising feelings,” “utilising feelings,” “wish to care,” and “awareness of complicated affects.” This factor structure is complex, but items clustered logically around the key areas originally used to generate them. Internal consistency (α = 0.66, 95% confidence interval (CI) = 0.55–0.76) was of borderline acceptability. The TRACE demonstrated good test–retest reliability (intra-class correlation = 0.94, 95% CI = 0.78–0.98) and face validity. The TRACE showed a slight negative correlation with the APDQ. A larger data set is needed to substantiate these preliminary findings. Practical implications: Early indications suggested the TRACE was valid and reliable, and suitable for measuring the effectiveness of reflective practice. Originality/value: The TRACE is a distinctive measure that fills a methodological gap in the literature.
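
As a rough sketch of the exploratory factor analysis step (not the study's code or extraction settings), the example below fits a five-factor model with varimax rotation to simulated 20-item data using scikit-learn:

```python
# Illustrative sketch (simulated data): EFA of a 20-item self-report scale with a
# five-factor solution. The study's actual extraction and rotation choices may differ.
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(4)
n_respondents, n_items, n_factors = 80, 20, 5
# Simulate items loading on five underlying factors (four items each).
factors = rng.normal(size=(n_respondents, n_factors))
loadings = np.zeros((n_factors, n_items))
for f in range(n_factors):
    loadings[f, f * 4:(f + 1) * 4] = 0.8
items = factors @ loadings + rng.normal(scale=0.6, size=(n_respondents, n_items))

efa = FactorAnalysis(n_components=n_factors, rotation="varimax").fit(items)
# Rotated loading matrix: rows are factors, columns are items.
print(np.round(efa.components_, 2))
```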


BMJ Open ◽  
2018 ◽  
Vol 8 (10) ◽  
pp. e021734 ◽  
Author(s):  
Alison Griffiths ◽  
Rachel Toovey ◽  
Prue E Morgan ◽  
Alicia J Spittle

Objective: Gross motor assessment tools have a critical role in identifying, diagnosing and evaluating motor difficulties in childhood. The objective of this review was to systematically evaluate the psychometric properties and clinical utility of gross motor assessment tools for children aged 2–12 years. Method: A systematic search of MEDLINE, Embase, CINAHL and AMED was performed between May and July 2017. Methodological quality was assessed with the COnsensus-based Standards for the selection of health status Measurement INstruments (COSMIN) checklist, and an outcome measures rating form was used to evaluate the reliability, validity and clinical utility of assessment tools. Results: Seven assessment tools from 37 studies/manuals met the inclusion criteria: Bayley Scale of Infant and Toddler Development-III (Bayley-III), Bruininks-Oseretsky Test of Motor Proficiency-2 (BOT-2), Movement Assessment Battery for Children-2 (MABC-2), McCarron Assessment of Neuromuscular Development (MAND), Neurological Sensory Motor Developmental Assessment (NSMDA), Peabody Developmental Motor Scales-2 (PDMS-2) and Test of Gross Motor Development-2 (TGMD-2). Methodological quality varied from poor to excellent. Validity and internal consistency varied from fair to excellent (α=0.5–0.99). The Bayley-III, NSMDA and MABC-2 have evidence of predictive validity. Test–retest reliability is excellent in the BOT-2 (intraclass correlation coefficient (ICC)=0.80–0.99), PDMS-2 (ICC=0.97), MABC-2 (ICC=0.83–0.96) and TGMD-2 (ICC=0.81–0.92). The TGMD-2 has the highest inter-rater (ICC=0.88–0.93) and intra-rater reliability (ICC=0.92–0.99). Conclusions: The majority of gross motor assessments for children have good to excellent validity. Test–retest reliability is highest in the BOT-2, MABC-2, PDMS-2 and TGMD-2. The Bayley-III has the best predictive validity at 2 years of age for later motor outcome. None of the assessment tools demonstrate good evaluative validity. Further research on evaluative gross motor assessment tools is urgently needed.
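
The test-retest and inter-rater figures summarized here are intraclass correlation coefficients. Below is a minimal sketch of one common variant, ICC(2,1) (two-way random effects, absolute agreement, single measurement), computed on simulated scores; which ICC form each primary study used may differ.

```python
# Illustrative sketch (simulated data): ICC(2,1) from the two-way ANOVA mean squares.
import numpy as np

def icc_2_1(scores: np.ndarray) -> float:
    """scores: n_subjects x k_occasions (or raters) matrix of measurements."""
    n, k = scores.shape
    grand = scores.mean()
    ms_rows = k * ((scores.mean(axis=1) - grand) ** 2).sum() / (n - 1)   # between subjects
    ms_cols = n * ((scores.mean(axis=0) - grand) ** 2).sum() / (k - 1)   # between occasions
    ss_total = ((scores - grand) ** 2).sum()
    ms_error = (ss_total - (n - 1) * ms_rows - (k - 1) * ms_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )

rng = np.random.default_rng(5)
true_score = rng.normal(loc=50, scale=10, size=60)
occasions = np.column_stack([true_score + rng.normal(scale=3, size=60) for _ in range(2)])
print(f"ICC(2,1): {icc_2_1(occasions):.2f}")
```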


2005 ◽  
Vol 11 (3) ◽  
pp. 338-342 ◽  
Author(s):  
Ruth Ann Marrie ◽  
Gary Cutter ◽  
Tuula Tyry ◽  
Olympia Hadjimichael ◽  
Timothy Vollmer

The North American Research Committee on Multiple Sclerosis (NARCOMS) Registry is a multiple sclerosis (MS) self-report registry with more than 24 000 participants. Participants report disability status upon enrolment, and semi-annually thereafter, using Performance Scales (PS), Patient Determined Disease Steps (PDDS) and a pain question. In November 2000 and 2001, we also collected the Pain Effects Scale (PES). Our aim was to validate the NARCOMS pain question using the PES as our criterion measure. We measured correlations between the pain question and age, disease duration, various PS subscales and the PDDS to assess construct validity. We correlated pain question responses in participants who reported no change in the PDDS or the PS subscales between questionnaires to determine test-retest reliability. We measured responsiveness in participants who reported a substantial change in the sensory or spasticity PS subscales. The correlation between the pain question and the PES was r=0.61 in November 2000 and r=0.64 in November 2001 (both P<0.0001). Correlations between the pain question and age and disease duration were low, indicating divergent validity. Correlations between the pain question and the spasticity and sensory PS subscales and the PDDS were moderate, indicating convergent validity. Test-retest reliability was r=0.84 (P<0.0001). Responsiveness was 70.7%. The pain question is a valid self-report measure of pain in MS.
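
A small sketch of the convergent/divergent validity logic used here: Spearman correlations of a pain score with a conceptually related subscale versus an unrelated variable, on simulated data (the variables and effect sizes are hypothetical):

```python
# Illustrative sketch (simulated data): convergent vs. divergent validity as the
# pattern of Spearman correlations with related vs. unrelated measures.
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
n = 1000
pain = rng.normal(size=n)
sensory = 0.5 * pain + rng.normal(scale=0.9, size=n)  # conceptually related subscale
age = rng.normal(loc=45, scale=12, size=n)            # conceptually unrelated variable

r_conv, _ = stats.spearmanr(pain, sensory)  # expected moderate (convergent)
r_div, _ = stats.spearmanr(pain, age)       # expected near zero (divergent)
print(f"Convergent r = {r_conv:.2f}, divergent r = {r_div:.2f}")
```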


2020 ◽  
Author(s):  
Julia Velten ◽  
Gerrit Hirschfeld ◽  
Milena Meyers ◽  
Jürgen Margraf

Background: The Sexual Interest and Desire Inventory Female (SIDI-F) is a clinician-administered scale that allows for a comprehensive assessment of symptoms related to Hypoactive Sexual Desire Dysfunction (HSDD). As self-report questionnaires may facilitate less socially desirable responding, and as time and resources are scarce in many clinical and research settings, a self-report version (SIDI-F-SR) was developed. Aim: To investigate the agreement between the SIDI-F and the self-report version (SIDI-F-SR) and to assess the psychometric properties of the SIDI-F-SR. Methods: A total of 170 women (mean age = 36.61 years, SD = 10.61, range = 20–69) with HSDD provided data on the SIDI-F, administered by a clinical psychologist via telephone, and the SIDI-F-SR, delivered as an Internet-based questionnaire. A subset of 19 women answered the SIDI-F-SR twice over a period of 14 weeks. Outcomes: Intraclass correlation and predictors of absolute agreement between the SIDI-F and SIDI-F-SR were examined, along with the internal consistency, test-retest reliability, and criterion-related validity of the SIDI-F-SR. Results: There was high agreement between the SIDI-F and SIDI-F-SR (ICC = .86). On average, women scored about one point higher on the self-report than on the clinician-administered scale. Agreement was higher in younger women and in those with severe symptoms. Internal consistency of the SIDI-F-SR was acceptable (α = .76) and comparable to that of the SIDI-F (α = .74). When corrections for restriction of range were applied, internal consistency of the SIDI-F-SR increased to .91. Test-retest reliability was good (r = .74). Criterion-related validity was low but comparable between the SIDI-F and SIDI-F-SR.
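
The restriction-of-range correction mentioned in the Results can be illustrated with the standard formula that holds error variance fixed and rescales reliability to a wider score range; the standard deviations below are hypothetical, not taken from the study.

```python
# Illustrative sketch (hypothetical SDs): reliability corrected for restriction of
# range, assuming the error variance is the same in restricted and unrestricted samples.
def reliability_corrected_for_range(r_restricted: float,
                                    sd_restricted: float,
                                    sd_unrestricted: float) -> float:
    error_variance = (sd_restricted ** 2) * (1 - r_restricted)
    return 1 - error_variance / (sd_unrestricted ** 2)

# Example: alpha = .76 in a sample whose scores span a narrower range (SD 7) than a
# hypothetical reference population (SD 10).
print(round(reliability_corrected_for_range(0.76, sd_restricted=7.0, sd_unrestricted=10.0), 2))
```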

