Hidden invalidity among fifteen commonly used measures in social and personality psychology

Flake, Pek, and Hehman (2017) recently demonstrated that metrics of structural validity are severely underreported in social and personality psychology. We apply their recommendations for the comprehensive assessment of structural validity to a uniquely large and varied dataset (N = 144496 experimental sessions) to investigate the psychometric properties of some of the most widely used self-report measures (k = 15 questionnaires, 26 subscales) in social and personality psychology. When assessed using the modal practice of considering only their internal consistency, 89% of scales appeared to possess good validity. Yet, when validity was assessed comprehensively (via internal consistency, immediate and delayed test-retest reliability, factor structure, and measurement invariance for median age and gender) only 4% demonstrated good validity. Furthermore, the less commonly a test is reported in the literature, the more likely it was to be failed (e.g., measurement invariance). This suggests that the pattern of underreporting in the field may represent widespread hidden invalidity of the measures we use, and therefore pose a threat to many research findings. We highlight the degrees of freedom afforded to researchers in the assessment and reporting of structural validity. Similar to the better-known concept of p-hacking, we introduce the concept of validity hacking (v-hacking) and argue that it should be acknowledged and addressed.

Hidden Invalidity Among 15 Commonly Used Measures in Social and Personality Psychology

Advances in Methods and Practices in Psychological Science ◽

10.1177/2515245919882903 ◽

2020 ◽

Vol 3 (2) ◽

pp. 166-184 ◽

Cited By ~ 13

Author(s):

Ian Hussey ◽

Sean Hughes

Keyword(s):

Measurement Invariance ◽

Internal Consistency ◽

Degrees Of Freedom ◽

Self Report ◽

Structural Validity ◽

Data Set ◽

Personality Psychology ◽

Reliability Factor ◽

And Gender ◽

Gender Groups

It has recently been demonstrated that metrics of structural validity are severely underreported in social and personality psychology. We comprehensively assessed structural validity in a uniquely large and varied data set ( N = 144,496 experimental sessions) to investigate the psychometric properties of some of the most widely used self-report measures ( k = 15 questionnaires, 26 scales) in social and personality psychology. When the scales were assessed using the modal practice of considering only internal consistency, 88% of them appeared to possess good validity. Yet when validity was assessed comprehensively (via internal consistency, immediate and delayed test-retest reliability, factor structure, and measurement invariance for age and gender groups), only 4% demonstrated good validity. Furthermore, the less commonly a test was reported in the literature, the more likely the scales were to fail that test (e.g., scales failed measurement invariance much more often than internal consistency). This suggests that the pattern of underreporting in the field may represent widespread hidden invalidity of the measures used and may therefore pose a threat to many research findings. We highlight the degrees of freedom afforded to researchers in the assessment and reporting of structural validity and introduce the concept of validity hacking ( v-hacking), similar to the better-known concept of p-hacking. We argue that the practice of v-hacking should be acknowledged and addressed.
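The abstract does not name the statistic behind "internal consistency". A minimal sketch, assuming it is Cronbach's alpha (the most common choice), shows how little that single number requires: only the item variances and the variance of the total score. The function name and the toy data are hypothetical and are not taken from the study.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total score))
    """
    k = len(rows[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Sample variance of each item (column) across respondents.
    item_vars = [variance(col) for col in zip(*rows)]
    # Sample variance of each respondent's summed score.
    total_var = variance([sum(r) for r in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical toy data: three respondents, two items.
scores = [[1, 2], [2, 1], [3, 3]]
print(round(cronbach_alpha(scores), 3))
```

A high alpha says nothing about factor structure, stability over time, or invariance across groups, which is why the comprehensive assessment described above can disagree so sharply with the alpha-only view.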

Commentary_on_Hussey&Hughes(2020)

10.31234/osf.io/ew2td ◽

2020 ◽

Author(s):

Eunike Wetzel ◽

Brent Roberts

Keyword(s):

Measurement Invariance ◽

Factor Structure ◽

Self Report ◽

Measurement Properties ◽

Structural Validity ◽

Consistency Test ◽

Personality Psychology ◽

Reliability Factor ◽

Psychological Scales ◽

Test Retest Reliability

Hussey and Hughes (2020) analyzed four aspects (internal consistency, test-retest reliability, factor structure, and measurement invariance) relevant to the structural validity of psychological scales in 15 self-report questionnaires and concluded that social and personality psychology has a “hidden invalidity” problem. We argue that their argument that the field ignores structural validity (hence “hidden”) is incorrect because many published papers specifically investigate the measurement properties of instruments applied in social and personality psychology. Furthermore, we show that the models they used to test structural validity do not match the construct space for many of the measures. Lastly, we argue that their conclusion that measures are invalid based on a pass/fail decision for measurement invariance is overly simplistic. Rather, partial measurement invariance and the effect size of the noninvariance should be considered. Moving forward, we think it would be important for all researchers to more actively engage with prior measurement research, know the limits of existing measures, and invest in a deeper examination of the psychometric properties of their own measures in each of their studies.

Cross-cultural adaptation, reliability and validity of the Fremantle Knee Awareness Questionnaire in Italian subjects with painful knee osteoarthritis

Health and Quality of Life Outcomes ◽

10.1186/s12955-021-01754-4 ◽

2021 ◽

Vol 19 (1) ◽

Author(s):

Marco Monticone ◽

Cristiano Sconza ◽

Igor Portoghese ◽

Tomohiko Nishigami ◽

Benedict M. Wand ◽

...

Keyword(s):

Construct Validity ◽

Knee Osteoarthritis ◽

Pain Intensity ◽

Internal Consistency ◽

Structural Validity ◽

Retest Reliability ◽

Knee Oa ◽

Pain Catastrophising ◽

Test Retest Reliability ◽

Painful Knee

Background and aim: Growing attention is being given to utilising physical function measures to better understand and manage knee osteoarthritis (OA). The Fremantle Knee Awareness Questionnaire (FreKAQ), a self-reported measure of body-perception specific to the knee, has never been validated in Italian patients. The aims of this study were to culturally adapt and validate the Italian version of the FreKAQ (FreKAQ-I), to allow for its use with Italian-speaking patients with painful knee OA. Methods: The FreKAQ-I was developed by means of forward–backward translation, a final review by an expert committee and a test of the pre-final version to evaluate its comprehensibility. The psychometric testing included: internal structural validity by Rasch analysis; construct validity by assessing hypotheses of FreKAQ correlations with the knee injury and osteoarthritis outcome score (KOOS), a pain intensity numerical rating scale (PI-NRS), the pain catastrophising scale (PCS), and the Hospital anxiety and depression score (HADS) (Pearson’s correlations); known-group validity by evaluating the ability of FreKAQ scores to discriminate between two groups of participants with different clinical profiles (Mann–Whitney U test); reliability by internal consistency (Cronbach’s alpha) and test–retest reliability (intraclass correlation coefficient, ICC2.1); and measurement error by calculating the minimum detectable change (MDC). Results: It took one month to develop a consensus-based version of the FreKAQ-I. The questionnaire was administered to 102 subjects with painful knee OA and was well accepted. Internal structural validity confirmed the substantial unidimensionality of the FreKAQ-I: variance explained was 53.3%, the unexplained variance in the first contrast showed an eigenvalue of 1.8, and no local dependence was detected. Construct validity was good as all of the hypotheses were met; correlations: KOOS (rho = 0.38–0.51), PI-NRS (rho = 0.35–0.37), PCS (rho = 0.47) and HADS (Anxiety rho = 0.36; Depression rho = 0.43). Regarding known-groups validity, FreKAQ scores were significantly different between groups of participants demonstrating high and low levels of pain intensity, pain catastrophising, anxiety, depression and the four KOOS subscales (p ≤ 0.004). Internal consistency was acceptable (α = 0.74) and test–retest reliability was excellent (ICC = 0.92, CI 0.87–0.94). The MDC95 was 5.22 scale points. Conclusion: The FreKAQ-I is unidimensional, reliable and valid in Italian patients with painful knee OA. Its use is recommended for clinical and research purposes.
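The abstract reports an MDC95 alongside the test–retest ICC but does not say how the MDC was computed. A common convention derives it from the standard error of measurement, SEM = SD × sqrt(1 − ICC), as MDC95 = 1.96 × sqrt(2) × SEM. The sketch below assumes that convention; the function names and the example inputs are hypothetical and are not taken from the study.

```python
import math


def standard_error_of_measurement(sd, icc):
    """SEM = SD * sqrt(1 - ICC), with SD the between-subject standard deviation."""
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc must lie in [0, 1]")
    return sd * math.sqrt(1.0 - icc)


def mdc95(sd, icc):
    """Minimum detectable change at the 95% level: 1.96 * sqrt(2) * SEM."""
    return 1.96 * math.sqrt(2.0) * standard_error_of_measurement(sd, icc)


# Hypothetical inputs: a scale with SD = 10 points and ICC = 0.75.
print(round(mdc95(sd=10.0, icc=0.75), 2))
```

Under this convention a higher ICC shrinks the SEM, so a more reliable scale can detect smaller individual changes.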

Developing a measure to assess clinicians’ ability to reflect on key staff–patient dynamics in forensic settings

Journal of Forensic Practice ◽

10.1108/jfp-07-2021-0041 ◽

2021 ◽

Vol ahead-of-print (ahead-of-print) ◽

Author(s):

Adam Polnay ◽

Helen Walker ◽

Christopher Gallacher

Keyword(s):

Reflective Practice ◽

Factor Structure ◽

Internal Consistency ◽

Self Report ◽

Face Validity ◽

Good Test ◽

Data Set ◽

Retest Reliability ◽

Content Type ◽

Test Retest Reliability

Purpose: Relational dynamics between patients and staff in forensic settings can be complicated and demanding for both sides. Reflective practice groups (RPGs) bring clinicians together to reflect on these dynamics. To date, evaluation of RPGs has lacked quantitative focus and a suitable quantitative tool. Therefore, a self-report tool was designed. This paper aims to pilot The Relational Aspects of CarE (TRACE) scale with clinicians in a high-secure hospital and investigate its psychometric properties. Design/methodology/approach: A multi-professional sample of 80 clinicians were recruited, completing TRACE and attitudes to personality disorder questionnaire (APDQ). Exploratory factor analysis (EFA) determined factor structure and internal consistency of TRACE. A subset was selected to measure test–retest reliability. TRACE was cross-validated against the APDQ. Findings: EFA found five factors underlying the 20 TRACE items: “awareness of common responses,” “discussing and normalising feelings,” “utilising feelings,” “wish to care” and “awareness of complicated affects.” This factor structure is complex, but items clustered logically to key areas originally used to generate items. Internal consistency (α = 0.66, 95% confidence interval (CI) = 0.55–0.76) demonstrated borderline acceptability. TRACE demonstrated good test–retest reliability (intra-class correlation = 0.94, 95% CI = 0.78–0.98) and face validity. TRACE indicated a slight negative correlation with APDQ. A larger data set is needed to substantiate these preliminary findings. Practical implications: Early indications suggested TRACE was valid and reliable, suitable to measure the effectiveness of reflective practice. Originality/value: The TRACE was a distinctive measure that filled a methodological gap in the literature.

Psychometric properties of a self-report version of the Sexual Interest and Desire Inventory for Women (SIDI-F-SR)

10.31219/osf.io/8ghda ◽

2020 ◽

Cited By ~ 1

Author(s):

Julia Velten ◽

Gerrit Hirschfeld ◽

Milena Meyers ◽

Jürgen Margraf

Keyword(s):

Psychometric Properties ◽

Internal Consistency ◽

Sexual Interest ◽

Intraclass Correlation ◽

Self Report ◽

Clinical Psychologist ◽

Retest Reliability ◽

Absolute Agreement ◽

Test Retest Reliability ◽

Restriction Of Range

Background: The Sexual Interest and Desire Inventory Female (SIDI-F) is a clinician-administered scale that allows for a comprehensive assessment of symptoms related to Hypoactive Sexual Desire Dysfunction (HSDD). As self-report questionnaires may facilitate less socially desirable responding and as time and resources are scarce in many clinical and research settings, a self-report version was developed (SIDI-F-SR). Aim: To investigate the agreement between the SIDI-F and a self-report version (SIDI-F-SR) and assess psychometric properties of the SIDI-F-SR. Methods: A total of 170 women (Mage=36.61, SD=10.61, range=20-69) with HSDD provided data on the SIDI-F, administered by a clinical psychologist via telephone, and the SIDI-F-SR, delivered as an Internet-based questionnaire. A subset of 19 women answered the SIDI-F-SR twice over a period of 14 weeks. Outcomes: Intraclass correlation as well as predictors of absolute agreement between SIDI-F and SIDI-F-SR, as well as internal consistency, test-retest reliability, and criterion-related validity of the SIDI-F-SR were examined. Results: There was high agreement between SIDI-F and SIDI-F-SR (ICC=.86). On average, women scored about one point higher in the self-report vs. the clinician-administered scale. Agreement was higher in young women and those with severe symptoms. Internal consistency of the SIDI-F-SR was acceptable (α=.76) and comparable to the SIDI-F (α=.74). When corrections for the restriction of range were applied, internal consistency of the SIDI-F-SR increased to .91. Test-retest-reliability was good (r=.74). Criterion-related validity was low but comparable between SIDI-F and SIDI-F-SR.

A brief, patient- and proxy-reported outcome measure in advanced illness: Validity, reliability and responsiveness of the Integrated Palliative care Outcome Scale (IPOS)

Palliative Medicine ◽

10.1177/0269216319854264 ◽

2019 ◽

Vol 33 (8) ◽

pp. 1045-1057 ◽

Cited By ~ 27

Author(s):

Fliss EM Murtagh ◽

Christina Ramsenthaler ◽

Alice Firth ◽

Esther I Groeneveld ◽

Natasha Lovell ◽

...

Keyword(s):

Palliative Care ◽

Internal Consistency ◽

Outcome Measure ◽

Self Report ◽

Advanced Illness ◽

Proxy Report ◽

Palliative Care Outcome Scale ◽

Test Retest Reliability ◽

Care Outcome ◽

Responsiveness To Change

Background: Few measures capture the complex symptoms and concerns of those receiving palliative care. Aim: To validate the Integrated Palliative care Outcome Scale, a measure underpinned by extensive psychometric development, by evaluating its validity, reliability and responsiveness to change. Design: Concurrent, cross-cultural validation study of the Integrated Palliative care Outcome Scale – both (1) patient self-report and (2) staff proxy-report versions. We tested construct validity (factor analysis, known-group comparisons, and correlational analysis), reliability (internal consistency, agreement, and test–retest reliability), and responsiveness (through longitudinal evaluation of change). Setting/participants: In all, 376 adults receiving palliative care, and 161 clinicians, from a range of settings in the United Kingdom and Germany. Results: We confirm a three-factor structure (Physical Symptoms, Emotional Symptoms and Communication/Practical Issues). The Integrated Palliative care Outcome Scale shows strong ability to distinguish between clinically relevant groups; total Integrated Palliative care Outcome Scale and Integrated Palliative care Outcome Scale subscale scores were higher – reflecting more problems – in those patients with ‘unstable’ or ‘deteriorating’ versus ‘stable’ Phase of Illness (F = 15.1, p < 0.001). Good convergent and discriminant validity to hypothesised items and subscales of the Edmonton Symptom Assessment System and Functional Assessment of Cancer Therapy–General is demonstrated. The Integrated Palliative care Outcome Scale shows good internal consistency (α = 0.77) and acceptable to good test–retest reliability (60% of items κw > 0.60). Longitudinal validity in the form of responsiveness to change is good. Conclusion: The Integrated Palliative care Outcome Scale is a valid and reliable outcome measure, both in patient self-report and staff proxy-report versions. It can assess and monitor symptoms and concerns in advanced illness, determine the impact of healthcare interventions, and demonstrate quality of care. This represents a major step forward internationally for palliative care outcome measurement.

Test-Retest Reliability of the Holyoake Codependency Index with Australian Students

Psychological Reports ◽

10.2466/pr0.94.2.482-484 ◽

2004 ◽

Vol 94 (2) ◽

pp. 482-484 ◽

Cited By ~ 2

Author(s):

Greg E. Dear

Keyword(s):

Construct Validity ◽

Internal Consistency ◽

Internal Validity ◽

Self Report ◽

Full Scale Test ◽

Retest Reliability ◽

External Focus ◽

Scale Test ◽

Test Retest Reliability ◽

Report Measure

The Holyoake Codependency Index is a 13-item self-report measure of three aspects of codependency: External Focus, Self-sacrifice, and a sense of being overwhelmed by another person's problematic behavior (termed Reactivity). Previous studies have supported internal validity and the internal consistency and construct validity of the subscales. The present scores for 59 students indicate full scale test-retest reliability of .88 and for subscales (.76 to .82) over a 3-wk. interval.

Validation of the Strengths and Difficulties Self-Report in Norwegian Sign Language

The Journal of Deaf Studies and Deaf Education ◽

10.1093/deafed/enz026 ◽

2019 ◽

Vol 25 (1) ◽

pp. 91-104 ◽

Cited By ~ 2

Author(s):

Chris Margaret Aanondsen ◽

Thomas Jozefiak ◽

Kerstin Heiling ◽

Tormod Rimehaug

Keyword(s):

Mental Health ◽

Sign Language ◽

Internal Consistency ◽

Mental Health Problems ◽

Health Problems ◽

Self Report ◽

Equation Modeling ◽

Retest Reliability ◽

Strengths And Difficulties ◽

Test Retest Reliability

The majority of studies on mental health in deaf and hard-of-hearing (DHH) children report a higher level of mental health problems. Inconsistencies in reports of prevalence of mental health problems have been found to be related to a number of factors such as language skills, cognitive ability, heterogeneous samples as well as validity problems caused by using written measures designed for typically hearing children. This study evaluates the psychometric properties of the self-report version of the Strengths and Difficulties Questionnaire (SDQ) in Norwegian Sign Language (NSL; SDQ-NSL) and in written Norwegian (SDQ-NOR). Forty-nine DHH children completed the SDQ-NSL as well as the SDQ-NOR in randomized order and their parents completed the parent version of the SDQ-NOR and a questionnaire on hearing and language-related information. Internal consistency was examined using Dillon–Goldstein’s rho, test–retest reliability using intraclass correlations, construct validity by confirmatory factor analysis (CFA), and partial least squares structural equation modeling. Internal consistency and test–retest reliability were established as acceptable to good. CFA resulted in a best fit for the proposed five-factor model for both versions, although not all fit indices reached acceptable levels. The reliability and validity of the SDQ-NSL seem promising even though the validation was based on a small sample size.

Cross-cultural adaptation and psychometric validation of the Persian version of the Cardiac Rehabilitation Barriers Scale (CRBS-P)

BMJ Open ◽

10.1136/bmjopen-2019-034552 ◽

2020 ◽

Vol 10 (6) ◽

pp. e034552

Author(s):

Mahdieh Ghanbari-Firoozabadi ◽

Masoud Mirzaei ◽

Mohammadreza Vafaii Nasab ◽

Sherry L Grace ◽

Hassan Okati-Aliabad ◽

...

Keyword(s):

Factor Analysis ◽

Cardiac Rehabilitation ◽

Internal Consistency ◽

Goodness Of Fit ◽

Cross Cultural ◽

Psychometric Validation ◽

Structural Validity ◽

Retest Reliability ◽

Persian Version ◽

Test Retest Reliability

Objectives: This study aimed to translate, cross-culturally adapt and psychometrically validate a Persian version of the Cardiac Rehabilitation Barriers Scale (CRBS-P) and to identify the main barriers in an Iranian setting. Setting: Afshar cardiac rehabilitation (CR) centre, affiliated with the Yazd University of Medical Sciences, in the centre of Iran. Design: This was a multimethod study, culminating in a cross-sectional survey. Participants: Inpatient CR graduates who did not attend their initial outpatient CR appointment. Method: The 21-item CRBS was translated and cross-culturally adapted in accordance with best practices; an expert panel considered the items and previous non-attending patients were interviewed via phone to refine the scale. Next, structural validity was assessed; participants were invited to complete the CRBS on the phone between March 2017 and February 2018. First, exploratory factor analysis (EFA) with principal component analysis extraction and oblique rotation was used. Second, confirmatory factor analysis (CFA) was used to verify the results; several goodness-of-fit indices were considered. The internal consistency and 3-week test–retest reliability of the scale (5% subsample) were evaluated using Cronbach’s α and intraclass correlation (ICC), respectively. Results: Face, content and cross-cultural validity were established by the experts and patients (n=50). One thousand and one hundred (40.7%) of the 2700 patients completed the CRBS-P. Structural validity was established by EFA (Bartlett’s test p<0.001; =0.759) and confirmed by the CFA; a four-factor solution with 18 items accounting for 61.256% of variance had the best fit (χ2/df=3.206, root mean square error of approximation=0.061 and Comparative Fit Index=0.959). The internal consistency and test–retest reliability (n=42) of the scale were acceptable (ICC=0.743 95% CI (0.502 to 0.868); overall α=0.797). The top barriers were not knowing about CR, cost and lack of encouragement from physicians. Conclusion: The four-factor, 18-item CRBS-P had good psychometric properties, and hence can be reliably and validly used to measure CR barriers in Iran and other Persian-speaking populations.
