Commentary on Hussey and Hughes (2020)

2020 ◽  
Author(s):  
Eunike Wetzel ◽  
Brent Roberts

Hussey and Hughes (2020) analyzed four aspects (internal consistency, test-retest reliability, factor structure, and measurement invariance) relevant to the structural validity of psychological scales in 15 self-report questionnaires and concluded that social and personality psychology has a “hidden invalidity” problem. We argue that their claim that the field ignores structural validity (hence “hidden”) is incorrect because many published papers specifically investigate the measurement properties of instruments applied in social and personality psychology. Furthermore, we show that the models they used to test structural validity do not match the construct space for many of the measures. Lastly, we argue that their conclusion that measures are invalid based on a pass/fail decision for measurement invariance is overly simplistic. Rather, partial measurement invariance and the effect size of the noninvariance should be considered. Moving forward, we think it would be important for all researchers to more actively engage with prior measurement research, know the limits of existing measures, and invest in a deeper examination of the psychometric properties of their own measures in each of their studies.

Author(s):  
Ian Hussey ◽  
Sean Hughes

Flake, Pek, and Hehman (2017) recently demonstrated that metrics of structural validity are severely underreported in social and personality psychology. We apply their recommendations for the comprehensive assessment of structural validity to a uniquely large and varied dataset (N = 144,496 experimental sessions) to investigate the psychometric properties of some of the most widely used self-report measures (k = 15 questionnaires, 26 subscales) in social and personality psychology. When assessed using the modal practice of considering only their internal consistency, 89% of scales appeared to possess good validity. Yet, when validity was assessed comprehensively (via internal consistency, immediate and delayed test-retest reliability, factor structure, and measurement invariance for median age and gender), only 4% demonstrated good validity. Furthermore, the less commonly a test was reported in the literature, the more likely scales were to fail it (e.g., measurement invariance). This suggests that the pattern of underreporting in the field may represent widespread hidden invalidity of the measures we use, and therefore pose a threat to many research findings. We highlight the degrees of freedom afforded to researchers in the assessment and reporting of structural validity. Similar to the better-known concept of p-hacking, we introduce the concept of validity hacking (v-hacking) and argue that it should be acknowledged and addressed.


2020 ◽  
Vol 3 (2) ◽  
pp. 166-184 ◽  
Author(s):  
Ian Hussey ◽  
Sean Hughes

It has recently been demonstrated that metrics of structural validity are severely underreported in social and personality psychology. We comprehensively assessed structural validity in a uniquely large and varied data set (N = 144,496 experimental sessions) to investigate the psychometric properties of some of the most widely used self-report measures (k = 15 questionnaires, 26 scales) in social and personality psychology. When the scales were assessed using the modal practice of considering only internal consistency, 88% of them appeared to possess good validity. Yet when validity was assessed comprehensively (via internal consistency, immediate and delayed test-retest reliability, factor structure, and measurement invariance for age and gender groups), only 4% demonstrated good validity. Furthermore, the less commonly a test was reported in the literature, the more likely the scales were to fail that test (e.g., scales failed measurement invariance much more often than internal consistency). This suggests that the pattern of underreporting in the field may represent widespread hidden invalidity of the measures used and may therefore pose a threat to many research findings. We highlight the degrees of freedom afforded to researchers in the assessment and reporting of structural validity and introduce the concept of validity hacking (v-hacking), similar to the better-known concept of p-hacking. We argue that the practice of v-hacking should be acknowledged and addressed.
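The abstract above describes internal consistency as the "modal practice" for assessing validity. For readers who want the mechanics, Cronbach's alpha can be computed directly from an item-response matrix. The sketch below is purely illustrative (the function name and data layout are our assumptions, not code from the paper):

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents, k_items) response matrix."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()  # sum of per-item variances
    total_var = items.sum(axis=1).var(ddof=1)    # variance of scale totals
    return (k / (k - 1)) * (1.0 - item_vars / total_var)
```

Alpha rises as items covary more strongly; as the abstract stresses, a high alpha alone says nothing about factor structure or measurement invariance.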


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Adam Polnay ◽  
Helen Walker ◽  
Christopher Gallacher

Purpose: Relational dynamics between patients and staff in forensic settings can be complicated and demanding for both sides. Reflective practice groups (RPGs) bring clinicians together to reflect on these dynamics. To date, evaluation of RPGs has lacked quantitative focus and a suitable quantitative tool; therefore, a self-report tool was designed. This paper aims to pilot The Relational Aspects of CarE (TRACE) scale with clinicians in a high-secure hospital and investigate its psychometric properties.
Design/methodology/approach: A multi-professional sample of 80 clinicians was recruited; participants completed the TRACE and the Attitudes to Personality Disorder Questionnaire (APDQ). Exploratory factor analysis (EFA) determined the factor structure and internal consistency of the TRACE. A subset was selected to measure test–retest reliability. The TRACE was cross-validated against the APDQ.
Findings: EFA found five factors underlying the 20 TRACE items: “awareness of common responses,” “discussing and normalising feelings,” “utilising feelings,” “wish to care,” and “awareness of complicated affects.” This factor structure is complex, but items clustered logically into the key areas originally used to generate them. Internal consistency (α = 0.66, 95% confidence interval (CI) = 0.55–0.76) was of borderline acceptability. The TRACE demonstrated good test–retest reliability (intra-class correlation = 0.94, 95% CI = 0.78–0.98) and face validity, and showed a slight negative correlation with the APDQ. A larger data set is needed to substantiate these preliminary findings.
Practical implications: Early indications suggested the TRACE was valid and reliable, and suitable for measuring the effectiveness of reflective practice.
Originality/value: The TRACE is a distinctive measure that fills a methodological gap in the literature.


2019 ◽  
Vol 38 (3) ◽  
pp. 337-349
Author(s):  
Fumio Someki ◽  
Masafumi Ohnishi ◽  
Mikael Vejdemo-Johansson ◽  
Kazuhiko Nakamura

To examine the reliability, validity, factor structure, and measurement invariance (i.e., configural, metric, and scalar invariance) of the Japanese Conners’ Adult Attention-Deficit/Hyperactivity Disorder (ADHD) Rating Scales (CAARS), Japanese nonclinical adults (N = 786) completed the CAARS Self-Report (CAARS-S). Each participant was also rated by one observer using the CAARS Observer Form (CAARS-O). For the test of measurement invariance, excerpts from the original (North American) CAARS norming data (N = 500) were used. The dimensional structure was examined by exploratory and confirmatory factor analyses. Test–retest reliability, internal consistency, and concurrent validity were satisfactory. Based on the DSM-IV model and the Japanese four-factor model, configural and metric invariance were established for the CAARS-S/O across the Japanese and North American populations. Scalar invariance was established for the CAARS-O based on the Japanese model. The use of the Japanese CAARS for diagnostic purposes in Japan was supported; however, it should be used with caution in cross-cultural comparison research.


2020 ◽  
Vol 29 (3) ◽  
pp. 185-195
Author(s):  
Erol Esen

The My Children’s Future Scale (MCFS) measures the support provided by parents for their children’s careers. The aim of this study was to adapt the MCFS to Turkish and examine its psychometric characteristics in the Turkish context. Participants consisted of 280 parents (190 mothers and 90 fathers). The factor structure of the MCFS and measurement invariance across parent gender were examined. The unidimensional factor structure was confirmed, and the scale was invariant across parent gender. In addition, the reliability of the MCFS was assessed for internal consistency and test-retest reliability: both the Cronbach’s alpha and McDonald’s omega coefficients were .87, and the test-retest reliability coefficient was .83. Our findings suggest that the Turkish form of the MCFS can be considered a valid and reliable data collection tool for use in Turkey to measure the support provided by parents for their children’s careers.


2012 ◽  
Vol 92 (1) ◽  
pp. 111-123 ◽  
Author(s):  
Margreth Grotle ◽  
Andrew M. Garratt ◽  
Hanne Krogstad Jenssen ◽  
Britt Stuge

Background: There is little evidence for the measurement properties of instruments commonly used for women with pelvic girdle pain.
Objective: The aim of this study was to examine the internal consistency, test-retest reliability, and construct validity of instruments used for women with pelvic girdle pain.
Design: This was a cross-sectional methodology study, including test-retest reliability assessment.
Methods: Women with pelvic girdle pain in pregnancy and after delivery participated in a postal survey that included the Pelvic Girdle Questionnaire (PGQ), Oswestry Disability Index (ODI), Disability Rating Index (DRI), Fear-Avoidance Beliefs Questionnaire (FABQ), Pain Catastrophizing Scale (PCS), and the 8-item version of the Medical Outcomes Study 36-Item Short-Form Health Survey questionnaire (SF-36). Test-retest reliability was assessed with a random subsample 1 week later. Internal consistency was assessed with the Cronbach alpha, and test-retest reliability with the intraclass correlation coefficient (ICC) and minimal detectable change (MDC). Construct validity was assessed by testing a priori hypotheses with correlation analysis. Discriminant validity was assessed with the area under the receiver operating characteristic curve.
Results: All participants responded to the main (N = 87) and test-retest (n = 42) surveys. Cronbach alpha values ranged from .88 to .94, and ICCs ranged from .78 to .94. The MDC at the individual level constituted about 7% to 14% of total scores for the 8-item version of the SF-36, the ODI, and the PGQ activity subscale; about 18% to 22% for the DRI, PGQ symptom subscale, and PCS; and about 25% for the FABQ. Hypotheses were mostly confirmed by correlations between the instruments. The PGQ was the only instrument that significantly discriminated between participants who were pregnant and those who were not, as well as between pain locations.
Limitations: A comparison of the responsiveness to change of the various instruments was not undertaken but will be carried out in a future study.
Conclusions: Self-report instruments for assessing health showed good internal consistency, test-retest reliability, and construct validity for women with pelvic girdle pain. The PGQ was the only instrument with satisfactory discriminant validity; it is therefore recommended for evaluating symptoms and disability in patients with pelvic girdle pain.
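The ICC and MDC statistics in the abstract above are tightly linked: the MDC is derived from the standard error of measurement, which in turn depends on the ICC. As a hedged illustration (not code from the study; the subjects-by-occasions layout and the ICC(2,1) variant are our assumptions), a Python sketch:

```python
import numpy as np

def icc_2_1(scores: np.ndarray) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single measures,
    for an (n_subjects, k_occasions) score matrix."""
    n, k = scores.shape
    grand = scores.mean()
    ss_rows = k * ((scores.mean(axis=1) - grand) ** 2).sum()    # between subjects
    ss_cols = n * ((scores.mean(axis=0) - grand) ** 2).sum()    # between occasions
    ss_err = ((scores - grand) ** 2).sum() - ss_rows - ss_cols  # residual
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

def mdc95(scores: np.ndarray) -> float:
    """Minimal detectable change at the 95% level: 1.96 * sqrt(2) * SEM,
    where SEM = SD * sqrt(1 - ICC)."""
    sem = scores.std(ddof=1) * np.sqrt(1.0 - icc_2_1(scores))
    return 1.96 * np.sqrt(2.0) * sem
```

Expressing the MDC as a percentage of the total score, as the study does, then only requires dividing by the scale's score range.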


2021 ◽  
Author(s):  
David Lacko ◽  
Tomáš Prošek ◽  
Jiří Čeněk ◽  
Michaela Helísková ◽  
Pavel Ugwitz ◽  
...  

Cognitive styles are commonly studied constructs in cognitive psychology. It can be argued that past measurement of these styles had significant shortcomings in validity and reliability. The theory of analytic and holistic cognitive styles followed from traditional research on cognitive styles and attempted to overcome these shortcomings. Unfortunately, the psychometric properties of its measurement methods were in many cases debatable or not reported. New statistical approaches, such as analysis of reaction times, have been reported in the recent literature but remain overlooked by current research on analytic and holistic cognitive styles. The aim of this pre-registered study was to verify the psychometric properties (i.e., factor structure, split-half reliability, test-retest reliability, discriminant validity with intelligence and personality, and divergent, concurrent, and predictive validity) of several methods routinely applied in the field. We developed or adapted six methods, spanning several types frequently applied in cognitive-style research: self-report questionnaires, methods based on rod-and-frame test principles, embedded figures, and methods based on hierarchical figures. The analysis was conducted on 392 Czech participants, with two data collection waves. The results indicate that the use of self-report questionnaires and methods based on the rod-and-frame principle may be unreliable, as these demonstrated unsatisfactory factor structure and an unwanted association with intelligence (i.e., poor discriminant validity). The use of embedded and hierarchical figures is recommended. Because the concurrent and divergent validity of the methods did not correspond with the original two-dimensional theory, we formulated a new three-level hierarchical model of analytic and holistic cognitive styles that better described our empirical findings.
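Among the properties this study examines, split-half reliability is the simplest to compute: correlate scores on two halves of the test, then step the correlation up to full test length with the Spearman-Brown formula. A minimal illustrative Python sketch using an odd-even split (our choice of split, not necessarily the authors'):

```python
import numpy as np

def split_half_reliability(items: np.ndarray) -> float:
    """Odd-even split-half reliability with Spearman-Brown correction
    for an (n_respondents, k_items) response matrix."""
    odd = items[:, 0::2].sum(axis=1)   # scores on items 1, 3, 5, ...
    even = items[:, 1::2].sum(axis=1)  # scores on items 2, 4, 6, ...
    r = np.corrcoef(odd, even)[0, 1]   # half-test correlation
    return 2.0 * r / (1.0 + r)         # Spearman-Brown step-up
```

The step-up correction is needed because each half is only half as long as the full test; without it, the half-test correlation understates full-test reliability.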

