Hidden invalidity among fifteen commonly used measures in social and personality psychology
Flake, Pek, and Hehman (2017) recently demonstrated that metrics of structural validity are severely underreported in social and personality psychology. We apply their recommendations for the comprehensive assessment of structural validity to a uniquely large and varied dataset (N = 144496 experimental sessions) to investigate the psychometric properties of some of the most widely used self-report measures (k = 15 questionnaires, 26 subscales) in social and personality psychology. When assessed using the modal practice of considering only their internal consistency, 89% of scales appeared to possess good validity. Yet, when validity was assessed comprehensively (via internal consistency, immediate and delayed test-retest reliability, factor structure, and measurement invariance for median age and gender) only 4% demonstrated good validity. Furthermore, the less commonly a test is reported in the literature, the more likely it was to be failed (e.g., measurement invariance). This suggests that the pattern of under- reporting in the field may represent widespread hidden invalidity of the measures we use, and therefore pose a threat to many research findings. We highlight the degrees of freedom afforded to researchers in the assessment and reporting of structural validity. Similar to the better-known concept of p-hacking, we introduce the concept of validity hacking (v-hacking) and argue that it should be acknowledged and addressed.