Measurement Practices in Large-Scale Replications: Insights from Many Labs 2

2020 ◽  
Author(s):  
Mairead Shaw ◽  
Leonie Johanna Rosina Cloos ◽  
Raymond Luong ◽  
Sasha Elbaz ◽  
Jessica Kay Flake

Validity of measurement is integral to the interpretability of research endeavours and any subsequent replication attempts. To assess current measurement practices and the construct validity of measures in large-scale replication studies, we conducted a systematic review of measures used in Many Labs 2: Investigating Variation in Replicability Across Samples and Settings (Klein et al., 2018). To evaluate the psychometric properties of the scales used in Many Labs 2, we conducted factor and reliability analyses on the publicly available data. We report that measures in Many Labs 2 were often short, with little validity evidence reported in the original study; that measures with more validity evidence in the original study had stronger psychometric properties in the replication sample; and that translated versions of scales had lower reliability. We discuss the implications of these findings for interpreting replication results and make recommendations to improve measurement practices in future replications.
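As a concrete illustration of the reliability portion of such analyses, here is a minimal sketch. It is not the authors' code: it assumes a complete numeric item-response matrix, uses simulated data, and shows only the Cronbach's alpha step of a reliability analysis.

```python
# Minimal sketch of a scale reliability check (illustrative, simulated data).
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of sum score)."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)       # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of the total score
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 1))                       # one common factor
items = latent + rng.normal(scale=1.0, size=(200, 5))    # 5 noisy indicators
print(f"alpha = {cronbach_alpha(items):.2f}")            # coherent scale -> high alpha
```

In practice this step would be run per scale and per translated version, which is how reliability differences across translations, like those reported above, would surface.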


2020 ◽  
Vol 3 (4) ◽  
pp. 456-465
Author(s):  
Jessica Kay Flake ◽  
Eiko I. Fried

In this article, we define questionable measurement practices (QMPs) as decisions researchers make that raise doubts about the validity of the measures, and ultimately the validity of study conclusions. Doubts arise for a host of reasons, including a lack of transparency, ignorance, negligence, or misrepresentation of the evidence. We describe the scope of the problem and focus on how transparency is a part of the solution. A lack of measurement transparency makes it impossible to evaluate potential threats to internal, external, statistical-conclusion, and construct validity. We demonstrate that psychology is plagued by a measurement schmeasurement attitude: QMPs are common, hide a stunning source of researcher degrees of freedom, and pose a serious threat to cumulative psychological science, but are largely ignored. We address these challenges by providing a set of questions that researchers and consumers of scientific research can consider to identify and avoid QMPs. Transparent answers to these measurement questions promote rigorous research, allow for thorough evaluations of a study’s inferences, and are necessary for meaningful replication studies.


2017 ◽  
Vol 27 (5) ◽  
pp. 433-462 ◽  
Author(s):  
Y. Wei ◽  
P. McGrath ◽  
J. Hayden ◽  
S. Kutcher

Aims. Stigma of mental illness is a significant barrier to receiving mental health care. However, measurement tools evaluating stigma of mental illness have not been systematically assessed for their quality. We conducted a systematic review to critically appraise the methodological quality of studies assessing the psychometrics of stigma measurement tools, and determined the level of evidence for the overall quality of the psychometric properties of the included tools. Methods. We searched PubMed, PsycINFO, EMBASE, CINAHL, the Cochrane Library, and ERIC for eligible studies. We conducted risk-of-bias analysis with the Consensus-based Standards for the Selection of Health Measurement Instruments (COSMIN) checklist, rating studies as excellent, good, fair, or poor. We further rated the level of evidence for the overall quality of psychometric properties, combining the study quality and the quality of each psychometric property, as strong, moderate, limited, conflicting, or unknown. Results. We identified 117 studies evaluating the psychometric properties of 101 tools. The quality of specific studies varied, with ratings of excellent (n = 5), good (mostly on internal consistency, n = 67), fair (mostly on structural validity, n = 89, and construct validity, n = 85), and poor (mostly on internal consistency, n = 36). The overall quality of psychometric properties also varied: strong (mostly content validity, n = 3), moderate (mostly internal consistency, n = 55), limited (mostly structural validity, n = 55, and construct validity, n = 46), conflicting (mostly test–retest reliability, n = 9), and unknown (mostly internal consistency, n = 36). Conclusions. We identified 12 tools demonstrating limited evidence or above (+, ++, +++) for all of their properties, 69 tools reaching these levels of evidence for some of their properties, and 20 tools that did not meet the minimum level of evidence for all of their properties. We note that further research on stigma tool development is needed to ensure appropriate application.


Author(s):  
Stanley W. Wanjala ◽  
Ezra K. Too ◽  
Stanley Luchters ◽  
Amina Abubakar

Addressing HIV-related stigma requires the use of psychometrically sound measures. However, although the Berger HIV stigma scale (HSS) is among the most widely used measures for assessing HIV-related stigma, no study has systematically summarised its psychometric properties. This review investigated the psychometric properties of the HSS. A systematic review of articles published between 2001 and August 2021 was undertaken (CRD42020220305) following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. Additionally, we searched the grey literature and screened the reference lists of the included studies. Of the 1241 studies screened, 166 were included in the review, of which 24 were development and/or validation studies; the rest were observational or experimental studies. All but two of the studies reported some aspect of the scale’s reliability. The reported internal consistency ranged from acceptable to excellent (Cronbach’s alpha ≥ 0.70) in 93.2% of the studies. Only eight studies reported test–retest reliability, and the reported reliability was adequate except in one study. Only 36 studies assessed and established the HSS’s validity. The HSS appears to be a reliable and valid measure of HIV-related stigma. However, the validity evidence came from only 36 studies, most of which were conducted in North America and Europe; consequently, more validation work in other settings is necessary for more precise insights.


Author(s):  
Inimfon A. Essiet ◽  
Natalie J. Lander ◽  
Jo Salmon ◽  
Michael J. Duncan ◽  
Emma L. J. Eyre ◽  
...  

Background: Physical literacy (PL) in childhood is essential for a healthy active lifestyle, with teachers playing a critical role in guiding its development. Teachers can assist children to acquire the skills, confidence, and creativity required to perform diverse movements and physical activities. However, to detect and directly intervene on the aspects of children’s PL that are suboptimal, teachers require valid and reliable measures. This systematic review critically evaluates the psychometric properties of teacher proxy-report instruments for assessing one or more of the 30 elements within the four domains (physical, psychological, cognitive, social) of the Australian Physical Literacy Framework (APLF) in children aged 5–12 years. Secondary aims were to examine the alignment of each measure (and relevant items) with the APLF and to provide recommendations for teachers in assessing PL. Methods: Seven electronic databases (Academic Search Complete, CINAHL Complete, Education Source, Global Health, MEDLINE Complete, PsycINFO, and SPORTDiscus) were systematically searched, originally in October 2019 and with an updated search in April 2021. Eligible studies were peer-reviewed English-language publications that sampled a population of children with a mean age between 5 and 12 years and focused on developing and evaluating at least one psychometric property of a teacher proxy-report instrument for assessing one or more of the 30 APLF elements. The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidance was followed for the conduct and reporting of this review. The methodological quality of included studies and the quality of psychometric properties of identified tools were evaluated using the COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) guidance. Alignment of each measure (and relevant items) with the APLF domains and 30 elements was appraised. Results: Database searches generated 61,412 citations, reduced to 41 studies that evaluated the psychometric properties of 24 teacher proxy-report tools. Six tools were classified as single-domain measures (i.e. assessing a single domain of the APLF), eleven as dual-domain measures, and seven as tri-domain measures. No single tool captured all four domains and 30 elements of the APLF. Tools contained items that aligned with all physical, psychological, and social elements; however, four cognitive elements were not addressed by any measure. No tool was assessed for all nine psychometric properties outlined by COSMIN; included studies reported a median of three of the nine. The most frequently reported psychometric properties were construct validity (n = 32; 78% of studies), structural validity (n = 26; 63% of studies), and internal consistency (n = 25; 61% of studies). There was underreporting of content validity, cross-cultural validity, measurement error, and responsiveness. Psychometric data across tools were mostly indeterminate for construct validity, structural validity, and internal consistency. Conclusions: There is limited evidence to fully support the use of any specific teacher proxy-report tool in practice. Further psychometric testing and detailed reporting of methodological aspects in future validity and reliability studies are needed. Tools have been designed to assess some elements of the framework; however, no comprehensive teacher proxy-report tool exists to assess all 30 elements of the APLF, demonstrating the need for a new tool. We recommend that such tools be developed and psychometrically tested. Trial registration: This systematic review was registered in the PROSPERO international prospective register of systematic reviews, registration number CRD42019130936.


2021 ◽  
Author(s):  
Jessica Kay Flake ◽  
Mairead Shaw ◽  
Raymond Luong

Yarkoni describes a grim state of psychological science in which the gross misspecification of our models and the specificity of our operationalizations produce claims so narrow in generality that no one would be interested in them. We consider this a generalizability issue rooted in construct validity, and we discuss how construct validation research should precede large-scale replication research. We provide ideas for a path forward by suggesting psychologists take a few steps back. By retooling large-scale replication studies, psychologists can execute the descriptive research needed to assess the generalizability of constructs. We provide examples of reusing large-scale replication data to conduct construct validation research post hoc. We also discuss ongoing proof-of-concept research at the Psychological Science Accelerator. Big-team psychology makes large-scale construct validity and generalizability research feasible and worthwhile. We assert that no one needs to quit the field; in fact, there is plenty of work to do. The optimistic interpretation is that if psychologists focus less on generating new ideas and more on organizing, synthesizing, measuring, and assessing constructs from existing ideas, we can keep busy for at least 100 years.


Author(s):  
André Beauducel ◽  
Burkhard Brocke ◽  
Alexander Strobel ◽  
Anja Strobel

Zuckerman postulated a biopsychological multilevel theory of Sensation Seeking, which is part of a more complex multi-trait theory, the Alternative Five. The Sensation Seeking Scale Form V (SSS V) was developed for the measurement of Sensation Seeking. Validating Sensation Seeking as part of a multilevel theory includes analyses of relations within and between several levels of measurement. The present study investigates the validity and basic psychometric properties of a German version of the SSS V in a broader context of psychometric traits. The 120 participants were mainly students. They completed the SSS V, the Venturesomeness and Impulsiveness scales of the IVE, the BIS/BAS scales, the ZKPQ, and the NEO-FFI. The results reveal acceptable psychometric properties for the SSS V, albeit with limitations regarding its factor structure. Indications of criterion validity were obtained through the prediction of substance use by the subscales Dis and BS. The results of a multitrait-multimethod (MTMM) analysis, especially the convergent validities of the SSS V, were quite satisfactory. On the whole, the results yielded sufficient support for the validity of the Sensation Seeking construct and of the instrument itself, while also pointing to desirable modifications.
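To make the MTMM logic concrete, here is a minimal, hypothetical sketch; it is not the authors' analysis, and the score matrices and simulation parameters are assumptions. Convergent validities are the correlations between scores for the same trait obtained from two different instruments.

```python
# Illustrative MTMM convergent-validity check on simulated data.
import numpy as np

rng = np.random.default_rng(2)
traits = rng.normal(size=(120, 5))                        # 5 latent traits
method_a = traits + rng.normal(scale=0.5, size=(120, 5))  # instrument A scores
method_b = traits + rng.normal(scale=0.5, size=(120, 5))  # instrument B scores

# Full multitrait-multimethod correlation matrix over all 10 scores
mtmm = np.corrcoef(np.hstack([method_a, method_b]), rowvar=False)
convergent = np.diag(mtmm[:5, 5:])   # monotrait-heteromethod diagonal
print(np.round(convergent, 2))       # should clearly exceed the off-diagonal values
```

Evidence for convergent validity requires these diagonal values to be substantial; discriminant validity additionally requires them to exceed the heterotrait correlations in the same matrix.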


Crisis ◽  
2013 ◽  
Vol 34 (1) ◽  
pp. 13-21 ◽  
Author(s):  
Philip J. Batterham ◽  
Alison L. Calear ◽  
Helen Christensen

Background: There are presently no validated scales to adequately measure the stigma of suicide in the community. The Stigma of Suicide Scale (SOSS) is a new scale containing 58 descriptors of a “typical” person who completes suicide. Aims: To validate the SOSS as a tool for assessing stigma toward suicide, to examine the scale’s factor structure, and to assess correlates of stigmatizing attitudes. Method: In March 2010, 676 staff and students at the Australian National University completed the scale in an online survey. The construct validity of the SOSS was assessed by comparing its factors with factors extracted from the Suicide Opinion Questionnaire (SOQ). Results: Three factors were identified: stigma, isolation/depression, and glorification/normalization. Each factor had high internal consistency and strong concurrent validity with the Suicide Opinion Questionnaire. More than 25% of respondents agreed that people who suicided were “weak,” “reckless,” or “selfish.” Respondents who were female, who had a psychology degree, or who spoke only English at home were less stigmatizing. A 16-item version of the scale also demonstrated robust psychometric properties. Conclusions: The SOSS is the first attitudes scale designed to directly measure the stigma of suicide in the community. Results suggest that psychoeducation may successfully reduce stigma.
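For readers unfamiliar with the factor-extraction step described above, a minimal sketch follows. It is illustrative only, not the authors' analysis: the data are random placeholders shaped like 676 respondents by 58 items, and the three-factor solution mirrors only the factor count reported, not any real SOSS result.

```python
# Illustrative exploratory factor extraction (placeholder data).
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(1)
responses = rng.normal(size=(676, 58))   # placeholder for 58 item responses

fa = FactorAnalysis(n_components=3, rotation="varimax")
scores = fa.fit_transform(responses)     # per-respondent factor scores
loadings = fa.components_.T              # (58 items x 3 factors) loading matrix
print(loadings.shape)
```

In a real validation, the loading matrix would be inspected to label each factor (here: stigma, isolation/depression, glorification/normalization), and the factor scores would be correlated with an external criterion such as the SOQ to assess concurrent validity.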


1996 ◽  
Vol 12 (1) ◽  
pp. 33-42 ◽  
Author(s):  
Marco Perugini ◽  
Luigi Leone

The aim of this contribution is to present a new short adjective-based measure of the Five Factor Model (FFM) of personality, the Short Adjectives Checklist of Big Five (SACBIF). We present the various steps of the construction and validation of this instrument. First, 50 adjectives were selected using the “Lining Up Technique” (LUT), a selection procedure specifically designed to identify the best factorial markers of the FFM. Then, the factorial structure and psychometric properties of the SACBIF were investigated. Finally, the SACBIF factors were correlated with some main measures of the FFM to establish its construct validity, and with some other personality dimensions to investigate how well these dimensions could be represented in the SACBIF factorial space.


2013 ◽  
Vol 34 (1) ◽  
pp. 32-40 ◽  
Author(s):  
Matthias Ziegler ◽  
Christoph Kemper ◽  
Beatrice Rammstedt

The present research aimed to construct a questionnaire measuring overclaiming tendencies (VOC-T-bias) as an indicator of self-enhancement. The approach also allows estimation of a score for vocabulary knowledge, the accuracy index (VOC-T-accuracy), using signal detection theory. For construction purposes, an online study was conducted with N = 1,176 participants. The resulting questionnaire, named the Vocabulary and Overclaiming Test (VOC-T), was investigated with regard to its psychometric properties in two further studies: Study 2 used data from a population-representative sample (N = 527), and Study 3 was another online survey (N = 933). Reliability estimates were satisfactory for both the VOC-T-bias index and the VOC-T-accuracy index. Overclaiming did not correlate with knowledge, but it was sensitive to self-enhancement, supporting the construct validity of the test scores. The VOC-T-accuracy index, in turn, covaried with general knowledge and even more so with verbal knowledge, which also supports construct validity. Moreover, the VOC-T-accuracy index had a meaningful correlation with age in both validation studies. All in all, the psychometric properties can be regarded as sufficient to recommend the VOC-T for research purposes.
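As a rough illustration of how signal detection theory yields separate accuracy and bias scores from an overclaiming task, here is a minimal sketch on hypothetical counts. It is not the authors' scoring procedure; the function, the rate correction, and the example numbers are assumptions for demonstration.

```python
# Signal-detection scoring sketch for an overclaiming task: "hits" are
# claimed real words, "false alarms" are claimed foils (non-existent words).
from scipy.stats import norm

def overclaiming_indices(hits, n_real, false_alarms, n_foils):
    """Return (accuracy, bias) as d-prime and criterion c.
    Rates are clipped away from 0/1 so the z-transform stays finite."""
    eps = 0.5 / max(n_real, n_foils)
    hr = min(max(hits / n_real, eps), 1 - eps)
    far = min(max(false_alarms / n_foils, eps), 1 - eps)
    d_prime = norm.ppf(hr) - norm.ppf(far)    # accuracy: real-vs-foil discrimination
    c = -0.5 * (norm.ppf(hr) + norm.ppf(far)) # bias: lower c = more claiming
    return d_prime, c

# Hypothetical respondent: claims 20 of 24 real words and 6 of 16 foils.
acc, bias = overclaiming_indices(20, 24, 6, 16)
print(f"accuracy d' = {acc:.2f}, bias c = {bias:.2f}")
```

Separating the two indices is the point of the design: a knowledge measure (accuracy) and a self-enhancement measure (bias) are recovered from the same set of responses.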

