An Empirical Approach to Identifying Subject Matter Experts for the Development of Situational Judgment Tests

2021 ◽  
pp. 1-13
Author(s):  
Don C. Zhang ◽  
Yi Wang

Abstract. Developing a scoring key for a situational judgment test (SJT) often requires subject matter experts (SMEs) to identify the best responses to a hypothetical situation. Yet there is no gold standard for identifying SMEs. This paper describes an empirical, context-free approach: the Cochran–Weiss–Shanteau (CWS) method, which does not rely on external criteria such as tenure or credentials. We first describe the theory behind the empirical approach to expertise. We then outline the CWS method and provide an R script for calculating the CWS index. Next, we demonstrate how the CWS index can be used to improve interrater agreement and the efficiency of SME selection. Finally, we examine the nomological network of the CWS index and find that it is associated with reflective thinking and intuition avoidance.
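The authors' R script is not reproduced in this listing, but the CWS logic itself is compact: an expert should discriminate strongly between different stimuli while remaining consistent across repeated judgments of the same stimulus, so the index is the ratio of between-stimulus to within-stimulus mean squares. A minimal Python sketch under that definition follows; the function name, data layout, and example values are illustrative assumptions, not the authors' materials.

```python
import numpy as np

def cws_index(ratings):
    """CWS-style expertise index for a single rater.

    ratings: 2-D array of shape (n_stimuli, n_replications) holding the
    rater's repeated judgments of each stimulus.

    Returns discrimination / inconsistency, i.e. the between-stimulus
    mean square divided by the within-stimulus (replication) mean square.
    """
    ratings = np.asarray(ratings, dtype=float)
    n_stim, n_rep = ratings.shape
    grand_mean = ratings.mean()
    stim_means = ratings.mean(axis=1)

    # Discrimination: how far apart the rater places different stimuli.
    ms_between = n_rep * np.sum((stim_means - grand_mean) ** 2) / (n_stim - 1)

    # Inconsistency: how much repeated ratings of one stimulus disagree.
    ms_within = np.sum((ratings - stim_means[:, None]) ** 2) / (n_stim * (n_rep - 1))

    # A perfectly consistent rater has ms_within == 0; report infinity then.
    return np.inf if ms_within == 0 else ms_between / ms_within

# Example: one rater judges five SJT response options twice each.
rater = [[7, 6], [2, 3], [5, 5], [1, 2], [6, 7]]
print(cws_index(rater))  # higher values indicate more expert-like judgment
```

Raters can then be ranked on this index and the scoring key built from the top-scoring subset, which is the sense in which such an index can tighten interrater agreement without appealing to tenure or credentials.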


2016 ◽  
Vol 9 (1) ◽  
pp. 23-28 ◽  
Author(s):  
Alexandra M. Harris ◽  
Lane E. Siedor ◽  
Yi Fan ◽  
Benjamin Listyg ◽  
Nathan T. Carter

Whereas Lievens and Motowidlo (2016) propose a model of situational judgment test (SJT) performance that removes the "situation" in favor of conceptualizing SJTs as measures of general domain knowledge, we argue that the expression of general domain knowledge is in fact contingent on situational judgment. As we explain, the evidence Lievens and Motowidlo cite against a situational component does not inherently exclude the importance of situations in SJTs, and it overlooks the strong support for a person–situation interaction explanation of behavior. Based on the interactionist literature, in particular trait activation theory (TAT) and situational strength, we propose a model that maintains the key pathways and definitions posited by Lievens and Motowidlo while integrating the situational component of SJTs.


2016 ◽  
Vol 9 (1) ◽  
pp. 47-51 ◽  
Author(s):  
Michael A. McDaniel ◽  
Sheila K. List ◽  
Sven Kepes

The construct validity of situational judgment tests (SJTs) is a “hot mess.” The suggestions of Lievens and Motowidlo (2016) concerning a strategy to make the constructs assessed by an SJT more “clear and explicit” (p. 5) are worthy of serious consideration. In this commentary, we highlight two challenges that will likely need to be addressed before one can develop SJTs with clear and explicit constructs. We also offer critiques of four positions presented by Lievens and Motowidlo that are not well supported by evidence.


Author(s):  
EuiSoo Kim ◽  
YoungSeok Han ◽  
MyoungSo Kim

The purpose of the present study was to examine the fakability of the situational judgment test (SJT). Specifically, the study focused on three questions: (1) whether participants can fake their answers on an SJT in a real selection setting, (2) whether faking influences the criterion-related validity of the SJT and its incremental validity over cognitive and personality tests, and (3) whether the combination of scoring key (SME consensus, average-response, and empirical keying) and scoring algorithm (scenario, best-worst, and pick-most) influences the degree of fakability as well as the criterion-related and incremental validity of the SJT. A group of 110 students who applied to a leadership program served as the faking group, while 129 students from department B at university A served as the honest group. Members of both groups completed a cognitive test, a personality questionnaire, and an SJT; for the SJT only, each group responded under its assigned instructions. A separate group of 78 students at university A completed a survey used to develop two of the scoring keys (empirical and average-response keying). The SME consensus key was developed by nine SMEs (five undergraduate students with leadership experience and good GPAs, and four graduate students). Nine SJT scores were then produced independently from the key-by-algorithm combinations. Results indicated that all scores of students in the faking group were significantly higher than those of students in the honest group. Furthermore, the criterion-related validity of the SJT was higher in the honest group than in the faking group for both task performance and contextual performance. While faking had negative effects on criterion-related validity for both performance criteria, the incremental validity of the SJT was higher in the honest group than in the faking group only for the contextual criterion. Finally, limitations and future directions of the present study are discussed.
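The key-by-algorithm combinations in this study are easy to conflate, so a short sketch may help. The Python below illustrates two of the pieces, an SME consensus key scored with a pick-most algorithm and an average-response key; the data layout and function names are illustrative assumptions, not the study's materials.

```python
import numpy as np

def consensus_key(sme_best_picks, n_options):
    """SME consensus key: the option SMEs most often chose as best."""
    counts = np.bincount(sme_best_picks, minlength=n_options)
    return int(np.argmax(counts))

def score_pick_most(response, key):
    """Pick-most scoring: one point if the chosen option matches the key."""
    return int(response == key)

def average_response_key(sme_ratings):
    """Average-response key: SMEs' mean effectiveness rating per option."""
    return np.asarray(sme_ratings, dtype=float).mean(axis=0)

# One SJT item with four response options and nine SMEs.
best_picks = np.array([2, 2, 1, 2, 0, 2, 2, 1, 2])    # each SME's "best" choice
key = consensus_key(best_picks, n_options=4)
print(score_pick_most(response=2, key=key))            # 1: matches the consensus

ratings = [[3, 5, 7, 2], [4, 5, 6, 2], [3, 4, 7, 1]]   # three SMEs rate all options
print(average_response_key(ratings))                   # a graded key, usable for
                                                       # distance-based scoring
```

Crossing three such keys with three scoring algorithms yields the nine independently produced SJT scores the abstract refers to.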


2020 ◽  
Vol 35 (4) ◽  
pp. 225-239
Author(s):  
Richard N. Landers ◽  
Elena M. Auer ◽  
Joseph D. Abraham

Purpose: Assessment gamification, the addition of game elements to existing assessments, is commonly implemented to improve applicant reactions to existing psychometric measures. This study aims to understand the effects of gamification on applicant reactions to, and the measurement quality of, situational judgment tests.
Design/methodology/approach: In a 2 × 4 between-subjects experiment, this study randomly assigned 315 people to experience different versions of a gamified situational judgment test, crossing immersive game elements (text, audio, still pictures, video) with control game elements (high and low), measuring applicant reactions and assessing differences in convergent validity between conditions.
Findings: The use of immersive game elements improved perceptions of organizational technological sophistication but no other reaction outcomes (test attitudes, procedural justice, organizational attractiveness). Convergent validity with cognitive ability was not affected by gamification.
Originality/value: This is the first study to experimentally examine applicant reactions to, and the measurement quality of, SJTs based upon the implementation of specific game elements. It demonstrates that small-scale efforts to gamify assessments are likely to lead to only small-scale gains. However, it also demonstrates that such modifications can be made without harming the measurement qualities of the test, making gamification a potentially useful marketing tool for assessment specialists. Thus, this study concludes that utility should be considered carefully and explicitly for any attempt to gamify assessment.


2018 ◽  
Vol 34 (5) ◽  
pp. 328-335 ◽  
Author(s):  
Patrick Mussel ◽  
Thomas Gatzka ◽  
Johannes Hewig

Abstract. Across many domains of applied psychology, personality traits are related to important outcomes such as well-being, psychological disorders, work performance, and academic achievement. However, self-reports, the most common approach to personality assessment, have certain limitations and disadvantages, such as being prone to faking. We investigated whether situational judgment tests, an established assessment technique for predicting job performance, might serve as an alternative measure for the assessment of personality. Our results show that a situational judgment test specifically developed to assess narrow personality traits may possess high levels of construct validity. Additionally, our results indicate that the situational judgment test was equivalent to a self-report personality measure with regard to predicting a number of theoretically related criteria. We conclude that situational judgment tests may serve as an alternative method for the assessment of personality and discuss potential theoretical and applied drawbacks.


2016 ◽  
Vol 37 (7) ◽  
pp. 899-911 ◽  
Author(s):  
Carrie A. Blair ◽  
Brian J. Hoffman ◽  
Robert T. Ladd

Purpose: The purpose of this paper is to provide an empirical comparison of a high-fidelity managerial simulation, assessment center (AC) ratings, with a lower-fidelity managerial simulation, a video situational judgment test (SJT), in the prediction of manager career success.
Design/methodology/approach: Archival data were collected from a large utility company. A measure of general mental ability (GMA), an SJT, and an AC were examined as predictors of career success as measured by increases in salary.
Findings: The AC and the video SJT used in this study appeared to assess different constructs, extending previous findings that ACs and written SJTs measure distinct constructs. Furthermore, the AC dimensions and the SJT remained valid predictors of salary over a six-year span following the test administration. In addition, the AC explained significant incremental variance beyond GMA and the SJT in career success six years after the assessment.
Research limitations/implications: The SJT and AC used in this study are similar in psychological fidelity, yet the AC remained the more valid predictor over time. The recommendation is that lower-fidelity simulations should not be used as prerequisites for higher-fidelity simulations.
Practical implications: The results lend general support to the value of high-fidelity instruments in predicting longitudinal success.
Originality/value: The paper offers a comparison of the validity of ACs and video SJTs.
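The incremental-variance finding is the kind of claim typically tested with hierarchical regression: fit salary on GMA and the SJT, add the AC score, and examine the change in R². A minimal Python sketch of that procedure on synthetic data follows; the variable names and simulated effect sizes are placeholders, not the study's archival data.

```python
import numpy as np

def r_squared(predictors, y):
    """R^2 from an ordinary least squares fit with an intercept."""
    X = np.column_stack([np.ones(len(y)), predictors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid.var() / y.var()

# Synthetic stand-ins for the study's predictors and criterion.
rng = np.random.default_rng(0)
n = 200
gma = rng.normal(size=n)
sjt = 0.3 * gma + rng.normal(size=n)
ac = 0.2 * gma + rng.normal(size=n)
salary_growth = 0.4 * gma + 0.2 * sjt + 0.3 * ac + rng.normal(size=n)

# Step 1: GMA and SJT; Step 2: add the AC score.
r2_step1 = r_squared(np.column_stack([gma, sjt]), salary_growth)
r2_step2 = r_squared(np.column_stack([gma, sjt, ac]), salary_growth)
print(f"Incremental variance (Delta R^2) for the AC: {r2_step2 - r2_step1:.3f}")
```

A significant ΔR² at step 2 is what "explained significant incremental variance beyond GMA and the SJT" amounts to in practice.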


2020 ◽  
Author(s):  
Michael D. Wolcott ◽  
Nikki G. Lobczowski ◽  
Jacqueline M. Zeeman ◽  
Jacqueline E. McLaughlin

Abstract. Background: Situational judgment tests (SJTs) are used in health sciences education to measure knowledge using case-based scenarios. Despite their popularity, there is a significant gap in the validity evidence and research on the response process demonstrating how SJTs measure their intended constructs. Models of the SJT response process have been proposed in the literature; however, few studies explore and expand these models beyond surface-level attributes. The purpose of this study was to describe the factors and strategies involved in the cognitive process examinees use as they respond to SJT items. Methods: Thirty participants (15 students and 15 experienced practitioners) completed a 12-item SJT designed to measure empathy. Each participant engaged in a think-aloud interview while completing the SJT, followed by a cognitive interview probing their decision-making processes. Interviews were transcribed and independently coded by three researchers to identify salient themes and factors that contributed to the response process. Results: Results suggested that the SJT response process involves a complex integration of comprehension, retrieval, judgment, and response selection. Each of these response process stages was influenced by attributes such as the perceived objective of the task, job-specific knowledge, assumptions about the scenario, and item setting. Conclusions: This study provides an evaluation of the SJT response process and contributes exploratory information to the validity evidence of SJTs; these findings can inform the design, interpretation, and utility of SJTs.


2020 ◽  
Author(s):  
Matt Brown ◽  
Michael Grossenbacher ◽  
Michelle Martin-Raugh ◽  
Jonathan Kochert ◽  
Matthew Prewett

It is common practice to rely on a convenience sample of subject matter experts (SMEs) when developing a scoring key for situational judgment tests (SJTs). However, the defining characteristics of what constitutes an SME are often ambiguous and inconsistent across studies. Other research fields have adopted crowdsourcing methods to replace or reproduce judgments thought to require subject matter expertise. We therefore conducted the current study to compare crowdsourced scoring keys with SME-based scoring keys for three SJTs in different domains, each varying in job-relatedness. Our results indicate that scoring keys derived from crowdsourced samples are likely to converge with keys based on SME judgment, regardless of test content (correlations ranging from r = .88 to .94). We observed the weakest agreement between individual MTurker and SME ratings for the more job-specific Medical SJT (classification consistency = 61%), but the aggregate scoring keys remained highly correlated. We observed stronger agreement in response-option rankings for the Military and Communication SJTs (80% and 85%), which were both designed to require less procedural knowledge. Although general mental ability and conscientiousness were each related to greater expert similarity among MTurkers, the average crowd rating outperformed nearly all individual MTurk raters. Based on an analysis of randomly drawn bootstrap samples of MTurker ratings in each of the three samples, we found that as few as 30-40 raters may provide adequate estimates of SME judgments for most SJT items. We hope these findings inspire others to consider crowdsourcing methods as an alternative to SMEs.
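The rater-count question at the end of the abstract lends itself to a short reconstruction: repeatedly draw k crowd raters with replacement, average their ratings into a crowd key, and correlate that key with the SME key. The Python sketch below is an illustrative reconstruction under those assumptions, run on synthetic data; it is not the authors' code or materials.

```python
import numpy as np

def bootstrap_key_convergence(crowd, sme_key, sizes, n_boot=1000, seed=0):
    """For each candidate rater count, estimate how well an averaged
    crowd key reproduces the SME key.

    crowd:   array (n_raters, n_options) of crowd ratings.
    sme_key: array (n_options,) of aggregate SME ratings.
    Returns {k: mean bootstrap correlation between crowd and SME keys}.
    """
    rng = np.random.default_rng(seed)
    n_raters = crowd.shape[0]
    results = {}
    for k in sizes:
        cors = []
        for _ in range(n_boot):
            idx = rng.integers(0, n_raters, size=k)   # draw k raters with replacement
            crowd_key = crowd[idx].mean(axis=0)       # aggregate into a scoring key
            cors.append(np.corrcoef(crowd_key, sme_key)[0, 1])
        results[k] = float(np.mean(cors))
    return results

# Synthetic demo: 200 noisy crowd raters around a shared "true" key.
rng = np.random.default_rng(1)
true_key = rng.normal(size=40)                            # 40 response options
crowd = true_key + rng.normal(scale=1.5, size=(200, 40))
sme_key = true_key + rng.normal(scale=0.3, size=40)
print(bootstrap_key_convergence(crowd, sme_key, sizes=[10, 30, 40, 100]))
```

Plotting the mean correlation against k shows where the curve flattens, which is the sense in which 30-40 raters can be "enough."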


2015 ◽  
Vol 23 (4) ◽  
pp. 361-372 ◽  
Author(s):  
Filip Lievens ◽  
Jan Corstjens ◽  
Miguel Ángel Sorrel ◽  
Francisco José Abad ◽  
Julio Olea ◽  
...  
