Item Desirability Matching in Forced-choice Test Construction
The forced-choice method has been proposed as a viable strategy to prevent socially desirable responding (SDR) on self-report non-cognitive measures. How well the method eliminates SDR may depend largely on how closely the items within each forced-choice block are matched in perceived desirability. The gold standard for quantifying similarity in desirability between items has been the mean difference index, that is, the absolute difference between the items’ mean desirability ratings. This index assumes that each item has one “true” desirability value, represented by its mean rating, and may fail when that assumption does not hold. Instead, we propose indexing within-rater agreement with several robust agreement indices to quantify how similar items are in desirability (i.e., inter-item agreement). Using a set of empirically derived desirability ratings, we show that relying on the mean difference index may lead to suboptimal forced-choice item assembly. We discuss the implications of our findings and directions for future research. R code for computing the proposed indices on a set of desirability ratings is provided.
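As a rough illustration of why the mean difference index can mislead, the following minimal Python sketch contrasts it with a simple within-rater measure. It is not the R code referred to above, and the within-rater measure used here (the mean absolute difference between each rater's two ratings) is only an assumed stand-in for the robust agreement indices proposed in the paper. The function names and the ratings are hypothetical and invented purely for illustration.

```python
from statistics import mean


def mean_difference_index(ratings_a, ratings_b):
    """Absolute difference between two items' mean desirability ratings."""
    return abs(mean(ratings_a) - mean(ratings_b))


def mean_absolute_rater_difference(ratings_a, ratings_b):
    """Average, over raters, of the absolute difference between the two
    ratings given by the same rater.

    Assumed stand-in for a within-rater agreement index; NOT one of the
    indices proposed in the paper. Position i in both lists must belong
    to the same rater.
    """
    if len(ratings_a) != len(ratings_b):
        raise ValueError("Both items must be rated by the same raters.")
    return mean(abs(a - b) for a, b in zip(ratings_a, ratings_b))


# Hypothetical desirability ratings on a 1-5 scale from six raters.
item_x = [1, 5, 1, 5, 1, 5]  # raters split sharply on this item
item_y = [5, 1, 5, 1, 5, 1]  # same raters, opposite pattern
item_z = [1, 5, 1, 5, 1, 5]  # same pattern as item_x

# All three items have a mean rating of 3, so the mean difference index
# is 0 for both pairs and treats them as equally well matched.
mdi_xy = mean_difference_index(item_x, item_y)
mdi_xz = mean_difference_index(item_x, item_z)

# Within raters, the pairs differ: every rater sees a 4-point gap
# between x and y, but no gap between x and z.
within_xy = mean_absolute_rater_difference(item_x, item_y)
within_xz = mean_absolute_rater_difference(item_x, item_z)

print(mdi_xy, mdi_xz)        # 0 0
print(within_xy, within_xz)  # 4 0
```

In this constructed case, pairing item x with item y in a forced-choice block would look ideal by the mean difference index, even though each individual rater perceives one of the two items as far more desirable than the other.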