Item calibration: Recently Published Documents
Total documents: 43 (five years: 13) · H-index: 7 (five years: 1)

2021 · Author(s): Wim J. van der Linden

Abstract: Constrained adaptive testing is reviewed as an instance of discrete maximization, with the shadow-test approach delivering its solution. The approach may look counterintuitive in that its basic operation is the sequential assembly of full test forms, but it always produces real-time solutions that are optimal and satisfy the set of specifications in effect for the test. Equally importantly, it can be used to run testing programs with different degrees of adaptation for the same set of specifications, and/or as a tool to manage programs with simultaneous processes such as adaptive item calibration, time management, and item-security monitoring.
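
To make the sequential-assembly idea concrete, the following is a minimal sketch of a shadow-test loop, written in Python under simplifying assumptions: a toy 2PL item pool, a single items-per-content-area specification, and a greedy assembly routine standing in for the mixed-integer solver an operational shadow-test implementation would use. All names (build_shadow_test, next_item, etc.) are hypothetical and not drawn from the paper.

```python
import numpy as np

# Toy 2PL item pool: discrimination a, difficulty b, and a content-area label.
rng = np.random.default_rng(1)
N_ITEMS = 200
a = rng.uniform(0.8, 2.0, N_ITEMS)
b = rng.normal(0.0, 1.0, N_ITEMS)
area = rng.integers(0, 4, N_ITEMS)     # four content areas
TEST_LEN = 20
PER_AREA = 5                           # specification: five items per area


def info(theta, idx):
    """Fisher information of items idx at ability theta under the 2PL."""
    p = 1.0 / (1.0 + np.exp(-a[idx] * (theta - b[idx])))
    return a[idx] ** 2 * p * (1.0 - p)


def build_shadow_test(theta, administered):
    """Assemble a full form that meets the content specification.

    Items already administered are forced into the form; remaining slots are
    filled greedily with the most informative eligible items.  An operational
    shadow-test implementation would solve this step as a MIP instead.
    """
    form = list(administered)
    counts = np.bincount(area[form], minlength=4) if form else np.zeros(4, int)
    ranked = np.argsort(-info(theta, np.arange(N_ITEMS)))
    for i in ranked:
        if len(form) == TEST_LEN:
            break
        if i not in form and counts[area[i]] < PER_AREA:
            form.append(int(i))
            counts[area[i]] += 1
    return form


def next_item(theta, administered):
    """Administer the most informative free item from the current shadow test."""
    shadow = build_shadow_test(theta, administered)
    free = [i for i in shadow if i not in administered]
    return max(free, key=lambda i: float(info(theta, np.array([i]))[0]))


# One simulated administration; a real CAT would update the ability estimate
# from the responses after each item instead of keeping it fixed.
theta_hat, administered = 0.0, []
for _ in range(TEST_LEN):
    administered.append(next_item(theta_hat, administered))
print("items per content area:", np.bincount(area[administered], minlength=4))
```

Because every shadow test is a complete form satisfying all specifications, the items actually administered (the form completed at the final step) satisfy them as well, which is the property the abstract describes as real-time optimal and feasible solutions.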


2021 · pp. 107699862098390 · Author(s): Seungwon Chung, Li Cai

In the research reported here, we propose a new method for scale alignment and test scoring in the context of supporting students with disabilities. In educational assessment, students from these special populations take modified tests because of a demonstrated disability that requires more assistance than standard testing accommodations provide. Updated federal education legislation and guidance require that these students be assessed and included in state education accountability systems, and that their achievement be reported against the same rigorous content and achievement standards the state has adopted. Routine item calibration and linking methods are not feasible because these special populations tend to be small. We develop a unified cross-classified random effects model that uses item response data from the general population as well as judge-provided data from subject matter experts to obtain revised item parameter estimates for scoring modified tests. We extend the Metropolis–Hastings Robbins–Monro algorithm to estimate the parameters of this model. The proposed method is applied to Braille test forms in a large operational multistate English language proficiency assessment program. Our work not only allows a broader range of modifications than is routinely considered in large-scale educational assessments but also directly incorporates input from the subject matter experts who work directly with the students needing support. Their structured and informed feedback deserves more attention from the psychometric community.
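
The Metropolis–Hastings Robbins–Monro (MH-RM) algorithm mentioned above alternates a stochastic imputation of the latent variables with a stochastic-approximation update of the parameters. The sketch below is offered only as an illustration of that general pattern: it fits a plain 2PL model rather than the authors' cross-classified random effects model, and it replaces the information-matrix scaling of the full MH-RM with a simple two-stage gain schedule. Every name in it is illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated 2PL data: P(y = 1 | theta) = sigmoid(a * (theta - b)).
N, J = 1000, 10
true_a = rng.uniform(0.8, 1.8, J)
true_b = rng.normal(0.0, 1.0, J)
theta_true = rng.normal(0.0, 1.0, N)
p_true = 1 / (1 + np.exp(-true_a * (theta_true[:, None] - true_b)))
Y = (rng.random((N, J)) < p_true).astype(float)


def mh_sweep(theta, a, b, scale=1.0):
    """One Metropolis-Hastings sweep over the latent traits (N(0, 1) prior)."""
    def logpost(t):
        p = np.clip(1 / (1 + np.exp(-a * (t[:, None] - b))), 1e-9, 1 - 1e-9)
        return (Y * np.log(p) + (1 - Y) * np.log(1 - p)).sum(axis=1) - 0.5 * t ** 2
    prop = theta + rng.normal(0.0, scale, theta.shape)
    accept = np.log(rng.random(theta.shape)) < logpost(prop) - logpost(theta)
    return np.where(accept, prop, theta)


# Robbins-Monro cycles: impute theta, then move (a, b) along the
# complete-data score, with a constant gain first and a decreasing gain later.
a_hat, b_hat, theta = np.ones(J), np.zeros(J), np.zeros(N)
for k in range(1, 601):
    theta = mh_sweep(theta, a_hat, b_hat)
    p = 1 / (1 + np.exp(-a_hat * (theta[:, None] - b_hat)))
    resid = Y - p                                       # N x J residuals
    grad_a = (resid * (theta[:, None] - b_hat)).mean(axis=0)
    grad_b = (-a_hat * resid).mean(axis=0)
    gamma = 0.5 if k <= 100 else 1.0 / (k - 100)        # gain schedule
    a_hat += gamma * grad_a
    b_hat += gamma * grad_b

print(np.round(np.c_[true_a, a_hat, true_b, b_hat], 2))
```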


2020 · pp. 003329412097993 · Author(s): Jacinto Jardim, Anabela Pereira, Paula Vagos, Inês Direito, Sónia Galinha

When attending and participating in Higher Education, students face a multitude of personal, social, and work-related challenges, which may increase the risk of developing psychopathological symptoms. To date, no instrument captures the non-technical skills that may help prepare students to respond to these challenges. This paper presents the development and psychometric properties of the Soft Skills Inventory (SSI). The inventory was developed based on theoretical and empirical findings on the skills associated with academic and professional success, and on students' perceptions. The SSI was tested with 2030 Portuguese students (77.1% female) using a two-stage approach: item calibration and model generation (n = 1033), followed by model validation (n = 997). Item calibration analyses led to retaining 49 items organized into six factors: self-determination, resilience, empathy, assertiveness, social support, and teamwork. This measurement model was further validated and proved to be invariant, and thus a credible tool for comparing male and female students on these skills. All measures attained good internal consistency, with alphas ranging from .76 to .88. Female students scored significantly higher than males on self-determination, empathy, social support, and teamwork, whereas male students scored significantly higher on resilience. No significant differences were found between men and women for assertiveness. Psychometric analysis showed that the SSI is a reliable and valid instrument for evaluating students' intra- and interpersonal skills. The SSI may help identify gaps in soft skills and guide targeted interventions to support a more positive student experience in Higher Education.
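
As a rough illustration of the split-sample workflow and internal-consistency check described above (not the authors' analysis code), the Python sketch below splits a sample into calibration and validation halves and computes Cronbach's alpha for one subscale. The data and column names are placeholders.

```python
import numpy as np
import pandas as pd


def cronbach_alpha(items: pd.DataFrame) -> float:
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of total score)."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars.sum() / total_var)


# Placeholder responses: 2030 students, 8 Likert-type items of one subscale.
# (Random data give alpha near zero; real responses are scored the same way.)
rng = np.random.default_rng(42)
data = pd.DataFrame(
    rng.integers(1, 6, size=(2030, 8)),
    columns=[f"teamwork_{i}" for i in range(1, 9)],
)

# Random split into a calibration half (model generation) and a validation half.
calib = data.sample(frac=0.5, random_state=1)
valid = data.drop(calib.index)

print("alpha (calibration):", round(cronbach_alpha(calib), 2))
print("alpha (validation):", round(cronbach_alpha(valid), 2))
```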


2020 · Vol 9 (5) · pp. 233 · Author(s): Olimpia Gómez, Benilde García-Cabrero, Michael L. Hoover, Sandra Castañeda-Figueiras, Yolanda Guevara Benítez

The Inventory of Emotions Experienced by Adolescents when Solving Mathematical Tasks (INETAM, for its acronym in Spanish) measures four emotions that influence math performance: Enthusiasm, Frustration, Enjoyment, and Boredom. Content validity, construct validation, and item calibration analyses were performed to obtain evidence of its validity, using a sample of 448 adolescents enrolled in ninth grade. Factor analysis showed adequate reliability coefficients and fit indices. Calibration analysis showed that the items are highly informative and discriminate between response levels. Regression analysis indicated that emotions are predictors of math achievement. The INETAM is psychometrically sound, suitable for measuring adolescents' academic emotions, and can contribute to improving our understanding of their influence on academic achievement in mathematics.
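
The statement that the items "are highly informative and discriminate between response levels" refers to item-level information under a polytomous IRT model. As a hedged illustration (the abstract does not state which model was used), the sketch below computes item information under Samejima's graded response model for one hypothetical Likert-type item.

```python
import numpy as np


def grm_item_information(theta, a, thresholds):
    """Item information for a graded response model item.

    Boundary curves P*_k(theta) = sigmoid(a * (theta - b_k)); category
    probabilities are differences of adjacent boundaries; information is
    sum_k (dP*_k - dP*_{k+1})^2 / (P*_k - P*_{k+1}), with dP* = a P* (1 - P*).
    """
    b = np.asarray(thresholds, dtype=float)
    p_star = 1 / (1 + np.exp(-a * (theta - b)))          # boundaries k = 1..m-1
    p_star = np.concatenate(([1.0], p_star, [0.0]))      # pad P*_0 = 1, P*_m = 0
    d_star = a * p_star * (1 - p_star)                   # boundary derivatives
    cat_p = p_star[:-1] - p_star[1:]                     # category probabilities
    return float(np.sum((d_star[:-1] - d_star[1:]) ** 2 / cat_p))


# Hypothetical 5-category item (e.g., a frustration item on a 1-5 scale).
a, thresholds = 1.7, [-1.5, -0.5, 0.6, 1.8]
for theta in (-2, -1, 0, 1, 2):
    print(f"theta = {theta:+d}: information = {grm_item_information(theta, a, thresholds):.3f}")
```

Items with higher discrimination and well-spread thresholds yield more information across the trait range, which is what makes an item "highly informative" in this sense.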


Psychometrika · 2020 · Vol 85 (2) · pp. 301-321 · Author(s): Wim J. van der Linden, Bingnan Jiang

2020 · Vol 44 (7-8) · pp. 563-565 · Author(s): Hwanggyu Lim, Craig S. Wells

The R package irtplay provides practical tools for unidimensional item response theory (IRT) models, enabling users to conduct many IRT-related analyses conveniently. For example, irtplay includes functions for calibrating online items, scoring test-takers' proficiencies, evaluating IRT model-data fit, and importing item and/or proficiency parameter estimates from the output of popular IRT software. In addition, the package supports mixed-item formats consisting of dichotomous and polytomous items.
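
Online item calibration, one of the tasks listed above, typically means estimating the parameters of new (field-test) items while treating examinees' proficiency estimates as fixed. The Python sketch below illustrates that idea with a fixed-theta maximum-likelihood fit of a single 2PL item; it is not irtplay code (irtplay is an R package), and all names and values are illustrative.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(7)

# Proficiency estimates of examinees who saw the new field-test item,
# treated as known (e.g., scored from the operational test).
theta = rng.normal(0.0, 1.0, 800)

# Simulated responses to the new item with true a = 1.3, b = 0.4.
p_true = 1 / (1 + np.exp(-1.3 * (theta - 0.4)))
y = (rng.random(theta.size) < p_true).astype(float)


def neg_loglik(params):
    """Negative 2PL log-likelihood with theta held fixed (online calibration)."""
    a, b = params
    p = 1 / (1 + np.exp(-a * (theta - b)))
    p = np.clip(p, 1e-9, 1 - 1e-9)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))


result = minimize(neg_loglik, x0=[1.0, 0.0], method="L-BFGS-B",
                  bounds=[(0.1, 4.0), (-4.0, 4.0)])
print("estimated a, b:", np.round(result.x, 2))
```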


2019 · Vol 44 (4) · pp. 311-326 · Author(s): Christoph König, Christian Spoden, Andreas Frey

Accurate item calibration in item response theory (IRT) models requires rather large samples. For instance, [Formula: see text] respondents are typically recommended for the two-parameter logistic (2PL) model. Hence, this model is considered a large-scale application, and its use in small-sample contexts is limited. Hierarchical Bayesian approaches are frequently proposed to reduce the sample size requirements of the 2PL. This study compared the small-sample performance of an optimized Bayesian hierarchical 2PL (H2PL) model to its standard inverse Wishart specification, its nonhierarchical counterpart, and both unweighted and weighted least squares estimators (ULSMV and WLSMV) in terms of sampling efficiency and accuracy of estimation of the item parameters and their variance components. To alleviate shortcomings of hierarchical models, the optimized H2PL (a) was reparametrized to simplify the sampling process, (b) used a strategy to separate the item parameter covariances from their variance components, and (c) placed Cauchy and exponential hyperprior distributions on the variance components. Results show that when these elements are combined in the optimized H2PL, accurate item parameter estimates and trait scores are obtained even in samples as small as [Formula: see text]. This indicates that the 2PL can also be applied to the smaller sample sizes encountered in practice. The results of this study are discussed in the context of a recently proposed multiple imputation method to account for item calibration error in trait estimation.
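
For readers unfamiliar with the ingredients named in the abstract, the sketch below shows what a hierarchical Bayesian 2PL with a non-centered (reparametrized) item structure and heavy-tailed hyperpriors on the variance components can look like in PyMC. It is a sketch under assumptions, not the authors' exact model, priors, or covariance-separation strategy; the data are simulated placeholders.

```python
import numpy as np
import pymc as pm

rng = np.random.default_rng(3)

# Placeholder data: 100 respondents x 20 dichotomous items.
N, J = 100, 20
theta_true = rng.normal(0.0, 1.0, N)
a_true = np.exp(rng.normal(0.2, 0.3, J))
b_true = rng.normal(0.0, 1.0, J)
p = 1 / (1 + np.exp(-a_true * (theta_true[:, None] - b_true)))
Y = (rng.random((N, J)) < p).astype(int)

with pm.Model() as h2pl:
    # Person parameters.
    theta = pm.Normal("theta", mu=0.0, sigma=1.0, shape=N)

    # Hyperpriors on the item-parameter variance components: half-Cauchy for
    # the difficulty scale, exponential for the log-discrimination scale.
    sigma_b = pm.HalfCauchy("sigma_b", beta=1.0)
    sigma_loga = pm.Exponential("sigma_loga", lam=2.0)
    mu_b = pm.Normal("mu_b", mu=0.0, sigma=1.0)
    mu_loga = pm.Normal("mu_loga", mu=0.0, sigma=0.5)

    # Non-centered ("reparametrized") item parameters to ease sampling.
    z_b = pm.Normal("z_b", mu=0.0, sigma=1.0, shape=J)
    z_loga = pm.Normal("z_loga", mu=0.0, sigma=1.0, shape=J)
    b = pm.Deterministic("b", mu_b + sigma_b * z_b)
    a = pm.Deterministic("a", pm.math.exp(mu_loga + sigma_loga * z_loga))

    # 2PL response probabilities and Bernoulli likelihood.
    eta = a[None, :] * (theta[:, None] - b[None, :])
    pm.Bernoulli("Y", p=pm.math.sigmoid(eta), observed=Y)

    idata = pm.sample(1000, tune=1000, target_accept=0.9, random_seed=3)
```

The non-centered z_b/z_loga parameterization and the heavy-tailed hyperpriors are the kinds of choices the abstract refers to as simplifying the sampling process and stabilizing the variance components in small samples.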


2019 · Author(s): Øystein Guttersrud, Christopher Le, Kjell Sverre Pettersen, Sølvi Helseth, Hanne Søberg Finbråten

Abstract

Background: The self-reported European Health Literacy Survey Questionnaire (HLS-EU-Q47) is a widely used measure of population health literacy. Based on confirmatory factor analyses and Rasch modelling, the short form HLS-Q12 was developed to meet the expectations of the unidimensional Rasch measurement model. After its publication, there was a worldwide call to identify HLS-Q12 cutoff scores and to establish clearly delineated standards for the skills assessed. This study therefore aims to identify the HLS-Q12 scores associated with statistically distinct levels of proficiency and to construct a proficiency scale indicating what individuals typically know and can do at increasingly sophisticated levels of health literacy.

Methods: We applied the unidimensional Rasch measurement model for polytomous items to responses from 900 randomly sampled individuals and 388 individuals with type 2 diabetes. Using Rasch-based item calibration, we constructed a proficiency scale by locating the ordered item thresholds along the scale. Applying Wright's method for the maximum number of strata, we determined the cutoff scores for significantly different levels. By referring directly to the item content that people at or above each cutoff score viewed as 'easy', we suggested what these gradually more advanced levels of health literacy might mean in terms of item content.

Results: Analysing the population sample, we identified statistically distinct levels of health literacy at the empirically identified cutoff scores 27, 33 and 39, and we confirmed them by analysing the responses from individuals with diabetes. Using item calibration, the resulting HLS-Q12 proficiency scale expresses typical knowledge and skills at these three statistically distinct levels. The scale's cumulative nature indicates what it may mean qualitatively to move from low to high health literacy.

Conclusions: By identifying levels of health literacy, we may initiate the improvement of current models of health literacy. Determining how to adapt information to patients' health literacy level is a possible clinical outcome. A substantial methodological outcome is the indispensability of Rasch modelling in such measurement. We found that Wright's method identified rating-scale cutoff scores consistently across independent samples. To reveal sources of potential bias, threats to validity, and imprecision of benchmarks, replication of our study in other contexts is required.
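
Wright's method for the maximum number of statistically distinct strata, mentioned in the Methods, is commonly computed from the person separation index G as strata = (4G + 1) / 3, where G is the ratio of the error-corrected spread of the person measures to their root-mean-square measurement error. A minimal Python sketch follows, with made-up person measures and standard errors rather than the authors' data.

```python
import numpy as np


def wright_strata(measures, std_errors):
    """Person separation G and Wright's number of distinct strata, (4G + 1) / 3."""
    measures = np.asarray(measures, dtype=float)
    std_errors = np.asarray(std_errors, dtype=float)
    obs_var = measures.var(ddof=1)                 # observed variance of person measures
    error_var = np.mean(std_errors ** 2)           # mean square measurement error
    true_var = max(obs_var - error_var, 0.0)       # error-corrected ("true") variance
    separation = np.sqrt(true_var / error_var)
    return separation, (4 * separation + 1) / 3


# Hypothetical Rasch person measures (logits) and their standard errors.
rng = np.random.default_rng(11)
measures = rng.normal(0.5, 1.2, 500)
std_errors = rng.uniform(0.35, 0.55, 500)

G, strata = wright_strata(measures, std_errors)
print(f"separation G = {G:.2f}, distinct strata = {strata:.2f}")
```

For example, a separation of G = 2 yields (4 * 2 + 1) / 3 = 3 statistically distinct strata.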

