Score production and quantitative methods used by the National Board of Chiropractic Examiners for postexam analyses

2020 ◽  
Vol 34 (1) ◽  
pp. 35-42 ◽  
Author(s):  
Igor Himelfarb ◽  
Bruce L. Shotts ◽  
Nai-En Tang ◽  
Margaret Smith

Objective: The National Board of Chiropractic Examiners (NBCE) uses a robust system for data analysis. The aim of this work is to introduce the reader to the process of score production and the quantitative methods used by the psychometrician and data analysts of the NBCE. Methods: The NBCE employs data validation, diagnostic analyses, and item response theory–based modeling of responses to estimate test takers' abilities and item parameters. For this article, the authors generated 1,303 synthetic item responses to 20 multiple-choice items with 4 response options per item. These data were used to illustrate and explain the processes of data validation, diagnostic item analysis, and item calibration based on item response theory. Results: The diagnostic item analysis is presented for items 1 and 5 of the data set. The 3-parameter logistic item response theory model was used for calibration. Numerical and graphical results are presented and discussed. Conclusion: Demands for data-driven decision making and evidence-based effectiveness create a need for objective measures to be used in educational program reviews and evaluations. Standardized test scores are often included in that array of objective measures. With this article, we offer transparency into the score production process used for NBCE testing.
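
The abstract does not publish the NBCE's own code, but the workflow it describes (simulate responses, run a diagnostic item analysis, calibrate a 3PL model) can be sketched with the R package mirt; the package choice and all parameter values below are assumptions for illustration.

```r
# A minimal sketch, assuming the R package 'mirt' (the NBCE's own
# calibration software is not specified in the abstract).
library(mirt)

set.seed(42)

# Simulate 1,303 responses to 20 dichotomously scored items under a 3PL
# model, mirroring the synthetic data set described in the abstract.
n_items <- 20
a <- rlnorm(n_items, 0, 0.3)   # discrimination
d <- rnorm(n_items)            # easiness (intercept)
g <- rbeta(n_items, 5, 15)     # lower asymptote (pseudo-guessing)
resp <- simdata(a = a, d = d, N = 1303, itemtype = 'dich', guess = g)

# Diagnostic item analysis: classical proportion correct per item
colMeans(resp)

# Calibrate the 3-parameter logistic model; inspect parameters in IRT metric
fit <- mirt(resp, model = 1, itemtype = '3PL')
coef(fit, IRTpars = TRUE, simplify = TRUE)

# Estimate test takers' abilities (EAP) and plot item characteristic curves
theta <- fscores(fit, method = 'EAP')
plot(fit, type = 'trace')
```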

2021 ◽  
Vol 8 (3) ◽  
pp. 672-695
Author(s):  
Thomas DeVaney

This article presents a discussion and illustration of Mokken scale analysis (MSA), a nonparametric form of item response theory (IRT), in relation to common IRT models such as Rasch and Guttman scaling. The procedure can be used for dichotomous and ordinal polytomous data commonly used with questionnaires. The assumptions of MSA are discussed, as well as the characteristics that differentiate a Mokken scale from a Guttman scale. MSA is illustrated using the mokken package in RStudio and a data set that included over 3,340 responses to a modified version of the Statistical Anxiety Rating Scale. Issues addressed in the illustration include monotonicity, scalability, and invariant ordering. The R script for the illustration is included.
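
The checks named above follow a standard sequence in the mokken package. The modified anxiety-scale data are not public, so this sketch substitutes the package's built-in acl (Adjective Checklist) example data; everything else mirrors the steps listed in the abstract.

```r
# A minimal sketch of the MSA workflow, using 'mokken' with its built-in
# 'acl' data in place of the (non-public) anxiety-scale responses.
library(mokken)

data(acl)
items <- acl[, 1:10]   # a 10-item ordinal subscale used in the package docs

# Scalability: Loevinger's H for the scale, the items, and item pairs
coefH(items)

# Automated item selection into Mokken scales (default lower bound c = .3)
aisp(items)

# Monotonicity of the item step response functions
summary(check.monotonicity(items))

# Invariant item ordering
summary(check.iio(items))
```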


2017 ◽  
Vol 6 (4) ◽  
pp. 113
Author(s):  
Esin Yilmaz Kogar ◽  
Hülya Kelecioglu

The purpose of this research is first to estimate the item and ability parameters, and the standard errors of those parameters, obtained from unidimensional item response theory (UIRT), bifactor (BIF), and testlet response theory (TRT) models in tests containing testlets, as the number of testlets, the number of independent items, and the sample size change, and then to compare the results. The PISA 2012 mathematics test was employed as the data collection tool, and 36 items were used to constitute six data sets containing different numbers of testlets and independent items. From these data sets, three sample sizes of 250, 500, and 1,000 persons were drawn at random. The findings show that the lowest mean error values were generally those obtained from UIRT, and that TRT yielded a lower mean estimation error than BIF. Under all conditions, models that take local dependence into account provided better model–data fit than UIRT; there was generally no meaningful difference between BIF and TRT, and both models can be used for these data sets. Where a meaningful difference between the two models did exist, BIF generally yielded the better result. In addition, for each sample size and data set, the correlations of item and ability parameters across models, and of their errors, were generally high.
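
The core comparison (a unidimensional model versus a model that absorbs within-testlet dependence) can be sketched in R with mirt; this is an assumption, not the software used in the study, and the simulated structure below (12 items, two 6-item testlets) is purely illustrative.

```r
# A minimal sketch, assuming 'mirt': UIRT vs. a bifactor model for testlet data.
library(mirt)

set.seed(1)

# Simulate 500 examinees on 12 items: one general factor plus two 6-item
# testlet (specific) factors, i.e., a bifactor structure.
n_items <- 12
a <- cbind(rlnorm(n_items, 0, 0.25),            # general slopes
           c(rlnorm(6, -0.5, 0.2), rep(0, 6)),  # testlet 1 slopes
           c(rep(0, 6), rlnorm(6, -0.5, 0.2)))  # testlet 2 slopes
d <- rnorm(n_items)
resp <- simdata(a = a, d = d, N = 500, itemtype = 'dich')

# Unidimensional 2PL (ignores local dependence within testlets)
fit_uirt <- mirt(resp, model = 1, itemtype = '2PL', verbose = FALSE)

# Bifactor model: each item loads on the general factor and its testlet factor
specific <- rep(1:2, each = 6)
fit_bf <- bfactor(resp, specific, verbose = FALSE)

# Model-data fit comparison (AIC/BIC and a likelihood ratio test)
anova(fit_uirt, fit_bf)
```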


2011 ◽  
Vol 35 (8) ◽  
pp. 604-622 ◽  
Author(s):  
Hirotaka Fukuhara ◽  
Akihito Kamata

A differential item functioning (DIF) detection method for testlet-based data was proposed and evaluated in this study. The proposed DIF model is an extension of a bifactor multidimensional item response theory (MIRT) model for testlets. Unlike traditional item response theory (IRT) DIF models, the proposed model takes testlet effects into account, thus estimating DIF magnitude appropriately when a test is composed of testlets. A fully Bayesian estimation method was adopted for parameter estimation. The recovery of parameters was evaluated for the proposed DIF model. Simulation results revealed that the proposed bifactor MIRT DIF model produced better estimates of DIF magnitude and higher DIF detection rates than the traditional IRT DIF model for all simulation conditions. A real data analysis was also conducted by applying the proposed DIF model to a statewide reading assessment data set.
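
The authors' model is fully Bayesian and testlet-aware; as a rough frequentist analogue of the baseline it is compared against, the sketch below runs a likelihood-ratio DIF test with mirt's multiple-group machinery. The package, the simulated data, and the omission of testlet factors are all simplifications for illustration.

```r
# A rough frequentist analogue of traditional IRT DIF testing, assuming
# 'mirt'; the testlet structure is omitted here for brevity.
library(mirt)

set.seed(7)

# Hypothetical data: 10 items and a reference/focal grouping variable
resp  <- simdata(a = rlnorm(10, 0, 0.3), d = rnorm(10),
                 N = 1000, itemtype = 'dich')
group <- rep(c('reference', 'focal'), each = 500)

# Fit a constrained multiple-group 2PL (item parameters equal across groups,
# group means and variances free), then test each item's parameters for DIF.
fit <- multipleGroup(resp, model = 1, group = group,
                     invariance = c('slopes', 'intercepts',
                                    'free_means', 'free_var'))
DIF(fit, which.par = c('a1', 'd'), scheme = 'drop')
```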


Author(s):  
Mehmet Barış Horzum ◽  
Gülden Kaya Uyanik

The aim of this study is to examine the validity and reliability of the Community of Inquiry Scale, commonly used in online learning, by means of item response theory. For this purpose, version 14 of the Community of Inquiry Scale was administered over the internet to 1,499 students in the online learning programs of a distance education center at a Turkish state university. The collected data were analyzed with a statistical software package in three steps: checking model assumptions, checking model–data fit, and item analysis. Item and test features of the scale were examined by means of the graded response model (GRM). After the assumptions were tested on the data gathered from the 1,499 participants, model–data fit was examined; following affirmative results, all data were analyzed using the GRM. As a result of the study, the Community of Inquiry Scale adapted to Turkish by Horzum (in press) was found to be reliable and valid by means of both classical test theory and item response theory.
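
The abstract does not name the software used, so the GRM fit can only be sketched; the version below assumes mirt, and the item count and category structure are illustrative, not the scale's actual composition.

```r
# A minimal sketch of a graded response model fit, assuming 'mirt';
# the number of items and categories below is illustrative only.
library(mirt)

set.seed(3)

# Simulate 1,499 respondents on 12 five-category Likert-type items
n_items <- 12
a <- matrix(rlnorm(n_items, 0.2, 0.2))
d <- t(apply(matrix(rnorm(n_items * 4), n_items), 1,
             function(x) sort(x, decreasing = TRUE)))  # ordered intercepts
resp <- simdata(a = a, d = d, N = 1499, itemtype = 'graded')

# Fit the GRM, inspect discrimination/threshold parameters, check item fit
fit <- mirt(resp, model = 1, itemtype = 'graded')
coef(fit, IRTpars = TRUE, simplify = TRUE)
itemfit(fit)

# Category response curves and test information
plot(fit, type = 'trace')
plot(fit, type = 'info')
```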


2020 ◽  
Vol 44 (5) ◽  
pp. 362-375
Author(s):  
Tyler Strachan ◽  
Edward Ip ◽  
Yanyan Fu ◽  
Terry Ackerman ◽  
Shyh-Huei Chen ◽  
...  

As a method to derive a “purified” measure along a dimension of interest from response data that are potentially multidimensional in nature, the projective item response theory (PIRT) approach requires first fitting a multidimensional item response theory (MIRT) model to the data before projecting onto a dimension of interest. This study aims to explore how accurate the PIRT results are when the estimated MIRT model is misspecified. Specifically, we focus on using a (potentially misspecified) two-dimensional (2D)-MIRT for projection because of its advantages, including interpretability, identifiability, and computational stability, over higher dimensional models. Two large simulation studies (I and II) were conducted. Both studies examined whether the fitting of a 2D-MIRT is sufficient to recover the PIRT parameters when multiple nuisance dimensions exist in the test items, which were generated, respectively, under compensatory MIRT and bifactor models. Various factors were manipulated, including sample size, test length, latent factor correlation, and number of nuisance dimensions. The results from simulation studies I and II showed that the PIRT was overall robust to a misspecified 2D-MIRT. Smaller third and fourth simulation studies were done to evaluate recovery of the PIRT model parameters when the correctly specified higher dimensional MIRT or bifactor model was fitted with the response data. In addition, a real data set was used to illustrate the robustness of PIRT.
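
The projection step itself can be illustrated compactly. Under a normal-ogive 2D model with unit-variance factors correlated at rho, integrating the nuisance dimension out of the item response function yields closed-form projected parameters; the sketch below shows that arithmetic for one hypothetical item and is an illustration of the idea, not the authors' implementation.

```r
# A rough sketch of projecting a 2D item onto the dimension of interest,
# under a normal-ogive approximation; all parameter values are hypothetical.

a1  <- 1.2    # slope on the primary dimension
a2  <- 0.6    # slope on the nuisance dimension
d   <- -0.3   # intercept
rho <- 0.4    # latent correlation between the two dimensions

# With theta2 | theta1 ~ N(rho * theta1, 1 - rho^2), integrating theta2 out
# of Phi(a1*theta1 + a2*theta2 + d) gives projected 1D parameters:
scale  <- sqrt(1 + a2^2 * (1 - rho^2))
a_proj <- (a1 + a2 * rho) / scale
d_proj <- d / scale

# Projected item response function on the dimension of interest
p_proj <- function(theta1) pnorm(a_proj * theta1 + d_proj)
curve(p_proj, from = -3, to = 3,
      xlab = expression(theta[1]), ylab = 'P(correct)')
```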


2013 ◽  
Vol 16 ◽  
Author(s):  
Pedro Pires ◽  
Alberto Filgueiras ◽  
Rodolfo Ribas ◽  
Cristina Santana

This study addresses the validity and item analysis of the Positive and Negative Affect Schedule (PANAS), through exploratory factor analysis (principal components method) and the partial credit model (PCM), respectively. The scale has been widely used in areas ranging from clinical to social psychology since its release in 1988 by Watson, Clark, and Tellegen. In order to assess validity and item properties (within the item response theory paradigm), this study administered the PANAS to 354 respondents, 115 male and 239 female, with an average age of 29.5 years (SD = 10.18). The results show PANAS's excellent psychometric properties, with consistent dimensions and reliable item functioning, considering the Rasch measurement paradigm expressed in the PCM as an item response theory model for polytomous data. The study considers important cultural issues, and the results support more cautious translations of scales as well as further studies concerned with cross-cultural differences in the perception of affect states.
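
A PCM fit of the kind described can be sketched with the R package eRm (an assumption; the abstract does not name the software). The study's PANAS responses are not distributed, so the sketch uses the package's built-in polytomous example data.

```r
# A minimal sketch of a Partial Credit Model fit, assuming 'eRm' and its
# built-in 'pcmdat' example data in place of the study's PANAS responses.
library(eRm)

data(pcmdat)

# Fit the PCM via conditional maximum likelihood; inspect thresholds
fit <- PCM(pcmdat)
summary(fit)
thresholds(fit)

# Person parameters and item fit statistics for Rasch-family diagnostics
pp <- person.parameter(fit)
itemfit(pp)
```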


Author(s):  
Eun Young Lim ◽  
Jang Hee Park ◽  
Il Kwon ◽  
Gue Lim Song ◽  
Sun Huh

The results of the 64th and 65th Korean Medical Licensing Examination were analyzed according to both classical test theory and item response theory, in order to assess the feasibility of applying item response theory to item analysis and to suggest its applicability to computerized adaptive testing. Correlation coefficients between the two kinds of analysis were obtained for the difficulty index, the discrimination index, and the ability parameter using computer programs such as Analyst 4.0, BILOG, and XCALIBRE. Correlation coefficients for the difficulty index were 0.75 or higher; those for the discrimination index were between -0.023 and 0.753; those for the ability parameter were 0.90 or higher. These results suggest that item analysis according to item response theory yields results comparable to classical test theory, except for the discrimination index. Since the ability parameter is most widely used in criterion-referenced testing, the high correlation between the ability parameter and the total score supports the validity of computerized adaptive testing based on item response theory.
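
The same CTT-versus-IRT correlation analysis can be reproduced in outline with modern tools; the sketch below substitutes mirt for the legacy programs named above (Analyst 4.0, BILOG, XCALIBRE) and uses simulated responses.

```r
# A minimal sketch of the CTT vs. IRT comparison, assuming 'mirt' in place
# of the legacy calibration programs; data are simulated.
library(mirt)

set.seed(9)
resp <- simdata(a = rlnorm(30, 0, 0.3), d = rnorm(30),
                N = 2000, itemtype = 'dich')

# Classical test theory: difficulty = proportion correct,
# discrimination = corrected item-total correlation
ctt_diff <- colMeans(resp)
total    <- rowSums(resp)
ctt_disc <- sapply(seq_len(ncol(resp)),
                   function(j) cor(resp[, j], total - resp[, j]))

# Item response theory: 2PL difficulty (b) and discrimination (a)
fit <- mirt(resp, model = 1, itemtype = '2PL', verbose = FALSE)
irt <- coef(fit, IRTpars = TRUE, simplify = TRUE)$items

# Correlate the two sets of indices, and total score with estimated ability
cor(ctt_diff, irt[, 'b'])                 # difficulty (sign flips by convention)
cor(ctt_disc, irt[, 'a'])                 # discrimination
cor(total, fscores(fit, method = 'EAP'))  # total score vs. ability estimate
```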


2021 ◽  
Author(s):  
Jakob Bue Bjorner ◽  
Berend Terluin ◽  
Andrew Trigg ◽  
Jinxiang Hu ◽  
Keri J.S. Brady ◽  
...  

PURPOSE: Thresholds for meaningful within-individual change (MWIC) are useful for interpreting patient-reported outcome measures (PROM). Transition ratings (TR) have been recommended as anchors to establish MWIC. Traditional statistical methods for analyzing MWIC, such as mean change analysis, receiver operating characteristic (ROC) analysis, and predictive modeling, ignore problems of floor/ceiling effects and measurement error in the PROM scores and the TR item. We present a novel approach to MWIC estimation for multi-item scales using longitudinal item response theory (LIRT). METHODS: A graded response LIRT model for baseline and follow-up PROM data was expanded to include a TR item measuring latent change. The LIRT threshold parameter for the TR established the MWIC threshold on the latent metric, from which the observed PROM score MWIC threshold was estimated. We compared the LIRT approach and traditional methods using an example data set with baseline and three follow-up assessments differing by magnitude of score improvement, variance of score improvement, and baseline–follow-up score correlation. RESULTS: The LIRT model provided good fit to the data. LIRT estimates of the observed PROM MWIC varied between 3 and 4 points of score improvement. In contrast, results from traditional methods varied from 2 points to 10 points, strongly associated with the proportion of self-rated improvement. The best agreement between methods was seen when approximately 50% rated their health as improved. CONCLUSION: Results from traditional analyses of anchor-based MWIC are impacted by study conditions. LIRT constitutes a promising and more robust analytic approach to identifying thresholds for MWIC.
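
For contrast with the LIRT approach, one of the traditional anchor-based methods the abstract mentions, ROC analysis of change scores against a dichotomized transition rating, is easy to sketch; the pROC package and all data below are assumptions for illustration.

```r
# A minimal sketch of the traditional ROC-based MWIC estimate, assuming
# the 'pROC' package; data are hypothetical.
library(pROC)

set.seed(11)

# Hypothetical PROM change scores and a transition anchor (improved vs. not)
n        <- 400
improved <- rbinom(n, 1, 0.5)
change   <- rnorm(n, mean = ifelse(improved == 1, 4, 0), sd = 3)

# ROC curve of change score as a classifier of anchor-rated improvement;
# the Youden-optimal cut point serves as the MWIC threshold estimate.
roc_obj <- roc(improved, change)
coords(roc_obj, x = 'best', best.method = 'youden')
auc(roc_obj)
```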

