APPLICATIONS OF ITEM RESPONSE THEORY MODELS TO ASSESS ITEM PROPERTIES AND STUDENTS’ ABILITIES IN DICHOTOMOUS RESPONSES ITEMS

A test is a tool meant to measure the ability level of the students, and how well they can recall the subject matter, but items making up a test may be defectives, and thereby unable to measure students’ ability or traits satisfactorily as intended if proper attention is not paid to item properties such as difficulty, discrimination, and pseudo guessing indices (power) of each item. This could be remedied by item analysis and moderation. It is a known fact that the absence or improper use of item analysis could undermine the integrity of assessment, certification and placement in our educational institutions. Both appropriateness and spread of items properties in accessing students’ abilities distribution, and the adequacy of information provided by dichotomous response items in a compulsory university undergraduate statistics course which was scored dichotomously, and analyzed with stata 16 SE on window 7 were focused here. In view of this, three dichotomous Item Response Theory (IRT) measurement models were used in the context of their potential usefulness in an education setting such as in determining these items properties. Ability, item discrimination, difficulty, and guessing parameters as unobservable characteristics were quantified with a binary response test, then discrete item response becomes an observable outcome variable which is associated with student’s ability level is thereby linked by Item Characteristic Curves that is defined by a set of item parameters that models the probability of observing a given item response by conditioning on a specific ability level. These models were used to assess each of the three items properties together with students’ abilities; then identified defective items that were needed to be discarded, moderated, and non-defectives items as the case may be while some of these chosen items were discussed based on underlining models. Finally, the information provided by these items was also discussed.

Download Full-text

Item Response Theory and Music Testing

The Oxford Handbook of Assessment Policy and Practice in Music Education, Volume 1 ◽

10.1093/oxfordhb/9780190248093.013.22 ◽

2019 ◽

pp. 477-503

Author(s):

Brian Wesolowski

Keyword(s):

Item Response Theory ◽

Item Response ◽

Test Scores ◽

General Framework ◽

Logistic Function ◽

Response Theory ◽

Measurement Models ◽

Latent Constructs ◽

Item Parameters ◽

Introductory Overview

This chapter presents an introductory overview of concepts that underscore the general framework of item response theory. “Item response theory” is a broad umbrella term used to describe a family of mathematical measurement models that consider observed test scores to be a function of latent, unobservable constructs. Most musical constructs cannot be directly measured and are therefore unobservable. Musical constructs can therefore only be inferred based on secondary, observable behaviors. Item response theory uses observable behaviors as probabilistic distributions of responses as a logistic function of person and item parameters in order to define latent constructs. This chapter describes philosophical, theoretical, and applied perspectives of item response theory in the context of measuring musical behaviors.

Download Full-text

Applying Item Response Theory Analysis to the Montreal Cognitive Assessment in a Low-Education Older Population

Assessment ◽

10.1177/1073191118821733 ◽

2019 ◽

Vol 27 (7) ◽

pp. 1416-1428 ◽

Cited By ~ 3

Author(s):

Hao Luo ◽

Björn Andersson ◽

Jennifer Y. M. Tang ◽

Gloria H. Y. Wong

Keyword(s):

Item Response Theory ◽

Item Response ◽

Cognitive Assessment ◽

Formal Education ◽

Montreal Cognitive Assessment ◽

Response Theory ◽

Homogeneous Sample ◽

Educational Backgrounds ◽

Multiple Group ◽

Item Properties

The traditional application of the Montreal Cognitive Assessment uses total scores in defining cognitive impairment levels, without considering variations in item properties across populations. Item response theory (IRT) analysis provides a potential solution to minimize the effect of important confounding factors such as education. This research applies IRT to investigate the characteristics of Montreal Cognitive Assessment items in a randomly selected, culturally homogeneous sample of 1,873 older persons with diverse educational backgrounds. Any formal education was used as a grouping variable to estimate multiple-group IRT models. Results showed that item characteristics differed between people with and without formal education. Item functioning of the Cube, Clock Number, and Clock Hand items was superior in people without formal education. This analysis provided evidence that item properties vary with education, calling for more sophisticated modelling based on IRT to incorporate the effect of education.

Download Full-text

An item response theory analysis of the Community of Inquiry Scale

The International Review of Research in Open and Distributed Learning ◽

10.19173/irrodl.v16i2.2052 ◽

2015 ◽

Vol 16 (2) ◽

Cited By ~ 4

Author(s):

Mehmet Barış Horzum ◽

Gülden Kaya Uyanik

Keyword(s):

Online Learning ◽

Item Response Theory ◽

Item Response ◽

Community Of Inquiry ◽

Item Analysis ◽

Classical Test Theory ◽

Test Theory ◽

Validity And Reliability ◽

Response Theory ◽

Learning Programs

The aim of this study is to examine validity and reliability of Community of Inquiry Scale commonly used in online learning by the means of Item Response Theory. For this purpose, Community of Inquiry Scale version 14 is applied on 1,499 students of a distance education center’s online learning programs at a Turkish state university via internet. The collected data is analyzed by using a statistical software package. Research data is analyzed in three aspects, which are checking model assumptions, checking model-data fit and item analysis. Item and test features of the scale are examined by the means of Graded Response Theory. In order to use this model of IRT, after testing the assumptions out of the data gathered from 1,499 participants, data model compliance was examined. Following the affirmative results gathered from the examinations, all data is analyzed by using GRM. As a result of the study, the Community of Inquiry Scale adapted to Turkish by Horzum (in press) is found to be reliable and valid by the means of Classical Test Theory and Item Response Theory.

Download Full-text

A-16 Latent Item Response Theory Regression Using Neuropsychological Tests to Predict Functional Ability

Archives of Clinical Neuropsychology ◽

10.1093/arclin/acaa067.16 ◽

2020 ◽

Vol 35 (6) ◽

pp. 790-790

Author(s):

W Goette ◽

A Carlew ◽

J Schaffert ◽

H Rossetti ◽

L Lacritz

Keyword(s):

Item Response Theory ◽

Item Response ◽

Functional Ability ◽

Neuropsychological Tests ◽

Rating Scale ◽

Word Association ◽

Digit Span ◽

Outcome Variable ◽

Functional Abilities ◽

Response Theory

Abstract Objective Examine prediction of functional ability with neuropsychological tests using latent item response theory. Method The sample included 3155 individuals (Mage = 69.72, SD = 9.41; Median education =13.15, SD = 4.40; white = 92.81%; female = 62.03%; MCI = 25.13%; Dementia = 28.87%) from the Texas Alzheimer’s Research and Care Consortium who completed functional and cognitive assessments [Mini Mental State Examination (MMSE), Logical Memory (LM), Visual Reproduction (VR), Controlled Oral Word Association Test (COWAT), Trail Making Test (TMT), Boston Naming Test, and Digit Span]. Functional measures [Clinical Dementia Rating Scale, Physical Self Maintenance Scale, and Instrumental Activities of Daily Living)] were combined into a single outcome variable using confirmatory factor analysis. Item response theory (IRT) was used to fit the data, and latent regression to predict the latent trait score using neuropsychological data. Results All three functional scales loaded onto a single factor and demonstrated good construct coverage and measurement reliability (Supporting Figure). A graded response IRT model best fit the functional ability composite measure. MMSE (b = −1.08, p < .001), LM II (b = −0.58, p < .001), VR I and II (b = −0.09, p = .02 and b = −0.43, p < .001, respectively), COWAT (b = −0.10, p = .003), and TMT-B (b = −0.30, p < .001) all significantly predicted functional abilities, as did age (b = 0.61, p < .001) and education (b = 0.31, p < .001). Conclusions Global cognition, memory and executive function tests predicted functional abilities while attention and language tasks did not. These results suggest that certain neuropsychological tests meaningfully predict functional abilities in elderly cognitively normal and cognitively impaired individuals. Further research is needed to determine whether these cognitive domains are predictive of functional abilities in other clinical disorders.

Download Full-text

Positive and Negative Affect Schedule: Psychometric Properties for the Brazilian Portuguese Version

The Spanish Journal of Psychology ◽

10.1017/sjp.2013.60 ◽

2013 ◽

Vol 16 ◽

Cited By ~ 11

Author(s):

Pedro Pires ◽

Alberto Filgueiras ◽

Rodolfo Ribas ◽

Cristina Santana

Keyword(s):

Item Response Theory ◽

Negative Affect ◽

Psychometric Properties ◽

Item Response ◽

Item Analysis ◽

Theory Model ◽

Positive And Negative Affect ◽

Partial Credit Model ◽

Cultural Issues ◽

Response Theory

AbstractThis study is about the validity and item analysis for the Positive and Negative Affect Schedule (PANAS), respectively through the Exploratory Factor Analysis (principal components method) and the Partial Credit Model (PCM). The scale has been largely used in areas ranging from clinical to social psychology since its release in 1988 by Watson, Clark, and Tellegen. In order to assess validity and item properties (Item Response Theory paradigm), this is study administered PANAS to 354 respondents, 115 male and 239 female subjects, with an average age of 29.5 (SD = 10,18). The results show PANAS’s excellent psychometric properties, with consistent dimensions and reliable item functioning, considering the Rasch measurement paradigm expressed in the PCM as an Item Response Theory model for polytomous data. The study considers important cultural issues and the results support more cautious translations for scales as well as further studies concerned with cross-cultural differences on the perception of affect states.

Download Full-text

Item Analysis of Test of Proficiency in Korean: Classical Test Theory and Item Response Theory

The Korean Language in America ◽

10.5325/korelangamer.23.1.0001 ◽

2019 ◽

Vol 23 (1) ◽

pp. 1

Author(s):

Yu ◽

Kim

Keyword(s):

Item Response Theory ◽

Item Response ◽

Item Analysis ◽

Classical Test Theory ◽

Test Theory ◽

Response Theory ◽

Classical Test

Download Full-text

NP3 AN ITEM ANALYSIS OF THE 24 HOUR HEADACHE DISABILITY QUESTIONNAIRE (DISQ-24) USING CLASSICAL TEST THEORY AND ITEM RESPONSE THEORY METHODS

Value in Health ◽

10.1016/s1098-3015(10)60859-5 ◽

2002 ◽

Vol 5 (3) ◽

pp. 137

Author(s):

A Ayyar Krishnan ◽

WJ Kwong

Keyword(s):

Item Response Theory ◽

Item Response ◽

Item Analysis ◽

Classical Test Theory ◽

Test Theory ◽

Response Theory ◽

Classical Test ◽

Disability Questionnaire ◽

Headache Disability

Download Full-text

Comparison of item analysis results of Korean Medical Licensing Examination according to classical test theory and item response theory

Journal of Educational Evaluation for Health Professions ◽

10.3352/jeehp.2004.1.1.67 ◽

2004 ◽

Vol 1 ◽

pp. 67 ◽

Cited By ~ 1

Author(s):

Eun Young Lim ◽

Jang Hee Park ◽

ll Kwon ◽

Gue Lim Song ◽

Sun Huh

Keyword(s):

Item Response Theory ◽

Item Response ◽

Item Analysis ◽

Classical Test Theory ◽

Test Theory ◽

Response Theory ◽

Computerized Adaptive Test ◽

Classical Test ◽

Difficulty Index ◽

Ability Parameter

The results of the 64th and 65th Korean Medical Licensing Examination were analyzed according to the classical test theory and item response theory in order to know the possibility of applying item response theory to item analys and to suggest its applicability to computerized adaptive test. The correlation coefficiency of difficulty index, discriminating index and ability parameter between two kinds of analysis were got using computer programs such as Analyst 4.0, Bilog and Xcalibre. Correlation coefficiencies of difficulty index were equal to or more than 0.75; those of discriminating index were between - 0.023 and 0.753; those of ability parameter were equal to or more than 0.90. Those results suggested that the item analysis according to item response theory showed the comparable results with that according to classical test theory except discriminating index. Since the ability parameter is most widely used in the criteria-reference test, the high correlation between ability parameter and total score can provide the validity of computerized adaptive test utilizing item response theory.

Download Full-text

A Bayesian Random Block Item Response Theory Model for Forced-Choice Formats

Educational and Psychological Measurement ◽

10.1177/0013164419871659 ◽

2019 ◽

Vol 80 (3) ◽

pp. 578-603

Author(s):

HyeSun Lee ◽

Weldon Z. Smith

Keyword(s):

Item Response Theory ◽

Item Response ◽

Model Performance ◽

Theory Model ◽

Forced Choice ◽

Simultaneous Estimation ◽

Response Theory ◽

Irt Model ◽

Measurement Models ◽

Random Block

Based on the framework of testlet models, the current study suggests the Bayesian random block item response theory (BRB IRT) model to fit forced-choice formats where an item block is composed of three or more items. To account for local dependence among items within a block, the BRB IRT model incorporated a random block effect into the response function and used a Markov Chain Monte Carlo procedure for simultaneous estimation of item and trait parameters. The simulation results demonstrated that the BRB IRT model performed well for the estimation of item and trait parameters and for screening those with relatively low scores on target traits. As found in the literature, the composition of item blocks was crucial for model performance; negatively keyed items were required for item blocks. The empirical application showed the performance of the BRB IRT model was equivalent to that of the Thurstonian IRT model. The potential advantage of the BRB IRT model as a base for more complex measurement models was also demonstrated by incorporating gender as a covariate into the BRB IRT model to explain response probabilities. Recommendations for the adoption of forced-choice formats were provided along with the discussion about using negatively keyed items.

Download Full-text

Score production and quantitative methods used by the National Board of Chiropractic Examiners for postexam analyses

Journal of Chiropractic Education ◽

10.7899/jce-18-27 ◽

2020 ◽

Vol 34 (1) ◽

pp. 35-42 ◽

Cited By ~ 5

Author(s):

Igor Himelfarb ◽

Bruce L. Shotts ◽

Nai-En Tang ◽

Margaret Smith

Keyword(s):

Item Response Theory ◽

Item Response ◽

Quantitative Methods ◽

Standardized Test ◽

Item Analysis ◽

Objective Measures ◽

Data Validation ◽

Response Theory ◽

Data Set ◽

National Board

Objective: The National Board of Chiropractic Examiners (NBCE) uses a robust system for data analysis. The aim of this work is to introduce the reader to the process of score production and the quantitative methods used by the psychometrician and data analysts of the NBCE. Methods: The NBCE employs data validation, diagnostic analyses, and item response theory–based modeling of responses to estimate test takers' abilities and item-related parameters. For this article, the authors generated 1303 synthetic item responses to 20 multiple-choice items with 4 response options to each item. These data were used to illustrate and explain the processes of data validation, diagnostic item analysis, and item calibration based on item response theory. Results: The diagnostic item analysis is presented for items 1 and 5 of the data set. The 3-parameter logistic item response theory model was used for calibration. Numerical and graphical results are presented and discussed. Conclusion: Demands for data-driven decision making and evidence-based effectiveness create a need for objective measures to be used in educational program reviews and evaluations. Standardized test scores are often included in that array of objective measures. With this article, we offer transparency of score production used for NBCE testing.

Download Full-text