Algorithms for Computerized Test Construction Using Classical Item Parameters

1989 ◽  
Vol 14 (3) ◽  
pp. 279-290 ◽  
Author(s):  
Jos J. Adema ◽  
Wim J. van der Linden

Recently, linear programming models for test construction were developed. These models were based on the information function from item response theory. In this paper another approach is followed. Two 0-1 linear programming models for the construction of tests using classical item and test parameters are given. These models are useful, for instance, when classical test theory has to serve as an interface between an IRT-based item banking system and a test constructor not familiar with the underlying theory.
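
As a rough illustration of what a 0-1 selection model over classical item parameters looks like, the sketch below solves a toy problem by exhaustive enumeration. It is not one of the two models from the paper: the objective (total item discrimination), the constraint (mean item difficulty within a range), the function name `select_items`, and the six-item bank are all assumptions made for illustration. A real item bank would need an integer programming solver rather than enumeration.

```python
from itertools import combinations


def select_items(difficulty, discrimination, length, p_min, p_max):
    """Exhaustively solve a toy 0-1 item selection problem.

    Chooses `length` items that maximize total discrimination, subject to
    the mean item difficulty (proportion correct) lying in [p_min, p_max].
    Returns (indices, value), or (None, None) if no subset is feasible.
    """
    best, best_value = None, None
    for subset in combinations(range(len(difficulty)), length):
        mean_p = sum(difficulty[i] for i in subset) / length
        if not (p_min <= mean_p <= p_max):
            continue  # violates the difficulty constraint
        value = sum(discrimination[i] for i in subset)
        if best_value is None or value > best_value:
            best, best_value = subset, value
    return best, best_value


# Hypothetical bank of six items (values invented for the example).
difficulty = [0.30, 0.45, 0.55, 0.60, 0.80, 0.90]
discrimination = [0.25, 0.40, 0.35, 0.50, 0.45, 0.30]

chosen, total = select_items(difficulty, discrimination, 3, 0.50, 0.60)
print(chosen, round(total, 2))
```

Each candidate subset corresponds to one assignment of the 0-1 decision variables; enumeration simply makes that correspondence explicit.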

1998 ◽  
Vol 23 (1) ◽  
pp. 1-17 ◽  
Author(s):  
Ronald D. Armstrong ◽  
Douglas H. Jones ◽  
Zhaobo Wang

This article considers the problem of generating a test from an item bank using a criterion based on classical test theory parameters. A mathematical programming model is formulated that maximizes the reliability coefficient α, subject to logical constraints on the choice of items. The special structure of the problem is exploited with network theory and Lagrangian relaxation techniques. An empirical study shows that the method produces tests with high coefficient α subject to various practicable item constraints.
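
For readers unfamiliar with the criterion, coefficient α for a k-item test is k/(k-1) times one minus the ratio of the summed item variances to the total-score variance. The sketch below only computes α for a given score matrix; it does not reproduce the article's optimization model, and the function name `cronbach_alpha` and the sample data are assumptions.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Coefficient alpha for a score matrix.

    `scores` is a list of rows, one per examinee, each row holding that
    examinee's item scores. Sample variances (n - 1 denominator) are used.
    """
    k = len(scores[0])
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical responses: four examinees, three dichotomous items.
responses = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(cronbach_alpha(responses))
```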


2001 ◽  
Vol 26 (2) ◽  
pp. 180-198 ◽  
Author(s):  
Ing-Long Wu

This paper discusses a simultaneous approach to generating two-stage and multistage tests from an item bank. Most previous test generation problems have been formulated as binary programming models, and the efficiency of the available solution algorithms is the major concern for these problems. This study therefore considers two aspects of the solution process: alternative ways to formulate the mathematical models and alternative solution algorithms. The paper makes two contributions based on these aspects. First, two binary programming models with a special network structure that can be exploited computationally are presented. The first model maximizes the test information function at one specified ability point, and the second matches the target test information function at several specified ability points as closely as possible. Second, an efficient special-purpose network algorithm is used to solve these two models. Hence, the test construction process tends to be improved in terms of both computational effort and test quality. An empirical study shows results in line with these two criteria.
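
To make the first objective concrete, the sketch below computes item information at a single ability point and picks the most informative items. It assumes a two-parameter logistic (2PL) model, which the abstract does not specify, and it imposes only a test-length constraint, in which case ranking by information is sufficient. The paper's network-structured models and multistage design are not represented; `item_information`, `top_items`, and the bank values are hypothetical.

```python
import math


def item_information(a, b, theta):
    """Fisher information of a 2PL item (assumed model) at ability theta.

    a is the discrimination parameter and b the difficulty parameter.
    """
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)


def top_items(bank, theta, length):
    """Indices of the `length` items with the highest information at theta.

    With no constraint other than test length, this ranking maximizes the
    test information function at theta, because test information is the
    sum of the item informations.
    """
    ranked = sorted(
        range(len(bank)),
        key=lambda i: item_information(bank[i][0], bank[i][1], theta),
        reverse=True,
    )
    return sorted(ranked[:length])


# Hypothetical bank of (a, b) pairs.
bank = [(1.0, 0.0), (2.0, 0.0), (1.0, 3.0)]
print(top_items(bank, theta=0.0, length=2))
```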


Author(s):  
Jerhi Wahyu Fernanda ◽  
Noer Hidayah

Assessment is the final stage of the learning process and can be carried out through examinations. The items used must be able to measure students' ability. Classical Test Theory (CTT) and the Rasch model are statistical methods for analyzing test items. This study used a descriptive design. Analysis of 50 items using the CTT method showed that only 21 items met the item difficulty and item discrimination criteria. The Rasch model analysis indicated that the overall quality of the items was good, based on the pattern of the item information function curve. This analysis also showed that 42 items were acceptable because they met the item fit criteria and that 8 items need to be re-evaluated. Analysis using the Rasch model is better than CTT, so the 8 items found unacceptable by that analysis should be re-evaluated by changing the form of the case studies in those items and by developing new teaching methods for the material covered by those 8 items.

Keywords: item analysis, Classical Test Theory, Rasch model.
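
The two CTT indices mentioned above can be computed directly from a response matrix. The sketch below takes item difficulty as the proportion of correct responses and item discrimination as the correlation between the item score and the rest score (total minus that item). The abstract does not state which discrimination index or cut-offs the authors used, so this choice, the function names, and the data are assumptions.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if x or y is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def item_statistics(responses):
    """Return a (difficulty, discrimination) pair for each item.

    `responses` is a list of rows of 0/1 scores, one row per examinee.
    Discrimination is the item-rest correlation, so the item does not
    correlate with itself through the total score.
    """
    stats = []
    for j in range(len(responses[0])):
        item = [row[j] for row in responses]
        rest = [sum(row) - row[j] for row in responses]
        stats.append((sum(item) / len(item), pearson(item, rest)))
    return stats


# Hypothetical responses: four examinees, three items.
responses = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
for difficulty, discrimination in item_statistics(responses):
    print(round(difficulty, 2), round(discrimination, 2))
```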


2014 ◽  
Vol 35 (4) ◽  
pp. 250-261 ◽  
Author(s):  
Matthias Ziegler ◽  
Arthur Poropat ◽  
Julija Mell

Short personality questionnaires are increasingly used in research and practice, with some scales including as few as two to five items per personality domain. Despite the frequency of their use, these short scales are often criticized on the basis of their reduced internal consistencies and their purported failure to assess the breadth of broad constructs, such as the Big 5 factors of personality. One reason for this might be the use of principles rooted in Classical Test Theory during test construction. In this study, Generalizability Theory is used to compare psychometric properties of different scales based on the NEO-PI-R and BFI, two widely used personality questionnaire families. Applying both Classical Test Theory (CTT) and Generalizability Theory (GT) made it possible to identify the inner workings of test shortening. CTT-based analyses indicated that longer is generally better for reliability, whereas GT allowed differentiation between reliability for relative and absolute decisions and revealed how different variance sources affect test score reliability estimates. These variance sources differed with scale length, and only GT allowed clear description of these internal consequences, allowing more effective identification of advantages and disadvantages of shorter and longer scales. Most importantly, the findings highlight the potential error proneness of focusing solely on reliability and scale length in test construction. Practical as well as theoretical consequences are discussed.
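
The CTT expectation that longer scales are more reliable is usually expressed through the Spearman-Brown formula, which predicts the reliability of a test whose length is changed by a factor k, assuming the added or removed items are parallel to the existing ones. The abstract does not say that the authors used this formula; the sketch below is only the standard result, with a hypothetical function name and input values.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability after changing test length by `length_factor`.

    Assumes parallel items. A factor of 2 doubles the test; a factor of
    0.5 halves it.
    """
    k = length_factor
    return k * reliability / (1 + (k - 1) * reliability)


# Hypothetical scale with reliability 0.80.
print(round(spearman_brown(0.80, 0.5), 3))  # shortened to half length
print(round(spearman_brown(0.80, 2.0), 3))  # doubled in length
```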


2019 ◽  
Vol 22 (1b) ◽  
pp. 2156759X1983444
Author(s):  
Timothy A. Poynton ◽  
Bernalyn Ruiz ◽  
Richard T. Lapan

This article describes the process used to develop a new measure of college knowledge and the findings from an initial study to examine the measure’s construct validity. We employed common test construction (item response theory, classical test theory) and factor analytic (exploratory factor analysis, confirmatory factor analysis) approaches to analyzing the data from 519 graduating high school seniors; these analyses support the construct validity of the measure. We discuss the importance of college knowledge in the college decision-making process and implications for school counseling practice and research.


2016 ◽  
Vol 12 (28) ◽  
pp. 263 ◽  
Author(s):  
Awopeju, O. A. ◽  
Afolabi, E. R. I.

The study compared Classical Test Theory (CTT) and Item Response Theory (IRT)-estimated item difficulty and item discrimination indices in relation to the ability of examinees in the Senior School Certificate Examination (SSCE) in Mathematics, with a view to providing an empirical basis for informed decisions on the appropriateness of statistical and psychometric tests. The study adopted an ex-post-facto design. A sample of 6,000 students was selected from the population of 35,262 students who sat for the NECO SSCE Mathematics Paper 1 in 2008 in Osun State, Nigeria. The instrument was the May/June 2008 NECO SSCE Mathematics Paper 1, consisting of 60 multiple-choice items. Three sampling plans (random, gender, and ability) were employed to study the behaviour of the examinees' scores under the CTT and IRT measurement frameworks. BILOG-MG 3 was used to estimate the item parameters, and SPSS 20 was used to compare CTT- and IRT-based item parameters. The results showed that CTT-based item difficulty estimates and one-parameter IRT item difficulty estimates were comparable (the correlations were generally in the -0.702 to -0.988 range in the large sample and the -0.622 to -0.989 range in the small sample). Results also indicated that CTT-based and two-parameter IRT-based item discrimination estimates were comparable (the correlations were in the 0.430 to 0.880 range in the large sample and the 0.531 to 0.950 range in the small sample). The study concluded that CTT and IRT were comparable in estimating item characteristics of statistical and psychometric tests and thus could be used as complementary procedures in the development of national examinations.


1994 ◽  
Vol 19 (1) ◽  
pp. 73-90 ◽  
Author(s):  
Ronald D. Armstrong ◽  
Douglas H. Jones ◽  
Zhaobo Wang

A network-flow model is formulated for constructing parallel tests based on classical test theory, using test reliability as the criterion. The model enables practitioners to specify a test difficulty distribution for the values of the item difficulties as well as test composition requirements. Use of the network-flow algorithm ensures high computational efficiency, allowing wide application of optimal test construction on microcomputers. The results of an empirical study show that the generated tests have acceptably high test reliability.


2014 ◽  
Vol 35 (4) ◽  
pp. 201-211 ◽  
Author(s):  
André Beauducel ◽  
Anja Leue

It is shown that a minimal assumption should be added to the assumptions of Classical Test Theory (CTT) in order to have positive inter-item correlations, which are regarded as a basis for the aggregation of items. Moreover, it is shown that the assumption of zero correlations between the error score estimates is substantially violated in the population of individuals when the number of items is small. Instead, a negative correlation between error score estimates occurs. The reason for the negative correlation is that the error score estimates for different items of a scale are based on insufficient true score estimates when the number of items is small. A test of the assumption of uncorrelated error score estimates by means of structural equation modeling (SEM) is proposed that takes this effect into account. The SEM-based procedure is demonstrated by means of empirical examples based on the Edinburgh Handedness Inventory and the Eysenck Personality Questionnaire-Revised.

