Classical Test Theory and Music Testing

Author(s):  
James Austin

This chapter focuses on classical test theory, including its origins within psychological measurement, the fundamental principles of true scores and measurement error, and its psychometric and statistical assumptions. Random and systematic forms of measurement error are addressed, and the standard error of measurement is defined. Major approaches to defining and estimating test reliability and validity are reviewed, and practical applications of classical test theory to K-12 music education assessment are considered, including large-scale standardized testing as well as measurement levels, item analysis, and techniques for enhancing the reliability and validity of classroom-level assessments. Finally, the transition from classical test theory to modern test theory is explored.
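
For reference, the core quantities the chapter defines can be stated compactly. These are standard CTT identities rather than notation specific to the chapter: the observed score X decomposes into a true score T and an error E, reliability is the true-score share of observed variance, and the standard error of measurement follows from both.

```latex
X = T + E, \qquad \sigma_X^2 = \sigma_T^2 + \sigma_E^2, \qquad
\rho_{XX'} = \frac{\sigma_T^2}{\sigma_X^2}, \qquad
\mathrm{SEM} = \sigma_X \sqrt{1 - \rho_{XX'}}
```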

2020 ◽  
Vol 64 (3) ◽  
pp. 219-237
Author(s):  
Brandon LeBeau ◽  
Susan G. Assouline ◽  
Duhita Mahatmya ◽  
Ann Lupkowski-Shoplik

This study investigated the application of item response theory (IRT) to expand the range of ability estimates for gifted (hereinafter referred to as high-achieving) students' performance on an above-level test. Using a sample of fourth- to sixth-grade high-achieving students (N = 1,893), we compared estimates from two measurement theories, classical test theory (CTT) and IRT. CTT and IRT make different assumptions that affect the reliability and validity of the scores obtained from the test. IRT can also differentiate students across grades or within a grade by using the unique string of correct and incorrect answers each student produces while taking the test. This differentiation may have implications for identifying or classifying students who are ready for advanced coursework. We explore this differentiation for Math, Reading, and Science tests, along with the impact the different measurement frameworks can have on the classification of students. Implications for academic talent identification with the talent search model and for the development of academic talent are discussed.
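
The differentiation described above rests on pattern scoring: under IRT, two students with the same number-correct score can receive different ability estimates if their answer strings differ. A minimal sketch of maximum-likelihood ability estimation under a 2PL model, with hypothetical item parameters (not the study's calibrated values):

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Hypothetical 2PL item parameters (discrimination a, difficulty b);
# in practice these are calibrated from the full response matrix.
a = np.array([1.2, 0.8, 1.5, 1.0, 0.9])
b = np.array([-1.0, 0.0, 0.5, 1.0, 1.5])
responses = np.array([1, 1, 1, 0, 0])  # one student's correct/incorrect string

def neg_log_likelihood(theta):
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))  # 2PL response probabilities
    return -np.sum(responses * np.log(p) + (1 - responses) * np.log(1 - p))

result = minimize_scalar(neg_log_likelihood, bounds=(-4, 4), method="bounded")
print(f"ML ability estimate: theta = {result.x:.2f}")
```

Two students answering three of five items correctly, but on different items, would yield different likelihood surfaces and hence different theta estimates, which is the differentiation the abstract refers to.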


2019 ◽  
Vol 33 (2) ◽  
pp. 151-163 ◽  
Author(s):  
Igor Himelfarb

Objective: This article presents health science educators and researchers with an overview of standardized testing in educational measurement. The history and theoretical frameworks of classical test theory and item response theory (IRT), along with the IRT models most commonly used in modern testing, are presented. Methods: A narrative overview of the history, theoretical concepts, test theory, and IRT is provided to familiarize the reader with these concepts of modern testing. Examples of data analyses using different models are shown using two simulated data sets. One set consisted of a sample of 2,000 item responses to 40 multiple-choice, dichotomously scored items; it was used to fit 1-parameter logistic (1PL), 2PL, and 3PL IRT models. The other data set was a sample of 1,500 item responses to 10 polytomously scored items and was used to fit a graded response model. Results: Model-based item parameter estimates for the 1PL, 2PL, 3PL, and graded response models are presented, evaluated, and explained. Conclusion: This study provides health science educators and education researchers with an introduction to educational measurement. The history of standardized testing, the frameworks of classical test theory and IRT, and the logic of scaling and equating are presented. This introductory article will aid readers in understanding these concepts.
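
For orientation, the three dichotomous models mentioned above are nested special cases of the standard 3PL response function, which gives the probability of a correct response to item i at ability level theta:

```latex
P_i(\theta) = c_i + (1 - c_i)\,\frac{1}{1 + e^{-a_i(\theta - b_i)}}
```

Here a_i is item discrimination, b_i is item difficulty, and c_i is the pseudo-guessing lower asymptote. The 2PL model fixes c_i = 0, and the 1PL model additionally constrains all items to a common discrimination.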


2019 ◽  
Author(s):  
Robert Foster

This paper develops a generalized framework that allows for the use of parametric classical test theory inference with non-normal models. Using the theory of natural exponential families and the Bayesian theory of their conjugate priors, theoretical properties of test scores under the framework are derived, including a formula for parallel-test reliability in terms of the test length and a parameter of the underlying population distribution of abilities. The framework is shown to satisfy the general properties of classical test theory, and several common classical test theory results are shown to reduce to the parallel-test reliability formula within it. An empirical Bayes method for estimating reliability, with both point estimates and interval estimates, is described using maximum likelihood. The method is applied to an example data set and compared to classical test theory estimators of reliability, and a simulation study is performed to show the coverage of the interval estimates of reliability derived from the framework.
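
The classical backdrop for a length-dependent reliability formula is the Spearman-Brown relation, a standard CTT identity (not the paper's generalized result), which expresses the reliability of a test built from n parallel components in terms of the single-component reliability rho_1:

```latex
\rho_n = \frac{n\,\rho_1}{1 + (n - 1)\,\rho_1}
```

The paper's contribution, as described above, is an analogous expression in which the role of rho_1 is played by a parameter of the (possibly non-normal) population distribution of abilities.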


2017 ◽  
Vol 2 (1) ◽  
pp. 34
Author(s):  
Rida Sarwiningsih

This research aims to compare internal consistency reliability coefficients under classical test theory. The estimation accuracy of the internal consistency reliability coefficient was examined using several reliability formulations: the split-half method, the Cronbach alpha formula, and the Kuder-Richardson formulas. Test reliability coefficients were computed with each formula, and the results were compared in terms of estimation accuracy. This is a quantitative descriptive study. Data were responses to the national chemistry examination in Jambi province in the 2014/2015 academic year. Students' answer sheets were sampled using a proportional stratified random sampling technique, yielding 200 students' responses from 162 schools (132 public schools and 30 private schools) in Jambi province. The data were dichotomous and were analyzed with the split-half method, the Cronbach alpha formula, and the Kuder-Richardson formulas. Five reliability criteria were used: 0.5, 0.6, 0.7, 0.8, and 0.9. The results indicated that (a) the classical test theory reliability coefficients developed by measurement experts (the split-half method, the Cronbach alpha formula, and the Kuder-Richardson formulas) have varying estimation accuracy; (b) the average reliability coefficients were approximately 0.78 to 0.8; and (c) the reliability coefficient was 0.78 with the Spearman-Brown formula, 0.78 with the Rulon formula, 0.77 with the Flanagan formula, 0.838 with the Cronbach alpha formula, 0.838 with the KR-20 formula, and 0.821 with the KR-21 formula.
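
A minimal sketch of three of the estimators compared above, computed on a simulated dichotomous response matrix. The data here are random placeholders generated under a common ability factor, not the Jambi examination responses, so the numeric output will not match the study's coefficients:

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder data: each student's success probability rises with a common
# ability factor, so items share variance and reliability is positive.
ability = rng.normal(size=(200, 1))                   # 200 students
difficulty = rng.normal(size=(1, 40))                 # 40 items
prob = 1.0 / (1.0 + np.exp(-(ability - difficulty)))
X = (rng.random((200, 40)) < prob).astype(int)

def cronbach_alpha(X):
    k = X.shape[1]
    item_vars = X.var(axis=0, ddof=1)
    total_var = X.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

def kr20(X):
    # KR-20 uses p*(1-p), the population-form item variance for 0/1 items.
    k = X.shape[1]
    p = X.mean(axis=0)
    total_var = X.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - (p * (1 - p)).sum() / total_var)

def split_half_spearman_brown(X):
    odd, even = X[:, 0::2].sum(axis=1), X[:, 1::2].sum(axis=1)
    r = np.corrcoef(odd, even)[0, 1]   # correlation between the half-tests
    return 2 * r / (1 + r)             # Spearman-Brown step-up correction

print(cronbach_alpha(X), kr20(X), split_half_spearman_brown(X))
```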


Methodology ◽  
2011 ◽  
Vol 7 (3) ◽  
pp. 103-110 ◽  
Author(s):  
José Muñiz ◽  
Fernando Menéndez

The current availability of computers has led to the use of a new series of response formats that are an alternative to the classical dichotomous format, and to the recovery of other formats, such as the answer-until-correct (AUC) format, whose efficient administration requires this kind of technology. The goal of the present study is to determine whether the use of the AUC format improves test reliability and validity in comparison to the classical dichotomous format. Three samples of 174, 431, and 1,446 Spanish students from secondary education, professional training, and high school, aged between 13 and 20 years, were used. A 100-item test and a 25-item test assessing knowledge of Universal History were administered over the Internet with the AUC format. There were 56 experimental conditions, resulting from the manipulation of eight scoring models and seven test lengths. The data were analyzed from the perspective of classical test theory and also with item response theory (IRT) models. Reliability and construct validity, analyzed from the classical perspective, did not seem to improve significantly when using the AUC format; however, when reliability was assessed with the information function obtained by means of IRT models, the advantages of the AUC format over the dichotomous format became clear. For low levels of the assessed trait, scores obtained with the AUC format provide more information than scores obtained with the dichotomous format. Lastly, these results are discussed, and the possibilities and limits of the AUC format in highly computerized psychological and educational contexts are analyzed.
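
The IRT-based comparison turns on the test information function. For a 2PL item, item information at trait level theta is a standard result, and information aggregates across items to give the conditional precision of the ability estimate:

```latex
I_i(\theta) = a_i^2\, P_i(\theta)\,\bigl(1 - P_i(\theta)\bigr), \qquad
\mathrm{SE}(\hat{\theta}) = \frac{1}{\sqrt{\sum_i I_i(\theta)}}
```

This is why the study can report format advantages that are local to low trait levels: unlike a single classical reliability coefficient, information is a function of theta, so one format can dominate in one region of the trait continuum without dominating overall.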


Author(s):  
David L. Streiner

This chapter discusses the two major theories underlying scale development: classical test theory, which has dominated the field for the past century, and item response theory, which is more recent. It begins by summarizing the history of measurement, first of physical and physiological parameters and later of intelligence. This is followed by the steps involved in developing a scale: creating the items, determining whether they fully span the construct of interest without including irrelevant content, and assessing the usability of the items (whether they are understood correctly, whether they are free of jargon, whether they avoid negatively worded phrases, etc.). The chapter then describes how to establish the reliability and validity of the scale—what are called its psychometric properties. It concludes by discussing some of the shortcomings of classical test theory, how item response theory attempts to address them, and the degree to which it has been successful in this regard. This chapter should be useful for those who need to evaluate existing scales as well as for those wanting to develop new scales.


Author(s):  
Zainab Albikawi ◽  
Mohammad Abuadas

Background: Providing care for schizophrenia patients is complex and involves various psychosocial burdens. Aim: To develop and validate a tool that measures the quality of life and self-stigma (SS) of caregivers of schizophrenia patients (QLSSoSPC). Setting: Outpatient psychiatric services clinics in Saudi Arabia. Methods: The study used a methodological cross-sectional design. A sample of 205 caregivers of schizophrenia patients was recruited using a convenience sampling method. Classical test theory and Rasch analysis approaches were used. Results: The developed tool demonstrated an acceptable level of reliability and validity. The analysis confirmed a seven-factor structure that accounted for 74.4% of the total variance. Cronbach's reliability statistics for the developed tool were satisfactory, ranging from 0.80 to 0.91. Conclusion: The psychometric properties of the QLSSoSPC tool support its prospective use and allow us to recommend the tool for clinical and research purposes.
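
A factor structure of the kind reported above is typically examined with exploratory factor analysis. A minimal sketch using scikit-learn on placeholder Likert-type data (the item count and responses are hypothetical, so random data will not reproduce the reported seven-factor solution or its 74.4% of variance):

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.preprocessing import StandardScaler

# Hypothetical Likert-type responses: rows = 205 caregivers, columns = items.
rng = np.random.default_rng(1)
data = rng.integers(1, 6, size=(205, 28)).astype(float)

Z = StandardScaler().fit_transform(data)
fa = FactorAnalysis(n_components=7, random_state=0).fit(Z)

# Proportion of total standardized variance attributable to each factor,
# computed as the sum of squared loadings divided by the number of items.
ssl = (fa.components_ ** 2).sum(axis=1)
print(ssl / Z.shape[1])
```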

