Teacher’s Corner: Toward a Coherent View of Reliability in Test Theory

Reliability of test scores, as estimated through measures of internal consistency, has been characterized mathematically in many ways that appear, on the surface at least, to be very dissimilar to one another. In this essay we provide a general mathematical framework that specializes to four different reliability coefficients. Through consideration of this general framework it becomes easier to convey to students both the individual character of the different formulations of reliability and the extent of their underlying similarity. In addition to providing a coherent view of reliability, the unified formula is also found to be a convenient vehicle for introducing more specialized topics, such as the Kaiser-Guttman rule.

Using Generalizability Theory and the ERP Reliability Analysis (ERA) Toolbox for Assessing Test-Retest Reliability of ERP Scores Part 1: Algorithms, Framework, and Implementation

10.31234/osf.io/kcven ◽

2020 ◽

Author(s):

Peter E Clayson ◽

Kaylie Amanda Carbine ◽

Scott Baldwin ◽

Joseph A. Olsen ◽

Michael J. Larson

Keyword(s):

Reliability Analysis ◽

Internal Consistency ◽

Temporal Stability ◽

Test Theory ◽

Theory Approach ◽

Retest Reliability ◽

Score Reliability ◽

Test Retest Reliability ◽

The Impact ◽

Reliability Coefficients

The reliability of event-related brain potential (ERP) scores depends on study context and how those scores will be used, and reliability must be routinely evaluated. Many factors can influence ERP score reliability, and generalizability (G) theory provides a multifaceted approach to estimating the internal consistency and temporal stability of scores that is well suited for ERPs. G-theory’s approach possesses a number of advantages over classical test theory that make it ideal for pinpointing sources of error in scores. The current primer outlines the G-theory approach to estimating internal consistency (coefficients of equivalence) and test-retest reliability (coefficients of stability). This approach is used to evaluate the reliability of ERP measurements. The primer outlines how to estimate reliability coefficients that consider the impact of the number of trials, events, occasion, and group. The uses of two different G-theory reliability coefficients (i.e., generalizability and dependability) in ERP research are elaborated, and a dataset from the companion manuscript, which examines N2 amplitudes to Go/NoGo stimuli, is used as an example of the application of these coefficients to ERPs. The developed algorithms are implemented in the ERP Reliability Analysis (ERA) Toolbox, which is open-source software designed for estimating score reliability using G theory. The toolbox facilitates the application of G theory in an effort to simplify the study-by-study evaluation of ERP score reliability. The formulas provided in this primer should enable researchers to pinpoint the sources of measurement error in ERP scores from multiple recording sessions and subsequently plan studies that optimize score reliability.
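
The two G-theory coefficients named above can be illustrated with a minimal sketch. It assumes the textbook single-facet persons x trials design, not the estimation procedure used by the ERA Toolbox, which this abstract does not describe. The function names and the variance-component values are hypothetical and chosen only for illustration.

```python
def generalizability(var_person, var_residual, n_trials):
    """Relative coefficient: only the person-by-trial residual counts as error."""
    return var_person / (var_person + var_residual / n_trials)


def dependability(var_person, var_trial, var_residual, n_trials):
    """Absolute coefficient: the trial main effect also counts as error."""
    absolute_error = (var_trial + var_residual) / n_trials
    return var_person / (var_person + absolute_error)


# Hypothetical variance components (assumed values, not taken from the study).
VAR_PERSON, VAR_TRIAL, VAR_RESIDUAL = 4.0, 1.0, 16.0

for n in (8, 16, 32):
    g = generalizability(VAR_PERSON, VAR_RESIDUAL, n)
    d = dependability(VAR_PERSON, VAR_TRIAL, VAR_RESIDUAL, n)
    print(f"trials={n:2d}  generalizability={g:.3f}  dependability={d:.3f}")
```

Under these assumptions both coefficients rise as trials are added, and dependability never exceeds generalizability because its error term includes everything in the relative error plus the trial variance.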

The Comparison Accuracy Estimation of Test Reliability Coefficients for National Chemistry Examination in Jambi Province on Academic Year 2014/2015

JKPK (Jurnal Kimia dan Pendidikan Kimia) ◽

10.20961/jkpk.v2i1.8740 ◽

2017 ◽

Vol 2 (1) ◽

pp. 34

Author(s):

Rida Sarwiningsih

Keyword(s):

Internal Consistency ◽

Classical Test Theory ◽

Reliability Coefficient ◽

Test Theory ◽

Test Reliability ◽

Estimation Accuracy ◽

Cronbach Alpha ◽

Classical Test ◽

Academic Year ◽

Reliability Coefficients

This research compares the estimation accuracy of internal-consistency reliability coefficients in classical test theory. Internal-consistency reliability was estimated with several formulations: the split-half method, the Cronbach alpha formula, and the Kuder-Richardson formulas. The test reliability coefficients were computed with each formula, and the results were then compared with respect to their estimation accuracy. This is a quantitative descriptive study. The data were responses to the national chemistry examination in Jambi province in the 2014/2015 academic year. Student answer sheets were sampled using a proportional stratified random sampling technique, yielding 200 students' responses from 162 schools (132 public schools and 30 private schools) in Jambi province. The data were dichotomous and were analyzed using the split-half method; reliabilities were also analyzed using the Cronbach alpha formula and the Kuder-Richardson formulas. Five reliability criteria were used: 0.5, 0.6, 0.7, 0.8, and 0.9. The results indicated that (a) the reliability coefficients developed by measurement experts in classical test theory (the split-half method, the Cronbach alpha formula, and the Kuder-Richardson formulas) vary in estimation accuracy; (b) the average reliability coefficients have an estimation precision of about 0.78 to 0.8; and (c) the reliability coefficient was 0.78 using the Spearman-Brown formula, 0.78 using the Rulon formula, 0.77 using the Flanagan formula, 0.838 using the Cronbach alpha formula, 0.838 using the KR20 formula, and 0.821 using the KR21 formula.
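
As a rough illustration of several of the formulas named in this abstract, the sketch below computes Cronbach's alpha, KR-20, KR-21, and the Spearman-Brown step-up for dichotomous item data. It uses the standard textbook definitions with population variances. The response matrix and function names are hypothetical and are not the study's data or code.

```python
from statistics import mean, pvariance


def cronbach_alpha(scores):
    """Alpha from item variances and total-score variance (rows = examinees)."""
    k = len(scores[0])
    item_vars = [pvariance([row[i] for row in scores]) for i in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def kr20(scores):
    """KR-20 for 0/1 items: item variance is p * (1 - p)."""
    k = len(scores[0])
    p = [mean([row[i] for row in scores]) for i in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(pi * (1 - pi) for pi in p) / total_var)


def kr21(scores):
    """KR-21: like KR-20 but assumes all items are equally difficult."""
    k = len(scores[0])
    totals = [sum(row) for row in scores]
    m = mean(totals)
    return k / (k - 1) * (1 - m * (k - m) / (k * pvariance(totals)))


def spearman_brown(r_half):
    """Step a half-test correlation up to full-test reliability."""
    return 2 * r_half / (1 + r_half)


# Hypothetical responses: 6 examinees x 4 dichotomous items.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 0],
]

print("alpha:", round(cronbach_alpha(responses), 3))
print("KR-20:", round(kr20(responses), 3))
print("KR-21:", round(kr21(responses), 3))
```

For dichotomous items, alpha and KR-20 coincide when the same variance convention is used, which is consistent with the identical values (0.838) reported above. KR-21 is never larger than KR-20 because it ignores differences in item difficulty.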

Implications of Person Fluctuation for the Stability and Validity of Test Scores

Methodology ◽

10.1027/1614-2241.2.4.142 ◽

2006 ◽

Vol 2 (4) ◽

pp. 142-148 ◽

Cited By ~ 1

Author(s):

Pere J. Ferrando

Keyword(s):

Item Response ◽

Test Scores ◽

Temporal Stability ◽

Person Fit ◽

Item Parameters ◽

Individual Trait ◽

The Stability ◽

The Individual ◽

Different Levels ◽

Fluctuation Model

In the IRT person-fluctuation model, the individual trait levels fluctuate within a single test administration whereas the items have fixed locations. This article studies the relations between the person and item parameters of this model and two central properties of item and test scores: temporal stability and external validity. For temporal stability, formulas are derived for predicting and interpreting item response changes in a test-retest situation on the basis of the individual fluctuations. As for validity, formulas are derived for obtaining disattenuated estimates and for predicting changes in validity in groups with different levels of fluctuation. These latter formulas are related to previous research in the person-fit domain. The results obtained and the relations discussed are illustrated with an empirical example.

Development and destruction of the liberal prison system in Spain: a general framework for studying the topic

História Unicap ◽

10.25247/hu.2018.v5n10.p424-439 ◽

2019 ◽

Vol 5 (10) ◽

pp. 424

Author(s):

Luis Gargallo Vaamonde

Keyword(s):

Civil War ◽

20Th Century ◽

General Framework ◽

Belief System ◽

Prison System ◽

Early Years ◽

Second Republic ◽

Social Harmony ◽

The Government ◽

The Individual

During the Restoration and the Second Republic, up until the outbreak of the Civil War, the prison system that was developed in Spain had a markedly liberal character. This system had begun to acquire robustness and institutional credibility from the first decade of the 20th Century onwards, reaching a peak in the early years of the government of the Second Republic. This process resulted in the establishment of a penitentiary system based on the widespread and predominant values of liberalism. That liberal belief system espoused the defence of social harmony, property and the individual, and penal practices were constructed on the basis of those principles. Subsequently, the Civil War and the accompanying militarist culture altered the prison system, transforming it into an instrument at the service of the conflict, thereby wiping out the liberal agenda that had been nurtured since the mid-19th Century.

Item Response Theory and Music Testing

The Oxford Handbook of Assessment Policy and Practice in Music Education, Volume 1 ◽

10.1093/oxfordhb/9780190248093.013.22 ◽

2019 ◽

pp. 477-503

Author(s):

Brian Wesolowski

Keyword(s):

Item Response Theory ◽

Item Response ◽

Test Scores ◽

General Framework ◽

Logistic Function ◽

Response Theory ◽

Measurement Models ◽

Latent Constructs ◽

Item Parameters ◽

Introductory Overview

This chapter presents an introductory overview of concepts that underscore the general framework of item response theory. “Item response theory” is a broad umbrella term used to describe a family of mathematical measurement models that consider observed test scores to be a function of latent, unobservable constructs. Most musical constructs cannot be directly measured and are therefore unobservable. Musical constructs can therefore only be inferred based on secondary, observable behaviors. Item response theory uses observable behaviors as probabilistic distributions of responses as a logistic function of person and item parameters in order to define latent constructs. This chapter describes philosophical, theoretical, and applied perspectives of item response theory in the context of measuring musical behaviors.
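
The idea of a response probability as a logistic function of person and item parameters can be sketched with a two-parameter logistic (2PL) item response function. The choice of the 2PL model, the function name, and the parameter values are assumptions made for illustration; the chapter abstract does not single out a specific model.

```python
import math


def p_correct_2pl(theta, a, b):
    """Probability of a correct response under a 2PL model.

    theta: person ability (latent trait level)
    a:     item discrimination (slope)
    b:     item difficulty (location)
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


# Hypothetical item with discrimination 1.2 and difficulty 0.0.
for theta in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(f"theta={theta:+.1f}  P(correct)={p_correct_2pl(theta, 1.2, 0.0):.3f}")
```

When a person's ability equals the item difficulty, the modeled probability is exactly 0.5, and it increases with ability for any positive discrimination.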

Material Geometry of Binary Composites

Symmetry ◽

10.3390/sym13050892 ◽

2021 ◽

Vol 13 (5) ◽

pp. 892

Author(s):

Marcelo Epstein

Keyword(s):

Continuum Mechanics ◽

Mathematical Framework ◽

Double Groupoid ◽

The Individual ◽

Elastic Composites

The constitutive characterization of the uniformity and homogeneity of binary elastic composites is presented in terms of a combination of the material groupoids of the individual constituents. The incorporation of these two groupoids within a single double groupoid is proposed as a viable mathematical framework for a unified formulation of this and similar kinds of problems in continuum mechanics.

The comparison of the scores obtained by Bayesian nonparametric model and classical test theory methods

Science Progress ◽

10.1177/00368504211028371 ◽

2021 ◽

Vol 104 (3) ◽

pp. 003685042110283

Author(s):

Meltem Yurtcu ◽

Hülya Kelecioglu ◽

Edward L Boone

Keyword(s):

Classical Test Theory ◽

Small Sample ◽

Test Theory ◽

Nonparametric Model ◽

Bayesian Nonparametric ◽

Test Equating ◽

Classical Test ◽

A Value ◽

Item Functioning ◽

The Individual

Bayesian nonparametric (BNP) modelling can be used to obtain more detailed information in test equating studies and to increase the accuracy of equating by accounting for covariates. In this study, two covariates, one continuous and one discrete, are included in the equating under the BNP model. Equated scores were obtained with this model for a single-group design with a small group. These equated scores were compared with those from the mean and linear equating methods of Classical Test Theory. Across the three methods, the equated scores obtained with the BNP model produced a distribution closer to that of the target test. Even the classical methods will give a good result with the smallest error when using a small sample, making equating studies valuable. Including covariates in the classical test equating process rests on some assumptions and cannot be achieved, especially with small groups. The BNP model will be more beneficial than frequentist methods, regardless of this limitation. Information about booklets and variables can be obtained from the distributors and from the equated scores obtained with the BNP model. This makes it possible to compare sub-categories, which can indicate the presence of differential item functioning (DIF). Therefore, the BNP model can be used actively in test equating studies, and it also provides an opportunity to examine the characteristics of the individual participants. Thus, it allows test equating even in a small sample and offers the opportunity to reach a value closer to the scores in the target test.
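
For orientation, the two classical methods used as comparisons here, mean equating and linear equating, can be sketched as follows for a single-group design. This uses the standard textbook formulas, not the BNP model or the study's code. The score lists and function names are hypothetical.

```python
from statistics import mean, pstdev


def mean_equate(x, form_x, form_y):
    """Shift a form-X score by the difference in form means."""
    return x - mean(form_x) + mean(form_y)


def linear_equate(x, form_x, form_y):
    """Match both the mean and the standard deviation of form Y."""
    slope = pstdev(form_y) / pstdev(form_x)
    return slope * (x - mean(form_x)) + mean(form_y)


# Hypothetical scores from the same small group on two test forms.
scores_x = [10, 12, 14, 16, 18]
scores_y = [12, 15, 18, 21, 24]

for x in (12, 14, 16):
    print(x, mean_equate(x, scores_x, scores_y),
          round(linear_equate(x, scores_x, scores_y), 2))
```

Mean equating assumes the forms differ only by a constant, while linear equating also allows the spread of scores to differ. Neither uses covariates, which is the gap the BNP approach described above is meant to address.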

Multimodal Transport in the Context of Sustainable Development of a City

Sustainability ◽

10.3390/su13042239 ◽

2021 ◽

Vol 13 (4) ◽

pp. 2239

Author(s):

Marzena Kramarz ◽

Edyta Przybylska

Keyword(s):

Sustainable Development ◽

Freight Transport ◽

Key Factors ◽

The Sustainable Development ◽

Individual Character ◽

Third Stage ◽

Multimodal Transport ◽

Research Questions ◽

The Individual ◽

The Impact

Multimodal freight transport in cities is a complex, valid, and vitally important problem. It is seldom highlighted in scientific studies or included in cities' strategies, which devote more attention to passenger transport than to freight transport. The increased utilization of multimodal transport matches current transport policy and, at the same time, is one of the most important challenges facing cities that strive to achieve sustainable development. Accordingly, the paper addresses the relations between multimodal transport development and the sustainable development of cities. The objective of the paper is to analyze the impact of the selected city of the Upper Silesian metropolis on the development of multimodal freight transport and to assess the impact of the development of multimodal transport on the sustainable development of the cities of the Upper Silesian metropolis. The authors developed three research questions in order to pursue this objective. The process of answering them comprised four stages. In the first and second stages, literature studies and expert research identified the key factors of multimodal transport development that a city may influence. In the third stage, the research was two-fold and was based on a questionnaire and scenario analysis. Due to the individual character of each of the cities, scenarios were developed for Katowice, the main economic center of the Upper Silesian and Zagłębie Metropolis. As a result of the research, factors were identified that must be included in the strategy of a city that strives for sustainable development. The last stage of the research focused on an initial concept for assessing the impact of multimodal transport development on the sustainable development of the cities. Conclusions reached at the individual stages allowed the research questions to be answered.

Bringing the Automation-Related Complacency Scale into the 21st Century

Proceedings of the Human Factors and Ergonomics Society Annual Meeting ◽

10.1177/1071181320641292 ◽

2020 ◽

Vol 64 (1) ◽

pp. 1228-1232

Keyword(s):

Internal Consistency ◽

21St Century ◽

Rating Scale ◽

System Failure ◽

Individual Values ◽

Cronbach's Alpha ◽

Important Measure ◽

Frequency Of Use ◽

Performance Error ◽

The Individual

Complacency potential is an important measure to avoid performance error, such as neglecting to detect a system failure. This study updates and expands upon Singh, Molloy, and Parasuraman’s 1993 Complacency-Potential Rating Scale (CPRS). We updated and expanded the CPRS questions to include technology commonly used today and how frequently the technology is used. The goal of our study was to update the scale, analyze for factor shifts and internal consistency, and to explore correlations between the individual values for each factor and the frequency of use questions. We hypothesized that the factors would not shift from the original and the revised CPRS’s four subscales. Our research found that the revised CPRS consisted of only three subscales with the following Cronbach’s Alpha values: Confidence: 0.599, Safety/Reliability: 0.534, and Trust: 0.201. Correlations between the subscales and the revised complacency-potential and the frequency of use questions are also discussed.

The Individual Recovery Outcomes Counter: preliminary validation of a personal recovery measure

The Psychiatrist ◽

10.1192/pb.bp.112.041889 ◽

2013 ◽

Vol 37 (7) ◽

pp. 221-227 ◽

Cited By ~ 9

Author(s):

Bridey Monger ◽

Scott M. Hardie ◽

Robin Ion ◽

Jane Cumming ◽

Nigel Henderson

Keyword(s):

Internal Consistency ◽

Recovery Factor ◽

Clinical Settings ◽

Service Users ◽

Validity And Reliability ◽

Personal Recovery ◽

Preliminary Validation ◽

Service Outcomes ◽

High Internal Consistency ◽

The Individual

Aims and method: The Individual Recovery Outcomes Counter (I.ROC) is to date the only recovery outcomes instrument developed in Scotland. This paper describes the steps taken to initially assess its validity and reliability, including factorial analysis, internal consistency and a correlation benchmarking analysis. Results: The I.ROC tool showed high internal consistency. Exploratory factor analysis indicated a two-factor structure comprising intrapersonal recovery (factor 1) and interpersonal recovery (factor 2), explaining between them over 50% of the variance in I.ROC scores. There were no redundant items and all loaded on at least one of the factors. The I.ROC significantly correlated with widely used existing instruments assessing both personal recovery and clinical outcomes. Clinical implications: I.ROC is a valid and reliable measure of recovery in mental health, preferred by service users when compared with well-established instruments. It could be used in clinical settings to map individual recovery, providing feedback for service users and helping to assess service outcomes.
