Cross-Classified Random Effects Modeling for Moderated Item Calibration

2021 ◽  
pp. 107699862098390
Author(s):  
Seungwon Chung ◽  
Li Cai

In the research reported here, we propose a new method for scale alignment and test scoring in the context of supporting students with disabilities. In educational assessment, students from these special populations take modified tests because of a demonstrated disability that requires more assistance than standard testing accommodation. Updated federal education legislation and guidance require that these students be assessed and included in state education accountability systems, and their achievement reported with respect to the same rigorous content and achievement standards that the state adopted. Routine item calibration and linking methods are not feasible because the size of these special populations tends to be small. We develop a unified cross-classified random effects model that utilizes item response data from the general population as well as judge-provided data from subject matter experts in order to obtain revised item parameter estimates for use in scoring modified tests. We extend the Metropolis–Hastings Robbins–Monro algorithm to estimate the parameters of this model. The proposed method is applied to Braille test forms in a large operational multistate English language proficiency assessment program. Our work not only allows a broader range of modifications than is routinely considered in large-scale educational assessments but also directly incorporates the input from subject matter experts who work directly with the students needing support. Their structured and informed feedback deserves more attention from the psychometric community.

2020 ◽  
Vol 50 (175) ◽  
pp. 136-160
Author(s):  
Eric Passone ◽  
Karlane Holanda Araújo

This article deals with the paradox of school inclusion in the basic education evaluation policy of the state of Ceará. Ceará stands out among the country's state education systems for its proficiency indicators in basic education, yet its system contains a normative device that removes the performance of students with disabilities from the evaluation calculation, generating a state of “internal exclusion” within the school system. Drawing on the debate about evaluation policies as a mechanism of educational management in the national context, and on studies that point to the exclusionary trend of large-scale evaluation in relation to inclusive education, the article examines a legal provision that promotes the exclusion of special education from the results of the evaluations of the Permanent Evaluation System of Ceará Basic Education [Sistema Permanente de Avaliação da Educação Básica do Ceará] (Spaece).


2019 ◽  
Vol 45 (4) ◽  
pp. 383-402
Author(s):  
Paul A. Jewsbury ◽  
Peter W. van Rijn

In large-scale educational assessment data consistent with a simple-structure multidimensional item response theory (MIRT) model, where every item measures only one latent variable, separate unidimensional item response theory (UIRT) models for each latent variable are often calibrated for practical reasons. While this approach can be valid for data from a linear test, unacceptable item parameter estimates are obtained when data arise from a multistage test (MST). We explore this situation from a missing data perspective and show mathematically that MST data will be problematic for calibrating multiple UIRT models but not MIRT models. This occurs because some items that were used in the routing decision are excluded from the separate UIRT models, due to measuring a different latent variable. Both simulated and real data from the National Assessment of Educational Progress are used to further confirm and explore the unacceptable item parameter estimates. The theoretical and empirical results confirm that only MIRT models are valid for item calibration of multidimensional MST data.


2019 ◽  
Vol 44 (4) ◽  
pp. 311-326
Author(s):  
Christoph König ◽  
Christian Spoden ◽  
Andreas Frey

Accurate item calibration in models of item response theory (IRT) requires rather large samples. For instance, [Formula: see text] respondents are typically recommended for the two-parameter logistic (2PL) model. Hence, this model is considered a large-scale application, and its use in small-sample contexts is limited. Hierarchical Bayesian approaches are frequently proposed to reduce the sample size requirements of the 2PL. This study compared the small-sample performance of an optimized Bayesian hierarchical 2PL (H2PL) model to its standard inverse Wishart specification, its nonhierarchical counterpart, and both unweighted and weighted least squares estimators (ULSMV and WLSMV) in terms of sampling efficiency and accuracy of estimation of the item parameters and their variance components. To alleviate shortcomings of hierarchical models, the optimized H2PL (a) was reparametrized to simplify the sampling process, (b) a strategy was used to separate item parameter covariances and their variance components, and (c) the variance components were given Cauchy and exponential hyperprior distributions. Results show that when combining these elements in the optimized H2PL, accurate item parameter estimates and trait scores are obtained even in sample sizes as small as [Formula: see text]. This indicates that the 2PL can also be applied to smaller sample sizes encountered in practice. The results of this study are discussed in the context of a recently proposed multiple imputation method to account for item calibration error in trait estimation.
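For readers unfamiliar with the 2PL model named in this abstract, the following is a minimal sketch of its standard item response function and the log-likelihood of one response pattern. The function names, the `(a, b)` tuple layout, and the example item values are illustrative assumptions; this is not the hierarchical Bayesian estimator the study evaluates.

```python
import math


def p_correct_2pl(theta, a, b):
    """Probability of a correct response under the 2PL model.

    theta: respondent's latent trait level
    a:     item discrimination
    b:     item difficulty
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def log_likelihood(theta, responses, items):
    """Log-likelihood of a 0/1 response pattern at a given theta.

    items is a list of (a, b) tuples, one per response
    (a hypothetical layout chosen for this sketch).
    """
    total = 0.0
    for x, (a, b) in zip(responses, items):
        p = p_correct_2pl(theta, a, b)
        total += math.log(p) if x == 1 else math.log(1.0 - p)
    return total


if __name__ == "__main__":
    # Illustrative (made-up) item parameters: (discrimination, difficulty)
    items = [(1.2, -0.5), (0.8, 0.0), (1.5, 1.0)]
    print(p_correct_2pl(0.0, 1.0, 0.0))
    print(log_likelihood(0.3, [1, 1, 0], items))
```

Calibration estimates `a` and `b` for every item from many such response patterns, which is why small samples make the estimates unstable and why the hierarchical priors discussed in the abstract are of interest.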


2011 ◽  
pp. 317-329
Author(s):  
Judith A. Russo-Converso ◽  
Ronald D. Offutt

The evolution of complex and distributed commerce requires the implementation of training design and development models that capture and mold the expertise of subject matter experts (SMEs). A SME is defined as “that individual who exhibits the highest level of expertise in performing a specialized job, task, or skill within the organization”. SMEs possess in-depth knowledge of the subject you are attempting to document (http://www.isixsigma.com/dictionary/Subject_Matter_Expert_-_SME-396.htm). This chapter describes a unique issue, and potential risk, along with a solution to work with a large number of geographically dispersed SMEs (separated from one another due to their respective locations), whose efforts are standardized and synchronized. This solution is based on a collaboration model implemented and led by an integration team whose role and responsibility is to allow the SMEs to achieve consensus, efficiency, and standard of quality in both products and processes.


Author(s):  
Barbara Kuenzle Haake ◽  
Yan Xiao ◽  
Colin Mackenzie ◽  
F. Jacob Seagull ◽  
Thomas Grissom ◽  
...  

Teamwork training is critical for patient safety and has been advocated for widespread application in many settings. A key challenge for evaluating teamwork training is measurement. Despite much effort, the team performance instruments reported thus far suffer from a variety of shortcomings that prevent their wide application in assessing teams in real settings. Based on a review of video-recorded trauma team activities in real patient care, a multi-disciplinary research team developed an instrument based on observable behaviors (UMTOP). A set of video clips was reviewed by 6 subject matter experts who were requested to provide “descriptors” about the observed team activities. The 167 collated descriptors were combined into a reduced list, which was then sent to the subject matter experts for revision. The revised list was then categorized into 5 areas of team performance (task and clinical performance, leadership organization, teamwork organization, social environment, sterile precaution). UMTOP was developed to be a tradeoff among four criteria: ease of use, reliability, usefulness for team performance feedback, and speed of scoring. An initial assessment of reliability was conducted with surgeon and nursing reviewers.


2002 ◽  
Vol 16 (1) ◽  
pp. 6-8 ◽  
Author(s):  
Sebastian Ciancio

Powered toothbrushes were first introduced on a large scale in the early 1960s. However, because of a clear lack of superiority compared with manual brushes, and problems with mechanical breakdowns, their sales decreased significantly. Nevertheless, recommendation for their use continued in special populations with dexterity and cognition problems. The 1990s ushered in an era of new technology, and studies began to suggest superiority of some powered brushes, particularly those using oscillating-rotating or counter-rotational actions. Some studies have shown interproximal cleansing abilities superior to those of manual brushes and yielding results similar to those achieved with the use of a manual brush and floss. Both controlled and open-labeled studies have suggested that electric brushes improve gingival health in patients who routinely used manual brushes prior to using these new powered brushes, and safety has been clearly established. In recommending powered toothbrushes, practitioners should familiarize themselves with the products available, with the clinical studies supporting their benefits compared with manual brushes, their safety and ease of use, and the patient's economic status.


1998 ◽  
Vol 64 (4) ◽  
pp. 439-450 ◽  
Author(s):  
Gerald Tindal ◽  
Bill Heath ◽  
Keith Hollenbeck ◽  
Patricia Almond ◽  
Mark Harniss

In this study, fourth-grade special and general education students took a large-scale state-wide test using standard test administration procedures and two major accommodations addressing response conditions and test administration. On both reading and math tests, students bubbled in answers on a separate sheet (the standard condition) for half the test and marked the test booklet directly (the accommodated condition) for the other half of the test. For a subgroup of students, the math test was read to them by a trained teacher. Although no differences were found in the response conditions, an interaction was found in the test administration conditions (orally reading the test), supporting this accommodation for students with disabilities.

