Robust and Nonrobust Linking of Two Groups for the Rasch Model with Balanced and Unbalanced Random DIF: A Comparative Simulation Study and the Simultaneous Assessment of Standard Errors and Linking Errors with Resampling Techniques

Symmetry ◽  
2021 ◽  
Vol 13 (11) ◽  
pp. 2198
Author(s):  
Alexander Robitzsch

In this article, the Rasch model is used for assessing a mean difference between two groups for a test of dichotomous items. It is assumed that random differential item functioning (DIF) exists that can bias group differences. The case of balanced DIF is distinguished from the case of unbalanced DIF. In balanced DIF, DIF effects cancel out on average. In contrast, in unbalanced DIF, the expected value of DIF effects can differ from zero and, on average, favor a particular group. Robust linking methods (e.g., invariance alignment) aim at determining group mean differences that are robust to the presence of DIF. In contrast, group differences obtained from nonrobust linking methods (e.g., Haebara linking) can be affected by the presence of even a few DIF effects. Alternative robust and nonrobust linking methods are compared in a simulation study under various conditions. The results show that robust linking methods are preferable to nonrobust alternatives in the case of unbalanced DIF effects. Moreover, the theory of M-estimation, an important approach to robust statistical estimation suited to data with asymmetric errors, is used to study the asymptotic behavior of linking estimators as the number of items tends to infinity. These results give insights into the asymptotic bias and into the estimation of linking errors, which represent the variability in estimates due to the selection of items in a test. M-estimation is also used in an analytical treatment to assess standard errors and linking errors simultaneously. Finally, double jackknife and double half sampling methods are introduced and evaluated in a simulation study for assessing standard errors and linking errors simultaneously. Half sampling outperformed jackknife estimators for assessing the variability of estimates from robust linking methods.
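The contrast between nonrobust and robust linking can be illustrated with a minimal simulation. The Python sketch below, with purely illustrative parameter values, estimates the group mean difference from item-wise difficulty shifts, using the mean (a nonrobust choice in the spirit of mean-mean linking; the article studies Haebara linking) versus the median (a simple robust alternative standing in for the article's invariance alignment). Under unbalanced DIF, the mean estimate is biased, while the median stays close to the true difference.

import numpy as np

rng = np.random.default_rng(1)
n_items = 20
b = rng.normal(0.0, 1.0, n_items)     # common item difficulties (illustrative)
mu = 0.5                              # true group mean difference

# Random DIF effects: balanced (zero mean) vs. unbalanced (a few items
# with one-sided DIF, so the DIF mean differs from zero).
dif_balanced = rng.normal(0.0, 0.3, n_items)
dif_balanced -= dif_balanced.mean()   # enforce exact cancellation here
dif_unbalanced = dif_balanced.copy()
dif_unbalanced[:4] += 1.0             # four items with one-sided DIF

for label, e in [("balanced", dif_balanced), ("unbalanced", dif_unbalanced)]:
    b1 = b                            # difficulties identified in group 1
    b2 = b - mu + e                   # group 2: shifted by -mu plus DIF
    d = b1 - b2                       # item-wise estimates of mu (= mu - e)
    print(label,
          "nonrobust (mean):", round(d.mean(), 3),    # biased by one-sided DIF
          "robust (median):", round(np.median(d), 3)) # resistant to the 4 outliers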


Stats ◽  
2021 ◽  
Vol 4 (4) ◽  
pp. 814-836
Author(s):  
Alexander Robitzsch

The Rasch model is one of the most prominent item response models. In this article, different item parameter estimation methods for the Rasch model are systematically compared in a comprehensive simulation study: several variants of joint maximum likelihood (JML) estimation, several variants of marginal maximum likelihood (MML) estimation, conditional maximum likelihood (CML) estimation, and several limited-information methods (LIM). The type of ability distribution (i.e., its nonnormality), the number of items, the sample size, and the distribution of item difficulties were systematically varied. Across the simulation conditions, MML methods with flexible distributional specifications can be at least as efficient as CML. Moreover, in many situations (i.e., for long tests), penalized JML and JML with ε adjustment yielded very efficient estimates and might be considered alternatives to the JML implementations currently used in statistical software. In addition, minimum chi-square (MINCHI) estimation was the best-performing LIM method. These findings demonstrate that JML estimation and LIM can still prove helpful in applied research.
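As a concrete reference point, a bare-bones JML implementation for the Rasch model alternates Newton-Raphson updates for person and item parameters on the joint likelihood. The Python sketch below uses simulated data and shows only plain JML; the variants the article compares (penalized JML, JML with ε adjustment) modify this basic scheme, and all parameter values here are illustrative.

import numpy as np

def simulate(rng, n_persons=500, n_items=20):
    theta = rng.normal(0, 1, n_persons)
    b = np.linspace(-2, 2, n_items)                  # true difficulties
    p = 1 / (1 + np.exp(-(theta[:, None] - b[None, :])))
    return (rng.random((n_persons, n_items)) < p).astype(float), b

def jml(X, n_iter=200):
    """Plain JML for the Rasch model: alternating damped Newton steps."""
    # Perfect and zero raw scores have no finite ML person estimate.
    X = X[(X.sum(1) > 0) & (X.sum(1) < X.shape[1])]
    theta = np.zeros(X.shape[0])
    b = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1 / (1 + np.exp(-(theta[:, None] - b[None, :])))
        info = p * (1 - p)
        theta += np.clip((X - p).sum(1) / info.sum(1), -1, 1)  # person step
        p = 1 / (1 + np.exp(-(theta[:, None] - b[None, :])))
        info = p * (1 - p)
        b -= np.clip((X - p).sum(0) / info.sum(0), -1, 1)      # item step
        b -= b.mean()                 # identification: mean difficulty zero
    return b

rng = np.random.default_rng(0)
X, b_true = simulate(rng)
bias = jml(X) - (b_true - b_true.mean())
print(np.round(bias, 2))  # JML item-parameter bias is visible for short tests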


Author(s):  
Mikail Ibrahim ◽  
Osama Omar M. Elazzabi

The present study addressed the reliability and validity of scales by comparing the traditional approach with the Rasch model, which statisticians regard as the superior method for psychometrically validating a scale and testing its properties. Researchers have demonstrated that traditional statistical methods fail to take the characteristics of both items and respondents into account when testing the reliability of a scale. They usually turn to Cronbach's alpha to examine the internal consistency of a scale without considering that Cronbach's alpha is affected by external factors such as the length of the scale. The Rasch model, by contrast, a comparatively modern statistical approach, is not affected by such external factors, especially scale length; under the Rasch model, even a short scale may be more reliable than a long one. Moreover, the Rasch model can also be used to investigate various types of validity, such as content validity, construct validity, and criterion validity. Interestingly, the Rasch model is also a powerful statistical tool for determining the contribution of items and persons to the total reliability of a scale. For example, the variable map can be used to examine the extent to which the items adequately target the respondents, taking the difficulty of the items and the ability of the subjects into account. Traditional methods, however, normally quantify the uniqueness of a scale by focusing on sums of squares, and they do not offer standard errors for each item with which to judge the accuracy of measurement. Hence, the researchers recommend the Rasch model over conventional statistical methods for testing the reliability and validity of scales.
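The dependence of Cronbach's alpha on scale length that the authors criticize follows directly from the standardized-alpha (Spearman-Brown) relation: holding the average inter-item correlation fixed, merely adding parallel items raises alpha. A minimal numerical illustration in Python, with an assumed average inter-item correlation of 0.2:

# Standardized Cronbach's alpha for k items with average inter-item
# correlation r_bar: alpha = k * r_bar / (1 + (k - 1) * r_bar).
r_bar = 0.2                    # assumed average inter-item correlation
for k in (5, 10, 20, 40):
    alpha = k * r_bar / (1 + (k - 1) * r_bar)
    print(k, round(alpha, 2))  # 5 -> 0.56, 10 -> 0.71, 20 -> 0.83, 40 -> 0.91

The items are identical in quality throughout; only the count changes, yet alpha climbs from 0.56 to 0.91, which is exactly the length sensitivity the abstract describes.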


2011 ◽  
Author(s):  
Klaus Kubinger ◽  
D. Rasch ◽  
T. Yanagida

2021 ◽  
Author(s):  
Bryant A Seamon ◽  
Steven A Kautz ◽  
Craig A Velozo

Objective: Administrative burden often prevents clinical assessment of balance confidence in people with stroke. A computerized adaptive test (CAT) version of the Activities-specific Balance Confidence Scale (ABC CAT) can dramatically reduce this burden. The objective of this study was to test the measurement precision and efficiency of an ABC CAT for balance confidence in people with stroke.

Methods: We conducted a retrospective cross-sectional simulation study with data from 406 adults approximately 2 months post-stroke in the Locomotor-Experience Applied Post-Stroke (LEAPS) trial. Item parameters for CAT calibration were estimated with the Rasch model using a random sample of participants (n = 203). Computer simulation with response data from the remaining 203 participants was used to evaluate the ABC CAT algorithm under varying stopping criteria. We compared estimated levels of balance confidence from each simulation with the actual levels predicted from the Rasch model, using Pearson correlations and the mean standard error (SE).

Results: Simulations with the number of items as the stopping criterion correlated strongly with actual ABC scores (full item set, r = 1; 12-item, r = 0.994; 8-item, r = 0.98; 4-item, r = 0.929). The mean SE increased as the number of items administered decreased (full item set, SE = 0.31; 12-item, SE = 0.33; 8-item, SE = 0.38; 4-item, SE = 0.49). A precision-based stopping rule (mean SE = 0.5) also correlated strongly with actual ABC scores (r = 0.941) and optimized the trade-off between the number of items administered and precision (mean number of items, 4.37; range, 4-9).

Conclusions: An ABC CAT can yield accurate and precise measures of balance confidence in people with stroke with as few as 4 items. Individuals with lower balance confidence may require more items (up to 9), which we attribute to the LEAPS trial having excluded more functionally impaired persons.

Impact Statement: Computerized adaptive testing can drastically reduce the ABC's test administration time while maintaining accuracy and precision. This should greatly enhance clinical utility, facilitating adoption of clinical practice guidelines in stroke rehabilitation.

Lay Summary: If you have had a stroke, your physical therapist will likely test your balance confidence. A computerized adaptive test version of the ABC scale can accurately measure balance confidence with as few as 4 questions, which takes much less time.
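The logic of the CAT algorithm evaluated here, namely select the most informative remaining item, rescore, and stop once a precision target or an item cap is reached, can be sketched in a few lines of Python. The sketch below is not the ABC CAT itself: it uses a hypothetical dichotomous Rasch item bank and EAP scoring on a grid (the ABC items are polytomous and were calibrated from LEAPS data), so more dichotomous items are needed to reach a given SE than the abstract reports.

import numpy as np

rng = np.random.default_rng(7)
bank = np.linspace(-3, 3, 30)        # hypothetical item-difficulty bank
grid = np.linspace(-4, 4, 161)       # quadrature grid for EAP scoring
prior = np.exp(-0.5 * grid**2)       # standard normal prior (unnormalized)

def eap(responses, difficulties):
    """EAP estimate and posterior SD of ability under the Rasch model."""
    post = prior.copy()
    for x, b in zip(responses, difficulties):
        p = 1 / (1 + np.exp(-(grid - b)))
        post *= p if x else 1 - p
    post /= post.sum()
    mean = (grid * post).sum()
    return mean, np.sqrt(((grid - mean) ** 2 * post).sum())

theta_true = -0.8                    # simulated examinee
asked, responses = [], []
theta_hat, se = 0.0, 1.0             # start at the prior mean and SD
while se > 0.5 and len(asked) < 12:  # precision-based stop with an item cap
    # Rasch item information p*(1-p) peaks where difficulty matches ability,
    # so administer the unused item closest to the current estimate.
    free = [i for i in range(len(bank)) if i not in asked]
    i = min(free, key=lambda j: abs(bank[j] - theta_hat))
    asked.append(i)
    p = 1 / (1 + np.exp(-(theta_true - bank[i])))
    responses.append(int(rng.random() < p))
    theta_hat, se = eap(responses, bank[asked])

print(len(asked), round(theta_hat, 2), round(se, 2))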


Electronics ◽  
2021 ◽  
Vol 10 (6) ◽  
pp. 727
Author(s):  
Moustafa M. Nasralla ◽  
Basiem Al-Shattarat ◽  
Dhafer J. Almakhles ◽  
Abdelhakim Abdelhadi ◽  
Eman S. Abowardah

The literature on engineering education research highlights the relevance of evaluating course learning outcomes (CLOs). However, generic and reliable mechanisms for evaluating CLOs remain a challenge. The purpose of this project was to accurately assess the efficacy of learning and teaching techniques by analysing CLO performance with an advanced analytical model (the Rasch model) in the context of engineering and business education. This model produced an association pattern between the students and the overall achieved CLO performance. The sample comprised students enrolled in selected engineering and business courses over one academic year at Prince Sultan University, Saudi Arabia, and covered several types of assessment, both direct (e.g., quizzes, assignments, projects, and examinations) and indirect (e.g., surveys). The current research illustrates that the Rasch measurement model can categorise grades according to course expectations and standards more accurately, thus differentiating students by their extent of educational knowledge. The results will guide educators in tracking and monitoring the performance of the CLOs defined for each course, estimating students' knowledge, skills, and competence levels from data collected from the predefined sample at the end of each semester. The proposed Rasch-measurement approach can adequately assess learning outcomes.


2021 ◽  
Vol 11 (5) ◽  
pp. 201
Author(s):  
Clelia Cascella ◽  
Chiara Giberti ◽  
Giorgio Bolondi

This study explores how different formulations of the same mathematical item may influence students' answers, and whether boys and girls are equally affected by differences in presentation. An experimental design was employed: the same stem-items (i.e., items with the same mathematical content and question intent) were formulated differently and administered to a probability sample of 1647 students (grade 8). All the achievement tests were anchored via a set of common items. Students' answers, equated and then analysed using the Rasch model, confirmed that different formulations affect students' performance and thus the psychometric functioning of items, with discernible differences according to gender. In particular, we explored students' sensitivity to a typical misconception about multiplication with decimal numbers (often called "multiplication makes bigger") and tested the hypothesis that girls are more prone than boys to be negatively affected by this misconception.

