Reframing rankings in educational assessments

Science ◽  
2021 ◽  
Vol 372 (6540) ◽  
pp. 338-340
Author(s):  
Steffi Pohl ◽  
Esther Ulitzsch ◽  
Matthias von Davier
Methodology ◽  
2007 ◽  
Vol 3 (4) ◽  
pp. 149-159 ◽  
Author(s):  
Oliver Lüdtke ◽  
Alexander Robitzsch ◽  
Ulrich Trautwein ◽  
Frauke Kreuter ◽  
Jan Marten Ihme

Abstract. In large-scale educational assessments such as the Third International Mathematics and Science Study (TIMSS) or the Program for International Student Assessment (PISA), sizeable numbers of test administrators (TAs) are needed to conduct the assessment sessions in the participating schools. TA training sessions are run and administration manuals are compiled with the aim of ensuring standardized, comparable assessment situations in all student groups. To date, however, there has been no empirical investigation of the effectiveness of these standardizing efforts. In the present article, we probe for systematic TA effects on mathematics achievement and sample attrition in a student achievement study. Multilevel analyses for cross-classified data using Markov Chain Monte Carlo (MCMC) procedures were performed to separate the variance that can be attributed to differences between schools from the variance associated with TAs. After controlling for school effects, only a very small, nonsignificant proportion of the variance in mathematics scores and response behavior was attributable to the TAs (< 1%). We discuss practical implications of these findings for the deployment of TAs in educational assessments.
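The variance separation described in this abstract can be sketched as a cross-classified random-effects model. The notation below is an assumption made for illustration (the abstract gives no formulas), and the article's actual model may include covariates: student i is nested in school j and test administrator k, with crossed random effects u_j and v_k.

```latex
% Illustrative cross-classified model; all symbols are assumed, not taken from the article.
\begin{align}
  y_{i(jk)} &= \beta_0 + u_j + v_k + e_{i(jk)}, \\
  u_j &\sim \mathcal{N}(0, \sigma_u^2), \quad
  v_k \sim \mathcal{N}(0, \sigma_v^2), \quad
  e_{i(jk)} \sim \mathcal{N}(0, \sigma_e^2), \\
  \rho_{\mathrm{TA}} &= \frac{\sigma_v^2}{\sigma_u^2 + \sigma_v^2 + \sigma_e^2}.
\end{align}
```

Under this sketch, the reported finding corresponds to a TA share of variance, \rho_{\mathrm{TA}}, below 0.01.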


2020 ◽  
Author(s):  
Jennifer Randall ◽  
Joseph Rios ◽  
Hyun Joo Jung

For nearly three decades, researchers have been concerned that the educational measurement field is not producing enough graduate-level specialists to meet the growing demand driven by the increased use of educational assessments in the U.S. This study examined the supply-side aspect of the proposed labor shortage by relying on data from the National Science Foundation’s Survey of Earned Doctorates collected between 1997 and 2016. Over the 20 years examined, measurement programs produced 3,124 doctoral graduates, and across this time span, the annual production of graduates nearly doubled. This supply expansion can largely be attributed to the increase in the number of international graduates, which outpaced the annual growth rate of domestic PhD recipients by 156%. Moreover, 85% of graduates were found to self-identify as either White or Asian. Fewer than 10 Hispanic and no more than 20 Black graduates were produced in any of the years examined. Of the 76% of graduates who reported having a job offer or having accepted a position upon graduation, most entered the academy despite the overall average starting salary ($59,484) being considerably lower than the starting salary for their counterparts entering industry ($84,918), government ($69,970), or other educational institutions ($81,428).


2018 ◽  
Vol 104 (4) ◽  
pp. 348-353 ◽  
Author(s):  
David Odd ◽  
David Evans ◽  
Alan M Emond

Objective – To identify if the educational trajectories of preterm infants differ from those of their term peers. Design – This work is based on the Avon Longitudinal Study of Parents and Children (ALSPAC). Educational measures were categorised into 10 deciles to allow comparison of measures across time periods. Gestational age was categorised as preterm (23–36 weeks) or term (37–42 weeks). Multilevel mixed-effects linear regression models were derived to examine the trajectories of decile scores across the study period. Gestational group was added as an interaction term to assess if the trajectory between educational measures varied between preterm and term infants. Adjustment for possible confounders was performed. Subjects – The final dataset contained information on 12 586 infants born alive at between 23 weeks and 42 weeks of gestation. Main outcome measures – UK mandatory educational assessments (SATs) scores throughout the educational journey (including final GCSE results at 16 years of age). Results – Preterm infants had on average lower Key Stage (KS) scores than term children (−0.46 (−0.84 to −0.07)). However, on average, they gained on their term peers in each progressive measure (0.10 (0.01 to 0.19)), suggesting ‘catch up’ during the first few years at school. Preterm infants appeared to exhibit the increase in decile scores mostly between KS1 and KS2 (p=0.005) and little between KS2 and KS3 (p=0.182) or KS3 and KS4 (p=0.149). Conclusions – This work further emphasises the importance of early schooling and environment in these infants and suggests that support, long after the premature birth, may have additional benefits.
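One way to read the trajectory analysis is as a random-intercept growth model with a group-by-time interaction. The specification below is a hedged sketch: the symbols, the linear coding of the Key Stage as t, and the omission of confounders are all assumptions, since the abstract does not state the model formula.

```latex
% Illustrative growth model; symbols and coding are assumed, not taken from the article.
% d_{ti}: decile score of child i at Key Stage t; P_i = 1 if preterm, 0 if term.
\begin{align}
  d_{ti} &= \beta_0 + \beta_1 t + \beta_2 P_i + \beta_3 \,(t \times P_i) + u_i + e_{ti}, \\
  u_i &\sim \mathcal{N}(0, \sigma_u^2), \quad e_{ti} \sim \mathcal{N}(0, \sigma_e^2).
\end{align}
```

In such a model, a negative \beta_2 would express the lower scores of the preterm group and a positive \beta_3 the catch-up per successive measure. Whether the reported estimates (−0.46 and 0.10) map onto exactly these coefficients is not stated in the abstract.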


2011 ◽  
Vol 57 (4) ◽  
pp. 2212-2223 ◽  
Author(s):  
Tzone-I. Wang ◽  
Chien-Yuan Su ◽  
Tung-Cheng Hsieh

2015 ◽  
Vol 8 (1) ◽  
pp. 49-58 ◽  
Author(s):  
Jeffrey Chi Hoe Mok ◽  
Anita Ann Lee Toh

Purpose – This paper aims to investigate the use of blind marking to increase the ability of criterion-referenced marking to discriminate students’ varied levels of knowledge and skill mastery in a business communication skills course. Design/methodology/approach – The business communication course in this study involved more than 10 teachers and 350 students each semester. Data were collected from four semesters of assignment grades to compare the distribution of grades in semesters that used blind marking and in the one that did not (the control group). The standard deviations of marks for each assignment were calculated and compared. Findings – Findings show that blind marking contributed to a wider spread of marks. The study concludes that blind marking, when implemented together with criterion-referenced marking rubrics, can improve the ability of qualitative assessments to discriminate student achievement levels. Originality/value – Research in the use of criterion-referenced marking rubrics has revealed that assessing with marking rubrics resulted in a wider range of marks awarded because assessors felt that the rubrics helped them make more objective judgments of students’ work (Kuisma, 1999). By this token, it could be argued that because blind marking allows more objective judgment of students’ work (by reducing rater bias), it stands to reason that marks might be awarded on a wider range of the marking scale. However, current literature on blind marking and grade/mark dispersion has yet to reveal a study on whether blind marking is able to increase the spread of marks, and therefore, indicate that an assessment instrument is effective in discriminating a range of student achievement levels. This paper should add to the current research on raising the quality of educational assessments.
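The comparison of standard deviations described in this abstract can be illustrated with a minimal sketch. The marks below are hypothetical values invented for illustration, not data from the study, and the helper name mark_spread is likewise an assumption.

```python
from statistics import stdev


def mark_spread(marks):
    """Return the sample standard deviation of a list of marks."""
    return stdev(marks)


# Hypothetical marks (out of 100), invented purely for illustration.
non_blind_marks = [68, 70, 71, 69, 72, 70, 67, 71]
blind_marks = [58, 74, 81, 63, 77, 69, 55, 84]

spread_non_blind = mark_spread(non_blind_marks)
spread_blind = mark_spread(blind_marks)

# A ratio above 1 means the blind-marked assignment has the wider spread.
spread_ratio = spread_blind / spread_non_blind

print(f"SD without blind marking: {spread_non_blind:.2f}")
print(f"SD with blind marking:    {spread_blind:.2f}")
print(f"Ratio (blind / non-blind): {spread_ratio:.2f}")
```

A descriptive comparison like this says nothing about statistical significance; a formal test of equal variances would be a separate step.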


2006 ◽  
Vol 31 (1) ◽  
pp. 1-33 ◽  
Author(s):  
Sandip Sinharay

Bayesian networks are frequently used in educational assessments primarily for learning about students’ knowledge and skills. However, little work exists on assessing the fit of Bayesian networks. This article employs the posterior predictive model checking method, a popular Bayesian model checking tool, to assess the fit of simple Bayesian networks. A number of aspects of model fit, those of usual interest to practitioners, are assessed using various diagnostic tools. This article suggests a direct data display for assessing overall fit, suggests several diagnostics for assessing item fit, suggests a graphical approach to examine if the model can explain the association among the items, and suggests a version of the Mantel–Haenszel statistic for assessing differential item functioning. Limited simulation studies and a real data application demonstrate the effectiveness of the suggested model diagnostics.
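The general idea of posterior predictive model checking can be sketched on a much simpler model than the Bayesian networks studied in the article. The example below assumes independent items with Beta(1, 1) priors on each item's success probability and uses the variance of examinees' total scores as the discrepancy measure. The model, the discrepancy, the function name, and the simulated data are all illustrative assumptions, not the article's diagnostics.

```python
import numpy as np


def posterior_predictive_p_value(responses, n_draws=2000, seed=0):
    """Posterior predictive p-value for an independent-items model.

    responses: 2D array of 0/1 scores, rows = examinees, columns = items.
    Discrepancy: variance of examinees' total scores.
    """
    rng = np.random.default_rng(seed)
    responses = np.asarray(responses)
    n_persons, n_items = responses.shape

    # Beta(1, 1) prior per item gives a Beta posterior for each success rate.
    n_correct = responses.sum(axis=0)
    alpha = 1 + n_correct
    beta = 1 + n_persons - n_correct

    observed = responses.sum(axis=1).var()

    exceed = 0
    for _ in range(n_draws):
        # Draw item probabilities from the posterior, then replicate the data.
        p = rng.beta(alpha, beta)
        replicated = rng.random((n_persons, n_items)) < p
        if replicated.sum(axis=1).var() >= observed:
            exceed += 1
    return exceed / n_draws


# Hypothetical data: a shared latent ability makes items dependent,
# which the independent-items model cannot reproduce.
data_rng = np.random.default_rng(1)
ability = data_rng.normal(size=(200, 1))
data = (data_rng.random((200, 10)) < 1 / (1 + np.exp(-2 * ability))).astype(int)

ppp = posterior_predictive_p_value(data)
print(f"Posterior predictive p-value: {ppp:.3f}")
```

A p-value near 0 or 1 signals that replicated data rarely resemble the observed data on the chosen discrepancy. Here the simulated dependence among items inflates the total-score variance beyond what the independence model predicts.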

