Validating human and automated scoring of essays against “True” scores

2018 ◽  
Vol 31 (3) ◽  
pp. 241-250
Author(s):  
Yoav Cohen ◽  
Effi Levi ◽  
Anat Ben-Simon

1985 ◽  
Vol 10 (1) ◽  
pp. 1-17 ◽  
Author(s):  
David Jarjoura

Issues regarding tolerance and confidence intervals are discussed within the context of educational measurement and conceptual distinctions are drawn between these two types of intervals. Points are raised about the advantages of tolerance intervals when the focus is on a particular observed score rather than a particular examinee. Because tolerance intervals depend on strong true score models, a practical implication of the study is that true score tolerance intervals are fairly insensitive to differences in assumptions among the five models studied.


2021 ◽  
Vol 4 (1) ◽  
Author(s):  
Salman Sohrabi ◽  
Danielle E. Mor ◽  
Rachel Kaletsky ◽  
William Keyes ◽  
Coleen T. Murphy

Abstract: We recently linked branched-chain amino acid transferase 1 (BCAT1) dysfunction with the movement disorder Parkinson’s disease (PD), and found that RNAi-mediated knockdown of neuronal bcat-1 in C. elegans causes abnormal spasm-like ‘curling’ behavior with age. Here we report the development of a machine learning-based workflow and its application to the discovery of potentially new therapeutics for PD. In addition to simplifying quantification and maintaining a low data overhead, our simple segment-train-quantify platform enables fully automated scoring of image stills upon training of a convolutional neural network. We have trained a highly reliable neural network for the detection and classification of worm postures in order to carry out high-throughput curling analysis without the need for user intervention or post-inspection. In a proof-of-concept screen of 50 FDA-approved drugs, enasidenib, ethosuximide, metformin, and nitisinone were identified as candidates for potential late-in-life intervention in PD. These findings point to the utility of our high-throughput platform for automated scoring of worm postures and, in particular, the discovery of potential candidate treatments for PD.


2009 ◽  
Vol 178 (2) ◽  
pp. 323-326 ◽  
Author(s):  
Jon Pham ◽  
Sara M. Cabrera ◽  
Carles Sanchis-Segura ◽  
Marcelo A. Wood

2002 ◽  
Vol 15 (4) ◽  
pp. 391-412 ◽  
Author(s):  
Yongwei Yang ◽  
Chad W. Buckendahl ◽  
Piotr J. Juszkiewicz ◽  
Dennison S. Bhola

2010 ◽  
Vol 27 (3) ◽  
pp. 335-353 ◽  
Author(s):  
Sara Cushing Weigle

Automated scoring has the potential to dramatically reduce the time and costs associated with the assessment of complex skills such as writing, but its use must be validated against a variety of criteria for it to be accepted by test users and stakeholders. This study approaches validity by comparing human and automated scores on responses to TOEFL® iBT Independent writing tasks with several non-test indicators of writing ability: student self-assessment, instructor assessment, and independent ratings of non-test writing samples. Automated scores were produced using e-rater®, developed by Educational Testing Service (ETS). Correlations between both human and e-rater scores and non-test indicators were moderate but consistent, providing criterion-related validity evidence for the use of e-rater along with human scores. The implications of the findings for the validity of automated scores are discussed.

