Validity of Automated Text Evaluation Tools for Written-Expression Curriculum-Based Measurement: A Comparison Study

2021 ◽  
Author(s):  
Milena A. Keller-Margulis ◽  
Sterett Mercer ◽  
Michael Matta

Existing approaches to measuring writing performance are insufficient in terms of both technical adequacy and feasibility for use as screening measures. This study examined the validity and diagnostic accuracy of several approaches to automated essay scoring, as well as written expression curriculum-based measurement (WE-CBM), to determine whether an automated approach improves technical adequacy. A sample of 140 fourth-grade students generated writing samples that were scored using traditional and automated approaches and compared with the statewide measure of writing performance. Results indicated that validity and diagnostic accuracy were comparable between the best-performing WE-CBM metric, correct minus incorrect word sequences (CIWS), and the automated scoring approaches, with the automated approaches offering potentially improved feasibility for use in screening. Averaging scores across three time points was necessary, however, to achieve improved validity and adequate levels of diagnostic accuracy across the scoring approaches. Limitations, implications, and directions for future research on the use of automated scoring approaches for screening are discussed.

2018 ◽  
Vol 37 (5) ◽  
pp. 539-552 ◽  
Author(s):  
Milena A. Keller-Margulis ◽  
Sarah Ochs ◽  
Erin K. Reid ◽  
Erin L. Faith ◽  
G. Thomas Schanding

Many students struggle with the basic skill of writing, yet schools lack technically adequate screening measures to identify students at risk in this area. Measures that allow for valid screening decisions and identify students in need of interventions to improve performance are greatly needed. The purpose of this study was to evaluate the validity and diagnostic accuracy of early writing screeners. Two early writing screening measures, Picture Word and Word Dictation, were administered to a diverse sample of 95 kindergarten students, almost half of whom were classified as English language learners and almost 70% of whom identified as Hispanic. It was hypothesized that the early writing screening measures would demonstrate moderate to strong relationships with a standardized norm-referenced measure of written expression and adequate diagnostic accuracy for identifying kindergarten students at risk. Findings indicate that concurrent validity coefficients for both the Picture Word and Word Dictation tasks ranged from .32 to .70 with the Written Expression cluster of the Woodcock–Johnson Tests of Achievement–IV and from .26 to .61 with the Writing Samples and Sentence Writing Fluency subtests. Diagnostic accuracy results suggest these measures are a promising option for screening early writing skills. Implications for practice and directions for future research are discussed.


Author(s):  
Corey Palermo ◽  
Margareta Maria Thomson

The majority of United States students demonstrate only partial mastery of the knowledge and skills necessary for proficient writing. Researchers have called for increased classroom-based formative writing assessment to provide students with regular feedback about their writing performance and to support the development of writing skills. Automated writing evaluation (AWE) is a type of assessment for learning (AfL) that combines automated essay scoring (AES) and automated feedback with the goal of supporting improvements in students' writing performance. The current chapter first describes AES, AWE, and automated feedback. Next, results of an original study that examined students' and teachers' perceptions of automated feedback are presented and discussed. The chapter concludes with recommendations and directions for future research.


2018 ◽  
Vol 42 (2) ◽  
pp. 117-128 ◽  
Author(s):  
Sterett H. Mercer ◽  
Milena A. Keller-Margulis ◽  
Erin L. Faith ◽  
Erin K. Reid ◽  
Sarah Ochs

Written-expression curriculum-based measurement (WE-CBM) is used for screening and progress monitoring of students with or at risk of learning disabilities (LD) for academic supports; however, WE-CBM has limitations in technical adequacy, construct representation, and scoring feasibility as grade level increases. The purpose of this study was to examine the structural and external validity of automated text evaluation with Coh-Metrix versus traditional WE-CBM scoring for narrative writing samples (7-min duration) collected in fall and winter from 144 second- through fifth-grade students. Seven algorithms were applied to train models of Coh-Metrix and traditional WE-CBM scores to predict holistic quality of the writing samples as evidence of structural validity; external validity was then evaluated via correlations with rated quality on other writing samples. Key findings were that (a) structural validity coefficients were higher for Coh-Metrix than for traditional WE-CBM but similar in the external validity analyses, (b) external validity coefficients were higher than those reported in prior WE-CBM studies using holistic or analytic ratings as a criterion measure, and (c) there were few differences in performance across the predictive algorithms. Overall, the results highlight the potential of automated text evaluation for WE-CBM scoring. Implications for screening and progress monitoring are discussed.


2016 ◽  
Vol 35 (3) ◽  
pp. 323-335 ◽  
Author(s):  
Tyler L. Renshaw

The present study reports on an investigation of the generalizability of the technical adequacy of the Positive Experience at School Scale (PEASS) with a sample of students ( N = 1,002) who differed substantially in age/grade level (i.e., adolescents in middle school as opposed to children in elementary school) and ethnic identity (i.e., majority Black/African American as opposed to majority Latino/a) in comparison with the measure’s primary development sample. Findings from confirmatory factor analyses indicated the original latent structure of the PEASS was tenable in the current sample and that the measure was invariant across gender and grade level, with some small demographic differences identified via latent means testing. Additional psychometric findings regarding the technical adequacy of the PEASS with this sample, including its observed scale characteristics and simulated classification utility with criterion measures of academic self-efficacy and school connectedness, are also presented. Implications for future research and practice are discussed.


Author(s):  
Shih-Tseng Tina Huang ◽  
Vinh-Long Tran-Chi

Empathy is an important social skill, believed to play an essential role in socioemotional and moral development. The current study aimed to explore empathy development during childhood, particularly among primary and middle school students in Southern Vietnam. Bryant's Empathy Index for children and adolescents was administered to 403 children, including 210 boys and 193 girls. The results showed no significant difference between boys and girls in affective empathy. The results further indicated a significant grade difference in affective empathy, with fourth-grade students scoring higher than second- and sixth-grade students. A separate analysis was conducted for each of the dependent variables. It was found that fourth graders scored significantly higher than second and sixth graders on Understanding Feelings, Feelings of Sadness, and Bryant's Empathy Index, respectively. The results also showed that the Vietnamese version of Bryant's Empathy Index has acceptable reliability and can be used in future research.
