Cognitive validity evidence of computer- and paper-based writing tests and differences in the impact on EFL test-takers in classroom assessment

This study examines the impact of using parallel corpora on EFL students' writing and how students perceive them. Female undergraduates (n=46) in an EFL writing course in Saudi Arabia were randomly divided into experimental and control groups taught by the same instructor using the same materials. Students in the experimental group were introduced to three parallel corpora and encouraged to use them in their writing. Tests at the beginning of the semester showed no difference in English proficiency or writing ability between groups. Over the semester, students in both groups completed five writing assignments and took three writing exams. To examine students' perceptions of parallel corpora, students in the experimental group were asked to write a self-evaluation report and answer an evaluation questionnaire. Quantitative and qualitative analysis showed significant improvement in students' writing but no significant difference between groups. However, students' perception of parallel corpora was generally positive.

Validity Evidence Based on Internal Structure of Scores of the Emotional Quotient Inventory: Youth Version Short (EQ-i: YV-S) in a Chinese Sample

Journal of Psychoeducational Assessment ◽

10.1177/0734282916689439 ◽

2017 ◽

Vol 36 (6) ◽

pp. 576-587 ◽

Cited By ~ 2

Author(s):

Igor Esnaola ◽

Víctor B. Arias ◽

John Freeman ◽

Yina Wang ◽

Benito Arias

Keyword(s):

Temporal Stability ◽

Empirical Support ◽

Reliability And Validity ◽

Validity Evidence ◽

Chinese Sample ◽

Reliability Indices ◽

Emotional Quotient ◽

Emotional Quotient Inventory ◽

Positive Impression ◽

The Impact

Given the lack of any Chinese instrument validated for emotional intelligence (EI) among adolescents, the purpose of this study was to explore new sources of validity evidence drawn from scores on the Emotional Quotient Inventory: Youth Version Short (EQ-i: YV-S) in a sample of Chinese adolescents. The sample was composed of 406 adolescents (236 girls). Results support the multidimensionality of the EQ-i: YV-S, but its hierarchical structure did not receive empirical support. Three of the four main subscales (all but interpersonal) had acceptable reliability indices. In addition, although the impact of the Positive Impression subscale on responses to the main scales was generally low, the effect is not negligible, and its impact should be modeled in further investigations of the EQ-i: YV-S. Finally, the four main subscales of the EQ-i: YV-S showed significant power in the prediction of general self-concept and moderate temporal stability. The findings provide overall support for the reliability and validity of the Chinese version of the EQ-i: YV-S.

From Perception to Practice: The Impact of Teachers' Scoring Experience on Performance-Based Instruction and Classroom Assessment

Educational Assessment ◽

10.1207/s15326977ea0604_3 ◽

2000 ◽

Vol 6 (4) ◽

pp. 257-290 ◽

Cited By ~ 24

Author(s):

Gail Lynn Goldberg ◽

Barbara Sherr Roswell

Keyword(s):

Classroom Assessment ◽

Based Instruction ◽

The Impact

The Impact of Technology on Assessment and Evaluation in Higher Education

Technology Integration in Higher Education - Advances in Higher Education and Professional Development ◽

10.4018/978-1-60960-147-8.ch016 ◽

2011 ◽

pp. 222-235 ◽

Cited By ~ 1

Author(s):

James P. Van Haneghan

Keyword(s):

Higher Education ◽

Classroom Assessment ◽

Assessment Practices ◽

Organizational Assessment ◽

Assessment And Evaluation ◽

Summative Assessments ◽

The Social ◽

Technological Tools ◽

Impact Of Technology ◽

The Impact

This chapter explores the impact of technology on assessment and evaluation in higher education. The impacts on classroom, program, and organizational assessment are discussed. Both formative and summative assessments in classrooms have been impacted by emerging technologies. However, the impact of many of the technological tools developed by measurement specialists has not been as widespread as one would expect given the age of many assessment technologies. Nevertheless, there remains a great potential for new measurement technologies to significantly improve classroom assessment practices. Technology for organizational assessment has continued to boom in light of the dual push for both accountability and continuous improvement by accreditors. The social impacts and burden of organizational assessment and evaluation are discussed. Overall, it is concluded that in order to evaluate the impact of technology, attention needs to be paid to the consequences of both classroom and organization assessment.

The Impact of Wikis & Videos Integration Through Cooperative Writing Tasks Processes

English Language Teaching ◽

10.5539/elt.v11n5p116 ◽

2018 ◽

Vol 11 (5) ◽

pp. 116

Author(s):

Lubin Fernando Franco-Camargo ◽

Gonzalo Camacho-Vásquez

Keyword(s):

English Language ◽

Teaching Practice ◽

Positive Impact ◽

Writing Skills ◽

Structured Interviews ◽

Writing Processes ◽

Implementation Stages ◽

Teaching Learning ◽

Writing Tests ◽

The Impact

The role of ICT in education today is not only important but also effective; its advancement opens up vast opportunities for EFL teachers to explore in the EFL classroom. This action-research study was conceived and carried out from our own teaching practice with B1-level English language students at Weisheit institute. Observation and instrument implementation stages determined the positive impact of integrating Wikis in EFL classrooms and showed how cooperative writing processes helped students improve their writing performance, with ICT taken as the main strategy and tool for improving teaching practices. The research followed a mixed-method approach and included methodical data collection through journals, pre- and post-writing tests, semi-structured interviews, and an aptitude test. Applying these instruments helped us identify points of particular interest and prompted self-reflection on our own teaching-learning processes, in which the main problems were lack of writing skills, lack of vocabulary, grammar mistakes, and writing inaccuracy. The strategies implemented mainly involved integrating Wiki websites as a pedagogical instrument to improve writing skills, using eye-catching pre-writing elements such as videos to trigger motivation in the writing process.

Validity evidence for Quality Improvement Knowledge Application Tool Revised (QIKAT-R) scores: consequences of rater number and type using neurology cases

BMJ Quality & Safety ◽

10.1136/bmjqs-2018-008689 ◽

2019 ◽

Vol 28 (11) ◽

pp. 925-933 ◽

Cited By ~ 1

Author(s):

Charles Kassardjian ◽

Yoon Soo Park ◽

Sherri Braksick ◽

Jeremy Cutsforth-Gregory ◽

Carrie Robertson ◽

...

Keyword(s):

Quality Improvement ◽

Interrater Reliability ◽

Intraclass Correlation ◽

Validity Evidence ◽

Knowledge Application ◽

Before And After ◽

The Mean ◽

Phi Coefficient ◽

The Impact ◽

Multiple Scenarios

Objectives: To develop neurology scenarios for use with the Quality Improvement Knowledge Application Tool Revised (QIKAT-R), gather and evaluate validity evidence, and project the impact of scenario number, rater number and rater type on score reliability. Methods: Six neurological case scenarios were developed. Residents were randomly assigned three scenarios before and after a quality improvement (QI) course in 2015 and 2016. For each scenario, residents crafted an aim statement, selected a measure and proposed a change to address a quality gap. Responses were scored by six faculty raters (two with and four without QI expertise) using the QIKAT-R. Validity evidence from content, response process, internal structure, relations to other variables and consequences was collected. A generalisability (G) study examined sources of score variability, and decision analyses estimated projected reliability for different numbers of raters and scenarios and raters with and without QI expertise. Results: Raters scored 163 responses from 28 residents. The mean QIKAT-R score was 5.69 (SD 1.06). G-coefficient and Phi-coefficient were 0.65 and 0.60, respectively. Interrater reliability was fair for raters without QI expertise (intraclass correlation = 0.53, 95% CI 0.30 to 0.72) and acceptable for raters with QI expertise (intraclass correlation = 0.66, 95% CI 0.02 to 0.88). Postcourse scores were significantly higher than precourse scores (6.05, SD 1.48 vs 5.22, SD 1.5; p < 0.001). Sufficient reliability for formative assessment (G-coefficient > 0.60) could be achieved by three raters scoring six scenarios or two raters scoring eight scenarios, regardless of rater QI expertise. Conclusions: Validity evidence was sufficient to support the use of the QIKAT-R with multiple scenarios and raters to assess resident QI knowledge application for formative or low-stakes summative purposes. The results provide practical information for educators to guide implementation decisions.

Building Teacher Capacity within the Evolving Assessment Culture in Canadian Education

Policy Futures in Education ◽

10.2304/pfie.2012.10.4.447 ◽

2012 ◽

Vol 10 (4) ◽

pp. 447-460 ◽

Cited By ~ 10

Author(s):

Don A. Klinger ◽

Louis Volante ◽

Christopher Deluca

Keyword(s):

Professional Development ◽

Professional Learning ◽

Teaching And Learning ◽

Large Scale ◽

Classroom Assessment ◽

Assessment Practices ◽

Teacher Capacity ◽

Canadian Education ◽

Assessment Culture ◽

The Impact

Lost in the focus on large-scale educational assessments for accountability purposes is the important role of teachers' classroom assessment practices. Teachers must understand the use of both large-scale and classroom assessment practices and theories, and professional development remains the primary method to develop these assessment capacities. However, traditional models of professional development typically have little, if any, effect. In recognition of the importance of building teachers' assessment capacity, and the limitations of traditional professional development, the Elementary Teachers' Federation of Ontario, Canada, developed a Classroom Assessment Workshop Series to begin to build a systemic assessment framework for teachers. Through pre- and post-series surveys with 300 participants, and interviews and focus groups with facilitators, the authors' review and research explored the impact of the series on teachers' beliefs, self-efficacy, and knowledge of assessment practices and theory. The authors also explored the challenges that teachers experienced as they worked to understand and implement current conceptions of assessment. While teachers certainly valued the community created through the series and the opportunities to share their experiences, the findings showed that teachers struggled to understand the theoretical foundations and use these foundations to further develop their own assessment practices. The research highlights the need for teachers to embrace a philosophy that integrates formative assessment practices and theories into their teaching and learning while also identifying the challenges associated with creating such an assessment culture. Current models of professional development may be more aligned with principles of effective professional learning, but truly changing teachers' classroom assessment practices may require a much more prolonged effort than that currently being provided.

Development and evaluation of a chemistry-specific version of the academic motivation scale (AMS-Chemistry)

Chemistry Education Research and Practice ◽

10.1039/c6rp00200e ◽

2017 ◽

Vol 18 (1) ◽

pp. 191-213 ◽

Cited By ~ 15

Author(s):

Yujuan Liu ◽

Brent Ferrell ◽

Jack Barbera ◽

Jennifer E. Lewis

Keyword(s):

Academic Achievement ◽

Pilot Study ◽

Intrinsic Motivation ◽

Student Motivation ◽

Academic Motivation ◽

Medium Effect ◽

Validity Evidence ◽

Academic Motivation Scale ◽

Motivation Scale ◽

The Impact

Fundamentally concerned with motivation, self-determination theory (SDT) represents a framework of several mini-theories to explore how social context interacts with people's motivational types categorized by degree of regulation internalization. This paper aims to modify an existing theory-based instrument (Academic Motivation Scale, or AMS) and provide validity evidence for the modified instrument (Academic Motivation Scale-Chemistry) as a measure of seven types of student motivation toward chemistry. The paper explores how motivation as measured by AMS-Chemistry is related to student academic achievement and attendance. In a pilot study, the unmodified AMS showed good reliability, reasonable data fit, and the ability to detect motivational differences by sex in college chemistry courses. Based on the pilot study results, expert panel discussions, and cognitive interviews with students, the Academic Motivation Scale – Chemistry (AMS-Chemistry) was developed. AMS-Chemistry was administered to university students in a first semester general chemistry course twice within a semester. An examination of validity evidence suggested that the AMS-Chemistry data could be used to investigate student motivation toward chemistry. Results showed students were extrinsically motivated toward chemistry on average, and there was an overall motivational difference favoring males with a medium effect size. Correlation studies showed motivation was not associated with academic achievement at the beginning of the term, but intrinsic motivation subscales (to know, to experience, and to accomplish) were positively associated with academic achievement at the end of the term. Results also showed that students who persisted in class attendance scored higher on intrinsic motivation subscales than those who did not persist. The 28-item AMS-Chemistry is easy to administer and can be used to better understand students’ motivation status and how it might change across the curriculum. Faculty interested in promoting student intrinsic motivation may also use the AMS-Chemistry to evaluate the impact of their efforts.

Changes in scoring of Direct Observation of Procedural Skills (DOPS) forms and the impact on competence assessment

Endoscopy ◽

10.1055/a-0576-6667 ◽

2018 ◽

Vol 50 (08) ◽

pp. 770-778 ◽

Cited By ~ 14

Author(s):

Keith Siau ◽

Paul Dunckley ◽

Roland Valori ◽

Mark Feeney ◽

Neil Hawkes ◽

...

Keyword(s):

Observational Study ◽

Learning Curve ◽

Direct Observation ◽

Assessment Tool ◽

Competence Assessment ◽

Validity Evidence ◽

Point Scale ◽

Procedural Skills ◽

The Impact ◽

Scale Format

Background: Direct Observation of Procedural Skills (DOPS) is an established competence assessment tool in endoscopy. In July 2016, the DOPS scoring format changed from a performance-based scale to a supervision-based scale. We aimed to evaluate the impact of changes to the DOPS scale format on the distribution of scores in novice trainees and on competence assessment. Methods: We performed a prospective, multicenter (n = 276), observational study of formative DOPS assessments in endoscopy trainees with ≤ 100 lifetime procedures. DOPS were submitted in the 6 months before July 2016 (old scale) and after (new scale) for gastroscopy (n = 2998), sigmoidoscopy (n = 1310), colonoscopy (n = 3280), and polypectomy (n = 631). Scores for old and new DOPS were aligned to a 4-point scale and compared. Results: 8219 DOPS (43 % new and 57 % old) submitted for 1300 trainees were analyzed. Compared with old DOPS, the use of the new DOPS was associated with greater utilization of the lowest score (2.4 % vs. 0.9 %; P < 0.001), broader range of scores, and a reduction in competent scores (60.8 % vs. 86.9 %; P < 0.001). The reduction in competent scores was evident on subgroup analysis across all procedure types (P < 0.001) and for each quartile of endoscopy experience. The new DOPS was superior in characterizing the endoscopy learning curve by demonstrating progression of competent scores across quartiles of procedural experience. Conclusions: Endoscopy assessors applied a greater range of scores using the new DOPS scale based on degree of supervision in two cohorts of trainees matched for experience. Our study provides construct validity evidence in support of the new scale format.

Pre-Clerkship EPA Assessments: A Thematic Analysis of Rater Cognition

10.21203/rs.3.rs-826239/v1 ◽

2021 ◽

Author(s):

Eric G. Meyer ◽

Emily Harvey ◽

Steven J. Durning ◽

Sebastian Uijtdehaage

Keyword(s):

Student Performance ◽

Thematic Analysis ◽

Qualitative Approach ◽

Frame Of Reference ◽

Validity Evidence ◽

Professional Activities ◽

Think Alouds ◽

Workplace Based Assessment ◽

The Impact ◽

Over Time

Background. Entrustable Professional Activity (EPA) assessments measure learners’ competence with an entrustment or supervisory scale. Designed for workplace-based assessment, EPA assessments have also been proposed for undergraduate medical education (UME), where assessments frequently occur outside the workplace and may be less intuitive, raising validity concerns. This study explored how assessors make entrustment determinations in UME, including the impact of longitudinal student-assessor relationships. Methods. A qualitative approach using think-alouds was employed. Assessors assessed two students (familiar and unfamiliar) completing a history and physical exam using a supervisory scale and then thought aloud after each assessment. We conducted a thematic analysis of assessors’ response processes and compared them based on their familiarity with a student. Results. Four themes and fifteen subthemes were identified. The most prevalent theme related to “student performance.” The other three themes included “frame of reference,” “assessor uncertainty,” and “the patient.” “Previous student performance” and “affective reactions” were subthemes more likely to inform scoring when faculty were familiar with a student, while unfamiliar faculty were more likely to reference “self” and “lack confidence in their ability to assess.” Conclusions. Student performance appears to be assessors’ main consideration for all students, providing some validity evidence for the response process in EPA assessments. Several problematic themes could be addressed with faculty development, while others appear to be inherent to entrustment and may be more challenging to mitigate. Differences based on assessor familiarity with the student merit further research on how trust develops over time.
