Scoring Second Signed Language Assessment

2021 ◽  
pp. 315-328
Author(s):  
Tobias Haug ◽  
Eveline Boers-Visker ◽  
Wolfgang Mann ◽  
Geoffrey Poor ◽  
Beppie Van den Bogaerde

Research on signed language assessment is scarce, especially on scoring issues and interrater reliability. This chapter describes two related assessment instruments, the SLPI and the NFA, which offer scoring criteria. Raters are provided with scales for evaluating the different components of the candidate's language production. In use, the rating system has proved successful; there are, however, hardly any data on interrater reliability. In this chapter, the authors describe reliability issues with attention to raters’ training and score resolution techniques and discuss how to identify and increase rater reliability. The dearth of knowledge on signed language assessment, and in particular on its validity and reliability, indicates an urgent need for more research in this area.

2018 ◽  
Author(s):  
Aisiah Aisiah

This paper examines the development of script assessment instruments in a history department. The existing assessment instrument has not been shown to be a valid measure, so a new one needed to be developed. The aim is to develop the components and indicators of script assessment as the basis for a valid instrument and to describe the characteristics and forms of that instrument. The development followed a research and development (R&D) approach. The subjects of the research were the assessment instruments themselves, as used by the raters (examiners) to assess the scripts. There were 10 raters. The research data are the raters' assessment scores at the time of the exam. The validity and reliability of the instrument were tested through expert judgment in a focus group discussion (FGD) and through inter-rater reliability. The results indicate that the assessment format for the history script consists of four components and 20 indicators (assessment items), while the assessment format for the history education script consists of six components and 19 indicators (assessment items). The resulting instruments qualify as good instruments in terms of validity and reliability. The analysis showed inter-rater reliability high enough for the instruments to be considered reliable (r between 0.9751 and 0.9949 for the history script instrument, and r between 0.9947 and 0.9990 for the history education script instrument).


1985 ◽  
Vol 16 (4) ◽  
pp. 244-255
Author(s):  
Penelope K. Hall ◽  
Linda S. Jordan

The performance of 123 language-disordered children on the DeRenzi and Faglioni form of the Token Test and the DeRenzi and Ferrari Reporter's Test was analyzed using two scoring conventions and then compared with the performance of children with presumed normal language development. Correlations with other commonly used language assessment instruments are cited. Use of the Token and Reporter's Tests with children exhibiting language disorders is suggested.


Author(s):  
Henriëtte A. W. Meijer ◽  
Maurits Graafland ◽  
Miryam C. Obdeijn ◽  
Marlies P. Schijven ◽  
J. Carel Goslings

Purpose: To determine the validity of wrist range of motion (ROM) measurements by the wearable-controlled ReValidate! wrist-rehabilitation game, which simultaneously acts as a digital goniometer. Furthermore, to establish the reliability of the game by contrasting ROM measurements to those found by medical experts using a universal goniometer. Methods: As the universal goniometer is considered the reference standard, inter-rater reliability between surgeons was first determined. Internal validity of the game ROM measurements was determined in a test–retest setting with healthy volunteers. The reliability of the game was tested in 34 patients with a restricted range of motion, in whom the ROM was measured by experts as well as digitally. Intraclass correlation coefficients (ICCs) were determined and outcomes were analyzed using Bland–Altman plots. Results: Inter-rater reliability between experts using a universal goniometer was poor, with ICCs of 0.002, 0.160 and 0.520. Internal validity testing of the game found ICCs of −0.693, 0.376 and 0.863, thus ranging from poor to good. Reliability testing of the game compared to medical expert measurements found that mean differences were small for the flexion–extension arc and the radial deviation–ulnar deviation arc. Conclusion: The ReValidate! game is a reliable home-monitoring device digitally measuring ROM in the wrist. Interestingly, the test–retest reliability of the serious game was found to be considerably higher than the inter-rater reliability of the reference standard, namely healthcare professionals using a universal goniometer. Trial registration number (internal hospital registration only): MEC-AMC W17_003 #17.015.
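
For readers unfamiliar with the statistic, the following is a minimal sketch of one common ICC form, ICC(2,1) (two-way random effects, absolute agreement, single rater). The abstract does not state which ICC form the authors used, so this choice, the function name `icc_2_1`, and the data layout are assumptions made purely for illustration.

```python
from statistics import mean


def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `scores` is a list of rows, one per subject, each holding one
    measurement per rater (hypothetical layout, not taken from the study).
    """
    n = len(scores)       # number of subjects
    k = len(scores[0])    # number of raters
    grand = mean(v for row in scores for v in row)
    row_means = [mean(row) for row in scores]
    col_means = [mean(row[j] for row in scores) for j in range(k)]

    # Two-way ANOVA sums of squares (subjects, raters, residual).
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in scores for v in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )
```

In this formulation the estimate falls below zero whenever the residual mean square exceeds the between-subject mean square, which is how a negative ICC can arise in a test–retest sample with little spread between subjects.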


Medicina ◽  
2019 ◽  
Vol 55 (9) ◽  
pp. 548 ◽  
Author(s):  
Salvioli ◽  
Pozzi ◽  
Testa

Background and objectives: Low back pain is one of the most common health problems. In 85% of cases, it is not possible to identify a specific cause, and it is therefore called Non-Specific Low Back Pain (NSLBP). Among the various attempted classifications, the subgroup of patients with impairment of motor control of the lower back (MCI) is among the most studied. The objective of this systematic review is to summarize the results from trials about the validity and reliability of clinical tests aimed at identifying MCI in the NSLBP population. Materials and Methods: The MEDLINE, Cochrane Library, and MedNar databases were searched up to May 2018. The criteria for inclusion were clinical trials about evaluation methods that are affordable and applicable in a usual clinical setting and conducted on populations aged > 18 years. A single author summarized data in synoptic tables relating to the clinical property; a second reviewer intervened in case of doubts about the relevance of the studies. Results: 13 primary studies met the inclusion criteria: 10 investigated inter-rater reliability, 4 investigated intra-rater reliability, and 6 investigated validity, for a total of 23 tests (including one cluster of tests). Inter-rater reliability is widely studied, and there are tests with good, consistent, and substantial values (waiter’s bow, prone hip extension, sitting knee extension, and one leg stance). Intra-rater reliability has been less investigated, and no test has been studied by more than one author. The few studies about validity aim to discriminate only the presence or absence of LBP in the samples. Conclusions: At present, results related to reliability support the clinical use of the identified tests. No conclusions can be drawn about validity.


PeerJ ◽  
2020 ◽  
Vol 8 ◽  
pp. e9687
Author(s):  
Vanina Costa ◽  
Óscar Ramírez ◽  
Abraham Otero ◽  
Daniel Muñoz-García ◽  
Sandra Uribarri ◽  
...  

Background: Chronic elbow and wrist conditions are very common among musculoskeletal problems. These painful conditions affect muscle function, which ultimately leads to a decrease in the joint’s Range Of Motion (ROM). Due to their portability and ease of use, goniometers are still the most widespread tool for measuring ROM. Inertial sensors are emerging as a digital, low-cost and accurate alternative. However, whereas inertial sensors are commonly used in research studies, they are not widely used in clinical practice because of the lack of information about their validity and reliability. The goal of this study is to assess the validity and the intra- and inter-rater reliability of inertial sensors for measuring active ROM of the elbow and wrist. Materials and Methods: Measures were taken simultaneously with inertial sensors (Werium™ system) and a universal goniometer. The process involved two physiotherapists (“rater A” and “rater B”) and an engineer responsible for the technical issues. Twenty-nine asymptomatic subjects were assessed individually in two sessions separated by 48 h. The procedure was repeated by rater A followed by rater B, with random order. Three repetitions of each active movement (elbow flexion, pronation, and supination; and wrist flexion, extension, radial deviation and ulnar deviation) were executed starting from the neutral position until the ROM end-feel; that is, until ROM reached its maximum because it was stopped by the anatomy. The coefficient of determination (r²) and the Intraclass Correlation Coefficient (ICC) were calculated to assess the intra-rater and inter-rater reliability. The Standard Error of the Measurement and the Minimum Detectable Change were also calculated, and Bland–Altman plots were produced. Results: Similar ROM values were obtained with both instruments for the elbow (maximum difference of 3° for all the movements) and wrist (maximum difference of 1° for all the movements). These values were within the normal range when compared to literature studies. The concurrent validity analysis for all the movements yielded ICC values ≥0.78 for the elbow and ≥0.95 for the wrist. Concerning reliability, the ICC values denoted a high reliability of inertial sensors for all the different movements. In the case of the elbow, intra-rater and inter-rater reliability ICC values range from 0.83 to 0.96 and from 0.94 to 0.97, respectively. Intra-rater analysis of the wrist yielded ICC values between 0.81 and 0.93, while the ICC values for the inter-rater analysis range from 0.93 to 0.99. Conclusions: Inertial sensors are a valid and reliable tool for measuring elbow and wrist active ROM. Particularly noteworthy is their high inter-rater reliability, often questioned in measurement tools. The lowest reliability is observed in elbow prono-supination, probably due to skin artifacts. Based on these results and their advantages, inertial sensors can be considered a valid assessment tool for wrist and elbow ROM.
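
The agreement statistics named in this abstract can be sketched as follows. The abstract gives no formulas, so the conventions used here are assumptions: Bland–Altman limits of agreement as bias ± 1.96 standard deviations of the paired differences, SEM = SD · sqrt(1 − ICC), and MDC95 = 1.96 · sqrt(2) · SEM. The function names and sample values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def bland_altman(a, b):
    """Return (bias, lower limit, upper limit) for paired measurements.

    `a` and `b` are equal-length lists, e.g. sensor and goniometer readings
    of the same movements (hypothetical inputs, not the study's data).
    """
    diffs = [x - y for x, y in zip(a, b)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)  # assumed 95% limits of agreement
    return bias, bias - spread, bias + spread


def sem_and_mdc95(sd, icc):
    """Standard Error of Measurement and Minimum Detectable Change (95%).

    Assumes SEM = SD * sqrt(1 - ICC) and MDC95 = 1.96 * sqrt(2) * SEM.
    """
    sem = sd * sqrt(1.0 - icc)
    mdc95 = 1.96 * sqrt(2.0) * sem
    return sem, mdc95


# Illustrative values only: a between-subject SD of 10 degrees and ICC 0.96.
sem, mdc95 = sem_and_mdc95(10.0, 0.96)
```

Under these assumed formulas, a higher ICC shrinks both the SEM and the smallest change that can be distinguished from measurement noise.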


In Language Assessment Across Modalities: Paired-Papers on Signed and Spoken Language Assessment, volume editors Tobias Haug, Wolfgang Mann, and Ute Knoch bring together, for the first time, researchers, clinicians, and practitioners from two different fields: signed language and spoken language. The volume examines theoretical and practical issues related to 12 topics ranging from test development and language assessment of bi-/multilingual learners to construct issues of second-language assessment (including the Common European Framework of Reference [CEFR]) and language assessment literacy in second-language assessment contexts. Each topic is addressed separately for spoken and signed language by experts from the relevant field. This is followed by a joint discussion in which the chapter authors highlight key issues in each field and their possible implications for the other field. What makes this volume unique is that it is the first of its kind to bring experts from signed and spoken language assessment to the same table. The dialogues that result from this collaboration not only help to establish a shared appreciation and understanding of challenges experienced in the new field of signed language assessment but also breathe new life into and provide a new perspective on some of the issues that have occupied the field of spoken language assessment for decades. It is hoped that this will open the door to new and exciting cross-disciplinary collaborations.


Author(s):  
Kathy J. Bohan ◽  
Cynthia A. Conn ◽  
Suzanne L. Pieper

Locally developed performance-based assessment instruments must provide evidence of validity and reliability supporting their intended interpretation and use. Accrediting bodies, such as the Council for the Accreditation of Educator Preparation (CAEP), require Educator Preparation Programs (EPPs) to provide this evidence in their accreditation self-study. However, faculty may not have the expertise to conduct an effective examination of their assessments. This chapter describes a process for gathering evidence to build a validity argument for locally developed performance-based assessments. Grounded in measurement theory, the Validity Inquiry Process (VIP) guides faculty through a reflective practice approach towards making defensible claims about the use of results from locally developed performance-based assessments. Using this process, faculty can have greater confidence in using their performance-based assessments to provide feedback to their students, as well as offer assurances of program quality or to identify areas for improvement.


2021 ◽  
pp. 145-152
Author(s):  
Amy Kissel Frisbie ◽  
Aaron Shield ◽  
Deborah Mood ◽  
Nicole Salamy ◽  
Jonathan Henner

This chapter is a joint discussion of key items presented in Chapters 4.1 and 4.2 related to the assessment of deaf and hearing children on the autism spectrum. From these chapters it becomes apparent that a number of aspects associated with signed language assessment are relevant to spoken language assessment. For example, there are several precautions to bear in mind about language assessments obtained via an interpreter. Some of these precautions apply solely to D/HH children, while others are applicable to assessments with hearing children in multilingual contexts. Equally, there are some aspects of spoken language assessment that can be applied to signed language assessment. These include the importance of assessing pragmatic language skills, assessing multiple areas of language development, differentiating between ASD and other developmental disorders, and completing the language evaluation within a developmental framework. The authors conclude with suggestions for both spoken and signed language assessment.


2021 ◽  
pp. 329-332
Author(s):  
Tobias Haug ◽  
Ute Knoch ◽  
Wolfgang Mann

This chapter is a joint discussion of key scoring issues in signed and spoken language assessment that were discussed in Chapters 9.1 and 9.2. One aspect of signed language assessment that has the potential to stimulate new research in spoken second language (L2) assessment is the scoring of nonverbal speaker behaviors. This aspect is rarely represented in the scoring criteria of spoken assessments and in many cases is not even available to raters during the scoring process. The authors argue, therefore, for a broadening of the construct of spoken language assessment to also include elements of nonverbal communication in the scoring descriptors. Additionally, the importance of rater training for signed language assessments, the application of Rasch analysis to investigate possible reasons for disagreement between raters, and the need to conduct research on rating scales are discussed.

