Exploring decision consistency and decision accuracy across rating designs in rater-mediated music performance assessments

2018 ◽  
Vol 23 (4) ◽  
pp. 465-485 ◽  
Author(s):  
Stefanie A. Wind ◽  
Pey Shin Ooi ◽  
George Engelhard

Music performance assessments frequently include panels of raters who evaluate the quality of musical performances using rating scales. As a result of practical considerations, it is often not possible to obtain ratings from every rater on every performance (i.e., complete rating designs). When there are differences in rater severity, and not all raters rate all performances, ratings of musical performances and their resulting classification (e.g., pass or fail) depend on the “luck of the rater draw.” In this study, we explored the implications of different types of incomplete rating designs for the classification of musical performances in rater-mediated musical performance assessments. We present a procedure that researchers and practitioners can use to adjust student scores for differences in rater severity when incomplete rating designs are used, and we consider the effects of the adjustment procedure across different types of rating designs. Our results suggested that differences in rater severity have large practical consequences for ratings of musical performances that impact individual students and groups of students differently. Furthermore, our findings suggest that it is possible to adjust musical performance ratings for differences in rater severity as long as there are common raters across scoring panels. We consider the implications of our findings as they relate to music assessment research and practice.

2019 ◽  
Vol 8 (2) ◽  
pp. 164-183
Author(s):  
Karen Moukheiber

Musical performance was a distinctive feature of urban culture in the formative period of Islamic history. At the court of the Abbasid caliphs, and in the residences of the ruling elite, men and women singers performed to predominantly male audiences. The success of a performer was linked to his or her ability to elicit ṭarab, namely a spectrum of emotions and affects, in their audiences. Ṭarab was criticized by religious scholars due, in part, to the controversial performances at court of slave women singers depicted as using music to induce passion in men, diverting them from normative ethical social conduct. This critique, in turn, shaped the ethical boundaries of musical performances and affective responses to them. Abū l-Faraj al-Iṣfahānī’s tenth-century Kitāb al-Aghānī (‘The Book of Songs’) compiles literary biographies of prominent male and female singers from the formative period of Islamic history. It offers rich descriptions of musical performances as well as ensuing manifestations of ṭarab in audiences, revealing at times the polemics with which they were associated. Investigating three biographical narratives from Kitāb al-Aghānī, this paper seeks to answer the following question: How did emotions, gender and status shape, on the one hand, the musical performances of women singers and, on the other, their audiences’ emotional responses, holistically referred to as ṭarab? Through this question, this paper seeks to nuance and complicate our understanding of the constraints and opportunities that shaped slave and free women's musical performances, as well as men's performances, at the Abbasid court.


Author(s):  
Daniel Massoth

When technology is used for assessment in music, certain considerations can affect the validity, reliability, and depth of analysis. This chapter explores factors that are present in the three phases of the assessment process: recognition, analysis, and display of assessment of a musical performance. Each phase has inherent challenges embedded within internal and external factors. The goal here is not to provide an exhaustive analysis of any or all aspects of assessment but, rather, to present the rationale for and history of using technology in music assessment and to examine the philosophical and practical considerations. A discussion of possible future directions of product research and development concludes the chapter.


2021 ◽  
pp. 001316442098810
Author(s):  
Stefanie A. Wind ◽  
Yuan Ge

Practical constraints in rater-mediated assessments limit the availability of complete data. Instead, most scoring procedures include one or two ratings for each performance, with overlapping performances across raters or linking sets of multiple-choice items to facilitate model estimation. These incomplete scoring designs present challenges for detecting rater biases, or differential rater functioning (DRF). The purpose of this study is to illustrate and explore the sensitivity of DRF indices in realistic sparse rating designs that have been documented in the literature that include different types and levels of connectivity among raters and students. The results indicated that it is possible to detect DRF in sparse rating designs, but the sensitivity of DRF indices varies across designs. We consider the implications of our findings for practice related to monitoring raters in performance assessments.


2015 ◽  
Vol 10 (1-2) ◽  
pp. 104 ◽  
Author(s):  
Erin Heisel

One way of understanding empathy in music performance is as a process by which singers closely identify with the characters they encounter and portray in opera or art song. As singers embody these characters, they literally give them voice. Musical performance thus humanizes characters as well as performers and audiences as deeper, empathetic engagement may also reflect or elicit new pathways of growth, knowledge, and understanding. What is the process a singer goes through in empathizing with a character? How can young singers learn to empathize with the characters they are tasked with portraying, even when they may find the characters or their behavior to fall outside of their own moral convictions?  This paper posits that empathy is a necessary part of the role preparation process for singers and introduces the “role journal” as a way for young singers to track embodiment processes and develop healthy habits of empathy and boundaries in their work.


2000 ◽  
Vol 6 (5) ◽  
pp. 362-370 ◽  
Author(s):  
Robin G. Morris ◽  
Claire Worsley ◽  
David Matthews

Neuropsychological assessment, in the broader sense, is common clinical practice with older adults because of the widespread use of mental status examinations and dementia rating scales. In the more narrow sense, a neuropsychological assessment conducted by a clinical psychologist or clinical neuropsychologist is used less frequently and for more specific purposes. This paper outlines these uses and provides a brief overview of the different types of test that might be used, with a clinical example to illustrate the type of information gained. This review is designed not to be comprehensive, but to provide a pointer towards the latest trends in test development.


Author(s):  
Sinéad O’Neill ◽  
John Sloboda

Musical performance is an irreducibly social phenomenon, manifested through the multiple relationships between performers and audience. In live contexts, the nature and meaning of performance encompass the two-way interplay between performers and audience. This chapter surveys a range of research, from the philosophical to the empirical, into the parameters of this interplay, both during and after performances, focusing most specifically on those aspects that have implications for the creative practice of the musician. These aspects go beyond sound parameters to features of the performance often seen as ‘extra-musical’, such as the visual and gestural aspects of performance, the architecture of the performance space and perceived norms of behaviour within the concert context. Consideration is given to how these elements contribute to different levels of experience, from the ‘basic’ appreciation of structural elements through to the ‘peak’ experiences which music performance sometimes engenders. Also considered is audience feedback, both formal and informal, and how it may have an impact on creative performance.


2020 ◽  
pp. 1321103X1987107 ◽  
Author(s):  
Gina Ryan

Relationship dynamics between students and teachers are an essential element of one-on-one teaching and learning in music schools. The purpose of this study was to investigate factors leading to student–teacher dyad dissolution in post-secondary music performance studios. A total of 30 students and 30 teachers were interviewed. Interview questionnaires contained closed-ended rating scales and open-ended questions. Unstructured responses were transcribed, coded by units that each represented a contributing factor to dyad dissolution, and then subjected to a frequency count to determine decisive factors leading to dyad dissolution. All factors were subjected to the Framework of Social Levels, which is based on four levels – Interpersonal, Self, Other, and Outside. The majority of students’ dissolution factors were attributed at the Interpersonal level, whereas the majority of teachers attributed dissolution to factors at the Student (Other) level. Participants cited several factors leading to dyad dissolution including different expectations, different professional goals, poor communication, incompatible personalities, student commitment, teacher teaching abilities, lesson satisfaction, and lack of personal connection.


Author(s):  
Masaki Uto

Performance assessments, in which human raters assess examinee performance in practical tasks, have attracted much attention in various assessment contexts involving measurement of higher-order abilities. However, difficulty persists in that ability measurement accuracy strongly depends on rater and task characteristics such as rater severity and task difficulty. To resolve this problem, various item response theory (IRT) models incorporating rater and task parameters, including many-facet Rasch models (MFRMs), have been proposed. When applying such IRT models to datasets comprising results of multiple performance tests administered to different examinees, test linking is needed to unify the scale for model parameters estimated from individual test results. In test linking, test administrators generally need to design multiple tests such that raters and tasks partially overlap. The accuracy of linking under this design is highly reliant on the numbers of common raters and tasks. However, the numbers of common raters and tasks required to ensure high accuracy in test linking remain unclear, making it difficult to determine appropriate test designs. We therefore empirically evaluate the accuracy of IRT-based performance-test linking under common rater and task designs. Concretely, we conduct evaluations through simulation experiments that examine linking accuracy based on an MFRM while changing numbers of common raters and tasks with various factors that possibly affect linking accuracy.
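
As an illustration of why rater severity matters in such models, the following is a minimal sketch of one common rating-scale formulation of a many-facet Rasch model, in which the log-odds of receiving category k rather than k-1 equal ability minus task difficulty minus rater severity minus a step threshold. It is not necessarily the exact parameterization used in the study above; the function names, parameter names, and numeric values are hypothetical and chosen only for illustration.

```python
import math


def mfrm_category_probs(ability, task_difficulty, rater_severity, thresholds):
    """Return probabilities of score categories 0..K under a rating-scale MFRM.

    Assumed (illustrative) form: log(P_k / P_{k-1}) =
    ability - task_difficulty - rater_severity - thresholds[k-1].
    """
    # Unnormalized log-probabilities are cumulative sums of the step logits;
    # category 0 is the baseline with log-weight 0.
    log_weights = [0.0]
    for tau in thresholds:
        step = ability - task_difficulty - rater_severity - tau
        log_weights.append(log_weights[-1] + step)

    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(log_weights)
    weights = [math.exp(v - peak) for v in log_weights]
    total = sum(weights)
    return [w / total for w in weights]


def expected_score(ability, task_difficulty, rater_severity, thresholds):
    """Expected rating for one examinee-task-rater combination."""
    probs = mfrm_category_probs(ability, task_difficulty, rater_severity, thresholds)
    return sum(k * p for k, p in enumerate(probs))


if __name__ == "__main__":
    # Hypothetical values: same examinee and task, two raters of different severity.
    steps = [-1.0, 0.0, 1.0]  # thresholds for a 4-category scale (scores 0..3)
    lenient = expected_score(0.5, 0.0, -1.0, steps)
    severe = expected_score(0.5, 0.0, 1.0, steps)
    print(f"lenient rater: {lenient:.2f}, severe rater: {severe:.2f}")
```

Under these assumptions, the same examinee receives a lower expected score from the more severe rater, which is the dependence on rater characteristics that linking through common raters and tasks is meant to place on a shared scale.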


1999 ◽  
Vol 17 (2) ◽  
pp. 197-221 ◽  
Author(s):  
Patrik N. Juslin ◽  
Guy Madison

The purpose of this study was to explore whether listeners can use timing patterns to decode the intended emotional expression of musical performances. We gradually removed different acoustic cues (tempo, dynamics, timing, articulation) from piano performances rendered with various intended expressions (anger, sadness, happiness, fear) to see how such manipulations would affect a listener's ability to decode the emotional expression. The results show that (a) removing the timing patterns yielded a significant decrease in listeners' decoding accuracy, (b) timing patterns were by themselves capable of communicating some emotions with accuracy better than chance, and (c) timing patterns were less effective in communicating emotions than were tempo and dynamics. Implications for research on timing in performance are discussed.

