Bayesian approaches to the weighted kappa-like inter-rater agreement measures

2021 ◽  
Vol 30 (10) ◽  
pp. 2329-2351
Author(s):  
Quoc Duyet Tran ◽  
Haydar Demirhan ◽  
Anil Dolgun

Inter-rater agreement measures are used to estimate the degree of agreement between two or more assessors. When the agreement table is ordinal, different weight functions that incorporate row and column scores are used along with the agreement measures. The choice of row and column scores affects the estimated degree of agreement. The weighted measures are sensitive to anomalies frequently seen in agreement tables, such as unbalanced table structures or grey zones arising from the assessment behaviour of the raters. In this study, Bayesian approaches for the estimation of inter-rater agreement measures are proposed. The Bayesian approaches make it possible to include prior information on the assessment behaviour of the raters in the analysis and to impose order restrictions on the row and column scores. In this way, we improve the accuracy of the agreement measures and mitigate the impact of the anomalies on the estimated strength of agreement between the raters. The elicitation of prior distributions is described theoretically and practically for the Bayesian estimation of five agreement measures with three different weights using an agreement table having two grey zones. A Monte Carlo simulation study is conducted to assess the classification accuracy of the Bayesian and classical approaches for the considered agreement measures at a given level of agreement. Recommendations for the selection of the highest performing combination of agreement measure and weight are given by table structure and sample size.
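To make the role of the weights concrete, the following is a minimal sketch of the classical (non-Bayesian) weighted Cohen's kappa for a square ordinal agreement table. It is not the authors' Bayesian procedure. The function name `weighted_kappa`, the equally spaced category scores 0, 1, ..., k-1 and the two weight schemes shown are assumptions made for illustration; the abstract does not say which five measures or three weights the paper uses.

```python
def weighted_kappa(table, weights="linear"):
    """Weighted Cohen's kappa for a square table of counts.

    Rows are rater A's categories, columns are rater B's.
    Assumes equally spaced category scores 0..k-1 (an illustrative choice).
    """
    k = len(table)
    if k < 2 or any(len(row) != k for row in table):
        raise ValueError("table must be square with at least two categories")
    n = sum(sum(row) for row in table)
    row_marg = [sum(row) / n for row in table]
    col_marg = [sum(table[i][j] for i in range(k)) / n for j in range(k)]

    def w(i, j):
        # Agreement weight: 1 on the diagonal, smaller for distant categories.
        d = abs(i - j) / (k - 1)
        if weights == "linear":
            return 1.0 - d
        if weights == "quadratic":
            return 1.0 - d * d
        if weights == "unweighted":
            return 1.0 if i == j else 0.0
        raise ValueError("unknown weight scheme: " + weights)

    cells = [(i, j) for i in range(k) for j in range(k)]
    # Weighted observed agreement.
    p_obs = sum(w(i, j) * table[i][j] / n for i, j in cells)
    # Weighted agreement expected by chance from the marginals.
    p_exp = sum(w(i, j) * row_marg[i] * col_marg[j] for i, j in cells)
    return (p_obs - p_exp) / (1.0 - p_exp)
```

Calling `weighted_kappa(table, "quadratic")` on the same table gives partial credit to near-misses differently than `"linear"`, which is one way to see how the choice of scores and weights changes the estimated agreement.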

2018 ◽  
Vol 8 (6) ◽  
pp. 227
Author(s):  
Muhammad Arfan Lodhi ◽  
Irum Robab ◽  
Sumera Mukhtar ◽  
Hifza Farman ◽  
Sana Farrukh

This descriptive study explores the impact of washback on ESL students’ performance at secondary level. In this study, the term “washback” refers to the effect of a test on curriculum content, the learning of English, teaching, and classroom activities. Factors other than the test itself may affect positive washback; a lack of positive washback does not make a test invalid, whereas negative washback occurs when the test lacks construct validity. Test design and validity play a vital role in achieving positive washback (Messick, 1996). The study aims to investigate the effects of positive washback and its benefits for learning and teaching processes in ESL classrooms, whereas negative washback effects are destructive and can hinder the achievement of goals in ESL classrooms. The present research is descriptive in nature, and a survey-based method was adopted. Fifty teachers were selected using purposive sampling, and 100 students were selected using simple random sampling. Three tools were used: a questionnaire, a test and an observation checklist. The findings show that the negative washback effect influences tests, learning and teaching. The study concludes that language pedagogy is affected by washback. However, the majority of the teachers claim that washback affects the selection of teaching methods because exam stress brings pressure, making it necessary for English teachers to develop linguistic competence in their students. For future research, it is recommended that further studies examine the impact of washback on the strategies learners adopt while learning a second language.


2019 ◽  
Vol 8 (1) ◽  
pp. 118-148 ◽  
Author(s):  
Carla Rice ◽  
Ingrid Mündel

In this article, the authors examine the impact of using their evolving multimedia storytelling method (digital art and video) to challenge dominant representations of non-normative bodies and foster more inclusive spaces. Drawing on their collaborative work with disability and non-normatively embodied artists and communities, they investigate the challenges of negotiating what ‘access’ and ‘inclusion’ mean beyond the individualizing discourses of neoliberalism without erasing the specificities of differentially-lived experiences. Reflecting on their experiences in a variety of workshops and on a selection of videos made in those workshops, they identify and analyze three iterative ‘movements’ that mark their storytelling processes: from failure to vulnerability, from time to temporality, and from individual voice to collective concerns. The authors end by considering some of the ways they have experimented with developing an iterative workshop method that welcomes difference while simultaneously allowing for an examination of the terms of the shared space and of the mechanisms of inclusion and exclusion operating within that space.


2020 ◽  
pp. 40-48
Author(s):  
И.Р. Ерёмин

The article analyses and assesses the impact of dividends on the market value of a company. The study is based on data from company reports and statistics from the Moscow Exchange. It presents a regression analysis of a sample of the largest Russian companies for the period 2013-2019. The regression results show that dividends have a positive effect on capitalization only if the dividend policy is determined on a residual basis. The conclusions, which include recommendations and a statistical assessment of the dependence, can find practical application in determining an organization's dividend policy and in predicting changes in the capitalization of companies.


2018 ◽  
Vol 64 (4) ◽  
pp. 145-159
Author(s):  
A. Brzeziński ◽  
K. Brzeziński ◽  
T. Dybicz ◽  
Ł. Szymański

Abstract. Within the INMOP 3 research project, an attempt was made to solve a number of problems associated with the methodology of modelling travel in urban areas and the application of intermodal models. One of these is the ability to describe the behaviour of transport system users when they decide on a means of transport, and to find relationships between the factors describing a trip and the resulting choice of means of transport. The paper describes a probabilistic approach to the determination of modal split, and the application of a logistic regression model to determine the impact of variables describing individual and mass transport trips on the probability of selecting a specific means of transport. Trips in the local model of the city of Warsaw, divided into 9 motivation groups, were tested; ultimately 8 models were developed, of which 7 were deemed very well fitted (the pseudo-R² obtained was well above 0.2).
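The goodness-of-fit criterion mentioned above can be illustrated with a short sketch. The abstract does not say which pseudo-R² was used; the code below assumes McFadden's definition, 1 - LL_model / LL_null, for a binary choice (for example, public transport versus car). The function names and the toy data are hypothetical.

```python
import math


def log_likelihood(y, p):
    """Bernoulli log-likelihood of binary outcomes y under predicted probabilities p."""
    return sum(math.log(pi) if yi == 1 else math.log(1.0 - pi)
               for yi, pi in zip(y, p))


def mcfadden_pseudo_r2(y, p_model):
    """McFadden's pseudo-R2 (assumed here; the paper may use another variant).

    The null model predicts the observed share of y == 1 for every trip,
    so y must contain both outcomes.
    """
    base_rate = sum(y) / len(y)
    ll_null = log_likelihood(y, [base_rate] * len(y))
    ll_model = log_likelihood(y, p_model)
    return 1.0 - ll_model / ll_null


# Hypothetical example: 1 = trip made by public transport, 0 = by car.
choices = [1, 0, 1, 0]
fitted = [0.9, 0.1, 0.9, 0.1]   # probabilities from some fitted logistic model
r2 = mcfadden_pseudo_r2(choices, fitted)
```

A model that predicts no better than the overall modal share yields a value of 0, and values rise towards 1 as the fitted probabilities approach the observed choices.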


2017 ◽  
Vol 10 (5) ◽  
pp. 462-466 ◽  
Author(s):  
Scott L Zuckerman ◽  
Nikita Lakomkin ◽  
Jordan A Magarik ◽  
Jan Vargas ◽  
Marcus Stephens ◽  
...  

Background: The angiographic evaluation of previously coiled aneurysms can be difficult yet remains critical for determining re-treatment. Objective: The main objective of this study was to determine the inter-rater reliability for both the Raymond Scale and per cent embolization among a group of neurointerventionalists evaluating previously embolized aneurysms. Methods: A panel of 15 neurointerventionalists examined 92 distinct cases of immediate post-coil embolization and 1 year post-embolization angiographs. Each case was presented four times throughout the study, along with alterations in demographics in order to evaluate intra-rater reliability. All respondents were asked to provide the per cent embolization (0–100%) and Raymond Scale grade (1–3) for each aneurysm. Inter-rater reliability was evaluated by computing weighted kappa values (for the Raymond Scale) and intraclass correlation coefficients (ICC) for per cent embolization. Results: 10 neurosurgeons and 5 interventional neuroradiologists evaluated 368 simulated cases. The agreement among all readers employing the Raymond Scale was fair (κ=0.35) while concordance in per cent embolization was good (ICC=0.64). Clinicians with fewer than 10 years of experience demonstrated a significantly greater level of agreement than the group with greater than 10 years (κ=0.39 and ICC=0.70 vs κ=0.28 and ICC=0.58). When the same aneurysm was presented multiple times, clinicians demonstrated excellent consistency when assessing per cent embolization (ICC=0.82), but moderate agreement when employing the Raymond classification (κ=0.58). Conclusions: Identifying the per cent embolization in previously coiled aneurysms resulted in good inter- and intra-rater agreement, regardless of years of experience. The strong agreement among providers employing per cent embolization may make it a valuable tool for embolization assessment in this patient population.


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Quoc Duyet Tran ◽  
Anil Dolgun ◽  
Haydar Demirhan

Background: In an inter-rater agreement study, if two raters tend to rate considering different aspects of the subject of interest or have different experience levels, a grey zone occurs among the levels of a square contingency table showing the inter-rater agreement. These grey zones distort the degree of agreement between raters and negatively impact the decisions based on the inter-rater agreement tables. It is therefore important to know how the existence of a grey zone affects the inter-rater agreement coefficients, so that the coefficient most robust to grey zones can be chosen and more reliable decisions reached. Methods: In this article, we propose two approaches to create grey zones in simulation settings and conduct an extensive Monte Carlo simulation study to determine the impact of grey zones on the weighted inter-rater agreement measures for ordinal tables over a comprehensive simulation space. Results: The weighted inter-rater agreement coefficients are not reliable against the existence of grey zones. Increasing sample size and the number of categories in the agreement table decreases the accuracy of weighted inter-rater agreement measures when there is a grey zone. When the degree of agreement between the raters is high, the agreement measures are not significantly impacted by the existence of grey zones. However, if there is a medium to low degree of inter-rater agreement, all the weighted coefficients are more or less impacted. Conclusions: It is observed in this study that the existence of grey zones has a significant negative impact on the accuracy of agreement measures, especially for a low degree of true agreement and large sample and table sizes. In general, Gwet’s AC2 and Brennan-Prediger’s κ with quadratic or ordinal weights are reliable against the grey zones.
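For readers unfamiliar with the two coefficients singled out in the conclusions, here is a minimal sketch of the standard two-rater formulas for the weighted Brennan-Prediger coefficient and Gwet's AC2. It is not the authors' simulation code. The function names, the linear weights used by default and the example table are assumptions for illustration; the study itself reports results for quadratic and ordinal weights.

```python
def linear_weight(i, j, q):
    """Linear agreement weight for categories i and j out of q (illustrative choice)."""
    return 1.0 - abs(i - j) / (q - 1)


def identity_weight(i, j, q):
    """Unweighted case: credit only exact agreement."""
    return 1.0 if i == j else 0.0


def bp_and_ac2(table, weight=linear_weight):
    """Weighted Brennan-Prediger coefficient and Gwet's AC2 for a q x q table of counts."""
    q = len(table)
    n = sum(sum(row) for row in table)
    cells = [(i, j) for i in range(q) for j in range(q)]
    # Weighted observed agreement (shared by both coefficients).
    p_obs = sum(weight(i, j, q) * table[i][j] / n for i, j in cells)
    t_w = sum(weight(i, j, q) for i, j in cells)   # sum of all weights
    # Brennan-Prediger: chance agreement assumes uniform category use.
    p_exp_bp = t_w / q ** 2
    # Gwet: chance agreement uses the average of the two raters' marginals.
    pi = [(sum(table[c]) + sum(row[c] for row in table)) / (2 * n)
          for c in range(q)]
    p_exp_ac2 = t_w / (q * (q - 1)) * sum(p * (1.0 - p) for p in pi)
    return {
        "brennan_prediger": (p_obs - p_exp_bp) / (1.0 - p_exp_bp),
        "gwet_ac2": (p_obs - p_exp_ac2) / (1.0 - p_exp_ac2),
    }


# Hypothetical 2 x 2 table; with identity weights AC2 reduces to AC1.
result = bp_and_ac2([[40, 10], [10, 40]], weight=identity_weight)
```

Neither chance-agreement term depends on the product of the two raters' marginals in the way Cohen's kappa does, which is a plausible reason these coefficients behave differently when the marginals are distorted, although the abstract itself does not give the mechanism.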


Methodology ◽  
2007 ◽  
Vol 3 (1) ◽  
pp. 14-23 ◽  
Author(s):  
Juan Ramon Barrada ◽  
Julio Olea ◽  
Vicente Ponsoda

Abstract. The Sympson-Hetter (1985) method provides a means of controlling the maximum exposure rate of items in Computerized Adaptive Testing. Through a series of simulations, control parameters are set that mark the probability of administration of an item on being selected. This method presents two main problems: it requires a long computation time for calculating the parameters, and the maximum exposure rate is slightly above the fixed limit. Van der Linden (2003) presented two alternatives which appear to solve both of the problems. The impact of these methods on measurement accuracy has not yet been tested. We show how these methods over-restrict the exposure of some highly discriminating items and, thus, decrease accuracy. It is also shown that, when the desired maximum exposure rate is near the minimum possible value, these methods offer an empirical maximum exposure rate clearly above the goal. A new method, based on the initial estimation of the probability of administration and the probability of selection of the items with the restricted method (Revuelta & Ponsoda, 1998), is presented in this paper. It can be used with the Sympson-Hetter method and with van der Linden's two methods. This option, when used with Sympson-Hetter, speeds the convergence of the control parameters without decreasing the accuracy.
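The Sympson-Hetter mechanism summarised above can be sketched in a few lines. This is only the textbook form of the method, not the new procedure proposed in the paper. The function names, the dictionary-based item pool and the convention of returning `None` when every candidate is rejected are assumptions for illustration.

```python
import random


def update_control_parameters(p_selected, r_max):
    """One adjustment step of the Sympson-Hetter control parameters.

    p_selected maps each item to its selection probability P(S) estimated
    in the latest simulation; r_max is the target maximum exposure rate.
    Items selected more often than r_max get k = r_max / P(S); others get 1.
    """
    return {item: (r_max / p if p > r_max else 1.0)
            for item, p in p_selected.items()}


def administer(ranked_items, k, rng=random):
    """Exposure-control filter applied when an item is selected.

    ranked_items lists candidate items from most to least informative.
    Each selected item is administered with probability k[item]; a rejected
    item is skipped for this examinee and the next candidate is tried.
    """
    for item in ranked_items:
        if rng.random() < k[item]:
            return item
    return None  # hypothetical convention: no candidate passed the filter


# Hypothetical pool of two items with a target maximum exposure of 0.25.
k = update_control_parameters({"item_a": 0.5, "item_b": 0.1}, r_max=0.25)
```

In practice the simulation and the update step are repeated until the empirical exposure rates stabilise, which is the lengthy computation the abstract refers to.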


1997 ◽  
Vol 78 (04) ◽  
pp. 1189-1192 ◽  
Author(s):  
Yvonne P Graafsma ◽  
Martin H Prins ◽  
Anthonie W A Lensing ◽  
Rob J de Haan ◽  
Menno V Huisman ◽  
...  

Summary: To evaluate the bleeding classification in a recent trial on venous thrombosis treatment, a selection of reported bleeding episodes was adjudicated twice by an independent committee and graded by the treating physician and independent clinical experts on the clinical severity and impact on the patient’s life. The kappa values for the dichotomy major bleeding versus minor or no bleeding were 0.79 (95% CI, 0.57-1.0) for the agreement between the two members of the adjudication committee and 0.77 (95% CI, 0.52-1.0) for the agreement between both adjudication sessions. The kappa values for the dichotomy major or minor bleeding versus no bleeding were 0.42 and 0.44. The weighted kappa values for the agreement between the treating physician and the independent experts were 0.76 for the clinical severity and 0.79 for the impact on the patient’s life (95% CI, 0.63-0.88 and 0.70-0.89). The association between the adjudication result expressed as major bleeding or minor or no bleeding and the clinical grading by the treating physician resulted in an ROC curve with an area under the curve of 0.98 for the clinical severity and 0.99 for the impact on the patient’s life. The dichotomy major or minor bleeding versus no bleeding resulted in areas under the curve of 0.70 and 0.66. In conclusion, the applied criteria for major bleeding are reproducible and clinically relevant. The criteria for minor bleeding are not reproducible and are less associated with the observed clinical relevance.


Author(s):  
I.V. TORBINA ◽
I.R. FARDEYEVA ◽  

The paper assesses the promising varieties of winter wheat in a competitive variety test by the main economic and biological characteristics that determine the suitability of the variety for commercial use. The object of research was the authors’ own breeding material. The experiments on the selection of winter wheat were made in the experimental crop rotation pattern of the Institute.


Author(s):  
John Hunsley ◽  
Eric J. Mash

Evidence-based assessment relies on research and theory to inform the selection of constructs to be assessed for a specific assessment purpose, the methods and measures to be used in the assessment, and the manner in which the assessment process unfolds. An evidence-based approach to clinical assessment necessitates the recognition that, even when evidence-based instruments are used, the assessment process is a decision-making task in which hypotheses must be iteratively formulated and tested. In this chapter, we review (a) the progress that has been made in developing an evidence-based approach to clinical assessment in the past decade and (b) the many challenges that lie ahead if clinical assessment is to be truly evidence-based.

