measuring agreement Latest Research Papers

Many methods for measuring agreement among raters have been proposed and applied in many domains in the areas of education, psychology, sociology, and medical research. A brief overview of the most used measures of interrater absolute agreements for ordinal rating scales is provided, and a new index is proposed that has several advantages. In particular, the new index allows to evaluate the agreement between raters for each single case (subject or object), and to obtain also a global measure of the interrater agreement for the whole group of cases evaluated. The possibility of having evaluations of the agreement on the single case is particularly useful, for example, in situations where the rating scale is being tested, and it is necessary to identify any changes to it, or to request the raters for a specific comparison on the single case in which the disagreement occurred. The index is not affected by the possible concentration of ratings on a very small number of levels of the ordinal scale.

Download Full-text

On measuring agreement with numerically bounded linguistic probability schemes: A re-analysis of data from Wintle, Fraser, Wills, Nicholson, and Fidler (2019)

PLoS ONE ◽

10.1371/journal.pone.0248424 ◽

2021 ◽

Vol 16 (3) ◽

pp. e0248424

Author(s):

David R. Mandel ◽

Daniel Irwin

Keyword(s):

Intelligence Community ◽

Shared Understanding ◽

Alternative Measure ◽

Methodological Choices ◽

Wide Range ◽

The Us ◽

The Status ◽

Presentation Formats ◽

Positive Attribute ◽

Measuring Agreement

Across a wide range of domains, experts make probabilistic judgments under conditions of uncertainty to support decision-making. These judgments are often conveyed using linguistic expressions (e.g., x is likely). Seeking to foster shared understanding of these expressions between senders and receivers, the US intelligence community implemented a communication standard that prescribes a set of probability terms and assigns each term an equivalent numerical probability range. In an earlier PLOS ONE article, [1] tested whether access to the standard improves shared understanding and also explored the efficacy of various enhanced presentation formats. Notably, they found that embedding numeric equivalents in text (e.g., x is likely [55–80%]) substantially outperformed the status-quo approach in terms of the percentage overlap between participants’ interpretations of linguistic probabilities (defined in terms of the numeric range equivalents they provided for each term) and the numeric ranges in the standard. These results have important prescriptive implications, yet Wintle et al.’s percentage overlap measure of agreement may be viewed as unfairly punitive because it penalizes individuals for being more precise than the stipulated guidelines even when the individuals’ interpretations fall perfectly within the stipulated ranges. Arguably, subjects’ within-range precision is a positive attribute and should not be penalized in scoring interpretive agreement. Accordingly, in the present article, we reanalyzed Wintle et al.’s data using an alternative measure of percentage overlap that does not penalize in-range precision. Using the alternative measure, we find that percentage overlap is substantially elevated across conditions. More importantly, however, the effects of presentation format and probability level are highly consistent with the original study. By removing the ambiguity caused by Wintle et al.’s unduly punitive measure of agreement, these findings buttress Wintle et al.’s original claim that the methods currently used by intelligence organizations are ineffective at coordinating the meaning of uncertainty expressions between intelligence producers and intelligence consumers. Future studies examining agreement between senders and receivers are also encouraged to reflect carefully on the most appropriate measures of agreement to employ in their experiments and to explicate the bases for their methodological choices.

Download Full-text

krippendorffsalpha: An R Package for Measuring Agreement Using Krippendorff's Alpha Coefficient

The R Journal ◽

10.32614/rj-2021-046 ◽

2021 ◽

Vol 13 (1) ◽

pp. 413

Author(s):

John Hughes

Keyword(s):

R Package ◽

Alpha Coefficient ◽

Measuring Agreement

Download Full-text

On measuring agreement with numerically bounded linguistic probability schemes: A re-analysis of data from Wintle, Fraser, Wills, Nicholson, and Fidler (2019)

10.31234/osf.io/6atw5 ◽

2020 ◽

Author(s):

David R. Mandel ◽

Daniel Irwin

Keyword(s):

Intelligence Community ◽

Shared Understanding ◽

Alternative Measure ◽

Wide Range ◽

The Us ◽

The Status ◽

Presentation Formats ◽

Positive Attribute ◽

Measuring Agreement ◽

Support Decision Making

Across a wide range of domains, experts make probabilistic judgments under conditions of uncertainty to support decision-making. These judgments are often conveyed using linguistic expressions (e.g., x is likely). Seeking to foster shared understanding of these expressions between senders and receivers, the US intelligence community implemented a communication standard that prescribes a set of probability terms and assigns each term an equivalent numerical probability range. Wintle et al. (2019) tested whether access to the standard improves shared understanding and also explored the efficacy of various enhanced presentation formats. Notably, they found that embedding numeric equivalents in text (e.g., x is likely [55-80%]) substantially outperformed the status quo approach in terms of the percentage overlap between participants’ interpretations of linguistic probabilities (defined in terms of the numeric range equivalents they provided for each term) and the numeric ranges in the standard. These results have important prescriptive implications, yet Wintle et al.’s percentage overlap measure of agreement may be viewed as unfairly punitive since it penalizes individualsfor being more precise than the stipulated guidelines even when the individuals’ interpretations fall perfectly within the stipulated ranges. Arguably, within-range precision is a positive attribute. Accordingly, we reanalyzed Wintle et al.’s data using an alternative measure of percentage overlap that does not penalize in-range precision. Predictably, we find that percentage overlap is elevated across conditions. More importantly, the effects of presentation format and probability level are highly consistent with the original study. By removing further ambiguity, these findings buttress the claim that the methods currently used by intelligence organizations are ineffective at coordinating the meaning of uncertainty expressions between intelligence producers and intelligence consumers.

Download Full-text