Weighting Schemes and Incomplete Data: A Generalized Bayesian Framework for Chance-Corrected Interrater Agreement

2021
Author(s):  
Rutger Van Oest ◽  
Jeffrey M. Girard

Van Oest (2019) developed a framework to assess interrater agreement for nominal categories and complete data. We generalize this framework to all four situations of nominal or ordinal categories and complete or incomplete data. The mathematical solution yields a chance-corrected agreement coefficient that accommodates any weighting scheme for penalizing rater disagreements and any number of raters and categories. By incorporating Bayesian estimates of the category proportions, the generalized coefficient also captures situations in which raters classify only subsets of items; that is, incomplete data. Furthermore, this coefficient encompasses existing chance-corrected agreement coefficients: the S-coefficient, Scott’s pi, Fleiss’ kappa, and Van Oest’s uniform prior coefficient, all augmented with a weighting scheme and the option of incomplete data. We use simulation to compare these nested coefficients. The uniform prior coefficient tends to perform best, in particular, if one category has a much larger proportion than others. The gap with Scott’s pi and Fleiss’ kappa widens if the weighting scheme becomes more lenient to small disagreements and often if more item classifications are missing; missingness biases play a moderating role. The uniform prior coefficient usually performs much better than the S-coefficient, but the S-coefficient sometimes performs best for small samples, missing data, and lenient weighting schemes. The generalized framework implies a new interpretation of chance-corrected weighted agreement coefficients: These coefficients estimate the probability that both raters in a pair assign an item to its correct category without guessing. Whereas Van Oest showed this interpretation for unweighted agreement, we generalize to weighted agreement.
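A minimal sketch of such a coefficient, assuming two raters, complete data, and a user-supplied weight matrix; the function name, the linear weight matrix, and the simple pseudo-count prior below are illustrative stand-ins, not the paper's exact estimator:

```python
import numpy as np

def weighted_agreement(r1, r2, n_categories, weights=None, prior=1.0):
    # Chance-corrected weighted agreement for two raters (illustrative
    # sketch, not Van Oest and Girard's exact estimator).
    r1, r2 = np.asarray(r1), np.asarray(r2)
    K = n_categories
    if weights is None:
        weights = np.eye(K)  # identity weights = unweighted agreement

    # Observed weighted agreement: average credit over all items.
    p_obs = weights[r1, r2].mean()

    # Category proportions with uniform pseudo-counts, mimicking a
    # Bayesian uniform-prior estimate (prior=0 recovers the raw
    # Scott-style sample proportions).
    counts = np.bincount(np.concatenate([r1, r2]), minlength=K) + prior
    pi = counts / counts.sum()

    # Expected weighted agreement if both raters classified by chance.
    p_exp = pi @ weights @ pi
    return (p_obs - p_exp) / (1.0 - p_exp)

# Example: 3 ordinal categories with linear weights that grant partial
# credit for small disagreements.
K = 3
lin = 1 - np.abs(np.subtract.outer(np.arange(K), np.arange(K))) / (K - 1)
print(weighted_agreement([0, 1, 2, 2, 1, 0, 2],
                         [0, 1, 2, 1, 1, 0, 2], K, weights=lin))
```

Setting `weights` to the identity matrix reduces the sketch to unweighted chance-corrected agreement, which is where the nested coefficients above coincide.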

1970
Vol 60 (4)
pp. 1291-1296
Author(s):  
W. H. Bakun ◽  
A. Eisenberg

Computation of Fourier transforms of oscillatory signals by refined quadrature rules using uneven weighting schemes leads to spurious spectral estimates. The degree of spectral contamination can be estimated as a function of frequency by an analysis in terms of aliasing. In particular, the extended Simpson's rule introduces a secondary −1/3 amplitude aliasing across a folding frequency equal to 1/2 the Nyquist frequency. This secondary folding frequency is clearly related to the pseudo-sampling interval introduced by the quadrature weighting scheme. Quadrature-introduced aliasing results in spectral contamination within the principal band (0 to f_N Hz). An example demonstrating quadrature-introduced aliasing is given and discussed in terms of seismic signals.
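A small numerical sketch (our illustration, not the paper's computation) of the effect: weighting a sampled cosine with extended Simpson's rule weights before evaluating the Fourier sum produces a spurious peak at f_N − f0, mirrored about f_N/2, with roughly one-third the amplitude of the true peak:

```python
import numpy as np

dt = 1.0                      # sampling interval -> Nyquist f_N = 0.5 Hz
n = 256                       # even number of intervals, n + 1 samples
k = np.arange(n + 1)
f0 = 0.1                      # true signal frequency (Hz)
x = np.cos(2 * np.pi * f0 * k * dt)

# Extended Simpson weights per sample: 1/3, 4/3, 2/3, 4/3, ..., 4/3, 1/3.
w = np.full(n + 1, 2 / 3)
w[1::2] = 4 / 3
w[0] = w[-1] = 1 / 3

def quad_ft_mag(f):
    """|Fourier transform| at frequency f via the weighted quadrature sum."""
    return abs(np.sum(w * x * np.exp(-2j * np.pi * f * k * dt)) * dt)

main = quad_ft_mag(f0)        # genuine spectral peak
alias = quad_ft_mag(0.5 - f0) # spurious peak at f_N - f0
print(alias / main)           # ~1/3, matching the aliasing analysis
```

The alternating part of the Simpson weights (±1/3 with a two-sample period) modulates the signal at the Nyquist frequency, which is what folds the true peak about f_N/2 into the principal band.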


2021
pp. 1-12
Author(s):  
K. Seethappan ◽  
K. Premalatha

Although there have been various studies on detecting different types of figurative language, no prior work addresses the automatic classification of euphemisms. Our primary contribution is a system for the automatic classification of euphemistic phrases in a document. In this research, a large dataset of 100,000 sentences is collected from different sources for identifying euphemistic and non-euphemistic utterances. Several approaches are explored to improve euphemism classification: (1) combinations of lexical n-gram features, (2) three feature-weighting schemes, and (3) deep learning classification algorithms. Four machine learning algorithms (J48, Random Forest, Multinomial Naïve Bayes, and SVM) and three deep learning algorithms (Multilayer Perceptron, Convolutional Neural Network, and Long Short-Term Memory) are investigated with various combinations of features and feature-weighting schemes to classify the sentences. In our experiments, the Convolutional Neural Network (CNN) achieves a precision of 95.43%, recall of 95.06%, F-score of 95.25%, accuracy of 95.26%, and kappa of 0.905 using a combination of unigram and bigram features with the TF-IDF feature-weighting scheme. These results show that a CNN with the combined unigram and bigram feature set and TF-IDF weighting outperforms the other six classification algorithms in detecting euphemisms in our dataset.
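A hedged sketch of the feature side of such a pipeline, with a linear SVM standing in for the paper's CNN and placeholder sentences in place of the actual dataset:

```python
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Unigram + bigram lexical features with TF-IDF weighting; the toy
# sentences and labels below are placeholders, not the study data.
train_texts = ["he passed away last night", "he died last night"]
train_labels = [1, 0]  # 1 = euphemism, 0 = non-euphemism

clf = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),  # unigrams + bigrams
    ("svm", LinearSVC()),
])
clf.fit(train_texts, train_labels)
print(clf.predict(["she passed away peacefully"]))
```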


Author(s):  
William P Fox ◽  
Gregory Spence ◽  
Reed Kitchen ◽  
Steven Powell

This article compares the entropy weighting scheme to other, subjective weighting schemes using various multi-attribute decision-making criteria. We apply the entropy weighting scheme to improve the CARVER center-of-gravity and targeting analyses currently used by Special Operations Forces. We also compare the entropy weighting scheme to other weighting schemes in ranking terrorists for targeting. Next, we apply several multi-attribute decision-making (MADM) methods with the various suggested weighting schemes to obtain rankings of the alternatives. We compare the results and provide sensitivity analysis to examine the robustness of each MADM analysis. We conclude that, for CARVER and terrorist ranking, a decision methodology that computes weights from the actual data collected might be better than one relying on subjective weights.
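For illustration, the standard entropy-weight recipe that such comparisons build on can be sketched as follows; the decision matrix below is hypothetical:

```python
import numpy as np

def entropy_weights(X):
    """Objective criterion weights from Shannon entropy (standard MADM recipe).

    X is an (alternatives x criteria) matrix of benefit-type scores.
    """
    X = np.asarray(X, dtype=float)
    m, _ = X.shape
    P = X / X.sum(axis=0)                       # column-normalize to proportions
    with np.errstate(divide="ignore", invalid="ignore"):
        plogp = np.where(P > 0, P * np.log(P), 0.0)
    e = -plogp.sum(axis=0) / np.log(m)          # entropy of each criterion
    d = 1.0 - e                                 # degree of diversification
    return d / d.sum()                          # normalized weights

# Hypothetical decision matrix: 4 alternatives scored on 3 criteria.
X = [[7, 9, 9],
     [8, 7, 8],
     [9, 6, 8],
     [6, 7, 8]]
print(entropy_weights(X))  # criteria with more spread receive more weight
```

The key design choice is that criteria whose scores barely discriminate between alternatives (high entropy) get low weight, so the weights come from the data rather than from an analyst's judgment.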


2012
Vol 52 (No. 11)
pp. 510-515
Author(s):  
M. Prášilová ◽  
J. Šulc

Price indices of agricultural producers and forestry price indices were revised in the year 2000. The revision updated the set of commodities and re-based the calculation of agricultural price indices from the current base period to a new one. The weighting scheme for products and their aggregated groups was also changed. This contribution analyzes a new project for constructing agricultural producer price indices according to the Eurostat methodology. The project takes different approaches to constructing the weighting schemes for seasonal products and for products without seasonality.


2020
Author(s):  
Julie Inge-Marie Helene Borchsenius ◽  
Rasmus Hasselbalch ◽  
Morten Lind ◽  
Lisbet Ravn ◽  
Thomas Kallemose ◽  
...  

Introduction: Systematic triage is performed in the Emergency Department (ED) to assess the urgency of care for each patient. The Copenhagen Triage Algorithm (CTA) is a newly developed, evidence-based triage system; however, its interrater agreement remains unknown. Method: This was a prospective cohort study. Data were collected in the three sections (Acute/Cardiology, Medicine, and Surgery) of the ED of Herlev Hospital. Patients were assessed independently by two different nurses using CTA. The interrater variability of CTA was calculated using Fleiss kappa. The analysis was stratified according to less or more than 2 years of ED experience. Results: A total of 110 patients were included, of whom 10 were excluded due to incomplete data. The raters agreed on the triage category 80% of the time, corresponding to a kappa value of 0.70 (95% confidence interval 0.57-0.83). Stratified by ED section, the agreement was 83% in the Acute/Cardiology section, corresponding to a kappa value of 0.73 (0.55-0.91); 79% in the Medicine section, corresponding to a kappa value of 0.64 (0.39-0.89); and 56% in the Surgery section, corresponding to a kappa value of 0.56 (0.21-0.90). The experienced raters had an interrater agreement of 0.73 (0.56-0.90), while the less experienced raters had an agreement of 0.76 (0.28-1.24). Conclusion: A substantial interrater agreement was found for the Copenhagen Triage Algorithm.
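For illustration, a kappa of this kind can be computed as sketched below; the ratings are synthetic stand-ins for the study data, and the category codes are assumptions:

```python
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# Synthetic two-rater triage data (not the study data): each row is one
# patient, columns are the two nurses' CTA categories (0-3 are assumed codes).
ratings = np.array([
    [0, 0], [1, 1], [2, 2], [1, 2], [3, 3],
    [0, 1], [2, 2], [1, 1], [0, 0], [3, 2],
])

print((ratings[:, 0] == ratings[:, 1]).mean())  # raw percent agreement

table, _ = aggregate_raters(ratings)            # item x category count table
print(fleiss_kappa(table, method="fleiss"))     # chance-corrected agreement
```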


2019
Vol 55 (5)
pp. 704-721
Author(s):  
Keenan A. Pituch ◽  
Megha Joshi ◽  
Molly E. Cain ◽  
Tiffany A. Whittaker ◽  
Wanchen Chang ◽  
...  

Author(s):  
Chao Hu ◽  
Byeng D. Youn ◽  
Pingfeng Wang

The traditional data-driven prognostic approach is to construct multiple candidate algorithms using a training data set, evaluate their respective performance using a testing data set, and select the one with the best performance while discarding all the others. This approach has three shortcomings: (i) the selected standalone algorithm may not be robust, i.e., it may be less accurate when the real data acquired after deployment differ from the testing data; (ii) it wastes the resources spent constructing the algorithms that are discarded before deployment; and (iii) it requires testing data in addition to training data, which increases the overall expense of algorithm selection. To overcome these drawbacks, this paper proposes an ensemble data-driven prognostic approach that combines multiple member algorithms with a weighted-sum formulation. Three weighting schemes, namely accuracy-based, diversity-based, and optimization-based weighting, are proposed to determine the weights of the member algorithms for data-driven prognostics. k-fold cross-validation (CV) is employed to estimate the prediction error required by the weighting schemes. Two case studies demonstrate the effectiveness of the proposed prognostic approach. The results suggest that the ensemble approach with any of the weighting schemes gives more accurate remaining useful life (RUL) predictions than any sole algorithm, and that the optimization-based weighting scheme gives the best overall performance of the three.
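A sketch of the accuracy-based scheme under stated assumptions: inverse-CV-error weights are one plausible instantiation (not necessarily the paper's exact formula), and generic sklearn regressors stand in for the member prognostic algorithms:

```python
import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from sklearn.linear_model import Ridge
from sklearn.ensemble import RandomForestRegressor
from sklearn.neighbors import KNeighborsRegressor

# Placeholder data; in practice X, y would be run-to-failure features
# and remaining-useful-life targets.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ rng.normal(size=5) + rng.normal(scale=0.1, size=200)

members = [Ridge(),
           RandomForestRegressor(n_estimators=50, random_state=0),
           KNeighborsRegressor()]

# Estimate each member's prediction error with k-fold CV, then weight
# each member by its inverse error (accuracy-based weighting).
cv = KFold(n_splits=5, shuffle=True, random_state=0)
errors = np.array([
    -cross_val_score(m, X, y, cv=cv, scoring="neg_mean_squared_error").mean()
    for m in members
])
weights = (1 / errors) / (1 / errors).sum()

# Weighted-sum ensemble prediction (on training data, for the sketch only).
ensemble_pred = sum(w * m.fit(X, y).predict(X) for w, m in zip(weights, members))
print(weights)
```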


Crisis
2013
Vol 34 (6)
pp. 434-437
Author(s):  
Donald W. MacKenzie

Background: Suicide clusters at Cornell University and the Massachusetts Institute of Technology (MIT) prompted popular and expert speculation of suicide contagion. However, some clustering is to be expected in any random process. Aim: This work tested whether suicide clusters at these two universities differed significantly from those expected under a homogeneous Poisson process, in which suicides occur randomly and independently of one another. Method: Suicide dates were collected for MIT and Cornell for 1990–2012. The Anderson-Darling statistic was used to test the goodness-of-fit of the intervals between suicides against the distribution expected under the Poisson process. Results: Suicides at MIT were consistent with the homogeneous Poisson process, while those at Cornell showed clustering inconsistent with such a process (p = .05). Conclusions: The Anderson-Darling test provides a statistically powerful means to identify suicide clustering in small samples. Practitioners can use this method to test for clustering in relevant communities. The difference in clustering behavior between the two institutions suggests that more institutions should be studied to determine the prevalence of suicide clustering in universities and its causes.
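A sketch of the test under the stated model: a homogeneous Poisson process implies exponentially distributed inter-event intervals, so the observed intervals can be checked against the exponential with scipy's Anderson-Darling routine (the event dates below are synthetic placeholders, not the study data):

```python
import numpy as np
from scipy import stats

# Synthetic event days over a 23-year window (placeholder, not study data).
rng = np.random.default_rng(1)
event_days = np.sort(rng.choice(365 * 23, size=15, replace=False))
intervals = np.diff(event_days)  # days between consecutive events

# Poor fit to the exponential (large statistic vs. critical values)
# indicates clustering inconsistent with a homogeneous Poisson process.
result = stats.anderson(intervals, dist="expon")
print(result.statistic, result.critical_values, result.significance_level)
```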

