A practical application of analysing weighted kappa for panels of experts and EQA schemes in pathology

Background: Kappa statistics are frequently used to analyse observer agreement for panels of experts and External Quality Assurance (EQA) schemes and generally treat all disagreements as total disagreement. However, the differences between ordered categories may not be of equal importance (eg, the difference between grades 1 vs 2 compared with 1 vs 3). Weighted kappa can be used to adjust for this when comparing a small number of readers, but this has not as yet been applied to the large number of readers typical of a national EQA scheme. Aim: To develop and validate a method for applying weighted kappa to a large number of readers within the context of a real dataset: the UK National Urological Pathology EQA Scheme for prostatic biopsies. Methods: Data on Gleason grade recorded by 19 expert readers were extracted from the fixed text responses of 20 cancer cases from four circulations of the EQA scheme. Composite kappa, currently used to compute an unweighted kappa for large numbers of readers, was compared with the mean kappa for all pairwise combinations of readers. Weighted kappa generalised for multiple readers was compared with the newly developed ‘pairwise-weighted’ kappa. Results: For unweighted analyses, the median increase from composite to pairwise kappa was 0.006 (range −0.005 to +0.052). The difference between the pairwise-weighted kappa and generalised weighted kappa for multiple readers never exceeded ±0.01. Conclusion: Pairwise-weighted kappa is a suitable and highly accurate approximation to weighted kappa for multiple readers.
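
The pairwise idea described in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: it assumes that "pairwise-weighted" kappa is the mean of the two-reader weighted kappa over all pairs of readers, and it assumes linear disagreement weights, since the abstract does not state the weighting scheme. The function names `weighted_kappa` and `pairwise_weighted_kappa` are hypothetical.

```python
from itertools import combinations


def weighted_kappa(a, b, n_categories):
    """Linearly weighted Cohen's kappa for two readers.

    a, b: equal-length lists of ordinal ratings coded 0..n_categories-1.
    Linear weights are an assumption; the abstract does not name a scheme.
    Undefined (division by zero) if both readers use one identical category
    for every case.
    """
    n = len(a)
    k = n_categories
    # Observed joint proportions of (rating by reader a, rating by reader b).
    obs = [[0.0] * k for _ in range(k)]
    for x, y in zip(a, b):
        obs[x][y] += 1.0 / n
    row = [sum(obs[i]) for i in range(k)]
    col = [sum(obs[i][j] for i in range(k)) for j in range(k)]
    # Disagreement weight |i - j| / (k - 1): 0 on the diagonal, 1 at the extremes.
    d_obs = sum(abs(i - j) / (k - 1) * obs[i][j]
                for i in range(k) for j in range(k))
    d_exp = sum(abs(i - j) / (k - 1) * row[i] * col[j]
                for i in range(k) for j in range(k))
    return 1.0 - d_obs / d_exp


def pairwise_weighted_kappa(ratings, n_categories):
    """Mean weighted kappa over all pairs of readers.

    ratings: list with one list of ratings per reader, all over the same cases.
    """
    pairs = list(combinations(ratings, 2))
    return sum(weighted_kappa(a, b, n_categories) for a, b in pairs) / len(pairs)
```

With 19 readers this averages 171 two-reader kappas, which is why the pairwise form stays cheap even for a national scheme.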

Dental Arch Relationship in Children with Complete Unilateral Cleft Lip and Palate following Warsaw (One-Stage Repair) and Oslo Protocols

The Cleft Palate-Craniofacial Journal ◽

10.1597/09-010.1 ◽

2009 ◽

Vol 46 (6) ◽

pp. 648-653 ◽

Cited By ~ 30

Author(s):

Piotr Fudalej ◽

Maria Hortis-Dzierzbicka ◽

Zofia Dudkiewicz ◽

Gunvor Semb

Keyword(s):

Cleft Lip ◽

Cleft Lip And Palate ◽

Weighted Kappa ◽

Dental Arch ◽

Kappa Statistics ◽

Consecutive Series ◽

One Stage ◽

Mother And Child ◽

The Difference ◽

Dental Arch Relationship

Objective: To compare the dental arch relationship following one-stage repair of unilateral cleft lip and palate (UCLP) in Warsaw with a matched sample of patients treated by the Oslo Cleft Team. Material: Study models of 61 children (mean age, 11.2; SD, 1.7) with a nonsyndromic complete UCLP consecutively treated with one-stage closure of the cleft at 9.2 months (range, 6.0 to 15.8 months; SD, 2.0) by the Warsaw Cleft Team at the Institute of Mother and Child, Poland, were compared with a sample drawn from a consecutive series of patients with UCLP treated by the Oslo Cleft Team and matched for age, gender, and soft tissue band. Methods: The study models were given random numbers to blind their origin. Four examiners rated the dental arch relationship using the GOSLON Yardstick. The strength of agreement of rating was assessed with weighted Kappa statistics. An independent t-test was carried out to compare the GOSLON scores between Warsaw and Oslo samples, and Fisher's exact tests were performed to evaluate the difference of distribution of the GOSLON scores. Results: The intrarater and interrater agreements were high (K ≥ .800). No difference in dental arch relationship between Warsaw and Oslo groups was found (mean GOSLON score = 2.68 and 2.65 for Warsaw and Oslo samples, respectively). The distribution of the GOSLON grades was similar in both groups. Conclusions: The dental arch relationship following one-stage repair (Warsaw protocol) was comparable with the outcome of the Oslo Cleft Team's protocol.

Multi-Institutional Validation of the Predictive Power of the Hematopoietic Cell Transplantation Comorbidity Index (HCT-CI) for HCT Outcomes

Blood ◽

10.1182/blood.v118.21.145.145 ◽

2011 ◽

Vol 118 (21) ◽

pp. 145-145 ◽

Cited By ~ 3

Author(s):

Mohamed L. Sorror ◽

Fabiana Ostronoff ◽

Rainer Storb ◽

Smita Bhatia ◽

Richard T. Maziarz ◽

...

Keyword(s):

Sample Size ◽

Hematopoietic Cell Transplantation ◽

Proportional Hazards ◽

Conflicts Of Interest ◽

Weighted Kappa ◽

Observer Agreement ◽

Kappa Statistics ◽

Observer Variability ◽

Allogeneic Hct ◽

Inter Observer Variability

Abstract 145: In 2005, the HCT-CI was introduced as a weighted scoring system to predict mortality risk following allogeneic HCT. Since then, not all investigators were able to validate the HCT-CI after testing in their respective institutions. In 2007, a collaborative multi-institutional study was initiated to investigate 1) whether the HCT-CI was predictive of outcomes across different institutions, 2) the degree of homogeneity of outcome prediction, and 3) the reasons for lack of agreement among investigators. To this end, data were collected from 3347 consecutive patients (pts) treated with allogeneic HCT between 2000 and 2006 from HLA-matched related or unrelated donors at 5 institutions. All data were collected by a single investigator, blinded to the final outcomes of pts, to ensure consistent comorbidity coding. Numbers of pts, percentages of available comorbidity data, and other transplant and pt characteristics were statistically significantly different among institutions (Table 1). Pts missing comorbidity or other covariate data were excluded from further analyses, yielding a final sample size of 2523.

Table 1: Pre-transplant risk factors among the five institutions

Risk factor | A (n=1073), % | B (n=973), % | C (n=336), % | D (n=237), % | E (n=206), % | p
Missing comorbidity data | <1202623 | <0.001
HCT-CI score 0 | 29 | 30 | 32 | 42 | 32 | <0.001
HCT-CI score 1,2 | 34 | 28 | 29 | 28 | 22 |
HCT-CI score ≥3 | 37 | 43 | 39 | 30 | 46 |
Donor: unrelated | 50 | 38 | 51 | 40 | 31 | <0.001
Age, years: ≥50 | 42 | 29 | 47 | 21 | 51 | <0.001
Conditioning regimen: high-dose | 53 | 67 | 79 | 67 | 46 | <0.001
Conditioning regimen: reduced-intensity | 13 | 29 | 10 | 13 | 31 |
Conditioning regimen: nonmyeloablative | 34 | 4 | 10 | 21 | 23 |
ATG in regimen | 11431514 | <0.001
Diagnosis: myeloid | 63 | 56 | 59 | 57 | 51 | <0.001
Diagnosis: lymphoid | 28 | 41 | 38 | 25 | 46 |
Diagnosis: other cancers | 2 | 3 | 1 | 3 | 1 |
Diagnosis: non-malignant diseases | 7 | 0 | 2 | 15 | 4 |
Disease risk: high | 59 | 62 | 67 | 51 | 67 | <0.001
Stem cell source: marrow | 19 | 19 | 24 | 56 | 10 | <0.001
Pt CMV: positive | 56 | 73 | 70 | 65 | 51 | <0.001
KPS ≤80 | 29 | 18 | 30 | 38 | 25 | <0.001
Prior regimens ≥4 | 23 | 22 | 24 | 20 | 30 | 0.25

Overall, pts with HCT-CI scores of 0 vs. 1–2 vs. ≥3 had 2-year non-relapse mortality (NRM) rates of 14%, 23%, and 39% (p <0.0001), respectively, and 2-year overall survival (OS) rates of 74%, 61%, and 39% (p <0.0001), respectively. Proportional hazards models were used to estimate the hazard ratio (HR) for NRM and OS associated with HCT-CI scores in each of the 5 institutions (Table 2). The models were adjusted for covariates in Table 1. Increased HCT-CI scores were associated with increases in the HR for NRM and OS across all 5 institutions and these increases were highly statistically significant except for institution E, which had the smallest sample size. Of note, the magnitudes of increases in HRs were not entirely comparable across institutions. In a unified model including all institutions, we found a statistically significant lack of homogeneity across institutions for the HRs associated with scores 1–2 (p=0.03) and ≥3 (p=0.04) for NRM and with scores ≥3 (p=0.01) for OS but not with scores 1–2 for OS (p=0.18). We also found a statistically significant, independent impact of institution on NRM (p=0.001) and OS (p<0.001).

Table 2: Multivariate risk model

Institution | NRM HR, score 0 | NRM HR, score 1–2 | NRM HR, score ≥3 | NRM p | OS HR, score 0 | OS HR, score 1–2 | OS HR, score ≥3 | OS p
A | 1.0 | 1.4 | 2.5 | <0.0001 | 1.0 | 1.36 | 2.23 | <0.0001
B | 1.0 | 2.88 | 4.15 | <0.0001 | 1.0 | 1.88 | 2.77 | <0.0001
C | 1.0 | 1.3 | 3.62 | <0.0001 | 1.0 | 1.33 | 3.28 | <0.0001
D | 1.0 | 1.65 | 6.89 | <0.0001 | 1.0 | 1.84 | 5.81 | <0.0001
E | 1.0 | 1.76 | 2.66 | 0.09 | 1.0 | 1.13 | 2.28 | 0.09

We then assessed, among 80 pts from institution A, the inter-observer variability in scoring comorbidity between two individual investigators and between each of them and unknown individuals from a pool of other evaluators. Weighted kappa statistics were highest (0.59) between two single evaluators and lowest between each and multiple evaluators (0.43 and 0.55, respectively). The principal investigator then developed a comprehensive guideline to code comorbidities and used it to train the other single investigator in a single session. Additional evaluation of inter-observer agreement demonstrated marked improvement of the weighted kappa statistic to 0.78. The reported disagreements on the validity of the HCT-CI may be explained by different institutional experiences in managing transplant pts, small number of pts at some institutions, and inter-observer variability in score assignment. The HCT-CI is valid to discriminate relative risks of mortalities after HCT across different institutions and should be used regularly for counseling pts and clinical trial design. Efforts to improve methods for coding comorbidity are in progress. Disclosures: No relevant conflicts of interest to declare.

Measuring Agreement for Ordered Ratings in 3 x 3 Tables

Methods of Information in Medicine ◽

10.1055/s-0038-1634116 ◽

2006 ◽

Vol 45 (05) ◽

pp. 541-547 ◽

Cited By ~ 3

Author(s):

P. Aubas ◽

F. Seguret ◽

A. Kramar ◽

P. Dujols ◽

D. Neveu

Keyword(s):

Qualitative Agreement ◽

Kappa Statistic ◽

Weighted Kappa ◽

Kappa Statistics ◽

Data Sets ◽

Kappa Index ◽

Qualitative Variable ◽

Weighted Kappa Statistic ◽

The Difference ◽

Measuring Agreement

Summary. Objectives: When two raters consider a qualitative variable ordered according to three categories, the qualitative agreement is commonly assessed with a symmetrically weighted kappa statistic. However, these statistics can present paradoxes, since they may be insensitive to variations of either complete agreements or disagreements. Methods: Agreement may be summarized by the relative amounts of complete agreements, partial and maximal disagreements beyond chance. Fixing the marginal totals and the trace, we computed symmetrically weighted kappa statistics and we developed a new statistic for qualitative agreements. Data sets from the literature were used to illustrate the methods. Results: We show that agreement may be better assessed with the unweighted kappa index, κc, and a new statistic ζ, which assesses the excess of maximal disagreements with respect to the partial ones, and does not depend on a particular weighting system. When ζ is equal to zero, maximal and partial disagreements beyond chance are equal. With its estimated large sample variance, we compared the values of two contingency tables. Conclusions: The (κc, ζ) pair is sensitive to variations in agreements and/or disagreements and enables locating the difference between two qualitative agreements. The qualitative agreement is better with increasing values of κc and ζ.
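
For orientation, the standard symmetrically weighted kappa for a 3 x 3 table can be written out as below. The notation is an assumption introduced here for illustration (p_ij for observed cell proportions, p_i. and p_.j for the marginals, w for the single weight given to partial disagreements); the abstract does not give a formula for ζ, so it is not reproduced.

```latex
% Weighted kappa with agreement weights w_{ij} (w_{ii} = 1, w_{ij} = w_{ji}):
\kappa_w = \frac{\sum_{i,j} w_{ij}\, p_{ij} \;-\; \sum_{i,j} w_{ij}\, p_{i\cdot}\, p_{\cdot j}}
                {1 \;-\; \sum_{i,j} w_{ij}\, p_{i\cdot}\, p_{\cdot j}}

% For three ordered categories, a symmetric scheme has one free weight w:
%   w_{12} = w_{21} = w_{23} = w_{32} = w   (partial disagreement)
%   w_{13} = w_{31} = 0                     (maximal disagreement)
% w = 0   gives the unweighted kappa,
% w = 1/2 gives linear weights,
% w = 3/4 gives quadratic weights.
```

Because the diagonal cells, the partial-disagreement cells and the maximal-disagreement cells all enter through one weighted sum, different tables with the same marginals can yield the same value of the weighted statistic, which is the kind of insensitivity the abstract describes.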

Reliability of the modified lateral pillar classification for Legg Calvé Perthes disease performed by a large group of international paediatric orthopaedic surgeons

Journal of Children's Orthopaedics ◽

10.1302/1863-2548.14.200055 ◽

2020 ◽

Vol 14 (6) ◽

pp. 529-536

Author(s):

Jennifer C. Laine ◽

Susan A. Novotny ◽

Stefan Huhnstock ◽

Andrew J. Ries ◽

John E. Tis ◽

...

Keyword(s):

Gold Standard ◽

Weighted Kappa ◽

Observer Agreement ◽

Perthes Disease ◽

Kappa Statistics ◽

Level Of Evidence ◽

Lateral Pillar ◽

Orthopaedic Surgeons ◽

Observer Reliability ◽

Paediatric Orthopaedic

Purpose: The modified lateral pillar classification (mLPC) is used for prognostication in the fragmentation stage of Legg Calvé Perthes disease. Previous reliability assessments of mLPC range from fair to good agreement when evaluated by a small number of observers with pre-selected radiographs. The purpose of this study was to determine the inter-observer and intra-observer reliability of mLPC performed by a group of international paediatric orthopaedic surgeons. Surgeons self-selected the radiograph for mLPC assessment, as would be done clinically. Methods: In total, 40 Perthes cases with serial radiographs were selected. For each case, 26 surgeons independently selected a radiograph and assigned mLPC, and 21 raters re-evaluated the same 40 cases to establish intra-observer reliability. Rater performance was determined through surgeon consensus using the mode mLPC as ‘gold standard’. Inter-observer and intra-observer reliability data were analysed using weighted kappa statistics. Results: The weighted kappa for inter-observer correlation for mLPC was 0.64 (95% confidence interval: 0.55 to 0.74) and was 0.82 (range: 0.35 to 0.99) for intra-observer correlation. Individual surgeons’ overall performance varied from 48% to 88% agreement. Surgeon mLPC performance was not influenced by years of experience (p = 0.51). Radiograph selection did not influence gold standard assignment of mLPC. There was greater agreement on cases of mild B hips and severe C hips. Conclusions: mLPC has inter-observer agreement at the low end of the good range when performed by a large number of surgeons with varied experience. Surgeons frequently chose different radiographs, with no impact on mLPC agreement. Further refinement is needed to help differentiate hips on the border of group B and C. Level of evidence: III
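
The consensus step in the Methods (mode rating as 'gold standard', then each surgeon's percentage agreement with it) can be sketched as follows. This is a minimal illustration, not the study's code: the function names, the data layout and the tie-breaking rule are assumptions, and the abstract does not say how ties in the mode were resolved.

```python
from collections import Counter


def mode_rating(ratings):
    """Most frequent rating for one case.

    Ties go to the rating seen first (Counter ordering); this tie rule is an
    assumption made for the sketch.
    """
    return Counter(ratings).most_common(1)[0][0]


def agreement_with_mode(ratings_by_rater):
    """Fraction of cases on which each rater matches the per-case mode.

    ratings_by_rater: dict mapping a rater label to a list of ratings,
    one per case, with cases in the same order for every rater.
    """
    raters = list(ratings_by_rater)
    n_cases = len(ratings_by_rater[raters[0]])
    # Consensus ("gold standard") rating for each case across all raters.
    gold = [mode_rating([ratings_by_rater[r][c] for r in raters])
            for c in range(n_cases)]
    return {
        r: sum(x == g for x, g in zip(ratings_by_rater[r], gold)) / n_cases
        for r in raters
    }
```

Note that each rater contributes to the mode they are then scored against, so with few raters this measure is somewhat optimistic.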

ATTITUDE TO OPEN ACCESS IN RUSSIAN SCHOLARLY COMMUNITY: 2018. SURVEY RESULTS AND ANALYSIS

Scholarly Research and Information ◽

10.24108/2658-3143-2018-1-1-6-21 ◽

2018 ◽

Vol 1 (1) ◽

pp. 6-21 ◽

Cited By ~ 2

Author(s):

I. K. Razumova ◽

N. N. Litvinova ◽

M. E. Shvartsman ◽

A. Yu. Kuznetsov

Keyword(s):

Open Access ◽

Open Access Publishing ◽

Scholarly Community ◽

Research Areas ◽

Survey Results ◽

The Difference ◽

Policies And Programs ◽

The Uk ◽

High Level ◽

Access Policies

Introduction. The paper presents survey results on the awareness of and practice of Open Access scholarly publishing among Russian academics. Materials and Methods. We employed methods of statistical analysis of survey results. Materials comprise results of data processing of a Russian survey conducted in 2018 and published results of the latest international surveys. The survey comprised 1383 respondents from 182 organizations. We performed comparative studies of the responses from academics and research institutions as well as different research areas. The study compares results obtained in Russia with the recently published results of surveys conducted in the United Kingdom and Europe. Results. Our findings show that 95% of Russian respondents support open access, 94% agree to post their publications in open repositories and 75% have experience in open access publishing. We did not find any difference in the awareness of and attitude towards open access among seven reference groups. Our analysis revealed a difference in the structure of open access publications between authors from universities and those from research institutes. Discussion and Conclusions. Results reveal a high level of awareness of and support for open access and successful practice of open access publishing in the Russian scholarly community. The results for Russia closely resemble those for UK academics. Governmental open access policies and programs would foster the practical realization of open access in Russia.

THE VALUE OF ROBUST STATISTICAL FORECASTS IN THE COVID-19 PANDEMIC

National Institute Economic Review ◽

10.1017/nie.2021.9 ◽

2021 ◽

Vol 256 ◽

pp. 19-43

Author(s):

Jennifer L. Castle ◽

Jurgen A. Doornik ◽

David F. Hendry

Keyword(s):

Structural Change ◽

Measurement Errors ◽

Epidemiological Models ◽

Short Term ◽

Data Measurement ◽

Policy Interventions ◽

Changing Trends ◽

Technological Advances ◽

The Difference ◽

The Uk

The Covid-19 pandemic has put forecasting under the spotlight, pitting epidemiological models against extrapolative time-series devices. We have been producing real-time short-term forecasts of confirmed cases and deaths using robust statistical models since 20 March 2020. The forecasts are adaptive to abrupt structural change, a major feature of the pandemic data due to data measurement errors, definitional and testing changes, policy interventions, technological advances and rapidly changing trends. The pandemic has also led to abrupt structural change in macroeconomic outcomes. Using the same methods, we forecast aggregate UK unemployment over the pandemic. The forecasts rapidly adapt to the employment policies implemented when the UK entered the first lockdown. The difference between our statistical and theory based forecasts provides a measure of the effect of furlough policies on stabilising unemployment, establishing useful scenarios had furlough policies not been implemented.

Gender inequality in COVID-19 times: evidence from UK prolific participants

Journal of Demographic Economics ◽

10.1017/dem.2021.2 ◽

2021 ◽

pp. 1-27

Author(s):

Sonia Oreffice ◽

Climent Quintana-Domeque

Keyword(s):

Mental Health ◽

Gender Inequality ◽

Gender Gap ◽

Gender Gaps ◽

Health Concerns ◽

Unemployment Rates ◽

Future State ◽

Multiple Dimensions ◽

The Difference ◽

The Uk

Abstract: We investigate gender differences across multiple dimensions after 3 months of the first UK lockdown of March 2020, using an online sample of approximately 1,500 Prolific respondents resident in the UK. We find that women's mental health was worse than men's along the four metrics we collected data on, that women were more concerned about getting and spreading the virus, and that women perceived the virus as more prevalent and lethal than men did. Women were also more likely to expect a new lockdown or virus outbreak by the end of 2020, and were more pessimistic about the contemporaneous and future state of the UK economy, as measured by their forecasted contemporaneous and future unemployment rates. We also show that between earlier in 2020 before the outbreak of the Coronavirus pandemic and June 2020, women had increased childcare and housework more than men. Neither the gender gaps in COVID-19-related health and economic concerns nor the gender gaps in the increase in hours of childcare and housework can be accounted for by a rich set of control variables. Instead, we find that the gender gap in mental health can be partially accounted for by the difference in COVID-19-related health concerns between men and women.

Interchangeability of light and virtual microscopy for histopathological evaluation of prostate cancer

Scientific Reports ◽

10.1038/s41598-021-82911-z ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Renata Zelic ◽

Francesca Giunchi ◽

Luca Lianas ◽

Cecilia Mascia ◽

Gianluigi Zanetti ◽

...

Keyword(s):

Prostate Cancer ◽

Virtual Microscopy ◽

Observer Agreement ◽

Gleason Grade ◽

Gleason Grading ◽

Gleason Pattern ◽

Inter Observer Variability ◽

Histopathological Evaluation ◽

Good Repeatability ◽

Repeatability And Reproducibility

Abstract: Virtual microscopy (VM) holds promise to reduce subjectivity as well as intra- and inter-observer variability for the histopathological evaluation of prostate cancer. We evaluated (i) the repeatability (intra-observer agreement) and reproducibility (inter-observer agreement) of the 2014 Gleason grading system and other selected features using standard light microscopy (LM) and an internally developed VM system, and (ii) the interchangeability of LM and VM. Two uro-pathologists reviewed 413 cores from 60 Swedish men diagnosed with non-metastatic prostate cancer 1998–2014. Reviewer 1 performed two reviews using both LM and VM. Reviewer 2 performed one review using both methods. The intra- and inter-observer agreement within and between LM and VM were assessed using Cohen’s kappa and Bland and Altman’s limits of agreement. We found good repeatability and reproducibility for both LM and VM, as well as interchangeability between LM and VM, for primary and secondary Gleason pattern, Gleason Grade Groups, poorly formed glands, cribriform pattern and comedonecrosis but not for the percentage of Gleason pattern 4. Our findings confirm the non-inferiority of VM compared to LM. The repeatability and reproducibility of percentage of Gleason pattern 4 was poor regardless of method used, warranting further investigation and improvement before it is used in clinical practice.
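
For a continuous measure such as the percentage of Gleason pattern 4, Bland and Altman's limits of agreement are conventionally the mean difference between two sets of paired measurements plus or minus 1.96 standard deviations of the differences. The sketch below shows that textbook calculation only; the function name is hypothetical, and the study's exact procedure (for example any handling of repeated cores per patient) is not described in the abstract.

```python
from statistics import mean, stdev


def limits_of_agreement(x, y):
    """Bland-Altman bias and 95% limits of agreement for paired measurements.

    x, y: equal-length sequences, e.g. the same cores read by two methods.
    Returns (bias, lower_limit, upper_limit).
    """
    diffs = [a - b for a, b in zip(x, y)]
    bias = mean(diffs)   # mean difference between the two readings
    sd = stdev(diffs)    # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd
```

Wide limits relative to the clinically meaningful range indicate poor interchangeability even when the bias is close to zero.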

The contribution of changes to tax and social security to stalled life expectancy trends in Scotland: a modelling study

Journal of Epidemiology & Community Health ◽

10.1136/jech-2020-214770 ◽

2020 ◽

pp. jech-2020-214770

Author(s):

Elizabeth Richardson ◽

Martin Taulbut ◽

Mark Robinson ◽

Andrew Pulford ◽

Gerry McCartney

Keyword(s):

Life Expectancy ◽

Social Security ◽

Tax Credits ◽

The Difference ◽

Group Sex ◽

The Uk ◽

Social Security Benefits ◽

The Impact ◽

Welfare Reforms ◽

Detrimental Impact

Background: Life expectancy (LE) improvements have stalled, and UK tax and welfare ‘reforms’ have been proposed as a cause. We estimated the effects of tax and welfare reforms from 2010/2011 to 2021/2022 on LE and inequalities in LE in Scotland. Methods: We applied a published estimate of the cumulative income impact of the reforms to the households within Scottish Index of Multiple Deprivation (SIMD) quintiles. We estimated the impact on LE by applying a rate ratio for the impact of income on mortality rates (by age group, sex and SIMD quintile) and calculating the difference between inflation-only changes in benefits and the reforms. Results: We estimated that changes to household income resulting from the reforms would result in an additional 1041 (+3.7%) female deaths and 1013 (+3.8%) male deaths. These deaths represent an estimated reduction of female LE from 81.6 years to 81.2 years (−20 weeks), and male LE from 77.6 years to 77.2 years (−23 weeks). Cuts to benefits and tax credits were modelled to have the most detrimental impact on LE, and these were estimated to be most severe in the most deprived areas. The modelled impact on inequalities in LE was widening of the gap between the most and least deprived 20% of areas by a further 21 weeks for females and 23 weeks for males. Interpretation: This study provides further evidence that austerity, in the form of cuts to social security benefits, is likely to be an important cause of stalled LE across the UK.

The one difference that ‘makes all the difference?’: schooling and the politics of identity in the UK

European Journal of Intercultural Studies ◽

10.1080/0952391930030208 ◽

1993 ◽

Vol 3 (2-3) ◽

pp. 81-89 ◽

Cited By ~ 1

Author(s):

Bob Carter ◽

Marcia Green ◽

Ranjit Sondhi

Keyword(s):

Politics Of Identity ◽

The Difference ◽

The One ◽

The Uk
