Reliability of the modified lateral pillar classification for Legg Calvé Perthes disease performed by a large group of international paediatric orthopaedic surgeons

2020 ◽  
Vol 14 (6) ◽  
pp. 529-536
Author(s):  
Jennifer C. Laine ◽  
Susan A. Novotny ◽  
Stefan Huhnstock ◽  
Andrew J. Ries ◽  
John E. Tis ◽  
...  

Purpose The modified lateral pillar classification (mLPC) is used for prognostication in the fragmentation stage of Legg Calvé Perthes disease. Previous reliability assessments of mLPC range from fair to good agreement when evaluated by a small number of observers with pre-selected radiographs. The purpose of this study was to determine the inter-observer and intra-observer reliability of mLPC performed by a group of international paediatric orthopaedic surgeons. Surgeons self-selected the radiograph for mLPC assessment, as would be done clinically. Methods In total, 40 Perthes cases with serial radiographs were selected. For each case, 26 surgeons independently selected a radiograph and assigned mLPC and 21 raters re-evaluated the same 40 cases to establish intra-observer reliability. Rater performance was determined through surgeon consensus using the mode mLPC as ‘gold standard’. Inter-observer and intra-observer reliability data were analysed using weighted kappa statistics. Results The weighted kappa for inter-observer correlation for mLPC was 0.64 (95% confidence interval: 0.55 to 0.74) and was 0.82 (range: 0.35 to 0.99) for intra-observer correlation. Individual surgeon’s overall performance varied from 48% to 88% agreement. Surgeon mLPC performance was not influenced by years of experience (p = 0.51). Radiograph selection did not influence gold standard assignment of mLPC. There was greater agreement on cases of mild B hips and severe C hips. Conclusions mLPC has low good inter-observer agreement when performed by a large number of surgeons with varied experience. Surgeons frequently chose different radiographs, with no impact on mLPC agreement. Further refinement is needed to help differentiate hips on the border of group B and C. Level of evidence III
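The weighted kappa statistic reported above can be illustrated with a minimal sketch. It assumes two raters, ordinal grades coded 0 to k-1, and either linear or quadratic disagreement weights; the abstract does not state which weighting scheme was used, and the function name and example grades are hypothetical.

```python
from collections import Counter


def weighted_kappa(r1, r2, n_categories, weights="linear"):
    """Cohen's weighted kappa for two raters using ordinal codes 0..n_categories-1."""
    if len(r1) != len(r2) or not r1:
        raise ValueError("ratings must be non-empty and of equal length")
    if weights not in ("linear", "quadratic"):
        raise ValueError("weights must be 'linear' or 'quadratic'")
    n = len(r1)
    k = n_categories
    power = 1 if weights == "linear" else 2

    def w(i, j):
        # Disagreement weight: 0 on the diagonal, 1 for the most distant grades.
        return (abs(i - j) / (k - 1)) ** power

    pair_counts = Counter(zip(r1, r2))
    m1, m2 = Counter(r1), Counter(r2)
    # Observed weighted disagreement.
    observed = sum(w(i, j) * c / n for (i, j), c in pair_counts.items())
    # Weighted disagreement expected by chance from the two raters' marginals.
    expected = sum(
        w(i, j) * m1[i] * m2[j] / (n * n) for i in range(k) for j in range(k)
    )
    if expected == 0:
        raise ValueError("kappa is undefined when both raters use a single grade")
    return 1.0 - observed / expected


# Hypothetical grades from two raters on eight hips (codes 0..3 are an assumption).
rater_a = [0, 1, 1, 2, 2, 3, 3, 1]
rater_b = [0, 1, 2, 2, 3, 3, 3, 1]
print(round(weighted_kappa(rater_a, rater_b, 4), 3))
```

Weighting matters for an ordinal scale like this one because a disagreement between adjacent grades is penalised less than a disagreement between distant grades.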

2018 ◽  
Vol 12 (2) ◽  
pp. 160-166 ◽  
Author(s):  
A. Lam ◽  
S. A. Boenerjous ◽  
Y. Lo ◽  
J. M. Abzug ◽  
J. Kurian ◽  
...  

Purpose To evaluate sensitivity, specificity and accuracy of a radiographic slipped capital femoral epiphysis (SCFE)-diagnosis among medical specialists. Methods Three paediatricians, three paediatric radiologists and three paediatric orthopaedic surgeons completed two rounds of a survey of anteroposterior and frog-leg lateral radiographs of patients with a diagnosis of SCFE (25), femoroacetabular impingement (four), Legg-Calvé-Perthes (11) or no hip pathology (ten). Intra- and interobserver agreement among specialties regarding the diagnosis of a SCFE were assessed using Cohen’s kappa coefficient (κ). Diagnostic accuracy of SCFE relative to the benchmark, a combination of the radiographic diagnosis based on Klein’s line, clinical symptoms and surgical treatment, was assessed computing sensitivity, specificity and accuracy. Results Intraobserver agreement between the surveys was moderate among paediatricians (κ-range, 0.44 to 0.52), moderate to almost perfect among orthopaedic surgeons (κ-range, 0.79 to 0.88) and almost perfect among paediatric radiologists (κ-range, 0.83 to 1.00). Interobserver agreement for survey 1 and 2 was slight among paediatricians (mean κ, 0.19), substantial among orthopaedic surgeons (mean κ, 0.77) and almost perfect among paediatric radiologists (mean κ, 0.86). Sensitivity of SCFE-diagnosis was high among radiologists and orthopaedic surgeons (88% to 100% for both specialties), but lower for paediatricians (24% to 76%). Specificity was high among radiologists and orthopaedic surgeons (72% to 84%), however, variable among paediatricians (56% to 80%). Accuracy of a SCFE-diagnosis was highest in radiologists (84% to 92%), followed by orthopaedic surgeons (80% to 88%) and paediatricians (48% to 78%). Conclusion SCFE can be detected on radiographs by different medical specialties. 
Intra- and interobserver agreement, specificity, sensitivity and accuracy for radiographic SCFE-diagnosis amongst paediatric radiologists and orthopaedic surgeons are better than that of general paediatricians. Level of Evidence II
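The sensitivity, specificity and accuracy figures above follow from a simple two-by-two tally against the benchmark diagnosis. The sketch below is illustrative only: the 25 SCFE and 25 non-SCFE cases match the survey composition described in the abstract, but the reader's calls and the function name are hypothetical.

```python
def diagnostic_metrics(truth, calls):
    """Sensitivity, specificity and accuracy for boolean labels (True = SCFE)."""
    if len(truth) != len(calls) or not truth:
        raise ValueError("truth and calls must be non-empty and of equal length")
    tp = sum(1 for t, c in zip(truth, calls) if t and c)
    tn = sum(1 for t, c in zip(truth, calls) if not t and not c)
    fp = sum(1 for t, c in zip(truth, calls) if not t and c)
    fn = sum(1 for t, c in zip(truth, calls) if t and not c)
    return {
        "sensitivity": tp / (tp + fn),  # share of true SCFE cases detected
        "specificity": tn / (tn + fp),  # share of non-SCFE cases correctly cleared
        "accuracy": (tp + tn) / len(truth),
    }


# 25 SCFE and 25 non-SCFE radiographs, as in the survey; the calls are invented.
truth = [True] * 25 + [False] * 25
calls = [True] * 22 + [False] * 3 + [False] * 20 + [True] * 5
metrics = diagnostic_metrics(truth, calls)
print(metrics)
```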


Author(s):  
Andrew Z. Mo ◽  
Patricia E. Miller ◽  
Javier Pizones ◽  
Ilkka Helenius ◽  
Michael Ruf ◽  
...  

Purpose To evaluate whether the AOSpine Thoracolumbar Spine Injury Classification System is reliable and reproducible when applied to the paediatric population globally. Methods A total of 12 paediatric orthopaedic surgeons were asked to review MRI and CT imaging of 25 paediatric patients with thoracolumbar spine traumatic injuries, in order to determine the classification of the lesions observed. The evaluators classified injuries into primary categories: A, B and C. Interobserver reliability was assessed for the initial reading by Fleiss’s kappa coefficient (kF) along with 95% confidence intervals (CI). For A and B type injuries, sub-classification was conducted including A0-A4 and B1-B2 subtypes. Interobserver reliability across subclasses was assessed using Krippendorff’s alpha (αk) along with bootstrapped 95% CIs. A second round of classification was performed one month later. Intraobserver reproducibility was assessed for the primary classifications using Fleiss’s kappa and sub-classification reproducibility was assessed by Krippendorff’s alpha (αk) along with 95% CIs. Results In total, 25 cases were read for a total of 300 initial and 300 repeated evaluations. Adjusted interobserver reliability was almost perfect (kF = 0.74; 95% CI 0.71 to 0.78) across all observers. Sub-classification reliability was substantial (αk = 0.67; 95% CI 0.51 to 0.81). Adjusted intraobserver reproducibility was almost perfect (kF = 0.91; 95% CI 0.83 to 0.99) for both primary classifications and for sub-classifications (αk = 0.88; 95% CI 0.83 to 0.93). Conclusion The inter- and intraobserver reliability for the AOSpine Thoracolumbar Spine Injury Classification System was high amongst paediatric orthopaedic surgeons. The AOSpine Thoracolumbar Spine Injury Classification System is a promising option as a uniform fracture classification in children. Level of Evidence III
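Fleiss's kappa, used above for the primary A/B/C categories, extends chance-corrected agreement to more than two raters. The following is a minimal sketch of the unadjusted textbook formula, not the adjusted variant the abstract reports; the function name and the counts are hypothetical.

```python
def fleiss_kappa(counts):
    """Fleiss's kappa; counts[i][j] = raters who assigned case i to category j."""
    n_cases = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("every case needs the same number (>= 2) of ratings")
    n_cats = len(counts[0])
    # Overall share of all ratings that fell into each category.
    p_j = [
        sum(row[j] for row in counts) / (n_cases * n_raters) for j in range(n_cats)
    ]
    # Per-case agreement: proportion of rater pairs that agree on that case.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_i) / n_cases
    p_e = sum(p * p for p in p_j)
    if p_e == 1:
        raise ValueError("kappa is undefined when only one category is ever used")
    return (p_bar - p_e) / (1 - p_e)


# Hypothetical tallies for four cases rated by 12 surgeons into categories A, B, C.
example_counts = [
    [12, 0, 0],
    [1, 11, 0],
    [0, 2, 10],
    [0, 12, 0],
]
print(round(fleiss_kappa(example_counts), 3))
```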


2020 ◽  
Author(s):  
Chongqing Xu ◽  
Mengchen Yin ◽  
Wen Mo

Abstract Background Most patients with cervical spondylotic myelopathy (CSM) suffer neck pain, sensory disturbance and motor dysfunction. For CSM surgery, it is necessary to evaluate preoperative inter-vertebral disc degeneration (IDD), which determines whether to adopt a fusion strategy, and postoperative IDD, which is one of the main reasons for reoperation. The modified Pfirrmann grading system is commonly used to evaluate IDD. The objective of this study is to evaluate its reliability and reproducibility for cervical IDD in CSM patients, and to explore its clinical application value. Methods/Design A total of 165 patients with CSM were enrolled. Six physicians (3 spine surgeons and 3 radiologists) with a certain level of clinical experience were selected. They graded cervical inter-vertebral discs according to the modified Pfirrmann grading system. The intra-class correlation coefficient (ICC) and weighted kappa (wκ) were used to assess inter- and intra-observer agreement. After 12 weeks, the analysis was repeated. Results The inter-observer reliability of the modified Pfirrmann grading system was excellent, with an ICC value of 0.76, and near perfect, with a wκ value of 0.82. The intra-observer reproducibility of the modified Pfirrmann grading system was excellent, with ICC values ranging from 0.80 to 0.91, and near perfect, with wκ values ranging from 0.83 to 0.92. Conclusion The modified Pfirrmann grading system has excellent inter-observer reliability and intra-observer reproducibility for cervical IDD in CSM. It also shows good applicability among spine surgeons and radiologists, so clinical and radiological studies applying it should be deemed accurate. Thus, the modified Pfirrmann grading system can be widely used as an appropriate instrument in clinical care.


Blood ◽  
2011 ◽  
Vol 118 (21) ◽  
pp. 145-145 ◽  
Author(s):  
Mohamed L. Sorror ◽  
Fabiana Ostronoff ◽  
Rainer Storb ◽  
Smita Bhatia ◽  
Richard T. Maziarz ◽  
...  

Abstract Abstract 145 In 2005, the HCT-CI was introduced as a weighted scoring system to predict mortality risk following allogeneic HCT. Since then, not all investigators were able to validate the HCT-CI after testing in their respective institutions. In 2007, a collaborative multi-institutional study was initiated to investigate 1) whether the HCT-CI was predictive of outcomes across different institutions, 2) the degree of homogeneity of outcome prediction, and 3) the reasons for lack of agreement among investigators. To this end, data were collected from 3347 consecutive patients (pts) treated with allogeneic HCT between 2000 and 2006 from HLA-matched related or unrelated donors at 5 institutions. All data were collected by a single investigator, blinded from the final outcomes of pts, to ensure consistent comorbidity coding. Numbers of pts, percentages of available comorbidity data, and other transplant and pt characteristics were statistically significantly different among institutions (Table 1). Pts missing comorbidity or other covariate data were excluded from further analyses, yielding a final sample size of 2523.

Table 1: Pre-transplant risk factors among the five institutions
Institutions | A (n=1073), % | B (n=973), % | C (n=336), % | D (n=237), % | E (n=206), % | p
Missing comorbidity data | <1202623 | <0.001
HCT-CI scores: 0 | 29 | 30 | 32 | 42 | 32 | <0.001
HCT-CI scores: 1,2 | 34 | 28 | 29 | 28 | 22 |
HCT-CI scores: ≥3 | 37 | 43 | 39 | 30 | 46 |
Donor: Unrelated | 50 | 38 | 51 | 40 | 31 | <0.001
Age, years: ≥50 | 42 | 29 | 47 | 21 | 51 | <0.001
Conditioning regimens: High-dose | 53 | 67 | 79 | 67 | 46 | <0.001
Conditioning regimens: Reduced-intensity | 13 | 29 | 10 | 13 | 31 |
Conditioning regimens: Nonmyeloablative | 34 | 4 | 10 | 21 | 23 |
ATG in regimen | 11431514 | <0.001
Diagnoses: Myeloid | 63 | 56 | 59 | 57 | 51 | <0.001
Diagnoses: Lymphoid | 28 | 41 | 38 | 25 | 46 |
Diagnoses: Other cancers | 2 | 3 | 1 | 3 | 1 |
Diagnoses: Non-malignant diseases | 7 | 0 | 2 | 15 | 4 |
Disease risk: High | 59 | 62 | 67 | 51 | 67 | <0.001
Stem cell source: Marrow | 19 | 19 | 24 | 56 | 10 | <0.001
Pt CMV: Positive | 56 | 73 | 70 | 65 | 51 | <0.001
KPS: ≤80 | 29 | 18 | 30 | 38 | 25 | <0.001
Prior regimens: ≥4 | 23 | 22 | 24 | 20 | 30 | 0.25

Overall, pts with HCT-CI scores of 0 vs. 1–2 vs. ≥3 had 2-year non-relapse mortality (NRM) rates of 14%, 23%, and 39% (p <0.0001), respectively, and 2-year overall survival (OS) rates of 74%, 61%, and 39% (p <0.0001), respectively. Proportional hazards models were used to estimate the hazard ratio (HR) for NRM and OS associated with HCT-CI scores in each of the 5 institutions (Table 2). The models were adjusted for covariates in Table 1. Increased HCT-CI scores were associated with increases in the HR for NRM and OS across all 5 institutions and these increases were highly statistically significant except for institution E, which had the smallest sample size. Of note, the magnitudes of increases in HRs were not entirely comparable across institutions. In a unified model including all institutions, we found a statistically significant lack of homogeneity across institutions for the HRs associated with scores 1–2 (p=0.03) and ≥3 (p=0.04) for NRM and with scores ≥3 (p=0.01) for OS but not with scores 1–2 for OS (p=0.18). We also found a statistically significant, independent impact of institution on NRM (p=0.001) and OS (p<0.001).

Table 2: Multivariate risk model
Institutions | NRM HR, score 0 | NRM HR, score 1–2 | NRM HR, score ≥3 | NRM p | OS HR, score 0 | OS HR, score 1–2 | OS HR, score ≥3 | OS p
A | 1.0 | 1.4 | 2.5 | <0.0001 | 1.0 | 1.36 | 2.23 | <0.0001
B | 1.0 | 2.88 | 4.15 | <0.0001 | 1.0 | 1.88 | 2.77 | <0.0001
C | 1.0 | 1.3 | 3.62 | <0.0001 | 1.0 | 1.33 | 3.28 | <0.0001
D | 1.0 | 1.65 | 6.89 | <0.0001 | 1.0 | 1.84 | 5.81 | <0.0001
E | 1.0 | 1.76 | 2.66 | 0.09 | 1.0 | 1.13 | 2.28 | 0.09

We then assessed, among 80 pts from institution A, the inter-observer variability in scoring comorbidity between two individual investigators and between each of them and unknown individuals from a pool of other evaluators. Weighted kappa statistics were highest (0.59) between two single evaluators and lowest between each and multiple evaluators (0.43 and 0.55, respectively). The principal investigator then developed a comprehensive guideline to code comorbidities and used it to train the other single investigator in a single session. Additional evaluation of inter-observer agreement demonstrated marked improvement of the weighted kappa statistic to 0.78. The reported disagreements on the validity of the HCT-CI may be explained by different institutional experiences in managing transplant pts, small number of pts at some institutions, and inter-observer variability in score assignment. The HCT-CI is valid to discriminate relative risks of mortalities after HCT across different institutions and should be used regularly for counseling pts and clinical trial design. Efforts to improve methods for coding comorbidity are in progress. Disclosures: No relevant conflicts of interest to declare.


2019 ◽  
Vol 40 (9) ◽  
pp. 931-937 ◽  
Author(s):  
Lara S van de Lande ◽  
Ben M Eyck ◽  
Jelle J Mooij ◽  
Hieronymus P Stevens ◽  
Joris A van Dongen

Abstract Background Aging of the neck results in an increased cervicomental angle, which can be treated by various surgical and nonsurgical procedures. To measure the success of these procedures, standardized validated objective photographic measurement tools are needed. However, no online standardized photographic measurement tools exist for the assessment of the cervicomental angle. Objectives The purpose of this study was to establish a validated and reliable measurement tool for the assessment of the cervicomental angle based on the Rainbow Scale. Methods A 5-point photographic rating scale was developed and created from 1 photograph with Adobe Photoshop. Fifteen reference photographs of women, 3 photographs per grade, were included for validation. Seven panelists (ie, plastic and maxillofacial surgeons) assessed the reference photographs 3 times with a minimal interval of 3 days in an online survey. Intra- and inter-observer agreements were calculated utilizing the weighted kappa coefficient. Results Mean intra-observer agreement was 0.93 (0.78-1.00). Mean interobserver agreement was 0.796 (0.574-0.961) for survey 1, 0.868 (0.690-0.960) for survey 2, and 0.820 (0.676-0.959) for survey 3. Conclusions The Rainbow Scale for the assessment of the cervicomental angle has been validated in an online fashion. The scale is reproducible and reliable and requires no learning curve. Potential applications include objective assessment of neck treatment planning and surgical outcome. Level of Evidence: 4


2019 ◽  
Vol 13 (6) ◽  
pp. 569-574 ◽  
Author(s):  
T. L. Teo ◽  
E. K. Schaeffer ◽  
E. Habib ◽  
A. Cherukupalli ◽  
A. P. Cooper ◽  
...  

Purpose The Gartland extension-type supracondylar humerus (SCH) fracture is the most common paediatric elbow fracture. Treatment options range from nonoperative treatment (taping or casting) to operative treatments (closed reduction and percutaneous pinning or open reduction). Classification variability between surgeons is a potential contributing factor to existing controversy over treatment options for type II SCH fractures. This study investigated levels of agreement in extension-type SCH fracture classification using the modified Gartland classification system. Methods A retrospective review was conducted on 60 patients aged between two and 12 years who had sustained an extension-type SCH fracture and received operative or nonoperative treatment at a tertiary children’s hospital. Baseline radiographs were provided, and surgeons were asked to classify the fractures as type I, IIA, IIB or III according to the modified Gartland classification. Respondents were then asked to complete a second round of classifications using reshuffled radiographs. Weighted kappa values were calculated to assess interobserver and intraobserver levels of agreement. Results In all, 21 paediatric orthopaedic surgeons responded to the survey and 15 completed a second round of ratings. Interobserver agreement for classification based on the Gartland criteria between surgeons was substantial with a kappa of 0.679 (95% confidence interval (CI) 0.501 to 0.873). Intraobserver agreement was substantial with a kappa of 0.796 (95% CI 0.628 to 0.864). Conclusion Radiographic classification of extension-type SCH fractures demonstrated substantial agreement both between and within surgeon raters. Therefore, classification variability may not be a major contributing factor to the treatment controversy for type II SCH fractures and treatment variability may be due to differences in surgeon preferences. Level of Evidence III


2020 ◽  
Vol 11 (1) ◽  
pp. 5
Author(s):  
Willem Paul Gielis ◽  
Harrie Weinans ◽  
Frank J. Nap ◽  
Frank W. Roemer ◽  
Wouter Foppen

A standardized method to assess structural osteoarthritis (OA) burden throughout the body is lacking from the literature. Such a method can be valuable in developing personalized treatments for OA. We developed a reliable scoring system to evaluate OA in large joints and the spine, the OsteoArthritis Computed Tomography (OACT) score, using a convenience sample of 197 whole-body low-dose non-contrast CTs. An atlas, containing example images as reference points for training and scoring, is presented. Each joint was graded from 0 to 3. The total OA burden was calculated by summing scores of individual joints. Intra- and inter-observer reliability was tested on 25 randomly selected scans (N = 600 joints). Intra-observer reliability and inter-observer reliability between three observers were assessed using intraclass correlation coefficient (ICC) and square-weighted kappa statistics. The square-weighted kappa for intra-observer reliability for OACT-score at joint-level ranged from 0.79 to 0.95; the ICC for the total OA grade was 0.97 (95%-CI, 0.94 to 0.99). Square-weighted kappa for interobserver reliability ranged from 0.48 to 0.95; the ICC for the total OA grade was 0.95 (95%-CI, 0.90 to 0.98). The OACT score, a new reproducible CT-based grading system reflecting OA burden in large joints and the spine, has a satisfactory reproducibility. The atlas can be used for research purposes, training, educational purposes and systemic grading of OA on CT-scans.


2015 ◽  
Vol 62 (3) ◽  
pp. 117-121
Author(s):  
Ishita Gupta ◽  
Astha Chaudhry ◽  
Solanki Savita ◽  
Arvind Shetti

Abstract Introduction The objective of this study was to compare two radiographic methods, digital intraoral and digital panoramic radiography, in assessing marginal bone level around dental implants. The study also evaluated inter-observer and intra-observer reliability during repeated assessments. Material and Methods Marginal bone around 29 implants in 17 patients was assessed using standardized digital intraoral and digital panoramic radiographs. Two observers evaluated bone level by noting the thread at which marginal bone seemed to be attached at distal and mesial surfaces of the implants. The assessments were repeated after one week. Kappa statistics were used to evaluate agreement between assessments, observers, and radiographical methods. Results The agreement rate between digital intraoral and digital panoramic radiography was fair. Intra-observer agreement was very good, while inter-observer agreement was moderate. Conclusion Digital panoramic radiographs can be used to evaluate marginal bone level in patients with multiple implants and also to supplement intraoral radiographs. However, observer variability should be considered when comparing values from follow up studies for implant maintenance.


2011 ◽  
Vol 64 (3) ◽  
pp. 257-260 ◽  
Author(s):  
Karen C Wright ◽  
Patricia Harnden ◽  
Sue Moss ◽  
Dan M Berney ◽  
Jane Melia

Background Kappa statistics are frequently used to analyse observer agreement for panels of experts and External Quality Assurance (EQA) schemes and generally treat all disagreements as total disagreement. However, the differences between ordered categories may not be of equal importance (eg, the difference between grades 1 vs 2 compared with 1 vs 3). Weighted kappa can be used to adjust for this when comparing a small number of readers, but this has not as yet been applied to the large number of readers typical of a national EQA scheme. Aim To develop and validate a method for applying weighted kappa to a large number of readers within the context of a real dataset: the UK National Urological Pathology EQA Scheme for prostatic biopsies. Methods Data on Gleason grade recorded by 19 expert readers were extracted from the fixed text responses of 20 cancer cases from four circulations of the EQA scheme. Composite kappa, currently used to compute an unweighted kappa for large numbers of readers, was compared with the mean kappa for all pairwise combinations of readers. Weighted kappa generalised for multiple readers was compared with the newly developed ‘pairwise-weighted’ kappa. Results For unweighted analyses, the median increase from composite to pairwise kappa was 0.006 (range −0.005 to +0.052). The difference between the pairwise-weighted kappa and generalised weighted kappa for multiple readers never exceeded ±0.01. Conclusion Pairwise-weighted kappa is a suitable and highly accurate approximation to weighted kappa for multiple readers.
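One way to read the pairwise approach is as the mean of Cohen's weighted kappa over every pair of readers. The sketch below makes that assumption explicit; the abstract does not spell out the exact definition or weighting scheme, so the linear weights, the function names and the example grades are all hypothetical.

```python
from itertools import combinations


def weighted_kappa(a, b, n_categories):
    """Linear-weighted Cohen's kappa for two readers, ordinal codes 0..n_categories-1."""

    def w(i, j):
        # Disagreement weight proportional to the distance between grades.
        return abs(i - j) / (n_categories - 1)

    n = len(a)
    observed = sum(w(x, y) for x, y in zip(a, b)) / n
    # Chance-expected disagreement: every grade of reader a against every grade of b.
    expected = sum(w(x, y) for x in a for y in b) / (n * n)
    if expected == 0:
        raise ValueError("kappa is undefined when both readers use a single grade")
    return 1.0 - observed / expected


def pairwise_weighted_kappa(ratings, n_categories):
    """Mean weighted kappa over all reader pairs (assumed definition)."""
    pairs = list(combinations(ratings, 2))
    if not pairs:
        raise ValueError("at least two readers are required")
    return sum(weighted_kappa(a, b, n_categories) for a, b in pairs) / len(pairs)


# Three hypothetical readers grading the same four cases on a 3-level ordinal scale.
readers = [
    [0, 1, 2, 2],
    [0, 1, 2, 1],
    [0, 1, 2, 2],
]
print(round(pairwise_weighted_kappa(readers, 3), 3))
```

Averaging over pairs keeps the familiar two-reader statistic as the building block, at the cost of computing one kappa for each of the n(n-1)/2 reader pairs.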


Author(s):  
Edoardo Cipolletta ◽  
Emilio Filippucci ◽  
Andrea Di Matteo ◽  
Giulia Tesei ◽  
Micaela Ana Cosatti ◽  
...  

Abstract Purpose i) To assess the inter- and intra-observer reliability of ultrasound (US) in the evaluation of the hyaline cartilage (HC) of the metacarpal head (MH) in patients with rheumatoid arthritis (RA) and in healthy subjects (HS) both qualitatively and quantitatively. ii) To calculate the smallest detectable difference (SDD) of the MH cartilage thickness measurement. iii) To correlate the qualitative scoring system and the quantitative assessment. Materials and Methods US examination was performed on 280 MHs of 20 patients with RA and 15 HS using a very high frequency probe (up to 22 MHz). HC status was evaluated both qualitatively (using a five-grade scoring system) and quantitatively (using the average value of the longitudinal and transverse measures). The HC of MHs from II to V metacarpophalangeal joint of both hands were scanned independently on the same day by two rheumatologists to assess inter-observer reliability. All subjects were re-examined using the same scanning protocol and the same US setting by one sonographer after a week to assess intra-observer reliability. Results The inter-observer agreement and intra-observer agreement were moderate to substantial (k = 0.66 and k = 0.73) for the qualitative scoring system and high (ICC = 0.93 and ICC = 0.94) for the quantitative assessment. The SDD of the MH cartilage thickness measurement was 0.09 mm. A significant correlation between the two scoring systems was found (r = –0.35; p < 0.001). Conclusion The present study describes the main methodological issues of HC assessment. Using a standardized protocol, both the qualitative and the quantitative scoring systems can be reliable.

