CT-severity Score in COVID-19 patients, Assessment of Performance in Triage and Outcome Prediction: a Comparative Study of Different Methods

Author(s):  
Alireza Almasi Nokiani

BACKGROUND: Lung involvement in COVID-19 can be quantified by chest CT scan with some triage and prognostication value. At least 7 CT-severity score (CTSS) systems have been proposed. PURPOSE: We evaluated triage and prognostication performance of seven different CTSSs for COVID-19. MATERIALS AND METHODS: PCR-positive COVID-19 patients admitted from February 20th 2020 to July 22 were included in a retrospective study. Demographic data and clinical data indicating disease severity at presentation and at peak disease severity were recorded. CT images were reviewed and scored according to seven different scoring systems (CTSS1-CTSS7) by two radiologists. Interrater reliability was determined for each CTSS. Then clinical severity of the disease at presentation (for triage) and peak disease severity (for outcome) were compared with CTSSs separately. ROC curves for performance of each CTSS in diagnosing severe/critical disease on admission, severe/critical disease at peak disease severity and critical disease at peak severity were plotted. Areas under the curve (AUCs), best thresholds and corresponding sensitivities and specificities were calculated. RESULTS: 96 patients were included, with a mean age of 63.6 ± 17.4 years (range: 21-88, median: 67). 57 (59.4%) were men and 39 (40.6%) were women. All CTSSs showed good interrater reliability, with calculated intraclass correlation coefficients (ICCs) of 0.764-0.837. Only three CTSSs showed acceptable AUCs (AUC = 0.7) for triage of severe/critical patients. All CTSSs showed acceptable AUCs for prognostication (AUCs = 0.76-0.79). Calculated AUCs were not significantly different for triage and for prediction of severe/critical disease, but some difference was shown for prediction of critical disease. CONCLUSION: Men are probably affected more frequently than women by COVID-19. CTSS performance in triage was much lower than in earlier reports, and only three CTSSs showed acceptable AUCs. CTSS performed better for prognostic purposes than for triage, as all 7 CTSSs showed acceptable AUCs in both types of prognostic ROC curves. Our results are compatible with those of recent studies. There is not much difference in performance among the different CTSSs.
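
The AUC, best-threshold, sensitivity and specificity calculations mentioned in the abstract above can be illustrated with a minimal Python sketch. It assumes the "best threshold" is the cut-off maximizing Youden's J (sensitivity + specificity - 1), which the abstract does not state; the function names and the sample scores are hypothetical and are not data from the study.

```python
from typing import List, Sequence, Tuple


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as one half)."""
    pos: List[float] = [s for s, y in zip(scores, labels) if y == 1]
    neg: List[float] = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def best_threshold(scores: Sequence[float],
                   labels: Sequence[int]) -> Tuple[float, float, float]:
    """Return (threshold, sensitivity, specificity) maximizing Youden's J.
    A case is called positive when score >= threshold (an assumption)."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative case")
    best = None
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= t)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < t)
        sens, spec = tp / n_pos, tn / n_neg
        j = sens + spec - 1
        if best is None or j > best[0]:  # keep the lowest threshold on ties
            best = (j, t, sens, spec)
    return best[1], best[2], best[3]


# Hypothetical severity scores; label 1 marks severe/critical disease.
example_scores = [2, 5, 7, 8, 11, 12, 15, 18]
example_labels = [0, 0, 0, 1, 0, 1, 1, 1]
print(roc_auc(example_scores, example_labels))
print(best_threshold(example_scores, example_labels))
```

Other threshold criteria (for example a fixed minimum sensitivity) would give different cut-offs; Youden's J is used here only because it is a common default.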

1991
Vol 34 (5)
pp. 989-999
Author(s):
Stephanie Shaw
Truman E. Coggins

This study examines whether observers reliably categorize selected speech production behaviors in hearing-impaired children. A group of experienced speech-language pathologists was trained to score the elicited imitations of 5 profoundly and 5 severely hearing-impaired subjects using the Phonetic Level Evaluation (Ling, 1976). Interrater reliability was calculated using intraclass correlation coefficients. Overall, the magnitude of the coefficients was found to be considerably below what would be accepted in published behavioral research. Failure to obtain acceptably high levels of reliability suggests that the Phonetic Level Evaluation may not yet be an accurate and objective speech assessment measure for hearing-impaired children.


Author(s):  
Jens Sörensen
Jonny Nordström
Tomasz Baron
Stellan Mörner
Sven-Olof Granstam
...  

Abstract Aim To develop a method for diagnosing left ventricular (LV) hypertrophy from cardiac perfusion ¹⁵O-water positron emission tomography (PET). Methods We retrospectively pooled data from 139 subjects in four research cohorts. LV remodeling patterns ranged from normal to severe eccentric and concentric hypertrophy. ¹⁵O-water PET scans (n = 197) were performed with three different PET devices. A low-end scanner (66 scans) was used for method development, and the remaining scans with newer devices for a blinded evaluation. Dynamic data were converted into parametric images of perfusable tissue fraction for semi-automatic delineation of the LV wall and calculation of LV mass (LVM) and septal wall thickness (WT). LVM and WT from PET were compared to cardiac magnetic resonance (CMR, n = 47) and WT to 2D-echocardiography (2DE, n = 36). PET accuracy was tested using linear regression, Bland–Altman plots, and ROC curves. Observer reproducibility was evaluated using intraclass correlation coefficients. Results High correlations were found in the blinded analyses (r ≥ 0.87, P < 0.0001 for all). AUC for detecting increased LVM and WT (> 12 mm and > 15 mm) was ≥ 0.95 (P < 0.0001 for all). Reproducibility was excellent (ICC ≥ 0.93, P < 0.0001). Conclusion ¹⁵O-water PET might detect LV hypertrophy with high accuracy and precision.


2014
Vol 138 (6)
pp. 809-813
Author(s):
Carolyn R. Vitek
Jane C. Dale
Henry A. Homburger
Sandra C. Bryant
Amy K. Saenger
...  

Context.— Systems-based practice (SBP) is 1 of 6 core competencies required in all resident training programs accredited by the Accreditation Council for Graduate Medical Education. Reliable methods of assessing resident competency in SBP have not been described in the medical literature. Objective.— To develop and validate an analytic grading rubric to assess pathology residents' analyses of SBP problems in clinical chemistry. Design.— Residents were assigned an SBP project based upon unmet clinical needs in the clinical chemistry laboratories. Using an iterative method, we created an analytic grading rubric based on critical thinking principles. Four faculty raters used the SBP project evaluation rubric to independently grade 11 residents' projects during their clinical chemistry rotations. Interrater reliability and Cronbach α were calculated to determine the reliability and validity of the rubric. Project mean scores and range were also assessed to determine whether the rubric differentiated resident critical thinking skills related to the SBP projects. Results.— Overall project scores ranged from 6.56 to 16.50 out of a possible 20 points. Cronbach α ranged from 0.91 to 0.96, indicating that the 4 rubric categories were internally consistent without significant overlap. Intraclass correlation coefficients ranged from 0.63 to 0.81, indicating moderate to strong interrater reliability. Conclusions.— We report development and statistical analysis of a novel SBP project evaluation rubric. The results indicate the rubric can be used to reliably assess pathology residents' critical thinking skills in SBP.
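
As an illustration of the internal-consistency statistic reported above, the following Python sketch computes Cronbach α from the standard formula α = k/(k-1) · (1 - Σ item variances / variance of total scores). The function name and the sample rubric scores are hypothetical; they are not the study's data, and the study's exact computation is not described in the abstract.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a table with one row per rated project
    and one column per rubric category (item)."""
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least two items and two rows")
    # Sample variance of each item across rows.
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    # Sample variance of the per-row total score.
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical scores: 5 projects rated on 4 rubric categories.
example = [
    [3, 4, 3, 4],
    [2, 2, 3, 2],
    [5, 4, 5, 5],
    [1, 2, 1, 2],
    [4, 4, 4, 3],
]
print(round(cronbach_alpha(example), 3))
```

Values near 1 indicate that the items vary together across projects; the statistic says nothing about agreement between raters, which the abstract assesses separately with intraclass correlation coefficients.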


2018
Vol 25 (3)
pp. 286-290
Author(s):
Elif Bilgic
Madoka Takao
Pepa Kaneva
Satoshi Endo
Toshitatsu Takao
...  

Background. Needs assessment identified a gap regarding laparoscopic suturing skills targeted in simulation. This study collected validity evidence for an advanced laparoscopic suturing task using an Endo Stitch™ device. Methods. Experienced (ES) and novice surgeons (NS) performed continuous suturing after watching an instructional video. Scores were based on time and accuracy, and on the Global Operative Assessment of Laparoscopic Surgery. Data are shown as medians [25th-75th percentiles] (ES vs NS). Interrater reliability was calculated using intraclass correlation coefficients (confidence interval). Results. Seventeen participants were enrolled. Experienced surgeons had significantly greater task (980 [964-999] vs 666 [391-711], P = .0035) and Global Operative Assessment of Laparoscopic Surgery scores (25 [24-25] vs 14 [12-17], P = .0029). Interrater reliability for time and accuracy was 1.0 and 0.9 (0.74-0.96), respectively. All experienced surgeons agreed that the task was relevant to practice. Conclusion. This study provides validity evidence for the task as a measure of laparoscopic suturing skill using an automated suturing device. It could help trainees acquire the skills they need to better prepare for clinical learning.


2020
Author(s):
David Høyrup Christiansen
Gareth McCray
Trine Nøhr Winding
Johan Hviid Andersen
Kent Jacob Nielsen
...  

Abstract Background: The Musculoskeletal Health Questionnaire (MSK-HQ) has been developed to measure musculoskeletal health status across musculoskeletal conditions and settings. However, the MSK-HQ needs to be further evaluated across settings and different languages. Objective: The objective of the study was to evaluate and compare measurement properties of the MSK-HQ across Danish (DK) and English (UK) cohorts of patients from primary care physiotherapy services with musculoskeletal pain. Methods: MSK-HQ was translated into Danish according to international guidelines. Measurement invariance was assessed by differential item functioning (DIF) analyses. Test-retest reliability, measurement error, responsiveness and minimal clinically important change (MCIC) were evaluated and compared between DK (n=153) and UK (n=166) cohorts. Results: The Danish version demonstrated acceptable face and construct validity. Out of the 14 MSK-HQ items, three items showed DIF for language (pain/stiffness at night, understanding condition and confidence in managing symptoms) and three items showed DIF for pain location (walking, washing/dressing and physical activity levels). Intraclass correlation coefficients for test-retest were 0.86 (95% CI 0.81 to 0.91) for the DK cohort and 0.77 (95% CI 0.49 to 0.90) for the UK cohort. The systematic measurement error was 1.6 and 3.9 points for the DK and UK cohorts respectively, with random measurement error being 8.6 and 9.9 points. Areas under the receiver operating characteristic (ROC) curves of the change scores against patients’ own judgment at 3 months exceeded 0.70 in both cohorts. Absolute and relative MCIC estimates were 8-10 points and 26% for the DK cohort and 6-8 points and 29% for the UK cohort. Conclusions: The measurement properties of MSK-HQ were acceptable across countries, but seem more suited for group than individual level evaluation. Researchers and clinicians should be aware that some discrepancy exists and should take the observed measurement error into account when evaluating change in scores over time.


Dermatology
2019
Vol 236 (1)
pp. 8-14
Author(s):
Katarzyna Włodarek
Aleksandra Stefaniak
Łukasz Matusiak
Jacek C. Szepietowski

A wide variety of assessment tools have been proposed for hidradenitis suppurativa (HS) until now, but none of them meets the criteria for an ideal score. Because there is no gold standard scoring system, the choice of the measure instrument depends on the purpose of use and even on the physician’s experience in the subject of HS. The aim of this study was to assess the intrarater and interrater reliability of 6 scoring systems commonly used for grading severity of HS: the Hurley Staging System, the Refined Hurley Staging, the Hidradenitis Suppurativa Severity Score System (IHS4), the Hidradenitis Suppurativa Severity Index (HSSI), the Sartorius Hidradenitis Suppurativa Score and the Hidradenitis Suppurativa Physician’s Global Assessment Scale (HS-PGA). On the scoring day, 9 HS patients underwent a physical examination and disease severity assessment by a group of 16 dermatology residents using all evaluated instruments. Then, intrarater reliability was calculated using intraclass correlation coefficient (ICC), and interrater variability was evaluated using the coefficient of variation (CV). In all 6 scorings the ICCs were >0.75, indicating high intrarater reliability of all presented scales. The study has also demonstrated moderate agreement between raters in most of the evaluated measure instruments. The most reproducible methods, according to CVs, seem to be the Hurley staging, IHS4, and HSSI. None of the 6 evaluated scoring systems showed a significant advantage over the other when comparing ICCs, and all the instruments seem to be very reliable methods. The interrater reliability was usually good, but the most repeatable results between researchers were obtained for the easiest scales, including Hurley scoring, IHS4 and HSSI.


1997
Vol 17 (4)
pp. 280-287
Author(s):
Margaret Wallen
Mary-Ann Bonney
Lyn Lennox

The Handwriting Speed Test (HST), a standardized, norm-referenced test, was developed to provide an objective evaluation of the handwriting speed of school students from approximately 8 to 18 years of age. Part of the test development involved an examination of interrater reliability. Two raters scored 165 (13%) of the total 1292 handwriting samples. Using intraclass correlation coefficients, the interrater reliability was found to be excellent (ICC=1.00, P<0.0001). The process of examining interrater reliability resulted in modification to the scoring criteria of the test. Excellent interrater reliability provides support for the HST as a valuable clinical and research tool.


2013
Vol 5 (2)
pp. 252-256
Author(s):
Hans B. Kersten
John G. Frohna
Erin L. Giudice

Abstract Background Competence in evidence-based medicine (EBM) is an important clinical skill. Pediatrics residents are expected to acquire competence in EBM during their education, yet few validated tools exist to assess residents' EBM skills. Objective We sought to develop a reliable tool to evaluate residents' EBM skills in the critical appraisal of a research article, the development of a written EBM critically appraised topic (CAT) synopsis, and a presentation of the findings to colleagues. Methods Instrument development used a modified Delphi technique. We defined the skills to be assessed while reviewing (1) a written CAT synopsis and (2) a resident's EBM presentation. We defined skill levels for each item using the Dreyfus and Dreyfus model of skill development and created behavioral anchors using a frame-of-reference training technique to describe performance for each skill level. We evaluated the assessment instrument's psychometric properties, including internal consistency and interrater reliability. Results The EBM Critically Appraised Topic Presentation Evaluation Tool (EBM C-PET) is composed of 14 items that assess residents' EBM and global presentation skills. Resident presentations (N = 27) and the corresponding written CAT synopses were evaluated using the EBM C-PET. The EBM C-PET had excellent internal consistency (Cronbach α = 0.94). Intraclass correlation coefficients were used to assess interrater reliability. Intraclass correlation coefficients for individual items ranged from 0.31 to 0.74; the average intraclass correlation coefficient for the 14 items was 0.67. Conclusions We identified essential components of an assessment tool for an EBM CAT synopsis and presentation with excellent internal consistency and a good level of interrater reliability across 3 different institutions. The EBM C-PET is a reliable tool to document resident competence in higher-level EBM skills.


2002
Vol 82 (4)
pp. 364-371
Author(s):
Douglas P Gross
Michele C Battié

Abstract Background and Purpose. Functional capacity evaluations (FCEs) are measurement tools used in predicting readiness to return to work following injury. The interrater and test-retest reliability of determinations of maximal safe lifting during kinesiophysical FCEs were examined in a sample of people who were off work and receiving workers' compensation. Subjects. Twenty-eight subjects with low back pain who had plateaued with treatment were enrolled. Five occupational therapists, trained and experienced in kinesiophysical methods, conducted testing. Methods. A repeated-measures design was used, with raters testing subjects simultaneously, yet independently. Subjects were rated on 2 occasions, separated by 2 to 4 days. Analyses included intraclass correlation coefficients (ICCs) and 95% confidence intervals. Results. The ICC values for interrater reliability ranged from .95 to .98. Test-retest values ranged from .78 to .94. Discussion and Conclusion. Inconsistencies in subjects' performance across sessions were the greatest source of FCE measurement variability. Overall, however, test-retest reliability was good and interrater reliability was excellent.
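
The abstract above does not say which form of the intraclass correlation coefficient was used. As a hedged illustration only, the sketch below computes one common form, the two-way random-effects, absolute-agreement, single-rater ICC (often written ICC(2,1)), from the ANOVA mean squares of a subjects-by-raters table. The function name and the sample ratings are illustrative assumptions, not data from any study listed here.

```python
from typing import Sequence


def icc_2_1(ratings: Sequence[Sequence[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater.
    `ratings` has one row per subject and one column per rater."""
    n = len(ratings)       # number of subjects
    k = len(ratings[0])    # number of raters
    if n < 2 or k < 2:
        raise ValueError("need at least two subjects and two raters")
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares of the two-way ANOVA without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)             # between-subject mean square
    ms_cols = ss_cols / (k - 1)             # between-rater mean square
    ms_err = ss_err / ((n - 1) * (k - 1))   # residual mean square

    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


# Illustrative ratings: 6 subjects scored by 4 raters.
example = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(round(icc_2_1(example), 2))
```

Because this form penalizes systematic differences between raters, it can be much lower than a consistency-type ICC on the same table, which is one reason reported ICC values are hard to compare across studies that do not name the form used.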


2019
Vol 5 (2)
pp. 294-323
Author(s):  
Charles Nagle

Abstract Researchers have increasingly turned to Amazon Mechanical Turk (AMT) to crowdsource speech data, predominantly in English. Although AMT and similar platforms are well positioned to enhance the state of the art in L2 research, it is unclear if crowdsourced L2 speech ratings are reliable, particularly in languages other than English. The present study describes the development and deployment of an AMT task to crowdsource comprehensibility, fluency, and accentedness ratings for L2 Spanish speech samples. Fifty-four AMT workers who were native Spanish speakers from 11 countries participated in the ratings. Intraclass correlation coefficients were used to estimate group-level interrater reliability, and Rasch analyses were undertaken to examine individual differences in rater severity and fit. Excellent reliability was observed for the comprehensibility and fluency ratings, but indices were slightly lower for accentedness, leading to recommendations to improve the task for future data collection.

