Gauging the Quality of Relevance Assessments using Inter-Rater Agreement

Author(s):  
Tadele T. Damessie ◽  
Thao P. Nghiem ◽  
Falk Scholer ◽  
J. Shane Culpepper
2014 ◽  
Vol 26 (5) ◽  
pp. 825-836 ◽  
Author(s):  
Martin Nikolaus Dichter ◽  
Christian G. G. Schwab ◽  
Gabriele Meyer ◽  
Sabine Bartholomeyczik ◽  
Olga Dortmann ◽  
...  

Background: Quality of life (QoL) is an increasingly used outcome measure in dementia research. The QUALIDEM is a dementia-specific, proxy-rated QoL instrument. We aimed to determine its inter-rater and intra-rater reliability in residents with dementia in German nursing homes.
Methods: The QUALIDEM consists of nine subscales that were applied to a sample of 108 people with mild to severe dementia and six consecutive subscales that were applied to a sample of 53 people with very severe dementia. The proxy raters were 49 registered nurses and nursing assistants. Inter-rater and intra-rater reliability scores were calculated at the subscale and item level.
Results: None of the QUALIDEM subscales showed strong inter-rater reliability based on the single-measure intraclass correlation coefficient (ICC) for absolute agreement (threshold ≥ 0.70). Based on the average-measure ICC for four raters, eight subscales for people with mild to severe dementia (care relationship, positive affect, negative affect, restless tense behavior, social relations, social isolation, feeling at home and having something to do) and five subscales for very severe dementia (care relationship, negative affect, restless tense behavior, social relations and social isolation) yielded strong inter-rater agreement (ICC: 0.72–0.86). All QUALIDEM subscales, regardless of dementia severity, showed strong intra-rater agreement, with ICC values between 0.70 and 0.79 for people with mild to severe dementia and between 0.75 and 0.87 for people with very severe dementia.
Conclusions: This study demonstrated insufficient inter-rater reliability and sufficient intra-rater reliability for all subscales of both versions of the German QUALIDEM. Inter-rater reliability can be improved by having more than one nurse rate QoL collaboratively. Developing a measurement manual with accurate item definitions and a standardized education program for proxy raters is recommended.
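
The jump from single-measure to four-rater average-measure ICCs reflects how averaging raises reliability. For the ANOVA-based ICC forms, the k-rater average-measure ICC equals the Spearman-Brown step-up of the single-measure ICC, so the same formula also estimates how many nurses a collaborative rating might need. The sketch below assumes that relationship; the single-measure value 0.40 is hypothetical, since the abstract does not report single-measure ICCs.

```python
def average_measure_icc(single_icc: float, k: int) -> float:
    """Reliability of the mean of k raters, from a single-measure ICC.

    Spearman-Brown step-up: ICC(k) = k * ICC(1) / (1 + (k - 1) * ICC(1)).
    """
    if k < 1:
        raise ValueError("k must be a positive number of raters")
    return k * single_icc / (1 + (k - 1) * single_icc)


def raters_needed(single_icc: float, target: float) -> int:
    """Smallest number of raters whose averaged score reaches `target`."""
    if not (0 < single_icc < 1 and 0 < target < 1):
        raise ValueError("single_icc and target must lie strictly between 0 and 1")
    k = 1
    # Terminates because ICC(k) approaches 1 as k grows when 0 < ICC(1) < 1.
    while average_measure_icc(single_icc, k) < target:
        k += 1
    return k


# Hypothetical single-measure ICC (not a value reported by the study).
print(round(average_measure_icc(0.40, 4), 3))  # 0.727
print(raters_needed(0.40, 0.70))               # 4
```

Under these assumptions, a subscale that falls well short of 0.70 for a single nurse can clear it once four ratings are averaged, which is consistent with the abstract's recommendation of collaborative rating.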


2013 ◽  
Vol 19 (2) ◽  
pp. 75-79 ◽  
Author(s):  
Georgia A Malandraki ◽  
Vasiliki Markaki ◽  
Voula C Georgopoulos ◽  
Jaime L Bauer ◽  
Ioannis Kalogeropoulos ◽  
...  

We investigated whether an expert's consultation provided via telemedicine could improve the quality of care for patients with dysphagia. A trained clinician completed videofluoroscopic swallowing studies (VFSS) of 17 consecutive patients in a Greek hospital. The videofluoroscopic images were then stored on a website for independent review by an expert speech and language pathologist in the US. An additional rater evaluated 20% of all data for further reliability testing. Eight diagnostic indicators of swallowing impairment and an overall subjective severity index were recorded for each study. Clinicians were also asked to choose from ten common treatment options for patients with dysphagia. Inter-rater agreement among all three raters was good for most of the diagnostic indicators examined (78% to 90%; kappa = 0.52–0.71). Agreement on overall severity ratings was exact for more than half of the patients and within one point on the 4-point scale for all other patients except one. However, the quality of care would have been substandard for more than half of the patients if teleconsultation had not been employed. In settings where a swallowing expert is not available and real-time telemedicine is not feasible, asynchronous teleconsultation can produce better quality of care for patients with dysphagia.
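
Percent agreement and kappa measure different things: kappa discounts the agreement expected by chance from each rater's category frequencies, which is why 78% to 90% raw agreement corresponds to kappa values of only 0.52 to 0.71. A minimal two-rater sketch follows; computing it for each pair of the three raters is one common approach, but the abstract does not say how kappa was pooled, and the ratings below are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def percent_agreement(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Share of cases on which two raters chose the same category."""
    if len(a) != len(b) or not a:
        raise ValueError("ratings must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(a)
    p_o = percent_agreement(a, b)
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement: for each category, multiply the two raters' marginal
    # proportions, then sum over categories.
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical presence (1) / absence (0) calls for one indicator on 10 studies.
rater_1 = [1, 1, 1, 0, 0, 1, 0, 1, 1, 0]
rater_2 = [1, 1, 0, 0, 0, 1, 0, 1, 1, 1]
print(percent_agreement(rater_1, rater_2))      # 0.8
print(round(cohen_kappa(rater_1, rater_2), 3))  # 0.583
```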


2014 ◽  
Vol 11 (6) ◽  
Author(s):  
Pieter F Fouche ◽  
Kristina Zverinova

Introduction: Arrhythmias are a significant health burden in Australia, responsible for about 1% of deaths annually. The Australian Resuscitation Council (ARC) ‘Guideline 11.9 Managing Acute Dysrhythmias’ was designed to guide doctors, paramedics and nurses in the emergency management of arrhythmias. High-quality clinical practice guidelines are important to aid the treatment of these arrhythmias. The AGREE II tool used here is widely used to assess the quality of clinical practice guidelines. The objective of this study was to assess the quality of the ARC clinical practice guideline ‘Guideline 11.9 Managing Acute Dysrhythmias’.
Methods: Two raters assessed the ARC arrhythmia guideline on the six quality domains of the AGREE II tool. Inter-rater agreement was measured with the intraclass correlation coefficient ICC(2,1).
Results: Inter-rater agreement was good at 0.73 (95% CI 0.45 to 0.88). Both raters assigned the guideline a score of three, for a combined score of three out of a possible seven on the AGREE II rating scale.
Conclusions: Based on this AGREE II assessment, use of ARC Guideline 11.9 Managing Acute Dysrhythmias is not recommended. Emergency departments and prehospital systems should consider looking elsewhere for a higher-quality guideline to guide their practice.
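
ICC(2,1) is the two-way random-effects, absolute-agreement, single-rater form: it treats both the rated targets and the raters as random samples and penalises systematic differences between raters. A minimal sketch from the ANOVA mean squares follows. The rows would be the rated targets (which unit the study used is not stated in the abstract; individual AGREE II items would be one assumption), and the example table is the 6-target by 4-rater illustration from Shrout and Fleiss (1979), not AGREE II data.

```python
from typing import Sequence


def icc_2_1(ratings: Sequence[Sequence[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings[i][j]` is the score rater j gave to target i
    (n targets x k raters, no missing cells).
    """
    n = len(ratings)
    if n < 2:
        raise ValueError("need at least two targets")
    k = len(ratings[0])
    if k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete table with at least two raters")
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    # Sums of squares for targets (rows), raters (columns) and residual error.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Textbook table (Shrout and Fleiss, 1979): 6 targets rated by 4 judges.
SHROUT_FLEISS = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(round(icc_2_1(SHROUT_FLEISS), 2))  # 0.29
```

The low value here comes from large differences in rater means (the column term), which ICC(2,1) counts as disagreement; a consistency-only form such as ICC(3,1) would ignore them.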


2020 ◽  
Author(s):  
Ana Carolina Cintra Nunes Mafra ◽  
João Luiz Miraglia ◽  
Fernando Antonio Basile Colugnati ◽  
Gilberto Soares Lourenço Padilha ◽  
Renata Rafaella Santos Tadeucci ◽  
...  

Background: The quality of patients’ medical records is closely related to patient safety. Moreover, their data are widely used in observational studies. However, the reliability of the information extracted from them is a concern in audit processes, which must ensure inter-rater agreement (IRA). The objective of this study was therefore to evaluate the IRA among members of the Patient’s Health Record Review Board (PHRRB) in routine auditing of medical records, and the impact of periodic discussions of results with raters.
Methods: Prospective longitudinal study conducted between July 2015 and April 2016 at Hospital Municipal Dr. Moysés Deutsch, a large public hospital in São Paulo. The PHRRB was composed of 12 physicians, 9 nurses and 3 physiotherapists, who audited medical records monthly; the number of raters changed throughout the study. PHRRB meetings were held to reach consensus on the criteria that members had to rate in the auditing process. A review chart was created on which raters verified the registration of the patient’s secondary diagnosis, chief complaint, history of presenting complaint, past medical history, medication history, physical exam and diagnostic testing. IRA was measured every three months. Gwet’s AC1 coefficient and the proportion of agreement (PA) were calculated to evaluate the IRA for each item over time.
Results: The study included 1884 items from 239 records, with an overall full agreement among raters of 71.2%. A significant IRA increase of 16.5% (OR=1.17; 95% CI=1.03–1.32; p=0.014) was found in routine PHRRB auditing, with no significant differences between the PA and Gwet’s AC1, which showed a similar evolution over time. The PA decreased by 27.1% when at least one of the raters was absent from the review meeting (OR=0.73; 95% CI=0.53–1.00; p=0.048).
Conclusions: Medical record quality has been associated with the quality of care and could be optimized and improved by targeted interventions. The PA and Gwet’s AC1 are suitable agreement coefficients that can feasibly be incorporated into the routine PHRRB evaluation process.
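
PA is the raw share of items on which raters agree; Gwet's AC1 corrects it for chance, but with a chance term that shrinks when one category dominates (as when most record items are registered). A minimal two-rater sketch follows; the study had a changing number of raters, for which Gwet's multi-rater generalisation would be needed, and the ratings below are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def proportion_agreement(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Proportion of items on which both raters gave the same rating (PA)."""
    if len(a) != len(b) or not a:
        raise ValueError("ratings must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def gwet_ac1(a: Sequence[Hashable], b: Sequence[Hashable],
             categories: Sequence[Hashable]) -> float:
    """Gwet's AC1 for two raters and q >= 2 categories."""
    q = len(categories)
    if q < 2:
        raise ValueError("AC1 needs at least two categories")
    used = set(a) | set(b)
    if not used <= set(categories):
        raise ValueError("ratings contain a category not listed in `categories`")
    n = len(a)
    p_a = proportion_agreement(a, b)
    count_a, count_b = Counter(a), Counter(b)
    # pi_k: mean of the two raters' marginal proportions for category k.
    pis = [(count_a[c] + count_b[c]) / (2 * n) for c in categories]
    # AC1 chance agreement; it is at most 1/q, so 1 - p_e is always positive.
    p_e = sum(p * (1 - p) for p in pis) / (q - 1)
    return (p_a - p_e) / (1 - p_e)


# Hypothetical: 20 record items, almost all judged "registered" by both raters.
r1 = ["yes"] * 18 + ["no", "yes"]
r2 = ["yes"] * 18 + ["yes", "no"]
print(proportion_agreement(r1, r2))              # 0.9
print(round(gwet_ac1(r1, r2, ["yes", "no"]), 2))  # 0.89
```

Worked by hand, Cohen's kappa for the same two lists is about −0.05, because its chance term (0.905) almost equals the observed agreement; this prevalence sensitivity is a common reason to report AC1 alongside PA.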


2008 ◽  
Vol 24 (03) ◽  
pp. 318-325 ◽  
Author(s):  
Sophie Gerkens ◽  
Ralph Crott ◽  
Irina Cleemput ◽  
Jean-Paul Thissen ◽  
Marie-Christine Closon ◽  
...  

Objectives: The increasing use of full economic evaluations has led to the development of various instruments to assess their quality. The purpose of this study was to compare the frequently used British Medical Journal (BMJ) checklist with two new instruments: the Consensus Health Economic Criteria (CHEC) list and the Quality of Health Economic Studies (QHES) instrument. The analysis was based on a practical exercise on economic evaluations of the surgical treatment of obesity.
Methods: The quality of nine selected studies was assessed independently by two health economists. To compare instruments, the Spearman rank correlation coefficient was calculated for each assessor. The test–retest reliability of each instrument was assessed with the intraclass correlation coefficient ICC(3,1). Finally, the inter-rater agreement for each instrument was estimated at two levels: comparison of the total score of each article by ICC(2,1), and comparison of results per item by kappa values.
Results: The Spearman rank correlation coefficient between instruments was usually high (rho > 0.70). Test–retest reliability was good for every instrument: 0.98 (95 percent CI, 0.86–0.99) for the BMJ checklist, 0.97 (95 percent CI, 0.73–0.98) for the CHEC list, and 0.95 (95 percent CI, 0.75–0.99) for the QHES instrument. However, inter-rater agreement was poor (kappa < 0.40 for most items and ICC(2,1) ≤ 0.5).
Conclusions: The results of quality assessments of economic evaluations are influenced less by the instrument used than by the assessor. Quality assessments should therefore be performed by at least two independent experts, with final scoring based on consensus.
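
Because the three instruments score studies on different scales, comparing them with Spearman's rank correlation asks only whether they order the nine studies the same way. A minimal sketch computes rho as the Pearson correlation of ranks, giving tied values their average rank; the instrument scores below are hypothetical.

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for p in range(i, j + 1):
            ranks[order[p]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples of size >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    # Undefined (division by zero) if either sample is constant.
    return cov / sqrt(var_x * var_y)


# Hypothetical total scores for nine studies under two instruments.
instrument_a = [12, 15, 9, 18, 14, 11, 16, 10, 13]
instrument_b = [60, 70, 45, 88, 72, 58, 75, 50, 66]
print(round(spearman_rho(instrument_a, instrument_b), 3))  # 0.983
```

A high rho between instruments, as reported, is compatible with poor inter-rater agreement: two assessors can each rank studies consistently across instruments while disagreeing with each other.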


Author(s):  
Alberto Falorni ◽  
Vittorio Bini ◽  
Corrado Betterle ◽  
Annalisa Brozzetti ◽  
Luis Castaño ◽  
...  

21-Hydroxylase autoantibodies (21OHAb) are markers of an adrenal autoimmune process that identifies individuals with autoimmune Addison’s disease (AAD). The quality and inter-laboratory agreement of various 21OHAb tests are incompletely known. The objective of the study was to determine inter-laboratory concordance for 21OHAb determinations. Sixty-nine sera from 51 patients with AAD and 51 sera from 51 healthy subjects were blindly coded by a randomization center and distributed to 14 laboratories that determined 21OHAb, either by an “in-house” assay (n=9) using in vitro-translated [...]
Intra-assay coefficient of variation ranged from 2.6% to 5.3% for laboratories using the commercial kit and from 5.1% to 23% for laboratories using “in-house” assays. Diagnostic accuracy, expressed as the area under the ROC curve (AUC), varied from 0.625 to 0.947 with the commercial kit and from 0.562 to 0.978 with “in-house” methods. Cohen’s κ of inter-rater agreement was 0.603 among all 14 laboratories, 0.691 among “in-house” laboratories, and 0.502 among commercial kit users. Optimized cutoff levels, calculated on the basis of AUCs, increased the diagnostic accuracy of every laboratory (AUC >0.9 for 11/14 laboratories) and increased Cohen’s κ of inter-rater agreement. Discrepancies in the quantitation of 21OHAb levels among laboratories increased with increasing autoantibody levels.
The quality of 21OHAb analytical procedures is mainly influenced by the selection of the cutoff value and correct handling of assay materials. A standardization program is needed to identify common standard sera and common measuring units.
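
The AUC summarises how well a laboratory's measured levels separate patients from healthy subjects, independent of any single cutoff. A minimal sketch computes the empirical AUC in its Mann-Whitney form and then picks a cutoff. The abstract does not say how its optimized cutoffs were derived from the ROC analysis; maximising Youden's index (sensitivity + specificity − 1) is an assumption here, and the antibody levels are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def roc_auc(patients: Sequence[float], controls: Sequence[float]) -> float:
    """Empirical AUC: P(random patient > random control), ties counted as 1/2."""
    if not patients or not controls:
        raise ValueError("need at least one patient and one control")
    wins = 0.0
    for p in patients:
        for c in controls:
            if p > c:
                wins += 1.0
            elif p == c:
                wins += 0.5
    return wins / (len(patients) * len(controls))


def youden_cutoff(patients: Sequence[float],
                  controls: Sequence[float]) -> Tuple[Optional[float], float]:
    """Observed value maximising Youden's J when values >= cutoff test positive."""
    best_cut, best_j = None, -1.0
    for cut in sorted(set(patients) | set(controls)):
        sensitivity = sum(p >= cut for p in patients) / len(patients)
        specificity = sum(c < cut for c in controls) / len(controls)
        j = sensitivity + specificity - 1
        if j > best_j:  # keep the lowest cutoff among ties
            best_cut, best_j = cut, j
    return best_cut, best_j


# Hypothetical 21OHAb levels (arbitrary units) for one laboratory.
patients = [5.0, 7.5, 3.2, 9.1, 2.0]
controls = [0.5, 1.2, 2.5, 0.8, 3.0]
print(roc_auc(patients, controls))  # 0.92
cut, j = youden_cutoff(patients, controls)
print(cut, round(j, 2))             # 3.2 0.8
```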


BMJ Open ◽  
2021 ◽  
Vol 11 (7) ◽  
pp. e048370
Author(s):  
Sulekha Shrestha ◽  
Johannes Vieler ◽  
Nikolai Juliussen Haug ◽  
Jan Egil Afset ◽  
Lise Husby Høvik ◽  
...  

Objectives: There is a lack of data on the quality of peripheral intravenous catheter (PIVC)-related care from low-income and middle-income countries, even though the use of PIVCs may lead to local or severe systemic infections. Our main objective was to assess the feasibility and inter-rater agreement of the PIVC-mini Questionnaire (PIVC-miniQ) in a tertiary care hospital in Nepal.
Design: We performed an observational cross-sectional quantitative study using the PIVC-miniQ to collect information on PIVC quality.
Setting: Secondary care in a Nepalese hospital. All patients with PIVCs in selected wards were included in the study, and PIVCs were assessed independently by two raters. Eight Nepalese nurses, one Nepalese student and three Norwegian students participated as raters.
Primary and secondary outcome measures: The intraclass correlation coefficient (ICC), positive, negative and absolute agreement, Scott’s pi and sum score were calculated using the PIVC-miniQ. We also aimed to describe PIVC quality of care, which is important for preventing PIVC-associated complications such as phlebitis or catheter-associated bloodstream infections.
Results: A total of 390 patients (409 PIVCs) were included in the study. The ICC between raters was 0.716 for Nepalese raters, 0.644 for Norwegian raters and 0.481 for the pooled data. The most frequently observed problems associated with PIVCs were blood in the intravenous line (51.5%), pain and tenderness on palpation (43.4%), and fixation with opaque tape (38.5%). The average sum score was 3.32 deviations from best practice for PIVCs fixed with non-sterile opaque tape and 2.37 for those fixed with transparent dressing (p<0.001).
Conclusion: The PIVC-miniQ is a feasible and reliable tool for nurses assessing PIVC quality in hospitalised patients in Nepal. The study revealed gaps in PIVC quality and care that could be improved by providing transparent PIVC dressings for all patients and requiring all PIVC insertions to be documented in patient charts.
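
For a yes/no questionnaire item, absolute agreement can hide whether raters agree on the rare "problem present" answers; positive and negative (specific) agreement split it by category, and Scott's pi corrects for chance using the pooled answer rate of both raters. A minimal sketch follows, assuming each PIVC-miniQ item is scored as a binary deviation from best practice; the ratings are hypothetical.

```python
from math import nan
from typing import Dict, Sequence


def binary_agreement(a: Sequence[bool], b: Sequence[bool]) -> Dict[str, float]:
    """Absolute, positive and negative agreement plus Scott's pi for one item."""
    if len(a) != len(b) or not a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(a)
    both_yes = sum(1 for x, y in zip(a, b) if x and y)
    both_no = sum(1 for x, y in zip(a, b) if not x and not y)
    disagree = n - both_yes - both_no
    # Specific agreement: 2a / (2a + b + c) for "yes", 2d / (2d + b + c) for "no".
    pos_den = 2 * both_yes + disagree
    neg_den = 2 * both_no + disagree
    absolute = (both_yes + both_no) / n
    # Scott's pi: chance agreement uses the pooled "yes" rate of both raters.
    p_yes = (sum(map(bool, a)) + sum(map(bool, b))) / (2 * n)
    p_e = p_yes ** 2 + (1 - p_yes) ** 2
    return {
        "absolute": absolute,
        "positive": 2 * both_yes / pos_den if pos_den else nan,
        "negative": 2 * both_no / neg_den if neg_den else nan,
        "scott_pi": (absolute - p_e) / (1 - p_e) if p_e < 1 else nan,
    }


# Hypothetical: two raters judge "blood in the line" on ten PIVCs.
rater_a = [True, True, True, False, False, False, False, True, False, False]
rater_b = [True, True, False, False, False, False, True, True, False, False]
stats = binary_agreement(rater_a, rater_b)
print({name: round(value, 3) for name, value in stats.items()})
# {'absolute': 0.8, 'positive': 0.75, 'negative': 0.833, 'scott_pi': 0.583}
```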

