Guidelines for analysis on measuring interrater reliability of nursing outcome classification

Author(s):  
Intansari Nurjannah ◽  
Sri Marga Siwi

Indicators in the nursing outcome classification (NOC) need to be tested for validity and reliability. One way to measure the reliability of the NOC is interrater reliability. Kappa and percent agreement are common statistical methods that are often used together to measure the interrater reliability of an instrument, because both are easy to interpret. However, two possible conflicts can arise when the kappa value and the percent agreement disagree. This article aims to provide guidance for researchers who face either of these conflicts. The guidance applies to interrater reliability measured with two raters.
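A minimal sketch of the two statistics for two raters on nominal ratings, using hypothetical data (not from the article). Cohen's kappa is assumed here as the kappa variant; the function names are illustrative. The example shows how the two measures can disagree: both rater pairs agree on 90% of items, but when nearly every rating falls in one category, the agreement expected by chance is so high that kappa drops to around zero.

```python
from collections import Counter


def percent_agreement(r1, r2):
    """Percentage of items on which two raters gave the same rating."""
    if len(r1) != len(r2) or not r1:
        raise ValueError("ratings must be non-empty and of equal length")
    agree = sum(a == b for a, b in zip(r1, r2))
    return 100.0 * agree / len(r1)


def cohen_kappa(r1, r2):
    """Cohen's kappa for two raters assigning nominal categories to the same items."""
    n = len(r1)
    p_o = percent_agreement(r1, r2) / 100.0  # observed agreement
    c1, c2 = Counter(r1), Counter(r2)
    # Chance agreement: for each category, product of the two raters' marginal proportions
    p_e = sum(c1[c] * c2[c] for c in set(c1) | set(c2)) / n ** 2
    if p_e == 1.0:
        return float("nan")  # undefined: both raters used one identical category only
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical data: each pair of raters agrees on 18 of 20 items (90%).
skewed_1 = ["yes"] * 18 + ["yes", "no"]
skewed_2 = ["yes"] * 18 + ["no", "yes"]
balanced_1 = ["yes"] * 9 + ["no"] * 9 + ["yes", "no"]
balanced_2 = ["yes"] * 9 + ["no"] * 9 + ["no", "yes"]

print(percent_agreement(skewed_1, skewed_2), round(cohen_kappa(skewed_1, skewed_2), 3))
print(percent_agreement(balanced_1, balanced_2), round(cohen_kappa(balanced_1, balanced_2), 3))
```

With the skewed ratings, kappa is slightly negative despite 90% agreement. With the balanced ratings, kappa is 0.8.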

2002 ◽  
Vol 14 (2) ◽  
pp. 219-226 ◽  
Author(s):  
Hin-Yeung Tsang ◽  
Main-Yoon Chong ◽  
Andrew T. A. Cheng

Objective: To test the validity and reliability of the Chinese Geriatric Mental State Schedule (CGMS) in Taiwanese elders. Methods: The CGMS underwent a standardized two-way translation, a pretest phase, and consensus focus group meetings to modify culture-related terms in the original English version. Interrater reliability of the CGMS was assessed among eight psychiatrists after they completed a training course. Diagnoses generated by the CGMS-AGECAT (Automated Geriatric Examination for Computer Assisted Taxonomy) were compared with psychiatric diagnoses based on the DSM-III-R criteria. Participants were aged 65 years and over and were recruited from a community (n = 36) and an “old age home” (n = 56). Results: Four of the eight diagnostic categories generated by the CGMS-AGECAT had a generalized kappa value of 1.0, and the values for the remaining four categories were acceptable: .8 for depressive neurosis, .6 for anxiety disorder, and .5 for both schizophrenia and depressive psychosis. The overall agreement between the CGMS-AGECAT and independent psychiatric diagnosis (based on the DSM-III-R criteria) was satisfactory. Conclusion: The CGMS was found to be a cross-culturally valid and reliable instrument for use in Taiwan.
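The abstract reports a "generalized kappa" across eight psychiatrists but does not define it. One common multi-rater generalization is Fleiss' kappa. Assuming that is what is meant, here is a minimal sketch. The input layout (one row per subject, one column per category, cells counting raters) is a convention chosen for this example.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for several raters.

    counts[i][j] is the number of raters who assigned subject i to category j;
    every subject must have been rated by the same number of raters.
    """
    N = len(counts)          # subjects
    n = sum(counts[0])       # raters per subject
    k = len(counts[0])       # categories
    if n < 2 or any(sum(row) != n for row in counts):
        raise ValueError("need >= 2 raters and the same number of ratings per subject")
    # Overall proportion of all assignments falling in each category
    p = [sum(row[j] for row in counts) / (N * n) for j in range(k)]
    # Per-subject agreement: share of rater pairs that agree on that subject
    P = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in counts]
    P_bar = sum(P) / N           # mean observed agreement
    P_e = sum(x * x for x in p)  # agreement expected by chance
    if P_e == 1.0:
        return float("nan")      # undefined: every rating fell in one category
    return (P_bar - P_e) / (1 - P_e)


# Hypothetical: 8 raters classify 4 subjects as case / non-case and all agree.
print(fleiss_kappa([[8, 0], [0, 8], [8, 0], [0, 8]]))
```

When every rater agrees on every subject, the statistic equals 1.0, matching the value reported for four of the diagnostic categories.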


2018 ◽  
Vol 4 (5) ◽  
pp. 448-456
Author(s):  
Sri Hartini ◽  
Novi Aprilia K ◽  
Intansari Nurjannah ◽  
Fitri Haryanti ◽  
Itsna Lutfi Kholisa ◽  
...  

Background: A common problem for children with intellectual disability is difficulty performing daily activities or self-care, including eating. NOC Self-care: eating measures a client’s eating skills after a nursing intervention has been provided. NOC was translated into Indonesian and its indicators were operationalized. Because the measurement of self-care: eating needs to be evaluated, the reliability of this instrument was tested to determine whether the NOC is also reliable for use in Indonesia. Objective: The aim of this study was to investigate the interrater reliability of the Indonesian-language NOC: Self-care: eating in children with intellectual disability. Methods: Two raters assessed 124 children with intellectual disability using NOC: Self-care: eating. The study was conducted from December 2017 to January 2018. Kappa and percent agreement were used for the analysis. Results: The overall kappa value of NOC: Self-care: eating was 0.55, and the percent agreement was 88%. The indicator for swallowing food had the highest kappa value and percent agreement (0.8 and 99%, respectively). Conclusions: The interrater reliability of the Indonesian NOC: Self-care: eating was at the level of great reliability.
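The abstract interprets its kappa values without naming a benchmark scale. For comparison only, here is a sketch that maps kappa values onto the widely cited Landis and Koch (1977) bands. Other benchmark scales exist and may label the same value differently, so this is not necessarily the scale the authors applied.

```python
def landis_koch_label(kappa):
    """Descriptive label for a kappa value using the Landis & Koch (1977) bands."""
    if kappa > 1:
        raise ValueError("kappa cannot exceed 1")
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
                         (0.80, "substantial"), (1.00, "almost perfect")]:
        if kappa <= upper:
            return label


for value in (0.55, 0.8):  # the overall and highest kappa values reported above
    print(value, landis_koch_label(value))
```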


2017 ◽  
Vol 5 (1) ◽  
pp. 59-68 ◽  
Author(s):  
Pauli Olavi Rintala ◽  
Arja Kaarina Sääkslahti ◽  
Susanna Iivonen

This study examined the intrarater and interrater reliability of the Test of Gross Motor Development—3rd Edition (TGMD-3). Participants were 60 Finnish children aged 3 to 9 years, divided into three separate samples of 20. Two samples were used to examine the intrarater reliability of two different assessors, and the third was used to establish interrater reliability. Children’s TGMD-3 performances were video-recorded and scored later, and reliability was analyzed using an intraclass correlation coefficient, a kappa statistic, and percent agreement. The intrarater reliability of the locomotor subtest, ball skills subtest, and gross motor total score ranged from 0.69 to 0.77, and percent agreement ranged from 87% to 91%. The interrater reliability of the locomotor subtest, ball skills subtest, and gross motor total score ranged from 0.56 to 0.64. Percent agreement of 83% was observed for locomotor skills, ball skills, and total skills, respectively. Hop, horizontal jump, and two-hand strike assessments showed the greatest differences between the assessors. These results indicate that the TGMD-3 has acceptable reliability for analyzing children’s gross motor skills.
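The abstract does not say which form of the intraclass correlation coefficient was used. Assuming the common Shrout and Fleiss ICC(2,1) (two-way random effects, absolute agreement, single rater), it can be computed from two-way ANOVA mean squares as in the sketch below. The data layout is an assumption: one row per child, one column per assessor, and no missing scores.

```python
def icc_2_1(scores):
    """ICC(2,1) (Shrout & Fleiss): two-way random effects, absolute agreement,
    single rater. scores[i][j] = score given to subject i by rater j."""
    n = len(scores)       # subjects
    k = len(scores[0])    # raters
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a complete subjects x raters table, at least 2 x 2")
    grand = sum(map(sum, scores)) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(scores[i][j] for i in range(n)) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between raters
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_err = ss_total - ss_rows - ss_cols                    # residual
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Hypothetical total scores from two assessors for four children
print(round(icc_2_1([[10, 11], [14, 13], [8, 9], [12, 12]]), 3))
```

Because this form penalizes systematic offsets between raters, it is usually lower than a consistency-type ICC on the same data.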


2021 ◽  
pp. 315-328
Author(s):  
Tobias Haug ◽  
Eveline Boers-Visker ◽  
Wolfgang Mann ◽  
Geoffrey Poor ◽  
Beppie Van den Bogaerde

Research on signed language assessment is scarce, especially on scoring issues and interrater reliability. This chapter describes two related assessment instruments, the SLPI and the NFA, which provide scoring criteria. Raters are given scales for evaluating the different components of the candidate’s language production. In use, the rating system has proved successful; however, there are hardly any data on its interrater reliability. In this chapter, the authors describe reliability issues, with attention to rater training and score resolution techniques, and discuss how to identify and increase rater reliability. The dearth of knowledge on signed language assessment, in particular its validity and reliability, indicates an urgent need for more research in this area.


2019 ◽  
Vol 23 (3) ◽  
pp. 210-214
Author(s):  
Deidre Ongaro ◽  
Jefferson Terry

Chronic intervillositis of unknown etiology (CIUE) is a rare placental inflammatory process associated with pregnancy loss and recurrence. We conducted a quality assurance study to assess the diagnostic accuracy and reproducibility of CIUE grading at our institution. Hematoxylin and eosin-stained slides from 20 CIUE cases (31 slides) were reviewed by 7 perinatal pathologists in 2 sequential rounds. Reviewers were instructed to use the diagnostic criteria they currently followed for CIUE and to grade each slide according to the Rota scheme. In the first round, 20 slides were assessed. The diagnostic accuracy was 94%, the average percent agreement on Rota grade was 79%, and the Fleiss’ kappa value for interobserver variability was 0.54. All pathologists then reviewed the results and agreed on diagnostic and grading criteria before the second round. In round 2, the remaining 11 slides were assessed. Diagnostic accuracy was 83%, the average percent agreement on Rota grade was 70%, and the Fleiss’ kappa value for interobserver variability was 0.36. Overall, diagnostic accuracy was high and agreement on Rota grade was moderate. Group review did not appear to improve accuracy. Simplifying CIUE grading to a low-grade/high-grade scheme (<50% or ≥50%) might improve grading reproducibility.
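The abstract does not define "average percent agreement" for seven raters. One plausible reading, assumed here, is the share of rater pairs that give a slide the same grade, averaged over slides. This is also the observed-agreement term inside Fleiss' kappa. The sketch also recodes grades onto a coarser two-level scheme, which is the kind of simplification the authors suggest. The grades and the grade-to-level mapping are purely hypothetical and do not reproduce the Rota scheme or the 50% cutoff.

```python
from itertools import combinations


def mean_pairwise_agreement(ratings):
    """Average percent agreement across raters.

    ratings[i] holds every rater's grade for slide i. For each slide, take the
    share of rater pairs giving the same grade, then average over slides.
    """
    shares = []
    for row in ratings:
        pairs = list(combinations(row, 2))
        if not pairs:
            raise ValueError("each slide needs at least two ratings")
        shares.append(sum(a == b for a, b in pairs) / len(pairs))
    return 100.0 * sum(shares) / len(shares)


def collapse(ratings, mapping):
    """Recode every grade through `mapping`, e.g. a finer scale onto low/high."""
    return [[mapping[g] for g in row] for row in ratings]


# Hypothetical grades from three raters on two slides, and a hypothetical
# mapping of grades 1-3 onto a two-level scheme.
grades = [[1, 1, 2], [3, 2, 3]]
two_level = {1: "low", 2: "high", 3: "high"}
print(mean_pairwise_agreement(grades))
print(mean_pairwise_agreement(collapse(grades, two_level)))
```

Collapsing adjacent grades removes some disagreements by construction, so agreement can only stay the same or rise. Whether that improves clinical usefulness is a separate question.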


2013 ◽  
Vol 313-314 ◽  
pp. 592-595
Author(s):  
Fei Cao ◽  
Fan Wu ◽  
Fang Lu

A method combining experts’ assessments with weighted Dempster-Shafer (D-S) evidence theory is proposed for measuring the reliability of quality evaluation results. The method draws on the idea of meta-evaluation to carry out the reliability measurement, providing technical support for robust evaluation. Finally, we validate the method using primary evaluation results for weapons. The results indicate that the method can measure the validity and reliability of primary evaluation results and thereby considerably improves decision-making in weapon quality evaluation.
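The abstract does not give the authors' weighting scheme. As a generic illustration of D-S evidence fusion, here is Dempster's rule of combination together with Shafer discounting, one standard way to down-weight a less trusted expert. This is not necessarily the method proposed in the paper. The frame of discernment, the expert masses, and the weights are hypothetical.

```python
from itertools import product


def discount(m, alpha, frame):
    """Shafer discounting: scale an expert's masses by reliability weight alpha
    and move the remaining 1 - alpha onto the whole frame (total ignorance)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    theta = frozenset(frame)
    out = {A: alpha * v for A, v in m.items()}
    out[theta] = out.get(theta, 0.0) + (1.0 - alpha)
    return out


def dempster_combine(m1, m2):
    """Dempster's rule of combination for two mass functions keyed by frozenset."""
    combined, conflict = {}, 0.0
    for (A, a), (B, b) in product(m1.items(), m2.items()):
        C = A & B
        if C:
            combined[C] = combined.get(C, 0.0) + a * b
        else:
            conflict += a * b  # mass on contradictory (empty-intersection) evidence
    if conflict >= 1.0:
        raise ValueError("total conflict: evidence cannot be combined")
    return {C: v / (1.0 - conflict) for C, v in combined.items()}


frame = {"reliable", "unreliable"}
R, U = frozenset({"reliable"}), frozenset({"unreliable"})
expert_1 = {R: 0.8, U: 0.2}
expert_2 = {R: 0.6, U: 0.4}
# Hypothetical weights: trust expert 1 fully and expert 2 at 0.7
fused = dempster_combine(discount(expert_1, 1.0, frame), discount(expert_2, 0.7, frame))
print({tuple(sorted(k)): round(v, 3) for k, v in fused.items()})
```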


2019 ◽  
Vol 9 (19) ◽  
pp. 3975 ◽  
Author(s):  
Christoph Schärer ◽  
Luca von Siebenthal ◽  
Ishbel Lomax ◽  
Micah Gross ◽  
Wolfgang Taube ◽  
...  

In artistic gymnastics, the possibility of using 2D video analysis to measure the peak height (hpeak) and length of flight (L) during routine training, in order to monitor the execution and development of difficult elements, is intriguing. However, the validity and reliability of such measurements remain unclear. Therefore, in this study, the hpeak and L of 38 vaults performed by top-level gymnasts were assessed by 2D and 3D analysis to evaluate the criterion validity and both the intrarater and interrater reliability of the 2D method. Validity calculations showed higher accuracy for hpeak (95% LoA: ±3.6% of average peak height) than for L (95% LoA: ±7.6% of average length). Minor random errors, but no systematic errors, were observed in the examination of intrarater reliability (hpeak: CV% = 0.44%, p = 0.81; L: CV% = 0.87%, p = 0.14) and interrater reliability (hpeak: CV% = 0.51%, p = 0.55; L: CV% = 0.72%, p = 0.44). In conclusion, the validity and reliability of the 2D method are deemed sufficient (particularly for hpeak, but with some limitations for L) to justify its use in routine training of the vault. Due to its simplicity and low cost, this method could be an attractive monitoring tool for gymnastics coaches.
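The validity figures above are 95% limits of agreement (LoA) expressed as a percentage of the average value. A minimal Bland-Altman sketch is shown below, assuming the conventional bias ± 1.96 × SD of paired differences and assuming that "percent of average" means the LoA half-width divided by the grand mean. The abstract does not spell out either choice. The vault heights are hypothetical.

```python
from statistics import mean, stdev


def limits_of_agreement(a, b):
    """Bland-Altman bias and 95% limits of agreement for paired measurements a, b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [x - y for x, y in zip(a, b)]
    bias = mean(diffs)
    half_width = 1.96 * stdev(diffs)  # sample SD of the differences
    return bias, bias - half_width, bias + half_width


def loa_half_width_percent(a, b):
    """Half-width of the 95% LoA as a percentage of the mean of all measurements."""
    _, lower, upper = limits_of_agreement(a, b)
    return 100.0 * (upper - lower) / 2 / mean(list(a) + list(b))


# Hypothetical peak heights (m) from 2D and 3D analysis of the same five vaults
h2d = [2.41, 2.55, 2.38, 2.62, 2.47]
h3d = [2.44, 2.52, 2.40, 2.60, 2.50]
print(limits_of_agreement(h2d, h3d))
print(round(loa_half_width_percent(h2d, h3d), 2))
```

A bias near zero points to no systematic error. The half-width reflects random error between the two methods.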


2016 ◽  
Vol 50 (3) ◽  
pp. 288-294 ◽  
Author(s):  
J.C. Carvalho ◽  
D. Declerck ◽  
E. De Vos ◽  
J. Kellen ◽  
J.P. Van Nieuwenhuysen ◽  
...  

The aims of the present study were to incorporate and validate the electronic capture of participant-related outcomes in the Oral Survey-B System, which was originally developed for the electronic capture of clinical data. The validation process compared the performance of electronic and handwritten data capture. Noninferiority would be established if participants performed electronic data capture of the questionnaire survey with an effectiveness of at least 95% of that of handwritten data capture. In this multicenter, randomized, one-period crossover study, participants (n = 261) were allocated to start with either electronic or handwritten data capture. The electronic self-completed questionnaire was successfully incorporated into the Oral Survey-B System. The participants who validated the electronic questionnaire were aged 18 to 75 years. The interrater reliability of participants performing electronic and handwritten data capture of nonclinical assessments, per questionnaire and per entry, showed a kappa value of 0.72 (95% CI: 0.53-0.94). The noninferiority of electronic data capture relative to handwritten data capture and transfer was shown (p < 0.0001; 95% CI: 1.47-2.99). In conclusion, the electronic capture of participant-related outcomes with the Oral Survey-B System, originally designed for capturing clinical data, was validated. Electronic data capture was accurate and limited the number of errors. Participants were able to perform electronic data capture effectively, which supports its implementation in further National Oral Health Surveys. Given participant preferences and the time savings, this could lead to the worldwide implementation of electronic data capture in National Oral Health Surveys.
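The abstract reports kappa with a 95% confidence interval but does not say how the interval was computed. A minimal sketch using the simple large-sample approximation SE(κ) ≈ √(p_o(1 − p_o) / (n(1 − p_e)²)) is shown below. More refined variance formulas and bootstrap intervals exist and may give different bounds. The inputs in the example are hypothetical.

```python
from math import sqrt


def kappa_with_ci(p_o, p_e, n, z=1.96):
    """Cohen's kappa with an approximate CI from the simple large-sample SE.

    p_o: observed agreement, p_e: chance agreement, n: number of rated items.
    """
    if not 0.0 <= p_e < 1.0 or n < 1:
        raise ValueError("need 0 <= p_e < 1 and n >= 1")
    kappa = (p_o - p_e) / (1 - p_e)
    se = sqrt(p_o * (1 - p_o) / (n * (1 - p_e) ** 2))
    return kappa, kappa - z * se, kappa + z * se


# Hypothetical inputs: 90% observed agreement, 50% chance agreement, 100 items
print(kappa_with_ci(0.9, 0.5, 100))
```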

