Inter-rater Reliability in Clinical Assessments: Do Examiner Pairings Influence Candidate Ratings?

2019 ◽  
Author(s):  
Aileen Faherty ◽  
Yvonne Finn ◽  
Tim Counihan ◽  
Thomas Kropmans

Abstract Background: The reliability of clinical assessments is known to vary considerably, with inter-rater reliability a key contributor. Many of the mechanisms that contribute to inter-rater reliability, however, remain largely unexplained and unclear. While research in other fields suggests that the personality of raters can affect ratings, studies looking at personality factors in clinical assessments are few. Many schools use the approach of pairing examiners in clinical assessments and asking them to come to an agreed score. Little is known, however, about what occurs when these paired examiners interact to generate a score. Could personality factors have an impact? Methods: A fully-crossed design was employed with each participant examiner observing and scoring. A quasi-experimental research design used candidates’ observed scores in a mock clinical assessment as the dependent variable. The independent variables were examiner numbers, demographics and personality, with data collected by questionnaire. A purposeful sample of doctors who examine in the Final Medical examination at our institution was recruited. Results: Variability between scores given by examiner pairs (N=6) was less than the variability with individual examiners (N=12). 75% of examiners (N=9) scored below average for neuroticism and 75% also scored high or very high for extroversion. The higher an examiner’s personality score for extroversion, the lower the amount of change in his/her score when paired up with a co-examiner, possibly reflecting a more dominant role in the process of reaching a consensus score. Conclusions: While the variability between scores given by examiner pairs (N=6) was less than the variability with individual examiners (N=12), the reliability statistics for both assessments were comparable. However, using paired examiners resulted in a more accurate and robust score than simply averaging two independent examiners’ scores. The higher an examiner’s personality score for extroversion, the lower the amount of change in his/her score when paired up with a co-examiner, possibly reflecting a more dominant role in the process of reaching a consensus score. These findings could have implications for the organisation and administration of clinical assessments. Further studies with larger numbers of participants might establish if personality testing before choosing examiner pairs should be adopted.

2020 ◽  
Author(s):  
Aileen Faherty ◽  
Tim Counihan ◽  
Thomas Kropmans ◽  
Yvonne Finn

Abstract Background: The reliability of clinical assessments is known to vary considerably, with inter-rater reliability a key contributor. Many of the mechanisms that contribute to inter-rater reliability, however, remain largely unexplained and unclear. While research in other fields suggests that the personality of raters can affect ratings, studies looking at personality factors in clinical assessments are few. Many schools use the approach of pairing examiners in clinical assessments and asking them to come to an agreed score. Little is known, however, about what occurs when these paired examiners interact to generate a score. Could personality factors have an impact? Methods: A fully-crossed design was employed with each participant examiner observing and scoring. A quasi-experimental research design used candidates’ observed scores in a mock clinical assessment as the dependent variable. The independent variables were examiner numbers, demographics and personality, with data collected by questionnaire. A purposeful sample of doctors who examine in the Final Medical examination at our institution was recruited. Results: Variability between scores given by examiner pairs (N=6) was less than the variability with individual examiners (N=12). 75% of examiners (N=9) scored below average for neuroticism and 75% also scored high or very high for extroversion. Two-thirds scored high or very high for conscientiousness. The higher an examiner’s personality score for extroversion, the lower the amount of change in his/her score when paired up with a co-examiner, possibly reflecting a more dominant role in the process of reaching a consensus score. Conclusions: The reliability of clinical assessments using paired examiners is comparable to assessments with single examiners. Personality factors, such as extroversion, may influence the magnitude of change in score an individual examiner agrees to when paired up with another examiner. Further studies on personality factors and examiner behaviour are needed to test associations and determine if personality testing has a role in reducing examiner variability.
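The two quantities this abstract compares, the spread of individual versus paired scores and the amount each examiner's score changes on pairing, can be illustrated with a minimal sketch. The scores, the examiner labels and the helper `score_changes` are hypothetical and chosen only for illustration; they are not data or methods from the study.

```python
from statistics import stdev

# Hypothetical scores for ONE candidate (not data from the study).
individual = {"E1": 62, "E2": 70, "E3": 55, "E4": 66}
# Hypothetical agreed (consensus) score for each examiner pair.
pairs = {("E1", "E2"): 67, ("E3", "E4"): 63}


def score_changes(individual, pairs):
    """Absolute change between each examiner's own score and the pair's agreed score."""
    changes = {}
    for members, agreed in pairs.items():
        for examiner in members:
            changes[examiner] = abs(agreed - individual[examiner])
    return changes


changes = score_changes(individual, pairs)
spread_individual = stdev(list(individual.values()))  # variability across single examiners
spread_pairs = stdev(list(pairs.values()))            # variability across examiner pairs

print(changes)
print(round(spread_individual, 2), round(spread_pairs, 2))
```

A per-examiner change computed this way could then be related to a personality score such as extroversion; that step is left out because the abstract does not state which association measure was used.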


2019 ◽  
Author(s):  
Aileen Faherty ◽  
Yvonne Finn ◽  
Tim Counihan

Abstract Background: The reliability of clinical assessments is known to vary considerably, and inter-examiner variability is a key contributor. This may result in significant differences in scores between comparable candidates, a serious challenge in medical education. An approach frequently adopted to avoid this and improve reliability is to pair examiners and ask them to come to an agreed score. Little is known, however, about what occurs when these paired examiners interact to generate a score. Methods: A fully-crossed design was employed with each participant examiner observing and scoring. A quasi-experimental research design used candidates’ observed scores in a mock clinical assessment as the dependent variable. The independent variables were examiner numbers, demographics and personality. Demographic and personality data were collected by questionnaire. A purposeful sample of medical doctors who examine in the Final Medical examination at our institution was recruited. Results: Variability between scores given by examiner pairs (N=6) was less than the variability with individual examiners (N=12). 75% of examiners (N=9) scored below average for neuroticism and 75% also scored high or very high for extroversion. Two-thirds scored high or very high for conscientiousness. The higher an examiner’s personality score for extroversion, the lower the amount of change in his/her score when paired up with a co-examiner, possibly reflecting a more dominant role in the process of reaching a consensus score. Conclusions: While the variability between scores given by examiner pairs (N=6) was less than the variability with individual examiners (N=12), the reliability statistics for both assessments were comparable. Using paired examiners resulted in a more accurate and robust score than simply averaging two independent examiners’ scores. The higher an examiner’s personality score for extroversion, the lower the amount of change in his/her score when paired up with a co-examiner, possibly reflecting a more dominant role in the process of reaching a consensus score. These findings could have implications for the organisation and administration of clinical assessments. Further studies with larger numbers of participants might establish if personality testing before choosing examiner pairs could be utilised to help pair examiners and reduce examiner variability.


2020 ◽  
Vol 15 ◽  
Author(s):  
Dixon Thomas ◽  
Sherief Khalifa ◽  
Jayadevan Sreedharan ◽  
Rucha Bond

Background: Clinical competence of pharmacy students is better evaluated at their practice sites than in the classroom. A clinical pharmacy competency evaluation rubric like that of the American College of Clinical Pharmacy (ACCP) is an effective assessment tool for clinical skills and can be used to show item reliability. Preceptors should be trained on how to use the rubrics, as many inherent factors could influence inter-rater reliability. Objective: To evaluate inter-rater reliability among preceptors in evaluating the clinical competence of pharmacy students, before and after a group discussion intervention. Methods: In this quasi-experimental study in a United Arab Emirates teaching hospital, seven clinical pharmacy preceptors rated the clinical pharmacy competencies of ten recent PharmD graduates with reference to their portfolios and preceptorship. Clinical pharmacy competencies were adopted from ACCP and slightly modified to be relevant to the local setting. Results: Inter-rater reliability (Cronbach's alpha) among preceptors was reasonable, the preceptors having practised at a single site for 2-4 years. At domain level, inter-rater reliability ranged from 0.79 to 0.93 before the intervention and from 0.94 to 0.99 after it. Inter-rater reliability was lacking for certain competency elements, ranging from 0.31 to 0.61 before the intervention, but improved to between 0.79 and 0.97 after it. The intra-class correlation coefficient improved: all individual preceptors were reliable with each other after the group discussion, though some had no reliability with each other before it. Conclusion: Group discussion among preceptors at the training site was found to be effective in improving inter-rater reliability on all elements of the clinical pharmacy competency evaluation. Removing a preceptor from the analysis did not affect inter-rater reliability after group discussion.
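For readers unfamiliar with the statistic named in the Results, the following is a minimal sketch of Cronbach's alpha with raters treated as the "items" and rated graduates as the cases. It uses the standard formula alpha = k/(k-1) × (1 − Σ rater variances / variance of row totals); the function name `cronbach_alpha` and the ratings are hypothetical and are not taken from the study.

```python
from statistics import variance


def cronbach_alpha(ratings):
    """Cronbach's alpha for a table with one row per rated person and one column per rater."""
    k = len(ratings[0])  # number of raters
    if k < 2 or len(ratings) < 2:
        raise ValueError("need at least two raters and two rated persons")
    rater_columns = list(zip(*ratings))
    sum_rater_variances = sum(variance(col) for col in rater_columns)
    total_variance = variance([sum(row) for row in ratings])  # variance of row totals
    return k / (k - 1) * (1 - sum_rater_variances / total_variance)


# Hypothetical ratings: 4 graduates scored by 3 preceptors on a 1-5 scale.
example = [
    [3, 4, 3],
    [5, 5, 4],
    [2, 2, 3],
    [4, 5, 5],
]
print(round(cronbach_alpha(example), 3))
```

Values close to 1 indicate that the raters rank and score the graduates consistently; the function divides by the variance of the row totals, so it is undefined when every rated person receives the same total.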


2019 ◽  
pp. 193-236
Author(s):  
Arvind Elangovan

Contrary to Rau’s ideas, the framers of the Indian constitution were deeply influenced by the political history that preceded the meeting of the Constituent Assembly. As a result, the framers privileged not only Fundamental Rights but also the postcolonial State and the latter’s right to intervene for the cause of social justice. Interestingly, what mainly underlay this act of privileging was not so much a coming together to create a state by submitting individual wills (as theorized by social contract theorists, for instance) but rather a deep mistrust between the different political interests at work in the Constituent Assembly. Thus, by the time of the drafting of the Indian constitution, political history played a dominant role, with norms giving way to a history of politics.


Econometrica ◽  
2019 ◽  
Vol 87 (5) ◽  
pp. 1507-1541 ◽  
Author(s):  
Daniel Garcia-Macia ◽  
Chang-Tai Hsieh ◽  
Peter J. Klenow

Entrants and incumbents can create new products and displace the products of competitors. Incumbents can also improve their existing products. How much of aggregate productivity growth occurs through each of these channels? Using data from the U.S. Longitudinal Business Database on all nonfarm private businesses from 1983 to 2013, we arrive at three main conclusions: First, most growth appears to come from incumbents. We infer this from the modest employment share of entering firms (defined as those less than 5 years old). Second, most growth seems to occur through improvements of existing varieties rather than creation of brand new varieties. Third, own‐product improvements by incumbents appear to be more important than creative destruction. We infer this because the distribution of job creation and destruction has thinner tails than implied by a model with a dominant role for creative destruction.


Author(s):  
David L. Streiner ◽  
Geoffrey R. Norman ◽  
John Cairney

Although the goal of many clinical assessments and research studies is to measure how much people change between two occasions, the measurement of change is fraught with conceptual and methodological difficulties. One of the difficulties is that there are (at least) two different reasons to measure change: to determine if an intervention had any effect, and to identify the correlates of change. These two goals work against each other, because the former requires there to be little difference in the amount of change among people in the same group, while the latter depends on inter-individual differences. This chapter addresses the relationship of change to the reliability of the scale, the difficulties of measuring change in experimental and quasi-experimental studies, and new approaches to measuring change, such as growth curve analysis. It also discusses various biases that exist when people are asked directly how much they think they have changed.


2019 ◽  
Vol 2 (6) ◽  
pp. 376
Author(s):  
Mentari Soviani Ageung ◽  
Lenny Nuraeni

Science skills will be greatly needed by children, given that they already live in a modern era in which everything in their environment relates to science, such as rain, television, computers, plant growth and more. It is even possible that in years to come what will be needed is not a computer but a child's creative abilities: the ability to analyse, solve and face problems. It is therefore very important to teach or introduce science knowledge to children. The research method used is a quasi-experimental method with an experimental class and a control class whose science knowledge was investigated, using a scientific approach for the experimental class and ordinary learning for the control class. To determine the results of the study, statistical calculations were carried out, including a normality test, a homogeneity test, and a test of the difference between two means. The results indicate that science learning that uses a scientific approach is better than ordinary learning.


2017 ◽  
Author(s):  
Julian Varghese ◽  
Sarah Sandmann ◽  
Martin Dugas

BACKGROUND Medical coding is essential for standardized communication and integration of clinical data. The Unified Medical Language System by the National Library of Medicine is the largest clinical terminology system for medical coders and natural language processing tools. However, the abundance of ambiguous codes leads to low rates of uniform coding among different coders. OBJECTIVE To measure uniform coding among different medical experts in terms of inter-rater reliability (IR) and to analyze the effect on IR of using an expert-based online code suggestion system. METHODS A quasi-experimental study was conducted. Six medical experts coded 602 medical items from structured quality assurance forms (QA) or free-text eligibility criteria (EC) of 20 different clinical trials. Medical item content was selected based on mortality-leading diseases according to WHO data. The intervention consisted of using a semi-automatic code suggestion tool that is linked to a European information infrastructure providing a large medical text corpus of more than 300,000 medical form items with expert-assigned semantic codes. Krippendorff’s alpha (Kalpha) with bootstrap analysis was used for IR analysis, and coding times were measured before and after the intervention. RESULTS The intervention improved IR in structured QA form items (from Kalpha = 0.50, 95%-CI [0.43-0.57] to Kalpha = 0.62 [0.55-0.69]) and free-text eligibility criteria (from Kalpha = 0.19 [0.14-0.24] to Kalpha = 0.43 [0.37-0.50]) while preserving or slightly reducing mean coding time per item for all six coders. Regardless of intervention, pre-coordination and structured items were associated with significantly higher IR, but the proportion of items that were pre-coordinated significantly increased after the intervention (EC: odds ratio 4.92 [2.78-8.72]; QA: odds ratio 1.96 [1.19-3.25]). CONCLUSIONS The online code suggestion mechanism improved IR towards moderate or even substantial inter-coder agreement. Pre-coordination and use of structured vs. free-text data elements are key drivers for higher IR.
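As an illustration of the agreement measure used in this study, here is a minimal sketch of Krippendorff's alpha for nominal codes, computed from the coincidence matrix. It assumes a nominal difference metric (two codes either match or do not) and omits the bootstrap confidence intervals the abstract reports; the function name `krippendorff_alpha_nominal` and the example codes are hypothetical.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal data.

    units: one list per coded item, holding the code each coder assigned
    (None marks a missing code). Items with fewer than two codes are skipped.
    """
    coincidences = Counter()
    for codes in units:
        values = [c for c in codes if c is not None]
        m = len(values)
        if m < 2:
            continue
        # Every ordered pair of codes from different coders counts 1/(m-1).
        for a, b in permutations(values, 2):
            coincidences[(a, b)] += 1 / (m - 1)

    totals = Counter()
    for (a, _b), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())

    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(
        totals[a] * totals[b] for a in totals for b in totals if a != b
    ) / (n - 1)
    if expected == 0:
        raise ValueError("alpha is undefined when only one code is ever used")
    return 1 - observed / expected


# Hypothetical example: two coders, four items, codes "A" and "B".
example = [["A", "A"], ["A", "B"], ["B", "B"], ["B", "B"]]
print(round(krippendorff_alpha_nominal(example), 3))
```

An alpha of 1 means perfect agreement and 0 means agreement no better than chance, which is the scale on which the reported changes (for example 0.19 to 0.43 for free-text criteria) should be read.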


2010 ◽  
Vol 22 (4) ◽  
pp. 171-184 ◽  
Author(s):  
Sabine Trepte ◽  
Leonard Reinecke

Based on the model of complex entertainment experiences (Vorderer, Klimmt, & Ritterfeld, 2004), the competitiveness of a computer game (media prerequisite) and the individual life satisfaction (user prerequisite) are hypothesized to influence game enjoyment. Avatar-player similarity was hypothesized to determine identification with the avatar, which in turn was suggested to enhance the enjoyment experience. In a quasi-experimental study (N = 666), participants were asked to choose the personality features of an avatar for six different game scenarios. The results demonstrate that the games’ competitiveness as well as the participants’ life satisfaction influenced avatar choice and identification. In noncompetitive games, similar avatars were created, whereas in competitive games, dissimilar avatars were created. Participants who were well satisfied with their lives created avatars that resembled themselves in terms of personality factors, whereas dissatisfied users created dissimilar avatars. Player-avatar similarity was positively related to identification. This correlation was significantly stronger for noncompetitive games. Identification with the avatar was strongly related to game enjoyment. When controlling for the influence of identification on enjoyment, player-avatar similarity was negatively related to enjoyment, suggesting that identity play can be an independent source of enjoyment in computer games.

