Another Look at Air Traffic Controller Performance Evaluation

Author(s): Earl S. Stein, Randy L. Sollenberger

This paper describes a study that evaluated the reliability of a recently developed rating form designed to assess air traffic controller performance. Six supervisors from different radar approach control facilities nationwide viewed 20 videotapes of controllers working traffic from a previously recorded simulation study. The observer/raters used a new evaluation form that consisted of 24 different rating scales measuring specific areas of controller performance. An important part of this study was observer training, which consisted of practice rating sessions followed by group discussions. In these discussions, observers established mutual evaluation criteria for each performance area. Inter-rater reliability was assessed using intraclass correlations, and intra-rater reliability was assessed using Pearson product-moment correlations on repeated videotapes. In general, the reliability of the form was quite good; however, a few rating scales were much less reliable than the others. Reasons for the differences in rating scale reliability are discussed.
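The two reliability statistics named in this abstract can be illustrated with a minimal Python sketch. The abstract does not say which intraclass correlation form was used, so the code assumes ICC(2,1) in the Shrout and Fleiss notation (two-way random effects, absolute agreement, single rater), with rated tapes as rows and raters as columns. The function names and the example matrix (the six-target, four-judge worked example usually reproduced from Shrout and Fleiss, 1979) are illustrative and are not data from the study above.

```python
"""Inter-rater ICC(2,1) and intra-rater Pearson r (illustrative sketch)."""
import math
from typing import Sequence


def icc_2_1(ratings: Sequence[Sequence[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    Rows are rated targets (e.g. video tapes), columns are raters; the
    matrix must be complete (every rater rates every target).
    """
    n = len(ratings)      # number of targets
    k = len(ratings[0])   # number of raters
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete matrix with >= 2 targets and >= 2 raters")

    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares (no replication).
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation, e.g. one rater's first and
    second ratings of the same repeated tapes."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Shrout and Fleiss (1979) worked example: 6 targets x 4 judges.
    example = [
        [9, 2, 5, 8],
        [6, 1, 3, 2],
        [8, 4, 6, 8],
        [7, 1, 2, 6],
        [10, 5, 6, 9],
        [6, 2, 4, 7],
    ]
    print(round(icc_2_1(example), 2))  # 0.29
```

A different ICC form (for example, consistency rather than absolute agreement) would give a different value for the same matrix, so a study's reported ICC is only comparable once its form is known.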

2018, Vol 35 (3), pp. 403-426
Author(s): Hyejeong Kim

This paper aims to identify what aviation experts consider to be the key features of effective communication by examining in detail their commentary on a 17-minute segment of recorded radiotelephony discourse between a Russian pilot and a Korean air traffic controller. The segment was played to three practising pilots and three air traffic controllers. Their commentary on the qualities of communication displayed in the interaction was recorded and coded thematically, using a grounded ethnography approach. The analysis revealed that although the Russian pilot was viewed as having limited English proficiency, the strategies he used to make himself understood were evaluated positively as fulfilling the requirements of the professional role. By contrast, the Korean air traffic controller, although not evaluated as having limited proficiency, was criticized for his lack of professional knowledge. The discourse analysis and the feedback given by these expert informants highlight not only the nature of the miscommunication arising in unexpected situations, but also the multiple factors that may contribute to it. While language proficiency is clearly an issue, there are many other sources of miscommunication that emerge during the exchange. These findings are used to critique the narrow, language-focused oral proficiency construct as articulated in the holistic descriptors and the rating scale stipulated by the International Civil Aviation Organization (ICAO, 2010) as the basis for tests of aviation English worldwide. Instead, the paper proposes an expanded construct of oral communication incorporating elements of professional knowledge and behaviour with a focus on interactional competence specific to this context.


2015, Vol 3, pp. 2998-3004
Author(s): Jillian Keeler, Henri Battiste, Elyse C. Hallett, Zach Roberts, Alice Winter, ...

2019, Vol 5 (1), pp. e000541
Author(s): John Ressman, Wilhelmus Johannes Andreas Grooten, Eva Rasmussen Barr

Single leg squat (SLS) is a common tool used in clinical examination to set and evaluate rehabilitation goals, but also to assess lower extremity function in active people.

Objectives: To conduct a review and meta-analysis on the inter-rater and intrarater reliability of the SLS, including the lateral step-down (LSD) and forward step-down (FSD) tests.

Design: Review with meta-analysis.

Data sources: CINAHL, Cochrane Library, Embase, Medline (OVID) and Web of Science were searched up until December 2018.

Eligibility criteria: Studies were eligible for inclusion if they were methodological studies that assessed the inter-rater and/or intrarater reliability of the SLS, FSD and LSD through observation of movement quality.

Results: Thirty-one studies were included. The reliability varied widely between studies (inter-rater: kappa/intraclass correlation coefficients (ICC) = 0.00–0.95; intrarater: kappa/ICC = 0.13–1.00), but most of the studies reached ‘moderate’ measures of agreement. The pooled results of ICC/kappa showed a ‘moderate’ agreement for inter-rater reliability, 0.58 (95% CI 0.50 to 0.65), and a ‘substantial’ agreement for intrarater reliability, 0.68 (95% CI 0.60 to 0.74). Subgroup analyses showed a higher pooled agreement for inter-rater reliability of ≤3-point rating scales, while no difference was found for different numbers of segmental assessments.

Conclusion: Our findings indicate that the SLS test, including the FSD and LSD tests, can be suitable for clinical use regardless of the number of observed segments, particularly with a ≤3-point rating scale. Since most of the included studies were affected by some form of methodological bias, our findings must be interpreted with caution.

PROSPERO registration number: CRD42018077822.
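The kappa statistic pooled in this review can be sketched for the simplest case of two raters classifying the same movements. The code below computes unweighted Cohen's kappa and maps a value to a verbal band. The abstract does not name its labelling convention; the bands used here are the Landis and Koch (1977) ones, which are an assumption, although they are consistent with the ‘moderate’ (0.58) and ‘substantial’ (0.68) labels reported. The sketch does not reproduce the review's meta-analytic pooling, and the function names are illustrative.

```python
"""Unweighted Cohen's kappa for two raters, plus verbal agreement bands."""
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(r1: Sequence[Hashable], r2: Sequence[Hashable]) -> float:
    """Cohen's kappa = (p_o - p_e) / (1 - p_e) for two raters."""
    if len(r1) != len(r2) or len(r1) == 0:
        raise ValueError("need two non-empty, equal-length rating sequences")
    n = len(r1)
    # Observed agreement: share of items given the same category.
    p_o = sum(a == b for a, b in zip(r1, r2)) / n
    # Chance agreement from each rater's marginal category frequencies
    # (Counter returns 0 for categories a rater never used).
    m1, m2 = Counter(r1), Counter(r2)
    p_e = sum(m1[c] * m2[c] for c in m1) / (n * n)
    if p_e == 1:
        raise ValueError("kappa is undefined when both raters use one category only")
    return (p_o - p_e) / (1 - p_e)


def landis_koch_label(value: float) -> str:
    """Verbal band for a kappa/ICC value (assumed Landis and Koch cut-offs)."""
    if value < 0:
        return "poor"
    bands = [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"), (0.80, "substantial")]
    for upper, label in bands:
        if value <= upper:
            return label
    return "almost perfect"


if __name__ == "__main__":
    # Two hypothetical raters scoring six squats on a 3-point scale.
    rater_a = [1, 2, 2, 3, 1, 2]
    rater_b = [1, 2, 3, 3, 1, 1]
    k = cohens_kappa(rater_a, rater_b)
    print(round(k, 2), landis_koch_label(k))
```

With the assumed bands, `landis_koch_label(0.58)` returns `'moderate'` and `landis_koch_label(0.68)` returns `'substantial'`, matching the review's pooled labels. For ordinal scales with more points, a weighted kappa that gives partial credit to near-misses is often preferred.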


2000
Author(s): Carol A. Manning, Scott H. Mills, Henry J. Mogilka, Jerry W. Hedge, Kenneth W. Bruskiewicz, ...

1974, Vol 125 (586), pp. 248-255
Author(s): John N. Hall

Psychiatrists, psychologists, and nursing staff are increasingly making direct observations and ratings of ward behaviour. Characteristically, a nurse may be asked to complete a multi-item rating scale on a group of patients during the course of a drug trial. Several factors are involved in the choice of an appropriate scale for a particular purpose. Among these factors are the number of points per item, which defines the item's sensitivity to change, and the total number of items in the scale, which affects the time taken to complete the scale and hence how often ratings can be scheduled in an assessment programme.


1976, Vol 129 (5), pp. 452-456
Author(s): Domenic V. Cicchetti

Summary: This paper extends the recent work of Hall (1974) by presenting the minimal sample sizes and the specific linear agreement weights required for assessing the reliability of rating scales commonly used in neuropsychiatric and other clinico-medical settings. The weights are shown to vary as a function of (a) whether or not the rating scale contains a point of ‘absence’, and (b) the number of ordinal points on the scale.
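The idea of linear agreement weights can be sketched as a linearly weighted kappa for two raters on a k-point ordinal scale. The code assumes the common linear form w[i][j] = 1 - |i - j| / (k - 1), so exact agreement earns full credit and disagreement loses credit in proportion to the distance between points. It does not model the distinction between scales with and without a point of ‘absence’, nor the minimal sample sizes the paper presents, and it should not be read as the paper's exact weighting scheme. Function names are illustrative.

```python
"""Linearly weighted kappa for two raters on a k-point ordinal scale."""
from typing import List, Sequence


def linear_weights(k: int) -> List[List[float]]:
    """Agreement weights w[i][j] = 1 - |i - j| / (k - 1) (assumed linear form)."""
    if k < 2:
        raise ValueError("a scale needs at least two points")
    return [[1 - abs(i - j) / (k - 1) for j in range(k)] for i in range(k)]


def weighted_kappa(r1: Sequence[int], r2: Sequence[int], k: int) -> float:
    """Ratings are integers 1..k; returns (p_o(w) - p_e(w)) / (1 - p_e(w))."""
    if len(r1) != len(r2) or len(r1) == 0:
        raise ValueError("need two non-empty, equal-length rating sequences")
    n = len(r1)
    w = linear_weights(k)

    # Observed joint proportions of (rater 1 point, rater 2 point).
    joint = [[0.0] * k for _ in range(k)]
    for a, b in zip(r1, r2):
        if not (1 <= a <= k and 1 <= b <= k):
            raise ValueError("ratings must lie in 1..k")
        joint[a - 1][b - 1] += 1 / n

    # Marginal proportions for each rater.
    row = [sum(joint[i]) for i in range(k)]
    col = [sum(joint[i][j] for i in range(k)) for j in range(k)]

    # Weighted observed and chance-expected agreement.
    p_o = sum(w[i][j] * joint[i][j] for i in range(k) for j in range(k))
    p_e = sum(w[i][j] * row[i] * col[j] for i in range(k) for j in range(k))
    if p_e == 1:
        raise ValueError("weighted kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    print(linear_weights(3))  # [[1.0, 0.5, 0.0], [0.5, 1.0, 0.5], [0.0, 0.5, 1.0]]
    print(round(weighted_kappa([1, 2, 3], [1, 3, 3], 3), 3))  # 0.667
```

Because a one-point disagreement on a 3-point scale still earns a weight of 0.5, the example pair scores higher than it would under unweighted kappa, which counts every disagreement as a total miss.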

