Understanding Variance in Pilot Performance Ratings

2013 ◽  
Vol 3 (2) ◽  
pp. 53-62 ◽  
Author(s):  
Timothy J. Mavin ◽  
Wolff-Michael Roth ◽  
Sidney Dekker

Two studies were designed to investigate how pilots of different rank evaluate flight-deck performance. In each study, the pilots were asked to assess sets of three different videotaped scenarios featuring pilots in a simulator exhibiting poor, average, and good performance. Study 1, which included 92 airline pilots of differing rank, was aimed at comparing how individuals rate performance. The subjects used a standardized assessment form, which included six criteria, each having a 5-point rating scale. Analysis of the first study revealed that there was considerable variance in the performance ratings between flight examiners, captains, and first officers. The second study was designed to better understand the variance. Eighteen pilots (six flight examiners, six captains, and six first officers) working in pairs evaluated performances, in a modified think-aloud protocol. The results showed that there were good reasons for the observed variances. The results are discussed in relation to inter-rater reliability.

2014 ◽  
Vol 4 (2) ◽  
pp. 113-121 ◽  
Author(s):  
Stephanie Chow ◽  
Stephen Yortsos ◽  
Najmedin Meshkati

This article focuses on a major human factors–related issue: the role of cultural factors and cockpit automation and their serious impact on flight crew performance, communication, and aviation safety. The report concentrates on flight crew performance in the Asiana Airlines Flight 214 (Boeing 777) accident, exploring issues of mode confusion and autothrottle systems. It also reviews the vital role of cultural factors in aviation safety and provides a brief overview of past, related accidents. Advances in automation have been pursued in an attempt to design an error-free flight deck. Even so, pilots must still thoroughly understand every component of the flight deck, most importantly the automation; if pilots are not fully competent with their automation, the slightest errors can lead to fatal accidents. As the case of Asiana Flight 214 shows, even though engineering design and pilot training have greatly evolved over the years, many cultural, design, and communication factors still affect pilot performance. The article concludes that aviation system designers, in cooperation with pilots and regulatory bodies, should lead a strategic effort to systematically address the serious issues of cockpit automation, human factors, and culture, including their interactions, which should lead to better solutions for safer flights.


2020 ◽  
Vol 4 (Supplement_1) ◽  
pp. 519-520
Author(s):  
Priyanka Shrestha ◽  
Erica Husser ◽  
Diane Berish ◽  
Long Ngo ◽  
Marie Boltz ◽  
...  

Delirium is a serious and potentially life-threatening problem, but it remains clinically under-recognized. Various factors contribute to this under-recognition, including limited understanding of delirium, insufficient training in and application of delirium assessments, potential stigma for the patient, and increased workload for the clinician. As part of an NIH-funded study testing a rapid two-step delirium identification protocol at two hospitals in the U.S. (one urban and one rural), clinicians completed a 12-item survey assessing their knowledge of and attitudes about delirium and their confidence in preventing and managing it. Survey response options followed a 5-point rating scale (strongly disagree, disagree, undecided, agree, strongly agree). The sample for this analysis included 399 clinicians (MDs = 53; RNs = 235; CNAs = 111). Chi-square tests were used to test for group differences between clinician types. Fewer than half of the clinicians agreed with the statement “delirium is largely preventable” (MDs: 47%; RNs: 44%; CNAs: 41%; p = 0.021). MDs and RNs indicated a high level of confidence in recognizing delirium, while CNAs endorsed lower levels of confidence (MDs: 87%; RNs: 81%; CNAs: 65%; p = 0.001). All types of clinicians reported lower confidence in managing delirium (MDs: 29%; RNs: 36%; CNAs: 44%; p = 0.117). Forty-seven percent of CNAs and 37% of RNs agreed that there is a need for additional training in caring for persons with delirium, while only 21% of MDs agreed (p = 0.031). Understanding how different types of clinicians think and feel about delirium will inform training and communication initiatives, clinical implementation, and research on best practices for delirium identification and management.
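
To illustrate the kind of group comparison described above, the sketch below computes a Pearson chi-square test of independence for a clinician-type by response table. The counts are hypothetical (an assumed agree/not-agree split for three groups), not the study's data, and the closed-form p-value exp(-χ²/2) used here is valid only for 2 degrees of freedom, as in a 3 × 2 table.

```python
import math

def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if group and response were independent.
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

def p_value_df2(stat):
    """Upper-tail p-value of the chi-square distribution with exactly 2 degrees of freedom."""
    return math.exp(-stat / 2)

# Hypothetical (agree, not agree) counts for three clinician groups; NOT the study's data.
table = [
    [40, 10],
    [30, 20],
    [20, 30],
]
stat, df = chi_square_independence(table)
print(f"chi2 = {stat:.3f}, df = {df}, p = {p_value_df2(stat):.4f}")
```

For a general table, a library routine such as SciPy's contingency-table test would replace the df = 2 shortcut.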


2005 ◽  
Vol 48 (2) ◽  
pp. 323-335 ◽  
Author(s):  
Rahul Shrivastav ◽  
Christine M. Sapienza ◽  
Vuday Nandur

Rating scales are commonly used to study voice quality. However, recent research has demonstrated that perceptual measures of voice quality obtained using rating scales suffer from poor interjudge agreement and reliability, especially in the midrange of the scale. These findings, along with those obtained using multidimensional scaling (MDS), have been interpreted to show that listeners perceive voice quality in an idiosyncratic manner. Based on psychometric theory, the present research explored an alternative explanation for the poor interlistener agreement observed in previous research. This approach suggests that poor agreement between listeners may result, in part, from measurement errors related to a variety of factors rather than true differences in the perception of voice quality. In this study, 10 listeners rated breathiness for 27 vowel stimuli using a 5-point rating scale. Each stimulus was presented to the listeners 10 times in random order. Interlistener agreement and reliability were calculated from these ratings. Agreement and reliability were observed to improve when multiple ratings of each stimulus from each listener were averaged and when standardized scores were used instead of absolute ratings. The probability of exact agreement was found to be approximately .9 when using averaged ratings and standardized scores. In contrast, the probability of exact agreement was only .4 when a single rating from each listener was used to measure agreement. These findings support the hypothesis that poor agreement reported in past research partly arises from errors in measurement rather than individual differences in the perception of voice quality.
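
The two remedies described above, averaging each listener's repeated ratings and converting scores to standardized (z) scores, can be sketched in a few lines of Python. The ratings are hypothetical: listener B is assumed to hear breathiness exactly as listener A does but to use the scale one point higher. The tolerance-based agreement function is an illustrative assumption, not the study's definition of exact agreement.

```python
from statistics import mean, pstdev

def average_repetitions(ratings_by_stimulus):
    """Average one listener's repeated ratings of each stimulus."""
    return [mean(reps) for reps in ratings_by_stimulus]

def standardize(scores):
    """Convert one listener's scores to z-scores (population standard deviation)."""
    m, s = mean(scores), pstdev(scores)
    return [(x - m) / s for x in scores]

def exact_agreement(a, b, tol):
    """Proportion of stimuli on which two listeners' scores differ by at most tol."""
    return sum(abs(x - y) <= tol for x, y in zip(a, b)) / len(a)

# Hypothetical 5-point breathiness ratings: 3 stimuli, each rated 3 times.
listener_a = [[1, 2, 1], [2, 3, 3], [4, 3, 4]]
listener_b = [[2, 3, 2], [3, 4, 4], [5, 4, 5]]  # same pattern, shifted up one point

avg_a = average_repetitions(listener_a)
avg_b = average_repetitions(listener_b)

print("raw agreement:", exact_agreement(avg_a, avg_b, tol=0.5))
print("z-score agreement:", exact_agreement(standardize(avg_a), standardize(avg_b), tol=1e-9))
```

Averaging damps trial-to-trial noise, and standardizing removes each listener's personal offset and spread, so disagreement that comes from how a listener uses the scale, rather than from what the listener hears, drops out of the agreement measure.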


2021 ◽  
Vol 2 (01) ◽  
pp. 10-18
Author(s):  
Dhavindra Rawal

This study examines consumers’ awareness of product labeling information in marketing practice, based on an empirical study of college students in Tikapur Municipality, Kailali. It draws on a purposive sample of 180 graduate-level students from the management, education, and humanities faculties at Tikapur Multiple Campus and Birendra Vidhya Mandir Campus in Tikapur, who completed a structured questionnaire measuring consumer buying behavior with respect to the basic labeling information of packaged products, using a four-point rating scale. Overall, the findings indicate that aggregate consumer awareness of labeling information on packaged products is low. Furthermore, management students are more aware than non-management students, and male consumers are more aware than female consumers. The study explores the status and level of consumer awareness in the study area for the first time and offers suggestions to consumers, businessmen, consumer forums, government units, and public policymakers for improving consumer awareness, with implications for better business strategies and for consumerism.


Author(s):  
Linye Jing ◽  
Maria I. Grigos

Purpose: Forming accurate and consistent speech judgments can be challenging when working with children with speech sound disorders who produce a large number and varied types of error patterns. Rating scales offer a systematic approach to assessing the whole word rather than individual sounds. Thus, these scales can be an efficient way for speech-language pathologists (SLPs) to monitor treatment progress. This study evaluated the interrater reliability of an existing 3-point rating scale using a large group of SLPs as raters. Method: Using an online platform, 30 SLPs completed a brief training and then rated single words produced by children with typical speech patterns and children with speech sound disorders. Words were closely balanced across the three rating categories of the scale. The interrater reliability of the SLPs’ ratings to a consensus judgment was examined. Results: The majority of SLPs (87%) reached substantial interrater reliability to a consensus judgment using the 3-point rating scale. Correct productions had the highest interrater reliability. Productions with extensive errors had higher agreement than those with minor errors. Certain error types, such as vowel distortions, were especially challenging for SLPs to judge. Conclusions: This study demonstrated substantial interrater reliability to a consensus judgment among a large majority of 30 SLPs using a 3-point rating scale. The clinical implications of the findings are discussed, along with proposed modifications to the training procedure to guide future research.
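
One common way to quantify agreement between a single rater and a consensus judgment on a categorical scale is Cohen's kappa, which corrects observed agreement for the agreement expected by chance. The abstract does not name the statistic used, so this sketch is an assumption; it codes the 3-point scale hypothetically as 0 = correct, 1 = minor errors, 2 = extensive errors, and the ratings for nine words are made up.

```python
from collections import Counter

def cohens_kappa(rater, consensus):
    """Cohen's kappa between one rater's category labels and consensus labels."""
    n = len(rater)
    # Observed agreement: share of words given the same category.
    observed = sum(r == c for r, c in zip(rater, consensus)) / n
    # Chance agreement from each side's marginal category frequencies.
    r_counts, c_counts = Counter(rater), Counter(consensus)
    categories = set(r_counts) | set(c_counts)
    expected = sum(r_counts[k] * c_counts[k] for k in categories) / n ** 2
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)

# Hypothetical codes: 0 = correct, 1 = minor errors, 2 = extensive errors.
consensus = [0, 0, 0, 1, 1, 1, 2, 2, 2]  # balanced across the three categories
slp = [0, 0, 0, 1, 1, 2, 2, 2, 2]        # one minor-error word rated as extensive

print(f"kappa = {cohens_kappa(slp, consensus):.3f}")
```

Here observed agreement is 8/9 and chance agreement is 1/3, giving kappa = 5/6 (about 0.83).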


2021 ◽  
Vol 8 (3) ◽  
pp. 672-695
Author(s):  
Thomas DeVaney

This article presents a discussion and illustration of Mokken scale analysis (MSA), a nonparametric form of item response theory (IRT), in relation to common IRT models such as Rasch and Guttman scaling. The procedure can be used with the dichotomous and ordinal polytomous data commonly collected with questionnaires. The assumptions of MSA are discussed, as well as the characteristics that differentiate a Mokken scale from a Guttman scale. MSA is illustrated using the mokken package in RStudio and a data set that included over 3,340 responses to a modified version of the Statistical Anxiety Rating Scale. Issues addressed in the illustration include monotonicity, scalability, and invariant ordering. The R script for the illustration is included.
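
The scalability part of MSA can be illustrated for dichotomous items with Loevinger's H, which compares the number of observed Guttman errors (endorsing a less popular item while failing a more popular one) with the number expected if the items were independent. This Python sketch is an illustration only: it is not the mokken package's implementation, and it omits item-pair and item-level coefficients as well as the polytomous case.

```python
from itertools import combinations

def loevinger_h(data):
    """Scale-level Loevinger H for dichotomous (0/1) items.

    data: list of respondent rows, each a list of 0/1 item scores.
    """
    n = len(data)
    k = len(data[0])
    # Item popularity: proportion of respondents scoring 1 on each item.
    p = [sum(row[i] for row in data) / n for i in range(k)]
    observed = expected = 0.0
    for i, j in combinations(range(k), 2):
        # Order the pair so that `easy` is the more popular item.
        easy, hard = (i, j) if p[i] >= p[j] else (j, i)
        # Guttman error: failing the easier item while passing the harder one.
        observed += sum(1 for row in data if row[easy] == 0 and row[hard] == 1)
        expected += n * (1 - p[easy]) * p[hard]
    if expected == 0:
        raise ValueError("H is undefined when no Guttman errors are expected")
    return 1 - observed / expected

# Hypothetical responses forming a perfect Guttman pattern.
perfect = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print("H (perfect):", loevinger_h(perfect))
print("H (one error pattern added):", round(loevinger_h(perfect + [[0, 0, 1]]), 3))
```

A perfect Guttman pattern gives H = 1. Adding the single response pattern [0, 0, 1] lowers H to 1 - 2/2.8, about 0.29, below the conventional lower bound of 0.3 for a Mokken scale.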


1973 ◽  
Vol 10 (3) ◽  
pp. 235-241 ◽  
Author(s):  
Frederic B. Kraft ◽  
Donald H. Granbois ◽  
John O. Summers

An analysis is presented showing the association between a summated brand evaluation index and brands purchased over time. The summated index was no more predictive than simpler measures such as “brand last purchased” and a 7-point rating scale, although the summated index may have value as a diagnostic tool.

