Not All Information Is Equal: Effects of Disclosing Different Types of Likelihood Information on Trust, Compliance and Reliance, and Task Performance in Human-Automation Teaming

Author(s):  
Na Du ◽  
Kevin Y. Huang ◽  
X. Jessie Yang

Objective: The study examines the effects of disclosing different types of likelihood information on human operators’ trust in automation, their compliance and reliance behaviors, and human-automation team performance.
Background: To facilitate appropriate trust in and dependence on automation, explicitly conveying the likelihood of automation success has been proposed as one solution. Empirical studies have been conducted to investigate the potential benefits of disclosing likelihood information in the form of automation reliability, (un)certainty, and confidence. Yet, results from these studies are rather mixed.
Method: We conducted a human-in-the-loop experiment with 60 participants using a simulated surveillance task. Each participant performed a compensatory tracking task and a threat detection task with the help of an imperfect automated threat detector. Three types of likelihood information were presented: overall likelihood information, predictive values, and hit and correct rejection rates. Participants’ trust in automation, compliance and reliance behaviors, and task performance were measured.
Results: Human operators informed of the predictive values or the overall likelihood value, rather than the hit and correct rejection rates, relied on the decision aid more appropriately and obtained higher task scores.
Conclusion: Not all likelihood information is equal in aiding human-automation team performance. Directly presenting the hit and correct rejection rates of an automated decision aid should be avoided.
Application: The findings can be applied to the design of automated decision aids.

Author(s):  
Na Du ◽  
Qiaoning Zhang ◽  
X. Jessie Yang

The use of automated decision aids could reduce human exposure to dangers and enable human workers to perform more challenging tasks. However, automation is problematic when people fail to trust and depend on it appropriately. Existing studies have shown that system designs that provide users with likelihood information, including automation certainty, reliability, and confidence, can facilitate trust–reliability calibration, that is, the correspondence between a person’s trust in the automation and the automation’s capabilities (Lee & Moray, 1994), and improve human–automation task performance (Beller et al., 2013; Wang, Jamieson, & Hollands, 2009; McGuirl & Sarter, 2006). While revealing reliability information has been proposed as a design solution, the concrete effects of such disclosure vary across studies (Wang et al., 2009; Fletcher et al., 2017; Walliser et al., 2016). Clear guidelines that would allow display designers to choose the most effective reliability information to facilitate human decision performance and trust calibration do not appear to exist. The present study, therefore, aimed to reconcile the existing literature by investigating if and how different methods of calculating reliability information affect its effectiveness at different automation reliability levels.
A human subject experiment was conducted with 60 participants. Each participant performed a compensatory tracking task and a threat detection task simultaneously with the help of an imperfect automated threat detector. The experiment adopted a 2×4 mixed design with two independent variables: automation reliability (68% vs. 90%) as a within-subject factor and reliability information as a between-subjects factor. Reliability information for the automated threat detector was calculated using different methods based on signal detection theory and the conditional probability formula of Bayes’ theorem (H: hits; M: misses; FA: false alarms; CR: correct rejections):
Overall reliability = (H + CR) / (H + M + FA + CR).
Positive predictive value = H / (H + FA); negative predictive value = CR / (CR + M).
Hit rate = H / (H + M); correct rejection rate = CR / (CR + FA).
There was also a control condition in which participants were not given any reliability information and were only told that the alerts from the automated threat detector may or may not be correct. The dependent variables of interest were participants’ subjective trust in automation and objective measures of their display-switching behaviors.
The results of this study showed that as the automated threat detector became more reliable, participants’ trust in and dependence on the threat detector increased significantly, and their detection performance improved. More importantly, there were significant differences in participants’ trust, dependence, and dual-task performance when reliability information was calculated by different methods. Specifically, when the overall reliability of the automated threat detector was 90%, revealing the positive and negative predictive values of the automation significantly helped participants calibrate their trust in and dependence on the detector, and led to the shortest reaction times in the detection task. However, when the overall reliability of the automated threat detector was 68%, the positive and negative predictive values did not lead to a significant difference in participants’ compliance with the detector.
In addition, our results demonstrated that disclosing the hit and correct rejection rates or the overall reliability did not appear to aid human-automation team performance or trust–reliability calibration. An implication of the study is that users should be made aware of system reliability, especially of the positive and negative predictive values, to engender appropriate trust in and dependence on the automation. This finding can be applied to the interface design of automated decision aids. Future studies should examine whether the positive and negative predictive values remain the most effective pieces of information for trust calibration when the criterion of the automated threat detector becomes liberal.
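To make the metric definitions above concrete, here is a minimal sketch (in Python; the function name and the example counts are our own illustration, with the counts chosen only so that the overall reliability works out to 90%, matching the study’s high-reliability condition) that computes each likelihood figure from the four signal-detection counts:

```python
def reliability_metrics(hits, misses, false_alarms, correct_rejections):
    """Compute the likelihood figures defined above from signal-detection counts."""
    total = hits + misses + false_alarms + correct_rejections
    return {
        # Overall reliability: fraction of all trials the aid judged correctly.
        "overall_reliability": (hits + correct_rejections) / total,
        # Positive predictive value: P(threat present | alert issued).
        "positive_predictive_value": hits / (hits + false_alarms),
        # Negative predictive value: P(no threat | no alert issued).
        "negative_predictive_value": correct_rejections / (correct_rejections + misses),
        # Hit rate: P(alert issued | threat present).
        "hit_rate": hits / (hits + misses),
        # Correct rejection rate: P(no alert | no threat present).
        "correct_rejection_rate": correct_rejections / (correct_rejections + false_alarms),
    }

# Illustrative counts only; chosen so overall reliability is 90%.
print(reliability_metrics(hits=40, misses=4, false_alarms=6, correct_rejections=50))
```

Note that the same overall reliability can correspond to quite different predictive values and hit/correct rejection rates, which is why the choice of disclosed metric matters.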


BMJ Open ◽  
2019 ◽  
Vol 9 (9) ◽  
pp. e029412
Author(s):  
Magnus Hultin ◽  
Karin Jonsson ◽  
Maria Härgestam ◽  
Marie Lindkvist ◽  
Christine Brulin

Objectives: The assessment of situation awareness (SA), team performance and task performance in a simulation training session requires reliable and feasible measurement techniques. The objectives of this study were to test the Airways–Breathing–Circulation–Disability–Exposure (ABCDE) checklist and the Team Emergency Assessment Measure (TEAM) for inter-rater reliability, as well as the application of the Situation Awareness Global Assessment Technique (SAGAT) for feasibility and internal consistency.
Design: Methodological approach.
Setting: Data collection during team training using full-scale simulation at a university clinical training centre. The video-recorded scenarios were rated independently by four raters.
Participants: 55 medical students aged 22–40 years in their fourth year of medical studies, during the clerkship in anaesthesiology and critical care medicine, formed 23 different teams. All students answered the SAGAT questionnaires, and of these students, 24 answered the follow-up postsimulation questionnaire (PSQ). TEAM and ABCDE were scored by four professionals.
Measures: The ABCDE and TEAM were tested for inter-rater reliability. The feasibility of SAGAT was tested using the PSQ. SAGAT was tested for internal consistency both at an individual level (SAGAT) and a team level (Team Situation Awareness Global Assessment Technique (TSAGAT)).
Results: The intraclass correlation was 0.54/0.83 (single/average measurements) for TEAM and 0.55/0.83 for ABCDE. According to the PSQ, the items in SAGAT were rated as relevant to the scenario by 96% of the participants. Cronbach’s alpha for SAGAT/TSAGAT for the two scenarios was 0.80/0.83 vs 0.62/0.76, and the normed χ² was 1.72 vs 1.62.
Conclusion: Task performance, team performance and SA could be purposefully measured, and the reliability of the measurements was good.
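As a point of reference for the internal-consistency figures reported above, Cronbach’s alpha can be computed from a participants-by-items score matrix as in the following minimal sketch (Python/NumPy; the demonstration scores are hypothetical, not data from the study):

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for a (participants x items) matrix of scores."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]                              # number of items
    item_variances = scores.var(axis=0, ddof=1)      # per-item variance
    total_variance = scores.sum(axis=1).var(ddof=1)  # variance of summed scores
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Hypothetical SAGAT-style item scores (rows: participants, columns: items).
demo_scores = [
    [1, 1, 0, 1],
    [1, 0, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 1, 1],
    [0, 1, 0, 0],
]
print(round(cronbach_alpha(demo_scores), 2))
```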


Author(s):  
Vanessa K. Bowden ◽  
Natalie Griffiths ◽  
Luke Strickland ◽  
Shayne Loft

Objective: Examine the impact of expected automation reliability on trust, workload, task disengagement, nonautomated task performance, and the detection of a single automation failure in simulated air traffic control.
Background: Prior research has focused on the impact of experienced automation reliability. However, many operational settings feature automation that is reliable to the extent that operators will seldom experience automation failures. Despite this, operators must remain aware of when automation is at greater risk of failing.
Method: Participants performed the task with or without conflict detection/resolution automation. Automation failed to detect/resolve one conflict (i.e., an automation miss). Expected reliability was manipulated via instructions such that the expected level of reliability was (a) constant or variable, and (b) the single automation failure occurred when expected reliability was high or low.
Results: Trust in automation increased with time on task prior to the automation failure. Trust was higher when expecting high relative to low reliability. Automation failure detection was improved when the failure occurred under low compared with high expected reliability. Subjective workload decreased with automation, but there was no improvement to nonautomated task performance. Automation increased perceived task disengagement.
Conclusions: Both automation reliability expectations and task experience played a role in determining trust. Automation failure detection was improved when the failure occurred at a time it was expected to be more likely. Participants did not effectively allocate any spared capacity to nonautomated tasks.
Applications: The outcomes are applicable because operators in field settings likely form contextual expectations regarding the reliability of automation.


Author(s):  
Karin Jonsson ◽  
Christine Brulin ◽  
Maria Härgestam ◽  
Marie Lindkvist ◽  
Magnus Hultin

Background: When working in complex environments with critically ill patients, team performance is influenced by situation awareness in teams. Moreover, improved situation awareness in teams will probably improve team and task performance. The aim of this study was to evaluate an educational programme on situation awareness for interprofessional teams in intensive care units, using team and task performance as outcomes.
Method: Twenty interprofessional teams from the northern part of Sweden participated in this randomized controlled intervention study conducted in situ in two intensive care units. The study was based on three cases (cases 0, 1 and 2) involving patients in a critical situation. The intervention group (n = 11) participated in a two-hour educational programme on situation awareness, including theory, practice and reflection, while the control group (n = 9) performed the training without education in situation awareness. The outcomes were team performance (TEAM instrument), task performance (ABCDE checklist) and situation awareness (Situation Awareness Global Assessment Technique (SAGAT)). Generalized estimating equations were used to analyse the changes from case 0 to case 2, and from case 1 to case 2.
Results: Education in situation awareness in the intervention group improved TEAM leadership (p = 0.003), TEAM task management (p = 0.018) and TEAM total (p = 0.030) when comparing cases 1 and 2; these significant improvements were not found in the control group. No significant differences were observed in SAGAT or the ABCDE checklist.
Conclusions: This intervention study shows that a two-hour education in situation awareness improved parts of team performance in an acute care situation. Team leadership and task management improved in the intervention group, which may indicate that one or several of the components of situation awareness (perception, comprehension and projection) were improved. However, in the present study this potential increase in situation awareness was not detected with SAGAT. Further research is needed to evaluate how educational programmes can be used to increase situation awareness in interprofessional ICU teams and to establish which components are essential in these programmes.
Trial registration: This randomized controlled trial was not registered as it does not report the results of health outcomes after a health care intervention on human participants.


Author(s):  
Gregory McGowin ◽  
Zerong Xi ◽  
Olivia B. Newton ◽  
Gita Sukthankar ◽  
Stephen M. Fiore ◽  
...  

As the complexity of aircraft cockpit operations increases, training effectiveness must be improved and learning accelerated. Virtual reality (VR) training is increasingly offered as a method for improving training efficacy, given its ability to provide a rich sensory experience during learning. This paper describes a study examining how training efficacy can be improved by improving learning diagnostics. We study how varying forms of knowledge assessment are related to different types of task knowledge and task performance in a VR flight simulator. The data suggest that participants who demonstrated higher training comprehension, measured via diagnostic test questions, on conceptual knowledge (and, to a lesser extent, declarative knowledge) also demonstrated superior knowledge transfer in the VR flight simulator. Findings are discussed in the context of improving cognitively diagnostic assessments that are better able to predict task performance and inform individually tailored training remediation.


Author(s):  
X. Jessie Yang ◽  
Christopher D. Wickens ◽  
Katja Hölttä-Otto

The present study examined how users adjust their trust in an automated decision aid. Results revealed that a valid recommendation from the decision aid increases trust in automation, whereas an invalid one reduces it, and the magnitude of the trust decrement is greater than that of the trust increment. More importantly, this study showed that trust adjustment is not benchmarked strictly against predetermined objective criteria, that is, the quality of the decision aid’s recommendations. Rather, users’ ability to perform the task themselves and the final task outcomes moderate the effects of recommendation quality. A valid recommendation is less appreciated if users are more capable of completing the task by themselves. An invalid recommendation is less penalized if the final task performance is not harmed, as if the invalid recommendation is “forgiven” to a certain degree.
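The asymmetry reported above, where trust falls more after an invalid recommendation than it rises after a valid one, can be illustrated with a toy update rule. This sketch is purely our own illustration; the functional form and the gain/loss rates are arbitrary assumptions, not a model proposed by the authors:

```python
# Toy illustration of asymmetric trust adjustment; the update rule and
# the gain/loss rates are illustrative assumptions, not the authors' model.
def update_trust(trust, recommendation_valid, gain=0.05, loss=0.15):
    """Move trust (in [0, 1]) up after a valid recommendation, down after an invalid one."""
    if recommendation_valid:
        return trust + gain * (1.0 - trust)  # modest increment toward 1
    return trust - loss * trust              # larger decrement toward 0

trust = 0.5
for valid in [True, True, False, True, False]:
    trust = update_trust(trust, valid)
    print(f"{'valid' if valid else 'invalid'} recommendation -> trust = {trust:.3f}")
```

In the study’s terms, the reported moderators (the user’s own task capability and the final task outcome) would effectively scale such gain and loss rates rather than leave them fixed.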


2013 ◽  
Vol 28 (1) ◽  
pp. 19-42 ◽  
Author(s):  
Grant M. Beck ◽  
Rina Limor ◽  
Vairam Arunachalam ◽  
Patrick R. Wheeler

Building on prior accounting research (Luft and Shields 2001; Dearman and Shields 2005), this study examines the effects of observable decision aid bias on decision aid agreement and task performance accuracy. Using a behavioral experiment, this study manipulates decision aid bias to assess the impact of a change in the level of decision aid bias on the degree to which decision makers’ decisions agree with decision aid suggestions (i.e., decision aid agreement) and the degree to which they learn to effectively adjust their decisions (i.e., task performance accuracy). Results indicate that learning subsequent to an observable change in decision aid bias is diminished, consistent with fixation on the previous aid’s bias. JEL Classifications: D8; D83; M4.


1987 ◽  
Author(s):  
Richard W. Foltin ◽  
Richard M. Capriotti ◽  
Margaret A. McEntee ◽  
Marian W. Fischman
