Supporting End-User Understanding of Classification Errors: Visualization and Usability Issues

2019 ◽ Vol 7 ◽ pp. 29
Author(s): Emma M.A.L. Beauxis-Aussalet ◽ Joost Van Doorn ◽ Lynda Hardman

Classifiers are applied in many domains where classification errors have significant implications. However, end-users may not always understand the errors and their impact, as error visualizations are typically designed for experts and for improving classifiers. We discuss the specific needs of classifiers' end-users, and a simplified visualization designed to address them. We evaluate this design with users from three levels of expertise, and compare it with ROC curves and confusion matrices. We identify key difficulties with understanding the classification errors, and how visualizations addressed or aggravated them. The main issues concerned confusions of the actual and predicted classes (e.g., confusion of False Positives and False Negatives). The machine learning terminology, the complexity of ROC curves, and the symmetry of confusion matrices aggravated these confusions. The end-user-oriented visualization reduced the difficulties by using several visual features to clarify the actual and predicted classes, and by using more tangible metrics and representations. Our results contribute to supporting end-users' understanding of classification errors, and to informed decisions when choosing or tuning classifiers.
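The confusion described above, mistaking False Positives for False Negatives, comes down to which label is the actual class and which is the predicted class. Below is a minimal illustrative sketch (not the authors' visualization) of how the four outcome types summarised by a confusion matrix are derived:

```python
# Illustrative sketch only, not the authors' tool: counting the four outcome types
# that a confusion matrix summarises, from paired actual and predicted classes.
def outcome_counts(actual, predicted, positive="positive"):
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            counts["TP"] += 1   # predicted positive, actually positive
        elif p == positive and a != positive:
            counts["FP"] += 1   # predicted positive, actually negative (False Positive)
        elif p != positive and a == positive:
            counts["FN"] += 1   # predicted negative, actually positive (False Negative)
        else:
            counts["TN"] += 1   # predicted negative, actually negative
    return counts

actual    = ["positive", "negative", "positive", "negative", "negative"]
predicted = ["positive", "positive", "negative", "negative", "negative"]
print(outcome_counts(actual, predicted))  # {'TP': 1, 'FP': 1, 'FN': 1, 'TN': 2}
```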

2021 ◽ Vol 26 (1) ◽ pp. 22-30
Author(s): Oksana Ņikiforova ◽ Vitaly Zabiniako ◽ Jurijs Kornienko ◽ Madara Gasparoviča-Asīte ◽ Amanda Siliņa

Abstract Improving IS (Information System) end-user experience is one of the most important tasks in the analysis of end-user behaviour and in the evaluation and identification of its improvement potential. However, the application of Machine Learning methods to improving UX (User Experience) usability and efficiency is not widely researched. In the context of usability analysis, information about the behaviour of end-users can serve as input, while the output should focus on non-trivial or difficult, attention-grabbing events and scenarios. The goal of this paper is to identify which data can potentially serve as input for Machine Learning methods (and, accordingly, graph theory, transformation methods, etc.) and to define the dependency between these data and the desired output, which can help apply Machine Learning / graph algorithms to user activity records.
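As a purely illustrative sketch, and not a method from the paper, the snippet below shows one way raw end-user activity records could be turned into per-session feature vectors that Machine Learning or graph algorithms could take as input; the field names are hypothetical.

```python
# Illustrative sketch only: turning raw end-user activity records into per-session
# feature vectors. The field names ("session", "screen", "timestamp", "error") are
# hypothetical, not taken from the paper.
from collections import defaultdict

def session_features(events):
    """events: dicts with 'session', 'screen', 'timestamp' (seconds) and an 'error' flag."""
    by_session = defaultdict(list)
    for e in events:
        by_session[e["session"]].append(e)
    features = {}
    for sid, evs in by_session.items():
        evs.sort(key=lambda e: e["timestamp"])
        features[sid] = {
            "n_events": len(evs),
            "n_screens": len({e["screen"] for e in evs}),
            "n_errors": sum(1 for e in evs if e.get("error")),
            "duration_s": evs[-1]["timestamp"] - evs[0]["timestamp"],
        }
    return features

log = [{"session": "s1", "screen": "login", "timestamp": 0, "error": False},
       {"session": "s1", "screen": "report", "timestamp": 42, "error": True}]
print(session_features(log))
```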


2017
Author(s): Jianfeng Yang ◽ Xiaofan Ding ◽ Weidong Zhu

Abstract With the advance of next-generation sequencing technologies, non-invasive prenatal testing (NIPT) has been developed and employed in fetal aneuploidy screening for 13-/18-/21-trisomies by detecting cell-free fetal DNA (cffDNA) in maternal blood. Although the Z test is widely used in NIPT nowadays, there is still a need to improve its accuracy by reducing a) false negatives and false positives, and b) the ratio of unclassified data, so as to reduce both the potential harm to patients caused by these inaccuracies and the cost of retests. Employing multiple Z tests with a machine-learning algorithm can provide better predictions on NIPT data. Combining the multiple Z values with indexes of clinical signs and quality control, features were collected from known samples and scaled for model training with a support vector machine (SVM). The trained model was applied to predict unknown samples and showed significant improvement. On 4752 qualified NIPT samples, our method reached 100% accuracy on all three chromosomes, including 151 samples that were grouped as unclassified by the one-Z-value-based method. Moreover, four false positives and four false negatives were corrected by using this machine-learning model. To our knowledge, this is the first study to employ a support vector machine in NIPT data analysis. It is expected to replace the current one-Z-value-based NIPT analysis in clinical use.
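A minimal sketch of the approach described, assuming illustrative feature names and synthetic data rather than the paper's actual pipeline: per-chromosome Z values are combined with quality-control indexes, scaled, and used to train an SVM classifier.

```python
# Sketch under assumptions (features and data are illustrative, not the paper's
# pipeline): combine per-chromosome Z scores with QC indexes, scale them, and train
# an SVM to discriminate trisomy status.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Assumed columns: Z_chr13, Z_chr18, Z_chr21, fetal_fraction, unique_reads
X = rng.normal(size=(200, 5))
y = (X[:, 2] > 1.5).astype(int)        # toy labels: "chr21 Z value elevated"

model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
model.fit(X, y)
print(model.predict(X[:5]))            # predictions for the first five samples
```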


10.2196/19200 ◽ 2020 ◽ Vol 8 (6) ◽ pp. e19200
Author(s): Nicole Lowres ◽ Andrew Duckworth ◽ Julie Redfern ◽ Aravinda Thiagalingam ◽ Clara K Chow

Background SMS text messaging programs are increasingly being used for secondary prevention, and have been shown to be effective in a number of health conditions including cardiovascular disease. SMS text messaging programs have the potential to increase the reach of an intervention, at a reduced cost, to larger numbers of people who may not access traditional programs. However, patients regularly reply to the SMS text messages, leading to additional staffing requirements to monitor and moderate the patients’ SMS text messaging replies. This additional staff requirement directly impacts the cost-effectiveness and scalability of SMS text messaging interventions. Objective This study aimed to test the feasibility and accuracy of developing a machine learning (ML) program to triage SMS text messaging replies (ie, identify which SMS text messaging replies require a health professional review). Methods SMS text messaging replies received from 2 clinical trials were manually coded (1) into “Is staff review required?” (binary response of yes/no); and then (2) into 12 general categories. Five ML models (Naïve Bayes, OneVsRest, Random Forest Decision Trees, Gradient Boosted Trees, and Multilayer Perceptron) and an ensemble model were tested. For each model run, data were randomly allocated into a training set (2183/3118, 70.01%) and a test set (935/3118, 29.98%). Accuracy for the yes/no classification was calculated using the area under the receiver operating characteristic curve (AUC), false positives, and false negatives. Accuracy for classification into the 12 categories was compared using multiclass classification evaluators. Results A manual review of 3118 SMS text messaging replies showed that 22.00% (686/3118) required staff review. For determining the need for staff review, the Multilayer Perceptron model had the highest accuracy (AUC 0.86; 4.85% false negatives; and 4.63% false positives); with the addition of heuristics (specified keywords), fewer false negatives were identified (3.19%), with a small increase in false positives (7.66%) and an AUC of 0.79. Application of this model would result in 26.7% of SMS text messaging replies requiring review (true + false positives). The ensemble model produced the lowest false negatives (1.43%) at the expense of higher false positives (16.19%). OneVsRest was the most accurate (72.3%) for the 12-category classification. Conclusions The ML program has high sensitivity for identifying the SMS text messaging replies requiring staff input; however, future research is required to validate the models against larger data sets. Incorporation of an ML program to review SMS text messaging replies could significantly reduce staff workload, as staff would not have to review all incoming SMS text messages. This could lead to substantial improvements in cost-effectiveness, scalability, and capacity of SMS text messaging–based interventions.
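The following is an illustrative sketch, not the study's code: it reproduces the general shape of the pipeline described above (a 70/30 random split, a Multilayer Perceptron over text features, and AUC with false-positive and false-negative rates for the binary staff-review label) on a few invented replies.

```python
# Illustrative sketch only, not the study's code. Toy replies and labels are invented.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline

texts = ["Thanks for the reminder", "Chest pain since this morning", "Please STOP the texts",
         "All good this week", "Can you change my appointment?", "Feeling great, thanks",
         "Message not relevant to me", "Still short of breath at night"]
needs_review = [0, 1, 1, 0, 1, 0, 0, 1]   # 1 = staff review required

# 70/30 random allocation into training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    texts, needs_review, test_size=0.30, random_state=42, stratify=needs_review)

clf = make_pipeline(TfidfVectorizer(), MLPClassifier(max_iter=1000, random_state=42))
clf.fit(X_train, y_train)

probs = clf.predict_proba(X_test)[:, 1]
preds = (probs >= 0.5).astype(int)
tn, fp, fn, tp = confusion_matrix(y_test, preds, labels=[0, 1]).ravel()
print("AUC:", roc_auc_score(y_test, probs),
      "false positives:", fp / len(y_test), "false negatives:", fn / len(y_test))
```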



2016 ◽ Vol 2016 ◽ pp. 1-17
Author(s): E. Earl Eiland ◽ Lorie M. Liebrock

Many problem domains utilize discriminant analysis, for example, classification, prediction, and diagnosis, by applying artificial intelligence and machine learning. However, the results are rarely perfect, and errors can cause significant losses. Hence, end users are best served when they have performance information relevant to their needs. Starting with the most basic questions, this study considers eight summary statistics often seen in the literature and evaluates their end-user efficacy. The results lead to proposed criteria that summary statistics must satisfy to be efficacious for end users. Testing the same eight summary statistics shows that none satisfy all of the criteria. Hence, two criteria-compliant summary statistics are introduced. To show how end users can benefit, the utility of these measures is demonstrated on two problems. A key finding of this study is that researchers can make their test outcomes more relevant to end users with minor changes in their analyses and presentation.
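As a hedged sketch (the paper's exact eight statistics are not listed in this abstract), the snippet below shows how several summary statistics commonly reported in the literature follow directly from the four outcome counts of a binary classifier:

```python
# Minimal sketch: common summary statistics derived from TP/FP/TN/FN counts.
# These are standard definitions, not necessarily the eight evaluated in the paper.
def summary_stats(tp, fp, tn, fn):
    total = tp + fp + tn + fn
    return {
        "accuracy":    (tp + tn) / total,
        "precision":   tp / (tp + fp) if tp + fp else 0.0,   # positive predictive value
        "recall":      tp / (tp + fn) if tp + fn else 0.0,   # sensitivity / true positive rate
        "specificity": tn / (tn + fp) if tn + fp else 0.0,   # true negative rate
        "f1":          2 * tp / (2 * tp + fp + fn) if tp else 0.0,
        "prevalence":  (tp + fn) / total,
    }

print(summary_stats(tp=90, fp=10, tn=880, fn=20))
```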


2019 ◽ Vol 40 (Supplement_1)
Author(s): N Lowres ◽ A Duckworth ◽ C K Chow ◽ A Thiagalingam ◽ J Redfern

Abstract Background Cardiovascular SMS text programs are effective alternative secondary prevention programs for cardiac risk factor reduction and can be delivered as one-way or two-way communication. However, people text back regularly, leading to staffing costs to monitor replies. Reducing the need for staff review by 60–70% would substantially improve the costs and scalability of text programs. Purpose To develop and assess the accuracy of a machine-learning (ML) program to “triage” and identify texts requiring review/action. Methods We manually reviewed and classified all replies received from two “TEXT ME” cardiovascular secondary prevention programs. Simultaneously, an ML model was developed to classify texts and determine those needing a reply (figure). The comparison of ML models included “Naïve Bayes”, “random forest decision trees”, and “gradient boosted trees”, along with comparison to “convolutional neural network” and “recurrent neural network” classification approaches. “Natural language processing” was evaluated; however, it presented challenges in relation to text content due to non-standard English grammar, frequent use of non-standard abbreviations, and spelling errors. The ML program was trained with 70% of the dataset and accuracy was tested with the remaining 30%. Results Manual review of 3118 text replies revealed that only one text was considered urgent, and only 21% required review/action; categorisation was not straightforward because texts often contained more than one sentiment (table). The ML program was able to correctly classify 84% of texts into the designated 12 categories. The sensitivity for correctly identifying the need for health professional review was 94% (6.4% false negatives; 3.6% false positives); with the addition of “heuristics” (e.g., searching for specified keywords, question marks, etc.), sensitivity increased to 97% (2.9% false negatives; 7.3% false positives). Therefore, health professionals would only have to review 27% (true + false positives) of all text replies.
Table 1. SMS manual categorisation (n=3118)
REVIEW REQUIRED: Health question/concern (13%); Admin request (4.5%); Request to STOP (3%); Ceased smoking (0.8%); SMS not delivered (0.4%); Urgent/distress (0.03%)
NO REVIEW REQUIRED: General statement (33%); Statement of thanks (23%); Reporting good health (11%); Blank message (6%); Unrelated/accidental (4%); Emoticon only (2.4%)
Figure 1. Development process
Conclusions The ML program has high sensitivity for identifying text replies requiring health professional input and a low false negative rate, indicating that few messages needing a response would be missed. Thus, introduction of the program could significantly reduce the workload of health professionals, leading to substantial improvements in the scalability and capacity of text-based programs. The future implications for this technology are vast, including utilisation in other interactive mHealth interfaces and cardiovascular health “apps”. Acknowledgement/Funding National Heart Foundation Vanguard Grant; National Health and Medical Research Council Project Grant
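A small illustrative sketch of the “heuristics” step described above, in which specified keywords or a question mark force a reply into the review queue regardless of the classifier's decision; the keyword list here is hypothetical, not from the study.

```python
# Illustrative sketch only (keyword list is hypothetical): a heuristic override that
# trades extra false positives for fewer false negatives on top of the ML decision.
REVIEW_KEYWORDS = ("pain", "stop", "help", "hospital", "doctor", "?")

def needs_review(text: str, model_says_review: bool) -> bool:
    t = text.lower()
    if any(keyword in t for keyword in REVIEW_KEYWORDS):
        return True                     # heuristic override: always send to staff
    return model_says_review            # otherwise keep the classifier's decision

print(needs_review("Thanks, feeling great!", model_says_review=False))   # False
print(needs_review("Still getting chest pain", model_says_review=False)) # True
```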


2018 ◽ Vol 3 (2) ◽ pp. 10
Author(s): Tafseer Ahmed

This study presents an application of machine learning to predict whether the first crescent of the lunar month will be visible to the naked eye on a given date. The study presents a dataset of successful and unsuccessful attempts to sight the first crescent at the start of the lunar month. Previously, this problem was solved by analytically deriving equations for the visibility parameter(s) and manually fixing threshold values. In contrast, we applied supervised machine learning to the independent variables of the problem, and the system learned the classification criteria itself. The system achieves a precision of 0.88 and a recall of 0.87, and hence handles false positives and false negatives equally well.
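The sketch below is illustrative only: it contrasts a hand-fixed threshold on a single visibility parameter with a supervised classifier that learns the criterion from several parameters jointly. The feature names and data are assumed, not taken from the paper's dataset.

```python
# Sketch under assumptions: synthetic data with typical crescent-visibility
# parameters (altitude, elongation, lag time), not the paper's actual variables.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.uniform(low=[0.0, 0.0, 0.0], high=[20.0, 25.0, 90.0], size=(400, 3))
y = ((X[:, 0] + 0.4 * X[:, 1] + rng.normal(0, 2, 400)) > 16).astype(int)  # toy sightings

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)

rule = (X_te[:, 0] > 10).astype(int)                       # hand-fixed threshold rule
model = RandomForestClassifier(random_state=1).fit(X_tr, y_tr)  # learned criterion
pred = model.predict(X_te)

for name, p in [("threshold rule", rule), ("learned model", pred)]:
    print(name, "precision:", round(precision_score(y_te, p), 2),
          "recall:", round(recall_score(y_te, p), 2))
```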


2020 ◽ Vol 2020 (14) ◽ pp. 378-1-378-7
Author(s): Tyler Nuanes ◽ Matt Elsey ◽ Radek Grzeszczuk ◽ John Paul Shen

We present a high-quality sky segmentation model for depth refinement and investigate residual architecture performance to inform optimally shrinking the network. We describe a model that runs in near real-time on a mobile device, present a new, high-quality dataset, and detail a unique weighting scheme to trade off false positives and false negatives in binary classifiers. We show how the optimizations improve bokeh rendering by correcting stereo depth misprediction in sky regions. We detail the techniques used to preserve edges, reject false positives, and ensure generalization to the diversity of sky scenes. Finally, we present a compact model and compare the performance of four popular residual architectures (ShuffleNet, MobileNetV2, Resnet-101, and Resnet-34-like) at constant computational cost.
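As a hedged sketch of the false-positive/false-negative trade-off mentioned above (not the paper's training code), a class-weighted binary cross-entropy makes the relative costs explicit:

```python
# Minimal sketch, not the paper's loss: class-weighted binary cross-entropy where
# w_fp and w_fn set the relative cost of false positives vs. false negatives for a
# sky / non-sky pixel classifier.
import numpy as np

def weighted_bce(y_true, p_pred, w_fp=1.0, w_fn=4.0, eps=1e-7):
    """y_true in {0,1}; p_pred = predicted probability of the positive (sky) class."""
    p = np.clip(p_pred, eps, 1 - eps)
    loss_pos = -w_fn * y_true * np.log(p)              # penalises missed sky (false negatives)
    loss_neg = -w_fp * (1 - y_true) * np.log(1 - p)    # penalises spurious sky (false positives)
    return np.mean(loss_pos + loss_neg)

y = np.array([1, 1, 0, 0])
p = np.array([0.9, 0.2, 0.1, 0.6])
print(weighted_bce(y, p))
```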


2020
Author(s): Stuart Yeates

A brief introduction to acronyms is given and the motivation for extracting them in a digital library environment is discussed. A technique for extracting acronyms is presented, together with an analysis of the results. The technique is found to have a low number of false negatives and a high number of false positives. Introduction Digital library research seeks to build tools that enable access to content while making as few assumptions about the content as possible, since assumptions limit the range of applicability of the tools. Generally, the broader the assumptions, the more widely applicable the tools. For example, keyword-based indexing [5] is based on communications theory and applies to all natural human textual languages (allowances for differences in character sets and similar localisation issues notwithstanding). The algorithm described in this paper makes much stronger assumptions about the content. It assumes textual content that contains acronyms, an assumption which is known to hold for...
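A naive, purely illustrative acronym extractor in the spirit of the technique discussed (the paper's actual algorithm is not reproduced here): it pairs a bracketed uppercase token with nearby capitalised words whose initials match.

```python
# Illustrative sketch only, a simplification rather than the algorithm evaluated in
# the paper: find patterns like "Digital Library (DL)" and keep the pair when the
# initials of the preceding words match the bracketed acronym.
import re

def extract_acronyms(text):
    found = {}
    for match in re.finditer(r"((?:[A-Z][a-z]+\s+){1,6})\(([A-Z]{2,6})\)", text):
        words, acronym = match.group(1).split(), match.group(2)
        candidate = words[-len(acronym):]
        initials = "".join(w[0] for w in candidate)
        if initials.upper() == acronym:
            found[acronym] = " ".join(candidate)
    return found

print(extract_acronyms("The Digital Library (DL) community uses Machine Learning (ML)."))
```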

