Improving high-impact bug report prediction with combination of interactive machine learning and active learning

Abstract The advances in Internet of things lead to an increased number of devices generating and streaming data. These devices can be useful data sources for activity recognition by using machine learning. However, the set of available sensors may vary over time, e.g. due to mobility of the sensors and technical failures. Since the machine learning model uses the data streams from the sensors as input, it must be able to handle a varying number of input variables, i.e. that the feature space might change over time. Moreover, the labelled data necessary for the training is often costly to acquire. In active learning, the model is given a budget for requesting labels from an oracle, and aims to maximize accuracy by careful selection of what data instances to label. It is generally assumed that the role of the oracle only is to respond to queries and that it will always do so. In many real-world scenarios however, the oracle is a human user and the assumptions are simplifications that might not give a proper depiction of the setting. In this work we investigate different interactive machine learning strategies, out of which active learning is one, which explore the effects of an oracle that can be more proactive and factors that might influence a user to provide or withhold labels. We implement five interactive machine learning strategies as well as hybrid versions of them and evaluate them on two datasets. The results show that a more proactive user can improve the performance, especially when the user is influenced by the accuracy of earlier predictions. The experiments also highlight challenges related to evaluating performance when the set of classes is changing over time.

Interactive Machine Learning for Data Exfiltration Detection: Active Learning with Human Expertise

2020 IEEE International Conference on Systems, Man, and Cybernetics (SMC) ◽

10.1109/smc42975.2020.9282831 ◽

2020 ◽

Author(s):

Mu-Huan Chung ◽

Mark Chignell ◽

Lu Wang ◽

Alexandra Jovicic ◽

Abhay Raman

Keyword(s):

Machine Learning ◽

Active Learning ◽

Interactive Machine Learning

Dealing with Mislabeling via Interactive Machine Learning

KI - Künstliche Intelligenz ◽

10.1007/s13218-020-00630-5 ◽

2020 ◽

Vol 34 (2) ◽

pp. 271-278

Author(s):

Wanyi Zhang ◽

Andrea Passerini ◽

Fausto Giunchiglia

Keyword(s):

Machine Learning ◽

Interactive Machine Learning

How to measure uncertainty in uncertainty sampling for active learning

Machine Learning ◽

10.1007/s10994-021-06003-9 ◽

2021 ◽

Author(s):

Vu-Linh Nguyen ◽

Mohammad Hossein Shaker ◽

Eyke Hüllermeier

Keyword(s):

Machine Learning ◽

Active Learning ◽

Sampling Strategies ◽

Total Uncertainty ◽

Uncertainty Sampling ◽

Different Types ◽

Alternative Approaches ◽

Active Learner ◽

Probabilistic Nature ◽

Different Sources

AbstractVarious strategies for active learning have been proposed in the machine learning literature. In uncertainty sampling, which is among the most popular approaches, the active learner sequentially queries the label of those instances for which its current prediction is maximally uncertain. The predictions as well as the measures used to quantify the degree of uncertainty, such as entropy, are traditionally of a probabilistic nature. Yet, alternative approaches to capturing uncertainty in machine learning, alongside with corresponding uncertainty measures, have been proposed in recent years. In particular, some of these measures seek to distinguish different sources and to separate different types of uncertainty, such as the reducible (epistemic) and the irreducible (aleatoric) part of the total uncertainty in a prediction. The goal of this paper is to elaborate on the usefulness of such measures for uncertainty sampling, and to compare their performance in active learning. To this end, we instantiate uncertainty sampling with different measures, analyze the properties of the sampling strategies thus obtained, and compare them in an experimental study.

Machine Learning Use for Prognostic Purposes in Multiple Sclerosis

Life ◽

10.3390/life11020122 ◽

2021 ◽

Vol 11 (2) ◽

pp. 122

Author(s):

Ruggiero Seccia ◽

Silvia Romano ◽

Marco Salvetti ◽

Andrea Crisanti ◽

Laura Palagi ◽

...

Keyword(s):

Machine Learning ◽

Multiple Sclerosis ◽

High Impact ◽

Prognostic Models ◽

Early Prediction ◽

Disease Course ◽

Slow Progression ◽

Relapsing Remitting ◽

Long Time ◽

Effective Drugs

The course of multiple sclerosis begins with a relapsing-remitting phase, which evolves into a secondarily progressive form over an extremely variable period, depending on many factors, each with a subtle influence. To date, no prognostic factors or risk score have been validated to predict disease course in single individuals. This is increasingly frustrating, since several treatments can prevent relapses and slow progression, even for a long time, although the possible adverse effects are relevant, in particular for the more effective drugs. An early prediction of disease course would allow differentiation of the treatment based on the expected aggressiveness of the disease, reserving high-impact therapies for patients at greater risk. To increase prognostic capacity, approaches based on machine learning (ML) algorithms are being attempted, given the failure of other approaches. Here we review recent studies that have used clinical data, alone or with other types of data, to derive prognostic models. Several algorithms that have been used and compared are described. Although no study has proposed a clinically usable model, knowledge is building up and in the future strong tools are likely to emerge.

A Review on Human–AI Interaction in Machine Learning and Insights for Medical Applications

International Journal of Environmental Research and Public Health ◽

10.3390/ijerph18042121 ◽

2021 ◽

Vol 18 (4) ◽

pp. 2121

Author(s):

Mansoureh Maadi ◽

Hadi Akbarzadeh Khorshidi ◽

Uwe Aickelin

Keyword(s):

Machine Learning ◽

Future Research ◽

Computational Power ◽

Medical Field ◽

Interactive Machine Learning ◽

Human In The Loop ◽

Human Interactions ◽

Scoping Literature Review ◽

Domain Expertise ◽

Expertise Level

Objective: To provide a human–Artificial Intelligence (AI) interaction review for Machine Learning (ML) applications to inform how to best combine both human domain expertise and computational power of ML methods. The review focuses on the medical field, as the medical ML application literature highlights a special necessity of medical experts collaborating with ML approaches. Methods: A scoping literature review is performed on Scopus and Google Scholar using the terms “human in the loop”, “human in the loop machine learning”, and “interactive machine learning”. Peer-reviewed papers published from 2015 to 2020 are included in our review. Results: We design four questions to investigate and describe human–AI interaction in ML applications. These questions are “Why should humans be in the loop?”, “Where does human–AI interaction occur in the ML processes?”, “Who are the humans in the loop?”, and “How do humans interact with ML in Human-In-the-Loop ML (HILML)?”. To answer the first question, we describe three main reasons regarding the importance of human involvement in ML applications. To address the second question, human–AI interaction is investigated in three main algorithmic stages: 1. data producing and pre-processing; 2. ML modelling; and 3. ML evaluation and refinement. The importance of the expertise level of the humans in human–AI interaction is described to answer the third question. The number of human interactions in HILML is grouped into three categories to address the fourth question. We conclude the paper by offering a discussion on open opportunities for future research in HILML.

Interactive Machine Learning for Embodied Interaction Design: A tool and methodology

Proceedings of the Fifteenth International Conference on Tangible, Embedded, and Embodied Interaction ◽

10.1145/3430524.3442703 ◽

2021 ◽

Author(s):

Nicola Plant ◽

Clarice Hilton ◽

Marco Gillies ◽

Rebecca Fiebrink ◽

Phoenix Perry ◽

...

Keyword(s):

Machine Learning ◽

Interaction Design ◽

Embodied Interaction ◽

Interactive Machine Learning

A transferable active-learning strategy for reactive molecular force fields

Chemical Science ◽

10.1039/d1sc01825f ◽

2021 ◽

Author(s):

Tom Young ◽

Tristan Johnston-Wood ◽

Volker L. Deringer ◽

Fernanda Duarte

Keyword(s):

Machine Learning ◽

Active Learning ◽

Learning Strategy ◽

Force Fields ◽

Molecular Simulations ◽

Quantum Mechanical ◽

Interatomic Potentials ◽

Molecular Force ◽

High Level ◽

Active Learning Strategy

Predictive molecular simulations require fast, accurate and reactive interatomic potentials. Machine learning offers a promising approach to construct such potentials by fitting energies and forces to high-level quantum-mechanical data, but...

Deploying an interactive machine learning system in an evidence-based practice center

Proceedings of the 2nd ACM SIGHIT symposium on International health informatics - IHI '12 ◽

10.1145/2110363.2110464 ◽

2012 ◽

Cited By ~ 122

Author(s):

Byron C. Wallace ◽

Kevin Small ◽

Carla E. Brodley ◽

Joseph Lau ◽

Thomas A. Trikalinos

Keyword(s):

Machine Learning ◽

Evidence Based Practice ◽

Learning System ◽

Evidence Based ◽

Interactive Machine Learning

An Automated Machine Learning-Genetic Algorithm Framework With Active Learning for Design Optimization

Journal of Energy Resources Technology ◽

10.1115/1.4050489 ◽

2021 ◽

Vol 143 (8) ◽

Author(s):

Opeoluwa Owoyele ◽

Pinaki Pal ◽

Alvaro Vidal Torreira

Keyword(s):

Machine Learning ◽

Active Learning ◽

Design Optimization ◽

Fuel Injection ◽

A Priori ◽

Computational Cost ◽

Learning Approach ◽

Data Generation ◽

Compression Ignition Engine ◽

Hyperparameter Selection

AbstractThe use of machine learning (ML)-based surrogate models is a promising technique to significantly accelerate simulation-driven design optimization of internal combustion (IC) engines, due to the high computational cost of running computational fluid dynamics (CFD) simulations. However, training the ML models requires hyperparameter selection, which is often done using trial-and-error and domain expertise. Another challenge is that the data required to train these models are often unknown a priori. In this work, we present an automated hyperparameter selection technique coupled with an active learning approach to address these challenges. The technique presented in this study involves the use of a Bayesian approach to optimize the hyperparameters of the base learners that make up a super learner model. In addition to performing hyperparameter optimization (HPO), an active learning approach is employed, where the process of data generation using simulations, ML training, and surrogate optimization is performed repeatedly to refine the solution in the vicinity of the predicted optimum. The proposed approach is applied to the optimization of a compression ignition engine with control parameters relating to fuel injection, in-cylinder flow, and thermodynamic conditions. It is demonstrated that by automatically selecting the best values of the hyperparameters, a 1.6% improvement in merit value is obtained, compared to an improvement of 1.0% with default hyperparameters. Overall, the framework introduced in this study reduces the need for technical expertise in training ML models for optimization while also reducing the number of simulations needed for performing surrogate-based design optimization.