Modeling and Mitigating Human Annotation Errors to Design Efficient Stream Processing Systems with Human-in-the-loop Machine Learning

Author(s):  
Rahul Pandey ◽  
Hemant Purohit ◽  
Carlos Castillo ◽  
Valerie L. Shalin
2020 ◽  
Author(s):  
Mikołaj Morzy ◽  
Bartłomiej Balcerzak ◽  
Adam Wierzbicki

BACKGROUND With the accelerating dissemination of false medical information on the Web, establishing the credibility of online sources of medical information has become a pressing necessity. The sheer number of websites offering questionable medical information, presented as reliable and actionable suggestions with possibly harmful effects, imposes an additional requirement on potential solutions: they have to scale to the size of the problem. Machine learning is one such solution which, when properly deployed, can be an effective tool for fighting medical disinformation on the Web. OBJECTIVE We present a comprehensive framework for designing and curating machine learning training datasets for online medical information credibility assessment. We show how the annotation process should be constructed and which pitfalls should be avoided. Our main objective is to provide researchers from the medical and computer science communities with guidelines on how to construct datasets for machine learning models across various areas of medical disinformation. METHODS The key component of our approach is the active annotation process. We begin by outlining the annotation protocol for the curation of a high-quality training dataset, which can then be augmented and rapidly extended by employing the human-in-the-loop paradigm in machine learning training. To circumvent the cold-start problem of insufficient gold-standard annotations, we propose a pre-processing pipeline consisting of representation learning, clustering, and re-ranking of sentences, which accelerates the training process and optimizes the human resources involved in annotation.
RESULTS We collect over 10,000 annotations of sentences related to selected subjects (psychiatry, cholesterol, autism, antibiotics, vaccines, steroids, birth methods, food allergy testing) for less than $7,000, employing 9 highly qualified annotators (certified medical professionals), and we release this dataset to the general public. We develop an active annotation framework for more efficient annotation of non-credible medical statements. The results of the qualitative analysis support our claims about the efficacy of the presented method. CONCLUSIONS A set of very diverse incentives drives the widespread dissemination of medical disinformation on the Web. An effective strategy for countering this spread is to use machine learning to automatically establish the credibility of online medical information. This, however, requires a thoughtful design of the training pipeline. In this paper, we present a comprehensive framework for active annotation. In addition, we publish a large curated dataset of medical statements labelled as credible, non-credible, or neutral.
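The pre-processing pipeline the abstract describes (representation learning, clustering, and re-ranking of sentences) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the bag-of-words `embed`, the plain k-means loop, and the nearest-to-centroid ranking are all hypothetical stand-ins for the learned representations and re-ranking used in the paper.

```python
import random
from collections import Counter


def dist(a, b):
    # Euclidean distance between two vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def mean(vectors):
    # Component-wise mean of a non-empty list of vectors.
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def embed(sentence, vocab):
    # Toy bag-of-words vector; a real pipeline would use learned
    # sentence representations (the "representation learning" step).
    counts = Counter(sentence.lower().split())
    return [float(counts[w]) for w in vocab]


def rank_for_annotation(sentences, k=2, seed=0):
    """Cluster sentence vectors with plain k-means, then rank each
    sentence by its distance to the nearest centroid, so annotators
    see the most representative sentences first."""
    vocab = sorted({w for s in sentences for w in s.lower().split()})
    vecs = [embed(s, vocab) for s in sentences]
    rng = random.Random(seed)
    centroids = rng.sample(vecs, k)
    for _ in range(20):  # fixed-iteration k-means, no convergence check
        groups = [[] for _ in range(k)]
        for v in vecs:
            groups[min(range(k), key=lambda c: dist(v, centroids[c]))].append(v)
        centroids = [mean(g) if g else centroids[i] for i, g in enumerate(groups)]
    # Re-rank: sentences closest to a centroid come first.
    return sorted(sentences, key=lambda s: min(dist(embed(s, vocab), c)
                                               for c in centroids))
```

Surfacing centroid-near sentences first gives annotators representative examples early, which is one plausible way to ease the cold-start problem before the human-in-the-loop training loop takes over.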


Author(s):  
Mansoureh Maadi ◽  
Hadi Akbarzadeh Khorshidi ◽  
Uwe Aickelin

Objective: To provide a human–Artificial Intelligence (AI) interaction review for Machine Learning (ML) applications, to inform how best to combine human domain expertise with the computational power of ML methods. The review focuses on the medical field, as the medical ML application literature highlights a special necessity for medical experts to collaborate with ML approaches. Methods: A scoping literature review is performed on Scopus and Google Scholar using the terms “human in the loop”, “human in the loop machine learning”, and “interactive machine learning”. Peer-reviewed papers published from 2015 to 2020 are included in our review. Results: We design four questions to investigate and describe human–AI interaction in ML applications: “Why should humans be in the loop?”, “Where does human–AI interaction occur in the ML processes?”, “Who are the humans in the loop?”, and “How do humans interact with ML in Human-In-the-Loop ML (HILML)?”. To answer the first question, we describe three main reasons for the importance of human involvement in ML applications. To address the second question, human–AI interaction is investigated in three main algorithmic stages: 1. data production and pre-processing; 2. ML modelling; and 3. ML evaluation and refinement. The importance of the expertise level of the humans in human–AI interaction is described to answer the third question. The extent of human interaction in HILML is grouped into three categories to address the fourth question. We conclude the paper by offering a discussion of open opportunities for future research in HILML.


2021 ◽  
Vol 10 (1) ◽  
Author(s):  
Jinchao Liu ◽  
Di Zhang ◽  
Dianqiang Yu ◽  
Mengxin Ren ◽  
Jingjun Xu

Ellipsometry is a powerful method for determining both the optical constants and thickness of thin films. For decades, solutions to ill-posed inverse ellipsometric problems have required substantial human-expert intervention and have become essentially human-in-the-loop trial-and-error processes that are not only tedious and time-consuming but also limit the applicability of ellipsometry. Here, we demonstrate a machine learning based approach for solving ellipsometric problems in an unambiguous and fully automatic manner while showing superior performance. The proposed approach is experimentally validated using a broad range of films covering the categories of metals, semiconductors, and dielectrics. This method is compatible with existing ellipsometers and paves the way for realizing the automatic, rapid, high-throughput optical characterization of films.


2014 ◽  
Vol 955-959 ◽  
pp. 3803-3812
Author(s):  
Guang Di Li ◽  
Guo Yin Wang ◽  
Xue Rui Zhang ◽  
Wei Hui Deng ◽  
Fan Zhang

Storm is the most popular real-time stream processing platform and can be used for online machine learning. Just as Hadoop provides a set of general primitives for batch processing, Storm provides a set of general primitives for real-time computation. SAMOA includes distributed algorithms for the most common machine learning tasks, much as Mahout does for Hadoop; SAMOA is both a platform and a library. In this paper, Forest cover types, a large benchmarking dataset available at the UCI KDD Archive, is used as the data stream source. The Vertical Hoeffding Tree, a parallel streaming decision-tree induction algorithm for distributed environments that is incorporated in the SAMOA API, is applied on the Storm platform. This study compares the stream processing technique for predicting forest cover types from cartographic variables with traditional machine learning algorithms applied to this dataset. The test-then-train method used in this system is fundamentally different from the traditional train-then-test approach. The results indicate that the stream processing technique's output is asymptotically nearly identical to that of a conventional learner, while the model derived from this system is fully scalable, real-time, capable of dealing with evolving streams, and insensitive to stream ordering.
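The test-then-train (prequential) evaluation the abstract contrasts with train-then-test can be sketched in a few lines: each arriving instance is first used to test the current model, then used to update it. The `MajorityClassLearner` below is a deliberately trivial stand-in; in the paper's system, SAMOA's Vertical Hoeffding Tree fills this role.

```python
from collections import Counter


class MajorityClassLearner:
    """Trivial incremental learner: always predicts the most frequent
    label seen so far. A stand-in for a real streaming classifier."""

    def __init__(self):
        self.counts = Counter()

    def predict(self, x):
        # x is ignored by this toy model; returns None before any data.
        return self.counts.most_common(1)[0][0] if self.counts else None

    def learn(self, x, y):
        self.counts[y] += 1


def prequential(stream, learner):
    """Test-then-train loop: score the prediction for each instance
    before learning from it, yielding an online accuracy estimate."""
    correct = total = 0
    for x, y in stream:
        if learner.predict(x) == y:
            correct += 1
        total += 1
        learner.learn(x, y)
    return correct / total
```

Because every instance is tested before it is trained on, the accuracy estimate never touches data the model has already seen, which is why no separate held-out test set is needed.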


Electronics ◽  
2020 ◽  
Vol 9 (5) ◽  
pp. 761
Author(s):  
Franc Drobnič ◽  
Andrej Kos ◽  
Matevž Pustišek

In the field of machine learning, a considerable amount of research addresses the interpretability of models and their decisions. Interpretability often conflicts with model quality: Random Forests are among the best-performing machine learning techniques, but they operate as a “black box”. Among the quantifiable approaches to model interpretation are measures of association between predictors and the response; for Random Forests, this approach usually consists of calculating the model's feature importances. Known methods, including the built-in one, are less suitable in settings with strong multicollinearity of features. Therefore, we propose an experimental approach to the feature selection task: a greedy forward feature selection method with a least-trees-used criterion. It yields a set of the most informative features that can be used in a machine learning (ML) training process with prediction quality similar to that of the original feature set. We verify the results of the proposed method on two known datasets, one with small feature multicollinearity and another with large feature multicollinearity. The proposed method also allows a domain expert to help select among equally important features, which is known as the human-in-the-loop approach.
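The greedy forward selection loop can be sketched generically. The `score` function here is pluggable and hypothetical; the paper's least-trees-used criterion would score a candidate feature set by how few trees a Random Forest needs to reach a target prediction quality, which this sketch does not implement.

```python
def forward_select(features, score, target_quality):
    """Greedy forward feature selection: repeatedly add the single
    feature that most improves the score, stopping once target_quality
    is reached or no remaining feature improves the current best."""
    selected, remaining = [], list(features)
    best = score(selected)
    while remaining:
        # Pick the candidate whose addition maximizes the score.
        cand = max(remaining, key=lambda f: score(selected + [f]))
        cand_score = score(selected + [cand])
        if cand_score <= best:
            break  # no candidate helps; stop early
        selected.append(cand)
        remaining.remove(cand)
        best = cand_score
        if best >= target_quality:
            break
    return selected
```

With a monotone scorer, the loop returns the smallest greedy prefix that reaches the target; this is also the natural place for a human-in-the-loop step, letting a domain expert break ties among equally scoring features.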

