eXplainable Cooperative Machine Learning with NOVA

2020 ◽  
Vol 34 (2) ◽  
pp. 143-164 ◽  
Author(s):  
Tobias Baur ◽  
Alexander Heimerl ◽  
Florian Lingenfelser ◽  
Johannes Wagner ◽  
Michel F. Valstar ◽  
...  

Abstract In the following article, we introduce a novel workflow, which we subsume under the term “explainable cooperative machine learning”, and show its practical application in NOVA, a data annotation and model training tool. The main idea of our approach is to interactively incorporate the ‘human in the loop’ when training classification models from annotated data. In particular, NOVA offers a collaborative annotation backend where multiple annotators can join forces. A key aspect is the possibility of applying semi-supervised active learning techniques already during the annotation process: data can be pre-labelled automatically, which drastically accelerates annotation. Furthermore, the user interface implements recent eXplainable AI (XAI) techniques to provide users with both a confidence value for the automatically predicted annotations and a visual explanation. We show in a use-case evaluation that our workflow speeds up the annotation process, and further argue that the additional visual explanations help annotators understand the decision-making process and judge the trustworthiness of their trained machine learning models.
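The pre-labelling idea described above can be illustrated with a short sketch. This is not NOVA's actual implementation; it assumes a scikit-learn classifier and a hypothetical `prelabel` helper that proposes labels only for frames whose predicted confidence exceeds a threshold, routing the rest back to the annotators.

```python
# Minimal sketch of confidence-based pre-labelling during annotation
# (illustrative only; not NOVA's actual implementation).
import numpy as np
from sklearn.linear_model import LogisticRegression

def prelabel(model, X_unlabelled, threshold=0.8):
    """Predict labels for unlabelled frames and keep only confident ones.

    Frames below the confidence threshold are routed back to the human
    annotators; confident ones are proposed as pre-labels.
    """
    proba = model.predict_proba(X_unlabelled)
    confidence = proba.max(axis=1)
    labels = proba.argmax(axis=1)
    confident = confidence >= threshold
    return labels[confident], confidence[confident], np.where(~confident)[0]

# Train on the frames annotated so far, then pre-label the rest.
rng = np.random.default_rng(0)
X_lab, y_lab = rng.normal(size=(200, 16)), rng.integers(0, 2, 200)
X_unlab = rng.normal(size=(1000, 16))

model = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)
pre_labels, conf, needs_review = prelabel(model, X_unlab)
print(f"{len(pre_labels)} frames pre-labelled, {len(needs_review)} sent back to annotators")
```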

2020 ◽  
Author(s):  
Mikołaj Morzy ◽  
Bartłomiej Balcerzak ◽  
Adam Wierzbicki

BACKGROUND: With the rapidly accelerating dissemination of false medical information on the Web, establishing the credibility of online sources of medical information has become a pressing necessity. The sheer number of websites offering questionable medical information, presented as reliable and actionable suggestions with possibly harmful effects, poses an additional requirement for potential solutions: they have to scale to the size of the problem. Machine learning is one such solution which, when properly deployed, can be an effective tool in fighting medical disinformation on the Web.
OBJECTIVE: We present a comprehensive framework for designing and curating machine learning training datasets for online medical information credibility assessment. We show how the annotation process should be constructed and what pitfalls should be avoided. Our main objective is to provide researchers from the medical and computer science communities with guidelines on how to construct datasets for machine learning models for various areas of medical information wars.
METHODS: The key component of our approach is the active annotation process. We begin by outlining the annotation protocol for the curation of a high-quality training dataset, which can then be augmented and rapidly extended by employing the human-in-the-loop paradigm in machine learning training. To circumvent the cold-start problem of insufficient gold-standard annotations, we propose a pre-processing pipeline consisting of representation learning, clustering, and re-ranking of sentences, which accelerates training and optimizes the human resources involved in annotation.
RESULTS: We collect over 10,000 annotations of sentences related to selected subjects (psychiatry, cholesterol, autism, antibiotics, vaccines, steroids, birth methods, food allergy testing) for less than $7,000, employing nine highly qualified annotators (certified medical professionals), and we release this dataset to the general public. We develop an active annotation framework for more efficient annotation of non-credible medical statements. The results of the qualitative analysis support our claims about the efficacy of the presented method.
CONCLUSIONS: A very diverse set of incentives is driving the widespread dissemination of medical disinformation on the Web. An effective strategy for countering this spread is to use machine learning to automatically establish the credibility of online medical information. This, however, requires a thoughtful design of the training pipeline. In this paper we present a comprehensive framework of active annotation. In addition, we publish a large curated dataset of medical statements labelled as credible, non-credible, or neutral.
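A minimal sketch of the cold-start pre-processing pipeline described in the METHODS section (representation learning, clustering, re-ranking of sentences) might look as follows. TF-IDF vectors and k-means are stand-ins chosen for brevity; the original work may use different representation and clustering components.

```python
# Sketch of the cold-start pipeline: represent sentences, cluster them,
# then re-rank within clusters so annotators see the most representative
# sentences first. Components (TF-IDF, k-means) are stand-ins.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import cosine_similarity

sentences = [
    "Vaccines cause autism in children.",
    "Antibiotics are ineffective against viral infections.",
    "High cholesterol always requires statin treatment.",
    "Home birth is safer than hospital birth in every case.",
]

# 1. Representation learning (here: simple TF-IDF vectors).
X = TfidfVectorizer().fit_transform(sentences)

# 2. Clustering into topical groups.
k = 2
km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)

# 3. Re-ranking: order each cluster by similarity to its centroid,
#    so the most central sentences are annotated first.
for c in range(k):
    idx = [i for i, lbl in enumerate(km.labels_) if lbl == c]
    sims = cosine_similarity(X[idx], km.cluster_centers_[c].reshape(1, -1)).ravel()
    ranked = [sentences[i] for _, i in sorted(zip(-sims, idx))]
    print(f"cluster {c}: {ranked}")
```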


2021 ◽  
Author(s):  
Junhua Huang ◽  
Bohan Zhu ◽  
Hongxi Zhou ◽  
Qiwei Zheng ◽  
Zhuo Chen ◽  
...  

With the continuous expansion of optical communication networks and the rapid increase in traffic demand, multi-domain management of optical networks has become widespread. The optical signal-to-noise ratio (OSNR) is an important indicator of communication quality, so predicting OSNR accurately in a low-cost and energy-efficient way is essential in multi-domain optical networks. In this paper, a federated learning scheme for multi-domain optical networks is proposed to improve the accuracy of OSNR prediction. The main idea is to train a hybrid machine learning model in each single domain and then use federated learning to optimize it across domains. The performance of the proposed scheme is verified by simulation experiments. The strategy alleviates the problems of data silos and limited per-domain training data caused by the multi-domain structure. According to the simulation results, when the amount of data reaches 5×10³, adding this strategy reduces the mean squared error of the prediction model by about 18%. It improves the performance of the machine learning model, the accuracy of OSNR prediction, and the reliability of network operation.
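A minimal sketch of the federation step might look as follows. It assumes simple per-domain linear OSNR regressors and size-weighted federated averaging of their weights; the paper's hybrid per-domain models and federation strategy may differ.

```python
# Minimal sketch of federated averaging across optical-network domains.
# Each domain fits a local linear OSNR regressor, and only model weights,
# not raw monitoring data, leave the domain.
import numpy as np

def local_fit(X, y):
    """Ordinary least squares with a bias term (local domain training)."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return w

def federated_average(weights, sizes):
    """Average local models, weighted by the amount of local data."""
    return np.average(np.stack(weights), axis=0, weights=np.asarray(sizes, dtype=float))

rng = np.random.default_rng(1)
true_w = np.array([0.5, -1.2, 2.0, 14.0])  # last entry: bias (dB offset)

domains = []
for n in (300, 500, 200):  # three domains with different data volumes
    X = rng.normal(size=(n, 3))  # e.g. launch power, span count, channel load
    y = np.hstack([X, np.ones((n, 1))]) @ true_w + rng.normal(scale=0.3, size=n)
    domains.append((X, y))

local_weights = [local_fit(X, y) for X, y in domains]
global_w = federated_average(local_weights, [len(X) for X, _ in domains])
print("federated OSNR model weights:", np.round(global_w, 2))
```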


2021 ◽  
Vol 7 (1) ◽  
pp. 39
Author(s):  
José Bobes-Bascarán ◽  
Eduardo Mosqueira-Rey ◽  
David Alonso-Ríos

At present, the great majority of Artificial Intelligence (AI) systems require the participation of humans in their development, tuning, and maintenance. In particular, Machine Learning (ML) systems can greatly benefit from human expertise and knowledge. Thus, there is increasing interest in how humans interact with these systems to obtain the best performance for both the AI system and the humans involved. Several approaches studied and proposed in the literature can be gathered under the umbrella term of Human-in-the-Loop Machine Learning. Applying these techniques in the health informatics environment could provide great value in prognosis and diagnosis tasks, contributing to better health services for cancer-related diseases.


2021 ◽  
Vol 22 (9) ◽  
pp. 4435
Author(s):  
Talia B. Kimber ◽  
Yonghui Chen ◽  
Andrea Volkamer

Drug discovery is a cost- and time-intensive process that is often assisted by computational methods, such as virtual screening, to speed up and guide the design of new compounds. For many years, machine learning methods have been successfully applied in the context of computer-aided drug discovery. Recently, thanks to the rise of novel technologies as well as the increasing amount of available chemical and bioactivity data, deep learning has had a tremendous impact on rational active compound discovery. Herein, recent applications and developments of machine learning, with a focus on deep learning, in virtual screening for active compound design are reviewed. This includes introducing different compound and protein encodings, deep learning techniques, and frequently used bioactivity and benchmark data sets for model training and testing. Finally, the present state of the art, including the current challenges and emerging problems, is examined and discussed.
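As a concrete illustration of a compound encoding feeding a machine learning model, the following hedged sketch builds Morgan fingerprints with RDKit and fits a random-forest activity classifier. The SMILES strings and activity labels are toy placeholders, not taken from any benchmark set discussed in the review.

```python
# Hedged sketch of a classic ML virtual-screening baseline: encode compounds
# as Morgan fingerprints (RDKit) and fit a random-forest activity classifier.
import numpy as np
from rdkit import Chem
from rdkit.Chem import AllChem
from sklearn.ensemble import RandomForestClassifier

def morgan_fingerprint(smiles, radius=2, n_bits=2048):
    """Encode a SMILES string as a fixed-length Morgan fingerprint vector."""
    mol = Chem.MolFromSmiles(smiles)
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, radius, nBits=n_bits)
    return np.array(list(fp), dtype=np.int8)

smiles = ["CCO", "c1ccccc1O", "CC(=O)Oc1ccccc1C(=O)O", "CCN(CC)CC"]
labels = [0, 1, 1, 0]  # toy "active"/"inactive" labels for illustration

X = np.stack([morgan_fingerprint(s) for s in smiles])
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, labels)
print("predicted activity probability:",
      clf.predict_proba(morgan_fingerprint("CCOC(=O)c1ccccc1").reshape(1, -1))[:, 1])
```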


2021 ◽  
Author(s):  
Lazaros Toumanidis ◽  
Panagiotis Kasnesis ◽  
Christos Chatzigeorgiou ◽  
Michail Feidakis ◽  
Charalampos Patrikakis

A widespread practice in machine learning solutions is the continuous use of human intelligence to increase their quality and efficiency. A common problem in such solutions is the need for large amounts of labeled data. In this paper, we present a practical implementation of the human-in-the-loop computing practice that combines active learning for informed data sampling, transfer learning for weight initialization, and a cross-platform mobile application for crowdsourcing data annotation tasks. We study the use of the proposed framework in a post-event building reconnaissance scenario, where we employ an existing pre-trained computer vision model, a binary image classification solution built on top of it, and max-entropy and random sampling as sampling strategies for the active learning step. Multiple annotations with majority voting serve as quality assurance before new human-annotated images are added to the training set and the model is retrained. We provide the results and discuss our next steps.
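The two ingredients named above, max-entropy uncertainty sampling for the active learning step and majority voting as quality assurance, can be sketched as follows. Function names and thresholds are illustrative and not taken from the authors' implementation.

```python
# Sketch of max-entropy uncertainty sampling to pick the next images for
# annotation, and majority voting over multiple crowd annotations before
# an image enters the training set.
import numpy as np
from collections import Counter

def max_entropy_sample(probabilities, k):
    """Return indices of the k samples with the highest predictive entropy."""
    p = np.clip(probabilities, 1e-12, 1.0)
    entropy = -(p * np.log(p)).sum(axis=1)
    return np.argsort(entropy)[::-1][:k]

def majority_vote(annotations, min_votes=3):
    """Accept a label only if enough annotators agree (quality assurance)."""
    label, count = Counter(annotations).most_common(1)[0]
    return label if count >= min_votes else None

# Binary "damaged / not damaged" probabilities from the current classifier.
probs = np.array([[0.51, 0.49], [0.95, 0.05], [0.60, 0.40], [0.50, 0.50]])
print("query these images next:", max_entropy_sample(probs, k=2))

# Three crowd annotations for one image; 2 out of 3 is below min_votes here.
print("accepted label:", majority_vote(["damaged", "damaged", "intact"]))
```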


2020 ◽  
Author(s):  
Aleksandra Nabożny ◽  
Bartłomiej Balcerzak ◽  
Adam Wierzbicki ◽  
Mikołaj Morzy



2006 ◽  
Author(s):  
Christopher Schreiner ◽  
Kari Torkkola ◽  
Mike Gardner ◽  
Keshu Zhang

2019 ◽  
Vol 20 (5) ◽  
pp. 540-550 ◽  
Author(s):  
Jiu-Xin Tan ◽  
Hao Lv ◽  
Fang Wang ◽  
Fu-Ying Dao ◽  
Wei Chen ◽  
...  

Enzymes are proteins that act as biological catalysts to speed up cellular biochemical processes. According to their main Enzyme Commission (EC) numbers, enzymes are divided into six categories: EC-1, oxidoreductases; EC-2, transferases; EC-3, hydrolases; EC-4, lyases; EC-5, isomerases; and EC-6, synthetases. Different enzymes have different biological functions and act on different substrates. Therefore, knowing which family an enzyme belongs to can help infer its catalytic mechanism and provide information about its biological function. With the large number of protein sequences flooding into databanks in the post-genomic age, annotating the family of an enzyme is very important. Since experimental methods are costly, bioinformatics tools are a great help for accurately classifying enzyme families. In this review, we summarize the application of machine learning methods to the prediction of enzyme families from different aspects. We hope that this review will provide insights and inspiration for research on enzyme family classification.
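As a hedged illustration of the kind of pipeline such reviews survey, the sketch below encodes each enzyme by its amino-acid composition and trains a multi-class classifier over the six EC main classes. The sequences and labels are toy placeholders.

```python
# Sketch: represent each enzyme by its amino-acid composition and train a
# multi-class classifier over the six EC main classes (toy data only).
import numpy as np
from sklearn.svm import SVC

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def aa_composition(sequence):
    """Fraction of each of the 20 standard amino acids in the sequence."""
    sequence = sequence.upper()
    counts = np.array([sequence.count(aa) for aa in AMINO_ACIDS], dtype=float)
    return counts / max(len(sequence), 1)

# Toy training data: (sequence fragment, EC main class 1-6).
train = [("MKTAYIAKQR", 3), ("GAVLIMCFYW", 1), ("STNQDEKRHG", 6), ("PPGAWSTNQD", 3)]
X = np.stack([aa_composition(seq) for seq, _ in train])
y = [ec for _, ec in train]

model = SVC(kernel="rbf").fit(X, y)
print("predicted EC class:", model.predict(aa_composition("MKAYQRSTND").reshape(1, -1)))
```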


Author(s):  
Mansoureh Maadi ◽  
Hadi Akbarzadeh Khorshidi ◽  
Uwe Aickelin

Objective: To provide a review of human–Artificial Intelligence (AI) interaction in Machine Learning (ML) applications, to inform how best to combine human domain expertise with the computational power of ML methods. The review focuses on the medical field, as the medical ML application literature highlights a special need for medical experts to collaborate with ML approaches. Methods: A scoping literature review is performed on Scopus and Google Scholar using the terms “human in the loop”, “human in the loop machine learning”, and “interactive machine learning”. Peer-reviewed papers published from 2015 to 2020 are included in our review. Results: We design four questions to investigate and describe human–AI interaction in ML applications: “Why should humans be in the loop?”, “Where does human–AI interaction occur in the ML processes?”, “Who are the humans in the loop?”, and “How do humans interact with ML in Human-In-the-Loop ML (HILML)?”. To answer the first question, we describe three main reasons for the importance of human involvement in ML applications. To address the second question, human–AI interaction is investigated at three main algorithmic stages: (1) data production and pre-processing; (2) ML modelling; and (3) ML evaluation and refinement. The importance of the expertise level of the humans in the loop is described to answer the third question. The extent of human interaction in HILML is grouped into three categories to address the fourth question. We conclude the paper by discussing open opportunities for future research in HILML.
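As a purely illustrative sketch (not taken from the reviewed literature), the snippet below shows where human checkpoints could be hooked into the three algorithmic stages listed above; the `human_check` callback is a hypothetical stand-in for any expert-facing interface.

```python
# Illustrative human-in-the-loop hooks at the three stages named above:
# data preparation, modelling, and evaluation/refinement.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

def human_check(stage, payload):
    """Placeholder for an expert review step; here it just logs and approves."""
    print(f"[human-in-the-loop] {stage}: expert reviews {payload}")
    return True

rng = np.random.default_rng(0)
X, y = rng.normal(size=(300, 5)), rng.integers(0, 2, 300)

# Stage 1: data producing and pre-processing -- expert vets features and labels.
human_check("data pre-processing", {"n_samples": len(X), "n_features": X.shape[1]})

# Stage 2: ML modelling -- expert signs off on the model choice.
model = LogisticRegression(max_iter=500)
human_check("modelling", type(model).__name__)
model.fit(X, y)

# Stage 3: evaluation and refinement -- expert inspects metrics before deployment.
acc = accuracy_score(y, model.predict(X))
if human_check("evaluation", {"train_accuracy": round(acc, 3)}):
    print("model approved for the next iteration")
```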


2021 ◽  
Vol 13 (4) ◽  
pp. 94
Author(s):  
Haokun Fang ◽  
Quan Qian

With the great success of machine learning, privacy protection has become an important concern. In this paper, we propose a multi-party privacy-preserving machine learning framework, named PFMLP, based on partially homomorphic encryption and federated learning. The core idea is that all learning parties transmit only gradients encrypted with homomorphic encryption. In experiments, the model trained with PFMLP achieves almost the same accuracy, with a deviation of less than 1%. To reduce the computational overhead of homomorphic encryption, we use an improved Paillier algorithm that speeds up training by 25–28%. Moreover, the effects of encryption key length, learning network structure, number of learning clients, etc., are also compared and discussed in detail in the paper.
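A minimal sketch of the core idea, parties exchanging only homomorphically encrypted gradients that an aggregator sums without decrypting individual contributions, is shown below using the python-paillier (`phe`) package. The key length and gradient values are illustrative, and the improved Paillier variant from the paper is not reproduced here.

```python
# Sketch: parties share only Paillier-encrypted gradients; the aggregator
# sums ciphertexts (additive homomorphism) and never sees plaintext values.
from functools import reduce
from phe import paillier

public_key, private_key = paillier.generate_paillier_keypair(n_length=1024)

# Each party computes a local gradient and encrypts it before sharing.
local_gradients = [
    [0.12, -0.40, 0.05],   # party A
    [0.08, -0.35, 0.10],   # party B
    [0.15, -0.42, 0.02],   # party C
]
encrypted = [[public_key.encrypt(g) for g in grads] for grads in local_gradients]

# The aggregator adds ciphertexts component-wise without decrypting them.
aggregated = [reduce(lambda a, b: a + b, col) for col in zip(*encrypted)]

# Only the key holder decrypts, and only the aggregate, then averages it.
n_parties = len(local_gradients)
averaged = [private_key.decrypt(c) / n_parties for c in aggregated]
print("averaged gradient:", [round(g, 4) for g in averaged])
```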

